
Enabling stricter workflows for destructive migrations #27897

Closed
dazinator opened this issue Apr 27, 2022 · 7 comments
Labels
closed-out-of-scope (This is not something that will be fixed/implemented and the issue is closed.), customer-reported

Comments


dazinator commented Apr 27, 2022

It's very easy today for a developer working on a feature to generate a "destructive" migration which can go unnoticed or undiscussed through to deployment to various environments.

In some situations it makes sense for teams to have more formal workflows (or tooling) around recognising destructive vs. non-destructive migrations.

One reason is that if a deployment pushed to a given environment contains no destructive database migrations, that could be a factor in choosing a deployment strategy where not all replicas of the service need to be brought down whilst the database migrations are applied, catering towards part of a zero-downtime deployment strategy. Whereas if the release does contain destructive changes, it may be more prudent to do additional auditing checks before selecting the approach. If the team is following the "expand/contract" approach then it may still be safe to do a zero-downtime deployment, assuming the destructive changes are a planned "contraction" of the schema for objects that are no longer being used.

The "expand/contract" pattern has been documented as a formal pattern for rolling out significant schema changes to the database in a series of steps allowing services to transition onto an adjusted schema whilst still supporting the old schema in parallel for some period of time, before a release with a "contraction" of the database schema is made finally to deprecate the old schema once no parts of the system are using it any longer.

For teams that wish to enable these types of workflows, like expand/contract, it would be helpful for destructive migrations to be flagged as soon as they enter the code base as an asset.

The following is taken from another issue which talks about the mechanics of how this might work in more depth:

@roji yes, if we could make dotnet ef migrations add error by default for destructive migrations, that would be safer for us, with a way to override when needed (for example, an additional command line arg such as --allow-destructive). Finally, if the destructive migrations themselves were generated with an informational attribute added to the top of the migration class, they could be more easily seen and discussed during the code/peer review phase:

[Destructive]
public partial class SomeMigration : Migration
{
}
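
No such attribute exists in EF Core today; as a sketch of what is being proposed, the hypothetical marker would need to be little more than this, with the migrations scaffolder changed to emit it whenever a generated migration contains destructive operations:

using System;

// Hypothetical marker attribute; not part of EF Core.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class DestructiveAttribute : Attribute
{
}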

A Destructive attribute (or similar) on the migration could also allow for deployment-time checks. If you can detect that the migrations you are about to apply to a database are marked as destructive, then you might want to do things differently than when there are only non-destructive migrations to apply. For example, when there are destructive changes you might want to:

  • Take all application instances down whilst you upgrade the database, and then upgrade all the application instances.
  • Generate the migration sql scripts at the start of the upgrade to capture against the deployment for audit or approval purposes.

If there are no destructive migrations however then you might do a zero downtime deployment and care less about the above.

dotnet ef database update --connection [conn string] --has-destructive
True
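
Even without that CLI support, a deployment step could approximate the same check by reflecting over the migrations assembly for the proposed marker. A minimal sketch, assuming the hypothetical [Destructive] attribute sketched above is emitted by the scaffolder (a real check would also restrict itself to migrations that are still pending for the target database):

using System;
using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore.Migrations;

public static class DestructiveMigrationCheck
{
    // True if any migration class in the assembly carries the hypothetical
    // [Destructive] marker. Filtering to pending migrations only is left out
    // to keep the sketch short.
    public static bool AnyDestructive(Assembly migrationsAssembly)
        => migrationsAssembly.GetTypes()
            .Where(t => typeof(Migration).IsAssignableFrom(t) && !t.IsAbstract)
            .Any(t => t.GetCustomAttribute<DestructiveAttribute>() != null);
}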

As a final safety net, dotnet ef database update could fail by default if any of the migrations to be applied were destructive, unless --allow-destructive was specified. This way someone can't apply destructive changes to the database without being explicit about that intent.

dotnet ef database update --connection [conn string] --allow-destructive

Originally posted by @dazinator in #19587 (comment)

@dazinator (Author)

And this comment of mine probably best explains the motivation behind this feature suggestion:

Just to be clear, my proposition with the [Destructive] stuff was for EF to support a workflow that prevents teams from accidentally including destructive migrations in a release. The value is that by deterring this, it allows deployment flows where services need not be brought down whilst the database is being upgraded. It also supports workflows where releases containing destructive migrations can still be made, but behind a safety net that lends itself to better team communication and planning, because it can no longer happen "accidentally".

dazinator (Author) commented Apr 27, 2022

Also, I'm aware that it isn't advisable for EF to apply migrations at all; it is better for EF to generate SQL scripts which are reviewed and audited before being applied. I believe even in this scenario it should be possible for SQL scripts that are generated by EF for destructive migrations to carry some similar metadata indicating they contain destructive changes, whether that be metadata supported by the underlying database, or resorting to a comment in the SQL script that could be parsed by tooling, similar to the proposed [Destructive] attribute on the C# migration class. A human auditing these scripts would not rely on this, of course. However, for teams trusting to automation, this metadata could be leveraged in supporting workflows.
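
As a rough sketch of that last option (nothing EF does today), a pipeline step could scan the script produced by dotnet ef migrations script for an agreed marker comment, falling back to a crude keyword check. Both the "-- DESTRUCTIVE" marker and the keyword list below are assumptions rather than EF conventions, and the keyword check would have false positives and negatives:

using System;
using System.IO;
using System.Linq;

public static class ScriptInspector
{
    // Hypothetical marker comment that a (modified) script generator, or a
    // wrapper around it, would emit for destructive migrations.
    private const string Marker = "-- DESTRUCTIVE";

    // Crude fallback heuristic; not a substitute for human review.
    private static readonly string[] DestructiveFragments = { "DROP TABLE", "DROP COLUMN" };

    public static bool LooksDestructive(string scriptPath)
    {
        var lines = File.ReadAllLines(scriptPath);
        return lines.Any(l => l.Contains(Marker, StringComparison.OrdinalIgnoreCase)
            || DestructiveFragments.Any(f => l.Contains(f, StringComparison.OrdinalIgnoreCase)));
    }
}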

roji (Member) commented Apr 27, 2022

I believe even in this scenario it should be possible for SQL scripts that are generated by EF for destructive migrations to carry some similar metadata indicating they contain destructive changes, whether that be metadata supported by the underlying database, or resorting to a comment in the SQL script that could be parsed by tooling, similar to the proposed [Destructive] attribute on the C# migration class.

I'm not aware of any feature like this in either databases or SQL tooling. At most, the EF Core tooling that generates the script (i.e. dotnet ef migrations script) could refuse to do so if the resulting SQL would contain a potentially destructive migration.

Stepping back, I'm still unclear on the actual value this would bring, and what an overall workflow would look like here. Anyone doing zero-downtime migrations is already carefully crafting and applying their migrations; e.g. instead of renaming a column, they add a new column in one migration, and some time later they add a second migration to remove the old one. Sure, I guess this feature could prevent accidental application of the second migration prematurely; but at that point all migrations are blocked, since no migrations after the potentially destructive one can be applied. And again, as above, anyone with a critical-enough system to do zero-downtime migrations is already reviewing their migrations carefully and not blindly updating.

dazinator (Author) commented Apr 27, 2022

Stepping back, I'm still unclear on the actual value this would bring, and what an overall workflow would look like here.

Some ideas of how we'd plan to utilise this feature:

  • Developers would no longer be able to generate destructive migrations with the EF commands without the specific flag. This helps protect them from accidentally introducing a destructive change when using the default commands as part of their daily workflow. Yes, it should be obvious to most developers that they are creating a destructive migration, but consider less experienced developers, or simply human error: changes to the EF model happen throughout the course of a day, the EF command to generate the migration is run some time later, and the developer may have lost track of the fact that the migration now contains a destructive change. In any case, this feature would help surface that event.
  • Easier during peer review to pick up on [Destructive] migrations by scanning for the attribute, as opposed to checking build logs for warnings or looking through potentially large swathes of migration code trying to work out which changes might be destructive.
  • Easier for planning purposes to determine whether the next release contains destructive database changes: if there are no new migrations with this attribute, the answer is no; if there are, the answer is yes. We could easily plug a tool into our pipeline to determine this and, for example, set a variable on the release.
  • With the above variable established, when it comes to the deployment pipeline we may opt to deploy the assets in a different way. For example, if there are no destructive changes we may run deployment script A; if there are, we may run manual intervention step B, where someone approves the release after verifying the destructive changes were planned.

These are just some ideas. Most of this would be possible with just the additional output of the attribute; however, the CLI changes to prevent generating or applying destructive changes without explicit opt-in would provide a safety net: developers (and deployments) wishing to work "safely" within this model would simply never use the opt-in flag, so generating or applying destructive changes through EF would require deliberate, explicit intent.

Anyone doing zero-downtime migrations is already carefully crafting and applying their migrations; e.g. instead of renaming a column, they add a new column in one migration, and some time later they add a second migration to remove the old one.

The keyword for me here is carefully. This process relies on humans to:

  1. Be aware of the destructive change in the migration (possible to miss it, due to inexperience or human error)
  2. Plan to use expand / contract, by planning to release the expansion part first, and contraction part later.

I am thinking there is room here for tooling to assist these processes. If EF has the smarts to know whether a migration it is generating can be considered "destructive", it is in a prime position to help ensure the above two points are raised: in the mind of the developer introducing a destructive change (via the CLI commands), and in teams that practice peer review, where destructive changes would be more easily visible (via the [Destructive] attribute), making it less likely that they are released unplanned or undiscussed.
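
As far as I'm aware, EF Core already has a notion of this internally: each MigrationOperation exposes an IsDestructiveChange property, which is what drives the "may result in the loss of data" warning when scaffolding. So, as a rough sketch rather than an official workflow (the helper below is hypothetical and glosses over provider-specific details), external tooling could already approximate the check against compiled migrations:

using System;
using System.Linq;
using System.Reflection;
using Microsoft.EntityFrameworkCore.Migrations;

public static class DestructiveOperationCheck
{
    // Instantiates each compiled migration (scaffolded migrations have
    // parameterless constructors) and asks EF Core itself whether any of the
    // Up operations is flagged as a destructive change.
    public static bool HasDestructiveUpOperations(Assembly migrationsAssembly)
        => migrationsAssembly.GetTypes()
            .Where(t => typeof(Migration).IsAssignableFrom(t) && !t.IsAbstract)
            .Select(t => (Migration)Activator.CreateInstance(t))
            .Any(m => m.UpOperations.Any(op => op.IsDestructiveChange));
}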

And again, as above anyone with a critical-enough system to do zero-downtime migrations is already reviewing their migrations carefully and not blindly updating.

Not necessarily true. Some teams rely on tests in dev and staging environments to gain confidence in a release, perhaps restoring staging from prod data. If the release works there and passes all the tests, that is good enough; it does not necessarily need a human to review all database changes prior to a prod deployment, it just needs all the tests to pass. I think the exact process here very much depends on the organisation and the product involved.

roji (Member) commented Apr 29, 2022

Possible duplicate of #8932 (or very related).

@ajcvickers (Member)

We discussed this in triage, and this is not something we plan to implement.

ajcvickers added the customer-reported and closed-out-of-scope labels May 2, 2022
@dazinator (Author)

Ok - thanks for considering

ajcvickers reopened this Oct 16, 2022
ajcvickers closed this as not planned Oct 16, 2022