Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop all migrations and roll back everything on failure #22616

Closed
Tracked by #19587
roji opened this issue Sep 19, 2020 · 34 comments · Fixed by #34206
Closed
Tracked by #19587

Stop all migrations and roll back everything on failure #22616

roji opened this issue Sep 19, 2020 · 34 comments · Fixed by #34206
Labels
area-aspire area-migrations closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. punted-for-6.0 punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-enhancement
Milestone

Comments

@roji
Copy link
Member

roji commented Sep 19, 2020

When looking at #22613, I realized something... In general, if an earlier migration fails, later migrations will still be attempted and may succeed, leaving the database in a potentially inconsistent and confusing state. I understand that the idea is to allow the idempotent migration to be re-executed later and "fill in" the missing migrations, but if there's any dependency between one migration and another, this doesn't work (think about data migrations where a later migration implicitly depends on an earlier one).

Our new 5.0 behavior isn't a regression (or even worse) compared to the previous behavior. However, if we wrapped all migrations in a single transaction (and set XACT_ABORT to ON in SQL Server), we should get the safer behavior above, while at the same time addressing the original ask in #7681. The problem here may be with transaction suppression - but isn't that always problematic (and also SQL Server-specific).

/cc @bricelam

@ajcvickers
Copy link
Member

ajcvickers commented Sep 21, 2020

Consider as part of #19587

@bricelam
Copy link
Contributor

Triage: Wrap all the migrations inside a single transaction (instead of individual transactions per migration)

@bricelam bricelam modified the milestones: Backlog, 6.0.0 Sep 21, 2020
@bricelam bricelam self-assigned this Sep 21, 2020
@ChristopherHaws
Copy link
Contributor

@bricelam From what I can tell, there are only two types of migrations operations that are always TransactionSuppressed:

  • SqlServerCreateDatabaseOperation
  • SqlServerDropDatabaseOperation

The rest of the operations that are sometimes TransactionSuppressed are when the resource is memory optimized:

  • AddColumnOperation
  • AddForeignKeyOperation
  • AddPrimaryKeyOperation
  • AlterColumnOperation
  • RenameIndexOperation
  • CreateTableOperation
  • DropTableOperation
  • CreateIndexOperation
  • DropPrimaryKeyOperation
  • AlterDatabaseOperation
  • AlterTableOperation
  • DropForeignKeyOperation
  • DropIndexOperation
  • DropColumnOperation

Could this be a way of achieving this?

  • The "up" migration specifically places the SqlServerCreateDatabaseOperation before the migration transaction
  • The "down" migration specifically places the SqlServerDropDatabaseOperation after the migration transaction
  • If --no-transaction is not specified and any of the migrations have TransactionSuppressed = true then return an error stating that "Migrations in a transaction are only supported when not using memory optimized tables"

@bricelam
Copy link
Contributor

bricelam commented Sep 23, 2020

Don't worry about SqlServerCreateDatabaseOperation and DropDatabase. The fact that they're migration operations is merely an implementation detail. No migration will contain them.

TransactionSuppressed is orthogonal to --no-transaction. TransactionSuppressed means it will commit the current transaction, execute the operation, and begin a new one. --no-transactions means the script will not include those COMMIT and BEGIN TRANSACTION statements.

For this issue, we just need to lift the transaction management logic inside MigrationCommandExecutor into Migrator--similar to how GenerateScript works. Then update both to stop committing and beginning new transactions between migrations (but continue doing it for TransactionSuppressed).

@ChristopherHaws
Copy link
Contributor

ChristopherHaws commented Sep 23, 2020

@bricelam Ok, that mostly makes sense to me. I wasn't aware of the create and drop db commands being outside the migrations, so that's good to know.

If we continue generating the begin tran and commit's for the TransactionSuppressed commands, won't it sort of defeat the purpose of the whole migration executing inside of a single transaction since now there would be committed schema/data? I'm not sure of any other way around this, but maybe a warning would be useful.

Also, should the creation of the migration history table be inside or outside of the transaction? I don't see any reason it can't be inside.

@bricelam
Copy link
Contributor

bricelam commented Sep 23, 2020

I like the idea of warning about TransactionSuppressed. I'm just not sure where we can do it that isn't already too late (i.e. during Update-Database). And the script plainly shows where the transaction will be committed, obviating the need for a warning. But maybe extra warnings about data loss and suppressed transactions would actually help; not sure.

Should the creation of the migration history table be inside or outside of the transaction?

I'd keep it outside. It's more of a prerequisite than part of the migration.

@Zero3
Copy link

Zero3 commented Sep 27, 2020

Triage: Wrap all the migrations inside a single transaction (instead of individual transactions per migration)

Just a thought: When a migration fails to apply, rolling back the already successfully applied ones is not necessarily desirable.

It is not unusual for a migration to take quite a while to execute on a large database (think 10+ GB), and it is not unusual to apply a batch of migrations all at once, for example during a scheduled release to production. If a trivial mistake in the last migration makes it fail, it would be quite annoying to lose the many minutes/hours of time spent applying the migrations that succeeded. In such cases, it would be much nicer to be able to simply fix the last migration and re-apply that one only, just as you could in EF6.

@roji
Copy link
Member Author

roji commented Sep 27, 2020

@Zero3 that's definitely a valid argument... The problem is that if you leave the database with that last migration un-applied, that could mean that your program can't start at all, because the schema is in some intermediate unsupported state. At that point users have to figure out what happened, what state they're in, and how to fix it - all under the pressures of downtime; leaving a production database in such a state seems very problematic, even more so than the (possible) loss of time.

Note that it's always possible to edit a migration script and manage transaction manually - which one can definitely do around a migration that's expected to run for very long.

@Zero3
Copy link

Zero3 commented Sep 27, 2020

@roji

The problem is that if you leave the database with that last migration un-applied, that could mean that your program can't start at all, because the schema is in some intermediate unsupported state.

Indeed!

At that point users have to figure out what happened, what state they're in, and how to fix it - all under the pressures of downtime

Given the use case I presented, this can be preferred over losing hours of migration time. And given the long migration time, the downtime aspect should already be handled already anyway.

leaving a production database in such a state seems very problematic, even more so than the (possible) loss of time.

I think that depends on the use case. In the one I presented, I would definitely prefer EF not rolling back the successfully applied migrations.

Note that it's always possible to edit a migration script and manage transaction manually - which one can definitely do around a migration that's expected to run for very long.

I was actually thinking of how it will be done when using context.Database.Migrate();.

Either way, I don't mind whatever the default strategy is, as long as it is possible to make EF not roll back the successfully applied ones :).

@ChristopherHaws
Copy link
Contributor

ChristopherHaws commented Sep 28, 2020

Ultimately it would probably be nice to have hooks before and after the entire migration process and before and after each individual migration. This would allow people to customize the the generation of migrations to fit their needs. This is how I handled adding the pre and post migration scripts in my open PR: https://github.com/dotnet/efcore/pull/22654/files#diff-a98268716bc62993b273028b6b462e39

Perhaps I could modify my PR to have the following (currently the Pre and Post commands in my PR run before and after the entire migration, not each):

  • GenerateInitializeMigrationCommands: Run once before all the migrations
  • GeneratePreMigrationCommands: Runs before each migration
  • GeneratePostMigrationCommands: Runs after each migration
  • GenerateFinalizeMigrationCommands: Runs once after all the migrations

I do believe that the default behavior should be wrapping the entire script in a transaction and anyone who wants to change that behavior could do so via the IMigrationsSqlGenerator interface. This seems like it would be an advanced use case that the majority of users would never encounter.

@roji
Copy link
Member Author

roji commented Sep 28, 2020

@Zero3 and others, for transaction-per-migration there's also another problem: the problematic migration may fail and leave all previous migrations intact, but subsequent migrations will still get executed as well. In effect, what you get is a database with all migrations applied except one in the middle. There may ways to avoid this later migration execution effect in some database, but it's definitely going to be tricky cross-database.

If the database is left in this state and migrations aren't idempotent, running them again will immediately fail. Even for idempotent migrations, there are still possible dependencies between migrations that can wreak havoc - idempotency means you can safely execute the migration script twice, but not necessarily if the first run had a "hole" in the middle.

So while I do see the case for per-migration transactions, the complications seem very... complicated. I tend to agree with @ChristopherHaws that we should expose hooks to allow users to customize their migration process, and allow them to do transaction-per-migration that way instead of attempting to provide it as a built-in option.

But I think all this needs more thought and design in 6.0.

@ChristopherHaws
Copy link
Contributor

ChristopherHaws commented Sep 28, 2020

@roji

for transaction-per-migration there's also another problem: the problematic migration may fail and leave all previous migrations intact, but subsequent migrations will still get executed as well

This could be avoided with the same measure we are taking here: #22613 (comment)
All it would require is removing this if statement:

This would allow the script to fail and not move on if there is an error in any batch regardless of if transactions will be used or not.

@roji
Copy link
Member Author

roji commented Sep 28, 2020

@ChristopherHaws that's purely SQL Server-specific, isn't it? We have to think about other databases as well...

@roji
Copy link
Member Author

roji commented Sep 28, 2020

Unless I'm mistaken, even within SQL Server this also depends on sqlcmd mode. That may be fine - I don't know, I'm trying to still think of migration scripts as something that doesn't depend on the specific tooling which executes them. But that may not be viable.

@ChristopherHaws
Copy link
Contributor

@roji Agreed. My understanding was that the other DB's that are supported (PostgreSQL, SQLite, etc) already have the behavior of "stop executing on first failure" and that it was only SQL server that behaves in this way.

@Zero3
Copy link

Zero3 commented Oct 4, 2020

@roji

for transaction-per-migration there's also another problem: the problematic migration may fail and leave all previous migrations intact, but subsequent migrations will still get executed as well.

I don't understand why EF would want to do this, instead of stopping after the first failed migration (like EF6 did). I don't think EF should consider skipping a failed migration a valid thing to do.

Maybe some of these issues arose from the introduction of migration scripts? I'm only thinking in the context of context.Database.Migrate(), of which I am a happy user and appreciated the way EF6 worked (one transaction per migration, and error instead of skipping failed migrations).

@roji
Copy link
Member Author

roji commented Oct 5, 2020

I don't understand why EF would want to do this, instead of stopping after the first failed migration (like EF6 did). I don't think EF should consider skipping a failed migration a valid thing to do.

I don't think anyone wants this to be the behavior - but it's a possible consequence of doing transaction-per-migration (as opposed to transaction-for-all-migrations).

@ajcvickers
Copy link
Member

@roji But only when scripting and only when the script keeps running inappropriately. We don't have this behavior for Update-Database, for example. It will stop[ after the first failed migration.

@roji
Copy link
Member Author

roji commented Oct 5, 2020

Yeah, this is indeed an SQL script problem (as opposed to when applying migrations programmatic or via CLI). That could warrant different transactional strategies based on the mechanism we use.

@bricelam bricelam removed their assignment Jul 8, 2023
@AndriySvyryd AndriySvyryd self-assigned this Jun 20, 2024
@AndriySvyryd AndriySvyryd modified the milestones: Backlog, 9.0.0 Jun 20, 2024
@AndriySvyryd AndriySvyryd added closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. and removed needs-design area-aspire labels Jul 10, 2024
@AndriySvyryd AndriySvyryd removed their assignment Jul 10, 2024
AndriySvyryd added a commit that referenced this issue Jul 11, 2024
Use a single transaction for all migrations in the script

Fixes #17578
Fixes #22616
AndriySvyryd added a commit that referenced this issue Jul 12, 2024
Use a single transaction for all migrations in the script

Fixes #17578
Fixes #22616
AndriySvyryd added a commit that referenced this issue Jul 16, 2024
Use a single transaction for all migrations in the script

Fixes #17578
Fixes #22616
AndriySvyryd added a commit that referenced this issue Jul 26, 2024
Use a single transaction for all migrations in the script

Fixes #17578
Fixes #22616
@ajcvickers ajcvickers modified the milestones: 9.0.0, 9.0.0-rc1 Aug 21, 2024
@AlaRaies
Copy link

AlaRaies commented Sep 2, 2024

in the official documentation we have the following :

...The transaction handling and continue-on-error behavior of these tools are inconsistent and sometimes unexpected. This can leave your database in an undefined state if a failure occurs when applying migrations...

I'am confused about strategy used by the bundle while applying migrations. from what i understand the expected behaviour : the bundle will attemp to apply migrations ( a set of them), if it fails it wont continue and revert what is applied. Is this correct ?

@roji
Copy link
Member Author

roji commented Sep 2, 2024

@AlaRaies the sentence you quote above are about applying migrations via SQL scripts, e.g. using sqlcmd for SQL Server. Migration bundles are meant precisely to solve these shortcomings.

Starting with 9.0, migration bundles will execute all migrations inside a single transaction, so yes, if any error occurs at some point everything will be rolled back. The only exception is a very limited set of migration operations which cannot be executed inside a transaction (e.g. anything related to a SQL Server in-memory table).

@bachratyg
Copy link

bachratyg commented Sep 2, 2024

Starting with 9.0, migration bundles will execute all migrations inside a single transaction

A welcome change. But still won't work for Oracle. Sigh.

@roji
Copy link
Member Author

roji commented Sep 2, 2024

@bachratyg can you provide a bit more info? Does Oracle not support running DDL inside transactions?

@bachratyg
Copy link

It does not fail per se when DDL is inside tranasctions, it just won't be rolled back. See https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/COMMIT.html

Oracle Database issues an implicit COMMIT under the following circumstances:

  • Before any syntactically valid data definition language (DDL) statement, even if the statement results in an error
  • After any data definition language (DDL) statement that completes without an error

No reasonable way to roll back DDL apart from maybe some juggling with pdb snapshots (somewhat akin to mssql's database snapshots) or doing a full database backup/restore. AFAIK MySQL has the same limitation.

@roji
Copy link
Member Author

roji commented Sep 3, 2024

@bachratyg thanks, I didn't know that... I do know that rolling back DDL works in PostgreSQL and SQL Server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-aspire area-migrations closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. punted-for-6.0 punted-for-7.0 Originally planned for the EF Core 7.0 (EF7) release, but moved out due to resource constraints. type-enhancement
Projects
None yet