Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microsoft.Data.Sqlite: Support deferred transactions #20786

Closed
tzehetner opened this issue Apr 30, 2020 · 20 comments · Fixed by #21212
Closed

Microsoft.Data.Sqlite: Support deferred transactions #20786

tzehetner opened this issue Apr 30, 2020 · 20 comments · Fixed by #21212
Assignees
Labels
area-adonet-sqlite closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported good first issue This issue should be relatively straightforward to fix. type-enhancement
Milestone

Comments

@tzehetner
Copy link

tzehetner commented Apr 30, 2020

What problem are you trying to solve?

I would like to execute multiple read transactions in parallel, while each transaction sees a consistent snapshot of the data (no read uncommited) and the transactions do not block each other. This would be a huge improvement and could be implemented by changing one line of code in SqliteTransaction.cs.

Describe the solution you'd like

At the moment SqliteConnection.BeginTransaction() supports isolation levels “ReadUncommitted” and “Serializable”.

  • Serializable provides a consistent data snapshot but blocks other serializable transactions. One reader would have to wait until the other reader has finished his transaction, which is quite bad for performance, if you have multiple long running transactions.

  • ReadUncommited requires SqliteCacheMode.Shared and suffers from dirty reads, non repeatable reads and phantoms.

Solution: Allow IsolationLevel.Snapshot by changing line 43 in SqliteTransaction.cs, so that it doesn’t throw an exception. The remaining code will then automatically start a deferred transaction, which starts a SQLite read transaction on the first select statement.

SQLite read transactions provide a consistent data snapshot without suffering from dirty reads, non repeatable reads and phantoms and without blocking other read transactions (see https://www.sqlite.org/lang_transaction.html).

Additonal information

  • In SQL Server the isolation level “Snapshot” specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction (see https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-ver15).

    The proposed implementation is different in that data read will be the transactionally consistent version of the data that existed at the first select statement – not at the start of the transactions. I tried to execute a “select 1;” statement after the “begin;” statement, but SQLite still started the underlying read transaction only at the first “real” select statement, that accessed a table in the database.

  • The System.Data.SQLite provider requires you to specify IsolationLevel.ReadCommitted instead of IsolationLevel.Snapshot to start a read transaction, but I think that Snapshot is more correct, because SQLite read transactions do not suffer from non repeatable reads and phantoms as ReadCommited does (see isolation level table on this page: https://docs.microsoft.com/en-us/sql/connect/jdbc/understanding-isolation-levels?view=sql-server-ver15).

@ajcvickers
Copy link
Member

/cc @bricelam

@bricelam
Copy link
Contributor

bricelam commented May 1, 2020

SQLite doesn't support snapshot isolation. It ensures isolation via locking.

For serialized, as soon as a write operation is performed, all other SQL statements must wait for the write operation's transaction to complete (they actually fail with SQLITE_BUSY and the app needs to retry them).

For read uncommitted, additional reads are allowed, and yes, they will be affected by the write operation (dirty reads, non-repeatable reads, and phantoms). But again, other read operations are blocked.

Snapshot isolation works by taking a snapshot of the data when the transaction begins so that read operations are completely isolated from write operations. No locking is required. But SQLite simply doesn't implement this behavior.

Deferred transactions aren't exactly related to isolation levels. It means that the locking is deferred until the first write operation is performed. As soon as one takes the lock, the other transaction fails entirely and needs to be retried since any data it read earlier might now be invalid. This gives the perception of greater concurrency but the cost of recovering from concurrency issues also goes up.

We tried to switch to deferred transactions by default in version 2.0, but several users started seeing deadlocks in their apps (aspnet/Microsoft.Data.Sqlite#474).

I'm open to the idea of adding an option like BeginTransaction(deferred: true), but I'd need to see a scenario where it really adds value. I feel like doing most reads outside of a transaction and only the ones that really need consistency inside an immediate transaction will probably give the best experience for most apps.

See also Transactions in Microsoft.Data.Sqlite.

@tzehetner
Copy link
Author

tzehetner commented May 2, 2020

BeginTransaction(bool deferred) would be exactly what I need. I probably should have chosen a different title for this issue: "Support deferred transactions".

The scenario I am facing:

My application has to generate multiple complicated reports in parallel on background threads. Each report requires multiple select statements and the data must be consistent, therefore a transaction is required. Generating one of those reports could take minutes, so it makes a huge difference, if it happens sequentially or in parallel.

With the old System.Data.SQLite provider I simply started deferred transactions. As long as you execute no updates or inserts but only selects in a deferred transaction, SQLite uses a read transaction instead of a write transaction. This way the reports can be generated in parallel, because SQLite supports multiple simultaneous read transactions but only one simultaneous write transaction.

In general, I follow these two rules:

  • Select-only transactions should be started as deferred because it allows concurrent execution.
  • Transactions, that contain updates and inserts, should be started as immediate, otherwise you risk deadlocks and other problems.

I think it’s correct to NOT make deferred the default, because if two deferred transactions execute a select (which starts a read transaction) and then execute an insert or update (which upgrades to a write transaction) it could easily lead to deadlocks, because none of the transactions can upgrade to a write transaction before the other transaction finishes its read transaction.

@bricelam
Copy link
Contributor

bricelam commented May 2, 2020

Sounds like a great scenario. I’m ok adding API for advanced scenarios like this.

The workaround is to manually send the BEGIN TRANSACTION statement:

command.CommandText = “BEGIN DEFERRED TRANSACTION”;
command.ExecuteNonQuery();
try
{
    // ...

    command.CommandText = “COMMIT”;
    command.ExecuteNonQuery();
}
catch (SqliteException ex)
    when (ex.SqliteErrorCode == SQLitePCL.raw.SQLITE_BUSY)
{
    // TODO: Retry
}

@bricelam bricelam changed the title Microsoft.Data.Sqlite: Support concurrent read transactions Microsoft.Data.Sqlite: Support deferred transactions May 2, 2020
@bricelam bricelam added area-adonet-sqlite good first issue This issue should be relatively straightforward to fix. labels May 2, 2020
@tzehetner
Copy link
Author

Thanks, that's great! How long does it normally take, until new features like this one are available?

@bricelam
Copy link
Contributor

bricelam commented May 4, 2020

Our next release is 5.0 in November. Are you interested in contributing?

@ArnaudDebaene
Copy link

Snapshot isolation works by taking a snapshot of the data when the transaction begins so that
read operations are completely isolated from write operations. No locking is required. But SQLite
simply doesn't implement this behavior.

It seems to me than Sqlite can support parallel readers when in WAL mode (see https://www.sqlite.org/wal.html, § 2.2). Wouldn't that be an interesting addon to the provider? It shall be possible to support Snapshot isolation level only after checking the DB is in WAL mode...

@tzehetner
Copy link
Author

@bricelam Why do you ask? Do you want me to implement the deferred transaction feature?

@bricelam
Copy link
Contributor

bricelam commented May 7, 2020

@tzehetner Only if you’re interested. We might not be able to get to it for 5.0

@bricelam
Copy link
Contributor

bricelam commented May 7, 2020

@ArnaudDebaene I don’t think the semantics are quite the same as snapshot isolation. That said, once we enable deferred transactions, you can get even greater throughput by enabling WAL. Note, EF Core automatically enables WAL on the databases it creates.

@ajcvickers ajcvickers added this to the Backlog milestone May 8, 2020
@bricelam
Copy link
Contributor

bricelam commented May 9, 2020

cc @AlexanderTaeschner

@AlexanderTaeschner
Copy link
Contributor

I just read the SQLite documentation. If I understand the page correct, than all transactions are deferred transactions by default ("Transactions can be DEFERRED, IMMEDIATE, or EXCLUSIVE. The default transaction behavior is DEFERRED."). From that page, the only mode which is not yet accessible via BeginTransaction would be EXCLUSIVE.

@tzehetner
Copy link
Author

tzehetner commented May 10, 2020

@AlexanderTaeschner At the moment the C# code starts transactions mostly as IMMEDIATE and in one case as DEFERRED, but with the read_uncommitted PRAGMA. There is no way to start a deferred transaction without the read_uncommitted PRAGMA.

@bricelam AlexanderTaeschner mentioned the third transaction mode: EXCLUSIVE. Perhaps the new BeginTransaction overload should have an enum parameter instead of a boolean parameter, so that you can pass either DEFERRED, IMMEDIATE or EXCLUSIVE?

@bricelam Regarding the implementation: I don't have time now, but I will look into it and perhaps provide an implementation later on.

@bricelam
Copy link
Contributor

I don’t think the difference between IMMEDIATE and EXCLUSIVE is significant. I’d like to see a real-world scenario for it before we start designing for it.

A bool to defer when serialized is enough for now. Long-running reads in the background is a valid real-world scenario to enable.

@tzehetner
Copy link
Author

OK, I understand.

@AlexanderTaeschner
Copy link
Contributor

I just took a look at the actual code and I don't see what you describe above. If one calls BeginTransaction() on a SqliteConnection this calls BeginTransaction(IsolationLevel.Unspecified) and this creates a new SqliteTransaction in which connection.ExecuteNonQuery("BEGIN;"); is invoked for IsolationLevel.Unspecified, but the PRAGMA calls are skipped since the isolation level is neither ReadUncommitted or Serializable. I'm a bit confused at the moment.

@tzehetner
Copy link
Author

It's easy to miss, but before connection.ExecuteNonQuery("BEGIN;"); the IsolationLevel propertey is checked instead of the isolationLevel constructor parameter. The property upgrades IsolationLevel.Unspecified to IsolationLevel.Serializable and the constructor therefore executes connection.ExecuteNonQuery("BEGIN IMMEDIATE;");.

@bricelam bricelam added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jun 12, 2020
@bricelam bricelam modified the milestones: Backlog, 5.0.0 Aug 11, 2020
@ghost ghost closed this as completed in #21212 Aug 11, 2020
@tzehetner
Copy link
Author

tzehetner commented Aug 11, 2020

@AlexanderTaeschner @bricelam Thanks a lot!

@marchy
Copy link

marchy commented Jun 1, 2023

@ArnaudDebaene I don’t think the semantics are quite the same as snapshot isolation. That said, once we enable deferred transactions, you can get even greater throughput by enabling WAL. Note, EF Core automatically enables WAL on the databases it creates.

Can you explain why the semantics are not the same as snapshot isolation?
The Sqlite Isolation page under the 'Isolation and Concurrency' section states particularly that "In WAL mode, SQLite exhibits "snapshot isolation".".

Is there a divergent definition of "snapshot isolation" here or a misunderstanding of WAL mode?
We use snapshot isolation for our EF provider (cannot for the life of me remember why, but we had set it as our default after setting some issues). We'd love to be able to write unit tests against in-memory Sqlite for our transaction-scoped code but the lack of support for this is blocking us from matching the same semantics --> perhaps we can disable this setting for testing if it's irrelevant (ie: no expected parallelism)

@bricelam
Copy link
Contributor

bricelam commented Jun 5, 2023

It shall be possible to support Snapshot isolation level only after checking the DB is in WAL mode

This sounds reasonable. Please file a new issue.

@marchy AFAIK, you can't mix in-memory databases and WAL

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-adonet-sqlite closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported good first issue This issue should be relatively straightforward to fix. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants