Prepared query execution flow redesign #675

vietj · 2020-06-04T12:45:23Z

Currently one shot prepared query is a composite execution of a PrepareStatementCommand and ExtendedQueryCommand. The first command will prepare the statement and the later will execute it.

When prepare statement are cached, the PrepareStatementCommand is only executed once (assuming the statement is not evicted from the cache).

When prepare statement are not cached then it is unclear what should happen and likely this depends on the database.

What we have now

PostgreSQL

What is happening

Todo (study)

What should happen

A two steps executions issues the following commands

PrepareCommand
- PARSE
- DESCRIBE
- SYNC
ExtendedPreparedQuery
- BIND
- EXECUTE
- SYNC

One shot prepared query should be executed using un-named prepared statement, so the sequence above is replaced by:

PARSE
BIND
EXECUTE
SYNC

MySQL

What is happening

two-step executions(i.e. prepare->execute) & one-shot preparedQuery

PrepareCommand
send COM_STMT_PREPARE to server
ExtendedQueryCommand
locally bind the parameters and set the metadata
send COM_STMT_EXECUTE to server
CloseStatementCommand
send COM_STMT_CLOSE to server

What should happen

There is no one-shot prepared-query protocol specific command in MySQL. so we can't change the command flow.

MSSQL

What is happening

two-step executions(i.e. prepare->execute) & one-shot preparedQuery

PrepareCommand
do nothing but store the SQL string in the client
ExtendedQueryCommand
locally bind the parameters and set the metadata
send sp_executesql request to the server
CloseStatementCommand
do nothing because we use sp_executesql

What should happen

two-step executions(i.e. prepare->execute)

PrepareCommand
send sp_prepare request to the server
ExtendedQueryCommand
send sp_execute request to the server
CloseStatementCommand
send sp_unprepare request to the server

one-shot preparedQuery
send sp_executesql request to the server (using T-SQL)

DB2

What is happening

PrepareCommand
Send PREPARE and DESCRIBE commands
ExtendedQueryCommand
Send OPEN_QUERY passing statement params in bound according to column metadata retrieved from PREPARE and DESCRIBE
CloseStatementCommand
No-op

What should happen

TBD: I can tell from tracing JDBC on Wireshark that it's possible to flow PREPARE, DESCRIBE, and OPEN_QUERY all at once (i.e. 1 db round-trip instead of 2), but I will need to investigate this more. It's not immediately obvious how this is possible because I need to read the response from the PREP+DESC in order to bind params properly for the OPNQRY

Proposal

Here is the proposal for changes.

The current commands PrepareStatement and ExtendedQueryCommand will remain but they will be used differently.

The main difference with the current scheme is that executing a one-shot query is achieved with an ExtendedQueryCommand with a null internal PreparedStatement.

In the current flow, a PrepareStatement command is issued and then and then ExtendedQueryCommand is executed. Which can lead to out of order command execution.

Two steps prepared query

the sequence SqlConnection.prepare(sql) method triggers the execution of a PrepareStatement command and gets back an internal PreparedStatement that is cached in the API PreparedStatement object
the sequence PreparedStatement.query().execute(tuple) method triggers the execution of a ExtendedQueryCommand with the internal PreparedStatement for execution

This is more or less like before.

One step cached prepared query

the sequence SqlClient#preparedQuery(sql).execute(tuple) triggers the execution of an ExtendedQueryCommand without an internal PreparedStatement and with the cached boolean set to true.
when there is no cached PreparedStatement then
- the socket pauses the pipeline
- the socket executes a PrepareStatementCommand
- when this command completes
  - the PreparedStatement will be cached
  - the ExtendedQueryCommand executes using this PreparedStatement
  - the pipeline execution resumes
when there is a cached PreparedStatement then the codec executes a single step query as we fall back in the usual case

One step prepared query (no caching)

the sequence SqlClient#preparedQuery(sql).execute(tuple) triggers the execution of an ExtendedQueryCommand without an internal PreparedStatement and with the cached boolean set to false
the socket pauses the pipeline
the sockets executes a PrepareStatementCommand
when the preparation completes
- the ExtendedQueryCommand executes with the PreparedStatement
- the codec executes the command and should clean the prepared statement to avoid leaking
- the pipeline resumes

Note for PostgreSQL, preparing a non cached one-shot prepared query uses the unnamed statement and does not require a close after the statement it used.

PRs

master: Rework prepared queries #677
3.9: Backport prepared queries rework #686

NOTE: this PR unifies also ExtendedQueryCommand and ExtendedBatchQueryCommand in a single command but it was a non initial goal

The text was updated successfully, but these errors were encountered:

vietj · 2020-06-04T12:45:49Z

@aguibert @BillyYccc can you fill the corresponding sections in this document ?

vietj · 2020-06-04T12:48:44Z

I would like to get a clear understanding of what is currently happening and what should ideally happen for all databases. We might have currently some bugs related to this.

BillyYccc · 2020-06-04T13:49:06Z

I also find having PreparedStatement cached by the user directly is a poor design by me :( because that cached statement is bound with the connection and users have no way to know if the statement is still valid.

vietj · 2020-06-04T13:55:19Z

I think it is fine to allow control over PreparedStatement as long as user will not use them after the connection has been closed

vietj · 2020-06-04T13:57:16Z

@BillyYccc we need to determine how it works for one-shot queries when caching is disabled (other than the execution sequence)

for MySQL does it mean that we should have the sequence

send COM_STMT_PREPARE to server
locally bind the parameters and set the metadata
send COM_STMT_EXECUTE to server
send COM_STMT_CLOSE to server

for un-cached one shot queries ?

BillyYccc · 2020-06-04T14:02:26Z

@BillyYccc we need to determine how it works for one-shot queries when caching is disabled (other than the execution sequence)

for MySQL does it mean that we should have the sequence

send COM_STMT_PREPARE to server

locally bind the parameters and set the metadata

send COM_STMT_EXECUTE to server

send COM_STMT_CLOSE to server

for un-cached one shot queries ?

yes that's how it works

vietj · 2020-06-04T14:05:20Z

can you update the doc with that ?

BillyYccc · 2020-06-04T14:06:25Z

which part of doc do you mean?

vietj · 2020-06-04T14:06:56Z

for MySQL you wrote "There is no one-shot prepared-query protocol specific command in MySQL. so we can't change the command flow." We need to add the actual flow that I described I guess

BillyYccc · 2020-06-04T16:11:05Z

We need also consider if we need to autoimatically close the statement after executing one-shot prepared-queries.
For PostgreSQL and MSSQL one-shot preparedQuery it seems we don't have to clean the resources.

But for MySQL we have to send a close command otherwise the statements never get deallocated. Currently there's no such mechanism to clean the uncached one-shot statements so these are leaked, enabling the cache would let us send a close command when the statement gets evicted from the cache.

vietj · 2020-06-04T16:32:47Z

@BillyYccc yes which is why I am asking for a flow definition of what happens for one-shot statement so we can implement it cleanly without leaking :-)

vietj · 2020-06-04T16:33:45Z

I will likely start working on this for PostgreSQL soon and push a branch where you can push for MySQL and MSSQL.

aguibert · 2020-06-05T03:53:08Z

I think it is fine to allow control over PreparedStatement as long as user will not use them after the connection has been closed

We should do what JDBC did here and raise an exception if a user attempts to use a prepared statement from a closed connection

BillyYccc · 2020-06-05T15:24:13Z

If we let users hold references to the prepared statement and cache it themselves, how would a user know which connection the cached prepared statement is running? In that way it's impossible to chain a cached prepared statement execution and other operations in the same stateful connection.

I believe we need to cache the PreparedStatement in the driver layer but not directly use the cache which is currently being used by the one-shot prepared queries. Moreover, I don't know if we need cache for one-shot prepared queries after the changes(to one-step command flow), but it seems at least MySQL client needs that :-)

aguibert · 2020-06-05T16:12:51Z

If we let users hold references to the prepared statement and cache it themselves, how would a user know which connection the cached prepared statement is running?

I don't expect users will keep a cache of multiple prepared statements (e.g. a map/set of PS), rather, I expect users will want to cache individual prepared statements and throwing an exception would just be a way to guard against application programming errors where the user caches the PS beyond the scope of the connection.

vietj · 2020-06-05T16:17:06Z

I updated the document @BillyYccc @aguibert I will share a branch so you can look at the changes in practice

aguibert · 2020-06-05T16:19:08Z

@vietj can you elaborate a bit on the goal of this issue? Is it for correctness in order to resolve bugs with certain usage patterns? Or is this more of a performance improvement?

vietj · 2020-06-05T16:28:34Z

it is for correctness (with pipelining there are some out of order command execution happening), we are currently also having a regression for PostgreSQL under high load but this is a side effect.

vietj · 2020-06-05T16:31:34Z

the fact that a cached one-shot query implicitly creates a PreparedStatement and then executes can create out of order command execution, this is solved by having one-shot queries to be executed without a prepared statement and let the codec handle it

vietj · 2020-06-05T16:37:30Z

this also improves a few things I was not comfortable with such as the BiCommand.

vietj · 2020-06-05T16:39:32Z

I added a link to the draft PR

vietj · 2020-06-05T16:45:05Z

@BillyYccc we can discuss the caching of API PreparedStatement in another issue after we finish this one

vietj · 2020-06-15T15:37:22Z

Updated the proposal

vietj · 2020-06-15T15:39:50Z

Some implementations need to close the prepared statement after the execution of the ExtendedQueryCommand when the PreparedStatement has the cached field set to false.

cc @BillyYccc @aguibert

vietj added the enhancement label Jun 4, 2020

vietj self-assigned this Jun 4, 2020

vietj changed the title ~~One shot prepared query improvements~~ Prepared query execution flow redesign Jun 5, 2020

vietj mentioned this issue Jun 5, 2020

Rework prepared queries #677

Merged

vietj mentioned this issue Jun 19, 2020

Backport prepared queries rework #686

Merged

vietj closed this as completed Jun 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepared query execution flow redesign #675

Prepared query execution flow redesign #675

vietj commented Jun 4, 2020 •

edited

Loading

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020 •

edited

Loading

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020

aguibert commented Jun 5, 2020

BillyYccc commented Jun 5, 2020

aguibert commented Jun 5, 2020

vietj commented Jun 5, 2020

aguibert commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020 •

edited

Loading

vietj commented Jun 15, 2020

vietj commented Jun 15, 2020

Prepared query execution flow redesign #675

Prepared query execution flow redesign #675

Comments

vietj commented Jun 4, 2020 • edited Loading

What we have now

PostgreSQL

What is happening

What should happen

MySQL

What is happening

What should happen

MSSQL

What is happening

What should happen

DB2

What is happening

What should happen

Proposal

Two steps prepared query

One step cached prepared query

One step prepared query (no caching)

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020 • edited Loading

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

BillyYccc commented Jun 4, 2020

vietj commented Jun 4, 2020

vietj commented Jun 4, 2020

aguibert commented Jun 5, 2020

BillyYccc commented Jun 5, 2020

aguibert commented Jun 5, 2020

vietj commented Jun 5, 2020

aguibert commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020

vietj commented Jun 5, 2020 • edited Loading

vietj commented Jun 15, 2020

vietj commented Jun 15, 2020

vietj commented Jun 4, 2020 •

edited

Loading

vietj commented Jun 4, 2020 •

edited

Loading

vietj commented Jun 5, 2020 •

edited

Loading