Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepared query execution flow redesign #675

Closed
vietj opened this issue Jun 4, 2020 · 24 comments
Closed

Prepared query execution flow redesign #675

vietj opened this issue Jun 4, 2020 · 24 comments
Assignees

Comments

@vietj
Copy link
Member

vietj commented Jun 4, 2020

Currently one shot prepared query is a composite execution of a PrepareStatementCommand and ExtendedQueryCommand. The first command will prepare the statement and the later will execute it.

When prepare statement are cached, the PrepareStatementCommand is only executed once (assuming the statement is not evicted from the cache).

When prepare statement are not cached then it is unclear what should happen and likely this depends on the database.

What we have now

PostgreSQL

What is happening

Todo (study)

What should happen

A two steps executions issues the following commands

  • PrepareCommand
    • PARSE
    • DESCRIBE
    • SYNC
  • ExtendedPreparedQuery
    • BIND
    • EXECUTE
    • SYNC

One shot prepared query should be executed using un-named prepared statement, so the sequence above is replaced by:

  • PARSE
  • BIND
  • EXECUTE
  • SYNC

MySQL

What is happening

two-step executions(i.e. prepare->execute) & one-shot preparedQuery

  • PrepareCommand
    send COM_STMT_PREPARE to server

  • ExtendedQueryCommand
    locally bind the parameters and set the metadata
    send COM_STMT_EXECUTE to server

  • CloseStatementCommand
    send COM_STMT_CLOSE to server

What should happen

There is no one-shot prepared-query protocol specific command in MySQL. so we can't change the command flow.

MSSQL

What is happening

two-step executions(i.e. prepare->execute) & one-shot preparedQuery

  • PrepareCommand
    do nothing but store the SQL string in the client

  • ExtendedQueryCommand
    locally bind the parameters and set the metadata
    send sp_executesql request to the server

  • CloseStatementCommand
    do nothing because we use sp_executesql

What should happen

two-step executions(i.e. prepare->execute)

  • PrepareCommand
    send sp_prepare request to the server

  • ExtendedQueryCommand
    send sp_execute request to the server

  • CloseStatementCommand
    send sp_unprepare request to the server

one-shot preparedQuery
send sp_executesql request to the server (using T-SQL)

DB2

What is happening

  • PrepareCommand
    Send PREPARE and DESCRIBE commands

  • ExtendedQueryCommand
    Send OPEN_QUERY passing statement params in bound according to column metadata retrieved from PREPARE and DESCRIBE

  • CloseStatementCommand
    No-op

What should happen

TBD: I can tell from tracing JDBC on Wireshark that it's possible to flow PREPARE, DESCRIBE, and OPEN_QUERY all at once (i.e. 1 db round-trip instead of 2), but I will need to investigate this more. It's not immediately obvious how this is possible because I need to read the response from the PREP+DESC in order to bind params properly for the OPNQRY

Proposal

Here is the proposal for changes.

The current commands PrepareStatement and ExtendedQueryCommand will remain but they will be used differently.

The main difference with the current scheme is that executing a one-shot query is achieved with an ExtendedQueryCommand with a null internal PreparedStatement.

In the current flow, a PrepareStatement command is issued and then and then ExtendedQueryCommand is executed. Which can lead to out of order command execution.

Two steps prepared query

  • the sequence SqlConnection.prepare(sql) method triggers the execution of a PrepareStatement command and gets back an internal PreparedStatement that is cached in the API PreparedStatement object
  • the sequence PreparedStatement.query().execute(tuple) method triggers the execution of a ExtendedQueryCommand with the internal PreparedStatement for execution

This is more or less like before.

One step cached prepared query

  • the sequence SqlClient#preparedQuery(sql).execute(tuple) triggers the execution of an ExtendedQueryCommand without an internal PreparedStatement and with the cached boolean set to true.
  • when there is no cached PreparedStatement then
    • the socket pauses the pipeline
    • the socket executes a PrepareStatementCommand
    • when this command completes
      • the PreparedStatement will be cached
      • the ExtendedQueryCommand executes using this PreparedStatement
      • the pipeline execution resumes
  • when there is a cached PreparedStatement then the codec executes a single step query as we fall back in the usual case

One step prepared query (no caching)

  • the sequence SqlClient#preparedQuery(sql).execute(tuple) triggers the execution of an ExtendedQueryCommand without an internal PreparedStatement and with the cached boolean set to false
  • the socket pauses the pipeline
  • the sockets executes a PrepareStatementCommand
  • when the preparation completes
    • the ExtendedQueryCommand executes with the PreparedStatement
    • the codec executes the command and should clean the prepared statement to avoid leaking
    • the pipeline resumes

Note for PostgreSQL, preparing a non cached one-shot prepared query uses the unnamed statement and does not require a close after the statement it used.

PRs

NOTE: this PR unifies also ExtendedQueryCommand and ExtendedBatchQueryCommand in a single command but it was a non initial goal

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

@aguibert @BillyYccc can you fill the corresponding sections in this document ?

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

I would like to get a clear understanding of what is currently happening and what should ideally happen for all databases. We might have currently some bugs related to this.

@BillyYccc
Copy link
Member

I also find having PreparedStatement cached by the user directly is a poor design by me :( because that cached statement is bound with the connection and users have no way to know if the statement is still valid.

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

I think it is fine to allow control over PreparedStatement as long as user will not use them after the connection has been closed

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

@BillyYccc we need to determine how it works for one-shot queries when caching is disabled (other than the execution sequence)

for MySQL does it mean that we should have the sequence

  • send COM_STMT_PREPARE to server
  • locally bind the parameters and set the metadata
  • send COM_STMT_EXECUTE to server
  • send COM_STMT_CLOSE to server

for un-cached one shot queries ?

@BillyYccc
Copy link
Member

@BillyYccc we need to determine how it works for one-shot queries when caching is disabled (other than the execution sequence)

for MySQL does it mean that we should have the sequence

  • send COM_STMT_PREPARE to server
  • locally bind the parameters and set the metadata
  • send COM_STMT_EXECUTE to server
  • send COM_STMT_CLOSE to server

for un-cached one shot queries ?

yes that's how it works

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

can you update the doc with that ?

@BillyYccc
Copy link
Member

which part of doc do you mean?

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

for MySQL you wrote "There is no one-shot prepared-query protocol specific command in MySQL. so we can't change the command flow." We need to add the actual flow that I described I guess

@BillyYccc
Copy link
Member

We need also consider if we need to autoimatically close the statement after executing one-shot prepared-queries.
For PostgreSQL and MSSQL one-shot preparedQuery it seems we don't have to clean the resources.

But for MySQL we have to send a close command otherwise the statements never get deallocated. Currently there's no such mechanism to clean the uncached one-shot statements so these are leaked, enabling the cache would let us send a close command when the statement gets evicted from the cache.

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

@BillyYccc yes which is why I am asking for a flow definition of what happens for one-shot statement so we can implement it cleanly without leaking :-)

@vietj
Copy link
Member Author

vietj commented Jun 4, 2020

I will likely start working on this for PostgreSQL soon and push a branch where you can push for MySQL and MSSQL.

@vietj vietj self-assigned this Jun 4, 2020
@aguibert
Copy link
Member

aguibert commented Jun 5, 2020

I think it is fine to allow control over PreparedStatement as long as user will not use them after the connection has been closed

We should do what JDBC did here and raise an exception if a user attempts to use a prepared statement from a closed connection

@BillyYccc
Copy link
Member

If we let users hold references to the prepared statement and cache it themselves, how would a user know which connection the cached prepared statement is running? In that way it's impossible to chain a cached prepared statement execution and other operations in the same stateful connection.

I believe we need to cache the PreparedStatement in the driver layer but not directly use the cache which is currently being used by the one-shot prepared queries. Moreover, I don't know if we need cache for one-shot prepared queries after the changes(to one-step command flow), but it seems at least MySQL client needs that :-)

@aguibert
Copy link
Member

aguibert commented Jun 5, 2020

If we let users hold references to the prepared statement and cache it themselves, how would a user know which connection the cached prepared statement is running?

I don't expect users will keep a cache of multiple prepared statements (e.g. a map/set of PS), rather, I expect users will want to cache individual prepared statements and throwing an exception would just be a way to guard against application programming errors where the user caches the PS beyond the scope of the connection.

@vietj vietj changed the title One shot prepared query improvements Prepared query execution flow redesign Jun 5, 2020
@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

I updated the document @BillyYccc @aguibert I will share a branch so you can look at the changes in practice

@aguibert
Copy link
Member

aguibert commented Jun 5, 2020

@vietj can you elaborate a bit on the goal of this issue? Is it for correctness in order to resolve bugs with certain usage patterns? Or is this more of a performance improvement?

@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

it is for correctness (with pipelining there are some out of order command execution happening), we are currently also having a regression for PostgreSQL under high load but this is a side effect.

@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

the fact that a cached one-shot query implicitly creates a PreparedStatement and then executes can create out of order command execution, this is solved by having one-shot queries to be executed without a prepared statement and let the codec handle it

@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

this also improves a few things I was not comfortable with such as the BiCommand.

@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

I added a link to the draft PR

@vietj
Copy link
Member Author

vietj commented Jun 5, 2020

@BillyYccc we can discuss the caching of API PreparedStatement in another issue after we finish this one

@vietj
Copy link
Member Author

vietj commented Jun 15, 2020

Updated the proposal

@vietj
Copy link
Member Author

vietj commented Jun 15, 2020

Some implementations need to close the prepared statement after the execution of the ExtendedQueryCommand when the PreparedStatement has the cached field set to false.

cc @BillyYccc @aguibert

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants