Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery: support SQL parameters #390

Closed
jba opened this issue Oct 7, 2016 · 10 comments
Closed

bigquery: support SQL parameters #390

jba opened this issue Oct 7, 2016 · 10 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Milestone

Comments

@jba
Copy link
Contributor

jba commented Oct 7, 2016

See parameterMode and queryParameters in the query job reference.

@jba jba added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: bigquery Issues related to the BigQuery API. labels Oct 7, 2016
@jba jba added this to the 2016 Q4 milestone Oct 7, 2016
@jba jba self-assigned this Oct 7, 2016
@c0b
Copy link

c0b commented Nov 10, 2016

+1 on this

since https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.4.0/bigquery?method=query allows any additional parameters in the options, we can put parameterMode and queryParameters in the options,

but the problem is even the reference isn't very clear,
https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query#request-body

this is what it says, but what are the possible values for the string typed parameterMode? like positional or named ?

parameterMode string [Experimental] Standard SQL only. Whether to use positional (?) or named (@myparam) query parameters in this query.

if I choose named what's the syntax I can use in the query?

query string [Required] A query string, following the BigQuery query syntax, of the query to execute. Example: "SELECT count(f1) FROM [myProjectId:myDatasetId.myTableId]".

how to request additional elaboration on the reference itself?

@c0b
Copy link

c0b commented Nov 14, 2016

I've figured out an example how to run without special support from this library, from the python bg tool's apilog (from the google-cloud-sdk), also from ./opt/google-cloud-sdk/platform/bq/bq.py source code (googleapis/google-cloud-python#2342 (comment) says the google-cloud-sdk is not open source, but you can read the source to see how queryParameters is constructed):

$ bq --apilog - query --use_legacy_sql=false --parameter arr:'ARRAY<INTEGER>':'[2,3,4]' 'SELECT 3 IN UNNEST(@arr)'
[...]

INFO:root:body: {"configuration": {"query": {"query": "SELECT 3 IN UNNEST(@arr)", "useLegacySql": false, "queryParameters": [{"parameterType": {"type": "ARRAY", "arrayType": {"type": "INTEGER"}}, "parameterValue": {"arrayValues": [{"value": 2}, {"value": 3}, {"value": 4}]}, "name": "arr"}]}}, "jobReference": ...}

INFO:root:{
 "kind": "bigquery#job",
 "etag": "...",
 "id": "...",
 "selfLink": "...",
 "jobReference": {
    ...
 },
 "configuration": {
  "query": {
   "query": "SELECT 3 IN UNNEST(@arr)",
   "destinationTable": {
      ...
   },
   "createDisposition": "CREATE_IF_NEEDED",
   "writeDisposition": "WRITE_TRUNCATE",
   "useLegacySql": false,
   "queryParameters": [
    {
     "name": "arr",
     "parameterType": {
      "type": "ARRAY",
      "arrayType": {
       "type": "INT64"
      }
     },
     "parameterValue": {
      "arrayValues": [
       {
        "value": "2"
       },
       {
        "value": "3"
       },
       {
        "value": "4"
       }
      ]
     }
    }
   ]
  }
 },
 "status": {
  "state": "RUNNING"
 },
 "statistics": {
  "creationTime": "1479104934691",
  "startTime": "1479104934979"
 },
}
INFO:root:--response-end--
+------+
| f0_  |
+------+
| true |
+------+

this is saying:

  1. parameterMode is optional: I can set it to 'positional' or 'named' but any other value results an error, but if not sending this, it's totally fine; you see bq tool is not sending parameterMode at all
  2. it's also a good example of how ARRAY type should be sent, it's in a recursive structure, from this I inferred how to construct a STRUCT type and it works

or saying if I am doing queryParameters in this structure on myself, that just works fine, but maintainers of this library can make an API for users to use queryParameters easier

   "queryParameters": [
    {
     "name": "arr",
     "parameterType": {
      "type": "ARRAY",
      "arrayType": {
       "type": "INT64"
      }
     },
     "parameterValue": {
      "arrayValues": [
       {
        "value": "2"
       },
       {
        "value": "3"
       },
       {
        "value": "4"
       }
      ]
     }

@jba
Copy link
Contributor Author

jba commented Nov 25, 2016

@c0b, thanks for your explorations! They have been a big help as we start to implement this.

@jba
Copy link
Contributor Author

jba commented Nov 28, 2016

This is our design for query parameters, which is implemented in CLs 9557, 9558, 9559 and 9561:

As shown in c0b's comments above, we can ignore parameter mode. Also, specifying a query parameter requires a value, expressed in a particular way that doesn't match anything else in the API, and a corresponding type, which also has a unique way of being expressed.

Instead of requiring users to provide values and types in this way, we accept query parameters as ordinary Go values. We use reflection to encode the Go value as a query parameter value, as well as to determine and encode its type.

We can easily deal with most BigQuery types this way:

  • The various flavors of Go integer map to BigQuery type INT64. We disallow uint, uint64 and uintptr, because they are (potentially) 64-bit unsigned values, so they cannot be accurately represented as a signed 64-bit integer.
  • Go floating-point values map to FLOAT64.
  • Go strings map to STRING.
  • Go bools map to BOOL.
  • time.Time maps to TIMESTAMP.
  • Slices and Arrays map to ARRAY<T>.
  • Structs and pointers to structs map to STRUCT<...>. We use only the exported fields of the struct, which we can now obtain accurately from the new cloud.google.com/go/internal/fields package.

However, that leaves the three civil time types DATE, TIME and DATETIME, which have no Go equivalent.

We could define these types in Go easily enough. But they are too general to live in this client package. Also, although Date is clearly useful and need for it has come up before, the other two don't seem to be generally missed. Given that the standard library is frozen and golang.org/x is in flux, there doesn't seem to be any great place to put Date.

We also don't really need full-fledged Go types for these BigQuery types, because BigQuery always treats them as strings, on both input and output. (BigQuery Standard SQL provides operations on these types that include componentwise access, but that doesn't concern this client.) We only need Go types for DATE, TIME and DATETIME so that a query parameter can be typed as DATE instead of STRING.

So in the bigquery package we define

type (
    Date string
    Time string
    DateTime string
)

To provide a query parameter of type DATE, one writes Date("2016-11-26").

For consistency, we also use these wrappers when reading fields of these types. (This is not a breaking change; formerly we did not recognize these types and returned an error.)

/cc @mcgreevy @zombiezen @adams-sarah @pongad @nightlyone

@zombiezen
Copy link
Contributor

This seems fine, except I'd like to push back a little on having the three date types be strings. Could we use structs instead?

@jba
Copy link
Contributor Author

jba commented Nov 28, 2016

Then we're going down what I called the "full-fledged types" route. Nothing wrong with that per se, but it's a slippery slope: if we have a bigquery.Date type with month, day and year, we'll need methods to convert to/from a string, so we should export those. Maybe also a method to produce a time.Time given a time.Location. Now this is getting to be a useful class. Maybe it shouldn't be in bigquery, but some place more general. But where?

And should we do the same for Time and DateTime? It seems we should, by symmetry. So should we write a civil package for all these civil time concepts?

I'm actually in favor of that, but now we've traveled a long way from bigquery types. So I thought I would keep things as simple as possible.

@derekperkins
Copy link
Contributor

Maybe this kills your mapping, but it seems like it would be easier to use a time�.Time everywhere, just discarding the unnecessary parts based on the underlying type.

I don't have a horse in this race, we use TIMESTAMP everywhere

@zombiezen
Copy link
Contributor

I agree that there is a general need for civil dates and times. I think for now that we can start with simple structured types (with no operations defined) and then eventually use the smarter type. Since we're using reflect, there isn't an API signature that would need changing, we would just document "method X supports the civil.Date type in structs" when it becomes available. However, if there is trickery in what can be put in the SQL strings for civil dates, then I would agree with the string type.

@derekperkins: It's a possibility, but time.Time conceptually represents an instant, and will behave poorly (badly formatted strings, wrong comparisons, etc.) when the other fields are zeroed out.

@jba
Copy link
Contributor Author

jba commented Nov 28, 2016

@derekperkins: As Ross says, it's considered a Bad Idea to confuse time-at-an-instant with civil time. You can't actually construct a time.Time without a location by design.

@jba
Copy link
Contributor Author

jba commented Nov 29, 2016

@zombiezen: A change of types wouldn't break the API signature, but it would break behavior when reading in a way we couldn't make backwards-compatible. If we switched Date types and you read a DATE column, you'd get a different type.

jba added a commit that referenced this issue Dec 1, 2016
Initial support for query parameters. This CL supports only scalar
parameters (no arrays or structs).

Query parameters are cumbersome to use in the underlying API for two
reasons: all scalar values must be encoded as strings, and each value
must be accompanied by a type. To improve the ergonomics, this client
accepts Go values for parameters, converting them to strings and
determining their types.

As @c0b pointed out, the parameter mode is unnecessary, so we omit it
from the surface and the RPC request.

See #390.

Change-Id: Id4c1ba740bdee00979efbffe7e0f8276ed80ff00
Reviewed-on: https://code-review.googlesource.com/9557
Reviewed-by: Ross Light <light@google.com>
jba added a commit that referenced this issue Dec 1, 2016
See #390.

Change-Id: I4006c8f17dc7131997d096701420ea3fc1e13174
Reviewed-on: https://code-review.googlesource.com/9559
Reviewed-by: Ross Light <light@google.com>
@jba jba closed this as completed in 3713d73 Dec 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

4 participants