Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery insert_data() fails with record type columns (nested objects) #2951

Closed
tswast opened this issue Jan 20, 2017 · 11 comments
Closed

BigQuery insert_data() fails with record type columns (nested objects) #2951

tswast opened this issue Jan 20, 2017 · 11 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. priority: p0 Highest priority. Critical issue. P0 implies highest priority. release blocking Required feature/issue must be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tswast
Copy link
Contributor

tswast commented Jan 20, 2017

BigQuery tabledata.insertAll supports inserting rows with columns that take record types (nested objects).

A customer reports that

My table structure has nested schemas. The [google-cloud-python] docs don't say how to handle nested structures. All we get is, "rows (list of tuples) – Row data to be inserted. Each tuple should contain data for each schema field on the current table and in the same order as the schema fields." A list of tuples. That's it. If I have nested data do I make sub-tuples? Is it simply not supported at this time?

I dug into the BQ code to find out what was going on. BQ does,

zip(table._schema, rows)

which looks like it won't support nested inserts (or could -- with sub-tuples, but who knows).

Example in the API explorer, with the following table schema:

image

Try it out in the API Explorer for bigquery.tabledata.insertAll. Enter the following for the request body (after creating a table with the same schema). You need to switch the editor to free-form mode

{
  "rows": 
  [
    {
      "insert_id": "12345",
      "json": 
      {
        "string_col": "Some value",
        "record_col": {
          "nested_string": "another string value",
          "nested_repeated": ["0", "1", "2"],
          "nested_record": {
            "nested_nested_string": "some deep insight"
          }
        }
      }
    }
  ]
}

I can verify that this works with a SQL query.

#standardSql
SELECT * FROM `swast-scratch.hello_world.nested` LIMIT 1000

image

It is not clear from the docs how to do a similar call from our client libraries (and it likely isn't possible).

@tswast tswast added api: bigquery Issues related to the BigQuery API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Jan 20, 2017
@Ngg971
Copy link

Ngg971 commented Jan 31, 2017

I'm having the same issue with the BigQuery library. It would be much easier if we could use an actual dictionary.

I tried using sub-tuples (with parentheses), brackets, etc. but it keeps returning invalid_argument: 'Array specified for non-repeated field.

@Ngg971
Copy link

Ngg971 commented Feb 1, 2017

I was able to insert nested data by wrapping the RECORD (nested) field in a dict() object.

For example if I have the following schema:

    SCHEMA_NAME = [
        bigquery.table.SchemaField('some_filename', 'STRING', mode='REQUIRED'),
        bigquery.table.SchemaField('some_timestamp', 'TIMESTAMP', mode='REQUIRED'),
        bigquery.table.SchemaField('some_file_info', 'RECORD', mode='REQUIRED', fields = [
            bigquery.table.SchemaField('some_number', 'INTEGER', mode='REQUIRED'),
            bigquery.table.SchemaField('some_other_number', 'INTEGER', mode='NULLABLE'),
            bigquery.table.SchemaField('some_float', 'FLOAT', mode='NULLABLE')])
    ]

The following data structure works:

    ROWS_TO_INSERT = [
    ("this_is_my_filename", datetime.datetime.utcnow(),\
        dict(some_number = 455, some_other_number= 6, some_float= 5.11))
    ]

@tswast
Copy link
Contributor Author

tswast commented Feb 9, 2017

Thanks @Ngg971 for following up. I think there still might be a bug here. I think it's likely that timestamp fields within the nested records might not be sent over the wire correctly by using a dict like this, but I might be wrong.

We need a system test to verify.

@danoscarmike danoscarmike added docs priority: p0 Highest priority. Critical issue. P0 implies highest priority. release blocking Required feature/issue must be fixed prior to next release. labels Feb 28, 2017
@tseaver
Copy link
Contributor

tseaver commented Mar 20, 2017

@tswast Like the back-end API, we do not flatten nested records into tuples, but preserve them as dicts. PR #3171 adds a system test showing how to insert your original example.

richkadel pushed a commit to richkadel/google-cloud-python that referenced this issue May 6, 2017
@tswast tswast marked this as a duplicate of #3695 Jul 28, 2017
@melissachang
Copy link

I also see this problem. I'm confused about the status of this bug. This is a dupe of 3695; and 3695 is a dupe of this. Should one of these bugs be open?

@chemelnucfin
Copy link
Contributor

@melissachang It seems like the other one was closed because it was a duplicate. This one seemed to have been fixed with PR #3171. Could you try following the example given in that PR and see if you still see the problem?

@melissachang
Copy link

That PR only adds tests, so it couldn't have fixed the problem.

@chemelnucfin
Copy link
Contributor

Sorry, I meant that the PR shows an example which solved the OP's problem which may also fix yours. Please do keep us posted. Thanks.

@chemelnucfin
Copy link
Contributor

@melissachang Hello, were you able to resolve your issue following the example?

@melissachang
Copy link

Hi, I don't remember where I saw this problem and how I worked around it, sorry.

@chemelnucfin
Copy link
Contributor

No problem, glad it worked out.

parthea pushed a commit that referenced this issue Jun 19, 2024
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.7 to 2.2.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.2.2</h2>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support for 2023. If your company or organization uses
Python and would benefit from HTTP/2 support in Requests, pip, cloud
SDKs, and thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Changes</h2>
<ul>
<li>Added the <code>Proxy-Authorization</code> header to the list of
headers to strip from requests when redirecting to a different host. As
before, different headers can be set via
<code>Retry.remove_headers_on_redirect</code>.</li>
<li>Allowed passing negative integers as <code>amt</code> to read
methods of <code>http.client.HTTPResponse</code> as an alternative to
<code>None</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3122">#3122</a>)</li>
<li>Fixed return types representing copying actions to use
<code>typing.Self</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3363">#3363</a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2">https://github.com/urllib3/urllib3/compare/2.2.1...2.2.2</a></p>
<h2>2.2.1</h2>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support for 2023. If your company or organization uses
Python and would benefit from HTTP/2 support in Requests, pip, cloud
SDKs, and thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Changes</h2>
<ul>
<li>Fixed issue where <code>InsecureRequestWarning</code> was emitted
for HTTPS connections when using Emscripten. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3331">#3331</a>)</li>
<li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically
casting non-proxy headers to <code>HTTPHeaderDict</code>. This change
was premature as it did not apply to proxy headers and
<code>HTTPHeaderDict</code> does not handle byte header values correctly
yet. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3343">#3343</a>)</li>
<li>Changed <code>ProtocolError</code> to
<code>InvalidChunkLength</code> when response terminates before the
chunk length is sent. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/2860">#2860</a>)</li>
<li>Changed <code>ProtocolError</code> to be more verbose on incomplete
reads with excess content. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3261">#3261</a>)</li>
</ul>
<h2>2.2.0</h2>
<h2>🖥️ urllib3 now works in the browser</h2>
<p>:tada: <strong>This release adds experimental support for <a
href="https://urllib3.readthedocs.io/en/stable/reference/contrib/emscripten.html">using
urllib3 in the browser with Pyodide</a>!</strong> 🎉</p>
<p>Thanks to Joe Marshall (<a
href="https://github.com/joemarshall"><code>@​joemarshall</code></a>)
for contributing this feature. This change was possible thanks to work
done in urllib3 v2.0 to detach our API from <code>http.client</code>.
Please report all bugs to the <a
href="https://github.com/urllib3/urllib3/issues">urllib3 issue
tracker</a>.</p>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support for 2023. If your company or organization uses
Python and would benefit from HTTP/2 support in Requests, pip, cloud
SDKs, and thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Changes</h2>
<ul>
<li>Added support for <a
href="https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html">Emscripten
and Pyodide</a>, including streaming support in cross-origin isolated
browser environments where threading is enabled. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/2951">#2951</a>)</li>
<li>Added support for <code>HTTPResponse.read1()</code> method. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3186">#3186</a>)</li>
<li>Added rudimentary support for HTTP/2. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3284">#3284</a>)</li>
<li>Fixed issue where requests against urls with trailing dots were
failing due to SSL errors
when using proxy. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/2244">#2244</a>)</li>
<li>Fixed <code>HTTPConnection.proxy_is_verified</code> and
<code>HTTPSConnection.proxy_is_verified</code> to be always set to a
boolean after connecting to a proxy. It could be <code>None</code> in
some cases previously. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3130">#3130</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.2.2 (2024-06-17)</h1>
<ul>
<li>Added the <code>Proxy-Authorization</code> header to the list of
headers to strip from requests when redirecting to a different host. As
before, different headers can be set via
<code>Retry.remove_headers_on_redirect</code>.</li>
<li>Allowed passing negative integers as <code>amt</code> to read
methods of <code>http.client.HTTPResponse</code> as an alternative to
<code>None</code>.
(<code>[#3122](urllib3/urllib3#3122)
&lt;https://github.com/urllib3/urllib3/issues/3122&gt;</code>__)</li>
<li>Fixed return types representing copying actions to use
<code>typing.Self</code>.
(<code>[#3363](urllib3/urllib3#3363)
&lt;https://github.com/urllib3/urllib3/issues/3363&gt;</code>__)</li>
</ul>
<h1>2.2.1 (2024-02-16)</h1>
<ul>
<li>Fixed issue where <code>InsecureRequestWarning</code> was emitted
for HTTPS connections when using Emscripten.
(<code>[#3331](urllib3/urllib3#3331)
&lt;https://github.com/urllib3/urllib3/issues/3331&gt;</code>__)</li>
<li>Fixed <code>HTTPConnectionPool.urlopen</code> to stop automatically
casting non-proxy headers to <code>HTTPHeaderDict</code>. This change
was premature as it did not apply to proxy headers and
<code>HTTPHeaderDict</code> does not handle byte header values correctly
yet. (<code>[#3343](urllib3/urllib3#3343)
&lt;https://github.com/urllib3/urllib3/issues/3343&gt;</code>__)</li>
<li>Changed <code>InvalidChunkLength</code> to
<code>ProtocolError</code> when response terminates before the chunk
length is sent.
(<code>[#2860](urllib3/urllib3#2860)
&lt;https://github.com/urllib3/urllib3/issues/2860&gt;</code>__)</li>
<li>Changed <code>ProtocolError</code> to be more verbose on incomplete
reads with excess content.
(<code>[#3261](urllib3/urllib3#3261)
&lt;https://github.com/urllib3/urllib3/issues/3261&gt;</code>__)</li>
</ul>
<h1>2.2.0 (2024-01-30)</h1>
<ul>
<li>Added support for <code>Emscripten and Pyodide
&lt;https://urllib3.readthedocs.io/en/latest/reference/contrib/emscripten.html&gt;</code><strong>,
including streaming support in cross-origin isolated browser
environments where threading is enabled.
(<code>[#2951](urllib3/urllib3#2951)
&lt;https://github.com/urllib3/urllib3/issues/2951&gt;</code></strong>)</li>
<li>Added support for <code>HTTPResponse.read1()</code> method.
(<code>[#3186](urllib3/urllib3#3186)
&lt;https://github.com/urllib3/urllib3/issues/3186&gt;</code>__)</li>
<li>Added rudimentary support for HTTP/2.
(<code>[#3284](urllib3/urllib3#3284)
&lt;https://github.com/urllib3/urllib3/issues/3284&gt;</code>__)</li>
<li>Fixed issue where requests against urls with trailing dots were
failing due to SSL errors
when using proxy.
(<code>[#2244](urllib3/urllib3#2244)
&lt;https://github.com/urllib3/urllib3/issues/2244&gt;</code>__)</li>
<li>Fixed <code>HTTPConnection.proxy_is_verified</code> and
<code>HTTPSConnection.proxy_is_verified</code>
to be always set to a boolean after connecting to a proxy. It could be
<code>None</code> in some cases previously.
(<code>[#3130](urllib3/urllib3#3130)
&lt;https://github.com/urllib3/urllib3/issues/3130&gt;</code>__)</li>
<li>Fixed an issue where <code>headers</code> passed in a request with
<code>json=</code> would be mutated
(<code>[#3203](urllib3/urllib3#3203)
&lt;https://github.com/urllib3/urllib3/issues/3203&gt;</code>__)</li>
<li>Fixed <code>HTTPSConnection.is_verified</code> to be set to
<code>False</code> when connecting
from a HTTPS proxy to an HTTP target. It was set to <code>True</code>
previously.
(<code>[#3267](urllib3/urllib3#3267)
&lt;https://github.com/urllib3/urllib3/issues/3267&gt;</code>__)</li>
<li>Fixed handling of new error message from OpenSSL 3.2.0 when
configuring an HTTP proxy as HTTPS
(<code>[#3268](urllib3/urllib3#3268)
&lt;https://github.com/urllib3/urllib3/issues/3268&gt;</code>__)</li>
<li>Fixed TLS 1.3 post-handshake auth when the server certificate
validation is disabled
(<code>[#3325](urllib3/urllib3#3325)
&lt;https://github.com/urllib3/urllib3/issues/3325&gt;</code>__)</li>
<li>Note for downstream distributors: To run integration tests, you now
need to run the tests a second
time with the <code>--integration</code> pytest flag.
(<code>[#3181](urllib3/urllib3#3181)
&lt;https://github.com/urllib3/urllib3/issues/3181&gt;</code>__)</li>
</ul>
<h1>2.1.0 (2023-11-13)</h1>
<ul>
<li>Removed support for the deprecated urllib3[secure] extra.
(<code>[#2680](urllib3/urllib3#2680)
&lt;https://github.com/urllib3/urllib3/issues/2680&gt;</code>__)</li>
<li>Removed support for the deprecated SecureTransport TLS
implementation.
(<code>[#2681](urllib3/urllib3#2681)
&lt;https://github.com/urllib3/urllib3/issues/2681&gt;</code>__)</li>
<li>Removed support for the end-of-life Python 3.7.
(<code>[#3143](urllib3/urllib3#3143)
&lt;https://github.com/urllib3/urllib3/issues/3143&gt;</code>__)</li>
<li>Allowed loading CA certificates from memory for proxies.
(<code>[#3065](urllib3/urllib3#3065)
&lt;https://github.com/urllib3/urllib3/issues/3065&gt;</code>__)</li>
<li>Fixed decoding Gzip-encoded responses which specified
<code>x-gzip</code> content-encoding.
(<code>[#3174](urllib3/urllib3#3174)
&lt;https://github.com/urllib3/urllib3/issues/3174&gt;</code>__)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/urllib3/urllib3/commit/27e2a5c5a7ab6a517252cc8dcef3ffa6ffb8f61a"><code>27e2a5c</code></a>
Release 2.2.2 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3406">#3406</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/accff72ecc2f6cf5a76d9570198a93ac7c90270e"><code>accff72</code></a>
Merge pull request from GHSA-34jh-p97f-mpxf</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/34be4a57e59eb7365bcc37d52e9f8271b5b8d0d3"><code>34be4a5</code></a>
Pin CFFI to a new release candidate instead of a Git commit (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3398">#3398</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/da410581b6b3df73da976b5ce5eb20a4bd030437"><code>da41058</code></a>
Bump browser-actions/setup-chrome from 1.6.0 to 1.7.1 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3399">#3399</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/b07a669bd970d69847801148286b726f0570b625"><code>b07a669</code></a>
Bump github/codeql-action from 2.13.4 to 3.25.6 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3396">#3396</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/b8589ec9f8c4da91511e601b632ac06af7e7c10e"><code>b8589ec</code></a>
Measure coverage with v4 of artifact actions (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3394">#3394</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/f3bdc5585111429e22c81b5fb26c3ec164d98b81"><code>f3bdc55</code></a>
Allow triggering CI manually (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3391">#3391</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/52392654b30183129cf3ec06010306f517d9c146"><code>5239265</code></a>
Fix HTTP version in debug log (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3316">#3316</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/b34619f94ece0c40e691a5aaf1304953d88089de"><code>b34619f</code></a>
Bump actions/checkout to 4.1.4 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3387">#3387</a>)</li>
<li><a
href="https://github.com/urllib3/urllib3/commit/9961d14de7c920091d42d42ed76d5d479b80064d"><code>9961d14</code></a>
Bump browser-actions/setup-chrome from 1.5.0 to 1.6.0 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3386">#3386</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/urllib3/urllib3/compare/2.0.7...2.2.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=urllib3&package-manager=pip&previous-version=2.0.7&new-version=2.2.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/googleapis/google-cloud-python/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. priority: p0 Highest priority. Critical issue. P0 implies highest priority. release blocking Required feature/issue must be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

7 participants