Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: Use TableListItem for table listing. #4427

Merged
merged 6 commits into from
Nov 20, 2017

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Nov 20, 2017

The table list response only includes a subset of all table properties.
This commit adds a new type to document explicitly which properties are
included, but also make it clear that this object should not be used in
place of a full Table object.

Towards #4373 (needs a similar change for dataset listing before closing that issue)

The table list response only includes a subset of all table properties.
This commit adds a new type to document explicitly which properties are
included, but also make it clear that this object should not be used in
place of a full Table object.
Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few design questions:

  1. How many of these @property-s are copy-pasta?
  2. I don't think the class name really conveys what it is (a read-only version of Table).
  3. Is there a reason you might be opposed to just subclassing Table or just an arbitrary decision to make a new class?
  4. Are you opposed to adding a read_only attribute / member to Table?

@@ -38,6 +38,7 @@
from google.cloud.bigquery.dataset import Dataset
from google.cloud.bigquery.dataset import DatasetReference
from google.cloud.bigquery.table import Table, _TABLE_HAS_NO_SCHEMA
from google.cloud.bigquery.table import TableListItem

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@tswast
Copy link
Contributor Author

tswast commented Nov 20, 2017

  1. How many of these @property-s are copy-pasta?

Basically all of them. I made a slight change to view_use_legacy_sql to use the type property to check if a table is a view. This is because the query property of the view is not included in the table list response, so the type property is the only way to check if the table is a view when useLegacySql is left at the default value.

  1. I don't think the class name really conveys what it is (a read-only version of Table).

It being read-only is more of a side-effect than the primary purpose of the class.

  1. Is there a reason you might be opposed to just subclassing Table or just an arbitrary decision to make a new class?

The primary purpose of TableListItem is to indicate that there is a different set of properties available on the list response than a full Table object. Notable properties that aren't there include schema and total rows (#4373).

A sub-class would allow this object to be used in any place a table is allowed, such as update or create requests, which is not something I want people to be doing (because not all properties are present).

  1. Are you opposed to adding a read_only attribute / member to Table?

I hadn't really considered it, because it wouldn't solve the problem of schema or num_rows not being populated from a list operation.

@dhermes
Copy link
Contributor

dhermes commented Nov 20, 2017

@tswast I'm just trying to avoid code duplication, because the copy-pasta will drift over time.

If you did a subclass, you could over-ride and raise an AttributeError for properties not contained in the response (or you could do it in the Table class by using self._read_only to check).

The same applies to methods like Table.update() and Table.create().

I think having a different class is probably the right move as far as "principle of least astonishment" is concerned. (Or rather, if you returned a Table that was limited, users might be surprised/astonished if they never checked table.read_only.) But I'd strongly suggest making the other class have much less code defined in it / leverage the existing stuff.

@tswast
Copy link
Contributor Author

tswast commented Nov 20, 2017

There's so little code in these properties, I'm not super concerned about drift.

@dhermes
Copy link
Contributor

dhermes commented Nov 20, 2017

There's so little code in these properties, I'm not super concerned about drift.

But I am concerned. Care to sell me a bit on your position?

@tswast
Copy link
Contributor Author

tswast commented Nov 20, 2017

Most of these properties have implementations like:

return self._properties.get('tableReference', {}).get('datasetId')

These properties directly reflect properties on the resource, with minimal modification.

The reference does have code which could drift.

from google.cloud.bigquery import dataset

dataset_ref = dataset.DatasetReference(self.project, self.dataset_id)
return TableReference(dataset_ref, self.table_id)

I agree this could be shared with the Table class, but the Table class doesn't have this property yet. I'm adding it in #4405. I could update that PR to share the code with the TableListItem class once this one goes in.

You're right that view_use_legacy_sql is code that could drift, too. I think there are probably better ways to share that code than a subclass. I'll make an update to the PR.

@dhermes
Copy link
Contributor

dhermes commented Nov 20, 2017

@tswast Though these are simple "schema->data->Python" mappings, some APIs (BQ included) have changed the underlying schema and caught us off guard. With that in mind, "so little code" doesn't inspire much confidence?

Also, I don't want to get lost in this discussion that I think TableListItem is a not great name. Other candidates? ReadOnlyTable?

@tswast
Copy link
Contributor Author

tswast commented Nov 20, 2017

Discussed in chat. Since Table and "whatever we call the resource from table list" are actually decoupled from each other in the backend, it doesn't make sense to couple them here.

@@ -713,6 +713,137 @@ def _build_resource(self, filter_fields):
return resource


class TableListItem(object):
"""Read-only table resource object with a subset of table properties.

This comment was marked as spam.

This comment was marked as spam.

# it. So a missing or None can only come from the server, whose
# default is True.
return view.get('useLegacySql', True)
view_use_legacy_sql = property(_view_use_legacy_sql_getter)

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

For performance reasons, the BigQuery API only includes some of the table
properties when listing tables. Notably,
:attr:`google.cloud.bigquery.table.Table.schema` and
:attr:`google.cloud.bigquery.table.Table.num_rows` are missing.

This comment was marked as spam.


For a full list of the properties that the BigQuery API returns, see the
`REST documentation for tables.list
<https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/list>`.

This comment was marked as spam.

This comment was marked as spam.

@dhermes
Copy link
Contributor

dhermes commented Nov 20, 2017

@tswast OK Do you feel this is merge-able?

@tswast
Copy link
Contributor Author

tswast commented Nov 20, 2017

Do you feel this is merge-able?

Yeah. Just ran the full nox suite locally to double-check.

@dhermes
Copy link
Contributor

dhermes commented Nov 20, 2017

Cool LGTM

@tswast tswast merged commit f308bd4 into googleapis:master Nov 20, 2017
@tswast tswast deleted the bq-tablelist branch November 20, 2017 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants