AutoML: Add a TablesClient for automl-tables specific behavior. #8720

lwander · 2019-07-19T20:42:42Z

AutoML Tables is heavily used by data scientists in Jupyter notebooks. Based on feedback we've received, the existing client is hard to use, opaque, and not notebook-friendly. This wrapper client aims to fix that.

Using the client (see integration tests for more detailed snippets):

from google.cloud import automl_v1beta1

client = automl_v1beta1.TablesClient(project='my-project-123')

client.list_datasets()

dataset = client.create_dataset('my_dataset')
client.import_data(dataset=dataset, gcs_input_uris='gs://my-data/data.csv').result()
client.set_target_column(dataset=dataset, column_spec_display_name='output')

model = client.create_model(dataset=dataset, train_budget_milli_node_hours=1000).result()

Testing

Most testing is done in unit tests, but we also have some integration tests covering the common user flow (create dataset -> import data -> train -> predict). To speed up testing, we create resources the first time the tests are run and reuse them in subsequent runs. This avoids repeating model creation & deployment, which takes at least an hour.
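For illustration only, the create-once-then-reuse approach described above might look roughly like the sketch below. It uses a hypothetical in-memory FakeClient and a hypothetical ensure_dataset helper (neither is taken from this PR) so that it runs without any cloud access; the real system tests may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDataset:
    display_name: str


@dataclass
class FakeClient:
    """Hypothetical stand-in for a real client; counts creations."""
    datasets: Dict[str, FakeDataset] = field(default_factory=dict)
    create_calls: int = 0

    def list_datasets(self) -> List[FakeDataset]:
        return list(self.datasets.values())

    def create_dataset(self, display_name: str) -> FakeDataset:
        self.create_calls += 1
        dataset = FakeDataset(display_name)
        self.datasets[display_name] = dataset
        return dataset


def ensure_dataset(client: FakeClient, display_name: str) -> FakeDataset:
    """Return the dataset with this display name, creating it only if absent."""
    for dataset in client.list_datasets():
        if dataset.display_name == display_name:
            return dataset
    return client.create_dataset(display_name)


client = FakeClient()
first = ensure_dataset(client, "shared_test_dataset")   # first run: creates
second = ensure_dataset(client, "shared_test_dataset")  # later run: reuses
```

The same lookup-before-create shape would apply to slower resources such as trained models, which is where the time saving matters most.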

Future work

In no particular order, we plan to add:

IPython integration to pretty print objects
Matplotlib integration to show dataset & model stats
Pandas/numpy integration for data import & batch predict

Sorry for the large change; most of it is docstrings and unit tests. We were advised to check in the full basic functionality first.

googlebot · 2019-07-19T20:42:54Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

lwander · 2019-07-19T20:43:29Z

@jonathan1920 you need to sign the CLA, or re-sign the commits with the correct @google.com address

beccasaurus · 2019-07-19T21:00:27Z

@merla18 Please take a look.

At a high level, it looks like this custom layer does the following:

Instantiate client with Project ID so the client knows the Project ID
Treat the resources' Display Names as a shorthand identifier
CRUD operations can accept any of the following to identify a resource:
- Full name (projects/X/locations/Y/resource_type/ID)
- The resource's ID only (project ID stored on client)
- The resource's Display Name
- Python pb object representing the resource, itself
This is branded for Tables but most of the functionality would work across the board for AutoML
- The identifier shorthand functionality would also work for any resourceful GCP API
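To make the list above concrete, here is a rough sketch of how one function could accept all four identifier forms. It is not the code from this PR: the Dataset class, the resolve_dataset_name helper, and the order in which the forms are checked are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for the pb object representing a dataset."""
    name: str          # fully qualified: projects/X/locations/Y/datasets/ID
    display_name: str


def resolve_dataset_name(
    ref: Union[str, Dataset],
    known: Iterable[Dataset],
    project: str,
    region: str,
) -> str:
    """Turn any of the four accepted forms into a fully qualified name."""
    if isinstance(ref, Dataset):            # the resource object itself
        return ref.name
    if ref.startswith("projects/"):         # already a full name
        return ref
    matches = [d.name for d in known if d.display_name == ref]
    if len(matches) == 1:                   # unique display name
        return matches[0]
    if len(matches) > 1:
        raise ValueError(f"display name {ref!r} is ambiguous")
    # Assumption: anything else is a bare ID under the client's project.
    return f"projects/{project}/locations/{region}/datasets/{ref}"


known = [Dataset("projects/p/locations/us-central1/datasets/TBL123", "my_dataset")]
full_name = known[0].name
forms = [known[0], full_name, "my_dataset", "TBL123"]
resolved = [resolve_dataset_name(f, known, "p", "us-central1") for f in forms]
```

Note the ambiguity this sketch has to paper over: a bare string could be either an ID or a display name, so some precedence rule is unavoidable.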

Project ID on client

IIRC there have been proposals over the years for storing project on the client.

As this manual layer implements this feature, I'm wondering: is this a feature that has any active efforts for GAX/core?

Calling RPC methods with shorthand for resource names/identifiers

Are there any other Python Google Cloud Client Libraries that implement something like this?

I'm a little worried about consistency. When learning GCP APIs, part of the learning curve is resources (their "names" which are fully qualified identifiers -vs- display names which many resources have -vs- identifiers which are often server-side generated but sometimes client provided).

This layer simplifies working with resources a lot, it looks very elegant. But learning this interface might confuse developers when they use this interface and then try using any of the other GCP API Python Cloud Client Libraries.

Looks elegant! I'd like to find out more, though. Especially because this layer simplifies working with GCP API resources in general, which I love.

/cc @sirtorry @andrewferlitsch @dizcology @lukesneeringer @crwilcox @JustinBeckwith

andrewferlitsch · 2019-07-19T22:11:56Z

Rebecca, what's most notable to me about the AutoML APIs is that they are designed in a procedural style instead of an OOP style. They lack abstraction, encapsulation, and polymorphism (like Keras vs. pre-TF 2.0, which is why TF 2.0 adopted Keras for the model API). Looking at the above bullets, my first thoughts are:
- Things like project ID, etc. are attributes of the client object (encapsulation).
- Shorthand would be attribute aliases (abstraction).
- CRUD ops sound like overloaded methods (polymorphism).


andrewferlitsch · 2019-07-19T22:17:28Z

@beccasaurus I recommend we do a review and see the best "wrapper" approach that can be applied uniformly across all the APIs.

For example, it may be better to provide an attribute on a generic Client type that indicates the type of dataset/model (Tables, etc.) vs. subclassing (i.e., TablesClient is a subclass of Client).
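As a toy comparison of the two shapes mentioned above (all class names here are hypothetical and unrelated to the actual library code):

```python
class GenericClient:
    """Option A: one client type; the product is carried as an attribute."""

    def __init__(self, project: str, kind: str):
        self.project = project
        self.kind = kind  # e.g. "tables"


class BaseClient:
    """Option B: a shared base class..."""

    def __init__(self, project: str):
        self.project = project


class ToyTablesClient(BaseClient):
    """...with a per-product subclass that would hold Tables-only helpers."""

    kind = "tables"


by_attribute = GenericClient(project="my-project-123", kind="tables")
by_subclass = ToyTablesClient(project="my-project-123")
```

With the attribute approach, product-specific behavior has to branch on kind at runtime; with subclassing, it lives in the subclass and is visible in the type.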

lwander · 2019-07-20T12:52:40Z

That's a great point:

I'm a little worried about consistency. When learning GCP APIs, part of the learning curve is resources (their "names" which are fully qualified identifiers

The real difficulty in using the original client is that a lot of these identifiers are hidden behind multiple API calls & string transformations. For example, to update a column (a required step), you first need to know the fully-qualified name of the dataset, use that to look up the fully-qualified name of the primary table, and then search the primary table for the column in question. Then you need to extract the API-assigned identifier from the column's fully-qualified name (something of the form TBL123...). Only at this point are you ready to update the column. This doesn't even get into the odd behavior where certain properties of the column are required in the update, and others are optional.
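The chain of lookups described above can be sketched as follows. This is a simplified model, not the real API: the two dictionaries stand in for separate API responses, and the resource-name layout, IDs, and the find_column_id helper are all made up for illustration.

```python
# Hypothetical stand-ins for what separate API calls would return.
DATASET = "projects/p/locations/us-central1/datasets/TBL123"
TABLE = DATASET + "/tableSpecs/456"

PRIMARY_TABLE_BY_DATASET = {DATASET: TABLE}
COLUMNS_BY_TABLE = {
    TABLE: [
        {"name": TABLE + "/columnSpecs/788", "display_name": "input"},
        {"name": TABLE + "/columnSpecs/789", "display_name": "output"},
    ],
}


def find_column_id(dataset_name: str, column_display_name: str) -> str:
    """Walk dataset -> primary table -> column, then extract the column ID."""
    # Step 1: the dataset's full name gives the primary table's full name.
    table_name = PRIMARY_TABLE_BY_DATASET[dataset_name]
    # Step 2: search that table's columns for the one we want.
    for column in COLUMNS_BY_TABLE[table_name]:
        if column["display_name"] == column_display_name:
            # Step 3: string transformation, the ID is the last path segment.
            return column["name"].rsplit("/", 1)[-1]
    raise LookupError(f"no column named {column_display_name!r}")


column_id = find_column_id(DATASET, "output")
```

Every step here is one more thing a notebook user has to get right before the actual update call, which is the overhead the wrapper is meant to hide.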

This API works great when you're a browser & always have the entire state of all your resources in memory. However, when you're just getting started in a Jupyter notebook and want to quickly run some experiments using AutoML, you're out of luck.

Would the "correct" solution be to rewrite all the APIs with both client users & browsers in mind? Maybe. But that doesn't cover the future work we want to implement, namely integration with pandas, integration with matplotlib, and integration with IPython.

On top of that, we have complaints from both internal users who are experts in GCP and users who are brand new to the platform (and these complaints don't even address the lack of the above-mentioned enhancements). Overall, there is just too much complexity in the client to perform very simple operations.

In the end, the Tables team wants a polished, first-class, data scientist-friendly client, as this is what our users are asking for. Consequently, we're ready to put in the effort to maintain this client.

googlebot · 2019-07-20T12:52:56Z

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

lwander · 2019-08-01T17:23:38Z

Updated

beccasaurus · 2019-08-01T17:37:27Z

✅ Green light from my end. We followed up about Project ID / Resource Objects offline.

I'm +1 to merge this when it's considered ready.

@busunkim96 Could you add @sirtorry as a reviewer?

busunkim96 · 2019-08-01T19:45:07Z

I opened a PR on sloth to add @sirtorry to yoshi-python. You should be able to add comments for now.

sirtorry

tests lgtm

lwander · 2019-08-08T18:32:59Z

I've made the list system tests more stringent. Anything blocking us from merging?

lwander · 2019-08-14T07:17:25Z

Friendly ping

busunkim96 · 2019-08-14T17:31:04Z

@sirtorry Please merge it when you're ready. You should have write permissions to the repo.

lwander · 2019-08-15T05:34:40Z

Thanks all! 🙂

…leapis#8720)
* Checking in staged client helper code Additional test, docs & proposed cleanup needs to happen on top of this.
* update create_model to allow user to specify included or excluded col… (#16)
* update create_model to allow user to specify included or excluded columns
* made minor changes stylistically and with added ValueError outputs
* Update doc gen & module structure. Add unit & system tests
* added two new func: set time, get table address (#23)
* added two new func: set time, get table address
* changed indentation
* Add system tests
* Address linter & python2.7 import errors
* Passes **kwargs through to client & implements missing methods
* Support BQ as input/output in batch_predict
* Address first round of feedback
* Switch to pytest.raises, fix .rst formatting exception
* Make list system tests more stringent

Lars Wander and others added 5 commits July 19, 2019 16:32

Checking in staged client helper code

162fcbc

Additional test, docs & proposed cleanup needs to happen on top of this.

update create_model to allow user to specify included or excluded col… (

c45eea1

#16) * update create_model to allow user to specify included or excluded columns * made minor changes stylistically and with added ValueError outputs

Update doc gen & module structure. Add unit & system tests

be43449

added two new func: set time, get table address (#23)

e19f141

* added two new func: set time, get table address * changed indentation

Add system tests

a741c3f

lwander requested a review from busunkim96 as a code owner July 19, 2019 20:42

googlebot added the cla: no This human has *not* signed the Contributor License Agreement. label Jul 19, 2019

busunkim96 added the api: automl Issues related to the AutoML API. label Jul 19, 2019

busunkim96 changed the title ~~Adds a TablesClient for automl-tables specific behavior.~~ AutoML: Adds a TablesClient for automl-tables specific behavior. Jul 19, 2019

busunkim96 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 19, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 19, 2019

googlebot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Jul 20, 2019

Lars Wander added 2 commits July 22, 2019 09:28

Address linter & python2.7 import errors

b7d951e

Passes **kwargs through to client & implements missing methods

83e6d50

busunkim96 reviewed Jul 22, 2019

automl/README.rst (outdated)

automl/google/cloud/automl_v1beta1/__init__.py

tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 23, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 23, 2019

Support BQ as input/output in batch_predict

3dc7975

tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 29, 2019

sirtorry mentioned this pull request Aug 1, 2019

add product recommendation for automl tables notebook GoogleCloudPlatform/python-docs-samples#2257

Merged

Address first round of feedback

b564d96

tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

busunkim96 reviewed Aug 1, 2019

automl/tests/unit/gapic/v1beta1/test_tables_client_v1beta1.py (outdated)

Switch to pytest.raises, fix .rst formatting exception

4c85d74

tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

busunkim96 mentioned this pull request Aug 1, 2019

Add sirtorry to yoshi-python googleapis/sloth#305

Merged

busunkim96 requested a review from sirtorry August 1, 2019 20:39

sirtorry approved these changes Aug 2, 2019

automl/tests/system/gapic/v1beta1/test_system_tables_client_v1.py (outdated)

tseaver changed the title ~~AutoML: Adds a TablesClient for automl-tables specific behavior.~~ AutoML: Add a TablesClient for automl-tables specific behavior. Aug 7, 2019

Make list system tests more stringent

9587150

busunkim96 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019

busunkim96 requested a review from sirtorry August 14, 2019 17:31

sirtorry merged commit e9ea9ed into googleapis:master Aug 14, 2019
