Specify GCP compute project in BigQuery Pusher executor #3460
Conversation
CC @rcrowe-google this would be important for us. Our setup has separate projects for query execution and model storage for BQ.
Re-upping this.
Hi Suzen,
Thank you so much for your contribution. The implementation looks great. I have one request: do you mind updating the documentation accordingly? See my comment below.
@@ -41,6 +41,9 @@
 _BQ_DATASET_ID_KEY = 'bq_dataset_id'
 _MODEL_NAME_KEY = 'model_name'
+
+# Project where query will be executed
+_COMPUTE_PROJECT_ID_KEY = 'compute_project_id'
Could you update the docstring (line 74, see below) to include a description of what this configuration is and what the default behaviour is, and possibly an example of how this configuration can be used when the user has two projects, per your use case.
...
def Do(self, input_dict: Dict[Text, List[types.Artifact]],
output_dict: Dict[Text, List[types.Artifact]],
exec_properties: Dict[Text, Any]):
"""Overrides the tfx_pusher_executor.
Args:
...
exec_properties: Mostly a passthrough input dict for
tfx.components.Pusher.executor. custom_config.bigquery_serving_args is
consumed by this class. For the full set of parameters supported by
Big Query ML, refer to https://cloud.google.com/bigquery-ml/
Could you also address the lint issues raised in presubmit.
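The docstring update the reviewer asks for could be illustrated with a small sketch. This is not the actual executor code: the resolve helper below is hypothetical, but its defaulting behaviour (fall back to the storage project when no compute project is given) follows from the key added in the diff above.

```python
# Hypothetical sketch of how the executor could resolve the two projects.
# Key names ('project_id', 'compute_project_id') mirror this PR's diff;
# the helper itself is an illustration, not TFX code.

def resolve_projects(bigquery_serving_args):
    """Returns (compute_project, storage_project) from serving args."""
    storage_project = bigquery_serving_args['project_id']
    # Default: queries run in the same project that stores the model table.
    compute_project = bigquery_serving_args.get(
        'compute_project_id', storage_project)
    return compute_project, storage_project

# Single-project setup: compute defaults to the storage project.
print(resolve_projects({'project_id': 'my-project'}))

# Split setup (the Twitter use case): queries execute in one project,
# the BQ ML model table lives in another.
print(resolve_projects({'project_id': 'storage-project',
                        'compute_project_id': 'compute-project'}))
```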
…b.com/codesue/tfx into codesue/specify-gcp-project-bq-pusher
The error
@SinaChavoshi or @1025KB, may I have another review of this? 😄
Don't worry too much about the lint; we can address it while importing.
…pusher PiperOrigin-RevId: 372169256
…elp dependency resolution in Colab. Specify GCP compute project in BigQuery Pusher executor. (#3703)
* Update RELEASE.md
* Update version.py
* Update version.py
* Reduce google-cloud-bigquery version range to help dependency resolution in Colab. With the old version of google-cloud-bigquery (==1.21, which is pre-installed in Colab), the pip resolver cannot find a proper set of dependencies. PiperOrigin-RevId: 372648960
* Merge pull request #3460 from codesue:codesue/specify-gcp-project-bq-pusher PiperOrigin-RevId: 372730386
Co-authored-by: jiyongjung <jiyongjung@google.com>
Co-authored-by: tensorflow-extended-team <tensorflow-extended-nonhuman@googlegroups.com>
Problem: We (Twitter) typically use two different GCP projects when working with BigQuery: one for compute (where the query is executed) and one for storage (where the table is located). With the current BigQuery Pusher executor, we have to use
bigquery_serving_args[_PROJECT_ID_KEY]
for both compute and storage, so our jobs fail due to permissions issues.
Proposed solution: Add support for specifying a compute project id that is different from the storage project id. It would be nice to have such a change backported to a 0.26.x version.
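To illustrate the proposed configuration: the 'compute_project_id' key comes from this PR's diff, while the surrounding project, dataset, and model values are hypothetical placeholders.

```python
# Hypothetical bigquery_serving_args a user could pass via the Pusher
# component's custom_config under the proposed change. Only
# 'compute_project_id' is new in this PR; all values here are made up.
bigquery_serving_args = {
    'model_name': 'my_model',
    'project_id': 'storage-project',          # where the BQ ML model table lives
    'bq_dataset_id': 'my_dataset',
    'compute_project_id': 'compute-project',  # where the query is executed
}

print(sorted(bigquery_serving_args))
```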