Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify GCP compute project in BigQuery Pusher executor #3460

Merged
merged 5 commits into from
May 8, 2021
Merged

Specify GCP compute project in BigQuery Pusher executor #3460

merged 5 commits into from
May 8, 2021

Conversation

codesue
Copy link
Contributor

@codesue codesue commented Mar 29, 2021

Problem: We (Twitter) typically use two different GCP projects when working with BigQuery: one for compute (where the query is executed) and one for storage (where the table is located). With the current BigQuery Pusher executor, we have to use bigquery_serving_args[_PROJECT_ID_KEY] for both compute and storage, so our jobs fail due to permissions issues.

Proposed solution: Add support for specifying a compute project id that is different from the storage project id. It would be nice to have such a change backported to a 0.26.x version.

@google-cla google-cla bot added the cla: yes label Mar 29, 2021
@casassg
Copy link
Member

casassg commented Mar 31, 2021

CC @rcrowe-google this would be important for us. Our set up has separate projects for query execution and model storage for BQ.

@rcrowe-google rcrowe-google added the Twitter Issues from the Twitter team label Apr 4, 2021
@zhitaoli zhitaoli requested a review from 1025KB April 5, 2021 22:02
@casassg
Copy link
Member

casassg commented Apr 12, 2021

Re-upping this

Copy link

@SinaChavoshi SinaChavoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Suzen,
Thank you so much for your contribution. Implementation looks great. I have one request, do you mind updating the documentation accordingly, see my comment below.

@@ -41,6 +41,9 @@
_BQ_DATASET_ID_KEY = 'bq_dataset_id'
_MODEL_NAME_KEY = 'model_name'

# Project where query will be executed
_COMPUTE_PROJECT_ID_KEY = 'compute_project_id'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the docstring (line 74 see below) to include a description of what this configuration is, what is the default behaviour and possibly an example of how this configuration can be used if the user has two projects per your usecase.

...
  def Do(self, input_dict: Dict[Text, List[types.Artifact]],
         output_dict: Dict[Text, List[types.Artifact]],
         exec_properties: Dict[Text, Any]):
    """Overrides the tfx_pusher_executor.
    Args:
    ...
      exec_properties: Mostly a passthrough input dict for
        tfx.components.Pusher.executor.  custom_config.bigquery_serving_args is
        consumed by this class.  For the full set of parameters supported by
        Big Query ML, refer to https://cloud.google.com/bigquery-ml/

@SinaChavoshi
Copy link

Could you also address the lint issues raised in presubmit

************* Module tfx.extensions.google_cloud_big_query.pusher.executor
tfx/extensions/google_cloud_big_query/pusher/executor.py:100:41: W1116: Second argument of isinstance is not a type (isinstance-second-argument-not-valid-type)
tfx/extensions/google_cloud_big_query/pusher/executor.py:147:6: W0707: Consider explicitly re-raising using the 'from' keyword (raise-missing-from)

************* Module tfx.extensions.google_cloud_big_query.pusher.executor_test
tfx/extensions/google_cloud_big_query/pusher/executor_test.py:38:4: R1725: Consider using Python 3 style super() without arguments (super-with-arguments)

@codesue
Copy link
Contributor Author

codesue commented Apr 21, 2021

The error Second argument of isinstance is not a type (isinstance-second-argument-not-valid-type) was coming from isinstance(custom_config, Dict). This was an issue in pylint: pylint-dev/pylint#3507.

@codesue
Copy link
Contributor Author

codesue commented Apr 29, 2021

@SinaChavoshi or @1025KB, may I have another review of this? 😄

@zhitaoli
Copy link
Contributor

zhitaoli commented May 5, 2021

Could you also address the lint issues raised in presubmit

************* Module tfx.extensions.google_cloud_big_query.pusher.executor
tfx/extensions/google_cloud_big_query/pusher/executor.py:100:41: W1116: Second argument of isinstance is not a type (isinstance-second-argument-not-valid-type)
tfx/extensions/google_cloud_big_query/pusher/executor.py:147:6: W0707: Consider explicitly re-raising using the 'from' keyword (raise-missing-from)

************* Module tfx.extensions.google_cloud_big_query.pusher.executor_test
tfx/extensions/google_cloud_big_query/pusher/executor_test.py:38:4: R1725: Consider using Python 3 style super() without arguments (super-with-arguments)

Don't worry too much lint, we can address it while importing.

@zhitaoli zhitaoli self-requested a review May 5, 2021 18:22
@zhitaoli zhitaoli removed the request for review from 1025KB May 5, 2021 18:25
copybara-service bot pushed a commit that referenced this pull request May 8, 2021
@copybara-service copybara-service bot merged commit b26dae9 into tensorflow:master May 8, 2021
@codesue codesue deleted the codesue/specify-gcp-project-bq-pusher branch May 8, 2021 23:20
dhruvesh09 added a commit that referenced this pull request May 12, 2021
…elp dependency resolution in Colab. Specify GCP compute project in BigQuery Pusher executor. (#3703)

* Update RELEASE.md

* Update version.py

* Update version.py

* Reduce google-cloud-bigquery version range to help dependency resolution in Colab.

With the old version of google-cloud-bigquery(==1.21 which is pre-installed in Colab), pip resolver cannot find proper set of dependencies.

PiperOrigin-RevId: 372648960

* Merge pull request #3460 from codesue:codesue/specify-gcp-project-bq-pusher

PiperOrigin-RevId: 372730386

Co-authored-by: jiyongjung <jiyongjung@google.com>
Co-authored-by: tensorflow-extended-team <tensorflow-extended-nonhuman@googlegroups.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Twitter Issues from the Twitter team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants