Specify GCP compute project in BigQuery Pusher executor #3460
Conversation
CC @rcrowe-google this would be important for us. Our setup has separate projects for query execution and model storage for BQ.
Re-upping this.
Hi Suzen,
Thank you so much for your contribution. The implementation looks great. I have one request: do you mind updating the documentation accordingly? See my comment below.
@@ -41,6 +41,9 @@
 _BQ_DATASET_ID_KEY = 'bq_dataset_id'
 _MODEL_NAME_KEY = 'model_name'
+
+# Project where query will be executed
+_COMPUTE_PROJECT_ID_KEY = 'compute_project_id'
Could you update the docstring (line 74, see below) to include a description of what this configuration is and what the default behaviour is, and possibly an example of how this configuration can be used when the user has two projects, per your use case.
...
def Do(self, input_dict: Dict[Text, List[types.Artifact]],
output_dict: Dict[Text, List[types.Artifact]],
exec_properties: Dict[Text, Any]):
"""Overrides the tfx_pusher_executor.
Args:
...
exec_properties: Mostly a passthrough input dict for
tfx.components.Pusher.executor. custom_config.bigquery_serving_args is
consumed by this class. For the full set of parameters supported by
Big Query ML, refer to https://cloud.google.com/bigquery-ml/
Could you also address the lint issues raised in presubmit.
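The docstring update the reviewer asks for could be illustrated with a small sketch. This is not the actual executor code: the resolve helper below is hypothetical, but its defaulting behaviour (fall back to the storage project when no compute project is given) follows from the key added in the diff above.

```python
# Hypothetical sketch of how the executor could resolve the two projects.
# Key names ('project_id', 'compute_project_id') mirror this PR's diff;
# the helper itself is an illustration, not TFX code.

def resolve_projects(bigquery_serving_args):
    """Returns (compute_project, storage_project) from serving args."""
    storage_project = bigquery_serving_args['project_id']
    # Default: queries run in the same project that stores the model table.
    compute_project = bigquery_serving_args.get(
        'compute_project_id', storage_project)
    return compute_project, storage_project

# Single-project setup: compute defaults to the storage project.
print(resolve_projects({'project_id': 'my-project'}))

# Split setup (the Twitter use case): queries execute in one project,
# the BQ ML model table lives in another.
print(resolve_projects({'project_id': 'storage-project',
                        'compute_project_id': 'compute-project'}))
```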
…b.com/codesue/tfx into codesue/specify-gcp-project-bq-pusher
The error
@SinaChavoshi or @1025KB, may I have another review of this? 😄
Don't worry too much about the lint; we can address it while importing.
…pusher PiperOrigin-RevId: 372169256
…elp dependency resolution in Colab. Specify GCP compute project in BigQuery Pusher executor. (#3703)
* Update RELEASE.md
* Update version.py
* Update version.py
* Reduce google-cloud-bigquery version range to help dependency resolution in Colab. With the old version of google-cloud-bigquery (==1.21, which is pre-installed in Colab), the pip resolver cannot find a proper set of dependencies. PiperOrigin-RevId: 372648960
* Merge pull request #3460 from codesue:codesue/specify-gcp-project-bq-pusher PiperOrigin-RevId: 372730386
Co-authored-by: jiyongjung <jiyongjung@google.com>
Co-authored-by: tensorflow-extended-team <tensorflow-extended-nonhuman@googlegroups.com>
Problem: We (Twitter) typically use two different GCP projects when working with BigQuery: one for compute (where the query is executed) and one for storage (where the table is located). With the current BigQuery Pusher executor, we have to use
bigquery_serving_args[_PROJECT_ID_KEY]
for both compute and storage, so our jobs fail due to permissions issues.
Proposed solution: Add support for specifying a compute project id that is different from the storage project id. It would be nice to have such a change backported to a 0.26.x version.
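To illustrate the proposed configuration: the 'compute_project_id' key comes from this PR's diff, while the surrounding project, dataset, and model values are hypothetical placeholders.

```python
# Hypothetical bigquery_serving_args a user could pass via the Pusher
# component's custom_config under the proposed change. Only
# 'compute_project_id' is new in this PR; all values here are made up.
bigquery_serving_args = {
    'model_name': 'my_model',
    'project_id': 'storage-project',          # where the BQ ML model table lives
    'bq_dataset_id': 'my_dataset',
    'compute_project_id': 'compute-project',  # where the query is executed
}

print(sorted(bigquery_serving_args))
```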