This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add demo for run_models experimental method.
PiperOrigin-RevId: 388982644
1 parent cb4108f, commit 4922b2b
Showing 2 changed files with 343 additions and 0 deletions.
341 changes: 341 additions & 0 deletions
...orials/experimental/running_vision_models_from_tf_model_garden_on_gcp_with_tf_cloud.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,341 @@ | ||
{ | ||
"nbformat": 4, | ||
"nbformat_minor": 0, | ||
"metadata": { | ||
"colab": { | ||
"name": "Running vision models from TF Model Garden on GCP with TF Cloud", | ||
"provenance": [], | ||
"collapsed_sections": [], | ||
"toc_visible": true | ||
}, | ||
"kernelspec": { | ||
"name": "python3", | ||
"display_name": "Python 3" | ||
}, | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "ApxORpbFShVP" | ||
}, | ||
"source": [ | ||
"##### Copyright 2021 The TensorFlow Cloud Authors." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "eR70XKMMmC8I", | ||
"cellView": "form" | ||
}, | ||
"source": [ | ||
"#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", | ||
"# you may not use this file except in compliance with the License.\n", | ||
"# You may obtain a copy of the License at\n", | ||
"#\n", | ||
"# https://www.apache.org/licenses/LICENSE-2.0\n", | ||
"#\n", | ||
"# Unless required by applicable law or agreed to in writing, software\n", | ||
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n", | ||
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", | ||
"# See the License for the specific language governing permissions and\n", | ||
"# limitations under the License." | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "wKcTRRxsAmDl" | ||
}, | ||
"source": [ | ||
"# Running vision models from TF Model Garden on GCP with TF Cloud\n", | ||
"\n", | ||
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n", | ||
" <td>\n", | ||
" <!-- <a target=\"_blank\" href=\"https://www.tensorflow.org/cloud/tutorials/overview.ipynb\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a> MISSING HREF -->\n", | ||
" </td>\n", | ||
" <td>\n", | ||
" <!-- <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/cloud/blob/master/g3doc/tutorials/overview.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a> MISSING HREF -->\n", | ||
" </td>\n", | ||
" <td>\n", | ||
" <!-- <a target=\"_blank\" href=\"https://github.com/tensorflow/cloud/blob/master/g3doc/tutorials/overview.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View on GitHub</a> MISSING HREF -->\n", | ||
" </td>\n", | ||
" <td>\n", | ||
" <!-- <a href=\"https://storage.googleapis.com/tensorflow_docs/cloud/tutorials/overview.ipynb\"><img src=\"https://www.tensorflow.org/images/download_logo_32px.png\" />Download notebook</a> MISSING HREF -->\n", | ||
" </td>\n", | ||
" <td>\n", | ||
" <!-- <a href=\"https://kaggle.com/kernels/welcome?src=https://github.com/tensorflow/cloud/blob/master/g3doc/tutorials/overview.ipynb\" target=\"_blank\"> <img width=\"90\" src=\"https://www.kaggle.com/static/images/site-logo.png\" alt=\"Kaggle logo\" />Run in Kaggle</a> MISSING HREF -->\n", | ||
" </td>\n", | ||
"</table>" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "FAUbwFuJB3bw" | ||
}, | ||
"source": [ | ||
"In this example we will use [run_models](https://github.com/tensorflow/cloud/blob/690c3eee65dadee8af260a19341ff23f42f1f070/src/python/tensorflow_cloud/core/experimental/models.py#L30) from the experimental module of TF Cloud to train a ResNet model from [TF Model Garden](https://github.com/tensorflow/models/tree/master/official) on [imagenette from TFDS](https://www.tensorflow.org/datasets/catalog/imagenette)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "EFCSAVDbC8-W" | ||
}, | ||
"source": [ | ||
"## Install Packages\n", | ||
"\n", | ||
"We need a development build of tensorflow-cloud installed from GitHub (the command below pins pull request 352), the official release of tf-models-official, and keras 2.6.0rc0 for compatibility." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "r4sSs1azu-Ti" | ||
}, | ||
"source": [ | ||
"!pip install -q 'git+https://github.com/tensorflow/cloud.git@refs/pull/352/head#egg=tensorflow-cloud&subdirectory=src/python' tf-models-official keras==2.6.0rc0" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "N3NC5vrDslsf" | ||
}, | ||
"source": [ | ||
"## Import required modules" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "sdkgm_6PvHkk", | ||
"colab": { | ||
"base_uri": "https://localhost:8080/" | ||
}, | ||
"outputId": "c17384b4-07f1-493c-edf8-5eadde79524f" | ||
}, | ||
"source": [ | ||
"import os\n", | ||
"import sys\n", | ||
"\n", | ||
"import tensorflow_cloud as tfc\n", | ||
"from tensorflow_cloud.core.experimental.models import run_models\n", | ||
"\n", | ||
"print(tfc.__version__)" | ||
], | ||
"execution_count": 2, | ||
"outputs": [ | ||
{ | ||
"output_type": "stream", | ||
"text": [ | ||
"0.1.17.dev\n" | ||
], | ||
"name": "stdout" | ||
} | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "Ka6MHtF-tTU1" | ||
}, | ||
"source": [ | ||
"## Project Configurations\n", | ||
"Set the project parameters. For more details on the Google Cloud specific parameters, refer to [Google Cloud Project Setup Instructions](https://www.kaggle.com/nitric/google-cloud-project-setup-instructions/)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "OFPPSLF9vx4H" | ||
}, | ||
"source": [ | ||
"# Set Google Cloud Specific parameters\n", | ||
"\n", | ||
"# TODO: Please set GCP_PROJECT_ID to your own Google Cloud project ID.\n", | ||
"GCP_PROJECT_ID = 'YOUR_PROJECT_ID' #@param {type:\"string\"}\n", | ||
"\n", | ||
"# TODO: set GCS_BUCKET to your own Google Cloud Storage (GCS) bucket.\n", | ||
"GCS_BUCKET = 'YOUR_GCS_BUCKET_NAME' #@param {type:\"string\"}\n", | ||
"\n", | ||
"# DO NOT CHANGE: Currently only the 'us-central1' region is supported.\n", | ||
"REGION = 'us-central1'\n", | ||
"\n", | ||
"# OPTIONAL: You can change the job name to any string.\n", | ||
"JOB_NAME = 'run_models_demo' #@param {type:\"string\"}" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "F1_shlH4tUM5" | ||
}, | ||
"source": [ | ||
"## Authenticating the notebook to use your Google Cloud Project\n", | ||
"\n", | ||
"This code authenticates the notebook by checking that your Google Cloud credentials and identity are valid. It is inside the `if not tfc.remote()` block so that it runs only in the notebook and is skipped when the notebook code is sent to Google Cloud.\n", | ||
"\n", | ||
"Note: For Kaggle Notebooks click on \"Add-ons\"->\"Google Cloud SDK\" before running the cell below." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "EeW7IHBgtPJD" | ||
}, | ||
"source": [ | ||
"if not tfc.remote():\n", | ||
"\n", | ||
" # Authentication for Kaggle Notebooks\n", | ||
" if \"kaggle_secrets\" in sys.modules:\n", | ||
" from kaggle_secrets import UserSecretsClient\n", | ||
" UserSecretsClient().set_gcloud_credentials(project=GCP_PROJECT_ID)\n", | ||
"\n", | ||
" # Authentication for Colab Notebooks\n", | ||
" if \"google.colab\" in sys.modules:\n", | ||
" from google.colab import auth\n", | ||
" auth.authenticate_user()\n", | ||
" os.environ[\"GOOGLE_CLOUD_PROJECT\"] = GCP_PROJECT_ID" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "EQrVntO2twh1" | ||
}, | ||
"source": [ | ||
"## Set up TensorFlow Cloud run\n", | ||
"\n", | ||
"Set up parameters for `tfc.run()`. The `chief_config`, `worker_count` and `worker_config` will be set up individually for each distribution strategy. For more details, refer to the [TensorFlow Cloud overview tutorial](https://colab.research.google.com/github/tensorflow/cloud/blob/master/g3doc/tutorials/overview.ipynb)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "o539iLTKv9a3" | ||
}, | ||
"source": [ | ||
"with open('requirements.txt','w') as f:\n", | ||
" f.write('git+https://github.com/tensorflow/cloud.git@refs/pull/352/head#egg=tensorflow-cloud&subdirectory=src/python\\n'+\n", | ||
" 'tf-models-official\\n'+\n", | ||
" 'keras==2.6.0rc0')\n", | ||
"\n", | ||
"run_kwargs = dict(\n", | ||
" requirements_txt = 'requirements.txt',\n", | ||
" docker_config=tfc.DockerConfig(\n", | ||
" parent_image=\"gcr.io/deeplearning-platform-release/tf2-gpu.2-5\",\n", | ||
" image_build_bucket=GCS_BUCKET\n", | ||
" ),\n", | ||
" chief_config=tfc.COMMON_MACHINE_CONFIGS[\"P100_4X\"],\n", | ||
" job_labels={'job': JOB_NAME}\n", | ||
")" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "hd4luG7nt3_0" | ||
}, | ||
"source": [ | ||
"## Run the training using run_models" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "_aVt71qpxHUe" | ||
}, | ||
"source": [ | ||
"values = run_models(\n", | ||
" 'imagenette',\n", | ||
" 'resnet',\n", | ||
" GCS_BUCKET,\n", | ||
" 'train',\n", | ||
" 'validation',\n", | ||
" **run_kwargs,\n", | ||
" )" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "Ku7oBH8iuc2X" | ||
}, | ||
"source": [ | ||
"## Training Results\n", | ||
"\n", | ||
"### Reconnect your Colab instance\n", | ||
"\n", | ||
"Most remote training jobs are long-running. If you are using Colab, it may time out before the training results are available.\n", | ||
"\n", | ||
"In that case, **rerun the following sections in order** to reconnect and configure your Colab instance to access the training results.\n", | ||
"\n", | ||
"1. Import required modules\n", | ||
"2. Project Configurations\n", | ||
"3. Authenticating the notebook to use your Google Cloud Project\n", | ||
"\n", | ||
"**DO NOT** rerun the rest of the code.\n", | ||
"\n", | ||
"### Load TensorBoard\n", | ||
"While the training is in progress, you can use TensorBoard to view the results. Note that the results appear only after your training has started, which may take a few minutes." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "rhVCh8x9upRY" | ||
}, | ||
"source": [ | ||
"if not tfc.remote():\n", | ||
" %load_ext tensorboard\n", | ||
" tensorboard_logs_dir = values['tensorboard_logs']\n", | ||
" %tensorboard --logdir $tensorboard_logs_dir" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"id": "kOU5Gu4Ku1Qc" | ||
}, | ||
"source": [ | ||
"### Load your trained model\n", | ||
"\n", | ||
"Once training is complete, you can retrieve your model from the GCS bucket you specified above." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"metadata": { | ||
"id": "rHoQnqKhu2Y8" | ||
}, | ||
"source": [ | ||
"import tensorflow as tf\n", | ||
"if not tfc.remote():\n", | ||
" trained_model = tf.keras.models.load_model(values['saved_model'])\n", | ||
" trained_model.summary()" | ||
], | ||
"execution_count": null, | ||
"outputs": [] | ||
} | ||
] | ||
} |