Add Flax training example #4978

Merged
merged 10 commits into from
Aug 8, 2023

Conversation

awolant
Contributor

@awolant awolant commented Aug 4, 2023

Category:

New feature

Description:

New tutorial showing how to train a neural network implemented in Flax with DALI. It builds on the JAX training tutorial - significant parts are the same.

Additional information:

Affected modules and functionalities:

JAX documentation.

Key points relevant for the review:

Is it understandable? Spelling, grammar?

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3564

Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Contributor Author

awolant commented Aug 4, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [9239937]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [9239937]: BUILD FAILED

"\n",
"This simple example shows how to train a neural network implemented in Flax with DALI pipelines. If you want to learn more about training neural networks with Flax look into [Flax Getting Started](https://flax.readthedocs.io/en/latest/getting_started.html) example.\n",
"\n",
"DALI setup is very similar to the [training example with pure JAX](jax-basic_example.ipynb). The only difference is the addition of a trailing dimention to the returned image to make it compatible with Flax convolutions. If you are familiar with how to use DALI with JAX you can skip this part and move to the training section of this notebook.\n",
Collaborator

Suggested change
"DALI setup is very similar to the [training example with pure JAX](jax-basic_example.ipynb). The only difference is the addition of a trailing dimention to the returned image to make it compatible with Flax convolutions. If you are familiar with how to use DALI with JAX you can skip this part and move to the training section of this notebook.\n",
"DALI setup is very similar to the [training example with pure JAX](jax-basic_example.ipynb). The only difference is the addition of a trailing dimension to the returned image to make it compatible with Flax convolutions. If you are familiar with how to use DALI with JAX, you can skip this part and move to the training section of this notebook.\n",

Contributor Author

Done
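
As an illustration of the trailing dimension discussed in this thread, here is a minimal NumPy sketch. The batch shape `(4, 28, 28)` and the variable names are assumptions made for the example; this thread does not show how the notebook itself performs the step.

```python
import numpy as np

# Hypothetical batch of 4 grayscale 28x28 images, shape (batch, height, width).
batch = np.zeros((4, 28, 28), dtype=np.float32)

# Flax convolutions expect channels-last input: (batch, height, width, channels).
# Adding a trailing axis of size 1 gives each image a single channel.
batch_with_channel = np.expand_dims(batch, axis=-1)

print(batch_with_channel.shape)  # (4, 28, 28, 1)
```

The data is unchanged; only the shape gains a final axis, so a single-channel image can be passed to a convolution that expects a feature dimension.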

"**Here is a quick explnation of how these parameters work:**\n",
"\n",
" - `output_map`: iterators return a dictionary with outputs of the pipeline as its values. Keys in this dictionary are defined by `output_map`. For example, `labels` output returned from the DALI pipeline defined above will be accessible as `iterator_output['labels']`,\n",
" - `reader_name`: setting this parameter introduces the notion of an epoch to our iterator. DALI pipeline itself is infinite, it will return the data indefinately, wrapping around the dataset. DALI readers (such as `fn.readers.caffe2` used in this example) have access to the information about the size of the dataset. If we want to pass this information to the iterator we need to point to the operator that should be queried for the dataset size. We do it by naming the operator (note `name=\"mnist_caffe2_reader\"`) and passing the same name as the value for `reader_name` argument,\n",
Collaborator

Suggested change
" - `reader_name`: setting this parameter introduces the notion of an epoch to our iterator. DALI pipeline itself is infinite, it will return the data indefinately, wrapping around the dataset. DALI readers (such as `fn.readers.caffe2` used in this example) have access to the information about the size of the dataset. If we want to pass this information to the iterator we need to point to the operator that should be queried for the dataset size. We do it by naming the operator (note `name=\"mnist_caffe2_reader\"`) and passing the same name as the value for `reader_name` argument,\n",
" - `reader_name`: setting this parameter introduces the notion of an epoch to our iterator. DALI pipeline itself is infinite, it will return the data indefinately, wrapping around the dataset. DALI readers (such as `fn.readers.caffe2` used in this example) have access to the information about the size of the dataset. If we want to pass this information to the iterator, we need to point to the operator that should be queried for the dataset size. We do it by naming the operator (note `name=\"mnist_caffe2_reader\"`) and passing the same name as the value for `reader_name` argument,\n",

Contributor Author

Done

"\n",
"Now we need to setup model and training utilities. The goal of this notebook is not to explain Flax concepts. We want to show how to train models implemented in Flax with DALI as a data loading and preprocessing library. We used standard Flax tools do define simple neural network. We have functions to create an instance of this network, run one training step on it and calculate accuracy of the model at the end of each epoch.\n",
"\n",
"If you want to learn more about Flax and get better understanding of the code below look into [Flax Documentation](https://flax.readthedocs.io/en/latest/)."
Collaborator

Suggested change
"If you want to learn more about Flax and get better understanding of the code below look into [Flax Documentation](https://flax.readthedocs.io/en/latest/)."
"If you want to learn more about Flax and get better understanding of the code below, look into [Flax Documentation](https://flax.readthedocs.io/en/latest/)."

Contributor Author

Done
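
The notebook text quoted in this thread mentions calculating the accuracy of the model at the end of each epoch. A minimal sketch of such a metric is shown below, written in plain NumPy rather than Flax so that it stays self-contained. The function name, the logits, and the labels are all hypothetical and are not taken from the notebook.

```python
import numpy as np


def accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    predictions = np.argmax(logits, axis=-1)
    return float(np.mean(predictions == labels))


# Hypothetical logits for 4 samples and 3 classes.
logits = np.array([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.1, 0.4, 0.2],
])
labels = np.array([0, 1, 2, 1])

print(accuracy(logits, labels))  # 0.75
```

In an actual training loop, the same computation would typically be applied to the model outputs for each validation batch and averaged over the epoch.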

"cell_type": "markdown",
"metadata": {},
"source": [
"At this point everything is ready to run the training."
Collaborator

Suggested change
"At this point everything is ready to run the training."
"At this point, everything is ready to run the training."

Contributor Author

Done

"cell_type": "markdown",
"metadata": {},
"source": [
"With the setup above DALI iterators are ready for the training. \n",
Collaborator

Suggested change
"With the setup above DALI iterators are ready for the training. \n",
"With the setup above, DALI iterators are ready for the training. \n",

Contributor Author

Done

"source": [
"# Training neural network with DALI and Flax\n",
"\n",
"This simple example shows how to train a neural network implemented in Flax with DALI pipelines. If you want to learn more about training neural networks with Flax look into [Flax Getting Started](https://flax.readthedocs.io/en/latest/getting_started.html) example.\n",
Collaborator

Suggested change
"This simple example shows how to train a neural network implemented in Flax with DALI pipelines. If you want to learn more about training neural networks with Flax look into [Flax Getting Started](https://flax.readthedocs.io/en/latest/getting_started.html) example.\n",
"This simple example shows how to train a neural network implemented in Flax with DALI pipelines. If you want to learn more about training neural networks with Flax, look into [Flax Getting Started](https://flax.readthedocs.io/en/latest/getting_started.html) example.\n",

Contributor Author

Done

"\n",
" - `output_map`: iterators return a dictionary with outputs of the pipeline as its values. Keys in this dictionary are defined by `output_map`. For example, `labels` output returned from the DALI pipeline defined above will be accessible as `iterator_output['labels']`,\n",
" - `reader_name`: setting this parameter introduces the notion of an epoch to our iterator. DALI pipeline itself is infinite, it will return the data indefinately, wrapping around the dataset. DALI readers (such as `fn.readers.caffe2` used in this example) have access to the information about the size of the dataset. If we want to pass this information to the iterator we need to point to the operator that should be queried for the dataset size. We do it by naming the operator (note `name=\"mnist_caffe2_reader\"`) and passing the same name as the value for `reader_name` argument,\n",
" - `auto_reset`: this argument controls the behaviour of the iterator after the end of an epoch. If set to `True` will automatically reset the state of the iterator and prepare it to start the next epoch."
Collaborator

Suggested change
" - `auto_reset`: this argument controls the behaviour of the iterator after the end of an epoch. If set to `True` will automatically reset the state of the iterator and prepare it to start the next epoch."
" - `auto_reset`: this argument controls the behaviour of the iterator after the end of an epoch. If set to `True`, it will automatically reset the state of the iterator and prepare it to start the next epoch."

Contributor Author

Done
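
The three parameters described in the quoted notebook text (`output_map`, `reader_name`, `auto_reset`) can be illustrated with a toy model of their behaviour. The sketch below is not DALI code: `ToyEpochIterator` and its `epoch_size` argument are hypothetical stand-ins, with `epoch_size` playing the role of the dataset size that a named reader would report.

```python
from itertools import cycle


class ToyEpochIterator:
    """Hypothetical stand-in that mimics the described behaviour; not DALI code."""

    def __init__(self, samples, output_map, epoch_size, auto_reset=False):
        self._source = cycle(samples)    # "infinite pipeline": wraps around the dataset
        self._output_map = output_map    # keys of the returned dictionary
        self._epoch_size = epoch_size    # what a named reader would report
        self._auto_reset = auto_reset
        self._served = 0

    def reset(self):
        self._served = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._served >= self._epoch_size:
            if self._auto_reset:
                self.reset()             # ready for the next epoch without user action
            raise StopIteration
        self._served += 1
        return dict(zip(self._output_map, next(self._source)))


samples = [("img0", 0), ("img1", 1), ("img2", 2)]
it = ToyEpochIterator(samples, ["images", "labels"], epoch_size=3, auto_reset=True)

first_epoch = [out["labels"] for out in it]
second_epoch = [out["labels"] for out in it]
print(first_epoch, second_epoch)  # [0, 1, 2] [0, 1, 2]
```

Without `auto_reset=True`, the second loop in this toy model would yield nothing until `reset()` is called explicitly.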

"cell_type": "markdown",
"metadata": {},
"source": [
"With utilities defined above we can create an instance of the model we want to train."
Collaborator

Suggested change
"With utilities defined above we can create an instance of the model we want to train."
"With utilities defined above, we can create an instance of the model we want to train."

Contributor Author

Done

Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Contributor Author

awolant commented Aug 7, 2023

Some of the comments were also applicable to the basic JAX tutorial since it shares some content with this one. I applied them there as well.

@dali-automaton
Collaborator

CI MESSAGE: [9239937]: BUILD PASSED

@@ -0,0 +1,368 @@
{
Member

@szalpal szalpal Aug 7, 2023

When I clicked the "training example with pure JAX" link, it redirected me to the raw output of the Jupyter notebook. Is this intentional? Or maybe the reviewing tool fails here?


Contributor Author

Yes, the review tool can't handle these links. I checked them in the docs build.

@@ -0,0 +1,368 @@
{
Member

@szalpal szalpal Aug 7, 2023

In this section, the "Getting started" link redirected me to the raw Jupyter notebook and the "pipeline documentation" link to the raw rst file. Could you double-check these?


Contributor Author

Yes, the review tool can't handle these links. I checked them in the docs build.

@@ -0,0 +1,368 @@
{
Member

@szalpal szalpal Aug 7, 2023

Maybe you could also explain above why seed=0 is set?


Contributor Author

@awolant awolant Aug 7, 2023

This is described in the tutorials that are linked just above, so I am going to leave it as is to keep the notebook as lean as possible and focused only on JAX+DALI. Hope that's OK?

@@ -0,0 +1,368 @@
{
Member

@szalpal szalpal Aug 7, 2023

Suggestion:

"Next step is to instantiate pipelines and build them." -> "Next step is to instantiate DALI pipelines and build them."


Contributor Author

Done

@@ -0,0 +1,368 @@
{
Member

@szalpal szalpal Aug 7, 2023

Suggestions:

"For DALI pipeline to work with JAX it needs to be wrapped with appropriate DALI iterator." -> "For DALI pipeline to work with JAX, the former needs to be wrapped with an appropriate DALI iterator."

"In addition to the pipeline we can pass the" -> "In addition to the DALI pipeline object we can pass the"

"Here is a quick explnation of how these parameters work:" - maybe it would be good to put a link to the full documentation somewhere here?


Contributor Author

Done

Member

@szalpal szalpal left a comment

Please double-check the links, because they did not work correctly in ReviewNB (but it might be a problem with the review tool).

Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Contributor Author

awolant commented Aug 7, 2023

!build

@dali-automaton
Collaborator

CI MESSAGE: [9267807]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [9267807]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [9267807]: BUILD PASSED

@awolant awolant merged commit 7539566 into NVIDIA:main Aug 8, 2023
5 checks passed
JanuszL pushed a commit to JanuszL/DALI that referenced this pull request Oct 13, 2023
New tutorial showing how to train a neural network implemented in Flax with DALI. It builds on the JAX training tutorial - significant parts are the same.

Signed-off-by: Albert Wolant <awolant@nvidia.com>
4 participants