Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework Flax and Paxml training tutorials #5205

Merged
merged 12 commits into from
Dec 1, 2023

Conversation

awolant
Copy link
Contributor

@awolant awolant commented Nov 29, 2023

Category:

Refactoring

Description:

Rework training examples for Flax and Paxml to use data_iterator.

Additional information:

Flax and Paxml training examples were adjusted to the new API with data_iterator.

Affected modules and functionalities:

Flax and Paxml training examples.

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-3711

Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -9,14 +9,14 @@
"\n",
Copy link
Contributor

@mzient mzient Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(...) create an iterator (...)

Also, I'm not sure that listing of all the operators adds much here.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I left the list for now.

@@ -9,14 +9,14 @@
"\n",
Copy link
Contributor

@mzient mzient Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #27.            labels = fn.one_hot(labels, num_classes=num_classes)

How does one_hot encoding depend on random shuffling? Is that a bug or exploiting some coincidence?

If it's about training vs validation, then perhaps is_training parameter would be less confusing.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -9,14 +9,14 @@
"\n",
Copy link
Contributor

@mzient mzient Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(...) creating ~a~an iterator definition function. It is a slightly modified (...)

Simpler / less verbose version:

Note tThe new arguments passed to fn.readers.caffe2:, (comma instead of a colon) num_shards and shard_id~. They are used to~, (comma) control sharding:

We add devices argument to the decorator to indicate to the iterator specify which devices we want to use. (...)


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with some changes.

@@ -5,16 +5,16 @@
"cell_type": "markdown",
Copy link
Contributor

@mzient mzient Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It produces a log compatible with tensorboard under in /tmp/dali_pax_logs/. To read this log in console we create a helper function that prints training accuracy from the logs. We use a helper function that reads training accuracy from the logs and prints it in the terminal.


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
@awolant
Copy link
Contributor Author

awolant commented Dec 1, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [11169573]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [11169573]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [11169573]: BUILD PASSED

@awolant awolant merged commit ac92c6f into NVIDIA:main Dec 1, 2023
5 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants