
(re)enable torch.compile in the pytorch trainer for train, predict, and eval #18569

Merged 1 commit into keras-team:master on Oct 15, 2023

Conversation

@kiukchung (Contributor) commented on Oct 6, 2023

Change Summary

  1. Updates torch to torch>=2.1.0 (which includes many improvements to dynamo).
  2. Wraps the underlying step function with torch.compile when jit_compile=True for train, eval, and predict (see the sketch after this list).
  3. Updates the model.fit() docs to explain that jit_compile="auto" defaults to eager for the torch backend (torch.compile only kicks in when the user explicitly sets jit_compile=True).
  4. Adds a setUp() that calls clear_session() in testing.TestCase (required for dynamo).
  5. Fixes a few functions to make the codebase dynamo-friendly.
  6. Fixes an incorrect assertion in naming_test.py:test_uniquify_already_uniquified_name().
  7. Uses jit_compile="auto" (rather than jit_compile=True) in keras.testing.test_case.TestCase.run_layer_test.run_training_step() so that each backend is tested in its default jitted mode (jit for tf and jax, eager for torch).
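
A minimal sketch of items 2 and 3, assuming a helper along these lines inside the torch trainer (the names `_maybe_compile` and `step_fn` are illustrative, not the actual Keras internals):

```python
import torch

def _maybe_compile(step_fn, jit_compile):
    """Illustrative helper: wrap a train/eval/predict step with torch.compile."""
    if jit_compile == "auto":
        # On the torch backend, "auto" resolves to eager execution;
        # torch.compile only kicks in when the user passes jit_compile=True.
        jit_compile = False
    if jit_compile:
        # torch.compile returns a callable that TorchDynamo traces and
        # optimizes on first invocation (this PR pins torch>=2.1.0).
        return torch.compile(step_fn)
    return step_fn
```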

Note On Dynamo

Currently there are three caveats to running the torch backend with jit_compile=True:

  1. (performance) It is slower than eager because of too many graph breaks. This is mainly due to the use of tree inside any_symbolic_tensors() (dynamo will not trace through tree, see skipfiles), which in turn is called by pretty much every op (e.g. numpy, layer, activation, etc.). As a result, no "deep graph" can be captured, and hence there is no opportunity for optimizations such as op fusion. This can be fixed by not using tree.flatten in any_symbolic_tensors().

  2. (overhead) torch.core.convert_to_tensor needs to be simplified to a single torch.as_tensor(x, dtype, device) call rather than using x.to(dtype, device) (see the sketch after this list). This won't make things compile better, but it reduces frame-evaluation overhead, since convert_to_tensor is called for every op and tracing through many branches is less than ideal.

  3. (compatibility) There are cases where primitive operators can be traced by dynamo individually, but composing a sequence of them into a higher-order operator such as a layer (e.g. up_sampling_2d) causes guard failures on the primitive ops. Dynamo then traces with dynamic shapes via symbolic variables rather than concretized values, which often leads to tracing failures due to "missing methods".
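
A minimal sketch of the simplification described in caveat (2); the signature mirrors the text above and is an assumption, not the exact Keras code:

```python
import torch

def convert_to_tensor(x, dtype=None, device=None):
    # A single torch.as_tensor call avoids copying when `x` is already a
    # tensor with a compatible dtype/device, and keeps the frame that Dynamo
    # traces small compared to branching on `x.to(dtype, device)`.
    return torch.as_tensor(x, dtype=dtype, device=device)
```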

Testing

CI for unit tests.

Manual testing on examples/keras_io/vision/mnist_convnet.py by explicitly enabling jit_compile (see the sketch after the observations below).
Observations:

  1. No significant speedup.
  2. The first one or two epochs are slow due to (re)compilation.
  3. Later epochs are still slower: 19 ms (eager) vs 28 ms (compiled) on CPU; not yet tried on GPU.
  4. (3) is mostly due to recompilation / graph breaks, since some functions (e.g. convert_to_tensor) are highly dynamic in their input types (Python types).
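
For reference, opting in during the manual test amounts to passing jit_compile=True at compile time; the model below is a placeholder rather than the actual mnist_convnet architecture:

```python
import keras

# Placeholder model standing in for examples/keras_io/vision/mnist_convnet.py;
# the only point here is the jit_compile flag, which defaults to "auto"
# (eager on the torch backend).
model = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ]
)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    jit_compile=True,  # explicitly request torch.compile on the torch backend
)
```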

@codecov-commenter commented on Oct 6, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (d026dfd) 78.11% compared to head (d65c494) 78.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18569      +/-   ##
==========================================
+ Coverage   78.11%   78.30%   +0.18%     
==========================================
  Files         334      334              
  Lines       32477    32484       +7     
  Branches     6339     6342       +3     
==========================================
+ Hits        25371    25438      +67     
+ Misses       5539     5482      -57     
+ Partials     1567     1564       -3     
Flag               Coverage Δ
keras              78.19% <91.30%> (+0.17%) ⬆️
keras-jax          63.58% <34.78%> (+0.16%) ⬆️
keras-numpy        57.94% <30.43%> (+0.13%) ⬆️
keras-tensorflow   64.42% <34.78%> (+0.11%) ⬆️
keras-torch        65.31% <91.30%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown.

Files                                      Coverage Δ
keras/backend/common/global_state.py      96.29% <100.00%> (+0.46%) ⬆️
keras/backend/torch/core.py               91.45% <100.00%> (ø)
keras/backend/torch/numpy.py              95.73% <100.00%> (+0.01%) ⬆️
keras/backend/torch/random.py             91.30% <100.00%> (+0.09%) ⬆️
keras/backend/torch/trainer.py            90.04% <100.00%> (+1.01%) ⬆️
keras/layers/reshaping/flatten.py         100.00% <100.00%> (ø)
keras/layers/reshaping/up_sampling2d.py   95.45% <100.00%> (-0.30%) ⬇️
keras/testing/test_case.py                86.20% <100.00%> (+0.18%) ⬆️
keras/trainers/epoch_iterator.py          90.74% <ø> (ø)
keras/trainers/trainer.py                 84.73% <100.00%> (-0.24%) ⬇️

... and 5 files with indirect coverage changes


@fchollet (Member) left a comment

Thanks for the PR!

Review comments (outdated, resolved) on: keras/backend/torch/core.py, keras/backend/torch/trainer.py, keras/trainers/trainer_test.py
@kiukchung force-pushed the master branch 19 times, most recently from 91ddb48 to e166beb on October 14, 2023
@kiukchung marked this pull request as ready for review on October 14, 2023
@fchollet (Member) commented:

> which is mainly due to the usage of tree in the function (dynamo will not trace through tree, see skipfiles)

This is actually very fixable. Instead of using tree we can use e.g. keras.utils.tree, and in those functions we can route between the actual tree functions (which are C++ based) or a simple pure Python implementation (which would be Dynamo compatible) depending on whether we're in a Dynamo context. What do you think?

The reason tree is not Dynamo compatible is presumably because it isn't Python based (for performance reasons -- which is good when in eager execution).
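
A rough sketch of that routing idea, shown here with a flatten() wrapper; the Dynamo-detection helper and the exact shape of keras.utils.tree are assumptions, only the dispatch pattern is the point:

```python
import tree  # dm-tree: the C++-backed structure utilities

def _tracing_with_dynamo():
    # Assumed detection helper; torch exposes an internal flag for this,
    # but the exact API is a torch implementation detail.
    try:
        import torch._dynamo as dynamo
        return dynamo.is_compiling()
    except (ImportError, AttributeError):
        return False

def flatten(structure):
    """Flatten a nested structure, staying Dynamo-traceable when compiling."""
    if _tracing_with_dynamo():
        # Pure-Python fallback that Dynamo can trace through.
        if isinstance(structure, (list, tuple)):
            flat = []
            for item in structure:
                flat.extend(flatten(item))
            return flat
        if isinstance(structure, dict):
            flat = []
            for key in sorted(structure):
                flat.extend(flatten(structure[key]))
            return flat
        return [structure]
    # Eager path: keep the fast C++ implementation.
    return tree.flatten(structure)
```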

@fchollet (Member) left a comment

LGTM -- I think doing the tree conversion would likely unlock the performance benefits here.

@google-ml-butler bot added the kokoro:force-run and ready-to-pull (Ready to be merged into the codebase) labels on Oct 14, 2023
@@ -2,8 +2,8 @@
 tf-nightly==2.15.0.dev20231009 # Pin a working nightly until rc0.

 # Torch.
-torch>=2.0.1
-torchvision>=0.15.1
+torch>=2.1.0
Collaborator left a comment:

@grasskin - FYI. I remember Gabriel wanting to keep the requirements at torch 2.0.1, so I wanted him to take a look or be in the loop.

@kiukchung (Contributor, Author) replied:

Thanks @sampathweb, @grasskin. Let me know if we have a good reason to stay at 2.0.1; I'd like to update to 2.1 if possible since it has a bunch of fixes (especially to torch.compile).

@kiukchung (Contributor, Author) commented:

> which is mainly due to the usage of tree in the function (dynamo will not trace through tree, see skipfiles)
>
> This is actually very fixable. Instead of using tree we can use e.g. keras.utils.tree, and in those functions we can route between the actual tree functions (which are C++ based) or a simple pure Python implementation (which would be Dynamo compatible) depending on whether we're in a Dynamo context. What do you think?
>
> The reason tree is not Dynamo compatible is presumably because it isn't Python based (for performance reasons -- which is good when in eager execution).

Yep I created an issue for this (#18614). I can do this in a fast-follow PR since this one is getting big and the torch backend defaults to eager right now.

@google-ml-butler bot removed the ready-to-pull (Ready to be merged into the codebase) label on Oct 15, 2023
@fchollet (Member) commented:
Happy to merge this now since CI is passing and we can do the rest in future PRs. If the updated torch version is an issue we can revert that part later.

@fchollet merged commit 1c0d997 into keras-team:master on Oct 15, 2023
6 checks passed