Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make conditionals work in debug mode #4738

Merged
merged 24 commits into from
Apr 7, 2023

Conversation

klecki
Copy link
Contributor

@klecki klecki commented Mar 23, 2023

Category: New feature

Description:

Enable using conditionals and Split/Merge in Debug Mode.
Keep batch size tracking when feasible, skip some checks when the tracking is not possible.
TODO:

  • Add more tests - possibly convert all conditionals tests into debug mode.

Additional information:

Affected modules and functionalities:

Debug Mode, Conditionals

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
- [ ] GTests - [ ] Benchmark - [ ] Other - [ ] N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@@ -312,6 +313,21 @@ def track_merge(self, split_predicate):
self.no_branch()
self.top().add_produced(split_predicate)

def scope_batch_size_tracker(self):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
Comment on lines 281 to 282
assert (batch is None or isinstance(batch, DataNodeDebug),
"Conditionals in debug mode work only with DataNodeDebug")

Check failure

Code scanning / CodeQL

Asserting a tuple

Assertion of non-empty tuple is always True.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see namedtuple imported in this line.

Comment on lines 27 to 32
from conditionals.test_pipeline_conditionals import (pred_gens, impl_test_against_split_merge,
impl_test_dot_gpu,
impl_test_arg_inputs_scoped_tracking,
impl_test_arg_inputs_scoped_tracking,
impl_test_arg_inputs_scoped_uninitialized,
impl_test_generators, impl_test_uninitialized)

Check notice

Code scanning / CodeQL

Unused import

Import of 'pred_gens' is not used.
@klecki
Copy link
Contributor Author

klecki commented Mar 24, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7704282]: BUILD STARTED

Comment on lines 760 to 761
# for base_debug, conditional_debug in [(True, False), (False, True), (True, True)]:
# impl_test_generators(pred, {'debug': base_debug}, {'debug': conditional_debug})

Check notice

Code scanning / CodeQL

Commented-out code

This comment appears to contain commented-out code.
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7704282]: BUILD PASSED

@klecki
Copy link
Contributor Author

klecki commented Mar 27, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7724827]: BUILD STARTED

@klecki
Copy link
Contributor Author

klecki commented Mar 27, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7725236]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7724827]: BUILD FAILED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7725236]: BUILD PASSED

Comment on lines 53 to 55
transferred_node = DataNodeDebug(self._data._as_gpu(), self_split.name, "gpu",
self_split.source)
_conditionals.register_data_nodes(transferred_node, [self])
Copy link
Contributor

@mzient mzient Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: is it really always transferred? If the node is already on GPU, it's a no-op, so perhaps calling it a gpu_node would be more accurate (see the special case below - it seems like it isn't handled here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a typo as well, I should copy the split node, adjusted + a shortcut.

@klecki
Copy link
Contributor Author

klecki commented Mar 29, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7748825]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7748825]: BUILD PASSED

Comment on lines 521 to 523
op = fn.constant(device=device, fdata=fdata, idata=idata, shape=shape, dtype=dtype,
layout=layout, **kwargs)
return op
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
You can take it one step further:

Suggested change
op = fn.constant(device=device, fdata=fdata, idata=idata, shape=shape, dtype=dtype,
layout=layout, **kwargs)
return op
return fn.constant(device=device, fdata=fdata, idata=idata, shape=shape, dtype=dtype,
layout=layout, **kwargs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -505,7 +505,7 @@ def _type_from_value_or_list(v):
if dtype is None:
dtype = actual_type

import nvidia.dali.ops as ops
import nvidia.dali.fn as fn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@klecki klecki added the conditional execution Questions related to conditional execution and using `if` statements in DALI label Mar 30, 2023
Needs a lot of cleanup out of debug info.
Keep things wrapped in DataNodes.
Consider the tracking of batch sizes?
We can keep track of bs in scopes maybe, but with split/merge only
it is not possible.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki added 10 commits April 3, 2023 17:15
This is a bit too much workarounds for my taste

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
The test is not feasible with the asserts, as the debug mode generates
diffenrent naming

Generator test is mismatching batch sizes for some splits,
so there is still something wrong

Generating nodes outside pipeline won't work with current
implementation.

TODO: check if there can be naming collision if we use the same op
several times

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki klecki force-pushed the conditionals-in-debug-mode branch from 51fa7b3 to 004592f Compare April 3, 2023 15:17
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
test_data_root = get_dali_extra_path()
caffe_db_folder = os.path.join(test_data_root, 'db', 'lmdb')
bs = 10
kwargs = {"batch_size": bs, "num_threads": 4, "device_id": 0}

@experimental.pipeline_def(enable_conditionals=True, **kwargs)
@experimental.pipeline_def(enable_conditionals=True, **kwargs, **additional_additional_kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why additional_additional_kwargs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search & replace mistake, fixed.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Apr 3, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7802285]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7802285]: BUILD PASSED

Comment on lines +73 to +74
assert (_conditionals._data_node_repr(some_nested_op) == _conditionals._data_node_repr(
preprocessed))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugly :(

return pred, output

with assert_raises(
ValueError, glob=("Debug mode for conditionals doesn't allow for modification of"
Copy link
Contributor

@mzient mzient Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ValueError, glob=("Debug mode for conditionals doesn't allow for modification of"
ValueError, glob=("Debug mode for conditional execution doesn't allow for modification of"

"conditionals" is our internal jargon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if _conditionals.conditionals_enabled():
if input_set_len != -1:
raise ValueError("Multiple input sets are not supported with conditionals.")
Copy link
Contributor

@mzient mzient Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Multiple input sets are not supported with conditionals.")
raise ValueError("Multiple input sets are not supported with conditional execution.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# TODO(klecki): Add better handling of constant nodes for conditionals in debug mode.
for i, classification in enumerate(self._inputs_classification):
if classification.is_batch and not classification.was_data_node:
raise ValueError(f"Debug mode for conditionals doesn't allow for modification"
Copy link
Contributor

@mzient mzient Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(f"Debug mode for conditionals doesn't allow for modification"
raise ValueError(f"Debug mode for conditional execution doesn't allow for modification"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


for key, classification in self._kwargs_classification.items():
if classification.is_batch and not classification.was_data_node:
raise ValueError(f"Debug mode for conditionals doesn't allow for modification"
Copy link
Contributor

@mzient mzient Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(f"Debug mode for conditionals doesn't allow for modification"
raise ValueError(f"Debug mode for conditional execution doesn't allow for modification"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@mzient mzient left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the messages not to use our jargon - otherwise good.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Copy link
Contributor Author

klecki commented Apr 6, 2023

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7838899]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [7838899]: BUILD PASSED

@klecki klecki merged commit 751b75b into NVIDIA:main Apr 7, 2023
@JanuszL JanuszL mentioned this pull request Sep 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conditional execution Questions related to conditional execution and using `if` statements in DALI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants