MixedBackend support for InputOperator #4603

Merged: 9 commits merged into NVIDIA:main from the mxd_input branch on Jan 25, 2023

Conversation

@szalpal (Member) commented Jan 23, 2023

Category:

New feature (non-breaking change)

Description:

So far, we have never had a MixedBackend input operator: none of ExternalSourceCpu, ExternalSourceGpu and VideoInputCpu had to inherit from InputOperator<MixedBackend>. With VideoInputGpu the situation changes, since this operator is actually a mixed operator. Therefore, support for MixedBackend operators had to be added to InputOperator.

The mixed input operator has a somewhat different meaning than the other two. Both the CPU and GPU variants use a uniform device throughout execution, while a Mixed operator changes the device along the way. In other words, the following pattern occurs:

Operator backend | Input backend | Output backend
CPU              | CPU           | CPU
Mixed            | CPU           | GPU
GPU              | GPU           | GPU

Let's take the example of an operator that subclasses InputOperator:

  1. MyOp<CPUBackend>. This case is already handled by InputOperator. Since the operator takes a CPU input and returns a CPU output, the "intermediate" data (i.e. the data acquired through the InputOperator API ForwardCurrentData) also has a CPUBackend.
  2. MyOp<GPUBackend>. This case is also already handled. Since the operator takes a GPU input and returns a GPU output, the intermediate data has to be on the GPU.
  3. MyOp<MixedBackend>. This case is what the PR targets (see the sketch after this list).
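
Below is a hypothetical sketch of the backend mapping implied by the three cases above; the IntermediateBackend helper and its name are made up for illustration and do not come from the PR:

  // Hypothetical illustration only - this helper does not exist in DALI.
  // CPU and GPU input operators hand out intermediate data in their own backend,
  // while a Mixed input operator keeps CPU data at the top of its queue.
  template <typename OpBackend>
  struct IntermediateBackend {
    using type = OpBackend;  // CPUBackend -> CPUBackend, GPUBackend -> GPUBackend
  };

  template <>
  struct IntermediateBackend<MixedBackend> {
    using type = CPUBackend;  // a Mixed operator consumes CPU input
  };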

The input operator grabs the actual input from the top of the queue (a CachingList in the implementation). Since a mixed operator has a CPU input and returns a GPU output, we do not know which backend the operator needs the data in to proceed. It might be CPU data (which I believe will be the more common case), e.g. when the operator needs to conduct CPU-based Huffman decoding and then continue with JPEG decoding on the GPU. However, the operator might not need CPU data at all and only be a Mixed operator for some other reason - in that case, it would like ForwardCurrentData to return a tensor that already has a GPU backend.

For these reasons, there are two overloads of ForwardCurrentData: one takes a CPU target tensor and one takes a GPU target tensor. The former should be the more common case; the latter is a convenience function that makes the code a lot simpler. ForwardCurrentData<Mixed>(Tensor<CPU>) behaves the same way as its CPU sibling - copy or swap, depending on whether the memory is pageable or pinned. In the case of ForwardCurrentData<Mixed>(Tensor<GPU>), the data at the top of the queue is always CPU data (since a Mixed operator has a CPU input), so InputOperator needs to copy it to the GPU.
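
For reference, these two overload declarations (also quoted verbatim later in the review discussion) look like this:

  // CPU target: the queued CPU data is copied or swapped into `target`, using the supplied thread pool.
  void DLL_PUBLIC ForwardCurrentData(TensorList<CPUBackend> &target, ThreadPool &tp);
  // GPU target: the queued CPU data is copied to the GPU on the given stream.
  void DLL_PUBLIC ForwardCurrentData(TensorList<GPUBackend> &target, cudaStream_t stream = nullptr);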

For the CPU overload, the operator has to create its own ThreadPool, since Mixed operators are not supplied with one automatically.
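
As a rough sketch of that last point - the ThreadPool constructor arguments and the "num_threads"/"device_id" spec arguments below are assumptions for illustration, not code from this PR:

  // Sketch only: a Mixed input operator owning its own ThreadPool.
  // DALI headers for InputOperator, OpSpec and ThreadPool are omitted (their paths are an assumption),
  // as are SetupImpl/RunImpl and the rest of the operator boilerplate.
  #include <memory>

  class MyMixedInput : public InputOperator<MixedBackend> {
   public:
    explicit MyMixedInput(const OpSpec &spec)
        : InputOperator<MixedBackend>(spec),
          tp_(std::make_unique<ThreadPool>(spec.GetArgument<int>("num_threads"),
                                           spec.GetArgument<int>("device_id"),
                                           /*set_affinity=*/false, "MyMixedInput")) {}

   private:
    // Used with the CPU-target ForwardCurrentData overload, which expects a thread pool.
    std::unique_ptr<ThreadPool> tp_;
  };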

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@szalpal (Member, Author) commented Jan 23, 2023

!build

@dali-automaton (Collaborator):

CI MESSAGE: [7086603]: BUILD STARTED

@szalpal (Member, Author) commented Jan 23, 2023

!build

@dali-automaton (Collaborator):

CI MESSAGE: [7088060]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [7088060]: BUILD PASSED

@szalpal szalpal marked this pull request as ready for review January 24, 2023 07:15
@szalpal szalpal marked this pull request as draft January 24, 2023 07:19
@szalpal szalpal marked this pull request as ready for review January 24, 2023 08:55

 private:
  void PutTogetherDaliGraph() {
    pipeline_->AddOperator(OpSpec("PassthroughInput")
Contributor:

Ideally this PassthroughInput would be defined in a _test.* file so that it doesn't belong to DALI library

  }
  // if the output is pinned and the input is not, the data needs to be copied
  if (target.is_pinned() && !tensor_list_elm.front()->is_pinned()) {
    const auto &shapes = tensor_list_elm.front()->shape();
Contributor:

nitpick: auto &elem = tensor_list_elm.front(); (you use it several times in this scope)

Comment on lines +160 to +161
target.CopySample(sample_id, *tensor_list_elm.front(), sample_id,
                  AccessOrder::host());
Contributor:

Suggested change:
- target.CopySample(sample_id, *tensor_list_elm.front(), sample_id,
-                   AccessOrder::host());
+ target.CopySample(sample_id, *tensor_list_elm.front(), sample_id);

leave the order to be figured out from the arguments?

Signed-off-by: szalpal <mszolucha@nvidia.com> (6 commits)
@klecki (Contributor) left a comment:

Looks good, mostly nitpicks about test scope and the misleading naming of the test operator.

DALI_SCHEMA(PassthroughInput)
    .DocStr(
        R"code(
The operator that is a passthrough operator and also an input operator. Used for test only.
Contributor:

To be a Passthrough operator it must contain a PassThrough or SamplewisePassThrough declaration in the schema. This one doesn't, so I would name it something along the lines of TestInputOperator or similar.

@szalpal (Member, Author):

Renamed to IdentityInput

Comment on lines 70 to 72
this->ForwardCurrentData(intermediate,
                         std::is_same_v<Backend, CPUBackend> ? ws.GetThreadPool() : *tp_);
out.Copy(intermediate, ws.stream());
Contributor:

Could we check, for the CPUBackend, if we can just ForwardCurrentData to the output without copying it?

@szalpal (Member, Author):

I'm not sure what you have in mind, but let me explain what happens here.

The point of this test is to check both overloads:

  void DLL_PUBLIC ForwardCurrentData(TensorList<CPUBackend> &target, ThreadPool &tp);
  void DLL_PUBLIC ForwardCurrentData(TensorList<GPUBackend> &target, cudaStream_t stream = nullptr);

inside an InputOperator<MixedBackend> operator. We don't care about InputOperator<CPUBackend> and InputOperator<GPUBackend>, because those two are covered by the ExternalSource tests.

Therefore we have a cpu_input flag, specified and switched in tests, which picks the proper overload:

  void RunCpuInput(Workspace &ws) {
    auto &out = ws.Output<OutBackend>(0);
    TensorList<CPUBackend> intermediate;
    this->ForwardCurrentData(intermediate,  //  ForwardCurrentData(TensorList<CPUBackend>, ThreadPool &)
                             std::is_same_v<Backend, CPUBackend> ? ws.GetThreadPool() : *tp_);
    out.Copy(intermediate, ws.stream());
  }


  void RunGpuInput(Workspace &ws) {
    auto &out = ws.Output<OutBackend>(0);
    this->ForwardCurrentData(out, ws.stream());  // ForwardCurrentData(TensorList<GPUBackend> &, cudaStream_t);
  }

This way both of them are tested inside a Mixed operator, so we're happy :)

Contributor:

Ok, as long as we don't test the CPU and GPU variants, it doesn't make sense to do so. In that case, IMO it is not necessary to implement this operator as templated over Backend; it should just be implemented for MixedBackend.
I can see that we only do DALI_REGISTER_OPERATOR(IdentityInput, IdentityInput<MixedBackend>, Mixed);, but that could just be DALI_REGISTER_OPERATOR(IdentityInput, IdentityInputMixed, Mixed);

Comment on lines 45 to 48
{2, 2, true, false, true},
{3, 3, true, false, true},
{2, 2, true, true, false},
{3, 3, true, true, false},
Contributor:

Suggested change:
- {2, 2, true, false, true},
- {3, 3, true, false, true},
- {2, 2, true, true, false},
- {3, 3, true, true, false},
+ {2, 2, true, false, true},
+ {3, 3, true, false, false},
+ {2, 2, true, true, false},
+ {3, 3, true, true, true},

Should we make every other usage of cpu_input different for the particular executor type?

@szalpal (Member, Author):

Done

Signed-off-by: szalpal <mszolucha@nvidia.com> (2 commits)
@szalpal (Member, Author) commented Jan 24, 2023

!build

@dali-automaton (Collaborator):

CI MESSAGE: [7097931]: BUILD STARTED

@dali-automaton (Collaborator):

CI MESSAGE: [7097931]: BUILD PASSED

@szalpal szalpal merged commit 11cc181 into NVIDIA:main Jan 25, 2023
aderylo pushed a commit to zpp-dali-2022/DALI that referenced this pull request Mar 17, 2023
* Add In/Out type logic to the InputOperator.
* Add ForwardCurrentData specializations for MixedBackend.
* Refactor SetExternalInput routines and add MixedBackend handling.
* Add IdentityInput test-only operator.
* Add InputOperator<MixedBackend> tests.


Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
@szalpal szalpal deleted the mxd_input branch February 9, 2024 00:24