
Add Laplacian GPU operator #3644

Merged
merged 8 commits into from Feb 7, 2022

Conversation

@stiepan (Member) commented Jan 25, 2022

Category:

New feature (non-breaking change which adds functionality)

Description:

  • Adds the GPU Laplacian operator. Splits the instantiations of the operator implementation with different In/Out template parameters into separate translation units to allow for parallel compilation (a trick akin to the one used in Gaussian blur); see the sketch below this list.
  • Moves the class that computes Laplacian windows to a separate header in the kernels directory.
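For reviewers unfamiliar with the trick, a minimal sketch of the translation-unit split (the file names and the instantiation below are illustrative, not the literal contents of this PR):

    // laplacian_gpu.h -- declares the factory template but does not define it,
    // so each In/Out combination can be instantiated in its own .cu file.
    template <typename Out, typename In>
    op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec, const DimDesc& dim_desc);

    // laplacian_gpu_impl_float.cu -- one such file per input type; nvcc then
    // compiles these small translation units in parallel instead of one huge one.
    #include "laplacian_gpu_impl.cuh"  // the templated definition
    template op_impl_uptr GetLaplacianGpuImpl<float, float>(const OpSpec&, const DimDesc&);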

Additional information:

Affected modules and functionalities:

  • Laplacian CPU

Key points relevant for the review:

Don't be discouraged by the number of added lines; about 300 of them are boilerplate for splitting the instantiation of the op impl across separate files.

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: LAPL.01 - LAPL.17

JIRA TASK: DALI-2438

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan (Member, Author) commented Jan 25, 2022

!build

@dali-automaton (Collaborator)
CI MESSAGE: [3826306]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [3826306]: BUILD PASSED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@@ -0,0 +1,125 @@
// Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Member Author:
moved from laplacian_params.h

Contributor:
It's not a 1:1 copy though :P

Member Author:
In hindsight, it is the worst kind of moving code around.

@stiepan stiepan marked this pull request as ready for review January 26, 2022 15:41
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan (Member, Author) commented Jan 26, 2022

!build

@klecki klecki self-assigned this Jan 26, 2022
@dali-automaton (Collaborator)
CI MESSAGE: [3832078]: BUILD STARTED

constexpr static const int maxWindowSize = 23;

template <typename T>
class LaplacianWindows {
Member Author:
moved to laplacian_windows.h

@dali-automaton (Collaborator)
CI MESSAGE: [3832078]: BUILD PASSED

@stiepan stiepan changed the title from "Laplacian GPU operator" to "Add Laplacian GPU operator" Jan 26, 2022
@NVIDIA NVIDIA deleted a comment from dali-automaton Jan 27, 2022
@dali-automaton (Collaborator)
CI MESSAGE: [3840048]: BUILD STARTED

@klecki (Contributor) left a review:
Looks OK, a few comments.
Please check the test running time and the compilation time.

@@ -29,13 +29,20 @@ namespace dali {
#define LAPLACIAN_CPU_SUPPORTED_TYPES \
(uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t, float16, float)

// TODO(klecki): float16 support - it's not easily compatible with float window,
// need to introduce some cast in between and expose it in the kernels
// #define LAPLACIAN_GPU_SUPPORTED_TYPES (float)
Contributor:
There appears to be an old comment here.

Member Author:
I purposely copied that part, as it seems to apply here as well since the op is based on the GPU convolution. I thought it would make the code easier to navigate, but maybe it is unnecessary or irrelevant?

Contributor:
Just the // #define LAPLACIAN_GPU_SUPPORTED_TYPES (float) part

Member Author:
Oh, obviously. I didn't notice it even when it was pointed out. Thanks :D

Member Author:
done

Comment on lines 154 to 155
op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec,
const DimDesc& dim_desc) {
Contributor:
Nitpick: weird formatting.

Member Author:
done

BOOL_SWITCH(dim_desc.is_channel_last(), HasChannels, (
BOOL_SWITCH(dim_desc.is_sequence(), IsSeq, (
using LaplacianImpl = LaplacianOpGpu<Out, In, Axes, HasChannels, IsSeq>;
return std::make_unique<LaplacianImpl>(spec, dim_desc);
Contributor:
From my recollection, using make_unique in code that already has layers of templates resulted in super slow compilation times. That's why for the gaussian and arithmetic ops I used:

std::unique_ptr<OpImplBase<GPUBackend>> result;
SWITCH( ...) (
   result.reset(new OpGpu<...>(...));
);

Can you check it?
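Spelled out for the switches quoted above, the suggestion amounts to something like this (a sketch based on the diff; exact macro arguments and lint markers omitted):

    std::unique_ptr<OpImplBase<GPUBackend>> result;
    BOOL_SWITCH(dim_desc.is_channel_last(), HasChannels, (
      BOOL_SWITCH(dim_desc.is_sequence(), IsSeq, (
        // reset + plain new keeps the unique_ptr template machinery out of the
        // heavily templated switch branches, which reportedly compiles faster
        result.reset(new LaplacianOpGpu<Out, In, Axes, HasChannels, IsSeq>(spec, dim_desc));
      ));
    ));
    return result;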

Member Author:
Okay, I remember wondering why the reset method was used. I'll check it.

Member Author:
done

* to allow for parallel compilation of underlying kernels.
*/
template <typename Out, typename In>
op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec,
Contributor:
As you are replicating a pattern from Gaussian Blur, you might want to take a look at the reasoning in #3472 for having a GSG-style pass-by-pointer here. I'm just posting this; I'm not sure I agree with the reasoning.
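Concretely, the suggested change is along these lines (a sketch only; the reasoning lives in #3472):

    // as in the current diff: pass-by-const-reference
    op_impl_uptr GetLaplacianGpuImpl(const OpSpec& spec, const DimDesc& dim_desc);

    // GSG-style alternative: pass-by-pointer
    op_impl_uptr GetLaplacianGpuImpl(const OpSpec* spec, const DimDesc& dim_desc);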

Member Author:
done


std::array<bool, axes> has_smoothing = uniform_array<axes>(false);
for (int sample_idx = 0; sample_idx < nsamples; sample_idx++) {
const auto& window_sizes = args.GetWindowSizes(sample_idx);
Contributor:
Does LaplacianArgs prohibit smoothing one sample while not smoothing another, or is that allowed?
Can we have some weird case where the smoothing window has a 1D size for one sample and is empty for another? Or is it still 1D with {0} size?

Member Author:
There are no empty windows, and the smallest window size is 1 (which corresponds to the window [1]). If, for a given partial derivative, no sample requires smoothing, the whole list of windows is empty. If some samples require smoothing and some don't, those that don't are convolved with [1].
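To make the [1] case concrete: convolving with the window [1] is an identity, so mixing smoothed and unsmoothed samples in one batch is safe. A minimal sketch, independent of DALI's actual kernels:

    #include <cstddef>
    #include <vector>

    // Naive valid-mode 1D convolution; with win == {1} the output equals the
    // input, which is how samples that need no smoothing coexist with ones that do.
    std::vector<float> Convolve1D(const std::vector<float>& in,
                                  const std::vector<float>& win) {
      std::vector<float> out(in.size() - win.size() + 1, 0.f);
      for (std::size_t i = 0; i < out.size(); i++)
        for (std::size_t j = 0; j < win.size(); j++)
          out[i] += in[i + j] * win[j];
      return out;
    }
    // Convolve1D({1, 2, 3}, {1}) == {1, 2, 3}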

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan (Member, Author) commented Jan 31, 2022

!build

@dali-automaton (Collaborator)
CI MESSAGE: [3862472]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [3862472]: BUILD PASSED

@stiepan stiepan merged commit 6c73ace into NVIDIA:main Feb 7, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
* Add Laplacian GPU operator
* Move LaplacianWindows to kernels
* Add slow attr to some of Laplacian Python tests

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@JanuszL JanuszL mentioned this pull request Mar 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022