
Remove possibility of access to contiguous TL buffer #3373

Merged
merged 5 commits into NVIDIA:main
Sep 27, 2021

Conversation

klecki
Contributor

@klecki klecki commented Sep 24, 2021

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

Keep escape-hatch functions for the purpose of Pipeline output.

This is intended as an intermediate step; the main purpose is to avoid reintroducing access to the underlying buffer into the code-base.

Remove those functions from the TL tests.

Additional information

  • Affected modules and functionalities:
    Buffer, Tensor, Tensor List, Pipeline outputs

  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2255

@@ -86,71 +86,6 @@ class DLL_PUBLIC Buffer {
inline Buffer() = default;
virtual ~Buffer() = default;

/**
Contributor Author

This was just moved from public: to protected:

Contributor

I don't like it. The Buffer class is really quite useless after this change. How about making the inheritance of TensorList from Buffer protected (or even private) instead?

Contributor

useless -> redundant ?

Contributor Author

Switched to private inheritance.

dali/pipeline/data/tensor.h (outdated, resolved)
@JanuszL JanuszL self-assigned this Sep 24, 2021

/**
* @brief Returns a typed pointer to the underlying storage. If the
* buffer has not been allocated because it does not yet have a type,
Contributor

Suggested change
* buffer has not been allocated because it does not yet have a type,
* tensor has not been allocated because it does not yet have a type,

Contributor

As this is the Tensor API I would stick to using tensor in the docs.

Contributor Author

I just copied the old Doxygen verbatim from tensor, I can adjust if you want.

Contributor

I would appreciate that.

Contributor Author

All removed, will resolve stale comments.

*
* If the buffer already has a valid type, and the calling type does
* not match, the type of the buffer is reset and the underlying
* storage is re-allocated if the buffer does not currently own
Contributor

Suggested change
* storage is re-allocated if the buffer does not currently own
* storage is re-allocated if the tensor does not currently own

@dali-automaton
Collaborator

CI MESSAGE: [3049105]: BUILD STARTED


/**
* @brief Return an un-typed pointer to the underlying storage.
* Buffer must be either empty or have a valid type and be contiguous.
Contributor

Suggested change
* Buffer must be either empty or have a valid type and be contiguous.
* The memory must be either empty or have a valid type and be contiguous.

Contributor Author

done


/**
* @brief Return an un-typed const pointer to the underlying storage.
* Buffer must be either empty or have a valid type and be contiguous.
Contributor

Suggested change
* Buffer must be either empty or have a valid type and be contiguous.
* The memory must be either empty or have a valid type and be contiguous.

Contributor Author

done


// Check the internals
ASSERT_NE(tensor_list->template mutable_data<float>(), nullptr);
ASSERT_TRUE(tensor_list->has_data());
Contributor

Shouldn't you specialize has_data to the TensorList now (as it is implemented in the buffer)?

Contributor Author

I may be able to do something here, but I think it needs to wait for the proper changes.


// Check the internals
ASSERT_EQ(tensor_list.ntensor(), shape.size());
for (size_t i = 0; i < tensor_list.ntensor(); ++i) {
// ASSERT_EQ(ptrs[i], tensor_list.raw_tensor(i));
Contributor

?

Contributor Author

done

ASSERT_EQ(tensor_list.tensor_shape(i), shape[i]);
ASSERT_EQ(tensor_list.tensor_offset(i), offsets[i]);
}

// No memory allocation should have occurred
ASSERT_EQ(ptr, tensor_list.raw_data());
Contributor

Maybe we can save ptrs to each tensor and then check if no reallocation has happened.

Contributor Author

I cannot check that no reallocation happened this way, as it would match only the first pointer. I can add the unsafe call.

DeviceGuard d(src.device_id());
const auto &type_info = src.type_info();

// TODO(klecki): Add a proper test for non-contiguous access when we can have non-contiguous
Contributor

Do you think we will have such a case?
I imagine that DALI should still return raw outputs as contiguous tensors so we can wrap them into CuPy/NumPy or DLPack directly.

Contributor Author

I wrote it mostly as a TODO for the next PR. I guess we won't be returning non-contiguous outputs soon, but either way I will add an error here or test this code path and error out somewhere else.

Comment on lines 68 to 81
const auto &src_shape = src.shape();
auto *dst_buf = static_cast<uint8_t *>(dst);
SmallVector<void *, 256> to;
SmallVector<const void *, 256> from;
SmallVector<int64_t, 256> sizes;
int num_samples = src_shape.num_samples();
sizes.reserve(num_samples);
to.reserve(num_samples);
from.reserve(num_samples);
for (int i = 0; i < num_samples; i++) {
sizes.push_back(src_shape.tensor_size(i));
to.push_back(dst_buf);
dst_buf += sizes[i] * type_info.size();
from.push_back(src.raw_tensor(i));
}

type_info.template Copy<DstBackend, SrcBackend>(to.data(), from.data(), sizes.data(),
num_samples, stream, use_copy_kernel);
}
Contributor

@JanuszL JanuszL Sep 24, 2021

Suggested change
const auto &src_shape = src.shape();
auto *dst_buf = static_cast<uint8_t *>(dst);
SmallVector<void *, 256> to;
SmallVector<const void *, 256> from;
SmallVector<int64_t, 256> sizes;
int num_samples = src_shape.num_samples();
sizes.reserve(num_samples);
to.reserve(num_samples);
from.reserve(num_samples);
for (int i = 0; i < num_samples; i++) {
sizes.push_back(src_shape.tensor_size(i));
to.push_back(dst_buf);
dst_buf += sizes[i] * type_info.size();
from.push_back(src.raw_tensor(i));
}
type_info.template Copy<DstBackend, SrcBackend>(to.data(), from.data(), sizes.data(),
num_samples, stream, use_copy_kernel);
}
const auto &src_shape = src.shape();
SmallVector<const void *, 256> from;
SmallVector<int64_t, 256> sizes;
int num_samples = src_shape.num_samples();
sizes.reserve(num_samples);
from.reserve(num_samples);
for (int i = 0; i < num_samples; i++) {
sizes.push_back(src_shape.tensor_size(i));
from.push_back(src.raw_tensor(i));
}
type_info.template Copy<DstBackend, SrcBackend>(dst, from.data(), sizes.data(),
num_samples, stream, use_copy_kernel);
}

As we have:

template <typename DstBackend, typename SrcBackend>
void TypeInfo::Copy(void *dst, const void** srcs, const Index* sizes, int n,
                    cudaStream_t stream, bool use_copy_kernel) const {

Contributor Author

Done

@dali-automaton
Collaborator

CI MESSAGE: [3049105]: BUILD PASSED

@klecki
Contributor Author

klecki commented Sep 27, 2021

!build

@dali-automaton
Collaborator

CI MESSAGE: [3063967]: BUILD STARTED

"Can only wait on user streams");
DeviceGuard g(dev);
CUDA_CALL(cudaStreamSynchronize(streams_[dev]));
DLL_PUBLIC void WaitForDevice(const dali::Tensor<GPUBackend> &t) {
Contributor

@JanuszL JanuszL Sep 27, 2021

Suggested change
DLL_PUBLIC void WaitForDevice(const dali::Tensor<GPUBackend> &t) {
template <template <typename> class Container>
DLL_PUBLIC void WaitForDevice(const Container<GPUBackend> &t) {

?

Contributor Author

That's the other alternative, I just went with two overloads, I guess a coin flip to decide.

Contributor

Up to you

@dali-automaton
Collaborator

CI MESSAGE: [3063967]: BUILD FAILED

Keep escape-hatch functions for the purpose of Pipeline
output.

This is intended as intermediate step, the main
purpose is to not introduce the access to the underlying
buffer again into the code-base.

Get rid of those functions from TL tests.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
The type was previously implicit in the allocation

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
@klecki
Contributor Author

klecki commented Sep 27, 2021

!build

@dali-automaton
Collaborator

CI MESSAGE: [3064661]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [3064661]: BUILD PASSED

Comment on lines +86 to +89
using Buffer<Backend>::SetGrowthFactor;
using Buffer<Backend>::SetShrinkThreshold;
using Buffer<Backend>::GetGrowthFactor;
using Buffer<Backend>::GetShrinkThreshold;
Contributor

Do we need those 4?

@klecki klecki merged commit ddabf7d into NVIDIA:main Sep 27, 2021
cyyever pushed a commit to cyyever/DALI that referenced this pull request Oct 17, 2021
Change TensorList Buffer inheritance to private,
and reexpose the old API. Keep the buffer-access methods
private.

Add escape-hatch functions for the purpose of Pipeline
output.

This is intended as intermediate step, the main
purpose is to not introduce the access to the underlying
buffer again into the code-base.

Get rid of those functions from TL tests.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>