Fix bugs in C API and refactor tests #3350

Merged
merged 9 commits into from
Sep 20, 2021

Conversation

@klecki klecki commented Sep 17, 2021

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

Fix `daliDeserializeDefault`: the batch size
map was not allocated, so when the pipeline
handle was correctly cleaned up, the cleanup tried to
deallocate memory through an invalid pointer.
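
As a minimal sketch of this class of bug, the snippet below models a C-style handle that owns its members through raw pointers. `PipelineHandle`, `Pipeline`, `CreateHandle` and `DeleteHandle` are hypothetical stand-ins, not the DALI types; only the `std::make_unique` plus `release()` pattern mirrors the snippets discussed in the review.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins; the real DALI types and field names may differ.
using batch_size_map_t = std::unordered_map<std::string, int>;
struct Pipeline { int max_batch_size = 0; };

// C-style handle that owns its members through raw pointers.
// Null defaults keep `delete` well defined for members that were never set.
struct PipelineHandle {
  Pipeline *pipe = nullptr;
  batch_size_map_t *batch_size_map = nullptr;
};

// Every creation path has to populate every member the delete path frees.
void CreateHandle(PipelineHandle *handle, int max_batch_size) {
  assert(handle != nullptr);
  auto pipeline = std::make_unique<Pipeline>();
  pipeline->max_batch_size = max_batch_size;
  auto bs_map = std::make_unique<batch_size_map_t>();
  // Ownership moves to the handle only after all allocations succeeded.
  handle->pipe = pipeline.release();
  handle->batch_size_map = bs_map.release();
}

// Frees all owned members and resets the pointers.
void DeleteHandle(PipelineHandle *handle) {
  assert(handle != nullptr);
  delete handle->batch_size_map;
  delete handle->pipe;
  handle->batch_size_map = nullptr;
  handle->pipe = nullptr;
}
```

If one creation function skipped the map allocation and the handle did not default the pointer to null, `DeleteHandle` would call `delete` on an indeterminate pointer, which is the failure mode described above.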

Add missing daliDeletePipeline in all C API tests
so the tests no longer leak the created pipeline.
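
The PR adds explicit delete calls to the tests. One alternative that makes the cleanup harder to forget is a scope guard, sketched below against a mock API. `MockHandle`, `MockCreatePipeline`, `MockDeletePipeline` and `PipelineGuard` are assumptions made for illustration and are not part of DALI.

```cpp
#include <cassert>

// Mock of a C-style create/delete API; all names are hypothetical.
struct MockHandle { bool created = false; };

// Counts pipelines that were created but not yet deleted.
inline int &LivePipelines() {
  static int count = 0;
  return count;
}

void MockCreatePipeline(MockHandle *h) {
  h->created = true;
  ++LivePipelines();
}

void MockDeletePipeline(MockHandle *h) {
  assert(h->created);
  h->created = false;
  --LivePipelines();
}

// Scope guard: deletes the pipeline when the enclosing scope exits.
class PipelineGuard {
 public:
  explicit PipelineGuard(MockHandle *h) : handle_(h) {}
  ~PipelineGuard() {
    if (handle_->created) MockDeletePipeline(handle_);
  }
  PipelineGuard(const PipelineGuard &) = delete;
  PipelineGuard &operator=(const PipelineGuard &) = delete;

 private:
  MockHandle *handle_;
};

// Body of a hypothetical test; returns the live pipeline count seen mid-test.
int RunTestBody() {
  MockHandle handle;
  MockCreatePipeline(&handle);
  PipelineGuard guard(&handle);
  return LivePipelines();  // the guard runs on this return as well
}
```

Because the destructor runs on every exit path, an early return from the test body does not skip the cleanup.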

Some tests (daliOutputCopySamples) used to fill the
prefetch queue and schedule additional runs
while accessing only one output iteration.
The pipeline that was never freed could still be actively using memory
when the memory resources were cleared on shutdown,
and the whole test process could crash during shutdown.
Remove the additional unused iteration in such cases.
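
To make the bookkeeping concrete, here is a toy model that only counts scheduled and consumed iterations. `ToyPipeline` and both helper functions are hypothetical; the model is not the DALI executor and does no asynchronous work.

```cpp
#include <cassert>
#include <queue>

// Toy model of a prefetching pipeline. It only tracks scheduled runs;
// all names here are hypothetical.
class ToyPipeline {
 public:
  explicit ToyPipeline(int queue_depth) : queue_depth_(queue_depth) {}

  // Schedules enough runs to fill the prefetch queue.
  void Prefetch() {
    for (int i = 0; i < queue_depth_; i++) Run();
  }
  // Schedules one more iteration.
  void Run() { pending_.push(next_id_++); }
  // Consumes the oldest scheduled iteration and returns its id.
  int Output() {
    assert(!pending_.empty());
    int id = pending_.front();
    pending_.pop();
    return id;
  }
  // Iterations that were scheduled but never consumed.
  int PendingRuns() const { return static_cast<int>(pending_.size()); }

 private:
  int queue_depth_;
  int next_id_ = 0;
  std::queue<int> pending_;
};

// Old test shape: prefetch, schedule an extra run, read a single output.
int PendingAfterOldTest(int queue_depth) {
  ToyPipeline pipe(queue_depth);
  pipe.Prefetch();
  pipe.Run();
  pipe.Output();
  return pipe.PendingRuns();
}

// Reworked shape: the extra, never consumed run is gone.
int PendingAfterNewTest(int queue_depth) {
  ToyPipeline pipe(queue_depth);
  pipe.Prefetch();
  pipe.Output();
  return pipe.PendingRuns();
}
```

With a queue depth of 2, the old shape leaves two iterations pending and the reworked shape leaves one. In the real tests the remaining work is handled by deleting the pipeline before the process exits.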

Rework the tests so they do not access the underlying
contiguous buffer of TensorList.
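
A sketch of what comparing batches sample by sample can look like, instead of one comparison over a contiguous buffer. `ToyBatch` is a hypothetical stand-in whose samples are allocated separately; it is not the TensorList interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical batch type: each sample lives in its own allocation,
// so there is no single contiguous buffer to compare.
struct ToyBatch {
  std::vector<std::vector<uint8_t>> samples;

  int num_samples() const { return static_cast<int>(samples.size()); }
  const uint8_t *sample_data(int i) const {
    assert(i >= 0 && i < num_samples());
    return samples[i].data();
  }
  std::size_t sample_bytes(int i) const { return samples[i].size(); }
};

// Compares two batches one sample at a time.
bool BatchesEqual(const ToyBatch &a, const ToyBatch &b) {
  if (a.num_samples() != b.num_samples()) return false;
  for (int i = 0; i < a.num_samples(); i++) {
    std::size_t bytes = a.sample_bytes(i);
    if (bytes != b.sample_bytes(i)) return false;
    if (bytes == 0) continue;  // nothing to compare for empty samples
    if (std::memcmp(a.sample_data(i), b.sample_data(i), bytes) != 0) {
      return false;
    }
  }
  return true;
}
```

A check written this way keeps working whether or not the batch happens to be stored contiguously.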

Additional information

  • Affected modules and functionalities:
    C API

  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
The tests do not clean up after creating a C API pipeline,
leaving a live pipeline to be cleaned up at process shutdown.
Some tests (daliOutputCopySamples) used to fill the
prefetch queue and schedule additional runs while accessing
only one output iteration.
The pipeline that was never freed could still be actively using memory
when the memory resources were cleared on shutdown
(as it was using async execution), and the whole test process
could crash during shutdown.

The read outputs are also not released properly after
the last output is accessed.
This is a somewhat awkward part of the C API.
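
The release problem can be illustrated with a toy model in which fetching the next output implicitly releases the previous one, so only the final output needs an explicit release. That behaviour is an assumption made for the sketch, and `ToyOutputs` and `ReadAll` are hypothetical names, not the C API.

```cpp
#include <cassert>

// Toy model (not the DALI implementation) of outputs that stay held
// by the caller until they are released.
class ToyOutputs {
 public:
  // Releases the previously held output, if any, then hands out the next.
  int Output() {
    if (held_) Release();
    held_ = true;
    return next_++;
  }
  // Returns the currently held output to the pipeline.
  void Release() {
    assert(held_);
    held_ = false;
  }
  bool holds_output() const { return held_; }

 private:
  bool held_ = false;
  int next_ = 0;
};

// Reads `iterations` outputs and reports whether one is still held.
bool ReadAll(ToyOutputs &outputs, int iterations, bool release_last) {
  for (int i = 0; i < iterations; i++) outputs.Output();
  if (release_last && outputs.holds_output()) outputs.Release();
  return outputs.holds_output();
}
```

In this model a loop that only calls `Output()` always ends with one output still held, which matches the symptom described above.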

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki commented Sep 17, 2021

!build

@dali-automaton

CI MESSAGE: [3004657]: BUILD STARTED

for (int i = 0; i < prefetch_queue_depth; i++) {
  data.push_back(AllocBuffer<TypeParam>(num_elems * sizeof(uint8_t), false));
}
std::vector<TensorList<TypeParam>> input_wrapper(3);

Maybe it is better to use TensorVector unless TensorList is the data structure we are reworking towards?

Hmm, it should be

Suggested change
std::vector<TensorList<TypeParam>> input_wrapper(3);
std::vector<TensorList<TypeParam>> input_wrapper(prefetch_queue_depth);

but other than that I just kept the TensorList. What we will use is still to be decided, but I've heard voices in favor of keeping the TensorList to match what we have in Python. But we will see.

@JanuszL JanuszL self-assigned this Sep 17, 2021
@dali-automaton

CI MESSAGE: [3004657]: BUILD PASSED

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
pipe_handle->ws = ws.release();
pipe_handle->copy_stream = stream.release();
pipe_handle->pipe = pipeline.release();
pipe_handle->batch_sizes_map = bs_map.release();

Suggested change
pipe_handle->batch_sizes_map = bs_map.release();
pipe_handle->batch_size_map = bs_map.release();

done

pipe_handle->ws = ws.release();
pipe_handle->copy_stream = stream.release();
pipe_handle->pipe = pipeline.release();

auto bs_map = std::make_unique<batch_size_map_t>();
pipe_handle->batch_sizes_map = bs_map.release();

Suggested change
pipe_handle->batch_sizes_map = bs_map.release();
pipe_handle->batch_size_map = bs_map.release();

done

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki commented Sep 20, 2021

!build

@dali-automaton

CI MESSAGE: [3017866]: BUILD STARTED

@dali-automaton

CI MESSAGE: [3017866]: BUILD PASSED

@klecki klecki merged commit 325a1eb into NVIDIA:main Sep 20, 2021
@jantonguirao jantonguirao added the important-fix Fixes an important issue in the software or development environment. label Sep 30, 2021
cyyever pushed a commit to cyyever/DALI that referenced this pull request Oct 17, 2021
Fix `daliDeserializeDefault` - the batch size
map was not allocated and when the pipeline
handle was correctly cleaned, it tried to
deallocate memory under invalid pointer.

Add missing `daliDeletePipeline` in all C API tests
so the tests no longer leak the created pipeline.

Some tests (daliOutputCopySamples) used to fill the
prefetch queue and schedule additional runs, 
whilst accessing only one output iteration. 
The not-freed pipeline could be actively using the memory
when the memory resources were cleared on shutdown
and the whole test process could crash during shutdown.
Remove the additional unused iteration in such case.

Rework tests to not use access to underlying
contiguous buffer of TensorList.

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jan 23, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Feb 21, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022
Labels
important-fix Fixes an important issue in the software or development environment.

5 participants