Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure keeping py_pool alive until pipline is garbage collected #3245

Merged
merged 6 commits into from
Aug 12, 2021

Conversation

stiepan
Copy link
Member

@stiepan stiepan commented Aug 11, 2021

Signed-off-by: Kamil Tokarski ktokarski@nvidia.com

Description

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (Redesign of existing code that doesn't affect functionality)
  • Other (e.g. Documentation, Tests, Configuration)

What happened in this PR

Fixes an issue of random Python crashes at teardown if parallel external source is used.
That behavior was caused by garbage collection order issue: python pool objects, and in effect, shared memory wrappers
could be collected before pipeline backed, which led to pipeline accessing unmmaped memory regions.

Additional information

  • Affected modules and functionalities:
    Pipeline/Pipeline backend

  • Key points relevant for the review:

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2239

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@JanuszL JanuszL self-assigned this Aug 11, 2021
@stiepan
Copy link
Member Author

stiepan commented Aug 11, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2740575]: BUILD STARTED

Comment on lines 555 to 558
# Pipeline backend doesn't really do anything with the pool, sole point of this call
# is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend
# so that shared memory managed by the pool is not freed before pipline is garbage collected.
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Pipeline backend doesn't really do anything with the pool, sole point of this call
# is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend
# so that shared memory managed by the pool is not freed before pipline is garbage collected.
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown
# The sole point of this call is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend,
# which runs external source operator that may access memory owned by the pool,
# so that shared memory managed by the pool is not freed before pipline is garbage collected.
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the sentence, please take a look as I've removed I think unnecessary "which" parts.

@JanuszL
Copy link
Contributor

JanuszL commented Aug 11, 2021

Do you think we can add a test for that?

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2740575]: BUILD PASSED

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Comment on lines 29 to 42
def create_py_file_reader(images_dir):
with open(os.path.join(images_dir, "image_list.txt"), 'r') as f:
file_label = [line.rstrip().split(' ') for line in f if line != '']
files, labels = zip(*file_label)

def py_file_reader(sample_info):
sample_idx = sample_info.idx_in_epoch
jpeg_filename = files[sample_idx]
label = np.int32([labels[sample_idx]])
with open(os.path.join(images_dir, jpeg_filename), 'rb') as f:
encoded_img = np.frombuffer(f.read(), dtype=np.uint8)
return encoded_img, label

return py_file_reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that?
How about just:

jpeg_file = os.path.join(data_root, 'db', 'single', 'jpeg', '510', 'ship-1083562_640.jpg')

def create_py_file_reader(images_dir):
    encoded_img = np.fromfile(jpeg_file, dtype=np.uint8)
    label = 1
    return encoded_img, label

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
@stiepan
Copy link
Member Author

stiepan commented Aug 12, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2750244]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2750244]: BUILD PASSED

@stiepan stiepan merged commit 3bea8e7 into NVIDIA:main Aug 12, 2021
@JanuszL JanuszL added the important-fix Fixes an important issue in the software or development environment. label Sep 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
important-fix Fixes an important issue in the software or development environment.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants