-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure keeping py_pool alive until pipline is garbage collected #3245
Conversation
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build |
CI MESSAGE: [2740575]: BUILD STARTED |
dali/python/nvidia/dali/pipeline.py
Outdated
# Pipeline backend doesn't really do anything with the pool, sole point of this call | ||
# is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend | ||
# so that shared memory managed by the pool is not freed before pipline is garbage collected. | ||
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Pipeline backend doesn't really do anything with the pool, sole point of this call | |
# is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend | |
# so that shared memory managed by the pool is not freed before pipline is garbage collected. | |
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown | |
# The sole point of this call is to ensure lifetime of the pool exceeds the lifetime of the pipeline's backend, | |
# which runs external source operator that may access memory owned by the pool, | |
# so that shared memory managed by the pool is not freed before pipline is garbage collected. | |
# Otherwise pipline may try to access freed memory which leads to crashes at the Python teardown |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusted the sentence, please take a look as I've removed I think unnecessary "which" parts.
Do you think we can add a test for that? |
CI MESSAGE: [2740575]: BUILD PASSED |
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
def create_py_file_reader(images_dir): | ||
with open(os.path.join(images_dir, "image_list.txt"), 'r') as f: | ||
file_label = [line.rstrip().split(' ') for line in f if line != ''] | ||
files, labels = zip(*file_label) | ||
|
||
def py_file_reader(sample_info): | ||
sample_idx = sample_info.idx_in_epoch | ||
jpeg_filename = files[sample_idx] | ||
label = np.int32([labels[sample_idx]]) | ||
with open(os.path.join(images_dir, jpeg_filename), 'rb') as f: | ||
encoded_img = np.frombuffer(f.read(), dtype=np.uint8) | ||
return encoded_img, label | ||
|
||
return py_file_reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need that?
How about just:
jpeg_file = os.path.join(data_root, 'db', 'single', 'jpeg', '510', 'ship-1083562_640.jpg')
def create_py_file_reader(images_dir):
encoded_img = np.fromfile(jpeg_file, dtype=np.uint8)
label = 1
return encoded_img, label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
!build |
CI MESSAGE: [2750244]: BUILD STARTED |
CI MESSAGE: [2750244]: BUILD PASSED |
Signed-off-by: Kamil Tokarski ktokarski@nvidia.com
Description
What happened in this PR
Fixes an issue of random Python crashes at teardown if parallel external source is used.
That behavior was caused by garbage collection order issue: python pool objects, and in effect, shared memory wrappers
could be collected before pipeline backed, which led to pipeline accessing unmmaped memory regions.
Additional information
Affected modules and functionalities:
Pipeline/Pipeline backend
Key points relevant for the review:
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2239