Fix large mini-batch handling in parallel external source #3768
…for storing capacity Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
if not proc.exitcode:
task_queue.close()
proc.join()
try:
Q: no enter/exit to use with `proc`?
For a plain `proc`, not really, but good point: it does look like a good candidate for a context manager.
… large batch test Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
CI MESSAGE: [4281330]: BUILD STARTED
CI MESSAGE: [4281330]: BUILD PASSED
!build
CI MESSAGE: [4284870]: BUILD STARTED
CI MESSAGE: [4284870]: BUILD PASSED
* Add better error messaging on c-struct serialization
* Use wider type for storing capacity
* Add test for handling large sample in parallel external source

Signed-off-by: Kamil Tokarski <ktokarski@nvidia.com>
Category:
Bug fix (non-breaking change which fixes an issue)
Description:
Parallel external source passes data from workers to the main process through shared memory buffers. If the capacity of a buffer exceeds `max_int`, the worker process fails to serialize the minibatch meta-data (which includes the capacity) when it is written into a C-like structure (Python's `struct` package).
This PR uses `unsigned long long int` instead of `int` for storing the capacity.
Additional information:
Affected modules and functionalities:
`_multiproc` module and tests.
Key points relevant for the review:
Is `unsigned long long int` too much, and would simply `unsigned int` be good enough there?
Checklist
Tests
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2679
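
The serialization failure from the description can be reproduced in isolation with Python's `struct` module. The sketch below is only an illustration: the format characters, the `try_pack` helper, and the example capacity are assumptions chosen for the demo, not the code used in `_multiproc`. It shows that a capacity just above the signed `int` range cannot be packed with the `i` format, while `I` (`unsigned int`) and `Q` (`unsigned long long`) accept it.

```python
import struct

# Largest value that fits in a 4-byte signed C int.
INT_MAX = 2**31 - 1


def try_pack(fmt, value):
    """Hypothetical helper: return packed bytes, or None if value does not fit fmt."""
    try:
        return struct.pack(fmt, value)
    except struct.error:
        return None


# Assumed example: a buffer capacity one byte above the signed int range.
capacity = INT_MAX + 1

packed_int = try_pack("i", capacity)   # C int: out of range, packing fails
packed_uint = try_pack("I", capacity)  # C unsigned int: fits
packed_ull = try_pack("Q", capacity)   # C unsigned long long: fits

print("int:", packed_int is not None)
print("unsigned int:", packed_uint is not None)
print("unsigned long long:", packed_ull is not None)
```

On the review question about width: `I` holds values up to 2**32 - 1, so it would only move the limit to roughly 4 GiB, whereas `Q` leaves far more headroom at the cost of four extra bytes per field.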