Synchronize stream in the CUDAProductBase destructor #334

Conversation

makortel commented Apr 23, 2019

PR description:

Otherwise there are possibilities for weird races (e.g. a combination of non-ExternalWork producers, consumed-but-not-read CUDAProducts, and CUDA streams executing work later than expected, i.e. on the next event).
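
For illustration only, a minimal sketch of the idea; the class and member names below are hypothetical stand-ins for CUDAProductBase and its stream, not the actual CMSSW code:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for CUDAProductBase: it remembers the CUDA stream on
// which the wrapped data product was produced.
class ProductBaseSketch {
public:
  explicit ProductBaseSketch(cudaStream_t stream) : stream_(stream) {}

  // The change in this PR, in sketch form: block in the destructor until all
  // work queued on the product's stream has finished, so nothing the product
  // refers to can be reused (e.g. on the next event) while the GPU is still
  // working on it.
  ~ProductBaseSketch() { cudaStreamSynchronize(stream_); }

private:
  cudaStream_t stream_;
};
```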

PR validation:

Profiling workflow runs, unit tests run.

VinInn commented Apr 24, 2019

Adding this to #329 does not help much on V100 (as expected, I suppose); as with any action that slows down execution, the crashes became rarer...

makortel (Author)

> Adding this to #329 does not help much on V100 (as expected, I suppose)

Right, I didn't expect it to fix the crashes, since (to my understanding) we already, in practice, synchronize each CUDA stream as the last action of each event. So this PR is more about correctness in certain corner cases (and it should really be done more efficiently).

fwyzard pushed a commit that referenced this pull request May 15, 2019

fwyzard commented Sep 10, 2019

@makortel, what shall we do with this?

makortel (Author) commented Sep 10, 2019

I think the problem itself should be addressed. As a reminder (to myself as well), weird behavior can happen if

  • non-ExternalWork EDProducer A makes a GPU product B
    • the producer itself does not synchronize with the CUDA stream
    • A has some device memory allocated as member data (per EDM stream), to be reused in the next event
  • EDModule C declares that it consumes B
    • so that producer A will also be run unscheduled
  • C does not actually read B
    • C does not synchronize with the CUDA stream either

In this case it may happen that

  • Nobody synchronizes the CUDA stream that produces B (ok, this "happens" in any case)
  • The EDM stream may proceed to the next event (2) before the CUDA stream producing B has finished
  • The producer A uses another CUDA stream to produce B' on the next event
  • Now there is a race condition between kernels running on events 1 and 2

I don't like synchronizing the stream, but that's much simpler than any alternative. It would be interesting to check the impact on performance, but for that I should rebase the PR.
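
For concreteness, a self-contained sketch of this race using plain CUDA runtime calls rather than the framework types; the `produce` kernel and `reusedBuffer` are hypothetical stand-ins for producer A's work and its per-EDM-stream member data. Without a synchronization on the first stream (which the destructor-side `cudaStreamSynchronize` in this PR would provide), the kernels queued for events 1 and 2 can be in flight concurrently on different streams while writing the same buffer:

```cuda
#include <cuda_runtime.h>

// Stand-in for the work producer A queues for its product on a given event.
__global__ void produce(float* buf, int n, float value) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    buf[i] = value;
}

int main() {
  constexpr int n = 1 << 20;

  // Stand-in for A's device memory held as member data and reused every event.
  float* reusedBuffer;
  cudaMalloc(&reusedBuffer, n * sizeof(float));

  cudaStream_t stream1, stream2;
  cudaStreamCreate(&stream1);
  cudaStreamCreate(&stream2);

  // Event 1: A queues work for product B on stream1; nobody waits for it,
  // because B is consumed but never read and A is not an ExternalWork module.
  produce<<<(n + 255) / 256, 256, 0, stream1>>>(reusedBuffer, n, 1.f);

  // Event 2: A reuses the same member buffer on a different CUDA stream.
  // Without cudaStreamSynchronize(stream1) in between (which the destructor
  // of the event-1 product would now provide), the two kernels race.
  produce<<<(n + 255) / 256, 256, 0, stream2>>>(reusedBuffer, n, 2.f);

  cudaDeviceSynchronize();
  cudaFree(reusedBuffer);
  cudaStreamDestroy(stream1);
  cudaStreamDestroy(stream2);
  return 0;
}
```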

makortel (Author)

Superseded by #391.

makortel closed this Sep 20, 2019