fix: crashpad scope flushing synchronization #1019

supervacuus · 2024-07-19T16:37:55Z

This change cleans up a couple of issues in the current crashpad backend:

Scope flushing uses mpack inside the signal handler, and mpack allocates with the system malloc, which is not async-signal-safe. This change configures mpack to use sentry_malloc, which switches to the page allocator inside the signal handler. Fixes #687 (Inconsistent allocator usage causing crashes).
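The allocator switch can be sketched as a malloc wrapper that checks whether the crash handler is active. This sketch makes several assumptions. A static bump arena stands in for sentry's page allocator. `sketch_malloc`, `sketch_free`, `is_arena_ptr` and `g_in_signal_handler` are hypothetical names, not sentry-native's API. mpack's configuration can redirect its allocations (see its `MPACK_MALLOC`/`MPACK_FREE` options), but how sentry-native wires this up is not shown here.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical flag: set by the crash handler on entry (not sentry-native's actual state).
static std::atomic<bool> g_in_signal_handler{false};

// Hypothetical static arena standing in for sentry's page allocator.
// Bump-allocating from preallocated memory avoids calling malloc in the handler.
alignas(std::max_align_t) static unsigned char g_arena[64 * 1024];
static std::atomic<std::size_t> g_arena_used{0};

static void *arena_alloc(std::size_t size) {
    // Round the request up so every returned pointer stays max-aligned.
    const std::size_t align = alignof(std::max_align_t);
    size = (size + align - 1) & ~(align - 1);
    const std::size_t offset = g_arena_used.fetch_add(size, std::memory_order_relaxed);
    if (offset + size > sizeof(g_arena)) {
        return nullptr; // arena exhausted
    }
    return g_arena + offset;
}

// True if ptr points into the arena; std::less gives a total order over pointers.
static bool is_arena_ptr(const void *ptr) {
    std::less<const void *> lt;
    const void *begin = g_arena;
    const void *end = g_arena + sizeof(g_arena);
    return !lt(ptr, begin) && lt(ptr, end);
}

// Hypothetical allocator entry point that mpack would be configured to call.
void *sketch_malloc(std::size_t size) {
    if (g_in_signal_handler.load(std::memory_order_acquire)) {
        return arena_alloc(size); // inside the handler: never touch system malloc
    }
    return std::malloc(size);
}

void sketch_free(void *ptr) {
    if (ptr == nullptr || is_arena_ptr(ptr)) {
        return; // arena memory is never released individually
    }
    std::free(ptr);
}
```

Arena allocations are never freed individually here, on the assumption that the process terminates shortly after the crash handler finishes.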
We currently flush the scope outside the crash handler and then again inside it. The flush outside the handler is only necessary on macOS, and on Windows when a crash is handled via the WER module; on Linux, there is no need to flush the scope into __sentry_event outside the handler.
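The platform split above, combined with the later commit that excludes the in-handler flush from non-Linux/non-Windows builds, can be summarized as compile-time switches. This is a sketch with hypothetical constant names. It uses the compilers' standard platform macros rather than sentry-native's own.

```cpp
// Hypothetical compile-time switches (not sentry-native's actual macros).
// In-handler flush: only where Crashpad invokes a FirstChanceHandler (Linux, Windows).
#if defined(__linux__) || defined(_WIN32)
constexpr bool kFlushInsideHandler = true;
#else
constexpr bool kFlushInsideHandler = false;
#endif

// Outside flush: needed on macOS, and on Windows when the WER module handles the crash.
#if defined(__APPLE__) || defined(_WIN32)
constexpr bool kFlushOutsideHandler = true;
#else
constexpr bool kFlushOutsideHandler = false;
#endif

// Whether a flush requested from the given context should write __sentry_event.
constexpr bool should_flush_scope(bool in_handler) {
    return in_handler ? kFlushInsideHandler : kFlushOutsideHandler;
}
```

On Linux, for example, `should_flush_scope(false)` is false, so the outside path skips the write entirely.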
Scope flushing also introduced an unnecessary shared mutable variable between the two paths (crashpad_backend_flush_scope and the signal handler), which can run on different threads. With this change, each path works on its own local event.
Further, if a scope flush is triggered from a thread other than the signal-handler thread, the two paths can overwrite each other's __sentry_event file, silently discarding any changes that the before_send and on_crash hooks made to the event. The two paths are now fully synchronized as follows:
- crashpad_backend_flush_scope uses a std::atomic<bool> to signal that a flush is in progress
- if the application crashes and our signal handler wants to flush the scope into the crash event, it must wait until any in-progress flush has finished
- the signal handler also signals that the application has crashed, which prevents any further flushes from outside the signal handler
These changes fix #931 (Crashpad crash collection is not thread-safe).
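The handshake in the list above can be sketched with two lock-free atomics. This is a minimal sketch, not the backend's code. `g_flush_in_progress`, `g_crashed`, `Event` and `write_event_file` are hypothetical. The sketch also makes its own design choice: a compare-exchange so that concurrent outside flushes skip rather than overlap. Each path builds its own local event, so the two flags are the only shared state.

```cpp
#include <atomic>

// Only lock-free atomics are safe to use from a signal handler.
static_assert(std::atomic<bool>::is_always_lock_free, "atomic<bool> must be lock-free");

// Hypothetical flags shared by the two paths.
static std::atomic<bool> g_flush_in_progress{false};
static std::atomic<bool> g_crashed{false};

// Hypothetical event type; each path creates its own local instance.
struct Event {
    const char *scope; // no heap allocation, so usable inside the handler
    bool is_crash;
};

// Hypothetical stand-in for serializing an event into __sentry_event.
static void write_event_file(const Event &event) {
    (void)event;
}

// Outside path (in the role of crashpad_backend_flush_scope): may run on any thread.
// Returns true if this call wrote the event file.
bool flush_scope_outside_handler(const char *scope) {
    if (g_crashed.load(std::memory_order_acquire)) {
        return false; // the crash handler owns the event file now
    }
    bool expected = false;
    if (!g_flush_in_progress.compare_exchange_strong(
            expected, true, std::memory_order_acquire)) {
        return false; // another flush or the crash handler holds the flag
    }
    Event local{scope, false};
    write_event_file(local);
    g_flush_in_progress.store(false, std::memory_order_release);
    return true;
}

// Handler path: called once when the application crashes.
void flush_scope_in_handler(const char *scope) {
    // Announce the crash first so new outside flushes bail out early.
    g_crashed.store(true, std::memory_order_release);
    bool expected = false;
    // Spin until any in-progress outside flush has released the flag.
    while (!g_flush_in_progress.compare_exchange_weak(
        expected, true, std::memory_order_acquire)) {
        expected = false;
    }
    // The flag is never cleared again, so no later outside flush can overwrite this.
    Event local{scope, true};
    write_event_file(local);
}
```

Once the handler wins the compare-exchange, it never clears the flag, so later outside flushes return early and cannot overwrite the crash event. There is one caveat in this sketch: if the crash happens on the same thread that is in the middle of an outside flush, the handler would spin forever. A real implementation would need to handle that case separately.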
Last but not least: Crashpad invokes the FirstChanceHandler on every incoming signal (Linux) or unhandled exception (Windows). While the signal handler was already locked on Linux, the Windows handler would be called a second time if another thread also raised an exception. This is a problem because the handler is not written to be invoked more than once before the process terminates. This change adapts the code in the Crashpad client to prevent repeated invocations of the FirstChanceHandler, building on the changes proposed in crashpad#95 (FirstChanceHandler execute once).
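The following minimal sketch shows a run-once guard. It assumes the Crashpad client calls a wrapper instead of calling the handler directly. `invoke_first_chance_handler_once`, `first_chance_handler_body` and the run counter are hypothetical. The real handler signatures also differ between Linux (signal info) and Windows (exception pointers). An atomic exchange lets exactly one thread enter. What later callers return is a design choice of this sketch.

```cpp
#include <atomic>

static_assert(std::atomic<bool>::is_always_lock_free, "the guard must be lock-free");

// Hypothetical counter so the example can observe how often the body runs.
static std::atomic<int> g_handler_runs{0};

// Hypothetical stand-in for the real FirstChanceHandler body.
static bool first_chance_handler_body() {
    g_handler_runs.fetch_add(1, std::memory_order_relaxed);
    return false;
}

static std::atomic<bool> g_first_chance_entered{false};

// exchange() returns the previous value, so only the first caller observes false.
bool invoke_first_chance_handler_once() {
    if (g_first_chance_entered.exchange(true, std::memory_order_acq_rel)) {
        return false; // already entered on some thread: skip the second invocation
    }
    return first_chance_handler_body();
}
```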

supervacuus added 6 commits July 4, 2024 17:20

fix: clean-up scope application for crash-events

cae3be3

explicitly sync handler flushing with atomics

35f0ec8

* reenable flushing on Windows due to WER module
* TODO: std::memory_order_[relaxed|acquire|release]

replace std::atomic<bool> with std::atomic_flag...

562f7c2

...which ensures block-free synchronization on all platforms.

Revert "replace std::atomic<bool> with std::atomic_flag..."

fa9de71

This reverts commit 562f7c2.

Clean up memory order and assert atomics to be lock-free

f56fd60

Update changelog

ccddc71

supervacuus mentioned this pull request Jul 19, 2024

fix: adapt blocking of handlers for Linux and Windows getsentry/crashpad#109

Merged

supervacuus added 2 commits July 20, 2024 11:43

Fix python lint

13bbbfc

Conditionally exclude handler flush from non-linux/-windows builds

e9f139a

Swatinem approved these changes Jul 22, 2024

supervacuus mentioned this pull request Jul 23, 2024

FirstChanceHandler execute once getsentry/crashpad#95

Closed

Update crashpad reference after merging fork PR onto getsentry

c4c613e

supervacuus merged commit e58b655 into master Jul 23, 2024
20 checks passed

supervacuus deleted the fix/crashpad_scope_handling branch July 23, 2024 16:40
