[MOD-3251] Stop heartbeats before sending the container snapshot RPC #2004
Any way to verify or test?
I think you need a pause-and-restart structure here. This looks like it only pauses heartbeats, which means a restored container will be killed for not heartbeating.
…lient into matt/gvisor-flake-fix
Update: I have landed on a solution I am happy with, with almost no overhead (@thundergolfer and @luiscape: I removed the bottleneck I mentioned). Here is a summary:
The following invariants hold:
See comment re disabling heartbeats.
🚢
Heartbeats make a connection to the gRPC proxy (`modal.sock`), which causes the container to open a file descriptor (FD). This open FD causes gVisor to crash during `runsc checkpoint`. I ran the repro over 300 times with 0 flakes.
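The pause-and-restart sequence discussed in this PR can be sketched in Python as follows. This is a minimal illustration, not the modal-client implementation. `HeartbeatLoop`, `snapshot_with_heartbeats_paused`, and the `send_heartbeat`, `close_connections` and `snapshot_rpc` callables are hypothetical stand-ins. The sketch assumes two things. First, stopping the heartbeat task and then calling `close_connections` is enough to release the connection (and FD) to `modal.sock`. Second, in a restored container, execution resumes once the snapshot RPC returns.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class HeartbeatLoop:
    """Hypothetical heartbeat loop that can be stopped and restarted."""

    def __init__(self, send_heartbeat: Callable[[], Awaitable[None]], interval: float) -> None:
        self._send_heartbeat = send_heartbeat
        self._interval = interval
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Must be called from inside a running event loop.
        if self._task is None:
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        # Cancel the loop and wait for it to actually exit, so no heartbeat
        # RPC is still in flight when the caller proceeds.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _run(self) -> None:
        while True:
            await self._send_heartbeat()
            await asyncio.sleep(self._interval)


async def snapshot_with_heartbeats_paused(
    heartbeats: HeartbeatLoop,
    close_connections: Callable[[], Awaitable[None]],
    snapshot_rpc: Callable[[], Awaitable[None]],
) -> None:
    # Pause: stop heartbeating and drop the proxy connection before the
    # snapshot, so no open FD exists while the container is checkpointed.
    await heartbeats.stop()
    await close_connections()
    try:
        await snapshot_rpc()
    finally:
        # Restart: resume heartbeats once the RPC returns (assumed to also be
        # where a restored container continues), so it is not killed.
        heartbeats.start()
```

`stop()` awaits the cancelled task instead of only calling `cancel()`. Cancellation is delivered at the task's next suspension point, so returning early could leave a heartbeat mid-send. The `finally` clause restarts heartbeats even if the snapshot RPC raises.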