[MOD-3251] Stop heartbeats before sending the container snapshot RPC #2004

Merged Jul 16, 2024 (28 commits)
Showing changes from 2 commits
Commits
f3c04c6
Stop heartbeats before snapshotting
mattnappo Jul 12, 2024
41adde8
Merge branch 'main' into matt/gvisor-flake-fix
thundergolfer Jul 12, 2024
4cceb72
Added asyncio.Event for pausing heartbeats
mattnappo Jul 12, 2024
2cdcd60
Merge branch 'matt/gvisor-flake-fix' of github.com:modal-labs/modal-c…
mattnappo Jul 12, 2024
626af1c
Use asyncio.Condition to ensure mutual exclusion
mattnappo Jul 12, 2024
7228652
Fixed unit tests
mattnappo Jul 12, 2024
b0bef65
Merge branch 'main' of github.com:modal-labs/modal-client into matt/g…
mattnappo Jul 15, 2024
86b8fd5
Fixed bug, but now its slow
mattnappo Jul 15, 2024
4c6ed72
Fixed bottleneck
mattnappo Jul 15, 2024
096a399
Remove old logic
mattnappo Jul 15, 2024
bf5400b
Renamed cond var
mattnappo Jul 15, 2024
484e24d
Wrote tests
mattnappo Jul 15, 2024
b6fa509
Wrote better tests
mattnappo Jul 15, 2024
6071891
Try to fix tests
mattnappo Jul 15, 2024
70c49ff
Retry
mattnappo Jul 15, 2024
5b423f5
Undo warning
mattnappo Jul 15, 2024
817709f
Await
mattnappo Jul 15, 2024
3259b2b
Try using async client
mattnappo Jul 15, 2024
3c3209e
Use servicer
mattnappo Jul 15, 2024
a9070ff
Use servicer
mattnappo Jul 15, 2024
153eb0a
Lint
mattnappo Jul 15, 2024
5bb4044
Add sleep in test
mattnappo Jul 15, 2024
91b5665
Reduce sleep time
mattnappo Jul 15, 2024
7e7d692
Merge branch 'main' into matt/gvisor-flake-fix
thundergolfer Jul 16, 2024
74ae84a
Address review
mattnappo Jul 16, 2024
7bbae23
Merge branch 'matt/gvisor-flake-fix' of github.com:modal-labs/modal-c…
mattnappo Jul 16, 2024
c0f5fcb
Run entire restore phase within the lock
mattnappo Jul 16, 2024
87a9036
Merge branch 'main' into matt/gvisor-flake-fix
thundergolfer Jul 16, 2024
3 changes: 3 additions & 0 deletions modal/_container_io_manager.py
@@ -624,6 +624,9 @@ async def memory_snapshot(self) -> None:
        if self.checkpoint_id:
            logger.debug(f"Checkpoint ID: {self.checkpoint_id} (Memory Snapshot ID)")

        # Heartbeats can leave the modal.sock file open, causing gVisor to crash
        self.stop_heartbeat()

        await self._client.stub.ContainerCheckpoint(
            api_pb2.ContainerCheckpointRequest(checkpoint_id=self.checkpoint_id)
        )
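
For readers following the later commits ("Use asyncio.Condition to ensure mutual exclusion", "Run entire restore phase within the lock"), here is a minimal sketch of the general idea: keep a heartbeat from being in flight while the snapshot call runs. It is not the code in this PR. It uses a plain `asyncio.Lock` instead of an `asyncio.Condition`, and every name in it (`HeartbeatSketch`, `heartbeat_loop`, `snapshot`, `demo`) is hypothetical, with sleeps standing in for the real RPCs.

```python
import asyncio


class HeartbeatSketch:
    """Hypothetical stand-in for a container I/O manager; not Modal's class."""

    def __init__(self, interval: float = 0.01) -> None:
        self.interval = interval
        self.beats = 0
        self.beats_during_snapshot = 0
        self._snapshotting = False
        # Held for each individual heartbeat and for the whole snapshot,
        # so the two can never overlap.
        self._lock = asyncio.Lock()

    async def _send_heartbeat(self) -> None:
        # Placeholder for the real heartbeat RPC.
        await asyncio.sleep(0)
        self.beats += 1
        if self._snapshotting:
            self.beats_during_snapshot += 1

    async def heartbeat_loop(self) -> None:
        while True:
            async with self._lock:
                await self._send_heartbeat()
            await asyncio.sleep(self.interval)

    async def snapshot(self) -> None:
        # Acquiring the lock waits for any in-flight heartbeat to finish
        # and blocks new ones until the snapshot is done.
        async with self._lock:
            self._snapshotting = True
            await asyncio.sleep(5 * self.interval)  # placeholder for the snapshot RPC
            self._snapshotting = False


async def demo() -> HeartbeatSketch:
    mgr = HeartbeatSketch()
    task = asyncio.create_task(mgr.heartbeat_loop())
    await asyncio.sleep(3 * mgr.interval)  # let a few heartbeats go out
    await mgr.snapshot()
    await asyncio.sleep(3 * mgr.interval)  # heartbeats resume afterwards
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return mgr
```

Running `asyncio.run(demo())` returns the manager so its counters can be inspected. Compared with simply stopping the heartbeat task before the RPC, as the hunk above does, a lock lets heartbeats resume once the snapshot completes, at the cost of delaying the snapshot until any in-flight heartbeat finishes.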