In-memory tree states #5533

Merged
merged 46 commits into from
Apr 24, 2024

michaelsproul
Member

@michaelsproul michaelsproul commented Apr 8, 2024

Issue Addressed

This is the memory-only portion of tree-states, i.e. without any database schema changes.

Proposed Changes

  • Make BeaconState cheap to clone thanks to persistent data structures from milhouse.
  • Replace the snapshot cache with a new state cache inside the store which holds 32x more states for around the same memory cost 🎉
  • Delete the now-unnecessary block_production_state.
  • Delete StateProcessingStrategy. This could cause us to do more tree hashing in the case of an attester shuffling cache miss, but this is mitigated by 1) tree-states has faster tree hashing due to structural sharing, 2) the state cache is likely to hit even when the shuffling cache misses. It is less risky to remove the concept of inconsistent states altogether so that they can't end up in the new state_cache and poison it. We can re-optimise this codepath in future by adopting the HotStateSummary changes from full tree-states.
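
To illustrate why structural sharing makes clones cheap, here is a minimal sketch. It is not the milhouse implementation (milhouse uses persistent trees); `ToyState`, its chunked layout, and all method names are hypothetical stand-ins that only show the copy-on-write idea using `Arc` from the standard library.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a large state. Data is split into
/// reference-counted chunks, so cloning copies pointers rather than data.
#[derive(Clone)]
struct ToyState {
    chunks: Vec<Arc<Vec<u64>>>,
}

impl ToyState {
    /// Build a state of `num_chunks` zeroed chunks of `chunk_len` entries each.
    fn new(num_chunks: usize, chunk_len: usize) -> Self {
        let chunks = (0..num_chunks)
            .map(|_| Arc::new(vec![0u64; chunk_len]))
            .collect();
        ToyState { chunks }
    }

    /// Copy-on-write update: only the chunk containing `index` is duplicated,
    /// and only if that chunk is currently shared with another state.
    fn set(&mut self, index: usize, value: u64) {
        let chunk_len = self.chunks[0].len();
        let chunk = Arc::make_mut(&mut self.chunks[index / chunk_len]);
        chunk[index % chunk_len] = value;
    }

    /// Read the entry at `index`.
    fn get(&self, index: usize) -> u64 {
        let chunk_len = self.chunks[0].len();
        self.chunks[index / chunk_len][index % chunk_len]
    }

    /// Count the chunks that are physically shared with `other`.
    fn shared_chunks(&self, other: &ToyState) -> usize {
        self.chunks
            .iter()
            .zip(&other.chunks)
            .filter(|(a, b)| Arc::ptr_eq(a, b))
            .count()
    }
}
```

In this toy model, cloning an 8-chunk state and then writing one entry in the clone leaves 7 of the 8 chunks shared with the original. That sharing is the property that lets a cache hold many more states for roughly the same memory.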

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress optimization Something to make Lighthouse run more efficiently. tree-states Upcoming state and database overhaul labels Apr 8, 2024
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Apr 23, 2024
@michaelsproul
Member Author

I pushed a commit 970f3df which seems to have made memory usage less spiky on a restart (see 12:30 in chart). The problem previously was that the head state would not get cached in the state_cache on startup, creating the possibility of a cache miss and a reload from disk, which would increase memory usage due to the lack of full structural sharing.

[Image: spikeyness]
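
For readers following along, the startup fix can be sketched roughly as below. This is not the Lighthouse `state_cache`; `ToyStateCache`, `get_or_load`, the FIFO eviction policy and the `misses` counter are all assumptions made for illustration. The point is only that inserting the head state at startup means the first lookup returns the already-shared `Arc` instead of loading a fresh, unshared copy from disk.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Hypothetical 32-byte state root used as the cache key.
type StateRoot = [u8; 32];

/// Toy bounded cache with FIFO eviction (the real eviction policy may differ).
struct ToyStateCache<S> {
    capacity: usize,
    states: HashMap<StateRoot, Arc<S>>,
    order: VecDeque<StateRoot>, // oldest root first
    misses: usize,
}

impl<S> ToyStateCache<S> {
    fn new(capacity: usize) -> Self {
        ToyStateCache {
            capacity,
            states: HashMap::new(),
            order: VecDeque::new(),
            misses: 0,
        }
    }

    /// Insert a state, evicting the oldest entry once over capacity.
    fn insert(&mut self, root: StateRoot, state: Arc<S>) {
        if self.states.insert(root, state).is_none() {
            self.order.push_back(root);
            if self.order.len() > self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.states.remove(&oldest);
                }
            }
        }
    }

    /// Return the cached state, or call `load` (standing in for a disk read)
    /// on a miss and cache the result.
    fn get_or_load(&mut self, root: StateRoot, load: impl FnOnce() -> S) -> Arc<S> {
        if let Some(state) = self.states.get(&root) {
            return Arc::clone(state);
        }
        self.misses += 1;
        let state = Arc::new(load());
        self.insert(root, Arc::clone(&state));
        state
    }
}
```

Priming the cache with `cache.insert(head_root, head_state)` during startup means a later `get_or_load(head_root, ...)` never invokes the loader, so no second unshared copy of the head state is created.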

The memory usage is still a bit spikier than I'd like, and this is despite a complete absence of cache misses. This may be memory usage unrelated to the state cache.

Even so, tree-states is using consistently less memory with lower spikes than stable:

[Image: bn_memory_comparison]

The yellow line at the bottom is this branch, the rest are running variants of stable/unstable.

Member

@realbigsean realbigsean left a comment

🚀

@realbigsean
Member

@mergify queue

mergify bot commented Apr 23, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #5533 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Apr 23, 2024
@@ -4529,10 +4404,11 @@ impl<T: BeaconChainTypes> BeaconChain<T> {
                 let block = self
                     .get_blinded_block(&parent_block_root)?
                     .ok_or(Error::MissingBeaconBlock(parent_block_root))?;
-                let state = self
-                    .get_state(&block.state_root(), Some(block.slot()))?
+                let (state_root, state) = self
Member

I think the log above needs to be updated to remove snapshot cache:

info!(
    self.log,
    "Missed snapshot cache during withdrawals calculation";
    "slot" => proposal_slot,
    "parent_block_root" => ?parent_block_root
);

@jimmygchen
Member

Some minor comments:

@realbigsean
Member

@mergify dequeue

mergify bot commented Apr 24, 2024

dequeue

✅ The pull request has been removed from the queue default

@realbigsean
Member

sorry @jimmygchen didn't realize you were mid-review!

@jimmygchen
Member

All good, happy to merge this as it's gone through some testing and you've already done a round of review!
I'm just going through the changes to help me understand them, and it'll probably take me a while. I haven't found anything major that's worth stopping the merge, so I think we're good to merge and address further comments in a separate PR?

I'd be keen to get this in so that:

  • we avoid further code conflicts
  • others working on unstable can help discover potential issues not yet identified

@realbigsean
Member

@jimmygchen has expressed offline he's good with merging and allowing feedback resolution in a followup PR so as to avoid conflicts

@realbigsean
Member

@mergify queue

mergify bot commented Apr 24, 2024

queue

🛑 The pull request has been removed from the queue default

Pull request #5533 has been dequeued by a dequeue command.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@realbigsean
Member

@mergify requeue

mergify bot commented Apr 24, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

mergify bot commented Apr 24, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 6196289

mergify bot added a commit that referenced this pull request Apr 24, 2024
@mergify mergify bot merged commit 6196289 into unstable Apr 24, 2024
27 checks passed
@mergify mergify bot deleted the tree-states-memory branch April 24, 2024 01:22
    }
    set_gauge_by_usize(
        &BLOCK_PROCESSING_SNAPSHOT_CACHE_SIZE,
        beacon_chain.store.state_cache_len(),
Member

Can we rename the metric to BLOCK_PROCESSING_STATE_CACHE_SIZE instead?
