Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stricter match of BlockError in sync #6321

Open
wants to merge 1 commit into
base: peerdas-peergroup-scoring-only
Choose a base branch
from

Conversation

dapplion
Copy link
Collaborator

@dapplion dapplion commented Aug 28, 2024

Issue Addressed

Builds on

I split the original scope of the PR into

  1. Specific peer attributability -> Attribute invalid column proof error to correct peer #6377
  2. Stricter match of BlockError in sync -> this PR, so we can keep @pawanjay176 's review

Proposed Changes

Sync needs act on an error and decide:

  1. Should I penalize a peer for this error?
  2. Which peer should I penalize?
  3. Should I re-download this block component?

The title of this PR points to 1+2, but 3 is an important decision that should be handled in the same mega-match statement. Currently sync has catch-all match statements on errors which leave ample space for ambiguity and issues like this when functionality expands. I think it's better to have an explicit and long match statement that answers 1+2+3 in a single place, and sync is more "dumb".

⚠️ 🫱 please help me make sure that I mapping each error variant correctly.

for peer in peer_group.of_index(index) {
cx.report_peer(
*peer,
PeerAction::MidToleranceError,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should up this to LowToleranceError

);
let recoverable = match other.rpc_scoring() {
ErrorCategory::Internal { recoverable } => {
if matches!(other, BlockError::ExecutionPayloadError(_)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept this extra condition from the original match as I imagine that when EL goes offline the node would spam non-actionable errors

// The peer has nothing to do with this error, do not penalize them.
ExecutionPayloadError::RequestFailed(_) => ErrorCategory::internal_recoverable(),
// Execution payload is invalid
ExecutionPayloadError::RejectedByExecutionEngine { .. }
Copy link
Collaborator Author

@dapplion dapplion Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If EL says the payload is invalid we MUST downscore this peer if fetched from RPC right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the peer has already imported the block into fork choice if we are fetching it over rpc

@@ -428,6 +472,30 @@ impl ExecutionPayloadError {
ExecutionPayloadError::UnverifiedNonOptimisticCandidate => false,
}
}

pub fn penalize_rpc_peer(&self) -> ErrorCategory {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this match statements be in beacon_chain on in sync / network?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that we are using this specifically to recover from errors in block_lookups, I think it should be in block lookups, but can be convinced either way

@pawanjay176 pawanjay176 added the under-review A reviewer has only partially completed a review. label Aug 29, 2024
@@ -778,7 +778,7 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
})
}
ref err @ BlockError::ExecutionPayloadError(ref epe) => {
if !epe.penalize_peer() {
if !epe.penalize_gossip_peer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !epe.penalize_gossip_peer() {
if !epe.penalize_rpc_peer() {

Error::KzgNotInitialized => ErrorCategory::internal_non_recoverable(),
Error::SszTypes(_) => ErrorCategory::internal_non_recoverable(),
// A ChainSegment RpcBlock does not include the expected blobs or columns
Error::MissingBlobs | Error::MissingCustodyColumns => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't these be malicious? A peer sent us a response to a rpc request which has missing blobs for the same block they sent

ErrorCategory::internal_recoverable()
}
// Assume these errors to be recoverable
Error::StoreError(_)
| Error::DecodeError(_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe decode errors are malicious too? I couldn't track down the exact place where we would return this error but seems like a peer issue

ErrorCategory::internal_recoverable()
}
// Assume these errors to be recoverable
Error::StoreError(_)
| Error::DecodeError(_)
| Error::Unexpected
| Error::ParentStateMissing(_)
| Error::BlockReplayError(_)
| Error::UnableToDetermineImportRequirement
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this variant isn't used anywhere, can delete

@@ -137,6 +137,15 @@ impl PeerGroup {
pub fn all(&self) -> impl Iterator<Item = &PeerId> + '_ {
self.peers.keys()
}
pub fn of_index(&self, index: usize) -> impl Iterator<Item = &PeerId> + '_ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not completely sure why we need an iterator here. Would there be a scenario where you get > 1 peer responsible for a single index here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harder to model with the type here, should we just return the first match? Or enforce uniqueness on the reverse index -> peer mapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First match seems good to me

Self::BlockIsNotLaterThanParent { .. } => ErrorCategory::malicious_non_recoverable(),
// TODO: Not clear what to set here
Self::NonLinearParentRoots => ErrorCategory::malicious_recoverable(),
Self::NonLinearSlots => ErrorCategory::malicious_recoverable(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Self::NonLinearSlots => ErrorCategory::malicious_recoverable(),
// TODO: Ensure the proposer signature is not part of this error set
Self::PerBlockProcessingError(_) => ErrorCategory::malicious_non_recoverable(),
// TODO: Are we sure all BeaconChainError are internal?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is a fair assumption

ErrorCategory::malicious_non_recoverable()
}
// TODO: Check if correct
Self::Slashable => ErrorCategory::malicious_non_recoverable(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a proposer making a slashable block, neither the peer's fault nor is it our fault. Maybe needs a special case

// The peer has nothing to do with this error, do not penalize them.
ExecutionPayloadError::RequestFailed(_) => ErrorCategory::internal_recoverable(),
// Execution payload is invalid
ExecutionPayloadError::RejectedByExecutionEngine { .. }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the peer has already imported the block into fork choice if we are fetching it over rpc

@@ -199,9 +199,11 @@ impl<T: BeaconChainTypes> DataAvailabilityChecker<T> {
.now_duration()
.ok_or(AvailabilityCheckError::SlotClockError)?;

// Note: currenlty not reporting which specific blob is invalid because we fetch all blobs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Note: currenlty not reporting which specific blob is invalid because we fetch all blobs
// Note: currently not reporting which specific blob is invalid because we fetch all blobs

@dapplion dapplion changed the base branch from unstable to peerdas-peergroup-scoring-only September 9, 2024 11:55
@dapplion dapplion changed the title Attribute invalid column proof error to correct peer Stricter match of BlockError in sync Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
under-review A reviewer has only partially completed a review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants