Stricter match of BlockError in sync #6321
base: peerdas-peergroup-scoring-only
Conversation
for peer in peer_group.of_index(index) {
    cx.report_peer(
        *peer,
        PeerAction::MidToleranceError,
I think we should up this to LowToleranceError
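For context, a minimal sketch of what "upping" the action means; the variant names follow the diff, but the score deltas are illustrative, not Lighthouse's actual constants. "Low tolerance" is the harsher penalty, so it bans a misbehaving peer sooner.

```rust
// Hypothetical score deltas: "low tolerance" == least forgiving.
#[derive(Debug, Clone, Copy)]
enum PeerAction {
    LowToleranceError,
    MidToleranceError,
    HighToleranceError,
}

impl PeerAction {
    // Illustrative reputation penalty applied to the peer's score.
    fn score_delta(self) -> f64 {
        match self {
            PeerAction::LowToleranceError => -20.0,
            PeerAction::MidToleranceError => -10.0,
            PeerAction::HighToleranceError => -1.0,
        }
    }
}
```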
);
let recoverable = match other.rpc_scoring() {
    ErrorCategory::Internal { recoverable } => {
        if matches!(other, BlockError::ExecutionPayloadError(_)) {
I kept this extra condition from the original match, as I imagine that when the EL goes offline the node would otherwise spam non-actionable errors.
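A simplified sketch of the intent behind the kept condition (types and behavior are assumptions from the diff, not the actual implementation): when the EL is offline, every block fails with an internal `ExecutionPayloadError`, so treating that case specially avoids hot-looping retries and spamming errors the operator cannot act on.

```rust
// Simplified stand-in for the real BlockError (assumption for illustration).
#[derive(Debug)]
enum BlockError {
    ExecutionPayloadError(String),
    Other(String),
}

// Even if an internal error is nominally recoverable, an EL failure likely
// means the EL is offline, so retrying immediately is pointless.
fn should_retry(err: &BlockError, recoverable: bool) -> bool {
    match err {
        BlockError::ExecutionPayloadError(_) => false,
        _ => recoverable,
    }
}
```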
// The peer has nothing to do with this error, do not penalize them.
ExecutionPayloadError::RequestFailed(_) => ErrorCategory::internal_recoverable(),
// Execution payload is invalid
ExecutionPayloadError::RejectedByExecutionEngine { .. }
If the EL says the payload is invalid, we MUST downscore this peer if the block was fetched over RPC, right?
Yes. If we are fetching the block over RPC, the peer has already imported it into fork choice.
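A reduced sketch of the distinction being discussed (the variant set is simplified and the real Lighthouse types differ): a peer serving a block over RPC has already imported it into its fork choice, so an EL-invalid payload is attributable to that peer, while a failed request to our own EL is not.

```rust
// Two representative variants only; the real enum has more (assumption).
enum ExecutionPayloadError {
    RequestFailed(String),
    RejectedByExecutionEngine,
}

impl ExecutionPayloadError {
    fn penalize_rpc_peer(&self) -> bool {
        match self {
            // Our EL connection failed; the peer is not at fault.
            ExecutionPayloadError::RequestFailed(_) => false,
            // The peer served an EL-invalid block over RPC: downscore it.
            ExecutionPayloadError::RejectedByExecutionEngine => true,
        }
    }
}
```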
@@ -428,6 +472,30 @@ impl ExecutionPayloadError {
        ExecutionPayloadError::UnverifiedNonOptimisticCandidate => false,
    }
}

pub fn penalize_rpc_peer(&self) -> ErrorCategory { |
Should these match statements live in beacon_chain or in sync / network?
Considering that we are using this specifically to recover from errors in block_lookups, I think it should be in block lookups, but can be convinced either way
Force-pushed from a2ffbd2 to 5a69a7b.
@@ -778,7 +778,7 @@ impl<T: BeaconChainTypes> NetworkBeaconProcessor<T> {
            })
        }
        ref err @ BlockError::ExecutionPayloadError(ref epe) => {
-           if !epe.penalize_peer() {
+           if !epe.penalize_gossip_peer() {
Suggested change:
-           if !epe.penalize_gossip_peer() {
+           if !epe.penalize_rpc_peer() {
Error::KzgNotInitialized => ErrorCategory::internal_non_recoverable(),
Error::SszTypes(_) => ErrorCategory::internal_non_recoverable(),
// A ChainSegment RpcBlock does not include the expected blobs or columns
Error::MissingBlobs | Error::MissingCustodyColumns => {
Wouldn't these be malicious? The peer sent us an RPC response that is missing blobs for a block they themselves sent.
    ErrorCategory::internal_recoverable()
}
// Assume these errors to be recoverable
Error::StoreError(_)
| Error::DecodeError(_)
Maybe decode errors are malicious too? I couldn't track down the exact place where we would return this error, but it seems like a peer issue.
    ErrorCategory::internal_recoverable()
}
// Assume these errors to be recoverable
Error::StoreError(_)
| Error::DecodeError(_)
| Error::Unexpected
| Error::ParentStateMissing(_)
| Error::BlockReplayError(_)
| Error::UnableToDetermineImportRequirement
Seems like this variant isn't used anywhere, can delete
@@ -137,6 +137,15 @@ impl PeerGroup {
    pub fn all(&self) -> impl Iterator<Item = &PeerId> + '_ {
        self.peers.keys()
    }

    pub fn of_index(&self, index: usize) -> impl Iterator<Item = &PeerId> + '_ {
Not completely sure why we need an iterator here. Would there be a scenario where you get > 1 peer responsible for a single index here?
It's harder to model with the type here. Should we just return the first match? Or enforce uniqueness on the reverse index -> peer mapping?
First match seems good to me
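A sketch of the "first match" option agreed on above, assuming `PeerGroup` stores a `PeerId` -> column-indices map (the field layout and `PeerId` type here are hypothetical simplifications):

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real network types (assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct PeerId(u64);

struct PeerGroup {
    // Which column indices each peer in the group served.
    peers: HashMap<PeerId, Vec<usize>>,
}

impl PeerGroup {
    /// Return the first peer responsible for `index`, if any,
    /// instead of an iterator over all matches.
    fn of_index(&self, index: usize) -> Option<&PeerId> {
        self.peers
            .iter()
            .find(|(_, indices)| indices.contains(&index))
            .map(|(peer, _)| peer)
    }
}
```

Returning `Option<&PeerId>` makes the "at most one responsible peer" expectation explicit at the type level, at the cost of silently dropping extra matches if the mapping is ever non-unique.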
Self::BlockIsNotLaterThanParent { .. } => ErrorCategory::malicious_non_recoverable(),
// TODO: Not clear what to set here
Self::NonLinearParentRoots => ErrorCategory::malicious_recoverable(),
Self::NonLinearSlots => ErrorCategory::malicious_recoverable(),
same as above
Self::NonLinearSlots => ErrorCategory::malicious_recoverable(),
// TODO: Ensure the proposer signature is not part of this error set
Self::PerBlockProcessingError(_) => ErrorCategory::malicious_non_recoverable(),
// TODO: Are we sure all BeaconChainError are internal?
I think that is a fair assumption
    ErrorCategory::malicious_non_recoverable()
}
// TODO: Check if correct
Self::Slashable => ErrorCategory::malicious_non_recoverable(),
This is a proposer making a slashable block; it's neither the peer's fault nor ours. Maybe it needs a special case.
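One hypothetical shape for that special case (an assumption, not the PR's actual API): a third category for proposer misbehavior, so a slashable block neither penalizes the serving peer nor triggers a retry.

```rust
// Hypothetical extension of the two-category shape used in this PR.
enum ErrorCategory {
    Internal { recoverable: bool },
    Malicious { recoverable: bool },
    /// The block's proposer misbehaved (e.g. a slashable block); the serving
    /// peer relayed it faithfully, so do not penalize them and do not retry.
    ProposerFault,
}

impl ErrorCategory {
    fn penalize_peer(&self) -> bool {
        matches!(self, ErrorCategory::Malicious { .. })
    }
}
```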
@@ -199,9 +199,11 @@ impl<T: BeaconChainTypes> DataAvailabilityChecker<T> {
            .now_duration()
            .ok_or(AvailabilityCheckError::SlotClockError)?;

// Note: currenlty not reporting which specific blob is invalid because we fetch all blobs |
Suggested change:
-       // Note: currenlty not reporting which specific blob is invalid because we fetch all blobs
+       // Note: currently not reporting which specific blob is invalid because we fetch all blobs
Issue Addressed
Builds on
I split the original scope of the PR into
Proposed Changes
Sync needs to act on an error and decide:

1. Whether a peer should be penalized for the error (malicious vs internal).
2. Which specific peer of the responding peer group to penalize.
3. Whether the error is recoverable, i.e. whether sync should retry the request.
The title of this PR points to 1+2, but 3 is an important decision that should be handled in the same mega-match statement. Currently sync has catch-all match statements on errors which leave ample space for ambiguity and issues like this when functionality expands. I think it's better to have an explicit and long match statement that answers 1+2+3 in a single place, and sync is more "dumb".
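A sketch of the `ErrorCategory` shape implied by the diff hunks above (the constructor names are taken from the diff; the exact method set is an assumption). One category answers the penalization question via its variant and the retry question via `recoverable`, which is what lets sync stay "dumb":

```rust
#[derive(Debug, Clone, Copy)]
enum ErrorCategory {
    // Our node's fault: never penalize peers.
    Internal { recoverable: bool },
    // Attributable to the sending peer: penalize.
    Malicious { recoverable: bool },
}

impl ErrorCategory {
    fn internal_recoverable() -> Self { Self::Internal { recoverable: true } }
    fn internal_non_recoverable() -> Self { Self::Internal { recoverable: false } }
    fn malicious_recoverable() -> Self { Self::Malicious { recoverable: true } }
    fn malicious_non_recoverable() -> Self { Self::Malicious { recoverable: false } }

    /// Should the responding peer be penalized?
    fn penalize_peer(&self) -> bool {
        matches!(self, Self::Malicious { .. })
    }

    /// Should sync retry the request (with another peer)?
    fn is_recoverable(&self) -> bool {
        match self {
            Self::Internal { recoverable } | Self::Malicious { recoverable } => *recoverable,
        }
    }
}
```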