Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consensus] Active Pacemaker, TimeoutAggregator interface changes #2155

Merged

Conversation

durkmurder
Copy link
Member

@durkmurder durkmurder commented Mar 16, 2022

https://github.com/dapperlabs/flow-go/issues/6188

Context

This PR introduces TimeoutAggregator interface which is very similar to VoteAggregator and serves same purposes.

  • Introduced new interface TimeoutAggregator
  • Integrated TimeoutAggregator into both compliance cores
  • Test suit will be extended/implemented in implementation issue
  • Added new data structures

consensus/hotstuff/timeout_aggregator.go Outdated Show resolved Hide resolved
consensus/hotstuff/timeout_aggregator.go Outdated Show resolved Hide resolved
View: timeout.View,
HighestQC: timeout.HighestQC,
LastViewTC: timeout.LastViewTC,
SignerID: originID,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's always safe to assume the message sender was the creator/signer of the timeout object. For example, if I am synchronizing with another node N, I should receive all the TOs that N knows about. N will send me all these TOs, including TOs it didn't sign.

durkmurder and others added 2 commits March 24, 2022 14:53
Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
consensus/hotstuff/timeout_aggregator.go Outdated Show resolved Hide resolved
consensus/hotstuff/timeout_aggregator.go Outdated Show resolved Hide resolved
@@ -300,6 +301,21 @@ func (c *Core) OnBlockVote(originID flow.Identifier, vote *messages.ClusterBlock
return nil
}

func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️

It is an intrinsically unsafe pattern to let the sender of a message specify their own identity. (Lying about their own identity = impersonation attack). On the wire, SignerID should not be transmitted. Our networking layer cryptographically authenticates the message's origin.

If you could replicate the pattern from messages.BlockVote that would be great:

  • messages.BlockVote is the on-the-wire format, without any SignerID
  • the compliance engine then converts to the message to hotstuff's model.Vote structure and fills in the authenticated value (-> code)
Suggested change
func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {
func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *messages.TimeoutObject) error {

Copy link
Member Author

@durkmurder durkmurder Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done this pattern but then I've encountered a problem.
To build model.Vote from messages.BlockVote we do:

	v := &model.Vote{
		View:     vote.View,
		BlockID:  vote.BlockID,
		SignerID: originID,
		SigData:  vote.SigData,
	}

because sender of messages.BlockVote(originID) is always voter, but timeout object can be relayed from other nodes, not necessary by creator, that is why I did this way. What is the correct way to deal with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this resulted from my suggestion to make TOs able to be synchronized between nodes which did not create the TO (based on Data Synchronization section of the architecture doc). To enable nodes A and B being able to synchronize all TOs (including from another node C), the TO needs to include its signer ID in the message.

However:

  • we don't a concrete plan to implement synchronizing TOs in the way specified by the Architecture Doc
  • instead, we have discussed having nodes periodically re-broadcast their own TOs, which largely fulfills the same role. This way, TOs can still be handled like votes and the message sender can always be assumed to be the TO signer.

So, I think it makes sense to use origin ID as signer ID for TOs, like we do for votes (Alex's suggestion).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original motivation for this was Libra/Diem implementation, they have dedicated sync mechanism for syncing TOs. Of course we could rely on broadcast but having a way to sync all known TOs to some replica seems a useful thing to have, especially after crash. I don't have a strong opinion but we need to make a decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that there is an implicit assumption here that might be worth-while to be explicit about:

  • I feel there are three (potentially different) modes in which we could relay timeout objects.
    1. A node broadcasting its own timeout message for the first time.
    2. A node's application layer forwarding a timeout object that it received earlier from a different node. (This would be similar to the way we synchronize blocks. Node A created the block. Node B received the block and its application layer stored it. At some arbitrary point in time, B's application layer forwards the block to node C)
    3. A node's application re-sending its own timeout object to some node upon request. (This would be similar to the way we re-request result approvals. A consensus node checks for the result approvals for each chunk and messages the verifiers whose result approval is still missing. Here, the data structure is always sent from the original creator).
  • Do we desire that timeout messages for all three modes have exactly the same wire-format?

Depending how we answer this question, the engineering complexity increases in different part of the code base.

  • Approach A: If we always use the same wire format, we have the lowest implementation complexity in the low-level message handling code. However, we increase the complexity of out attack mediation code:

    • If we are not careful, we might be vulnerable to an impersonation attack in mode 1.
    • Even if we are not vulnerable, we still require logic to attribute, challenge and slash an impersonation attempt.
  • Approach B: Alternatively, we could introduce different wire messages for the different communication modes:

    • In mode 1, we don't need an origin ID. If the SigData is broken, we slash the the sender using the message 'envelope' as proof. Mode 3 is exactly the same as mode 1, except that we also need a nonce in the response to work around the networking layer's message de-duplication.
    • In mode 2, we expect that the relayer only forwards valid TOs. Here, the relayer needs to include the SignerID, but if something is broken, we challenge the relayer.

    Here, we have increased the complexity of the low-level message handling code, because we have different wire formats for the different modes. Though, we have decreased implementation complexity in the security-critical parts of the code, which detect and handle byzantine inputs, be removing the possibility of an impersonation attack.

From my perspective, we would need all the checks and slashing logic from Approach B either way, even if we followed approach A. Just that for Approach A, we additionally also need to worry about the impersonation attack, which is impossible in approach B.

@@ -354,6 +355,20 @@ func (c *Core) OnBlockVote(originID flow.Identifier, vote *messages.BlockVote) e
return nil
}

func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same ⚠️

On the wire, SignerID should not be transmitted. Furthermore, It would be great to have different message types for main consensus vs collector consensus; again following the same patter as messages.BlockVote. You could for instance introduce message.TimeoutObject and message.CollectorTimeout

Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>
Base automatically changed from yurii/6187-safety-data-interface-changes to feature/active-pacemaker April 15, 2022 08:37
Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interfaces look great. While we haven't decided whether or not to include the SignerID in the TimeoutObject's wire-format, I feel this discussion is probably beyond the scope of this PR.

Copy link
Member

@jordanschalm jordanschalm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one suggested naming consistency change, otherwise looks good. Thank you!

engine/collection/compliance/core.go Outdated Show resolved Hide resolved
engine/consensus/compliance/core.go Outdated Show resolved Hide resolved
@durkmurder
Copy link
Member Author

bors merge

@bors
Copy link
Contributor

bors bot commented Apr 26, 2022

Build succeeded:

@bors bors bot merged commit 662cdce into feature/active-pacemaker Apr 26, 2022
@bors bors bot deleted the yurii/6188-timeout-aggregator-interface-changes branch April 26, 2022 22:26
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants