[Consensus] Active Pacemaker, `TimeoutAggregator` interface changes #2155

durkmurder · 2022-03-16T18:48:34Z

https://github.com/dapperlabs/flow-go/issues/6188

Context

This PR introduces TimeoutAggregator interface which is very similar to VoteAggregator and serves same purposes.

Introduced new interface TimeoutAggregator
Integrated TimeoutAggregator into both compliance cores
Test suit will be extended/implemented in implementation issue
Added new data structures

consensus/hotstuff/timeout_aggregator.go

jordanschalm · 2022-03-22T19:48:26Z

engine/consensus/compliance/core.go

+		View:       timeout.View,
+		HighestQC:  timeout.HighestQC,
+		LastViewTC: timeout.LastViewTC,
+		SignerID:   originID,


I'm not sure it's always safe to assume the message sender was the creator/signer of the timeout object. For example, if I am synchronizing with another node N, I should receive all the TOs that N knows about. N will send me all these TOs, including TOs it didn't sign.

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

consensus/hotstuff/timeout_aggregator.go

AlexHentschel · 2022-04-14T03:42:07Z

engine/collection/compliance/core.go

@@ -300,6 +301,21 @@ func (c *Core) OnBlockVote(originID flow.Identifier, vote *messages.ClusterBlock
 	return nil
 }

+func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {


⚠️ ❗

It is an intrinsically unsafe pattern to let the sender of a message specify their own identity. (Lying about their own identity = impersonation attack). On the wire, SignerID should not be transmitted. Our networking layer cryptographically authenticates the message's origin.

If you could replicate the pattern from messages.BlockVote that would be great:

messages.BlockVote is the on-the-wire format, without any SignerID

the compliance engine then converts to the message to hotstuff's model.Vote structure and fills in the authenticated value (-> code)

Suggested change

func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {

func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *messages.TimeoutObject) error {

I've done this pattern but then I've encountered a problem.
To build model.Vote from messages.BlockVote we do:

v := &model.Vote{ View: vote.View, BlockID: vote.BlockID, SignerID: originID, SigData: vote.SigData, }

because sender of messages.BlockVote(originID) is always voter, but timeout object can be relayed from other nodes, not necessary by creator, that is why I did this way. What is the correct way to deal with this?

I think this resulted from my suggestion to make TOs able to be synchronized between nodes which did not create the TO (based on Data Synchronization section of the architecture doc). To enable nodes A and B being able to synchronize all TOs (including from another node C), the TO needs to include its signer ID in the message.

However:

we don't a concrete plan to implement synchronizing TOs in the way specified by the Architecture Doc

instead, we have discussed having nodes periodically re-broadcast their own TOs, which largely fulfills the same role. This way, TOs can still be handled like votes and the message sender can always be assumed to be the TO signer.

So, I think it makes sense to use origin ID as signer ID for TOs, like we do for votes (Alex's suggestion).

My original motivation for this was Libra/Diem implementation, they have dedicated sync mechanism for syncing TOs. Of course we could rely on broadcast but having a way to sync all known TOs to some replica seems a useful thing to have, especially after crash. I don't have a strong opinion but we need to make a decision.

@AlexHentschel @jordanschalm

I feel that there is an implicit assumption here that might be worth-while to be explicit about:

I feel there are three (potentially different) modes in which we could relay timeout objects.

A node broadcasting its own timeout message for the first time.

A node's application layer forwarding a timeout object that it received earlier from a different node. (This would be similar to the way we synchronize blocks. Node A created the block. Node B received the block and its application layer stored it. At some arbitrary point in time, B's application layer forwards the block to node C)

A node's application re-sending its own timeout object to some node upon request. (This would be similar to the way we re-request result approvals. A consensus node checks for the result approvals for each chunk and messages the verifiers whose result approval is still missing. Here, the data structure is always sent from the original creator).

Do we desire that timeout messages for all three modes have exactly the same wire-format?

Depending how we answer this question, the engineering complexity increases in different part of the code base.

Approach A: If we always use the same wire format, we have the lowest implementation complexity in the low-level message handling code. However, we increase the complexity of out attack mediation code:

If we are not careful, we might be vulnerable to an impersonation attack in mode 1.

Even if we are not vulnerable, we still require logic to attribute, challenge and slash an impersonation attempt.

Approach B: Alternatively, we could introduce different wire messages for the different communication modes:

In mode 1, we don't need an origin ID. If the SigData is broken, we slash the the sender using the message 'envelope' as proof. Mode 3 is exactly the same as mode 1, except that we also need a nonce in the response to work around the networking layer's message de-duplication.

In mode 2, we expect that the relayer only forwards valid TOs. Here, the relayer needs to include the SignerID, but if something is broken, we challenge the relayer.

Here, we have increased the complexity of the low-level message handling code, because we have different wire formats for the different modes. Though, we have decreased implementation complexity in the security-critical parts of the code, which detect and handle byzantine inputs, be removing the possibility of an impersonation attack.

From my perspective, we would need all the checks and slashing logic from Approach B either way, even if we followed approach A. Just that for Approach A, we additionally also need to worry about the impersonation attack, which is impossible in approach B.

AlexHentschel · 2022-04-14T04:02:11Z

engine/consensus/compliance/core.go

@@ -354,6 +355,20 @@ func (c *Core) OnBlockVote(originID flow.Identifier, vote *messages.BlockVote) e
 	return nil
 }

+func (c *Core) OnTimeoutObject(originID flow.Identifier, timeout *model.TimeoutObject) error {


same ⚠️ ❗

On the wire, SignerID should not be transmitted. Furthermore, It would be great to have different message types for main consensus vs collector consensus; again following the same patter as messages.BlockVote. You could for instance introduce message.TimeoutObject and message.CollectorTimeout

Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>

…gator-interface-changes

AlexHentschel

The interfaces look great. While we haven't decided whether or not to include the SignerID in the TimeoutObject's wire-format, I feel this discussion is probably beyond the scope of this PR.

jordanschalm

Added one suggested naming consistency change, otherwise looks good. Thank you!

engine/collection/compliance/core.go

engine/consensus/compliance/core.go

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

durkmurder · 2022-04-26T22:04:50Z

bors merge

bors · 2022-04-26T22:26:18Z

Build succeeded:

durkmurder added 2 commits March 16, 2022 20:26

Added TimeoutAggregator interface

eae00b9

Integrated TimeoutAggregator into compliance engine

1f6e77f

durkmurder assigned zhangchiqing Mar 16, 2022

durkmurder requested a review from AlexHentschel as a code owner March 16, 2022 18:48

durkmurder assigned jordanschalm Mar 16, 2022

durkmurder requested review from zhangchiqing and jordanschalm as code owners March 16, 2022 18:48

durkmurder assigned AlexHentschel Mar 16, 2022

jordanschalm reviewed Mar 22, 2022

View reviewed changes

durkmurder and others added 2 commits March 24, 2022 14:53

Apply suggestions from code review

63e1eff

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

Removed messages.TimeoutObject

c649841

AlexHentschel reviewed Apr 14, 2022

View reviewed changes

Apply suggestions from code review

cfed502

Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>

Base automatically changed from yurii/6187-safety-data-interface-changes to feature/active-pacemaker April 15, 2022 08:37

Merge branch 'feature/active-pacemaker' into yurii/6188-timeout-aggre…

a6b3c79

…gator-interface-changes

AlexHentschel approved these changes Apr 26, 2022

View reviewed changes

Reverted messages.TimeoutObject change

465ac8c

jordanschalm approved these changes Apr 26, 2022

View reviewed changes

engine/collection/compliance/core.go Outdated Show resolved Hide resolved

engine/consensus/compliance/core.go Outdated Show resolved Hide resolved

durkmurder and others added 2 commits April 26, 2022 22:34

Apply suggestions from code review

946c71b

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>

Changed to ClusterTimeoutObject for collection node

e760368

bors bot merged commit 662cdce into feature/active-pacemaker Apr 26, 2022

bors bot deleted the yurii/6188-timeout-aggregator-interface-changes branch April 26, 2022 22:26

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Consensus] Active Pacemaker, `TimeoutAggregator` interface changes #2155

[Consensus] Active Pacemaker, `TimeoutAggregator` interface changes #2155

durkmurder commented Mar 16, 2022 •

edited

Loading

jordanschalm Mar 22, 2022

AlexHentschel Apr 14, 2022

durkmurder Apr 14, 2022 •

edited

Loading

jordanschalm Apr 18, 2022

durkmurder Apr 18, 2022

durkmurder Apr 18, 2022

AlexHentschel Apr 26, 2022

AlexHentschel Apr 14, 2022

AlexHentschel left a comment

jordanschalm left a comment

durkmurder commented Apr 26, 2022

bors bot commented Apr 26, 2022

	func (c Core) OnTimeoutObject(originID flow.Identifier, timeout model.TimeoutObject) error {
	func (c Core) OnTimeoutObject(originID flow.Identifier, timeout messages.TimeoutObject) error {

[Consensus] Active Pacemaker, TimeoutAggregator interface changes #2155

[Consensus] Active Pacemaker, TimeoutAggregator interface changes #2155

Conversation

durkmurder commented Mar 16, 2022 • edited Loading

Context

jordanschalm Mar 22, 2022

Choose a reason for hiding this comment

AlexHentschel Apr 14, 2022

Choose a reason for hiding this comment

durkmurder Apr 14, 2022 • edited Loading

Choose a reason for hiding this comment

jordanschalm Apr 18, 2022

Choose a reason for hiding this comment

durkmurder Apr 18, 2022

Choose a reason for hiding this comment

durkmurder Apr 18, 2022

Choose a reason for hiding this comment

AlexHentschel Apr 26, 2022

Choose a reason for hiding this comment

AlexHentschel Apr 14, 2022

Choose a reason for hiding this comment

AlexHentschel left a comment

Choose a reason for hiding this comment

jordanschalm left a comment

Choose a reason for hiding this comment

durkmurder commented Apr 26, 2022

bors bot commented Apr 26, 2022

[Consensus] Active Pacemaker, `TimeoutAggregator` interface changes #2155

[Consensus] Active Pacemaker, `TimeoutAggregator` interface changes #2155

durkmurder commented Mar 16, 2022 •

edited

Loading

durkmurder Apr 14, 2022 •

edited

Loading