
Collection compliance, epochmgr Engines Implement component.Component #3248

Conversation

jordanschalm (Member) commented Sep 19, 2022:

⚠️ Depends on #3121

This PR updates collection/compliance.Engine and collection/epochmgr.Engine to implement component.Component (#3121, but for collection nodes).

@jordanschalm jordanschalm changed the title Collection Compliance, Epoch Manager Engines Implement component.Component Collection Compliance Engine Implements component.Component Sep 19, 2022
@jordanschalm jordanschalm marked this pull request as ready for review September 19, 2022 16:24
@jordanschalm jordanschalm marked this pull request as draft September 19, 2022 18:24
@jordanschalm jordanschalm changed the title Collection Compliance Engine Implements component.Component Collection compliance, epochmgr Engines Implement component.Component Sep 20, 2022
Base automatically changed from jordan/6263-compliance-component-pattern to feature/active-pacemaker September 20, 2022 14:41
Resolved review thread on engine/collection/compliance/engine.go
Comment on lines 57 to 60
// wait for shutdown to be commenced
<-ctx.Done()
// wait for compliance engine and event loop to shut down
<-util.AllDone(components.prop, components.sync, components.voteAggregator, components.timeoutAggregator)
Member commented:
Can it shut itself down before the other components shut down?

Would it miss a few events if it's shut down earlier?

jordanschalm (Member, Author) replied:

> Can it shut itself down before the other components shut down?

Do you mean, "can the epochmgr engine shut down before the epoch components shut down?" (What is "it" referring to?)

> Would it miss a few events if it's shut down earlier?

Yes, although I think we can miss events regardless of the order in which we shut down. Shutting down means we are either turning off an old cluster consensus or shutting down the whole node. In neither case are missed events problematic from my perspective.

Member replied:

Oh, I see. The ctx is actually the shutdown signal itself.

I thought it was an engine's Done channel; actually, we are moving away from that pattern.
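
For readers following along, here is a minimal, self-contained sketch of the lifecycle being discussed: the context passed to the worker is the shutdown signal, so the worker blocks on ctx.Done() and only then waits for its sub-components to finish. The subcomponent type and allDone helper below are illustrative stand-ins for the PR's epoch components and util.AllDone, not flow-go's actual types.

package main

import (
	"context"
	"fmt"
	"time"
)

// subcomponent stands in for the epoch components (prop, sync, vote
// aggregator, ...). done closes once the component has fully shut down.
type subcomponent struct {
	done chan struct{}
}

func (s *subcomponent) start(ctx context.Context) {
	go func() {
		<-ctx.Done()                 // cancellation of ctx is the shutdown signal
		time.Sleep(time.Millisecond) // pretend to flush in-flight work
		close(s.done)
	}()
}

// allDone mirrors the idea behind util.AllDone: it returns a channel that
// closes once every sub-component has finished shutting down.
func allDone(subs ...*subcomponent) <-chan struct{} {
	out := make(chan struct{})
	go func() {
		for _, s := range subs {
			<-s.done
		}
		close(out)
	}()
	return out
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	prop := &subcomponent{done: make(chan struct{})}
	sync := &subcomponent{done: make(chan struct{})}
	prop.start(ctx)
	sync.start(ctx)

	cancel()              // commence shutdown
	<-ctx.Done()          // wait for shutdown to be commenced
	<-allDone(prop, sync) // wait for all sub-components to shut down
	fmt.Println("parent finished only after its sub-components")
}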

codecov-commenter commented Sep 20, 2022:

Codecov Report

Merging #3248 (1f8f2e9) into feature/active-pacemaker (03fb58b) will increase coverage by 0.00%.
The diff coverage is 67.08%.

@@                    Coverage Diff                    @@
##           feature/active-pacemaker    #3248   +/-   ##
=========================================================
  Coverage                     55.26%   55.27%           
=========================================================
  Files                           746      748    +2     
  Lines                         69325    69350   +25     
=========================================================
+ Hits                          38314    38330   +16     
- Misses                        27880    27889    +9     
  Partials                       3131     3131           
Flag       Coverage Δ
unittests  55.27% <67.08%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                           Coverage Δ
module/epochs/base_client.go                              0.00% <0.00%> (ø)
module/epochs/qc_client.go                                0.00% <0.00%> (ø)
module/epochs/qc_voter.go                                58.82% <5.00%> (-6.28%) ⬇️
engine/collection/synchronization/engine.go              68.40% <50.00%> (-0.58%) ⬇️
engine/consensus/compliance/engine.go                    63.79% <70.00%> (+2.38%) ⬆️
module/epochs/errors.go                                  72.72% <72.72%> (ø)
engine/collection/compliance/engine.go                   66.48% <74.62%> (+2.92%) ⬆️
engine/collection/epochmgr/engine.go                     74.23% <76.55%> (-2.10%) ⬇️
engine/collection/epochmgr/epoch_components.go          100.00% <100.00%> (ø)
...e/orchestrator/mock_corruptible_conduit_factory.go    80.59% <0.00%> (-2.99%) ⬇️
... and 9 more



@jordanschalm jordanschalm marked this pull request as ready for review September 23, 2022 15:44
jordanschalm (Member, Author) commented Sep 23, 2022:

CI Test Failure:

--- FAIL: TestClusterSwitchover_MultiCollectorCluster (11.04s)
    cluster_switchover_test.go:262: expecting transaction 08f3dc749f467a58c4e7da5a4d5607df6defa9403dd7c276517d143f962d70a7 in epoch 1 for cluster 0
    cluster_switchover_test.go:403: got guarantee from 909b9654e4c1256195ff459ab9f56a3135d403f886f8c53e5bc5355085ced986
    cluster_switchover_test.go:403: got guarantee from 4e4cdb4c29b887885c54c628be0daab61d8602df492034f66a15d8dfa87e6b96
    cluster_switchover_test.go:262: expecting transaction ec7bf89c3d4b59ce6977d34d02a26b8792c9cade28b998137172317aca7513da in epoch 2 for cluster 0
    cluster_switchover_test.go:403: got guarantee from 909b9654e4c1256195ff459ab9f56a3135d403f886f8c53e5bc5355085ced986
    cluster_switchover_test.go:403: got guarantee from 4e4cdb4c29b887885c54c628be0daab61d8602df492034f66a15d8dfa87e6b96
    irrecoverable.go:17: 
        	Error Trace:	/home/runner/work/flow-go/flow-go/engine/collection/test/irrecoverable.go:17
        	            				/home/runner/work/flow-go/flow-go/engine/collection/test/asm_amd64.s:1594
        	Error:      	Received unexpected error:
        	            	could not process proposal feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226: could not add block (feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226) to vote aggregator: could not process block: feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226, internal error updating VoteProcessor's status from VoteCollectorStatusCaching to VoteCollectorStatusVerifying for block feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226: failed to create VerifyingVoteProcessor for block feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226: instantiating vote processor for block feee1c1dae702e7645966b68249d6d7d230b1d573b2001476fa10840fa556226 failed: error retrieving consensus participants: could not retrieve resource: key not found
        	Test:       	TestClusterSwitchover_MultiCollectorCluster
        	Messages:   	observed unexpected irrecoverable error
FAIL
coverage: [no statements]
FAIL	github.com/onflow/flow-go/engine/collection/test	32.680s

CI Run

⚠️ Problem with the Collection-Node version: we can't guarantee we'll know the Identities needed to satisfy IdentitiesByBlock.

It is only a problem in the tests, because cluster.Mutator checks that the reference block ID is known.

durkmurder (Member) left a comment:

First batch of comments

return e.lm.Stopped()
}
e.log.Info().Msg("starting hotstuff")
e.core.hotstuff.Start(ctx)
Member commented:

Since both HotStuff and compliance are separate modules which implement component.Component, do we want to start HotStuff from the compliance engine? I would say we could start it as a separate component; that would be more explicit.

Member commented:

Moreover, it will simplify the design of the compliance engine.

jordanschalm (Member, Author) replied:

Totally agree; it might be better to do that along with MessageHub, though. I feel like changing it now would largely get clobbered by the MessageHub rewiring.
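
As context for the two wiring options being discussed, here is a minimal sketch contrasting starting HotStuff from inside the compliance engine versus starting both as sibling components from the node builder. All type names are illustrative stand-ins, not flow-go's actual types.

package main

import "context"

// component stands in for component.Component: something that can be started
// with a context that doubles as its shutdown signal.
type component interface {
	Start(ctx context.Context)
}

type hotstuff struct{}

func (h *hotstuff) Start(ctx context.Context) { /* run consensus loops until ctx is done */ }

// Option A (as in this PR): the compliance engine owns HotStuff and starts it
// from within its own startup logic, so HotStuff's lifecycle is hidden inside
// the compliance engine.
type complianceWithHotstuff struct {
	hotstuff component
}

func (e *complianceWithHotstuff) Start(ctx context.Context) {
	e.hotstuff.Start(ctx)
	// ... start compliance workers ...
}

// Option B (the suggestion): compliance only starts its own workers, and the
// node builder starts HotStuff and compliance as explicit sibling components.
type compliance struct{}

func (e *compliance) Start(ctx context.Context) { /* start compliance workers only */ }

func startNode(ctx context.Context, components ...component) {
	for _, c := range components {
		c.Start(ctx) // each lifecycle is visible and managed at the top level
	}
}

func main() {
	ctx := context.Background()
	startNode(ctx, &hotstuff{}, &compliance{})
}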

Resolved review threads (outdated): engine/collection/epochmgr/epoch_components.go (2), engine/collection/epochmgr/engine.go (3)
Resolved review thread: network/errors.go
}

e.cm = component.NewComponentManagerBuilder().
AddWorker(e.handleEpochSetupPhaseStartedLoop).
Member commented:

Do we really need three different worker threads to dispatch very infrequent events? It feels unnecessary.
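
To make the alternative concrete, here is a minimal sketch of collapsing the dispatch loops into a single worker that selects over all the (infrequent) epoch-lifecycle events. The channel names are illustrative, not the engine's actual fields; the trade-off is that events are handled serially by one goroutine, which seems acceptable for events this rare.

package main

import (
	"context"
	"fmt"
	"time"
)

// epochEventsLoop is a single worker that dispatches all infrequent
// epoch-lifecycle notifications, instead of one worker per notification type.
func epochEventsLoop(ctx context.Context, setupPhaseStarted, commitPhaseStarted, epochTransition <-chan struct{}) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-setupPhaseStarted:
			fmt.Println("handle EpochSetup phase started")
		case <-commitPhaseStarted:
			fmt.Println("handle EpochCommitted phase started")
		case <-epochTransition:
			fmt.Println("handle epoch transition")
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	setup := make(chan struct{}, 1)
	commit := make(chan struct{}, 1)
	transition := make(chan struct{}, 1)

	setup <- struct{}{} // simulate an EpochSetup phase notification
	go epochEventsLoop(ctx, setup, commit, transition)
	time.Sleep(10 * time.Millisecond) // give the loop a moment to handle it
}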

Resolved review thread (outdated): engine/collection/epochmgr/engine.go
e.comp.SubmitLocal(synced)
// forward the block to the compliance engine for validation and processing
// we use the network.MessageProcessor interface here because the block is un-validated
err := e.comp.Process(channels.SyncCluster(block.Header.ChainID), originID, synced)
Member commented:

I think we need to pass me.NodeID() here instead of originID; the original sender is already part of SyncedClusterBlock.

jordanschalm (Member, Author) replied:

Yeah, I feel like we're mixing practices a bit here.

  • We decided to use the Process API because the messages have not been validated (they should be treated as originating from another node).
  • But passing me.NodeID() as originID indicates the message originates from this node (trusted).

The MessageProcessor interface says:

// MessageProcessor represents a component which receives messages from the
// networking layer. Since these messages come from other nodes, which may
// be Byzantine, implementations must expect and handle arbitrary message inputs
// (including invalid message types, malformed messages, etc.). Because of this,
// node-internal messages should NEVER be submitted to a component using Process.

Overall I think the API doesn't fit well with this usage pattern, where an untrusted message passes through an internal component without being validated first. Though setting originID to the local node ID for an untrusted message seems like the worse of two not-great options here.

Member replied:

I am not sure here. We need to define clear rules for how we should treat such cases.
For instance, the sync engine doesn't do any checks, so technically it's untrusted input; on the other hand, it didn't come from the network layer but from the sync engine.
I would treat this case as the general one, as if the message came from an unknown origin, and only use trusted callbacks when we actually need different processing logic.

jordanschalm (Member, Author) replied:

OK, for now I will do that and pass me.NodeID with a comment.
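
For illustration, here is a self-contained sketch of the convention agreed on above; the types are simplified stand-ins (NodeID for flow.Identifier, messageProcessor for network.MessageProcessor), not the actual flow-go interfaces, and the forwarding function is hypothetical.

package main

import "fmt"

// NodeID is a simplified stand-in for flow.Identifier.
type NodeID string

// messageProcessor is a simplified stand-in for network.MessageProcessor:
// Process must treat every message as untrusted, whatever originID says.
type messageProcessor interface {
	Process(channel string, originID NodeID, message interface{}) error
}

type complianceEngine struct{}

func (c *complianceEngine) Process(channel string, originID NodeID, message interface{}) error {
	fmt.Printf("processing untrusted message on %q, delivered by %s\n", channel, originID)
	return nil
}

// forwardSyncedBlock shows the agreed convention: the sync engine forwards the
// still-unvalidated block through the untrusted Process API, but sets originID
// to the local node ID, because the message reaches compliance via an internal
// component; the true sender is carried inside the synced-block message itself.
func forwardSyncedBlock(comp messageProcessor, localID NodeID, syncedBlock interface{}) error {
	return comp.Process("sync-cluster", localID, syncedBlock)
}

func main() {
	_ = forwardSyncedBlock(&complianceEngine{}, NodeID("local-node"), struct{}{})
}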

durkmurder (Member) left a comment:

Looks good to me, thanks, very solid. The only thing is we need to decide on a message-passing pattern between different components; otherwise all good.

jordanschalm (Member, Author):

bors merge

@bors bors bot merged commit 42c7f37 into feature/active-pacemaker Oct 4, 2022
@bors bors bot deleted the jordan/6263-compliance-component-pattern-ln-engine branch October 4, 2022 19:28
bors bot added a commit that referenced this pull request Oct 12, 2022
3294: Front-Load HotStuff Verification Pt. 1: Update Compliance Engines r=jordanschalm a=jordanschalm

This PR modifies compliance engines to front-load HotStuff verification:
* Once a block connects to the finalized state, we validate the proposal using HotStuff first, and only then attempt to extend the state (compliance checks)
* This means, all blocks in storage have been fully validated (there is no need for a separate `MarkValid` step)

⚠️ Depends on #3248
👉  Removing `MarkValid` will be done in a separate PR: #3303

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
bors bot added a commit that referenced this pull request Oct 12, 2022
3294: Front-Load HotStuff Verification Pt. 1: Update Compliance Engines r=jordanschalm a=jordanschalm

This PR modifies compliance engines to front-load HotStuff verification:
* Once a block connects to the finalized state, we validate the proposal using HotStuff first, and only then attempt to extend the state (compliance checks)
* This means, all blocks in storage have been fully validated (there is no need for a separate `MarkValid` step)

⚠️ Depends on #3248
👉  Removing `MarkValid` will be done in a separate PR: #3303

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
bors bot added a commit that referenced this pull request Oct 21, 2022
3294: Front-Load HotStuff Verification Pt. 1: Update Compliance Engines r=jordanschalm a=jordanschalm

This PR modifies compliance engines to front-load HotStuff verification:
* Once a block connects to the finalized state, we validate the proposal using HotStuff first, and only then attempt to extend the state (compliance checks)
* This means, all blocks in storage have been fully validated (there is no need for a separate `MarkValid` step)

⚠️ Depends on #3248
👉  Removing `MarkValid` will be done in a separate PR: #3303

Co-authored-by: Jordan Schalm <jordan@dapperlabs.com>
Co-authored-by: Alexander Hentschel <alex.hentschel@axiomzen.co>
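
To illustrate the ordering described in the referenced PR above, here is a minimal, self-contained sketch (illustrative names only, not the actual flow-go compliance code): a proposal that connects to the finalized state is validated by HotStuff first, and only then is the state extended, so every stored block has already been fully validated and no separate MarkValid step is needed.

package main

import (
	"errors"
	"fmt"
)

type block struct {
	id                  string
	connectsToFinalized bool
}

// validateWithHotStuff stands in for the HotStuff proposal validation step.
func validateWithHotStuff(b block) error {
	if b.id == "" {
		return errors.New("invalid proposal")
	}
	return nil
}

// extendState stands in for the compliance checks that extend protocol state.
func extendState(b block) error {
	fmt.Printf("block %s added to state (already fully validated)\n", b.id)
	return nil
}

// onBlockProposal sketches the front-loaded ordering: HotStuff validation
// happens before the state is extended.
func onBlockProposal(b block) error {
	if !b.connectsToFinalized {
		return fmt.Errorf("block %s does not connect to the finalized state yet", b.id)
	}
	if err := validateWithHotStuff(b); err != nil {
		return fmt.Errorf("invalid proposal %s: %w", b.id, err)
	}
	return extendState(b)
}

func main() {
	_ = onBlockProposal(block{id: "feee1c1d", connectsToFinalized: true})
}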
This pull request was closed.