Implement Lean BEEFY #10882

acatangiu · 2022-02-18T13:25:21Z

Description

This PR implements Lean BEEFY as described here.

Simplified BEEFY worker logic based on the invariant that GRANDPA will always finalize 1st block of each new session, meaning BEEFY worker is guaranteed to receive finality notification for the BEEFY mandatory blocks.

Under these conditions the current design is as follows:

session changes are detected based on BEEFY Digest present in BEEFY mandatory blocks,
on each new session new Rounds of voting is created, with old rounds being dropped (for gossip rounds, last 3 are still alive so votes are still being gossiped),
after processing finality for a block, the worker votes if a new voting target has become available as a result of said block finality processing,
incoming votes as well as self-created votes are processed and signed commitments are created for completed BEEFY voting rounds,
the worker votes if a new voting target becomes available once a round successfully completes.

On worker startup, the current validator set is retrieved from the BEEFY pallet. If it is the genesis validator set, worker starts voting right away considering Block #1 as session start.

Otherwise (not genesis), the worker will vote starting with mandatory block of the next session.

Later on when we add the BEEFY initial-sync (catch-up) logic, the worker will sync all past mandatory blocks Signed Commitments and will be able to start voting right away.

Fixes

Fixes paritytech/grandpa-bridge-gadget#16
Fixes paritytech/grandpa-bridge-gadget#159
Fixes paritytech/grandpa-bridge-gadget#162
Fixes paritytech/grandpa-bridge-gadget#182
Fixes paritytech/grandpa-bridge-gadget#254
Fixes paritytech/grandpa-bridge-gadget#256

client/beefy/src/worker.rs

tomusdrw

Cool stuff!

client/beefy/src/gossip.rs

client/beefy/src/worker.rs

Co-authored-by: Tomasz Drwięga <tomusdrw@users.noreply.github.com>

acatangiu · 2022-02-21T16:24:44Z

Still missing logic for the Beefy Worker to change the BEEFY validator set when GRANDPA one changes (and add the corresponding header digest).

acatangiu · 2022-02-22T13:16:53Z

Still missing logic for the Beefy Worker to change the BEEFY validator set when GRANDPA one changes (and add the corresponding header digest).

Fixed this in pallet-beefy where session needs to change even if authority keys stay the same.

acatangiu · 2022-03-15T09:41:49Z

@andresilva @tomusdrw can you please take another look?

vgeddes · 2022-03-16T18:47:20Z

primitives/beefy/Cargo.toml

@@ -20,7 +25,7 @@ sp-std = { version = "4.0.0", path = "../std", default-features = false }
 [dev-dependencies]
 hex = "0.4.3"
 hex-literal = "0.3"
-sp-keystore = { version = "0.12.0", path = "../keystore" }
+sp-keystore = { version = "0.12.0", path = "../keystore", default-features = false }


Shouldn't be necessary to disable the std feature since unit tests are always built with std enabled.

Suggested change

sp-keystore = { version = "0.12.0", path = "../keystore", default-features = false }

sp-keystore = { version = "0.12.0", path = "../keystore" }

andresilva

Sorry for taking so long to review. Overall LGTM, just minor nits.

client/beefy/src/metrics.rs

andresilva · 2022-03-21T15:09:54Z

client/beefy/src/gossip.rs

+			// TODO: right now we're using an object-level rebroadcast gate that will only
+			// allow **a single message** being rebroadcast every `REBROADCAST_AFTER` minutes.
+			//
+			// Should we instead have a per-message/hash rebroadcast cooldown?


IMO we should not make this a closure. The way the gossip state machine currently works is that it will call into message_allowed to get a function for validation. When it's time to do a rebroadcast it will pass the returned function through all messages in order to check whether to (re-)propagate. If this isn't a closure then do_rebroadcast will have the same value for all messages and thus the comment above won't apply (i.e. we'll rebroadcast all messages since do_rebroadcast will stay true).

ok, that makes sense, done

andresilva · 2022-03-21T15:10:13Z

client/beefy/src/gossip.rs

 			}

 			let msg = match VoteMessage::<NumberFor<B>, Public, Signature>::decode(&mut data) {
 				Ok(vote) => vote,
-				Err(_) => return true,
+				Err(_) => return false,


Good catch.

andresilva · 2022-03-21T15:21:07Z

client/beefy/src/worker.rs

@@ -22,20 +22,22 @@ use codec::{Codec, Decode, Encode};
 use futures::{future, FutureExt, StreamExt};
 use log::{debug, error, info, log_enabled, trace, warn};
 use parking_lot::Mutex;
+use tokio::time::{sleep, Duration};


Could you replace this with futures_timer::Delay? We are currently only depending directly on tokio for the executor (and some tests).

andresilva · 2022-03-21T15:21:33Z

client/beefy/Cargo.toml

@@ -14,6 +14,7 @@ hex = "0.4.2"
 log = "0.4"
 parking_lot = "0.12.0"
 thiserror = "1.0"
+tokio = { version = "1.15", features = ["time"] }


Should be removed if we use futures_timer instead.

andresilva · 2022-03-21T15:36:39Z

client/beefy/src/worker.rs

-					return
-				},
-			};
+		if log_enabled!(target: "beefy", log::Level::Debug) {


I would do it regardless. Probably rustc is smart enough to generate code that is similar to this.

Let's keep it explicit for now (I'm not sure rustc reasons that far ahead). This will change anyway when we add non-authority nodes support (paritytech/grandpa-bridge-gadget#407).

andresilva · 2022-03-21T15:42:46Z

client/beefy/src/worker.rs

+				// Vote if there's now a new vote target.
+				if let Some(target_number) = self.current_vote_target() {
+					self.do_vote(target_number);


Is this needed? The only reason for not having a vote target would be if rounds is not initialized, that will happen when we get a finality notification and we'll also attempt to vote after.

depending on min_block_delta, vote target can be some not-yet-finalized block. In that case, current_vote_target is None (see last lines in fn vote_target).

andresilva · 2022-03-21T15:51:23Z

client/beefy/src/worker.rs

+	/// Wait for BEEFY runtime pallet to be available.
+	#[cfg(not(test))]
+	async fn wait_for_runtime_pallet(&mut self) {
+		loop {
+			let at = BlockId::hash(self.best_grandpa_block_header.hash());
+			if let Some(active) = self.client.runtime_api().validator_set(&at).ok().flatten() {
+				if active.id() == GENESIS_AUTHORITY_SET_ID {
+					// When starting from genesis, there is no session boundary digest.
+					// Just initialize `rounds` to Block #1 as BEEFY mandatory block.
+					self.init_session_at(active, 1u32.into());
+				}
+				// In all other cases, we just go without `rounds` initialized, meaning the worker
+				// won't vote until it witnesses a session change.
+				// Once we'll implement 'initial sync' (catch-up), the worker will be able to start
+				// voting right away.
+				break
+			} else {
+				info!(target: "beefy", "Waiting BEEFY pallet to become available...");
+				sleep(Duration::from_secs(5)).await;
 			}
 		}
 	}


I don't think this will work as at will never change throughout the loop. Can we instead listen to finality notifications until the runtime call works?

block_finality_notifications().take_while(|notif| { runtime_api().validator_set(notif.block) // ... })

good catch!

I had meant to do let at = BlockId::number(client.info().finalized_number);

but your idea is even better since we don't do any sleeping.

What I don't like about using take_while is that it consumes the finality stream so we can't keep using the same stream once we see pallet available. As a workaround, I just get a new stream that will provide new notifications from here on out (no older notifs). This is incorrect in the node catch-up scenario, but that's not implemented yet anyway and we'll tackle it when we get there.

andresilva · 2022-03-21T15:53:24Z

client/beefy/src/worker.rs

+		let (validators, validator_set_id) = if let Some(rounds) = &self.rounds {
+			if !rounds.should_self_vote(&(payload.clone(), target_number)) {
+				debug!(target: "beefy", "🥩 Don't double vote for block number: {:?}", target_number);
+				return
+			}
+			(rounds.validators_for(target_number), rounds.validator_set_id_for(target_number))
+		} else {
+			debug!(target: "beefy", "🥩 Missing validator set - can't vote for: {:?}", target_hash);
+			return
+		};


The fact that this implicitly votes with different validator sets is a bit confusing IMO, it also ties with the logic of vote_target. But let's keep it as is for now and refactor in the future once we block until we vote for mandatory blocks.

andresilva · 2022-03-21T15:56:00Z

client/beefy/src/worker.rs

+	#[cfg(test)]
+	// behavior modifiers used in tests
+	test_res: tests::TestModifiers,


I'm not really fond of mixing test-specific code with the general implementation. In this case it seems that we should instead allow changing the provider of validators (so we can mock it in tests), and also a provider for the payload (so we can create random mmr roots, potentially corrupted). I'm OK with fixing in a follow-up PR.

Opened paritytech/grandpa-bridge-gadget#406 for it

acatangiu

@andresilva addressed all comments, ptal

acatangiu · 2022-03-22T13:27:03Z

client/beefy/src/worker.rs

@@ -22,20 +22,22 @@ use codec::{Codec, Decode, Encode};
 use futures::{future, FutureExt, StreamExt};
 use log::{debug, error, info, log_enabled, trace, warn};
 use parking_lot::Mutex;
+use tokio::time::{sleep, Duration};


acatangiu · 2022-03-22T13:27:18Z

client/beefy/Cargo.toml

@@ -14,6 +14,7 @@ hex = "0.4.2"
 log = "0.4"
 parking_lot = "0.12.0"
 thiserror = "1.0"
+tokio = { version = "1.15", features = ["time"] }


acatangiu · 2022-03-22T13:30:52Z

client/beefy/src/round.rs

@@ -29,20 +29,25 @@ use sp_runtime::traits::MaybeDisplay;

 #[derive(Default)]
 struct RoundTracker {
+	self_vote: bool,


Opened paritytech/grandpa-bridge-gadget#405 to not lose this.

acatangiu · 2022-03-22T13:32:44Z

client/beefy/src/worker.rs

+	#[cfg(test)]
+	// behavior modifiers used in tests
+	test_res: tests::TestModifiers,


Opened paritytech/grandpa-bridge-gadget#406 for it

acatangiu · 2022-03-22T13:33:28Z

client/beefy/src/worker.rs


 		let public_keys = self.key_store.public_keys()?;
-		let store: BTreeSet<&Public> = public_keys.iter().collect();
+		let store: BTreeSet<&AuthorityId> = public_keys.iter().collect();

 		let missing: Vec<_> = store.difference(&active).cloned().collect();


thanks for the explanation, made the code and docs more clear based on it.

acatangiu · 2022-03-22T13:36:08Z

client/beefy/src/worker.rs

-					return
-				},
-			};
+		if log_enabled!(target: "beefy", log::Level::Debug) {


Let's keep it explicit for now (I'm not sure rustc reasons that far ahead). This will change anyway when we add non-authority nodes support (paritytech/grandpa-bridge-gadget#407).

acatangiu · 2022-03-22T13:39:27Z

client/beefy/src/worker.rs

+				// Vote if there's now a new vote target.
+				if let Some(target_number) = self.current_vote_target() {
+					self.do_vote(target_number);


depending on min_block_delta, vote target can be some not-yet-finalized block. In that case, current_vote_target is None (see last lines in fn vote_target).

acatangiu · 2022-03-22T14:23:54Z

client/beefy/src/worker.rs

+	/// Wait for BEEFY runtime pallet to be available.
+	#[cfg(not(test))]
+	async fn wait_for_runtime_pallet(&mut self) {
+		loop {
+			let at = BlockId::hash(self.best_grandpa_block_header.hash());
+			if let Some(active) = self.client.runtime_api().validator_set(&at).ok().flatten() {
+				if active.id() == GENESIS_AUTHORITY_SET_ID {
+					// When starting from genesis, there is no session boundary digest.
+					// Just initialize `rounds` to Block #1 as BEEFY mandatory block.
+					self.init_session_at(active, 1u32.into());
+				}
+				// In all other cases, we just go without `rounds` initialized, meaning the worker
+				// won't vote until it witnesses a session change.
+				// Once we'll implement 'initial sync' (catch-up), the worker will be able to start
+				// voting right away.
+				break
+			} else {
+				info!(target: "beefy", "Waiting BEEFY pallet to become available...");
+				sleep(Duration::from_secs(5)).await;
 			}
 		}
 	}


good catch!

I had meant to do let at = BlockId::number(client.info().finalized_number);

but your idea is even better since we don't do any sleeping.

What I don't like about using take_while is that it consumes the finality stream so we can't keep using the same stream once we see pallet available. As a workaround, I just get a new stream that will provide new notifications from here on out (no older notifs). This is incorrect in the node catch-up scenario, but that's not implemented yet anyway and we'll tackle it when we get there.

acatangiu · 2022-03-22T14:24:40Z

primitives/beefy/Cargo.toml

@@ -20,7 +25,7 @@ sp-std = { version = "4.0.0", path = "../std", default-features = false }
 [dev-dependencies]
 hex = "0.4.3"
 hex-literal = "0.3"
-sp-keystore = { version = "0.12.0", path = "../keystore" }
+sp-keystore = { version = "0.12.0", path = "../keystore", default-features = false }


svyatonik

I have a couple of dumb questions as a BEEFY newcomer. Overall everything seems fine. Thanks for the great job, Adrian! :)

svyatonik · 2022-03-25T09:16:36Z

client/beefy/src/gossip.rs

+	/// This can be called once round is complete so we stop gossiping for it.
+	pub(crate) fn conclude_round(&self, round: NumberFor<B>) {
+		debug!(target: "beefy", "🥩 About to drop gossip round #{}", round);
+		self.known_votes.write().remove(round);


Are all rounds are guaranteed to be "concluded"? There's a test below - note_and_drop_round_works. It "notes" rounds 1, 3, 7 and 10. And at some point 7 is concluded (this method is called). As a result - the btree map in known_votes will contain entries for 1, 3 and 10. If 1 and 3 are never concluded, then the map will hold entries forever => "memory leak". Please, correct me if I'm wrong

Good catch! I was thinking since the round-set is completely recycled on each new session there's no problem with leftover map entries (because they are dropped at session end).

But I guess the code would be cleaner if we explicitly removed all rounds older than concluded.

Also the gossip_validator rounds do not get recycled so these are in fact "leaked mem". Will fix.

svyatonik · 2022-03-25T09:20:39Z

client/beefy/src/gossip.rs

+
+	/// Create new round votes set if not already present.
+	pub fn insert(&mut self, round: NumberFor<B>) {
+		if !self.live.contains_key(&round) {


I'd replace with self.live.entry(number).or_default(); :) But this version is fine too :)

svyatonik · 2022-03-25T09:41:01Z

client/beefy/src/round.rs

+					.validators()
+					.iter()
+					.map(|authority_id| {
+						signatures.iter().find_map(|(id, sig)| {


I was thinking to suggest change type of RoundTracker::votes to some map to avoid this loop - iiuc the number of BEEFY validators = number of GRANDPA validators, which is e.g. ~1K on Kusama && this loop is N^2. But now I'm unsure if that's correct.

So I have a question - what if the same validator sends two different signatures. I don't see any checks in Rounds::add_vote - is it handled somewhere else? What I'm worrying about is that the RoundTracker::votes will have two different entries for the same validator. And then in try_conclude we are just looking whether votes.len() >= threshold. Is it possible? Sorry for possibly being dumb here - I'm looking at this code for the first time :)

That's a valid question. Votes validation happens at the gossip level here, so validator public key, beefy payload and validator signature are tied together and validated before accepting vote. Therefore, the Rounds code doesn't need to worry about invalid votes (like same validator submitting multiple signature votes).

I will add doc comments explaining this better.

Good catch on the O(n^2) loop, changed RoundTracker::votes to a map to get that down to O(n).

svyatonik · 2022-03-25T09:58:42Z

client/beefy/src/worker.rs

+			r.validator_set().clone()
+		} else {
+			// no previous rounds present use new validator set instead (genesis case)
+			active.clone()


Please check my thoughts:

we have some chain that has been started a year ago, so we're definitely not at genesis;

authority node is starting and BeefyWorker::new() is called, which sets rounds to None;

worker receives finality notification - i.e. handle_finality_notification is called;

the finalized header starts new session - i.e. handle_finality_notification calls init_session_at;

since rounds is still set to None, we're falling here and prev validator set is set to the same (new) set.

I don't know if I'm right and if I am, then whether it may lead to some issues or not. So just asking to recheck it. Thanks :)

Your reasoning is correct, but we plan on addressing the initial sync part after Lean Beefy milestone - issue for it here paritytech/grandpa-bridge-gadget#112.

Within Lean Beefy simplifying assumptions, this happens when starting not at genesis.

The end result is that this validator will not be able to vote for current session (doesn't have the right validator set), but will be able to vote for future ones.

Simplified BEEFY worker logic based on the invariant that GRANDPA will always finalize 1st block of each new session, meaning BEEFY worker is guaranteed to receive finality notification for the BEEFY mandatory blocks. Under these conditions the current design is as follows: - session changes are detected based on BEEFY Digest present in BEEFY mandatory blocks, - on each new session new `Rounds` of voting is created, with old rounds being dropped (for gossip rounds, last 3 are still alive so votes are still being gossiped), - after processing finality for a block, the worker votes if a new voting target has become available as a result of said block finality processing, - incoming votes as well as self-created votes are processed and signed commitments are created for completed BEEFY voting rounds, - the worker votes if a new voting target becomes available once a round successfully completes. On worker startup, the current validator set is retrieved from the BEEFY pallet. If it is the genesis validator set, worker starts voting right away considering Block #1 as session start. Otherwise (not genesis), the worker will vote starting with mandatory block of the next session. Later on when we add the BEEFY initial-sync (catch-up) logic, the worker will sync all past mandatory blocks Signed Commitments and will be able to start voting right away. BEEFY mandatory block is the block with header containing the BEEFY `AuthoritiesChange` Digest, this block is guaranteed to be finalized by GRANDPA. This session-boundary block is signed by the ending-session's validator set. Next blocks will be signed by the new session's validator set. This behavior is consistent with what GRANDPA does as well. Also drop the limit N on active gossip rounds. In an adversarial network, a bad actor could create and gossip N invalid votes with round numbers larger than the current correct round number. This would lead to votes for correct rounds to no longer be gossiped. Add unit-tests for all components, including full voter consensus tests. Signed-off-by: Adrian Catangiu <adrian@parity.io> Co-authored-by: Tomasz Drwięga <tomusdrw@users.noreply.github.com> Co-authored-by: David Salami <Wizdave97>

acatangiu added 12 commits February 16, 2022 13:41

consolidate set best beefy block operation

e47067a

add new beefy future to be used for core logic

f0fc691

add logic to init best beefy

26887ed

beefy: only hold single active round

f56eb0b

move beefy voting logic to separate async task

f5eab4a

beefy voting logic - wip 2

606a5fa

beefy voting logic - wip 3

e956de9

beefy keep track of votes for multiple blocks within one session

02e85f7

improve self-equivocation logic

38c1b48

beefy voting logic - wip 4

bbd3487

beefy fix gossip rounds - wip 5

d236c7f

remove leftover commented code

9d1cbca

acatangiu self-assigned this Feb 18, 2022

acatangiu requested review from tomusdrw and andresilva February 18, 2022 13:25

acatangiu commented Feb 18, 2022

View reviewed changes

client/beefy/src/worker.rs Outdated Show resolved Hide resolved

acatangiu marked this pull request as draft February 18, 2022 13:40

tomusdrw reviewed Feb 21, 2022

View reviewed changes

acatangiu and others added 3 commits February 21, 2022 15:26

replace channel with a custom impl Stream object

81b61d7

fix beefy-gadged tests

28e0c3d

Apply suggestions from code review

db02909

Co-authored-by: Tomasz Drwięga <tomusdrw@users.noreply.github.com>

acatangiu added 3 commits February 22, 2022 15:13

explicitly handle no best beefy block

d5e84ab

fix walk up the chain looking for session boundary

0a045b1

pallet-beefy: correctly change session

5b3f28f

vgeddes reviewed Mar 16, 2022

View reviewed changes

andresilva reviewed Mar 21, 2022

View reviewed changes

Merge branch 'master' of github.com:paritytech/substrate into lean-beefy

95ccf07

acatangiu commented Mar 22, 2022

View reviewed changes

acatangiu added 2 commits March 22, 2022 16:33

address review comments

9a89f87

fix clippy

d807c65

andresilva approved these changes Mar 22, 2022

View reviewed changes

svyatonik approved these changes Mar 25, 2022

View reviewed changes

acatangiu added 3 commits March 25, 2022 13:34

review suggestions

94f3b2f

fix known votes memory leak

06f564a

avoid quadratic loop in getting signatures

c04bcfb

svyatonik approved these changes Mar 25, 2022

View reviewed changes

Merge branch 'master' of github.com:paritytech/substrate into lean-beefy

ad5d8ea

acatangiu added A8-mergeoncegreen and removed A0-please_review Pull request needs code review. labels Mar 25, 2022

acatangiu merged commit 411d9bb into paritytech:master Mar 25, 2022

This was referenced Mar 28, 2022

Improve block-selection strategy #10727

Closed

Change intial best beefy block #10669

Closed

BEEFY subscription fires multiple times for the same commitment #10684

Closed

Delay beefy worker initialization while network is on major sync #10705

Closed

acatangiu mentioned this pull request Mar 30, 2022

Stalled BEEFY report paritytech/grandpa-bridge-gadget#377

Closed

seunlanlege mentioned this pull request Apr 1, 2022

Initialize BeefyWorker validator set id for out-of-sync BEEFY gadget/client paritytech/grandpa-bridge-gadget#112

Closed

acatangiu mentioned this pull request Apr 12, 2022

BEEFY end-to-end test paritytech/grandpa-bridge-gadget#81

Closed

	sp-keystore = { version = "0.12.0", path = "../keystore", default-features = false }
	sp-keystore = { version = "0.12.0", path = "../keystore" }

Implement Lean BEEFY #10882

Implement Lean BEEFY #10882

Conversation

acatangiu commented Feb 18, 2022 • edited Loading

Description

Fixes

tomusdrw left a comment

Choose a reason for hiding this comment

acatangiu commented Feb 21, 2022

acatangiu commented Feb 22, 2022

acatangiu commented Mar 15, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andresilva left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acatangiu Mar 22, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acatangiu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

svyatonik left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

acatangiu commented Feb 18, 2022 •

edited

Loading

acatangiu Mar 22, 2022 •

edited

Loading