Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Fix bitfield signing #1466

Merged
10 commits merged into from
Jul 29, 2020
Merged

Fix bitfield signing #1466

10 commits merged into from
Jul 29, 2020

Conversation

coriolinus
Copy link
Contributor

@coriolinus coriolinus commented Jul 24, 2020

Fixes the problems with #1364, which was merged too soon.

@coriolinus coriolinus added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Jul 24, 2020
@coriolinus coriolinus self-assigned this Jul 24, 2020
@coriolinus coriolinus marked this pull request as ready for review July 24, 2020 13:52
@coriolinus coriolinus requested a review from rphmeier July 24, 2020 13:52
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Jul 24, 2020
@coriolinus coriolinus requested a review from montekki July 24, 2020 13:54
@rphmeier
Copy link
Contributor

rphmeier commented Jul 24, 2020

Could use a rebase. This is 3 PRs worth of commits!

edit: correction: 4. There are 2 of my Runtime API PRs from last week in here

@@ -169,6 +171,12 @@ async fn get_core_availability(
Ok(false)
}

// delegates to the v1 runtime API
async fn get_availability_cores(_relay_parent: Hash, _sender: &mut mpsc::Sender<FromJob>) -> Result<Vec<CoreState>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized this later, but another way to stub this out would be to add the variant to the RuntimeApiRequest in the subsystem crate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea; 146746c.

Copy link
Contributor

@rphmeier rphmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes LGTM, maybe we can avoid the unimplemented! with the method I suggested

- use CoreState, not CoreOccupied
- query for availability chunks, not the whole PoV
- create a stub `fn availability_cores`
@coriolinus
Copy link
Contributor Author

Rebasing dramatically simplifies this PR, it turns out.

@coriolinus
Copy link
Contributor Author

An interesting fact: the failing test, protocol::tests::erasure_fetch_drop_also_drops_gossip_sender, succeeds instantly when run in isolation, but reliably hangs when running the full suite. This suggests that the test is malformed; context from outside the test body determines whether or not it succeeds.

$ time cargo test -p polkadot-network drops_gossip_sender
Mon 27 Jul 2020 01:11:53 PM CEST
    Finished test [unoptimized + debuginfo] target(s) in 0.25s
     Running target/debug/deps/polkadot_network-24a4527cc39ffa6b

running 1 test
test protocol::tests::erasure_fetch_drop_also_drops_gossip_sender ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 22 filtered out


real	0m0.302s
user	0m0.237s
sys	0m0.086s

…ng-fixes

Note: sorted the imports in messages.rs because it was getting difficult
to determine what the true diffs were by eye.
@coriolinus
Copy link
Contributor Author

This test appears to have become flaky at some point before or during #1469: the master-merge for that PR failed CI, as did the master-merge for #1445. In both of those cases, the test succeeded on the final commit before the squash-merge. However, three other master-merge CI runs in that span succeeded.

It's not obvious to me how the test is supposed to work, or why it's sometimes failing. The obvious suspicion is that the loop block isn't terminating, but it's not clear why it would have been reliable for some months and only become flaky now. The test itself hasn't changed since #897.

@rphmeier AFAICT you're the person who has the best chance of understanding this issue; what's your take on it?

@montekki
Copy link
Contributor

^^^^ Also saw this thing with the test on some CI runs, but atm it is not failing with this test. Probably fixing the build could help

Comment on lines 28 to 49
AvailableData,
BackedCandidate,
BlockNumber,
CandidateDescriptor,
CandidateEvent,
CandidateReceipt,
CommittedCandidateReceipt,
CoreAssignment,
CoreOccupied,
CoreState,
ErasureChunk,
Hash,
HeadData,
Id as ParaId,
OmittedValidationData,
PoV,
SignedAvailabilityBitfield,
SigningContext,
ValidationCode,
ValidatorId,
ValidatorIndex,
ValidatorSignature,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if thats ok; if that's the IDE sorting imports (rust-analyzer can do that I think) could that at least not do the one-per-line thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that intentionally: I was having trouble diffing the imports when resolving a merge conflict in which people added and removed various items in different places. Using one per line, alphabetically, made it much easier to sort things out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have them be more compact. We don't use this style anywhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a question of policy, I'd prefer to leave them like this, and expand the use of this style. Rationale: the vast majority of the time, we're not looking at the import lines, but elsewhere in the file. Of the times we are in fact looking at the import lines, it's when adding or merging imports; in those times, it's faster when the lists are broad and alphabetical than when using basically any other taxonomy.

Nevertheless, 8e5c6da.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the general rule should be to merge all crates import into one import, after that the compiler will ensure that you don't have duplicates in this list.

/// This is useful in cases like bitfield signing, when existence
/// matters, but we don't want to necessarily pass around large
/// quantities of data to get a single bit of information.
QueryChunkAvailability(Hash, ValidatorIndex, oneshot::Sender<bool>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had serious doubts as I was adding QueryDataAvaialbility message; is it really better for performance? We anyway lift one big blob from disk, there is no way to ask kvdb for just availability; when we send whatever data struct that's containing the pointer to this blob, only it will be sent around, wouldn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At worst, it masks a kvdb limitation which can be fixed in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node/subsystem/src/messages.rs Outdated Show resolved Hide resolved
@coriolinus
Copy link
Contributor Author

This passed, this time, but I do think we should resolve #1474 eventually.

@coriolinus coriolinus requested a review from rphmeier July 28, 2020 08:46
@coriolinus
Copy link
Contributor Author

@rphmeier I got a LGTM 5 days ago; could I get a formal approval so we can merge this PR please?

@rphmeier
Copy link
Contributor

bot merge

@ghost
Copy link

ghost commented Jul 29, 2020

Trying merge.

@ghost ghost merged commit cdd9394 into master Jul 29, 2020
@ghost ghost deleted the prgn-bitfield-signing-fixes branch July 29, 2020 22:33
ordian added a commit that referenced this pull request Jul 31, 2020
* master:
  Candidate Validation Subsystem (#1432)
  reduce slash defer durations (#1504)
  Implement the Runtime API subsystem (#1494)
  Companion for #6610 (Balances Weight Trait) (#1425)
  [CI] Publish draft release redux (#1493)
  Update README docs related to local build (#1479)
  Add a default trie-memory-tracker feature to the cli (#1502)
  Companion PR for `Add a `DefaultQueue` type alias to remove the need to use `sp_api::TransactionFor`` (#1499)
  Fix bitfield signing (#1466)
  Update Substrate, bump versions, clean up sort (#1496)
  Bump Substrate
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants