
prospective-parachains: respond with multiple backable candidates #3160

Merged: 6 commits into master, Feb 6, 2024

Conversation

@alindima (Contributor) commented Jan 31, 2024:

Fixes #3129

@alindima marked this pull request as draft January 31, 2024 16:06
@alindima added the T0-node (related to the topic “node”), I5-enhancement (an additional feature request) and T8-polkadot (related to/affects the Polkadot network) labels Jan 31, 2024
// Try finding a candidate chain starting from `base_node` of length `expected_count`.
// If not possible, return the longest we could find.
// Does a depth-first search, since we're optimistic that there won't be more than one such
// chain. Assumes that the chain must be acyclic.

Member:

What happens if they are cyclic? What is the worst-case complexity here, assuming parachains misbehave as much as they possibly can?

Contributor Author (@alindima):

That's a good question. We stop the search on the respective branch when we see that node A has a child that we've already visited. Maybe we should just return an error; I don't think it's a valid state transition to accept. Is it?

Even if we were to accept cycles, we'd still be bounded by a function of the requested number of candidates.

Contributor Author (@alindima):

I modified the code in the latest revision to accept cycles in the fragment trees. This is actually documented behaviour of prospective-parachains, with some tests for it. See:

I also documented the performance considerations. Cycles are not an issue, because we are limited by the core count anyway, so we won't loop indefinitely.
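
For illustration, a minimal sketch of the kind of bounded depth-first search being discussed, using hypothetical NodeId/Tree types rather than the actual prospective-parachains data structures. A branch is abandoned when it would revisit a node already on the current path, and the whole search stops once a chain of the requested length is found, so cycles cannot make it loop indefinitely:

use std::collections::HashSet;

type NodeId = usize;

struct Tree {
    // children[node] lists the child node ids of `node`.
    children: Vec<Vec<NodeId>>,
}

// Try to find a chain of `expected_count` nodes starting from `base`;
// if that's not possible, return the longest chain found.
fn find_backable_chain(tree: &Tree, base: NodeId, expected_count: usize) -> Vec<NodeId> {
    let mut best = Vec::new();
    let mut path = Vec::new();
    let mut on_path = HashSet::new();
    dfs(tree, base, expected_count, &mut on_path, &mut path, &mut best);
    best
}

fn dfs(
    tree: &Tree,
    node: NodeId,
    expected_count: usize,
    on_path: &mut HashSet<NodeId>,
    path: &mut Vec<NodeId>,
    best: &mut Vec<NodeId>,
) {
    if !on_path.insert(node) {
        // Cycle: `node` is already on the current path, so stop this branch.
        return;
    }
    path.push(node);
    if path.len() > best.len() {
        *best = path.clone();
    }
    if path.len() < expected_count {
        for &child in &tree.children[node] {
            dfs(tree, child, expected_count, on_path, path, best);
            if best.len() >= expected_count {
                // A chain of the requested length exists; stop searching.
                break;
            }
        }
    }
    path.pop();
    on_path.remove(&node);
}

fn main() {
    // Nodes 0 -> 1 -> 2 with a back-edge 2 -> 0; request a chain of 3.
    let tree = Tree { children: vec![vec![1], vec![2], vec![0]] };
    assert_eq!(find_backable_chain(&tree, 0, 3), vec![0, 1, 2]);
}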

polkadot/node/core/prospective-parachains/src/lib.rs (review thread marked outdated/resolved)
@alindima marked this pull request as ready for review February 2, 2024 13:39

@eskimor (Member) left a comment:

Looks sensible! Algorithmic complexity still makes me uneasy; I pinged Rob on what his stance on forks is by now.

// 2. `expected_count` is equal to the number of cores a para is scheduled on (in an elastic
// scaling scenario). For non-elastic-scaling, this is just 1. In practice, this should be a
// small number (1-3), capped by the total number of available cores (a constant alterable
// only via governance/sudo).

Member:

So with realistic numbers (current, and expected or exaggerated future ones): how bad does it get? What would the limits on those parameters be?

Contributor:

Elastic scaling will eventually enable chains to be bounded by the rate of a single node's block production.

In Cumulus currently, block authorship is typically slower than block verification (in validators) due to the need to read from disk. Most of the time is spent in accessing the trie for storage, not in code execution. Access patterns for the trie are also very inefficient. So it's not feasible to use more than 3 cores until that bottleneck is cleared.

If this bottleneck is alleviated, then I could see parachains using many more cores.

Contributor Author (@alindima) commented Feb 5, 2024:

Thinking about this a bit more, I don't think this traversal is any different in terms of worst-case complexity from the routine used for populating the fragment tree (which populates the tree breadth-first). Is it?
In an acyclic tree, it'll visit each node at most once, stopping early once it reaches the requested count. In a cyclic one, it'd be bounded by the requested count, so it's the same complexity.

The complexity is linear in the number of nodes, but the number of nodes is an exponential function (because it's an n-ary tree, at least in theory, with arity num_forks and height max_candidate_depth).

If we do hit a bottleneck, this wouldn't be the first place where it'd surface.

I think we could limit the total number of forks allowed for a parachain (regardless of their level in the tree); capping the total number of branches in the tree would bring the worst-case complexity down massively. WDYT?
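
To put rough numbers on that bound (an illustration, with f standing for num_forks and h for max_candidate_depth): such a tree has at most

  1 + f + f^2 + ... + f^h = (f^(h+1) - 1) / (f - 1)

nodes. With f = 3 and h = 4 that is 121 nodes; with f = 5 and h = 10 it is already about 12.2 million, which is why capping the total number of forks per parachain would cut the worst case down so sharply.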

Contributor Author (@alindima):

Anyway, I think this is somewhat orthogonal to this PR.
To quote the docs from the fragment_tree module:

//! The code in this module is not designed for speed or efficiency, but conceptual simplicity.
//! Our assumption is that the amount of candidates and parachains we consider will be reasonably
//! bounded and in practice will not exceed a few thousand at any time. This naive implementation
//! will still perform fairly well under these conditions, despite being somewhat wasteful of
//! memory.

We should maybe revisit this assumption, but it doesn't seem like it'll be a problem for the foreseeable future.

Contributor:

Prospective-parachains seems to be a good subsystem benchmark target. That should give some clear numbers on CPU usage in worst-case scenarios.

Contributor Author (@alindima):

Opened a separate issue to track this security item and added it to the elastic scaling task: #3219

@sandreim (Contributor) left a comment:

🚀

// TODO [now]: taking the first selection might introduce bias
// TODO: taking the first best selection might introduce bias

Contributor:

Is this to be solved in this PR?

Contributor Author (@alindima):

No, this was here beforehand.
It could be solved by randomising the order in which we visit a node's children, but that would make tests harder to write.
Since validators don't favour particular collators when requesting collations, the order of potential forks in the fragment tree should already be somewhat "random", based on network latency (unless collators find a way to censor/DoS other collators).
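
For reference, a minimal sketch of the randomisation idea mentioned here (assuming the rand crate and a hypothetical list of child indices; not the actual subsystem code):

use rand::seq::SliceRandom;

fn visit_children(mut children: Vec<usize>) {
    // Shuffle so that no particular collator's fork is systematically preferred.
    children.shuffle(&mut rand::thread_rng());
    for child in children {
        // ... descend into `child` as before ...
        let _ = child;
    }
}

Tests could stay deterministic by seeding the RNG (e.g. through rand's SeedableRng::seed_from_u64) instead of using thread_rng, which would address the concern about making tests harder to write.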


// Parachain 1 looks like this:

Contributor:

🥇

@alindima added this pull request to the merge queue Feb 6, 2024
Merged via the queue into master with commit 7df1ae3, Feb 6, 2024
130 checks passed
@alindima deleted the alindima/elastic-scaling-dev branch February 6, 2024 09:02
github-merge-queue bot pushed a commit that referenced this pull request Mar 1, 2024
#3130

builds on top of #3160

Processes the availability cores and builds a record of how many
candidates it should request from prospective-parachains and their
predecessors.
Tries to supply as many candidates as the runtime can back. Note that
the runtime changes to back multiple candidates per para are not yet
done, but this paves the way for it.

The following backing/inclusion policy is assumed:
1. The runtime will never back candidates of the same para which don't form a chain with the already backed candidates, even if the others are still pending availability. We're optimistic that they won't time out, and we don't want to back parachain forks (as the complexity would be huge).
2. If a candidate is timed out of the core before being included, all of its successors occupying a core will be evicted.
3. Only the candidates which are made available and form a chain starting from the on-chain para head may be included/enacted and cleared from the cores. In other words, if the para head is at A and the cores are occupied by B->C->D, and B and D are made available, only B will be included and its core cleared. C and D will remain on their cores, waiting for C to be made available or timed out. As point (2) already says, if C is timed out, D will also be dropped (a sketch of this rule follows the commit message).
4. The runtime will deduplicate candidates which form a cycle. For example, if the provisioner supplies candidates A->B->A, the runtime will only back A (as the state output will be the same).

Note that if a candidate is timed out, we don't guarantee that in the next relay chain block the block author will be able to fill all of the timed-out cores of the para; that would increase complexity by a lot. Instead, the provisioner will supply N candidates, where N is the number of candidates timed out, but won't include their successors, which will also be deleted by the runtime. These will be backfilled in the next relay chain block.

Adjacent changes:
- Also fixes: #3141
- For non-prospective-parachains, don't supply multiple candidates per para (we can't have elastic scaling without prospective parachains enabled). paras_inherent should already sanitise this input, but it's more efficient this way.

Note: all of these changes are backwards-compatible with the
non-elastic-scaling scenario (one core per para).
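
A minimal sketch of inclusion rule (3) from the message above, with hypothetical types; the real runtime logic is more involved, but the prefix rule reads like this:

// The para's pending candidates, in chain order starting right after the
// on-chain para head, each flagged with whether it has been made available.
struct PendingCandidate {
    id: char,
    available: bool,
}

// Only the longest fully-available prefix of the chain is included;
// everything after the first unavailable candidate stays on its core.
fn included_prefix(chain: &[PendingCandidate]) -> Vec<char> {
    chain.iter().take_while(|c| c.available).map(|c| c.id).collect()
}

fn main() {
    // Para head at A; cores occupied by B -> C -> D, with B and D available.
    let chain = [
        PendingCandidate { id: 'B', available: true },
        PendingCandidate { id: 'C', available: false },
        PendingCandidate { id: 'D', available: true },
    ];
    // Only B is included; C and D keep waiting on their cores.
    assert_eq!(included_prefix(&chain), vec!['B']);
}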
skunert pushed a commit to skunert/polkadot-sdk that referenced this pull request Mar 4, 2024 (paritytech#3233; same commit message as above).
Labels
I5-enhancement (an additional feature request), T0-node (related to the topic “node”), T8-polkadot (related to/affects the Polkadot network)
Development

Successfully merging this pull request may close these issues:

prospective parachains API chain: Provide N candidates
4 participants