Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[remote-externalities] Batch calls scraping child-tree data #2245

Open
liamaharon opened this issue Nov 9, 2023 · 16 comments · May be fixed by #3367
Open

[remote-externalities] Batch calls scraping child-tree data #2245

liamaharon opened this issue Nov 9, 2023 · 16 comments · May be fixed by #3367
Assignees
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request.

Comments

@liamaharon
Copy link
Contributor

liamaharon commented Nov 9, 2023

Child-tree data is currently fetched sequentially.

Most chains to date don't have that many child-trees so this hasn't been an issue, but the Rococo Contracts chain has almost 3000 child tries making this serial approach unnecessarily add a really long time to scraping keys.

The keys to scrape appear to all be known ahead of time, so we should be able to batch the requests.

This should be a pretty easy one. Good beginner issue with mentor available :)

@liamaharon liamaharon added C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. I5-enhancement An additional feature request. D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. labels Nov 9, 2023
@brunopgalvao
Copy link
Contributor

brunopgalvao commented Nov 16, 2023

  1. Would you say this is where I should be looking at?

/// Load all of the child keys from the remote config, given the already scraped list of top key
/// pairs.
///
/// `top_kv` need not be only child-bearing top keys. It should be all of the top keys that are
/// included thus far.
///
/// This function concurrently populates `pending_ext`. the return value is only for writing to
/// cache, we can also optimize further.
async fn load_child_remote(

  1. Can you recommend an easy way to test/run child trie remote externalities?

async fn child_keys_are_loaded() {
const CACHE: &'static str = "snapshot_retains_storage";
init_logger();
// create an ext with children keys
let child_ext = Builder::<Block>::new()
.mode(Mode::Online(OnlineConfig {
pallets: vec!["Proxy".to_owned()],
child_trie: true,

  • For example, this test here uses Proxy pallet, but as far as I am aware, Proxy pallet does not use child tries

@liamaharon
Copy link
Contributor Author

Would you say this is where I should be looking at?

Yes exactly

Can you recommend an easy way to test/run child trie remote externalities?

Add a test that scrapes live state similar to this: https://github.com/paritytech/polkadot-sdk/pull/1985/files#diff-f73dfe42c5919a5ebddb4df3294fea8e06d1b4dc0f0bba175a2c555223202cf5R1543-R1565

A remote with lots of child tries is wss://rococo-contracts-rpc.polkadot.io:443.

@brunopgalvao
Copy link
Contributor

So I am following the rabbit and it appears to be grabbing the child keys here:

for prefixed_top_key in child_roots {
let child_keys =
Self::rpc_child_get_keys(&client, &prefixed_top_key, StorageKey(vec![]), at)

pub(crate) async fn rpc_child_get_keys(
client: &HttpClient,
prefixed_top_key: &StorageKey,
child_prefix: StorageKey,
at: B::Hash,
) -> Result<Vec<StorageKey>, &'static str> {
// This is deprecated and will generate a warning which causes the CI to fail.
#[allow(warnings)]
let child_keys = substrate_rpc_client::ChildStateApi::storage_keys(
client,
PrefixedStorageKey::new(prefixed_top_key.as_ref().to_vec()),
child_prefix,
Some(at),
)

which calls the self.backend.storage_keys() defined here:

impl<Block, Client> ChildStateApiServer<Block::Hash> for ChildState<Block, Client>
where
Block: BlockT + 'static,
Client: Send + Sync + 'static,
{
fn storage_keys(
&self,
storage_key: PrefixedStorageKey,
key_prefix: StorageKey,
block: Option<Block::Hash>,
) -> RpcResult<Vec<StorageKey>> {
self.backend.storage_keys(block, storage_key, key_prefix).map_err(Into::into)
}

Two questions I have:

The keys to scrape appear to all be known ahead of time

  1. Where/how can I find these well known keys?
  2. How would I go about batching a request?

Put all the known keys into a vector and create a fn storage_keys_batched in the child state backend API:

/// Child state backend API.
pub trait ChildStateBackend<Block: BlockT, Client>: Send + Sync + 'static

@liamaharon
Copy link
Contributor Author

  1. Where/how can I find these well known keys?
 for prefixed_top_key in child_roots { 
 	let child_keys = 
 		Self::rpc_child_get_keys(&client, &prefixed_top_key, StorageKey(vec![]), at) 

They are in child_roots. Instead of calling the RPC sequentially here we want to do it all at once.

  1. How would I go about batching a request?

ChatGPT is surprisingly good at answering these types of question https://chat.openai.com/share/b607012e-3f78-4bcd-9bdc-c6cac06646fd

@brunopgalvao
Copy link
Contributor

brunopgalvao commented Dec 4, 2023

I have some code (#2584) and it compliles! Now I want to run the tests but when I run the tests locally or using wss://rococo-contracts-rpc.polkadot.io:443 or wss://rpc.polkadot.io:443 I get the following error:

Dec 04 01:34:52.398 ERROR remote-ext: Error = Transport(Server returned an error status code: 429) 

Do I need to be behind a VPN for the Polkadot RPC nodes to work?

Locally, I am using the contracts node:

./target/debug/substrate-contracts-node --rpc-max-request-size 10000000 --rpc-max-response-size 10000000
TEST_WS=ws://127.0.0.1:9944 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture

Same happens on master branch.

@ggwpez ggwpez added D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. and removed D0-easy Can be fixed primarily by duplicating and adapting code by an intermediate coder. labels Feb 5, 2024
@ggwpez
Copy link
Member

ggwpez commented Feb 5, 2024

What is the status of this @liamaharon @brunopgalvao ?

@brunopgalvao
Copy link
Contributor

What is the status of this @liamaharon @brunopgalvao ?

I would very much like to continue working on this issue. I got an error here: #2245 (comment)

I could use some help unblocking what could be the issue there.

@ggwpez
Copy link
Member

ggwpez commented Feb 5, 2024

Does it with any of these nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff
They need to be configured in a special way, and the normal nodes may not be.

@liamaharon
Copy link
Contributor Author

@brunopgalvao 429 means the server is rate limiting you. How many requests are you making in parallel? Limiting parallel requests and adding retry logic should resolve your issue.

@brunopgalvao
Copy link
Contributor

brunopgalvao commented Feb 9, 2024

Does it with any of these nodes? https://gist.github.com/ggwpez/81db110fe4390ed9a7622f5857dfc4ff They need to be configured in a special way, and the normal nodes may not be.

Yes. I have tried wss://rococo-try-runtime-node.parity-chains.parity.io:443

This is also happening on master for me:

polkadot-sdk % TEST_WS=wss://rococo-try-runtime-node.parity-chains.parity.io:443 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture
   Compiling frame-remote-externalities v0.35.0 (/Users/bruno/src/polkadot-sdk/substrate/utils/frame/remote-externalities)
    Finished test [unoptimized + debuginfo] target(s) in 4.40s
     Running unittests src/lib.rs (target/debug/deps/frame_remote_externalities-c1e0e92aad840e1f)

running 13 tests
✅ Found 41 keys (1.49s)
✅ Found 1 keys (1.50s)
✅ Found 0 keys (1.51s)
test remote_tests::can_build_big_pallet ... ok
[00:00:00] ✅ Downloaded key values 4.8243/s [==================================================================================================================================] 1/1 (0s)
[00:00:00] ✅ Downloaded key values 93.605/s [================================================================================================================================] 41/41 (0s)
✅ Inserted keys into DB (0.00s)
✅ Inserted keys into DB (0.00s)
✅ Found 1383 keys (2.66s)
✅ Found 1383 keys (2.98s)
⠸     2.750 s   Scraping keys...░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.15s)
✅ Found 1383 keys (3.15s)
⠋     3.326 s   Scraping keys...░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.47s)
✅ Found 1383 keys (3.14s)
✅ Found 31 keys (1.23s)
[00:00:00] ✅ Downloaded key values 93.6132/s [===============================================================================================================================] 31/31 (0s)
✅ Inserted keys into DB (0.00s)
⠸     2.760 swnlScraping keys...████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
✅ Found 1383 keys (3.39s)
⠋    10.036 swnlScraping keys...Feb 09 13:58:41.698 ERROR remote-ext: batch processing failed: "Networking or low-level protocol error: Server returned an error status code: 429"    (6s)
thread 'remote_tests::can_create_child_snapshot' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1490:14:
called `Result::unwrap()` on an `Err` value: "batch processing failed"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
⠙    10.117 s   Scraping keys...test remote_tests::can_create_child_snapshot ... FAILED
✅ Loaded snapshot (0.01s)
✅ Loaded snapshot (0.01s)
test tests::can_exclude_from_snapshot ... ok
✅ Loaded snapshot (0.01s)
⠼    10.370 s   Scraping keys...test tests::can_load_state_snapshot ... ok
⠋    11.710 swnlScraping keys...Feb 09 13:58:43.335 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429    
thread 'remote_tests::can_build_one_small_pallet' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1428:14:
called `Result::unwrap()` on an `Err` value: "Error while getting storage data"
test remote_tests::can_build_one_small_pallet ... FAILED
⠹    11.873 swnlScraping keys...Feb 09 13:58:43.496 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429    
thread 'remote_tests::offline_else_online_works' panicked at substrate/utils/frame/remote-externalities/src/lib.rs:1389:14:
called `Result::unwrap()` on an `Err` value: "Error while getting storage data"
test remote_tests::offline_else_online_works ... FAILED
⠸    11.956 s   Scraping keys...Feb 09 13:58:43.603 ERROR remote-ext: Error while getting storage data: Networking or low-level protocol error: Server returned an error status code: 429 

@brunopgalvao 429 means the server is rate limiting you. How many requests are you making in parallel? Limiting parallel requests and adding retry logic should resolve your issue.

@liamaharon I am getting this issue on master as well. See above.
Also using TEST_WS=ws://127.0.0.1:9944 cargo test --features=remote-test -p frame-remote-externalities -- --nocapturewith polkadot-sdk on master.

@liamaharon
Copy link
Contributor Author

I wasn't able to replicate @brunopgalvao, maybe due to my further distance to the servers compared to you. Can you try with this branch? https://github.com/paritytech/polkadot-sdk/tree/liam-remote-ext-exponential-backoff

@brunopgalvao
Copy link
Contributor

Got it to work!

The TEST_WS environment variable is only used in certain tests. I also needed to replace DEFAULT_HTTP_ENDPOINT with one of the nodes listed here #2245 (comment):

const DEFAULT_HTTP_ENDPOINT: &str = "https://rpc.polkadot.io:443";

@liamaharon
Copy link
Contributor Author

liamaharon commented Feb 9, 2024

Ah nice catch. do you mind opening a pr to fix those tests? I would also be interested to know if my exponential backoff branch fixes the 429 errors :)

@brunopgalvao
Copy link
Contributor

Ah nice catch. do you mind opening a pr to fix those tests? I would also be interested to know if my exponential backoff branch fixes the 429 errors :)

429 errors no longer happen (including your branch) as long as the nodes have a --rpc-max-request-size 10000000 --rpc-max-response-size set.

However, as noted some tests use the TEST_WS environment variable:

async fn can_build_big_pallet() {
if std::option_env!("TEST_WS").is_none() {
return
}
init_logger();
Builder::<Block>::new()
.mode(Mode::Online(OnlineConfig {
transport: std::option_env!("TEST_WS").unwrap().to_owned().into(),

While other tests use the DEFAULT_HTTP_ENDPOINT hardcoded variable:

async fn can_create_child_snapshot() {
const CACHE: &'static str = "can_create_child_snapshot";
init_logger();
Builder::<Block>::new()
.mode(Mode::Online(OnlineConfig {
state_snapshot: Some(SnapshotConfig::new(CACHE)),
pallets: vec!["Crowdloan".to_owned()],
child_trie: true,
..Default::default()

The default for transport is DEFAULT_HTTP_ENDPOINT:

impl<B: BlockT> Default for OnlineConfig<B> {
fn default() -> Self {
Self {
transport: Transport::from(DEFAULT_HTTP_ENDPOINT.to_owned()),

Does it make sense to refactor all tests to use TEST_WS instead of allowing some tests to fallback on the hardcoded DEFAULT_HTTP_ENDPOINT?

@liamaharon
Copy link
Contributor Author

Does it make sense to refactor all tests to use TEST_WS instead of allowing some tests to fallback on the hardcoded DEFAULT_HTTP_ENDPOINT?

Yes I think so, PR would be good :)

@brunopgalvao
Copy link
Contributor

Yes I think so, PR would be good :)

PR: #3284

github-merge-queue bot pushed a commit that referenced this issue Feb 13, 2024
Refactor in accordance with
#2245 (comment)

Prior to this PR, the `remote_tests` test module would either use
`TEST_WS` or `DEFAULT_HTTP_ENDPOINT`.

With the PR, `TEST_WS` is the default for the `remote_tests` test module
and the fallback is `DEFAULT_HTTP_ENDPOINT`.

The only downside I see to this PR is that for particular tests in the
`remote_tests` module, one would want to use a different http endpoint.
If that is the case, they would have to manually hardcode the http
endpoint for that particular test.

Note: The `TEST_WS` node should fulfill the role for all test cases e.g.
include child tries.

Give it a _try_:
```
TEST_WS=wss://rococo-try-runtime-node.parity-chains.parity.io:443 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture
```

---------

Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this issue Mar 25, 2024
Refactor in accordance with
paritytech#2245 (comment)

Prior to this PR, the `remote_tests` test module would either use
`TEST_WS` or `DEFAULT_HTTP_ENDPOINT`.

With the PR, `TEST_WS` is the default for the `remote_tests` test module
and the fallback is `DEFAULT_HTTP_ENDPOINT`.

The only downside I see to this PR is that for particular tests in the
`remote_tests` module, one would want to use a different http endpoint.
If that is the case, they would have to manually hardcode the http
endpoint for that particular test.

Note: The `TEST_WS` node should fulfill the role for all test cases e.g.
include child tries.

Give it a _try_:
```
TEST_WS=wss://rococo-try-runtime-node.parity-chains.parity.io:443 cargo test --features=remote-test -p frame-remote-externalities -- --nocapture
```

---------

Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C1-mentor A task where a mentor is available. Please indicate in the issue who the mentor could be. D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request.
Projects
None yet
3 participants