cumulus/minimal-node: added prometheus metrics for the RPC client #5572

iulianbarbu · 2024-09-03T13:42:42Z

Description

When we start a node with connections to external RPC servers (as a minimal node), we lack metrics around how many individual calls we make to the remote RPC servers and how long they take. This PR adds metrics that measure the duration of each RPC call made by minimal nodes and, implicitly, how many calls there are.

Closes #5409
Closes #5689

Integration

Node operators should be able to track minimal node metrics and decide on appropriate actions based on how they interpret them. The added metrics can be observed by curl'ing the prometheus metrics endpoint for the ~~relaychain~~ parachain (it was changed based on the review). The metrics are exposed under the ~~polkadot_parachain_relay_chain_rpc_interface~~ relay_chain_rpc_interface namespace (I realized that lining up parachain_relay_chain in the same metric name might be confusing :). Excerpt from the curl output:

relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.001"} 15
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.004"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.016"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.064"} 23
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="0.256"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="1.024"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="4.096"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="16.384"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="65.536"} 24
relay_chain_rpc_interface_bucket{method="chain_getBlockHash",chain="rococo_local_testnet",le="+Inf"} 24
relay_chain_rpc_interface_sum{method="chain_getBlockHash",chain="rococo_local_testnet"} 0.11719075
relay_chain_rpc_interface_count{method="chain_getBlockHash",chain="rococo_local_testnet"} 24
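
As a reading aid for the excerpt above, here is a small std-only sketch of how the cumulative `le` buckets, `_sum` and `_count` can be turned into per-bucket counts and a mean duration. The function names `per_bucket_counts` and `mean_seconds` are hypothetical helpers for illustration and are not part of this PR.

```rust
/// Converts cumulative Prometheus `le` bucket counts into per-bucket counts.
/// Hypothetical helper, not part of the PR.
fn per_bucket_counts(cumulative: &[u64]) -> Vec<u64> {
    let mut previous = 0u64;
    cumulative
        .iter()
        .map(|&current| {
            // Each cumulative bucket includes everything counted by the previous one.
            let delta = current.saturating_sub(previous);
            previous = current;
            delta
        })
        .collect()
}

/// Mean observed duration in seconds, derived from `_sum` and `_count`.
/// Returns `None` when nothing has been observed yet.
fn mean_seconds(sum: f64, count: u64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Fed with the values from the excerpt, the cumulative counts 15, 23, 23, 23, 24, ... mean that 15 calls took at most 1 ms, 8 more took between 1 ms and 4 ms, and a single call landed in the 64 ms to 256 ms bucket. The mean, 0.11719075 / 24, is roughly 4.9 ms.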

Review Notes

The way we measure durations/hits is based on the HistogramVec struct, which allows us to collect timings for each RPC client method called from the minimal node. It can be extended to measure the RPCs against other dimensions too (status codes, response sizes, etc.). The timing is measured at the level of the relay-chain-rpc-interface, in the RelayChainRpcClient struct's method 'request_tracing', which is a single entry point for all RPC requests done through the relay-chain-rpc-interface. The request durations will fall under exponential buckets described by start 0.001, factor 4 and count 9.
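
To illustrate the "single entry point" idea, here is a minimal std-only sketch. It is not the code from this PR: the real implementation uses a prometheus HistogramVec inside the async 'request_tracing' method, while the types `MethodTimings` and `MethodStats` below are hypothetical stand-ins that wrap a synchronous closure and keep cumulative buckets per method label.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Cumulative histogram state for one RPC method (hypothetical type).
#[derive(Debug)]
struct MethodStats {
    /// cumulative[i] counts observations <= bounds[i], like Prometheus `le` buckets.
    cumulative: Vec<u64>,
    sum_seconds: f64,
    count: u64,
}

/// Per-method timings, keyed by the RPC method name (hypothetical type).
struct MethodTimings {
    bounds: Vec<f64>,
    per_method: HashMap<String, MethodStats>,
}

impl MethodTimings {
    fn new(bounds: Vec<f64>) -> Self {
        Self { bounds, per_method: HashMap::new() }
    }

    /// Records one observed duration for `method`.
    fn observe(&mut self, method: &str, seconds: f64) {
        let bucket_count = self.bounds.len();
        let stats = self
            .per_method
            .entry(method.to_string())
            .or_insert_with(|| MethodStats {
                cumulative: vec![0; bucket_count],
                sum_seconds: 0.0,
                count: 0,
            });
        // An observation is counted in every bucket whose upper bound covers it.
        for (slot, bound) in stats.cumulative.iter_mut().zip(self.bounds.iter()) {
            if seconds <= *bound {
                *slot += 1;
            }
        }
        stats.sum_seconds += seconds;
        stats.count += 1;
    }

    /// Single entry point: runs `call`, times it, and records the duration.
    fn time<T>(&mut self, method: &str, call: impl FnOnce() -> T) -> T {
        let started = Instant::now();
        let result = call();
        self.observe(method, started.elapsed().as_secs_f64());
        result
    }

    fn stats(&self, method: &str) -> Option<&MethodStats> {
        self.per_method.get(method)
    }
}
```

Routing every request through one wrapper such as `time` means a new RPC method gets measured without any extra instrumentation at the call site, which is the property the review notes describe for 'request_tracing'.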

cumulus/client/relay-chain-minimal-node/src/blockchain_rpc_client.rs

paritytech-cicd-pr · 2024-09-06T17:47:20Z

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable-int
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7290328

michalkucharczyk

looks good, nits only.

cumulus/client/relay-chain-rpc-interface/src/lib.rs

cumulus/client/relay-chain-rpc-interface/src/metrics.rs

michalkucharczyk · 2024-09-17T08:48:34Z

cumulus/client/relay-chain-rpc-interface/src/metrics.rs

+							"polkadot_parachain_relay_chain_rpc_interface",
+							"Tracks stats about cumulus relay chain RPC interface",
+						),
+						buckets: prometheus::exponential_buckets(0.001, 4.0, 9)


Just out of curiosity: any reason for these values?

Am I right that the buckets will be the following?

0.001 0.004 0.016 0.064 0.256 1.024 4.096 16.384 65.536

iulianbarbu · 2024-09-17

Correct for the buckets.

I picked this bucket split after seeing it used for other request timers in the code (related to substrate libp2p), although there isn't a particular relationship between them. Ideally we would have some preliminary measurements first and then pick the buckets. I am OK with these values though, because they correspond to some rough back-of-the-envelope measurements for exchanging data over the network (e.g. USA -> EU -> USA ~ 150 ms). I think that the higher buckets (e.g. >1s) can be considered extreme, and might correspond to very infrequent outliers (assuming the network runs fine most of the time).

LE: regarding my note above on the higher buckets, it depends, as usual. We implicitly measure the time it takes for the external RPC to process the request and return the response, so I think the higher buckets can hold some of the observations, depending on the nature of the RPC call.
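
The bucket boundaries discussed in this thread can be reproduced with a few lines of std-only Rust. This is a stand-in written for illustration, not the `prometheus::exponential_buckets` function used in the diff; `first_bucket_at_or_above` is a hypothetical helper that shows where a given latency would land.

```rust
/// Upper bounds `start * factor^i` for `i` in `0..count`.
/// Std-only stand-in for illustration, not the prometheus crate's function.
fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    let mut bounds = Vec::with_capacity(count);
    let mut next = start;
    for _ in 0..count {
        bounds.push(next);
        next *= factor;
    }
    bounds
}

/// Index of the first bucket whose upper bound covers `seconds`.
/// `None` means the observation is only counted by the implicit `+Inf` bucket.
fn first_bucket_at_or_above(bounds: &[f64], seconds: f64) -> Option<usize> {
    bounds.iter().position(|&bound| seconds <= bound)
}
```

With start 0.001, factor 4 and count 9, the bounds are approximately 0.001 up to 65.536 seconds, matching the list in the question. A round trip of about 150 ms, like the USA -> EU -> USA example, would first be counted by the 0.256 bucket.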

cumulus/client/relay-chain-rpc-interface/src/lib.rs

skunert · 2024-09-17T15:45:44Z

cumulus/client/relay-chain-minimal-node/src/lib.rs

@@ -127,6 +128,7 @@ pub async fn build_minimal_relay_chain_node_with_rpc(
 	let client = cumulus_relay_chain_rpc_interface::create_client_and_start_worker(
 		relay_chain_url,
 		task_manager,
+		polkadot_config.prometheus_registry(),


I am thinking about whether this is better registered on the parachain side.

Technically, this is doing relay chain calls. However, the calls to the relay chain are basically done only on collators, and the code lives in cumulus. As a user I would probably expect these metrics to be attached to the parachain prometheus endpoint.

iulianbarbu · 2024-09-18

I looked for reasons to keep the metrics on the relay chain side but couldn't find any. To be honest, it is still not clear to me what kind of metrics should fall under the relay chain prometheus exporter (its concerns on the collator side are not crystal clear in my mind yet), but for our case I agree that these metrics seem more relevant to the internals of how parachains work, so it would be useful to expose them in the "parachain" prometheus exporter.

Changed this here: 309ae23.

skunert · 2024-09-19

> To be honest, it is still not clear to me what kind of metrics should fall under the relay chain prometheus exporter

When you don't use --relay-chain-rpc-urls, the collator will start an embedded node, which is basically the same as a polkadot full node that you start with the polkadot binary.

Of course this internal node will export all its metrics; there is a whole bunch of them defined in substrate. So in these scenarios it can make sense to monitor the relay chain node separately. This RPC functionality, however, is in the end parachain specific and therefore goes to the parachain prometheus endpoint.

skunert

nice!

iulianbarbu added the T9-cumulus label (This PR/Issue is related to cumulus.) Sep 3, 2024

iulianbarbu self-assigned this Sep 3, 2024

niklasad1 reviewed Sep 3, 2024

cumulus/client/relay-chain-minimal-node/src/blockchain_rpc_client.rs (outdated, resolved)

iulianbarbu force-pushed the add-rpc-collator-metrics branch 2 times, most recently from 4c1f326 to a42b3ae Compare September 4, 2024 13:43

iulianbarbu force-pushed the add-rpc-collator-metrics branch from 6d99a1b to 2f51ebd Compare September 9, 2024 17:26

iulianbarbu mentioned this pull request Sep 12, 2024

Multiple collators started on same machine conflict on prometheus exporter default port 9616 #5689

Closed

2 tasks

iulianbarbu force-pushed the add-rpc-collator-metrics branch from cf3c960 to fa15b54 Compare September 12, 2024 12:29

iulianbarbu added 4 commits September 12, 2024 16:39

cumulus/minimal-node: added prometheus metrics for the RPC client

2544c67

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

substrate/prometheus: log error for port in use when binding

e5e5fa0

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

added prdoc

f5f375b

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

rename metric namespace

0ef2ee7

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

iulianbarbu force-pushed the add-rpc-collator-metrics branch from 4c0c7fa to 0ef2ee7 Compare September 12, 2024 14:29

iulianbarbu added 5 commits September 12, 2024 17:44

add license

cadaab6

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

formatted Cargo.toml

3859c57

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

fmt fixes

db41558

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

fixed prdoc

25be098

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

fix prdoc

24862e0

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

iulianbarbu marked this pull request as ready for review September 13, 2024 08:02

iulianbarbu requested review from skunert and michalkucharczyk September 13, 2024 08:02

iulianbarbu and others added 4 commits September 13, 2024 11:05

amend prdoc

b08d580

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

Merge branch 'master' into add-rpc-collator-metrics

6bf5031

Merge branch 'master' into add-rpc-collator-metrics

5af4d95

Merge branch 'master' into add-rpc-collator-metrics

9e85cab

michalkucharczyk approved these changes Sep 17, 2024

cumulus/client/relay-chain-rpc-interface/src/lib.rs (outdated, resolved)

cumulus/client/relay-chain-rpc-interface/src/metrics.rs (outdated, resolved)

cumulus/client/relay-chain-rpc-interface/src/metrics.rs (outdated, resolved)

michalkucharczyk reviewed Sep 17, 2024

iulianbarbu added 2 commits September 17, 2024 14:25

updates after review

4481f5e

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

removed semicolon

4a23266

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

michalkucharczyk reviewed Sep 17, 2024

cumulus/client/relay-chain-rpc-interface/src/lib.rs (outdated, resolved)

iulianbarbu and others added 2 commits September 17, 2024 16:12

one more semicolumn removed

d465d1b

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

Merge branch 'master' into add-rpc-collator-metrics

417b12a

skunert reviewed Sep 17, 2024

iulianbarbu and others added 5 commits September 18, 2024 21:54

simplify prometheus registry passing...

309ae23

...and rename for clarity Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

Merge branch 'master' into add-rpc-collator-metrics

33d2846

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

fix fmt

5252082

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

update prdoc

bbe47d7

Signed-off-by: Iulian Barbu <iulian.barbu@parity.io>

Merge branch 'master' into add-rpc-collator-metrics

8d9cc35

skunert approved these changes Sep 19, 2024

iulianbarbu added this pull request to the merge queue Sep 19, 2024

github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Sep 19, 2024

Merge branch 'master' into add-rpc-collator-metrics

623dd15

iulianbarbu enabled auto-merge September 19, 2024 16:35

iulianbarbu added this pull request to the merge queue Sep 19, 2024

Merged via the queue into paritytech:master with commit c8d5e5a Sep 19, 2024
198 of 208 checks passed

iulianbarbu deleted the add-rpc-collator-metrics branch September 19, 2024 17:44
