
/blocks/:height sometimes always returns an error message for a block #1081

Open
Ljzn opened this issue Oct 9, 2022 · 17 comments

Ljzn commented Oct 9, 2022

Description

Sometimes the /blocks/:height API always returns this message:

{"code":500,"message":"No response received from RPC endpoint in 60s","stack":"Error: No response received from RPC endpoint in 60s\n    at WsProvider.value (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:406:32)\n    at Timeout._onTimeout (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:528:182)\n    at listOnTimeout (node:internal/timers:557:17)\n    at processTimers (node:internal/timers:500:7)","level":"error"}

Steps to Reproduce

Run substrate-api-sidecar connected to a Kusama WS endpoint.

Expected vs. Actual Behavior

Expected: getting a block by height succeeds.

Actual: the same error message is always returned for a specific block.

Ljzn commented Oct 9, 2022

It does not affect other blocks, and the problem is gone after restarting the sidecar, so this may be related to the cache.

jsdw commented Oct 10, 2022

Can you reliably reproduce this, or did the problem go away permanently for you?

It may have been lucky timing; perhaps the Kusama node being queried was being too slow to respond for a while and it happened to improve before you restarted Sidecar?

Ljzn commented Oct 14, 2022

Can you reliably reproduce this, or did the problem go away permanently for you?

It may have been lucky timing; perhaps the Kusama node being queried was being too slow to respond for a while and it happened to improve before you restarted Sidecar?

Unfortunately, I don't know how to reproduce it. It happens about once a week.

In that situation, the REST API responds with that error immediately, as if no new query was made at all.

Ljzn commented Oct 20, 2022

It happened again; here is more of the log:

2022-10-20 05:09:53 error: RPC-CORE: getRuntimeVersion(at?: BlockHash): RuntimeVersion:: No response received from RPC endpoint in 60s
2022-10-20 05:09:53 error: RPC-CORE: chain(): Text:: No response received from RPC endpoint in 60s
2022-10-20 05:09:53 error: RPC-CORE: version(): Text:: No response received from RPC endpoint in 60s
2022-10-20 05:09:53 error: No response received from RPC endpoint in 60s 
 Error: No response received from RPC endpoint in 60s
    at WsProvider.value (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:406:32)
    at Timeout._onTimeout (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:528:182)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)

Ljzn commented Oct 20, 2022

hi, @jsdw . I found this error can not only happened on /blocks/:height/, but also on other routes such as /node/version.

2022-10-20 08:31:51 error: RPC-CORE: getRuntimeVersion(at?: BlockHash): RuntimeVersion:: No response received from RPC endpoint in 60s
2022-10-20 08:31:51 error: No response received from RPC endpoint in 60s 
 Error: No response received from RPC endpoint in 60s
    at WsProvider.value (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:406:32)
    at Timeout._onTimeout (/usr/src/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:528:182)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
2022-10-20 08:31:51 error: GET /node/version 500 15001ms

Ljzn commented Oct 20, 2022

Not only Kusama; many other chains had this error today. We are using the WS API of OnFinality, so this may be related to their rate limit policy. Feel free to close this issue if it is the WS provider's problem.

jsdw commented Oct 20, 2022

Mmm, the logs do very much suggest that the node simply isn't responding in time. I have found in related sorts of queries that RPC nodes can often get very slow and fail to respond within 60s. Perhaps have a go at pointing at a different RPC node, or in the worst case, run and sync your own node so that the responsiveness is within your control :)

@TarikGul is the request timeout configurable per chance?

TarikGul commented:

@Ljzn Okay, so first: I highly suggest running your own archive node. That way everything is under your control and easily monitorable. Sidecar was never intended to be used heavily against public nodes.

is the request timeout configurable per chance?

Yes, this is configurable. I can add it as an ENV var. It gets attached to WsProvider as an option called timeout. This would be specific to the WebSocket provider though, and is not an option for the HTTP provider.
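
For reference, a minimal sketch of what overriding that timeout looks like when constructing the polkadot-js WsProvider directly (the Kusama endpoint and the 120s value below are just placeholders, and the exact wiring inside Sidecar may differ):

import { ApiPromise, WsProvider } from '@polkadot/api';

async function main(): Promise<void> {
  // The WsProvider constructor takes the request timeout (ms) as its fourth
  // argument; the default is 60_000, which is what produces the
  // "No response received from RPC endpoint in 60s" error above.
  const provider = new WsProvider(
    'wss://kusama-rpc.polkadot.io', // placeholder endpoint
    2500,                           // auto-reconnect delay (default)
    {},                             // no extra headers
    120_000                         // allow slow RPC responses up to 120s
  );

  const api = await ApiPromise.create({ provider });
  console.log((await api.rpc.system.version()).toString());
  await api.disconnect();
}

main().catch(console.error);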

Ljzn commented Oct 21, 2022

Thanks!

cdiarra-ledger commented Dec 20, 2022

Hello,
I'm facing the same issue, and I'm running my own full archive nodes (in fact 8 of them).
From time to time, I'm seeing this same error, No response received from RPC endpoint in 60s, on a specific node when querying a specific block (example: blocks/13325820).
I receive this message instantly, so I find the "timeout" log quite misleading.

  • When querying the same block on a different node, it works well: {…,"success":true,…}
  • When querying another block on the same node, it works well: {…,"success":true,…}

So it doesn't seem to be a timeout or connectivity issue on our components.

Like @Ljzn, restarting the substrate-api-sidecar service makes the query work again.

I'm running sidecar v13.1.0 and node v0.9.3.0

Any ideas ?

TarikGul commented:

Hmm, yeah, this is an interesting issue because it's hard to reproduce. That being said, a few things that can always potentially mitigate these issues:

  1. Use the latest version of SAS, which in this case would be 14.2.1.

  2. Always use the latest version of the node you are querying.

Questions:
What chain are you referencing specifically, Polkadot?

I receive this message instantly, so I find the "timeout" log quite misleading.

This is also very odd; do you have any logs you can attach for these errors, so we can check for patterns and try to load test on our end?

TarikGul commented:

Also, what version of Node.js are you running?

cdiarra-ledger commented:

Hello,
Thanks for answering!
I'm not at liberty to update to the latest versions right now, but I made sure to put this at the top of our priority list; I hope to get to it in the next 2 weeks!
In the meantime I had to implement some sort of self-healing script that reboots the SAS service whenever we detect an HTTP 500 error. I only kept 2 non-production servers without this script, so it will be a little more difficult to reproduce the error.
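
For illustration, a minimal sketch of that kind of watchdog (not the actual script), assuming Sidecar listens on localhost:30002 and is managed by a hypothetical systemd unit; the probe endpoint and restart command would need to match the real setup:

import { get } from 'node:http';
import { execFile } from 'node:child_process';

const SIDECAR_URL = 'http://localhost:30002/blocks/head'; // probe endpoint (assumption)
const CHECK_INTERVAL_MS = 60_000;

function restartSidecar(): void {
  // Hypothetical restart command; replace with whatever manages the SAS
  // service in practice (systemd, docker, pm2, ...).
  execFile('systemctl', ['restart', 'substrate-api-sidecar'], (err) => {
    if (err) console.error('restart failed:', err);
  });
}

function probe(): void {
  get(SIDECAR_URL, (res) => {
    res.resume(); // drain the body so the socket is released
    if (res.statusCode === 500) {
      console.warn('sidecar returned 500, restarting');
      restartSidecar();
    }
  }).on('error', (err) => {
    console.error('probe failed:', err);
    restartSidecar();
  });
}

setInterval(probe, CHECK_INTERVAL_MS);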

But I kept some logs from the last time we encountered this issue:

curl localhost:30002/blocks/13325820
{"code":500,"message":"No response received from RPC endpoint in 60s","stack":"Error: No response received from RPC endpoint in 60s\n    at WsProvider.value (/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:406:32)\n    at Timeout._onTimeout (/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:528:182)\n    at listOnTimeout (node:internal/timers:559:17)\n    at processTimers (node:internal/timers:502:7)","level":"error"}

At the same time, here's the output on the SAS service :

2022-12-21 14:01:26 error: RPC-CORE: getBlock(hash?: BlockHash): SignedBlock:: No response received from RPC endpoint in 60s
2022-12-21 14:01:26 error: RPC-CORE: getRuntimeVersion(at?: BlockHash): RuntimeVersion:: No response received from RPC endpoint in 60s
2022-12-21 14:01:26 error: RPC-CORE: getStorage(key: StorageKey, at?: BlockHash): StorageData:: No response received from RPC endpoint in 60s
2022-12-21 14:01:26 error: RPC-CORE: getStorage(key: StorageKey, at?: BlockHash): StorageData:: No response received from RPC endpoint in 60s
2022-12-21 14:01:26 error: No response received from RPC endpoint in 60s
 Error: No response received from RPC endpoint in 60s
    at WsProvider.value (/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:406:32)
    at Timeout._onTimeout (/app/node_modules/@polkadot/rpc-provider/cjs/ws/index.js:528:182)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7)

I can confirm again that it took less than 1 second to get this error message.

All of our nodes are connected to polkadot mainnet.

The Node.js version used is v16.18.1 (LTS).

TarikGul commented:

Hey @cdiarra-ledger, sorry this got lost in the queue. Are you still experiencing this issue?

TarikGul commented:

Due to inactivity I am closing this issue. Feel free to reopen it, or open a new issue, if you are still facing this problem.

DamianStraszak commented Oct 12, 2023

Hi!
The issue is still present (we are getting it on /runtime/spec?at, version 17.2). We are quite sure this is caused by some internal caching (perhaps the first attempt fails and is cached), but with no understanding of the internals it's hard to make a diagnosis.

TarikGul reopened this Oct 13, 2023

TarikGul commented:

Hi! The issue is still present (we are getting it on /runtime/spec?at, version 17.2). We are quite sure this is caused by some internal caching (perhaps the first attempt fails and is cached), but with no understanding of the internals it's hard to make a diagnosis.

This is interesting because we do not do any explicit caching ourselves for anything other than the /blocks/:height endpoint, as that is the only one where it is actually needed. I will do some research to see if there is any underlying issue happening with the RpcProvider in polkadot-js and see how we can mitigate this in sidecar.
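
For what it's worth, the failure mode being hypothesized above would look roughly like the sketch below (purely illustrative, not Sidecar's actual code): if a promise-returning fetch is memoised per block height and a timed-out first attempt is never evicted, every later request for that height returns the same stored rejection instantly, which would match the "instant 500 until restart" symptom.

// Illustration only: naive memoisation of an async block fetch.
const blockCache = new Map<number, Promise<string>>();

// Stand-in for the real RPC call; here it always fails.
async function fetchBlock(_height: number): Promise<string> {
  throw new Error('No response received from RPC endpoint in 60s');
}

function getBlockCached(height: number): Promise<string> {
  let pending = blockCache.get(height);
  if (!pending) {
    pending = fetchBlock(height);
    // Bug: the rejected promise is kept in the map, so every later call
    // for this height resolves to the same stored error immediately.
    blockCache.set(height, pending);
    // A fix would evict failed entries so the next request retries:
    // pending.catch(() => blockCache.delete(height));
  }
  return pending;
}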
