Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node sync gets stuck towards the end of the sync process #2122

Closed
nazar-pc opened this issue Oct 14, 2023 · 15 comments
Closed

Node sync gets stuck towards the end of the sync process #2122

nazar-pc opened this issue Oct 14, 2023 · 15 comments
Assignees
Labels
bug Something isn't working node Node (service library/node app) upstream Blocked by upstream

Comments

@nazar-pc
Copy link
Member

I have hit this a few times:

2023-10-14T19:30:41.992407Z [Consensus] ⚙️  Syncing  8.1 bps, target=#40841 (11 peers), best: #38703 (0x5bd7…b64a), finalized #0 (0x1aed…2474), ⬇ 37.3kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:30:46.992644Z [Consensus] ⚙️  Syncing  9.4 bps, target=#40842 (11 peers), best: #38750 (0xe55f…a7ad), finalized #0 (0x1aed…2474), ⬇ 6.3kiB/s ⬆ 25 B/s    
2023-10-14T19:30:51.999666Z [Consensus] ⚙️  Preparing 28.9 bps, target=#40842 (11 peers), best: #38895 (0xc683…2d26), finalized #0 (0x1aed…2474), ⬇ 23.7kiB/s ⬆ 21 B/s    
2023-10-14T19:30:57.000015Z [Consensus] ⚙️  Syncing  8.2 bps, target=#40844 (11 peers), best: #38936 (0x5b71…7823), finalized #0 (0x1aed…2474), ⬇ 10.9kiB/s ⬆ 0.5kiB/s    
2023-10-14T19:31:02.000378Z [Consensus] ⚙️  Syncing 12.4 bps, target=#40846 (11 peers), best: #38998 (0xb01d…9f35), finalized #0 (0x1aed…2474), ⬇ 10.8kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:31:07.000639Z [Consensus] ⚙️  Syncing  7.4 bps, target=#40846 (11 peers), best: #39035 (0x713b…ec1d), finalized #0 (0x1aed…2474), ⬇ 3.2kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:31:12.000965Z [Consensus] ⚙️  Syncing 19.8 bps, target=#40846 (11 peers), best: #39134 (0x5911…ed6e), finalized #0 (0x1aed…2474), ⬇ 3.2kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:31:17.001370Z [Consensus] ⚙️  Syncing 10.0 bps, target=#40846 (11 peers), best: #39184 (0x32f6…2a66), finalized #0 (0x1aed…2474), ⬇ 1.2kiB/s ⬆ 38 B/s    
2023-10-14T19:31:22.004236Z [Consensus] ⚙️  Syncing  6.1 bps, target=#40847 (11 peers), best: #39215 (0x3df2…4543), finalized #0 (0x1aed…2474), ⬇ 4.4kiB/s ⬆ 68 B/s    
2023-10-14T19:31:27.004478Z [Consensus] ⚙️  Syncing 16.8 bps, target=#40848 (11 peers), best: #39299 (0xb63f…a9de), finalized #0 (0x1aed…2474), ⬇ 3.9kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:31:32.006508Z [Consensus] ⚙️  Syncing  9.7 bps, target=#40848 (11 peers), best: #39348 (0x092f…fc92), finalized #0 (0x1aed…2474), ⬇ 1.2kiB/s ⬆ 38 B/s    
2023-10-14T19:31:37.006790Z [Consensus] ⚙️  Syncing 19.4 bps, target=#40850 (11 peers), best: #39445 (0xd533…a113), finalized #0 (0x1aed…2474), ⬇ 8.7kiB/s ⬆ 87 B/s    
2023-10-14T19:31:42.007087Z [Consensus] ⚙️  Syncing  8.2 bps, target=#40851 (11 peers), best: #39486 (0x0a2b…b320), finalized #0 (0x1aed…2474), ⬇ 6.2kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:31:47.007517Z [Consensus] ⚙️  Syncing 11.0 bps, target=#40851 (11 peers), best: #39541 (0x549f…90e5), finalized #0 (0x1aed…2474), ⬇ 6.1kiB/s ⬆ 38 B/s    
2023-10-14T19:31:52.008270Z [Consensus] ⚙️  Syncing 12.0 bps, target=#40852 (11 peers), best: #39601 (0x7624…4604), finalized #0 (0x1aed…2474), ⬇ 5.5kiB/s ⬆ 76 B/s    
2023-10-14T19:31:57.010480Z [Consensus] ⚙️  Syncing 12.5 bps, target=#40853 (11 peers), best: #39664 (0x5372…6398), finalized #0 (0x1aed…2474), ⬇ 11.1kiB/s ⬆ 2.2kiB/s    
2023-10-14T19:32:02.010747Z [Consensus] ⚙️  Syncing 11.2 bps, target=#40854 (11 peers), best: #39720 (0x2e82…18af), finalized #0 (0x1aed…2474), ⬇ 7.3kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:32:07.012347Z [Consensus] ⚙️  Syncing 14.1 bps, target=#40855 (11 peers), best: #39791 (0x88e5…da0f), finalized #0 (0x1aed…2474), ⬇ 9.1kiB/s ⬆ 0.3kiB/s    
2023-10-14T19:32:12.015871Z [Consensus] ⚙️  Syncing 12.7 bps, target=#40857 (11 peers), best: #39855 (0x63f1…870f), finalized #0 (0x1aed…2474), ⬇ 4.7kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:32:17.017797Z [Consensus] ⚙️  Syncing  0.1 bps, target=#40857 (11 peers), best: #39856 (0xc7ac…1e31), finalized #0 (0x1aed…2474), ⬇ 2.1kiB/s ⬆ 44 B/s    
2023-10-14T19:32:22.018099Z [Consensus] ⚙️  Syncing 17.0 bps, target=#40858 (11 peers), best: #39941 (0xf72a…a213), finalized #0 (0x1aed…2474), ⬇ 4.9kiB/s ⬆ 44 B/s    
2023-10-14T19:32:27.018588Z [Consensus] ⚙️  Syncing  8.8 bps, target=#40859 (11 peers), best: #39985 (0xa48c…0a83), finalized #0 (0x1aed…2474), ⬇ 8.3kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:32:32.018890Z [Consensus] ⚙️  Syncing  2.0 bps, target=#40860 (11 peers), best: #39995 (0x1671…0d1d), finalized #0 (0x1aed…2474), ⬇ 6.8kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:32:37.019179Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40860 (11 peers), best: #39995 (0x1671…0d1d), finalized #0 (0x1aed…2474), ⬇ 3.3kiB/s ⬆ 0    
2023-10-14T19:32:42.022384Z [Consensus] ⚙️  Syncing 10.9 bps, target=#40860 (11 peers), best: #40050 (0xf1a4…8041), finalized #0 (0x1aed…2474), ⬇ 2.0kiB/s ⬆ 89 B/s    
2023-10-14T19:32:47.022648Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40861 (11 peers), best: #40050 (0xf1a4…8041), finalized #0 (0x1aed…2474), ⬇ 8.1kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:32:52.025898Z [Consensus] ⚙️  Syncing 12.3 bps, target=#40861 (11 peers), best: #40112 (0x7d94…a657), finalized #0 (0x1aed…2474), ⬇ 4.9kiB/s ⬆ 0    
2023-10-14T19:32:57.028518Z [Consensus] ⚙️  Syncing 12.5 bps, target=#40862 (11 peers), best: #40175 (0xb94f…5108), finalized #0 (0x1aed…2474), ⬇ 10.3kiB/s ⬆ 0.4kiB/s    
2023-10-14T19:33:02.030402Z [Consensus] ⚙️  Syncing  0.3 bps, target=#40862 (11 peers), best: #40177 (0xed72…0921), finalized #0 (0x1aed…2474), ⬇ 6.1kiB/s ⬆ 51 B/s    
2023-10-14T19:33:07.030669Z [Consensus] ⚙️  Syncing  0.2 bps, target=#40862 (11 peers), best: #40178 (0x10f2…0cca), finalized #0 (0x1aed…2474), ⬇ 2.5kiB/s ⬆ 26 B/s    
2023-10-14T19:33:12.032394Z [Consensus] ⚙️  Syncing 24.9 bps, target=#40863 (11 peers), best: #40303 (0x9634…cad5), finalized #0 (0x1aed…2474), ⬇ 5.1kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:33:17.032658Z [Consensus] ⚙️  Syncing  1.4 bps, target=#40865 (11 peers), best: #40310 (0x57f2…ec52), finalized #0 (0x1aed…2474), ⬇ 7.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:33:22.034282Z [Consensus] ⚙️  Syncing 11.5 bps, target=#40865 (11 peers), best: #40368 (0xc886…600a), finalized #0 (0x1aed…2474), ⬇ 4.0kiB/s ⬆ 0    
2023-10-14T19:33:27.036154Z [Consensus] ⚙️  Syncing 12.5 bps, target=#40867 (11 peers), best: #40431 (0x5a93…b528), finalized #0 (0x1aed…2474), ⬇ 6.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:33:32.036370Z [Consensus] ⚙️  Syncing  2.6 bps, target=#40869 (11 peers), best: #40444 (0x3b38…6c45), finalized #0 (0x1aed…2474), ⬇ 6.2kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:33:37.036688Z [Consensus] ⚙️  Syncing 11.4 bps, target=#40869 (11 peers), best: #40501 (0x5b1c…080d), finalized #0 (0x1aed…2474), ⬇ 4.1kiB/s ⬆ 0    
2023-10-14T19:33:42.041878Z [Consensus] ⚙️  Syncing 11.7 bps, target=#40869 (11 peers), best: #40560 (0xb148…3688), finalized #0 (0x1aed…2474), ⬇ 1.2kiB/s ⬆ 89 B/s    
2023-10-14T19:33:47.044451Z [Consensus] ⚙️  Syncing 12.5 bps, target=#40869 (11 peers), best: #40623 (0x44bb…f382), finalized #0 (0x1aed…2474), ⬇ 2.2kiB/s ⬆ 51 B/s    
2023-10-14T19:33:52.046603Z [Consensus] ⚙️  Syncing 12.7 bps, target=#40870 (11 peers), best: #40687 (0x5a8a…9510), finalized #0 (0x1aed…2474), ⬇ 18.0kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:33:57.046818Z [Consensus] ⚙️  Syncing  3.4 bps, target=#40871 (11 peers), best: #40704 (0xcea3…9035), finalized #0 (0x1aed…2474), ⬇ 4.9kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:34:02.048380Z [Consensus] ⚙️  Syncing  9.3 bps, target=#40872 (11 peers), best: #40751 (0x462c…8cda), finalized #0 (0x1aed…2474), ⬇ 7.7kiB/s ⬆ 95 B/s    
2023-10-14T19:34:07.048742Z [Consensus] ⚙️  Syncing  2.4 bps, target=#40873 (11 peers), best: #40763 (0xb7f4…a8e9), finalized #0 (0x1aed…2474), ⬇ 9.0kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:34:12.049651Z [Consensus] ⚙️  Syncing 10.4 bps, target=#40874 (11 peers), best: #40815 (0x6fd2…8756), finalized #0 (0x1aed…2474), ⬇ 7.6kiB/s ⬆ 0.2kiB/s    
2023-10-14T19:34:17.049969Z [Consensus] ⚙️  Syncing  4.4 bps, target=#40875 (11 peers), best: #40837 (0x03e1…045f), finalized #0 (0x1aed…2474), ⬇ 2.6kiB/s ⬆ 69 B/s    
2023-10-14T19:34:22.050343Z [Consensus] ⚙️  Syncing  1.0 bps, target=#40875 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 2.4kiB/s ⬆ 0    
2023-10-14T19:34:27.050614Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40875 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 2.5kiB/s ⬆ 89 B/s    
2023-10-14T19:34:32.050948Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40876 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 3.4kiB/s ⬆ 0.1kiB/s    
2023-10-14T19:34:37.051223Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40876 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 1.1kiB/s ⬆ 0    
2023-10-14T19:34:42.051635Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40876 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 2.5kiB/s ⬆ 89 B/s    
2023-10-14T19:34:47.051930Z [Consensus] ⚙️  Syncing  0.0 bps, target=#40876 (11 peers), best: #40842 (0xb51d…9b6b), finalized #0 (0x1aed…2474), ⬇ 2.4kiB/s ⬆ 51 B/s    
2023-10-14T20:59:50.895531Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41811 (11 peers), best: #39938 (0xe3b2…b00c), finalized #0 (0x1aed…2474), ⬇ 5.4kiB/s ⬆ 0.1kiB/s    
2023-10-14T20:59:55.896008Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41813 (11 peers), best: #39938 (0xe3b2…b00c), finalized #0 (0x1aed…2474), ⬇ 6.5kiB/s ⬆ 0    
2023-10-14T21:00:00.896976Z [Consensus] ⚙️  Preparing  0.0 bps, target=#41814 (11 peers), best: #39938 (0xe3b2…b00c), finalized #0 (0x1aed…2474), ⬇ 1.1MiB/s ⬆ 0.5kiB/s    
2023-10-14T21:00:05.897407Z [Consensus] ⚙️  Preparing  0.0 bps, target=#41815 (11 peers), best: #39938 (0xe3b2…b00c), finalized #0 (0x1aed…2474), ⬇ 10.1kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:00:10.898322Z [Consensus] ⚙️  Preparing 11.4 bps, target=#41816 (11 peers), best: #39995 (0x1671…0d1d), finalized #0 (0x1aed…2474), ⬇ 9.0kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:00:15.898510Z [Consensus] ⚙️  Preparing  0.0 bps, target=#41816 (11 peers), best: #39995 (0x1671…0d1d), finalized #0 (0x1aed…2474), ⬇ 5.1kiB/s ⬆ 0    
2023-10-14T21:00:20.903920Z [Consensus] ⚙️  Preparing 13.1 bps, target=#41818 (11 peers), best: #40061 (0x5a99…59eb), finalized #0 (0x1aed…2474), ⬇ 9.2kiB/s ⬆ 0.4kiB/s    
2023-10-14T21:00:25.906345Z [Consensus] ⚙️  Preparing 14.7 bps, target=#41819 (11 peers), best: #40135 (0xa5b6…81ee), finalized #0 (0x1aed…2474), ⬇ 8.6kiB/s ⬆ 44 B/s    
2023-10-14T21:00:30.906573Z [Consensus] ⚙️  Preparing  7.0 bps, target=#41820 (11 peers), best: #40170 (0xa0e9…39c0), finalized #0 (0x1aed…2474), ⬇ 21.7kiB/s ⬆ 0.3kiB/s    
2023-10-14T21:00:35.908228Z [Consensus] ⚙️  Preparing 10.7 bps, target=#41821 (11 peers), best: #40224 (0x3cc9…f08b), finalized #0 (0x1aed…2474), ⬇ 8.3kiB/s ⬆ 0.3kiB/s    
2023-10-14T21:00:40.913853Z [Consensus] ⚙️  Preparing 10.1 bps, target=#41822 (11 peers), best: #40275 (0x4325…ecfc), finalized #0 (0x1aed…2474), ⬇ 7.3kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:00:45.914134Z [Consensus] ⚙️  Preparing  7.0 bps, target=#41822 (11 peers), best: #40310 (0x57f2…ec52), finalized #0 (0x1aed…2474), ⬇ 14.2kiB/s ⬆ 0    
2023-10-14T21:00:50.917849Z [Consensus] ⚙️  Preparing 12.7 bps, target=#41825 (11 peers), best: #40374 (0xa713…149e), finalized #0 (0x1aed…2474), ⬇ 10.7kiB/s ⬆ 0.4kiB/s    
2023-10-14T21:00:55.924043Z [Consensus] ⚙️  Preparing 10.1 bps, target=#41826 (11 peers), best: #40425 (0x809b…f1d8), finalized #0 (0x1aed…2474), ⬇ 4.5kiB/s ⬆ 44 B/s    
2023-10-14T21:01:00.924287Z [Consensus] ⚙️  Preparing  7.8 bps, target=#41827 (11 peers), best: #40464 (0x983e…cc4d), finalized #0 (0x1aed…2474), ⬇ 14.0kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:05.924528Z [Consensus] ⚙️  Preparing 12.8 bps, target=#41827 (11 peers), best: #40528 (0x05c6…1101), finalized #0 (0x1aed…2474), ⬇ 6.3kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:10.924803Z [Consensus] ⚙️  Preparing 14.8 bps, target=#41828 (11 peers), best: #40602 (0xcfa4…b5ad), finalized #0 (0x1aed…2474), ⬇ 10.6kiB/s ⬆ 92 B/s    
2023-10-14T21:01:15.925194Z [Consensus] ⚙️  Preparing 13.6 bps, target=#41830 (11 peers), best: #40670 (0xf8b9…685c), finalized #0 (0x1aed…2474), ⬇ 11.0kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:20.925524Z [Consensus] ⚙️  Preparing  6.8 bps, target=#41831 (11 peers), best: #40704 (0xcea3…9035), finalized #0 (0x1aed…2474), ⬇ 13.2kiB/s ⬆ 0.4kiB/s    
2023-10-14T21:01:25.935631Z [Consensus] ⚙️  Preparing 13.1 bps, target=#41832 (11 peers), best: #40770 (0xc9b9…21f5), finalized #0 (0x1aed…2474), ⬇ 11.9kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:30.935924Z [Consensus] ⚙️  Preparing  3.0 bps, target=#41832 (11 peers), best: #40785 (0x9099…f498), finalized #0 (0x1aed…2474), ⬇ 8.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:01:35.936183Z [Consensus] ⚙️  Syncing 10.4 bps, target=#41833 (11 peers), best: #40837 (0x03e1…045f), finalized #0 (0x1aed…2474), ⬇ 5.9kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:40.936455Z [Consensus] ⚙️  Syncing 14.4 bps, target=#41833 (11 peers), best: #40909 (0x46a0…f087), finalized #0 (0x1aed…2474), ⬇ 6.4kiB/s ⬆ 25 B/s    
2023-10-14T21:01:45.938094Z [Consensus] ⚙️  Syncing 13.1 bps, target=#41833 (11 peers), best: #40975 (0xfff2…8f38), finalized #0 (0x1aed…2474), ⬇ 3.3kiB/s ⬆ 0    
2023-10-14T21:01:50.938361Z [Consensus] ⚙️  Syncing 14.4 bps, target=#41833 (11 peers), best: #41047 (0xb7e0…3d46), finalized #0 (0x1aed…2474), ⬇ 2.3kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:01:55.938608Z [Consensus] ⚙️  Syncing  9.4 bps, target=#41835 (11 peers), best: #41094 (0xfa8e…89ca), finalized #0 (0x1aed…2474), ⬇ 14.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:02:00.938889Z [Consensus] ⚙️  Syncing 15.0 bps, target=#41838 (11 peers), best: #41169 (0x1d41…ffdb), finalized #0 (0x1aed…2474), ⬇ 9.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:02:05.941629Z [Consensus] ⚙️  Syncing 17.3 bps, target=#41839 (11 peers), best: #41256 (0xda1a…df9e), finalized #0 (0x1aed…2474), ⬇ 10.6kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:02:10.941913Z [Consensus] ⚙️  Syncing  7.8 bps, target=#41840 (11 peers), best: #41295 (0xa232…4bca), finalized #0 (0x1aed…2474), ⬇ 5.7kiB/s ⬆ 70 B/s    
2023-10-14T21:02:15.942132Z [Consensus] ⚙️  Syncing 13.2 bps, target=#41841 (11 peers), best: #41361 (0x60ed…9562), finalized #0 (0x1aed…2474), ⬇ 8.5kiB/s ⬆ 60 B/s    
2023-10-14T21:02:20.942392Z [Consensus] ⚙️  Syncing 13.0 bps, target=#41842 (11 peers), best: #41426 (0xfb98…bf89), finalized #0 (0x1aed…2474), ⬇ 11.6kiB/s ⬆ 0.5kiB/s    
2023-10-14T21:02:25.942761Z [Consensus] ⚙️  Syncing 14.8 bps, target=#41843 (11 peers), best: #41500 (0x3e89…bb9f), finalized #0 (0x1aed…2474), ⬇ 4.4kiB/s ⬆ 44 B/s    
2023-10-14T21:02:30.943370Z [Consensus] ⚙️  Syncing 14.6 bps, target=#41844 (11 peers), best: #41573 (0x25fc…193e), finalized #0 (0x1aed…2474), ⬇ 8.8kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:02:35.947137Z [Consensus] ⚙️  Syncing 13.7 bps, target=#41844 (11 peers), best: #41642 (0x8fe2…e223), finalized #0 (0x1aed…2474), ⬇ 5.6kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:02:40.949799Z [Consensus] ⚙️  Syncing 13.1 bps, target=#41845 (11 peers), best: #41708 (0x24d1…b162), finalized #0 (0x1aed…2474), ⬇ 7.4kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:02:45.950072Z [Consensus] ⚙️  Syncing 11.0 bps, target=#41847 (11 peers), best: #41763 (0x22ae…5bbe), finalized #0 (0x1aed…2474), ⬇ 7.3kiB/s ⬆ 63 B/s    
2023-10-14T21:02:50.950336Z [Consensus] ⚙️  Syncing  6.8 bps, target=#41849 (11 peers), best: #41797 (0x300a…920e), finalized #0 (0x1aed…2474), ⬇ 7.7kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:02:55.950594Z [Consensus] ⚙️  Syncing  3.8 bps, target=#41850 (11 peers), best: #41816 (0x607e…1fe9), finalized #0 (0x1aed…2474), ⬇ 10.8kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:03:00.950834Z [Consensus] ⚙️  Syncing  1.6 bps, target=#41851 (11 peers), best: #41824 (0x6f38…585a), finalized #0 (0x1aed…2474), ⬇ 6.1kiB/s ⬆ 52 B/s    
2023-10-14T21:03:05.951075Z [Consensus] ⚙️  Syncing  1.4 bps, target=#41853 (11 peers), best: #41831 (0x9c6a…f33f), finalized #0 (0x1aed…2474), ⬇ 5.8kiB/s ⬆ 0.2kiB/s    
2023-10-14T21:03:10.951323Z [Consensus] ⚙️  Syncing  0.2 bps, target=#41856 (11 peers), best: #41832 (0xe575…f284), finalized #0 (0x1aed…2474), ⬇ 9.2kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:03:15.951578Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41856 (11 peers), best: #41832 (0xe575…f284), finalized #0 (0x1aed…2474), ⬇ 4.1kiB/s ⬆ 0    
2023-10-14T21:03:20.951882Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41857 (11 peers), best: #41832 (0xe575…f284), finalized #0 (0x1aed…2474), ⬇ 11.4kiB/s ⬆ 2.4kiB/s    
2023-10-14T21:03:25.952136Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41858 (11 peers), best: #41832 (0xe575…f284), finalized #0 (0x1aed…2474), ⬇ 10.7kiB/s ⬆ 0.1kiB/s    
2023-10-14T21:03:30.952399Z [Consensus] ⚙️  Syncing  0.0 bps, target=#41859 (11 peers), best: #41832 (0xe575…f284), finalized #0 (0x1aed…2474), ⬇ 11.0kiB/s ⬆ 0.3kiB/s    

In both cases node exits "Preparing" state back into "Syncing" (though I wouldn't expect that to happen) and then gets stuck at whatever height was the target at the moment of exiting "Preparing" state. Syncs successfully on restart and doesn't always get stuck in the first place.

Not yet sure what the reason is, will try to collect trace logs for Substrate sync.

@nazar-pc nazar-pc added bug Something isn't working node Node (service library/node app) labels Oct 14, 2023
@nazar-pc nazar-pc added this to the Protocol UX Improvements milestone Oct 14, 2023
@nazar-pc
Copy link
Member Author

Logs are filled with these (block number increases over time, same number for all peers, but always off by one):

2023-10-14T21:36:42.406577Z TRACE tokio-runtime-worker sync: [Consensus] Out of range for peer 12D3KooWM6vKrQ7n7AHinAddGQhQBNg4ctfM1PLHvNXqVPbfjiKa (42219 vs 42218)    
2023-10-14T21:36:42.406618Z TRACE tokio-runtime-worker sync: [Consensus] Out of range for peer 12D3KooWFroqEkrSuW3XNqrHM5e3n6fdTMM43vKxXSAkBzkd8Bo9 (42219 vs 42218)    

Looks like off by one somewhere.

Full log:
node-debug-sync.log.zip

@rahulksnv
Copy link
Contributor

@nazar-pc I didn't have much luck reproducing the issue (in close to 10 attempts). Also that the devnet chain is almost 90k blocks now makes the retries time consuming.

I also see these logs in my node frequently, but the node is able to sync fine. On reading the code, these logs should are expected when the peer.common == peer.best (i.e) all caught up. Think these can be muted separately.

Since you are able to see the issue frequently on your set up, can you help me with these:

  1. I have some debug logs + fixes for failing tests in our upstream fork: autonomys/polkadot-sdk@c03d8ea (would be good to merge these in our fork as well, for any future debug). Can you please patch this in and give a few tries?

    I already have a frontier branch that points to this version if you want to use: autonomys/frontier@86da7ec. (and point monorepo to these updated upstream + frontier commits)

  2. Attach gdb when this happens, to rule out any issues like we had last week. I don't see anything bad in your logs, the sync just seems stuck for some reason

@dariolina
Copy link
Member

FWIW my node has not unstuck yet so I can archive the whole thing if needed

@rahulksnv
Copy link
Contributor

rahulksnv commented Oct 19, 2023

Think I understand the sequence now. While the node is syncing, a reorg needs to happen among the peers for the issue to happen. As the reorg is more likely near the tip, this would explain why nodes get stuck as they near the end of the sync process.

The node is at 93313 during the sync, which is close to the tip/completion.

  1. Peers announce block 93314/0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184. Node downloads the block from one of them, tip moves to 93314, peer common number is updated: 93313 -> 93314
7308:2023-10-18T19:36:06.362861Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWLdPnDksKX43hEgV4N9EeJzm6x8wPWkHCvRaFBZvj4GBg: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7309 2023-10-18T19:36:06.362920Z TRACE tokio-runtime-worker sync: [Consensus] New block request for 12D3KooWLdPnDksKX43hEgV4N9EeJzm6x8wPWkHCvRaFBZvj4GBg, (best:93314, common:93313) BlockRequest { id:       0, fields: HEADER | BODY | JUSTIFICATION, from: Hash(0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184), direction: Descending, max: Some(1) } 
7311:2023-10-18T19:36:06.376254Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWFroqEkrSuW3XNqrHM5e3n6fdTMM43vKxXSAkBzkd8Bo9: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7314:2023-10-18T19:36:06.386013Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWJ216EP67meQU8fGJcYk1o4XcgUhyJMfDygaEjJstSQhJ: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7317:2023-10-18T19:36:06.435406Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWMyNP2Pad4VuDgsiwXnmjrfvJXYWVRjn1QRmwwxnLBKq8: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7320:2023-10-18T19:36:06.450356Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWMJEpocBKY6fZyjnb9zNwn3KuTHHTZ8hGdeDxBLR5C4Ri: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7323:2023-10-18T19:36:06.452631Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWM6vKrQ7n7AHinAddGQhQBNg4ctfM1PLHvNXqVPbfjiKa: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7326:2023-10-18T19:36:06.456067Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWLxpv7TaSBvJfRdfRahiy78Zoxg4eJteFWtRW99d5Fnv9: AnnouncementSummary { block_hash: 0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7332 2023-10-18T19:36:06.472946Z TRACE tokio-runtime-worker sync: [Consensus] BlockResponse 0 from 12D3KooWLdPnDksKX43hEgV4N9EeJzm6x8wPWkHCvRaFBZvj4GBg with 1 blocks  (93314)
7333 2023-10-18T19:36:06.472956Z TRACE tokio-runtime-worker sync: [Consensus] Reversing incoming block list
7334 2023-10-18T19:36:06.472995Z TRACE tokio-runtime-worker sync: [Consensus] 1 blocks ready for import
7335 2023-10-18T19:36:06.473000Z TRACE tokio-runtime-worker sync: [Consensus] Accepted 1 blocks (0xc14b44f7e8fc8415292dd485c01933815f41d7cbd10ab65276d3a6f7a63d9184) with origin NetworkInitialSync
7336 2023-10-18T19:36:06.473005Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWGU5VUqPyBUzg9kFKz7YGHNcYfupfCVJuHoaocVzMLN4B info, ours=93314, common=29238->29238, their best=29238
7337 2023-10-18T19:36:06.473011Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWMyNP2Pad4VuDgsiwXnmjrfvJXYWVRjn1QRmwwxnLBKq8 info, ours=93314, common=93313->93314, their best=93314
7338 2023-10-18T19:36:06.473017Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWLxpv7TaSBvJfRdfRahiy78Zoxg4eJteFWtRW99d5Fnv9 info, ours=93314, common=93313->93314, their best=93314
7339 2023-10-18T19:36:06.473021Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWFroqEkrSuW3XNqrHM5e3n6fdTMM43vKxXSAkBzkd8Bo9 info, ours=93314, common=93313->93314, their best=93314
7340 2023-10-18T19:36:06.473026Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWScSDffzjSAVftWfYs7326JgFUBx7qeaTwfmirSLouDLF info, ours=93314, common=29558->29558, their best=29558
7341 2023-10-18T19:36:06.473031Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWJVvRRaHjYzFuSUW3vaaMq71YUeb38qi9iXeazFv5HwPA info, ours=93314, common=93313->93313, their best=93313
7342 2023-10-18T19:36:06.473036Z TRACE tokio-runtime-worker sync: [Consensus] Updating peer 12D3KooWM6vKrQ7n7AHinAddGQhQBNg4ctfM1PLHvNXqVPbfjiKa info, ours=93314, common=93313->93314, their best=93314
  1. Now there appears to be re-org in the network. Peers re-announce same block with different hash: 93314/0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99
7350:2023-10-18T19:36:06.495300Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWR7N8P4sLiW8ptWhWgCQQ8ruZ1CgM2P27NJRLSEFTKLF1: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7365:2023-10-18T19:36:06.626703Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWKpf6e8CJ3sCEeFhLLns9kLWuPCipWwVdcEXrn8HJ4FWq: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true. 
7367:2023-10-18T19:36:06.633286Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWMJEpocBKY6fZyjnb9zNwn3KuTHHTZ8hGdeDxBLR5C4Ri: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true.
7369:2023-10-18T19:36:06.642785Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWM6vKrQ7n7AHinAddGQhQBNg4ctfM1PLHvNXqVPbfjiKa: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true. 
7371:2023-10-18T19:36:06.645986Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWLxpv7TaSBvJfRdfRahiy78Zoxg4eJteFWtRW99d5Fnv9: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true. 
7373:2023-10-18T19:36:06.648252Z TRACE tokio-runtime-worker sync: [Consensus] Block announcement validated successfully: from 12D3KooWMyNP2Pad4VuDgsiwXnmjrfvJXYWVRjn1QRmwwxnLBKq8: AnnouncementSummary { block_hash: 0x20aba21707f04b462277815feb939c10952498511cead0fbd0c36a651f062f99, number: 93314, parent_hash: 0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e, state: Some(Best) }. Local best: true. 
  1. The issue: the new block is on a different fork. So the common number should have moved back to the 93313 /0x2ad32e691b237aebea6d2d6fa90b3f1c28518b2c0846a72058c0e7ab5889a34e. Or should have triggered an ancestor search. But neither is done in the current sync code.

After this, we download blocks 93315 onwards from the new fork. But can't find the parent since 93314 is from the old fork, and we stall.

Am in the process of adding an unit test to verify the scenario, and look at potential fixes (also start with upstream folks with more concrete data)

@nazar-pc
Copy link
Member Author

Thanks for the details!
Is this related to recent changes aimed at sync fix?
I just want to make sure we have a "plan B" in case we don't fix this today/tomorrow.

@rahulksnv
Copy link
Contributor

With the sync fix, we don't try to import. Instead we see this message:
7531:2023-10-18T19:36:15.908789Z TRACE tokio-runtime-worker sync: [Consensus] Ready block parent not on chain: block_status=Ok(Unknown), best_queued=93314/0xc14b…9184, first_ready=93315/0xa13e…45d8, first_ready_parent=0x20ab…2f99

Without the fix, we would have tried to import that would have failed as parent not found, and restarted the sync process(the original symptom fixed by paritytech/polkadot-sdk#1812). Now instead we stall, for different reason though. If the common number is somehow restored, the sync should complete because of 1812.

I currently don't have concrete proposal for a fix, need to think more. One option is to trigger ancestry search, but the sync state machine is pretty obtuse, worried about breaking something else.

Worst case work around would be to just kill/restart the node binary if it gets stuck :-

@rahulksnv
Copy link
Contributor

The last fix didn't cover all the cases. I am reverting it and going to think about it from a high level, instead of doing bandaids. The main issue can be seen here:

4787:2023-10-19T14:26:37.229090Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWMJEpocBKY6fZyjnb9zNwn3KuTHHTZ8hGdeDxBLR5C4Ri: 104381, hash 0xa0c4…2c0e -> 0x6b9f…b856, known_parent = true, AnnouncementSummary { block_hash: 0x6b9f932e4f734a4baa6946bdcf1572450d2ac9c5b9292bd1a88bf1dddac5b856, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4792:2023-10-19T14:26:37.240025Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWMyNP2Pad4VuDgsiwXnmjrfvJXYWVRjn1QRmwwxnLBKq8: 104381, hash 0xa0c4…2c0e -> 0x6b9f…b856, known_parent = true, AnnouncementSummary { block_hash: 0x6b9f932e4f734a4baa6946bdcf1572450d2ac9c5b9292bd1a88bf1dddac5b856, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4796:2023-10-19T14:26:37.265168Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWM6vKrQ7n7AHinAddGQhQBNg4ctfM1PLHvNXqVPbfjiKa: 104381, hash 0xa0c4…2c0e -> 0x6b9f…b856, known_parent = true, AnnouncementSummary { block_hash: 0x6b9f932e4f734a4baa6946bdcf1572450d2ac9c5b9292bd1a88bf1dddac5b856, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4799:2023-10-19T14:26:37.265493Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWLxpv7TaSBvJfRdfRahiy78Zoxg4eJteFWtRW99d5Fnv9: 104381, hash 0xa0c4…2c0e -> 0x6b9f…b856, known_parent = true, AnnouncementSummary { block_hash: 0x6b9f932e4f734a4baa6946bdcf1572450d2ac9c5b9292bd1a88bf1dddac5b856, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4803:2023-10-19T14:26:37.280099Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWKpf6e8CJ3sCEeFhLLns9kLWuPCipWwVdcEXrn8HJ4FWq: 104381, hash 0x6b9f…b856 -> 0xa0c4…2c0e, known_parent = true, AnnouncementSummary { block_hash: 0xa0c47672b8f1af0fd3d4d49b7c56ae9fc7e636deff697a7c127ab4a3b8582c0e, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4806:2023-10-19T14:26:37.313727Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWR7N8P4sLiW8ptWhWgCQQ8ruZ1CgM2P27NJRLSEFTKLF1: 104381, hash 0xa0c4…2c0e -> 0x6b9f…b856, known_parent = true, AnnouncementSummary { block_hash: 0x6b9f932e4f734a4baa6946bdcf1572450d2ac9c5b9292bd1a88bf1dddac5b856, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4829:2023-10-19T14:26:37.471679Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWFroqEkrSuW3XNqrHM5e3n6fdTMM43vKxXSAkBzkd8Bo9: 104381, hash 0x6b9f…b856 -> 0xa0c4…2c0e, known_parent = true, AnnouncementSummary { block_hash: 0xa0c47672b8f1af0fd3d4d49b7c56ae9fc7e636deff697a7c127ab4a3b8582c0e, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4832:2023-10-19T14:26:37.480082Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWLdPnDksKX43hEgV4N9EeJzm6x8wPWkHCvRaFBZvj4GBg: 104381, hash 0x6b9f…b856 -> 0xa0c4…2c0e, known_parent = true, AnnouncementSummary { block_hash: 0xa0c47672b8f1af0fd3d4d49b7c56ae9fc7e636deff697a7c127ab4a3b8582c0e, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }
4835:2023-10-19T14:26:37.480660Z TRACE tokio-runtime-worker sync: [Consensus] Reorg detected on block announce from 12D3KooWJ216EP67meQU8fGJcYk1o4XcgUhyJMfDygaEjJstSQhJ: 104381, hash 0x6b9f…b856 -> 0xa0c4…2c0e, known_parent = true, AnnouncementSummary { block_hash: 0xa0c47672b8f1af0fd3d4d49b7c56ae9fc7e636deff697a7c127ab4a3b8582c0e, number: 104381, parent_hash: 0xb24afe5f38b11c1a207b9e8c7f41b0c65772258d5862fbea52b547ed1202fcb3, state: Some(Best) }

The reorg happens and the peers are split in two groups (0xa0c4…2c0e, 0x6b9f…b856). From what I can tell, the sync logic to download blocks (BlockCollection) works only on block number ranges and requests ranges across the peers and tries to merge them into one sequence, even though the peers are on different chains. It seems to assume peers are on same chain, unless I am missing something.

@nazar-pc
Copy link
Member Author

nazar-pc commented Oct 20, 2023

Are you able to create a test case with this information? Also can you create an upstream discussion about this now that you know more about it?

@rahulksnv
Copy link
Contributor

rahulksnv commented Oct 20, 2023

yeah I can create a test the test(there are some existing tests in blocks.rs, we would need a repro test from the sync layer instead). And let me start something on the polkadot forum

@rahulksnv
Copy link
Contributor

A simplified explanation of the issue:

  1. Node downloads/imports block 93314 from peer 1
  2. Reorg happens, peer 2 announces blocks 93314, 93315 from a different fork. Node downloads block 93315 from peer 2. Now 93314 and 93315 are from different branches
  3. Because of Check for parent of first ready block being on chain paritytech/polkadot-sdk#1812, the downloaded 93315 is not imported as parent is not known
  4. This would have been fine. 1812 expects 93314 to be re-downloaded from the new branch and blocks 93314, 93315 re-imported (the PR had unit tests that verified this works as expected). Except that an unrelated path(re-org related) advances the common number for the peers, which prevents 93314 to be re-imported.

So node remains stuck. The downloaded blocks are not imported, no new requests go out after this.

@nazar-pc
Copy link
Member Author

So looks like numbers should be advanced individually and only if common block is actually common, not just at the same block height.

@rahulksnv
Copy link
Contributor

yes, it goes back to the analysis above: #2122 (comment)

It seems to assume peers are on same chain

This works fine if no re-org (nodes are on same chain), so just block number based logic is good. In the presence of re-orgs, this breaks and run into issues.

Similarly, BlockCollection::needed_blocks() and friends seem to assume there is single chain and work with block number ranges. They mostly manage "holes" in block number range and try to download from peers that advertised the range, without worrying about the parent hash relationship

I am trying to add tests in the upstream to confirm this as well.

TL:DR - the sync layer should manage a tree of forks, where blocks are linked to parent blocks by hash. This understandably will be a bigger change

@rahulksnv
Copy link
Contributor

Yesterday I tried a simpler approach to move the common number back on re-org, but that ran into other issues

bkchr pushed a commit to paritytech/polkadot-sdk that referenced this issue Oct 20, 2023
…#1950)

This reverts #1812 until
we know why it causes syncing issues reported in
autonomys/subspace#2122.
@rahulksnv
Copy link
Contributor

So looks like numbers should be advanced individually and only if common block is actually common, not just at the same block height.

I was looking at something similar last two days, am going to continue with this today and see how it goes (and hopefully we can bring back the rolled back 1812)

@nazar-pc nazar-pc added the upstream Blocked by upstream label Oct 25, 2023
@nazar-pc
Copy link
Member Author

paritytech/polkadot-sdk#493 was re-resolved with paritytech/polkadot-sdk#2045, which landed in #2424, so I'm closing this for now

bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this issue Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working node Node (service library/node app) upstream Blocked by upstream
Projects
Development

No branches or pull requests

3 participants