Problem: wrong Block.Header.AppHash crashes #284

Closed
hpmv opened this issue Dec 29, 2021 · 12 comments

@hpmv

hpmv commented Dec 29, 2021

Describe the bug
My node once in a while crashes with errors like this:

panic: Failed to process committed block (780414:CF2396E752FF8BD24E64AB4F926EB4B2EB488AB070611DFD29F1EFB8AA45B9A0): wrong Block.Header.AppHash. Expected FB6CC3B73190529D528E5EB318CBA4FB1E47D48030C952A47C6E2429958F4E29, got 6875B5CB3A6F7FD9C1AC10FE1F16D4DD5A6CE28727408669C4254E36F3A4B351

Restarting the node gives a similar error.

To Reproduce
Cannot reproduce reliably; it happens roughly once every couple of days while the node is running. I'm using version v0.6.1.

Expected behavior
The app really should recover from such errors by automatically reverting to the previous height. Alternatively, a manual tool like state_recover from BSC would also be great. Right now there's no solution other than restoring from a disk backup.

Could a dev tell me how to manually revert to the previous height by setting leveldb keys? I know I need to set a couple of keys on the Tendermint side, but the app side is too confusing for me to dig into. Thanks!

@tomtau
Contributor

tomtau commented Dec 30, 2021

@hpmv Thanks for reporting the issue. For the wrong app hash error, it'd be helpful if you could provide more details: which block heights this happens on, which network (I assume the mainnet beta?), etc. Given it was on v0.6.1, it could also be the case that there were some unnoticed consensus-state-breaking changes between 0.6.1 and 0.6.5.

The latest Tendermint has a rollback feature, but it hasn't been used in Cosmos SDK yet: cosmos/cosmos-sdk#10281

@yihuang @JayT106 may advise if there's a manual workaround in the meantime.

@tomtau tomtau changed the title from "Node crashes with corrupted database: wrong Block.Header.AppHash" to "Problem: wrong Block.Header.AppHash crashes" on Dec 30, 2021
@hpmv
Author

hpmv commented Dec 30, 2021

Thanks @tomtau! The network is mainnet beta, and the height was as shown in the error message: 780414. I just got it again at another height 789757.

What's the version (commit hash) that's supposed to be running in Mainnet beta?

@JayT106
Collaborator

JayT106 commented Dec 30, 2021

@hpmv you might try to repair the DB state by modifying the WAL: remove the latest messages back to the previous EndHeight message. Be careful to back up your data first.
You can find the script tools in the Tendermint project:
https://github.com/tendermint/tendermint/tree/master/scripts

However, it is not guaranteed to work, since the root cause of the appHash crashes is still unknown. We need more investigation to understand the issue.
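
For illustration only, here is a rough Go sketch of that idea (not one of the official scripts in the directory above, which include wal2json/json2wal): it scans a Tendermint v0.34-style cs.wal file with a WAL decoder and reports the byte offset right after the last EndHeight message, i.e. everything past that offset would be the candidate for truncation. The decoder API usage and the example path are assumptions, and the sketch only reports, it does not modify anything; back up the data directory before touching the WAL.

```go
// Hypothetical helper, assuming the Tendermint v0.34 consensus.WALDecoder API.
// It only reports where the WAL could be truncated; it does not modify anything.
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	cs "github.com/tendermint/tendermint/consensus"
)

func main() {
	// Example path (may differ per node): <node home>/data/cs.wal/wal
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := cs.NewWALDecoder(f)
	var lastEndHeight int64 // height recorded in the last EndHeight message seen
	var keepBytes int64     // file offset right after that message

	for {
		msg, err := dec.Decode()
		if err == io.EOF {
			break
		}
		if err != nil {
			// A corrupted tail is expected after a crash; stop scanning here.
			log.Printf("stopping at decode error: %v", err)
			break
		}
		if m, ok := msg.Msg.(cs.EndHeightMessage); ok {
			lastEndHeight = m.Height
			if off, serr := f.Seek(0, io.SeekCurrent); serr == nil {
				keepBytes = off
			}
		}
	}

	fmt.Printf("last EndHeight message: height=%d; keep the first %d bytes, truncate the rest\n",
		lastEndHeight, keepBytes)
}
```

After confirming the offset on a backup copy, the file could be truncated to that length, or the wal2json/json2wal scripts could be used to edit the decoded messages instead.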

@hpmv
Author

hpmv commented Dec 30, 2021

Thanks Jay! Is this a known issue in the community (I see a previous bug filed about this too)?

@tomtau
Contributor

tomtau commented Dec 30, 2021

OK, it seems this may be a duplicate of issue #256.
That issue was with 0.6.4. It may also be due to non-deterministic operations, so it could happen irrespective of changes between 0.6.1 and 0.6.5.

@yihuang
Collaborator

yihuang commented Mar 8, 2022

We observed this in one of our RPC nodes after upgrading to 0.6.6. After inspecting and comparing the iavl storage using the iaview tool, we found that this transaction's sender's balance differs between the problematic node and a normal node, and the numbers match the hypothesis that the tx was reverted on the problematic node (with the sender's balance deducted by "gas limit * gas price") but executed successfully on the normal nodes.
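
Purely as an illustration of that balance check (the numbers below are hypothetical, not from the affected account): on the node that reverted the tx, the sender should still be charged the full upfront fee, so the two nodes' balances for that account should differ by exactly gas limit * gas price.

```go
// Illustrative only: compute the deduction described above for the reverted
// case (gas limit * gas price). The numbers are hypothetical, not from the
// actual transaction in this issue.
package main

import (
	"fmt"
	"math/big"
)

func main() {
	gasLimit := big.NewInt(21000)                          // hypothetical gas limit
	gasPrice := new(big.Int).SetUint64(5_000_000_000_000)  // hypothetical gas price in base units

	deduction := new(big.Int).Mul(gasLimit, gasPrice)
	fmt.Printf("expected deduction on the node that reverted the tx: %s\n", deduction)
}
```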

@JayT106
Collaborator

JayT106 commented Mar 8, 2022

> we observed this in one of our RPC nodes after upgrading to 0.6.6, after inspecting and comparing the iavl storage using iaview tool, we found that this transaction's sender's balance is different between the problematic node and normal node […]

Is the PR #377 the root cause of the AppHash mismatch in 0.6.6?

@yihuang
Collaborator

yihuang commented Mar 8, 2022

> Is the PR #377 the root cause of the AppHash mismatch in 0.6.6?

No, that one was released in 0.6.8, but the issue we observed today is on 0.6.6 and above.

@JayT106
Collaborator

JayT106 commented Mar 11, 2022

@hpmv, what's your setup for fast_sync= and statesync.enable=? I am trying to reproduce this; it would be great if you could provide it. Thanks.

@JayT106
Collaborator

JayT106 commented Apr 5, 2022

From investigating the recent crash cases, we suspect the EVM module might be causing the non-deterministic result. But we need more crashed databases to identify which part of the EVM module causes the issue.

@yihuang yihuang closed this as completed Apr 19, 2022
@yihuang yihuang reopened this Apr 19, 2022
@yihuang
Collaborator

yihuang commented May 3, 2022

> App really should recover from such errors by automatically reverting to the previous height. Or, a manual tool like state_recover from BSC would also be great. Right now there's no solution other than recovering from a disk backup.

This rollback command may help in the future: cosmos/cosmos-sdk#11361

@yihuang
Collaborator

yihuang commented May 26, 2022

cosmos/cosmos-sdk#12012

We believe the root cause has been found, and the workaround for now is to increase the open file limit using ulimit.
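
For context, `ulimit -n` controls the per-process open-file limit (RLIMIT_NOFILE). A minimal Go sketch of checking and raising that limit from inside a process on Linux (raising the hard limit still requires root, or a LimitNOFILE= setting in the systemd unit):

```go
// Minimal sketch (Linux): inspect and raise the soft open-file limit,
// i.e. the same limit that `ulimit -n` adjusts in a shell.
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open files: soft=%d hard=%d\n", lim.Cur, lim.Max)

	// Raise the soft limit up to the current hard limit.
	lim.Cur = lim.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		log.Fatal(err)
	}
	fmt.Println("soft limit raised to", lim.Cur)
}
```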

@yihuang yihuang closed this as completed May 26, 2022