Thread 'IO Worker #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"' #7334
Comments
@The-Raa Could you please scan your hard drive for issues? It looks like a hardware issue to me.
I ran fsck.ext4 after the last failure in #7424, and it said everything was OK.
I'm having the same issues. SSD seems OK.
This is what I get with a custom-built Parity; I hope it helps.
Log from a dev [unoptimized + debuginfo] build:
We’ve encountered the same issue on multiple machines running on SSD drives too.
I also experienced a DB corruption issue on a running Parity node during the last few days (dedicated server hardware, SSD disk):
Do we have to wait for 1.9 to have this fixed? I can sync in light mode, but then I don't seem to have access to any tokens or Dapp interaction, so it's not really ideal for my uses. I'm tempted to try a new SSD, but it seems there are already users who experience this issue across multiple SSDs.
====================
stack backtrace:
Thread 'Verifier #0' panicked at 'Low-level database error. Some issue with your hard disk?: "Corruption: Snappy not supported or corrupted Snappy compressed block contents"', /checkout/src/libcore/result.rs:906
The error
I always close these types of reports as "hardware failure", but the recent spike of reports points to some other issue. Also, users have been checking their devices and couldn't find any indicators of hardware problems.
The issue can be reproduced more frequently on the latest release by killing the daemon without a proper shutdown; I didn't see this so often on 1.7.10. I don't think it's hardware-related, as we encountered this running on different Azure instances and various bare-metal servers, all with enterprise-level SSD/NVMe drives.
@mtbitcoin Did you experience it when doing a proper shutdown as well? Perhaps some DB tuning in recent versions increased the amount of data that needs to be synchronized to disk; that would explain why it happens more often now. It seems more like a DB-synchronization-on-shutdown issue, then.
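To make that hypothesis concrete, here is a minimal sketch of a disk-synchronized write using the standalone `rocksdb` Rust crate. This is not Parity's actual database layer, and the path and keys are invented; it only illustrates the trade-off being discussed: a write issued with `set_sync(true)` is fsynced to the write-ahead log before the call returns, so a hard kill immediately afterwards cannot tear it, at the cost of write latency.

```rust
// Minimal sketch with the `rocksdb` crate -- not Parity's code.
// The path and keys are invented for illustration.
use rocksdb::{Options, WriteOptions, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "/tmp/sync-demo")?;

    let mut wo = WriteOptions::default();
    wo.set_sync(true); // fsync the WAL before returning: slower, but a kill -9 can't tear this write

    db.put_opt(b"latest_block", b"2166327", &wo)?;
    Ok(())
}
```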
@tomusdrw I cannot say for sure. We run a lot of nodes that get auto-restarted, but I did notice that with 1.7.11 it happened more often, and normally after a restart. Then again, it could have been the monitoring service restarting the node because it had already crashed. We've moved to "graceful" shutdowns rather than a task kill and haven't seen much of this anymore.
Might be related to the segfault on shutdown. Can't find the related ticket.
@mtbitcoin How do you execute a "graceful" shutdown?
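For readers with the same question: "graceful" here generally means stopping the process with a signal it can trap (SIGTERM or Ctrl-C) rather than a hard task kill, so it can finish in-flight writes and close the database. Below is a hypothetical sketch using the `signal_hook` crate; the loop body and the shutdown step are invented, not Parity's actual handler.

```rust
// Hypothetical graceful-shutdown loop using the `signal_hook` crate --
// not Parity's actual signal handling.
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stop = Arc::new(AtomicBool::new(false));
    // Flip `stop` to true when SIGTERM arrives, instead of dying mid-write.
    signal_hook::flag::register(signal_hook::consts::SIGTERM, Arc::clone(&stop))?;

    while !stop.load(Ordering::Relaxed) {
        // ... import blocks, write to the DB ...
        std::thread::sleep(std::time::Duration::from_millis(100));
    }

    // Reached only via SIGTERM: flush and close the database here.
    Ok(())
}
```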
Or does anyone have specs I'd need to run on a cloud somewhere? Or should I be patient and wait for 1.9?
I'm also encountering a similar issue. Maybe downgrading RocksDB would help? I didn't experience this kind of error a few months ago.
If it's any help, I encountered this error about an hour after freshly warping with 1.8.5-beta: https://gist.github.com/danuker/ec350847ca0ce7784d1183b8147ffecf However, after restarting the warp with 1.8.6-beta, it didn't happen (at least the first time).
I have the same issue; I'm using 1.8.6-stable. Subscribing to this topic.
We only try to repair the DB when we get a corruption error on open. Maybe we should check all the calls to RocksDB for corruption and trigger a repair?
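As a sketch of that idea against the standalone `rocksdb` Rust crate: detect a corruption status on a failed call, run RocksDB's offline repair, and retry. Both `open_with_repair` and the string-matching corruption check are invented simplifications for illustration, not Parity's API.

```rust
use rocksdb::{Options, DB};

// Crude check: rust-rocksdb surfaces RocksDB status as a message string.
fn is_corruption(e: &rocksdb::Error) -> bool {
    e.to_string().contains("Corruption")
}

fn open_with_repair(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    match DB::open(&opts, path) {
        Err(ref e) if is_corruption(e) => {
            DB::repair(&opts, path)?; // salvage what RocksDB can recover
            DB::open(&opts, path)     // then retry the open
        }
        other => other,
    }
}

fn main() {
    // Hypothetical path; in a real node this would be the chain DB directory.
    match open_with_repair("/tmp/repair-demo") {
        Ok(_db) => println!("opened"),
        Err(e) => eprintln!("open failed even after repair: {}", e),
    }
}
```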
Built from the
Last night's attempt:
Followed by many more status messages and the occasional repeat of the "DB corrupted" message. It did not crash at this point, but it stopped syncing and never resumed. I killed it manually this morning, since it had been running all night. Upon restarting:
After running a
Upon restarting:
I've been experiencing these corruption issues for a while now, through many upgrades (was on 1.8.6 until last night), with both HDD and SSD (with appropriate settings in
I'm attempting to run a full archive sync from scratch, with transaction tracing enabled. Relevant section of
Prior to this I've changed the
Third attempt:
After restarting:
@DeviateFish-2 Thanks for confirming; we were already suspecting something like this, but we are not out of ideas yet :) cc @andresilva
I have found a possible source of database corruption: we're not shutting down cleanly. I'm working on a fix.
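For context on why an unclean shutdown corrupts the store: if the process dies while RocksDB still has buffered writes and open files, the on-disk state can be torn, and the damage surfaces later as checksum-mismatch errors like the ones above. A minimal sketch of the clean-shutdown shape with the `rocksdb` crate (hypothetical, not the actual fix):

```rust
use rocksdb::DB;

fn main() -> Result<(), rocksdb::Error> {
    let db = DB::open_default("/tmp/shutdown-demo")?;
    db.put(b"key", b"value")?;

    // Clean shutdown: push buffered memtable writes out to SST files,
    // then drop the handle so RocksDB closes its files consistently.
    db.flush()?;
    drop(db);
    Ok(())
}
```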
Any updates on a fix yet?
Yeah, upgrade to 1.9.2.
Trying to sync Parity and receiving the following error:
2017-12-19 12:00:42 UTC Syncing #2158564 6b84…a609 306 blk/s 2302 tx/s 64 Mgas/s 0+ 7476 Qed #2166040 25/25 peers 6 MiB chain 100 MiB db 55 MiB queue 19 MiB sync RPC: 1 conn, 12 req/s, 32 µs
2017-12-19 12:00:52 UTC Syncing #2160982 c978…5f4b 243 blk/s 2445 tx/s 70 Mgas/s 0+ 5310 Qed #2166294 25/25 peers 5 MiB chain 100 MiB db 36 MiB queue 20 MiB sync RPC: 1 conn, 13 req/s, 32 µs
2017-12-19 12:01:02 UTC Syncing #2163741 342a…c1f9 273 blk/s 1836 tx/s 53 Mgas/s 0+ 2583 Qed #2166327 25/25 peers 4 MiB chain 100 MiB db 19 MiB queue 25 MiB sync RPC: 1 conn, 14 req/s, 32 µs
====================
stack backtrace:
0: 0x7ff704834812 - hid_error
1: 0x7ff704834cf3 - hid_error
2: 0x7ff70406b124 -
3: 0x7ff7049a5544 - hid_error
4: 0x7ff7049a53b9 - hid_error
5: 0x7ff7049a5292 - hid_error
6: 0x7ff7049a5200 - hid_error
7: 0x7ff7049ae86f - hid_error
8: 0x7ff7041ef8b1 -
9: 0x7ff7042c1bff -
10: 0x7ff704199f3d -
11: 0x7ff70419fdea -
12: 0x7ff7049a69d2 - hid_error
13: 0x7ff7041efa46 -
14: 0x7ff7049a33dc - hid_error
15: 0x7fff4f0a1fe4 - BaseThreadInitThunk
Thread 'IO Worker #0' panicked at 'DB flush failed.: "Corruption: block checksum mismatch"', src\libcore\result.rs:906
This is a bug. Please report it at:
(The same backtrace repeated three more times, for threads 'IO Worker #2', 'IO Worker #1', and 'IO Worker #3', each panicking at 'DB flush failed.: "Corruption: block checksum mismatch"', src\libcore\result.rs:906.)
I had been able to sync in the past week, but today it has stopped and crashes immediately once it reaches the final few blocks. I have tried manually deleting the blockchain and using "db kill", to no avail. Any help would be appreciated.
Expected behaviour is that Parity will sync and not force-close.
This is reproducible by launching Parity and letting it sync.
I have tried deleting all Parity-related files and registry keys, and uninstalling via the uninstaller. I am attempting a fresh install on a separate PC.