Grin 3.1.0 crash on start #3261

Closed
shrikus opened this issue Mar 7, 2020 · 11 comments

@shrikus

shrikus commented Mar 7, 2020

Just updated grin node from 3.1.0-beta.2 to 3.1.0 (amd64 release binaries), and it's crashing on start:

# gdb --args grin "server" "run"
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from grin...done.
(gdb) r
Starting program: /usr/local/bin/grin server run
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000555555d8a3cb in ra_portable_deserialize ()
(gdb) bt
#0  0x0000555555d8a3cb in ra_portable_deserialize ()
#1  0x0000555555d8a811 in roaring_bitmap_portable_deserialize_safe ()
#2  0x0000555555cc7f28 in grin_store::read_bitmap ()
#3  0x0000555555cc830d in grin_store::leaf_set::LeafSet::open ()
#4  0x0000555555d0b443 in grin_store::pmmr::PMMRBackend<T>::new ()
#5  0x0000555555c9437b in grin_chain::txhashset::txhashset::TxHashSet::open ()
#6  0x0000555555ce4bad in grin_chain::chain::Chain::init ()
#7  0x00005555558b9db1 in grin_servers::grin::server::Server::new ()
#8  0x0000555555686e0d in grin_servers::grin::server::Server::start ()
#9  0x00005555556b0e3d in grin::cmd::server::start_server_tui ()
#10 0x00005555556b1917 in grin::cmd::server::server_command ()
#11 0x00005555556be8b1 in grin::real_main ()
#12 0x00005555556bd5f6 in grin::main ()
#13 0x00005555556951d3 in std::rt::lang_start::{{closure}} ()
#14 0x0000555555ee1e23 in std::rt::lang_start_internal::{{closure}} () at src/libstd/rt.rs:52
#15 std::panicking::try::do_call () at src/libstd/panicking.rs:292
#16 0x0000555555ee7e7a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:78
#17 0x0000555555ee2870 in std::panicking::try () at src/libstd/panicking.rs:270
#18 std::panic::catch_unwind () at src/libstd/panic.rs:394
#19 std::rt::lang_start_internal () at src/libstd/rt.rs:51
#20 0x00005555556bf6d2 in main ()
@shrikus
Author

shrikus commented Mar 7, 2020

Tried updating on another server, this time from 3.0.0, with the same result:

20200307 15:06:20.553 INFO grin - This is Grin version 3.0.0 (git v3.0.0), built for x86_64-unknown-linux-gnu by rustc 1.39.0 (4560ea788 2019-11-04).
20200307 15:06:20.553 WARN grin::cmd::server - Starting GRIN w/o UI...
20200307 15:06:20.558 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
20200307 15:06:26.034 INFO grin_servers::grin::server - Starting rest apis at: 127.0.0.1:3413
20200307 15:06:26.035 WARN grin_api::handlers - Starting HTTP Node APIs server at 127.0.0.1:3413.
20200307 15:06:26.037 WARN grin_api::handlers - HTTP Node listener started.
20200307 15:06:26.037 INFO grin_servers::grin::server - Starting dandelion monitor: 127.0.0.1:3413
20200307 15:06:26.038 WARN grin_servers::grin::server - Grin server started.
20200307 15:06:41.470 INFO grin - This is Grin version 3.1.0 (git v3.1.0), built for x86_64-unknown-linux-gnu by rustc 1.41.0 (5e1a79984 2020-01-27).
20200307 15:06:41.470 WARN grin::cmd::server - Starting GRIN w/o UI...
20200307 15:06:41.470 INFO grin_servers::grin::server - Starting server, genesis block: 40adad0aec27
Illegal instruction (core dumped)

@shrikus
Author

shrikus commented Mar 12, 2020

Tried deleting chain_data and running with a clean config, with the same result:

Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-34-generic x86_64)
root@slava-test-pool:~/.grin/floo# gdb --args /root/grin/grin "--floonet"
GNU gdb (Ubuntu 8.2-0ubuntu1~18.04) 8.2
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /root/grin/grin...done.
(gdb) r
Starting program: /root/grin/grin --floonet
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
20200312 05:45:22.408 INFO grin_util::logger - log4rs is initialized, file level: Info, stdout level: Debug, min. level: Debug
20200312 05:45:22.408 INFO grin - Using configuration file at /mnt/hundred/.grin/floo/grin-server.toml
20200312 05:45:22.408 INFO grin - This is Grin version 3.1.0 (git v3.1.0), built for x86_64-unknown-linux-gnu by rustc 1.41.0 (5e1a79984 2020-01-27).
20200312 05:45:22.408 DEBUG grin - Built with profile "release", features "".
20200312 05:45:22.408 WARN grin::cmd::server - Starting GRIN w/o UI...
20200312 05:45:22.408 INFO grin_servers::grin::server - Starting server, genesis block: edc758c1370d
20200312 05:45:22.409 DEBUG grin_store::lmdb - DB Mapsize for /root/.grin/floo/chain_data/lmdb is 1048576
20200312 05:45:22.417 DEBUG grin_chain::txhashset::bitmap_accumulator - applied 1 chunks from idx 0 to idx 0 (0ms)
20200312 05:45:22.417 DEBUG grin_chain::txhashset::txhashset - attempting to open (empty) kernel PMMR using ProtocolVersion(2) - SUCCESS
20200312 05:45:22.417 INFO grin_store::lmdb - Resized database from 1048576 to 134217728
20200312 05:45:22.455 DEBUG grin_chain::txhashset::bitmap_accumulator - applied 1 chunks from idx 0 to idx 0 (0ms)

Program received signal SIGILL, Illegal instruction.
0x0000555555d89e11 in ra_portable_serialize ()
(gdb) bt
#0  0x0000555555d89e11 in ra_portable_serialize ()
#1  0x0000555555d1d54f in grin_store::save_via_temp_file ()
#2  0x0000555555d1e7ec in grin_store::leaf_set::LeafSet::flush ()
#3  0x0000555555d0ef85 in grin_store::pmmr::PMMRBackend<T>::sync ()
#4  0x0000555555ca2588 in grin_chain::txhashset::txhashset::extending ()
#5  0x0000555555ce6544 in grin_chain::chain::Chain::init ()
#6  0x00005555558b9db1 in grin_servers::grin::server::Server::new ()
#7  0x0000555555686e0d in grin_servers::grin::server::Server::start ()
#8  0x00005555556b0e3d in grin::cmd::server::start_server_tui ()
#9  0x00005555556b18ac in grin::cmd::server::server_command ()
#10 0x00005555556bee1a in grin::real_main ()
#11 0x00005555556bd5f6 in grin::main ()
#12 0x00005555556951d3 in std::rt::lang_start::{{closure}} ()
#13 0x0000555555ee1e23 in std::rt::lang_start_internal::{{closure}} () at src/libstd/rt.rs:52
#14 std::panicking::try::do_call () at src/libstd/panicking.rs:292
#15 0x0000555555ee7e7a in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:78
#16 0x0000555555ee2870 in std::panicking::try () at src/libstd/panicking.rs:270
#17 std::panic::catch_unwind () at src/libstd/panic.rs:394
#18 std::rt::lang_start_internal () at src/libstd/rt.rs:51
#19 0x00005555556bf6d2 in main ()

@shrikus
Author

shrikus commented Mar 12, 2020

Rebuilt it from git (rev 533da2d) and it no longer crashes:

20200312 06:22:40.356 INFO grin_util::logger - log4rs is initialized, file level: Info, stdout level: Debug, min. level: Debug
20200312 06:22:40.356 INFO grin - Using configuration file at /mnt/hundred/.grin/floo/grin-server.toml
20200312 06:22:40.357 INFO grin - This is Grin version 3.1.0 (git v3.1.0), built for x86_64-unknown-linux-gnu by rustc 1.40.0 (73528e339 2019-12-16).
20200312 06:22:40.357 WARN grin::cmd::server - Starting GRIN w/o UI...
20200312 06:22:40.357 INFO grin_servers::grin::server - Starting server, genesis block: edc758c1370d
20200312 06:22:40.444 INFO grin_servers::grin::server - Starting rest apis at: 46.101.93.177:13413
20200312 06:22:40.445 WARN grin_api::handlers - Starting HTTP Node APIs server at 46.101.93.177:13413.
20200312 06:22:40.445 WARN grin_api::handlers - HTTP Node listener started.
20200312 06:22:40.446 INFO grin_servers::grin::server - Starting dandelion monitor: 46.101.93.177:13413
20200312 06:22:40.446 WARN grin_servers::grin::server - Grin server started.
20200312 06:22:40.446 INFO grin_servers::common::types - DandelionEpoch: next_epoch: is_stem: true (90%), relay: None
20200312 06:22:40.446 INFO grin_servers::mining::stratumserver - (Server ID: 0) Starting stratum server with edge_bits = 31, proof_size = 42
20200312 06:22:40.447 WARN grin_servers::mining::stratumserver - Stratum server started on 127.0.0.1:3416
20200312 06:22:40.447 INFO grin_servers::mining::stratumserver - Start tokio stratum server

So I think there is a problem with the x86_64 release binaries.

@jaspervdm jaspervdm self-assigned this Mar 17, 2020
@de1acr0ix
Contributor

de1acr0ix commented Mar 22, 2020

I think this is similar to #2494 and #2519.

@shrikus
Author

shrikus commented Mar 22, 2020

The previous binaries worked well, and we didn't have this problem prior to 3.1.0.
cat /proc/cpuinfo shows avx2 available on all our servers:

processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz
stepping	: 4
microcode	: 0x2000064
cpu MHz		: 1200.006
cache size	: 11264 KB
physical id	: 0
siblings	: 16
core id		: 7
cpu cores	: 8
apicid		: 15
initial apicid	: 15
fpu		: yes
fpu_exception	: yes
cpuid level	: 22
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req md_clear flush_l1d
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 7400.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
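
For anyone who wants to run the same check across several hosts without reading the flags line by eye, here is a minimal sketch in Rust. It assumes the /proc/cpuinfo layout shown above; the helper name `cpu_flags` is made up for illustration and is not part of grin or croaring-rs.

```rust
use std::collections::HashSet;

/// Collect the CPU feature flags from the first `flags` line of
/// /proc/cpuinfo-style text. Returns an empty set if there is no such line.
/// (Hypothetical helper, not taken from grin.)
fn cpu_flags(cpuinfo: &str) -> HashSet<&str> {
    cpuinfo
        .lines()
        .find_map(|line| {
            // Each entry looks like "key<tabs>: value".
            let (key, value) = line.split_once(':')?;
            if key.trim() == "flags" {
                Some(value.split_whitespace().collect())
            } else {
                None
            }
        })
        .unwrap_or_default()
}

fn main() {
    // Reads the real file on Linux; on other systems the text is empty
    // and every flag is reported as missing.
    let text = std::fs::read_to_string("/proc/cpuinfo").unwrap_or_default();
    let flags = cpu_flags(&text);
    for wanted in ["avx2", "avx512f"] {
        println!("{}: {}", wanted, flags.contains(wanted));
    }
}
```

This only reports what the kernel advertises for the first listed processor; it says nothing about which instructions a given binary actually contains.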

@shrikus
Author

shrikus commented Mar 22, 2020

And yes, it's crashing on:

vmovdqa64 0x26ec2b(%rip),%zmm0
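
Since vmovdqa64 on a %zmm register is an AVX-512F instruction, one way to see whether the running host reports support for it is Rust's standard runtime feature detection. This is a sketch only: the function name `avx512f_available` is hypothetical, and this is not the croaring-rs fix mentioned in this thread.

```rust
/// Report whether the CPU (and OS) running this process supports AVX-512F.
/// Hypothetical helper for illustration; not part of grin.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx512f_available() -> bool {
    // Standard-library macro; the check happens at run time, not build time.
    is_x86_feature_detected!("avx512f")
}

/// On non-x86 targets the feature cannot be present.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx512f_available() -> bool {
    false
}

fn main() {
    if avx512f_available() {
        println!("avx512f detected: zmm instructions such as vmovdqa64 are supported here");
    } else {
        println!("avx512f not detected: executing vmovdqa64 on a zmm register would raise SIGILL");
    }
}
```

A binary compiled with AVX-512 enabled unconditionally skips this kind of check, which is why the result depends on the machine it runs on and not on the machine that built it.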

@jaspervdm
Contributor

I have a fix for it; I'm just waiting for it to be merged into croaring-rs. If anyone would like to try it, you can test this binary.

@shrikus
Author

shrikus commented Mar 22, 2020

Yes, this one is not crashing.

@jaspervdm
Contributor

We have just released a new beta version: https://github.com/mimblewimble/grin/releases/tag/v3.1.1-beta.1
Could you confirm that the binaries fix the issue for you on Linux? If so, we can release v3.1.1.

@shrikus
Author

shrikus commented Mar 26, 2020

Yes, 3.1.1-beta.1 is working well, no crashes.

@jaspervdm
Contributor

Thanks for checking! If the problem re-appears in the future, please open a new issue and we will investigate it.
