Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashing with DaemonError: {'code': -32700, 'message': 'Parse error'}. (RPC_PARSE_ERROR) #238

Open
freerko opened this issue Nov 25, 2023 · 12 comments

Comments

@freerko
Copy link

freerko commented Nov 25, 2023

Hi, I am running the newest commit on Ubuntu 22.04 LTS and ElectrumX crashed with this "Fatal error on SSL transport" and "Bad file descriptor" error. Afterwards it was possible to start it again without issues, I hope there are no corrupt files. Maybe issue #22 was similar?

Nov 24 20:30:20 server electrumx_server[64013]: INFO:ElectrumX:[30450] TCP xx.xx.xx.xx:xx, 192 total
Nov 24 20:30:21 server electrumx_server[64013]: INFO:ElectrumX:[30451] SSL xx.xx.xx.xx:xx, 193 total
Nov 24 20:30:21 server electrumx_server[64013]: INFO:ElectrumX:[30452] SSL xx.xx.xx.xx:xx, 193 total
Nov 24 20:30:22 server electrumx_server[64013]: INFO:ElectrumX:[30453] SSL xx.xx.xx.xx:xx, 194 total
Nov 24 20:30:23 server electrumx_server[64013]: INFO:ElectrumX:[30454] SSL xx.xx.xx.xx:xx, 193 total
Nov 24 20:30:24 server electrumx_server[64013]: INFO:ElectrumX:[30455] SSL xx.xx.xx.xx:xx, 194 total
Nov 24 20:30:24 server electrumx_server[64013]: INFO:ElectrumX:[30455] client requested future protocol version 1.5.1 - is your software out of date?
Nov 24 20:30:24 server electrumx_server[64013]: INFO:Prefetcher:cancelled; prefetcher stopping
Nov 24 20:30:24 server electrumx_server[64013]: INFO:BlockProcessor:flushing to DB for a clean shutdown...
Nov 24 20:30:24 server electrumx_server[64013]: INFO:SessionManager:closing down server for wss://all_interfaces:50004
Nov 24 20:30:24 server electrumx_server[64013]: INFO:SessionManager:closing down server for rpc://localhost:8000
Nov 24 20:30:24 server electrumx_server[64013]: INFO:SessionManager:closing down server for tcp://all_interfaces:50001
Nov 24 20:30:24 server electrumx_server[64013]: INFO:SessionManager:closing down server for ssl://all_interfaces:50002
Nov 24 20:30:24 server electrumx_server[64013]: INFO:ElectrumX:[30373] disconnected whilst throttled
Nov 24 20:30:25 server electrumx_server[64013]: INFO:Controller:shutting down
Nov 24 20:30:25 server electrumx_server[64013]: INFO:Controller:shutdown complete
Nov 24 20:30:25 server electrumx_server[64013]: ERROR:electrumx:ElectrumX server terminated abnormally
Nov 24 20:30:25 server electrumx_server[64013]: Traceback (most recent call last):
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/EGG-INFO/scripts/electrumx_server", line 35, in main
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
Nov 24 20:30:25 server electrumx_server[64013]:     return loop.run_until_complete(main)
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
Nov 24 20:30:25 server electrumx_server[64013]:     return future.result()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/server_base.py", line 129, in run
Nov 24 20:30:25 server electrumx_server[64013]:     await server_task
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/server_base.py", line 102, in serve
Nov 24 20:30:25 server electrumx_server[64013]:     await self.serve(shutdown_event)
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/controller.py", line 134, in serve
Nov 24 20:30:25 server electrumx_server[64013]:     async with OldTaskGroup() as group:
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/aiorpcx/curio.py", line 297, in __aexit__
Nov 24 20:30:25 server electrumx_server[64013]:     await self.join()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/util.py", line 370, in join
Nov 24 20:30:25 server electrumx_server[64013]:     task.result()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/mempool.py", line 377, in keep_synchronized
Nov 24 20:30:25 server electrumx_server[64013]:     async with OldTaskGroup() as group:
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/aiorpcx/curio.py", line 297, in __aexit__
Nov 24 20:30:25 server electrumx_server[64013]:     await self.join()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/util.py", line 370, in join
Nov 24 20:30:25 server electrumx_server[64013]:     task.result()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/mempool.py", line 259, in _refresh_hashes
Nov 24 20:30:25 server electrumx_server[64013]:     hex_hashes = await self.api.mempool_hashes()
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 248, in mempool_hashes
Nov 24 20:30:25 server electrumx_server[64013]:     return await self._send_single('getrawmempool')
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 191, in _send_single
Nov 24 20:30:25 server electrumx_server[64013]:     return await self._send(payload, processor)
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 150, in _send
Nov 24 20:30:25 server electrumx_server[64013]:     result = processor(result)
Nov 24 20:30:25 server electrumx_server[64013]:   File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 186, in processor
Nov 24 20:30:25 server electrumx_server[64013]:     raise DaemonError(err)
Nov 24 20:30:25 server electrumx_server[64013]: electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}
Nov 24 20:30:26 server electrumx_server[64013]: Fatal error on SSL transport
Nov 24 20:30:26 server electrumx_server[64013]: protocol: <asyncio.sslproto.SSLProtocol object at 0x7f87d62b2980>
Nov 24 20:30:26 server electrumx_server[64013]: transport: <_SelectorSocketTransport closing fd=194>
Nov 24 20:30:26 server electrumx_server[64013]: Traceback (most recent call last):
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/selector_events.py", line 924, in write
Nov 24 20:30:26 server electrumx_server[64013]:     n = self._sock.send(data)
Nov 24 20:30:26 server electrumx_server[64013]: OSError: [Errno 9] Bad file descriptor
Nov 24 20:30:26 server electrumx_server[64013]: During handling of the above exception, another exception occurred:
Nov 24 20:30:26 server electrumx_server[64013]: Traceback (most recent call last):
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/sslproto.py", line 690, in _process_write_backlog
Nov 24 20:30:26 server electrumx_server[64013]:     self._transport.write(chunk)
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/selector_events.py", line 930, in write
Nov 24 20:30:26 server electrumx_server[64013]:     self._fatal_error(exc, 'Fatal write error on socket transport')
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/selector_events.py", line 725, in _fatal_error
Nov 24 20:30:26 server electrumx_server[64013]:     self._force_close(exc)
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/selector_events.py", line 737, in _force_close
Nov 24 20:30:26 server electrumx_server[64013]:     self._loop.call_soon(self._call_connection_lost, exc)
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
Nov 24 20:30:26 server electrumx_server[64013]:     self._check_closed()
Nov 24 20:30:26 server electrumx_server[64013]:   File "/usr/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
Nov 24 20:30:26 server electrumx_server[64013]:     raise RuntimeError('Event loop is closed')
Nov 24 20:30:26 server electrumx_server[64013]: RuntimeError: Event loop is closed
Nov 24 20:30:27 server systemd[1]: electrumx.service: Deactivated successfully.
Nov 24 20:30:27 server systemd[1]: electrumx.service: Consumed 1h 16min 43.903s CPU time.
@SomberNight
Copy link
Member

It is the first trace that is more interesting.

  File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 248, in mempool_hashes
    return await self._send_single('getrawmempool')
  File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 191, in _send_single
    return await self._send(payload, processor)
  File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 150, in _send
    result = processor(result)
  File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 186, in processor
    raise DaemonError(err)
electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}

Looks like e-x sent a getrawmempool RPC to bitcoind, and bitcoind responded with an RPC_PARSE_ERROR.
This caused e-x to shutdown. (and then while shutting down, the other trace likely happened due to a race)

What version of bitcoind do you use? How are the two processes configured, e.g. is bitcoind on the same machine as electrumx?

@SomberNight
Copy link
Member

Can you check the bitcoind logs around the same time the crash happened? Perhaps it logged something more about the parse error.

@freerko
Copy link
Author

freerko commented Nov 25, 2023

Its bitcoind 25.1.0 locally on the same machine configured with rpcuser/rpcpassword and rpcworkqueue of 250. Nothing special in the bitcoind logs during that time, I guess:

2023-11-24T20:10:17Z New outbound peer connected: version: 70016, blocks=818307, peer=528964 (block-relay-only)
2023-11-24T20:16:35Z New outbound peer connected: version: 70015, blocks=818307, peer=532813 (block-relay-only)
2023-11-24T20:18:18Z Flushed fee estimates to fee_estimates.dat.
2023-11-24T20:30:11Z Saw new header hash=000000000000000000001ba08686151462f26ba16563503d5ce6822f91c148c4 height=818308
2023-11-24T20:30:12Z UpdateTip: new best=000000000000000000001ba08686151462f26ba16563503d5ce6822f91c148c4 height=818308 version=0x2000c000 log2_work=94.559874 tx=925364346 date='2023-
11-24T20:29:44Z' progress=1.000000 cache=247.4MiB(1705214txo)
2023-11-24T20:33:50Z New outbound peer connected: version: 70016, blocks=818308, peer=543617 (block-relay-only)
2023-11-24T20:34:53Z Saw new header hash=0000000000000000000448495577f0d417a7d73a82001f67c879ac8155adb229 height=818309
2023-11-24T20:34:54Z UpdateTip: new best=0000000000000000000448495577f0d417a7d73a82001f67c879ac8155adb229 height=818309 version=0x2fd36000 log2_work=94.559888 tx=925366081 date='2023-
11-24T20:34:40Z' progress=1.000000 cache=250.3MiB(1728117txo)

Something else seems broken: With "electrumx_rpc getinfo" I get for both daemon height and db height the current latest blockchain height, but when I connect with Electrum to my server it tells me my server is lagging 30 blocks and https://1209k.com/bitcoin-eye/ele.php reports it as BEHIND as well.

@SomberNight
Copy link
Member

electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}

Did this error only happen once? Or does it reproduce somehow?
Let me know if it happens again, we could add a special-case for it to wait a bit and retry, and perhaps more importantly, to log more details when it happens.

Something else seems broken: With "electrumx_rpc getinfo" I get for both daemon height and db height the current latest blockchain height, but when I connect with Electrum to my server it tells me my server is lagging 30 blocks

Is this still the case or did the server recover? Is the process at 100% cpu usage?

Its bitcoind 25.1.0 locally on the same machine configured with rpcuser/rpcpassword and rpcworkqueue of 250.

How beefy is the hardware? Is there anything else connected to bitcoind besides electrumx?

@smmalis37
Copy link
Contributor

I have also just started seeing the height mismatch on my server, cpu usage is fine and I don't see any other errors though.

@freerko
Copy link
Author

freerko commented Nov 27, 2023

The parse error happened afterwards again and before as well, I have 5 findings in my syslog (in the old .gz ones it didn't happen):

root@server:# grep Parse /var/log/syslog*
/var/log/syslog:Nov 27 16:48:00 server electrumx_server[104490]: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
/var/log/syslog.1:Nov 21 14:58:50 server electrumx_server[25828]: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
/var/log/syslog.1:Nov 24 04:59:09 server electrumx_server[57701]: electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}
/var/log/syslog.1:Nov 24 19:53:43 server electrumx_server[64013]: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
/var/log/syslog.1:Nov 24 20:30:25 server electrumx_server[64013]: electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}
root@server:#

The one on Nov 27 16:48:00 was just that single line, nothing else, no crash. On the other parse errors the server did crash another time (Nov 24 04:59) but on Nov 21 14:58 and Nov 24 19:53 it didn't crash. I've looked into Bitcoins debug.log at the times of all 5 Parse errors but never saw something special.

Nov 21 14:58:49 server electrumx_server[25828]: INFO:ElectrumX:[32041] SSL xx.xx.xx.xx:xx, 204 total
Nov 21 14:58:49 server electrumx_server[25828]: INFO:ElectrumX:[32042] SSL xx.xx.xx.xx:xx, 203 total
Nov 21 14:58:50 server electrumx_server[25828]: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
Nov 21 14:58:50 server electrumx_server[25828]: ERROR:Daemon:daemon service refused:
Nov 21 14:58:50 server electrumx_server[25828]: <TITLE/>400 Bad Request</TITLE>
Nov 21 14:58:50 server electrumx_server[25828]: </HEAD/>
Nov 21 14:58:50 server electrumx_server[25828]: <H 1>Bad Request</H 1>
Nov 21 14:58:50 server electrumx_server[25828]: </BO DY></H TML>. Retrying occasionally...
Nov 21 14:58:50 server electrumx_server[25828]: INFO:Daemon:running normally
Nov 21 14:58:50 server electrumx_server[25828]: INFO:ElectrumX:[32043] SSL xx.xx.xx.xx:xx, 204 total
Nov 21 14:58:51 server electrumx_server[25828]: INFO:ElectrumX:[32044] SSL xx.xx.xx.xx:xx, 205 total
Nov 21 14:58:51 server electrumx_server[25828]: INFO:ElectrumX:[32045] SSL xx.xx.xx.xx:xx, 206 total

On Nov 21 I still had Bitcoin version 0.24 running.
Next one:

Nov 24 04:59:06 server electrumx_server[57701]: INFO:ElectrumX:[37162] SSL xx.xx.xx.xx:xx, 178 total
Nov 24 04:59:07 server electrumx_server[57701]: INFO:ElectrumX:[37163] TCP xx.xx.xx.xx:xx, 174 total
Nov 24 04:59:07 server electrumx_server[57701]: INFO:ElectrumX:[37164] SSL xx.xx.xx.xx:xx, 175 total
Nov 24 04:59:08 server electrumx_server[57701]: INFO:ElectrumX:[37165] SSL xx.xx.xx.xx:xx, 176 total
Nov 24 04:59:08 server electrumx_server[57701]: INFO:Prefetcher:cancelled; prefetcher stopping
Nov 24 04:59:08 server electrumx_server[57701]: INFO:BlockProcessor:flushing to DB for a clean shutdown...
Nov 24 04:59:08 server electrumx_server[57701]: INFO:SessionManager:closing down server for ssl://all_interfaces:50002
Nov 24 04:59:08 server electrumx_server[57701]: INFO:SessionManager:closing down server for tcp://all_interfaces:50001
Nov 24 04:59:08 server electrumx_server[57701]: INFO:SessionManager:closing down server for rpc://localhost:8000
Nov 24 04:59:08 server electrumx_server[57701]: INFO:SessionManager:closing down server for wss://all_interfaces:50004
Nov 24 04:59:08 server electrumx_server[57701]: INFO:ElectrumX:[37166] SSL xx.xx.xx.xx:xx, 83 total
Nov 24 04:59:08 server electrumx_server[57701]: INFO:ElectrumX:[37167] SSL xx.xx.xx.xx:xx, 18 total
Nov 24 04:59:09 server electrumx_server[57701]: INFO:ElectrumX:[36267] disconnected whilst throttled
Nov 24 04:59:09 server electrumx_server[57701]: INFO:ElectrumX:[36281] disconnected whilst throttled
Nov 24 04:59:09 server electrumx_server[57701]: INFO:ElectrumX:[36355] disconnected whilst throttled
Nov 24 04:59:09 server electrumx_server[57701]: INFO:Controller:shutting down
Nov 24 04:59:09 server electrumx_server[57701]: INFO:Controller:shutdown complete
Nov 24 04:59:09 server electrumx_server[57701]: ERROR:electrumx:ElectrumX server terminated abnormally
Nov 24 04:59:09 server electrumx_server[57701]: Traceback (most recent call last):
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/EGG-INFO/scripts/electrumx_server", line 35, in main
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
Nov 24 04:59:09 server electrumx_server[57701]: return loop.run_until_complete(main)
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
Nov 24 04:59:09 server electrumx_server[57701]: return future.result()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/server_base.py", line 129, in run
Nov 24 04:59:09 server electrumx_server[57701]: await server_task
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/server_base.py", line 102, in serve
Nov 24 04:59:09 server electrumx_server[57701]: await self.serve(shutdown_event)
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/controller.py", line 134, in serve
Nov 24 04:59:09 server electrumx_server[57701]: async with OldTaskGroup() as group:
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/aiorpcx/curio.py", line 297, in aexit
Nov 24 04:59:09 server electrumx_server[57701]: await self.join()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/util.py", line 370, in join
Nov 24 04:59:09 server electrumx_server[57701]: task.result()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/mempool.py", line 377, in keep_synchronized
Nov 24 04:59:09 server electrumx_server[57701]: async with OldTaskGroup() as group:
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/aiorpcx/curio.py", line 297, in aexit
Nov 24 04:59:09 server electrumx_server[57701]: await self.join()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/lib/util.py", line 370, in join
Nov 24 04:59:09 server electrumx_server[57701]: task.result()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/mempool.py", line 259, in _refresh_hashes
Nov 24 04:59:09 server electrumx_server[57701]: hex_hashes = await self.api.mempool_hashes()
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 248, in mempool_hashes
Nov 24 04:59:09 server electrumx_server[57701]: return await self._send_single('getrawmempool')
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 191, in _send_single
Nov 24 04:59:09 server electrumx_server[57701]: return await self._send(payload, processor)
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 150, in _send
Nov 24 04:59:09 server electrumx_server[57701]: result = processor(result)
Nov 24 04:59:09 server electrumx_server[57701]: File "/usr/local/lib/python3.10/dist-packages/e_x-1.16.0-py3.10.egg/electrumx/server/daemon.py", line 186, in processor
Nov 24 04:59:09 server electrumx_server[57701]: raise DaemonError(err)
Nov 24 04:59:09 server electrumx_server[57701]: electrumx.server.daemon.DaemonError: {'code': -32700, 'message': 'Parse error'}
Nov 24 04:59:11 server systemd[1]: electrumx.service: Deactivated successfully.
Nov 24 04:59:11 server systemd[1]: electrumx.service: Consumed 1h 14min 12.093s CPU time.
Nov 24 15:47:11 server systemd[1]: Started Electrumx.
Nov 24 15:47:12 server electrumx_server[64013]: INFO:electrumx:ElectrumX server starting

On Nov 24 I already had Bitcoin version 0.25. Next one:

Nov 24 19:53:35 server electrumx_server[64013]: INFO:ElectrumX:[26623] SSL xx.xx.xx.xx:xx, 197 total
Nov 24 19:53:36 server electrumx_server[64013]: INFO:ElectrumX:[26624] SSL xx.xx.xx.xx:xx, 198 total
Nov 24 19:53:36 server electrumx_server[64013]: INFO:ElectrumX:[26625] SSL xx.xx.xx.xx:xx, 199 total
Nov 24 19:53:39 server electrumx_server[64013]: INFO:ElectrumX:[26626] SSL xx.xx.xx.xx:xx, 198 total
Nov 24 19:53:41 server electrumx_server[64013]: INFO:ElectrumX:[26627] SSL xx.xx.xx.xx:xx, 198 total
Nov 24 19:53:41 server electrumx_server[64013]: INFO:ElectrumX:[26628] SSL xx.xx.xx.xx:xx, 199 total
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[6031] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[13921] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[15866] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[16567] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[16745] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[18720] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[21444] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[25922] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[26350] changing task concurrency from 10 to 0
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[26629] TCP xx.xx.xx.xx:xx, 198 total
Nov 24 19:53:42 server electrumx_server[64013]: INFO:ElectrumX:[26630] SSL xx.xx.xx.xx:xx, 199 total
Nov 24 19:53:43 server electrumx_server[64013]: INFO:SessionManager:closing stale session ids 22163, 25412
Nov 24 19:53:43 server electrumx_server[64013]: INFO:ElectrumX:[26631] SSL xx.xx.xx.xx:xx, 197 total
Nov 24 19:53:43 server electrumx_server[64013]: INFO:ElectrumX:[26632] SSL xx.xx.xx.xx:xx, 198 total
Nov 24 19:53:43 server electrumx_server[64013]: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
Nov 24 19:53:43 server electrumx_server[64013]: ERROR:Daemon:daemon service refused:
Nov 24 19:53:43 server electrumx_server[64013]: 400 Bad Request</TIT LE>
Nov 24 19:53:43 server electrumx_server[64013]: </HE AD>
Nov 24 19:53:43 server electrumx_server[64013]: <H 1>Bad Request</H 1>
Nov 24 19:53:43 server electrumx_server[64013]: </BO DY></HT ML>. Retrying occasionally...
Nov 24 19:53:43 server electrumx_server[64013]: INFO:ElectrumX:[26633] SSL xx.xx.xx.xx:xx, 199 total
Nov 24 19:53:43 server electrumx_server[64013]: INFO:ElectrumX:[26634] SSL xx.xx.xx.xx:xx, 200 total
Nov 24 19:53:43 server electrumx_server[64013]: INFO:Daemon:running normally
Nov 24 19:53:44 server electrumx_server[64013]: INFO:ElectrumX:[25922] closing session over res usage. ip: 185.220.101.xxx. groups: ['185.220.101.0/24']
Nov 24 19:53:44 server electrumx_server[64013]: INFO:MemPool:91,011 txs 72.94 MB touching 389,616 addresses
Nov 24 19:53:44 server electrumx_server[64013]: INFO:ElectrumX:[25922] disconnected whilst throttled
Nov 24 19:53:45 server electrumx_server[64013]: INFO:ElectrumX:[26635] SSL xx.xx.xx.xx:xx, 193 total
Nov 24 19:53:45 server electrumx_server[64013]: INFO:ElectrumX:[26636] SSL xx.xx.xx.xx:xx, 194 total
Nov 24 19:53:46 server electrumx_server[64013]: INFO:ElectrumX:[26637] SSL xx.xx.xx.xx:xx, 193 total
Nov 24 19:53:46 server electrumx_server[64013]: INFO:ElectrumX:[26638] SSL xx.xx.xx.xx:xx, 194 total
Nov 24 19:53:46 server electrumx_server[64013]: INFO:ElectrumX:[26639] SSL xx.xx.xx.xx:xx, 194 total

The other issue: Electrumx apparently didn't parse any new blocks at all and after a day or so I restarted it and now they are up to date again. During the lag I didn't notice any high CPU usage nor RAM or I/O.
Its a VM with 30GB RAM, 8 Cores AMD EPYC 7282 and 800GB SSD and only bitcoind, tor and electrumx are running. It transfers about 700-800GB (total, rx+tx) per day. Tor consumes about 50% of one core, bitcoind about 50% of one core and electrum about 15-30% of one core. I never see IO wait.

Edit: Electrumx stopped processing blocks again without anything interesting in the logfile during that time. Block processing usually takes between 1s and 11s on this system and there is no high CPU nor IO load. In the electrumx log I see about 10 "timed out after 30 secs" messages for transaction.get or transaction.broadcast per minute - its the about the same amount in the beginning when still processing blocks and afterwards without processing blocks. Currently "electrumx_rpc getinfo" gives me:

"daemon height": 818745,
"db height": 818735,

Where 818735 is the height reported when connecting via Electrum as well. At the same time "bitcoind getblockcount" on that system reports the correct current block 818756. With "bitcoin-cli getrpcinfo" I see only the active getrpcinfo call, nothing else.

SomberNight added a commit that referenced this issue Nov 28, 2023
see #238

So far it is unclear why bitcoind sends RPC_PARSE_ERROR, but
instead of doing a full shutdown, let's log the error(+request) and retry.
@SomberNight
Copy link
Member

The parse error happened afterwards again and before as well, I have 5 findings in my syslog (in the old .gz ones it didn't happen)

Ok, so it was not a one-off for you then. In that case, I've pushed a small change, please try with that: 3148936

The other issue: Electrumx apparently didn't parse any new blocks at all and after a day or so I restarted it and now they are up to date again. During the lag I didn't notice any high CPU usage nor RAM or I/O.
[...]
Edit: Electrumx stopped processing blocks again without anything interesting in the logfile during that time.

Perhaps this is not related at all then.

Still I am interested in what will get logged re RPC_PARSE_ERROR with the new commit.

@SomberNight SomberNight changed the title Crashing with "Fatal error on SSL transport" and "Bad file descriptor" Crashing with DaemonError: {'code': -32700, 'message': 'Parse error'}. (RPC_PARSE_ERROR) Nov 28, 2023
@freerko
Copy link
Author

freerko commented Nov 28, 2023

Thanks! I am now running the current main branch.

@biatwc
Copy link

biatwc commented Dec 13, 2023

I have seen similar on an LTC electrumx, I tried to get the full aiohttp error with rich.console which I've attached.
In my case the LTC RPC node is on a separate VM and a wireguard tunnel linking the electrumx (there are 4 electrumx all using the same LTC RPC node - I've increased rpcworkqueue and rpcthreads from the defaults). What tends to happen is the prefetcher stops after a parse error and so electrumx falls behind tip and loses peers.

I have seen the prefetcher stopping without a parse error, it will just stop and I don't see any error around that happening to suggest why. Is there something that could be done to make the prefetcher more resilient when the RPC node isn't located on the same VM or LAN segment?

The attachment should be read bottom to top.
err_out.txt


edit: attachment inlined and reversed:

log
Dec 08 11:20: INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
Dec 08 11:20: ERROR:Daemon:daemon service refused: <HTML><HEAD>
Dec 08 11:20: <TITLE>400 Bad Request</TITLE>
Dec 08 11:20: </HEAD><BODY>
Dec 08 11:20: <H1>Bad Request</H1>
Dec 08 11:20: </BODY></HTML>.  Retrying occasionally...
Dec 08 11:20: ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
Dec 08 11:20: │ /opt/electrumx/env/lib64/python3.9/site-packages/aiohttp/client_reqrep.p │
Dec 08 11:20: │ y:609 in write_bytes                                                         │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │    606 │   │                                                                 │
Dec 08 11:20: │    607 │   │   return True                                                   │
Dec 08 11:20: │    608 │                                                                     │
Dec 08 11:20: │ ❱  609 │   async def write_bytes(                                            │
Dec 08 11:20: │    610 │   │   self, writer: AbstractStreamWriter, conn: "Connection"        │
Dec 08 11:20: │    611 │   ) -> None:                                                        │
Dec 08 11:20: │    612 │   │   """Support coroutines that yields bytes objects."""           │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │   conn = Connection<ConnectionKey(host='192.168.0.2', port=9332,         │ │
Dec 08 11:20: │ │          is_ssl=False, ssl=None, proxy=None, proxy_auth=None,            │ │
Dec 08 11:20: │ │          proxy_headers_hash=None)>                                       │ │
Dec 08 11:20: │ │   self = <aiohttp.client_reqrep.ClientRequest object at 0x7f28267b50a0>  │ │
Dec 08 11:20: │ │ writer = <aiohttp.http_writer.StreamWriter object at 0x7f28267b5e20>     │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: ╰──────────────────────────────────────────────────────────────────────────────╯
Dec 08 11:20: CancelledError
Dec 08 11:20: During handling of the above exception, another exception occurred:
Dec 08 11:20: ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/block_pr │
Dec 08 11:20: │ ocessor.py:71 in main_loop                                                   │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │    68 │   │   │   try:                                                       │
Dec 08 11:20: │    69 │   │   │   │   # Sleep a while if there is nothing to prefetch        │
Dec 08 11:20: │    70 │   │   │   │   await self.refill_event.wait()                         │
Dec 08 11:20: │ ❱  71 │   │   │   │   if not await self._prefetch_blocks():                  │
Dec 08 11:20: │    72 │   │   │   │   │   await asyncio.sleep(self.polling_delay)            │
Dec 08 11:20: │    73 │   │   │   except DaemonError as e:                                   │
Dec 08 11:20: │    74 │   │   │   │   self.logger.info(f'ignoring daemon error: {e}')        │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ bp_height = 2589692                                                      │ │
Dec 08 11:20: │ │   console = <console width=80 None>                                      │ │
Dec 08 11:20: │ │         e = CancelledError()                                             │ │
Dec 08 11:20: │ │      self = <electrumx.server.block_processor.Prefetcher object at       │ │
Dec 08 11:20: │ │             0x7f2859fa26d0>                                              │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/block_pr │
Dec 08 11:20: │ ocessor.py:120 in _prefetch_blocks                                           │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   117 │   │   Repeats until the queue is full or caught up.                  │
Dec 08 11:20: │   118 │   │   '''                                                            │
Dec 08 11:20: │   119 │   │   daemon = self.daemon                                           │
Dec 08 11:20: │ ❱ 120 │   │   daemon_height = await daemon.height()                          │
Dec 08 11:20: │   121 │   │   async with self.semaphore:                                     │
Dec 08 11:20: │   122 │   │   │   while self.cache_size < self.min_cache_size:               │
Dec 08 11:20: │   123 │   │   │   │   first = self.fetched_height + 1                        │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ daemon = <electrumx.server.daemon.Daemon object at 0x7f2859f8b700>       │ │
Dec 08 11:20: │ │   self = <electrumx.server.block_processor.Prefetcher object at          │ │
Dec 08 11:20: │ │          0x7f2859fa26d0>                                                 │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/daemon.p │
Dec 08 11:20: │ y:302 in height                                                              │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   299 │                                                                      │
Dec 08 11:20: │   300 │   async def height(self):                                            │
Dec 08 11:20: │   301 │   │   '''Query the daemon for its current height.'''                 │
Dec 08 11:20: │ ❱ 302 │   │   self._height = await self._send_single('getblockcount')        │
Dec 08 11:20: │   303 │   │   return self._height                                            │
Dec 08 11:20: │   304 │                                                                      │
Dec 08 11:20: │   305 │   def cached_height(self):                                           │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────── locals ─────────────────────────────╮         │
Dec 08 11:20: │ │ self = <electrumx.server.daemon.Daemon object at 0x7f2859f8b700> │         │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────╯         │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/daemon.p │
Dec 08 11:20: │ y:191 in _send_single                                                        │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   188 │   │   payload = {'method': method, 'id': next(self.id_counter)}      │
Dec 08 11:20: │   189 │   │   if params:                                                     │
Dec 08 11:20: │   190 │   │   │   payload['params'] = params                                 │
Dec 08 11:20: │ ❱ 191 │   │   return await self._send(payload, processor)                    │
Dec 08 11:20: │   192 │                                                                      │
Dec 08 11:20: │   193 │   async def _send_vector(self, method, params_iterable, replace_errs │
Dec 08 11:20: │   194 │   │   '''Send several requests of the same method.                   │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │    method = 'getblockcount'                                              │ │
Dec 08 11:20: │ │    params = None                                                         │ │
Dec 08 11:20: │ │   payload = {'method': 'getblockcount', 'id': 75371}                     │ │
Dec 08 11:20: │ │ processor = <function Daemon._send_single.<locals>.processor at          │ │
Dec 08 11:20: │ │             0x7f2842a668b0>                                              │ │
Dec 08 11:20: │ │      self = <electrumx.server.daemon.Daemon object at 0x7f2859f8b700>    │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/daemon.p │
Dec 08 11:20: │ y:149 in _send                                                               │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   146 │   │   retry = self.init_retry                                        │
Dec 08 11:20: │   147 │   │   while True:                                                    │
Dec 08 11:20: │   148 │   │   │   try:                                                       │
Dec 08 11:20: │ ❱ 149 │   │   │   │   result = await self._send_data(data)                   │
Dec 08 11:20: │   150 │   │   │   │   result = processor(result)                             │
Dec 08 11:20: │   151 │   │   │   │   if on_good_message:                                    │
Dec 08 11:20: │   152 │   │   │   │   │   self.logger.info(on_good_message)                  │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │            data = '{"method": "getblockcount", "id": 75371}'             │ │
Dec 08 11:20: │ │  last_error_log = 1776533.851677362                                      │ │
Dec 08 11:20: │ │       log_error = <function Daemon._send.<locals>.log_error at           │ │
Dec 08 11:20: │ │                   0x7f2842a66550>                                        │ │
Dec 08 11:20: │ │ on_good_message = 'running normally'                                     │ │
Dec 08 11:20: │ │         payload = {'method': 'getblockcount', 'id': 75371}               │ │
Dec 08 11:20: │ │       processor = <function Daemon._send_single.<locals>.processor at    │ │
Dec 08 11:20: │ │                   0x7f2842a668b0>                                        │ │
Dec 08 11:20: │ │           retry = 0.5                                                    │ │
Dec 08 11:20: │ │            self = <electrumx.server.daemon.Daemon object at              │ │
Dec 08 11:20: │ │                   0x7f2859f8b700>                                        │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib/python3.9/site-packages/electrumx/server/daemon.p │
Dec 08 11:20: │ y:123 in _send_data                                                          │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   120 │   │   │   async with self.session.post(self.current_url(), data=data │
Dec 08 11:20: │   121 │   │   │   │   kind = resp.headers.get('Content-Type', None)          │
Dec 08 11:20: │   122 │   │   │   │   if kind == 'application/json':                         │
Dec 08 11:20: │ ❱ 123 │   │   │   │   │   return await resp.json(loads=json_deserialize)     │
Dec 08 11:20: │   124 │   │   │   │   text = await resp.text()                               │
Dec 08 11:20: │   125 │   │   │   │   text = text.strip() or resp.reason                     │
Dec 08 11:20: │   126 │   │   │   │   raise ServiceRefusedError(text)                        │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ data = '{"method": "getblockcount", "id": 75371}'                        │ │
Dec 08 11:20: │ │ kind = 'application/json'                                                │ │
Dec 08 11:20: │ │ resp = <ClientResponse(http://192.168.0.2:9332/) [200 OK]>               │ │
Dec 08 11:20: │ │        <CIMultiDictProxy('Content-Type': 'application/json', 'Date':     │ │
Dec 08 11:20: │ │        'Fri, 08 Dec 2023 10:20:23 GMT', 'Content-Length': '660')>        │ │
Dec 08 11:20: │ │ self = <electrumx.server.daemon.Daemon object at 0x7f2859f8b700>         │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib64/python3.9/site-packages/aiohttp/client_reqrep.p │
Dec 08 11:20: │ y:1161 in json                                                               │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   1158 │   ) -> Any:                                                         │
Dec 08 11:20: │   1159 │   │   """Read and decodes JSON response."""                         │
Dec 08 11:20: │   1160 │   │   if self._body is None:                                        │
Dec 08 11:20: │ ❱ 1161 │   │   │   await self.read()                                         │
Dec 08 11:20: │   1162 │   │                                                                 │
Dec 08 11:20: │   1163 │   │   if content_type:                                              │
Dec 08 11:20: │   1164 │   │   │   ctype = self.headers.get(hdrs.CONTENT_TYPE, "").lower()   │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ content_type = 'application/json'                                        │ │
Dec 08 11:20: │ │     encoding = None                                                      │ │
Dec 08 11:20: │ │        loads = <function loads at 0x7f285aa94ca0>                        │ │
Dec 08 11:20: │ │         self = <ClientResponse(http://192.168.0.2:9332/) [200 OK]>       │ │
Dec 08 11:20: │ │                <CIMultiDictProxy('Content-Type': 'application/json',     │ │
Dec 08 11:20: │ │                'Date': 'Fri, 08 Dec 2023 10:20:23 GMT',                  │ │
Dec 08 11:20: │ │                'Content-Length': '660')>                                 │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib64/python3.9/site-packages/aiohttp/client_reqrep.p │
Dec 08 11:20: │ y:1114 in read                                                               │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   1111 │   │   elif self._released:  # Response explicitly released          │
Dec 08 11:20: │   1112 │   │   │   raise ClientConnectionError("Connection closed")          │
Dec 08 11:20: │   1113 │   │                                                                 │
Dec 08 11:20: │ ❱ 1114 │   │   await self._wait_released()  # Underlying connection released │
Dec 08 11:20: │   1115 │   │   return self._body  # type: ignore[no-any-return]              │
Dec 08 11:20: │   1116 │                                                                     │
Dec 08 11:20: │   1117 │   def get_encoding(self) -> str:                                    │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ self = <ClientResponse(http://192.168.0.2:9332/) [200 OK]>               │ │
Dec 08 11:20: │ │        <CIMultiDictProxy('Content-Type': 'application/json', 'Date':     │ │
Dec 08 11:20: │ │        'Fri, 08 Dec 2023 10:20:23 GMT', 'Content-Length': '660')>        │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ /opt/electrumx/env/lib64/python3.9/site-packages/aiohttp/client_reqrep.p │
Dec 08 11:20: │ y:1080 in _wait_released                                                     │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │   1077 │                                                                     │
Dec 08 11:20: │   1078 │   async def _wait_released(self) -> None:                           │
Dec 08 11:20: │   1079 │   │   if self._writer is not None:                                  │
Dec 08 11:20: │ ❱ 1080 │   │   │   await self._writer                                        │
Dec 08 11:20: │   1081 │   │   self._release_connection()                                    │
Dec 08 11:20: │   1082 │                                                                     │
Dec 08 11:20: │   1083 │   def _cleanup_writer(self) -> None:                                │
Dec 08 11:20: │                                                                              │
Dec 08 11:20: │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
Dec 08 11:20: │ │ self = <ClientResponse(http://192.168.0.2:9332/) [200 OK]>               │ │
Dec 08 11:20: │ │        <CIMultiDictProxy('Content-Type': 'application/json', 'Date':     │ │
Dec 08 11:20: │ │        'Fri, 08 Dec 2023 10:20:23 GMT', 'Content-Length': '660')>        │ │
Dec 08 11:20: │ ╰──────────────────────────────────────────────────────────────────────────╯ │
Dec 08 11:20: ╰──────────────────────────────────────────────────────────────────────────────╯
Dec 08 11:20: CancelledError
Dec 08 11:20: INFO:Prefetcher:cancelled; prefetcher stopping

@SomberNight
Copy link
Member

SomberNight commented Dec 13, 2023

@biatwc in your log, there is:

  1. INFO:Prefetcher:ignoring daemon error: {'code': -32700, 'message': 'Parse error'}
  2. ERROR:Daemon:daemon service refused
  3. and then there is a CancelledError

The first two are handled properly I think, they are just logged and then we retry. I am not sure where the cancellation originates from -- that is what ultimately kills the prefetcher.
Some more log lines before/after might help.

@biatwc
Copy link

biatwc commented Dec 20, 2023

I've looked at the logs around the time of the CancelledError on a number of occasions and there is nothing other than the usual INFO level INFO:ElectrumX:[26054] TCP [::1]:46343, 676 total. No errors, nothing out of the ordinary. The various tunnels I've tried over time remain up and traffic flows fine.
What does CancelledError actually mean in this context - the RPC daemon tears down all tcp sessions from the electrumx node? Can electrumx catch CancalledError and retry, or mark the daemon as temporarily unavailable and retry occasionally? Would that help flush out what is going on?
Sorry I can't be more help.

@SomberNight
Copy link
Member

How often does this bug happen for you? Is it only the prefetcher stopping, is the process otherwise still running and serving sessions? I guess you don't have a reliable way to reproduce, do you? :P

What does CancelledError actually mean in this context - the RPC daemon tears down all tcp sessions from the electrumx node? Can electrumx catch CancalledError and retry, or mark the daemon as temporarily unavailable and retry occasionally? Would that help flush out what is going on?

It can be a lot of things. For example, if you intentionally shut down electrumx in a clean way, that too will happen via cancellations (explicit calls to task.cancel and multiple CancelledErrors propagating out and caught at some locations), so we can't just catch CancelledError and retry :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants