
Sporadic segfaults in a python module when using ApiClient to query a Tapo P110 #228

Open
gsaviane opened this issue Jun 12, 2024 · 8 comments


@gsaviane

gsaviane commented Jun 12, 2024

I get random segfaults when executing a Python module that uses ApiClient to query a P110 device. Unfortunately, the Python stack frame is not available when the process receives the signal, but I was able to capture a core dump, which I analyzed with gdb.
This is what it says:

0 __pthread_kill_implementation (threadid=548354912640, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
1 0x0000007faeab0a64 in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
2 0x0000007faea6a76c in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
3 <signal handler called>
4 0x0000000000490d30 in ?? ()
5 0x0000007fad41f054 in pyo3::types::any::PyAny::call_method::hacc5388e8a698dd3 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
6 0x0000007fad41cd00 in pyo3_asyncio::call_soon_threadsafe::h189597b182986bc9 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
7 0x0000007fad41b164 in pyo3_asyncio::generic::set_result::h943446c3e13924f1 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
8 0x0000007fad2d5e08 in <pyo3_asyncio::tokio::TokioRuntime as pyo3_asyncio::generic::Runtime>::spawn::{{closure}}::h4499f884fd2416d9 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
9 0x0000007fad2c7184 in tokio::runtime::task::core::Core<T,S>::poll::h6f19112f45830e8f ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
10 0x0000007fad2827f8 in tokio::runtime::task::harness::Harness<T,S>::poll::h75be9602525f0e15 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
11 0x0000007fad4314a8 in tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h75332304a442adb5 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
12 0x0000007fad4306f0 in tokio::runtime::scheduler::multi_thread::worker::Context::run::ha0d088c158a7571f ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
13 0x0000007fad42e714 in tokio::runtime::context::set_scheduler::hb924c7a4ab654997 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
14 0x0000007fad429174 in tokio::runtime::context::runtime::enter_runtime::h2dc922c95f430ff4 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
15 0x0000007fad430558 in tokio::runtime::scheduler::multi_thread::worker::run::hcd92fda4015a4913 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
16 0x0000007fad424638 in <tokio::runtime::blocking::task::BlockingTask as core::future::future::Future>::poll::hf97cccf4b76a37d5 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
17 0x0000007fad42b478 in tokio::runtime::task::core::Core<T,S>::poll::h66ccbdb9dab0dc88 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
18 0x0000007fad42b9b8 in tokio::runtime::task::harness::Harness<T,S>::poll::hc66c5f1784947dac ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
19 0x0000007fad424884 in std::sys_common::backtrace::__rust_begin_short_backtrace::h51368bf6bad8d526 ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
20 0x0000007fad436388 in core::ops::function::FnOnce::call_once{{vtable.shim}}::h5d7b156d45e1195b ()
from /root/.venv/lib/python3.11/site-packages/tapo/tapo.cpython-311-aarch64-linux-gnu.so
21 0x0000007fad7b6d30 in alloc::boxed::{impl#47}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> ()
at library/alloc/src/boxed.rs:2020
22 alloc::boxed::{impl#47}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> ()
at library/alloc/src/boxed.rs:2020
23 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:108
24 0x0000007faeaaee58 in start_thread (arg=0x7fe90c4f57) at ./nptl/pthread_create.c:442
25 0x0000007faeb17f9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

I can provide the dump file if needed.
Seen on versions 0.2.1 and 0.3.0.

@mihai-dinculescu
Owner

Thank you for raising the issue. Debugging this is going to be fun :)
Are you able to isolate the Python code that's causing the issue and share it?

@gsaviane
Author

gsaviane commented Jun 13, 2024


Unfortunately, stderr only receives this from the dying process:

Thread 0x0000007fa31164c0 (most recent call first):
(no Python frame)
Fatal Python error: Segmentation fault

The code causing the segfault is attached here [removed]

@gsaviane
Author

OK, I might have found the cause. This little piece of Python code runs as a Telegraf input plugin that collects data from the Tapo devices and forwards it to an MQTT topic. The plugin was set to run every 5 seconds, with a 4-second timeout. Normally the plugin completes in under 1 second, but on some occasions (network lag, device not ready) it exceeds the timeout, and Telegraf terminates it with a SIGTERM. Simply increasing the timeout stops the crashes, and the problem is reproducible: if you have a Tapo P1XX device, run the script as a normal Python program and send it a SIGTERM before it exits (you will probably need a sleep() to widen the window). The library may lack graceful disposal of the threads created by tokio upon a SIGTERM.
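The Telegraf scenario described above (run the script, send SIGTERM if it overruns its timeout) can be emulated without Telegraf. A minimal sketch, assuming a POSIX system: `run_with_timeout` and `stress` are hypothetical helpers, not part of tapo or Telegraf, and the 0.5 s timeout is an arbitrary choice meant to make the SIGTERM land mid-request.

```python
import signal
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float, grace: float = 5.0) -> int:
    """Run `cmd`; if it is still running after `timeout` seconds, send SIGTERM.

    Mimics a scheduler (such as Telegraf) that terminates an overrunning input
    plugin. If the child ignores SIGTERM for `grace` seconds, it is killed.
    Returns the exit code; on POSIX a negative value -N means "killed by
    signal N" (e.g. -11 for SIGSEGV, -15 for SIGTERM).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL as a last resort, e.g. for a hung run
            return proc.wait()


def stress(script: str, runs: int = 100, timeout: float = 0.5) -> list[tuple[int, int]]:
    """Hypothetical helper: run `script` repeatedly under a short timeout and
    collect (attempt, exit_code) for every run that neither exited cleanly
    nor died from the SIGTERM we sent."""
    abnormal = []
    for attempt in range(runs):
        code = run_with_timeout([sys.executable, script], timeout)
        if code not in (0, -signal.SIGTERM):
            abnormal.append((attempt, code))
    return abnormal
```

For example, `stress("/usr/local/bin/tapo-telegraf-multi-async.py")` would list the runs that ended abnormally; a segfault re-raised by Python's fault handler would typically show up as exit code -11. If the script installs its own SIGTERM handler and exits with a non-zero status, those runs are reported too.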

@mihai-dinculescu
Owner

Thank you for the update.

I was not able to replicate the issue without TaskGroups. Did you?
Have you tried handling the SIGTERM on the Python side and cancelling the TaskGroups?

PS: I removed the zip file you've attached because it looked like it contained your Tapo password. You might want to change it.

@gsaviane
Author

gsaviane commented Jun 15, 2024 via email

@gsaviane
Author

Hi, I tried using a SIGTERM handler to cancel the tasks, but it fails in the same way. However, I can now log more details about the error:

Thread 0x0000007f9df514c0 (most recent call first):

Signal 15 received. Cancelling tasks ...
Error unhandled errors in a TaskGroup (1 sub-exception)
Traceback (most recent call last):
File "/usr/local/bin/tapo-telegraf-multi-async.py", line 149, in <module>
asyncio.run(main())
File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/usr/local/bin/tapo-telegraf-multi-async.py", line 143, in main
if task.result() == "":
^^^^^^^^^^^^^
File "/usr/local/bin/tapo-telegraf-multi-async.py", line 109, in gen_influx_str_from_ip
p110 = await init_p110(ip)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/bin/tapo-telegraf-multi-async.py", line 73, in init_p110
p110 = await client.p110(ip)
^^^^^^^^^^^^^^^^^^^^^
Exception: Tapo(Unknown(1003))
Signal 15 received. Cancelling tasks ...
thread 'tokio-runtime-worker' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.3/src/gil.rs:199:21:
assertion left != right failed: The Python interpreter is not initialized and the auto-initialize feature is not enabled.

Consider calling pyo3::prepare_freethreaded_python() before attempting to use Python APIs.
left: 0
right: 0
stack backtrace:
0: rust_begin_unwind
at ./rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at ./rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
4: parking_lot::once::Once::call_once_force::{{closure}}
5: parking_lot::once::Once::call_once_slow
6: pyo3::gil::GILGuard::acquire
7: <pyo3_asyncio::tokio::TokioRuntime as pyo3_asyncio::generic::Runtime>::spawn::{{closure}}
8: tokio::runtime::task::core::Core<T,S>::poll
9: tokio::runtime::task::harness::Harness<T,S>::poll
10: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
11: tokio::runtime::scheduler::multi_thread::worker::Context::run
12: tokio::runtime::context::set_scheduler
13: tokio::runtime::context::runtime::enter_runtime
14: tokio::runtime::scheduler::multi_thread::worker::run
15: <tokio::runtime::blocking::task::BlockingTask as core::future::future::Future>::poll
16: tokio::runtime::task::core::Core<T,S>::poll
Signal 15 received. Cancelling tasks ...
Signal 15 received. Cancelling tasks ...
Fatal Python error: Segmentation fault

@mihai-dinculescu
Owner

Are you able to try out the suggested solution of using the PyO3 auto-initialize feature?

@gsaviane
Author

gsaviane commented Jul 4, 2024

Are you suggesting that I rebuild the Python package with that PyO3 feature enabled? If so, I need some guidance.
By the way, I upgraded the tapo Python package to 0.3.1, and now my script occasionally hangs and requires a SIGTERM to exit. I had to revert to 0.3.0.
