
program deadlocks with ofi_uffd_handler() #5580

Closed
krehm opened this issue Jan 28, 2020 · 5 comments

krehm commented Jan 28, 2020

Looking for some advice. I am running daos_test on CentOS 7 with 'verbs;ofi_rxm' and have set RDMAV_HUGEPAGES_SAFE=1. The program makes several calls to ucm_set_event_handler() from openmpi3: two from mca_pml_ucx_open() in mca_pml_ucx.so and two from component_init() in mca_osc_ucx.so. I don't actually know what the handlers are, as I haven't found the code yet, and I don't know whether this is even important.
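
For reference, a minimal sketch (not the actual Open MPI code; the handler name and body are hypothetical) of what registering such a handler typically looks like with UCX's ucm API:

#include <stddef.h>
#include <ucm/api/ucm.h>

static size_t example_unmap_events;

/* Runs from inside the intercepted munmap()/free() path, so it must not
 * call back into the allocator; here it only counts the events. */
static void example_vm_unmapped_cb(ucm_event_type_t type, ucm_event_t *event,
                                   void *arg)
{
    (void)arg;
    if (type == UCM_EVENT_VM_UNMAPPED && event->vm_unmapped.size > 0)
        __sync_fetch_and_add(&example_unmap_events, 1);
}

static int install_example_handler(void)
{
    /* arguments: event mask, priority, callback, callback argument */
    ucs_status_t status = ucm_set_event_handler(UCM_EVENT_VM_UNMAPPED, 0,
                                                example_vm_unmapped_cb, NULL);
    return (status == UCS_OK) ? 0 : -1;
}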

daos_test runs for a while, then hangs every time. Two threads are deadlocked against each other.

The first thread is the following; note that the call comes from ucm_dlfree(). The daos_test code is just calling free() here, which results in a call to ucm_dlfree().

Thread 3 (Thread 0x7f22807bc300 (LWP 2614)):
#0 0x00007f227eb63ba9 in syscall () from /lib64/libc.so.6
#1 0x00007f227d47cd52 in intercept_munmap ()
from /usr/lib64/openmpi3/lib/libopen-pal.so.40
#2 0x00007f226802c732 in ucm_event_call_orig () from /lib64/libucm.so.0
#3 0x00007f226802c642 in ucm_event_dispatch () from /lib64/libucm.so.0
#4 0x00007f226802c997 in ucm_munmap () from /lib64/libucm.so.0
#5 0x00007f22680340db in ucm_dlfree () from /lib64/libucm.so.0
#6 0x000000000043a5ff in array_simple (state=)
at src/tests/suite/daos_obj_array.c:182
#7 0x00007f227f059da2 in cmocka_run_one_test_or_fixture ()
from /lib64/libcmocka.so.0
#8 0x00007f227f05a7a5 in _cmocka_run_group_tests () from /lib64/libcmocka.so.0
#9 0x000000000043b73d in run_daos_obj_array_test (rank=rank@entry=0,
size=size@entry=1) at src/tests/suite/daos_obj_array.c:990
#10 0x0000000000406e64 in run_specified_tests (sub_tests_size=0,
sub_tests=0x0, size=1, rank=0, tests=0x7ffcfa5333d8 "ADKCoROdrFNv")
at src/tests/suite/daos_test.c:162
#11 main (argc=1, argv=0x7ffcfa533538) at src/tests/suite/daos_test.c:470

Performing a cat on /proc/<pid>/stack, I see something like this for the above thread:

[] userfaultfd_event_wait_completion+0xf5/0x220
[] userfaultfd_unmap_complete+0x86/0xd0
[] vm_munmap+0x7b/0xb0
[] SyS_munmap+0x22/0x30
[] system_call_fastpath+0x25/0x2a
[] 0xffffffffffffffff

Here is the second thread. This is the libfabric ofi_uffd_handler() function, which I believe is processing the event from the thread above. It calls ibv_dereg_mr(), which calls mlx4_dereg_mr(), which calls ucm_dlfree(). That deadlocks because the spinlock is already held by the thread above.

Thread 2 (Thread 0x7f22658fe700 (LWP 2670)):
#0 0x00007f227eb4e727 in sched_yield () from /lib64/libc.so.6
#1 0x00007f226803095e in spin_acquire_lock () from /lib64/libucm.so.0
#2 0x00007f2268033f7f in ucm_dlfree () from /lib64/libucm.so.0
#3 0x00007f226bbd26dd in mlx4_dereg_mr ()
from /usr/lib64/libibverbs/libmlx4-rdmav22.so
#4 0x00007f227b85d7b3 in ibv_dereg_mr () from /lib64/libibverbs.so.1
#5 0x00007f227c12575c in fi_ibv_mr_cache_delete_region (cache=0x273b5f8,
entry=0x266c950) at prov/verbs/src/verbs_mr.c:229
#6 0x00007f227c0e6b60 in util_mr_free_entry (cache=0x273b5f8, entry=0x266c950)
at prov/util/src/util_mr_cache.c:81
#7 0x00007f227c0e6c1c in util_mr_uncache_entry (cache=0x273b5f8,
entry=0x266c950) at prov/util/src/util_mr_cache.c:106
#8 0x00007f227c0e6cdd in ofi_mr_cache_notify (cache=0x273b5f8,
addr=0x7f2261ac6000, len=10489856) at prov/util/src/util_mr_cache.c:125
#9 0x00007f227c0e4341 in ofi_monitor_notify (monitor=0x7f227c46e840 ,
addr=0x7f2261ac6000, len=10489856) at prov/util/src/util_mem_monitor.c:189
#10 0x00007f227c0e45ee in ofi_uffd_handler (arg=0x7f227c46e840 )
at prov/util/src/util_mem_monitor.c:256
#11 0x00007f227ee40e65 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f227eb6988d in clone () from /lib64/libc.so.6
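
For context, here is a rough sketch (not libfabric's actual code; helper names are illustrative) of the kind of userfaultfd event loop that ofi_uffd_handler runs. The key property behind the hang is that the kernel blocks the thread doing the munmap() in userfaultfd_event_wait_completion() until this loop has read the UFFD_EVENT_UNMAP message, so if handling that message needs a lock the munmap() caller already holds, the two threads wait on each other forever:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The real handler would flush cached registrations for this range here
 * (ofi_monitor_notify -> util_mr_uncache_entry -> ibv_dereg_mr). */
static void example_handle_unmapped(uint64_t start, uint64_t end)
{
    (void)start; (void)end;
}

static int example_uffd_open(void)
{
    struct uffdio_api api = {
        .api      = UFFD_API,
        .features = UFFD_FEATURE_EVENT_UNMAP | UFFD_FEATURE_EVENT_REMOVE |
                    UFFD_FEATURE_EVENT_REMAP,
    };
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);

    if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0)
        return -1;
    return uffd;
}

/* Thread body: the munmap() in the other thread cannot return until this
 * read() consumes the corresponding UFFD_EVENT_UNMAP message. */
static void *example_uffd_event_loop(void *arg)
{
    int uffd = *(int *)arg;
    struct uffd_msg msg;

    while (read(uffd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg)) {
        if (msg.event == UFFD_EVENT_UNMAP || msg.event == UFFD_EVENT_REMOVE)
            example_handle_unmapped(msg.arg.remove.start, msg.arg.remove.end);
    }
    return NULL;
}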

The two threads are now deadlocked against each other. Ideas?

shefty commented Jan 28, 2020

Can you please retest with the tip of master? Specifically, there's a series of changes ending with commit 3d01df7, added about a week ago, that should address this issue.

krehm commented Jan 28, 2020

That does seem to fix the problem. Doesn't this open a tiny window in which the NIC could write to the memory that has just been unmapped?

You can close this ticket, and thanks for the quick response!

shefty commented Jan 28, 2020

It doesn't open a window that didn't already exist. A peer application could always target a region that the local process has freed but which still holds a registration. Calling free on memory doesn't necessarily result in that memory being unmapped; that depends on the behavior of the memory manager. The unmap could happen much later than the free, and from another thread.
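
A small glibc-specific illustration of that point (run it under strace -e trace=mmap,munmap,brk to watch; the sizes assume glibc's default mmap threshold):

#include <malloc.h>
#include <stdlib.h>

int main(void)
{
    /* 64 MiB is above glibc's mmap threshold, so malloc() services it with
     * mmap() and free() unmaps it immediately. */
    void *big = malloc(64UL << 20);
    free(big);                      /* munmap() shows up in strace here */

    /* 64 KiB comes from the heap; free() just marks it reusable and the
     * pages stay mapped (and stay registered, if an MR cache holds them). */
    void *small = malloc(64UL << 10);
    free(small);                    /* no munmap() */

    /* The allocator may give heap pages back to the kernel much later,
     * e.g. when explicitly asked to trim. */
    malloc_trim(0);
    return 0;
}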

Because we hold a registration on the pages, they cannot be passed to another process, so we shouldn't leak data or corrupt another process's space. If the peer cannot be trusted to avoid writing into memory that it was given RDMA access to outside of an accessible window, then caching would need to be disabled completely.
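
If caching does have to be turned off, libfabric exposes runtime knobs for that (see fi_mr(3)). A minimal, illustrative way to do it from inside the application, equivalent to exporting the variable in the shell that launches daos_test:

#include <stdlib.h>

/* Illustrative only: FI_MR_CACHE_MAX_COUNT=0 disables the memory
 * registration cache entirely; it must be set before libfabric reads its
 * environment (i.e. before the first fi_getinfo()/fi_fabric() call). */
static void example_disable_mr_cache(void)
{
    setenv("FI_MR_CACHE_MAX_COUNT", "0", 1);
}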

krehm commented Jan 29, 2020

I was more concerned about FI_CONTEXT/FI_CONTEXT2, where the NIC is allowed to scribble at the front of an allocated context. I'm not sure whether scribbling could happen after the unmap but before the NIC is notified that the memory is no longer registered. But the fi_getinfo man page seems to imply that the NIC can only use those bytes while an operation is in progress, so maybe there is no risk here.
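
The usage pattern I have in mind looks roughly like this (hypothetical request type, not DAOS code): with the FI_CONTEXT mode bit, the provider may write into the struct fi_context at the front of the request until the completion is retrieved, so freeing (and eventually unmapping) the request before that point is the case I was worried about.

#include <rdma/fabric.h>
#include <rdma/fi_endpoint.h>

struct example_request {
    struct fi_context fi_ctx;   /* provider-owned scratch space (FI_CONTEXT) */
    void             *user_arg; /* application state for this operation */
};

static ssize_t example_post_send(struct fid_ep *ep, const void *buf,
                                 size_t len, void *desc, fi_addr_t dest,
                                 struct example_request *req)
{
    /* req, and therefore req->fi_ctx, must stay allocated until the
     * matching completion has been read from the CQ. */
    return fi_send(ep, buf, len, desc, dest, &req->fi_ctx);
}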

shefty commented Jan 29, 2020

Correct, the NIC should not modify FI_CONTEXT after generating a completion. That's separate from MR caching, though. Attempting to modify FI_CONTEXT outside of the operation being in progress could definitely corrupt memory.

shefty closed this as completed Jan 29, 2020