
prov/verbs: Add missing lock to protect SRX
When compiled with --enable-debug, the test fi_shared_ctx fails with
the following assertion:
fi_shared_ctx: prov/verbs/src/verbs_ofi.h:993: vrb_alloc_ctx: Assertion `ofi_genlock_held(progress->active_lock)' failed.

The associated stack trace:
6  0x00007ffff7a39e96 in __GI___assert_fail (assertion=0x7ffff7f2ec28 "ofi_genlock_held(progress->active_lock)", file=0x7ffff7f2ec08 "prov/verbs/src/verbs_ofi.h", line=993, function=0x7ffff7f2f338 <__PRETTY_FUNCTION__.45> "vrb_alloc_ctx") at ./assert/assert.c:101
7  0x00007ffff7e06b9b in vrb_alloc_ctx (progress=0x5555555ca1a0) at prov/verbs/src/verbs_ofi.h:993
8  0x00007ffff7e0b9fe in vrb_post_srq (srx=0x5555555ca7c0, wr=0x7fffffffe120) at prov/verbs/src/verbs_ep.c:1603
9  0x00007ffff7e0bd2d in vrb_srx_recv (ep_fid=0x5555555ca7c0, buf=0x0, len=1024, desc=0x0, src_addr=18446744073709551615, context=0x55555556b740 <rx_ctx>) at prov/verbs/src/verbs_ep.c:1649
10 0x000055555555d720 in fi_recv (context=0x55555556b740 <rx_ctx>, src_addr=<optimized out>, desc=0x0, len=1024, buf=0x0, ep=0x5555555ca7c0) at /home/sdidelot/libfabric/include/rdma/fi_endpoint.h:297
11 ft_post_rx_buf (ep=0x5555555ca7c0, size=1024, ctx=0x55555556b740 <rx_ctx>, op_buf=0x0, op_mr_desc=0x0, op_tag=0) at common/shared.c:2392
12 0x000055555555d937 in ft_post_rx (ep=<optimized out>, size=<optimized out>, ctx=<optimized out>) at common/shared.c:2400
13 0x00005555555576cf in server_connect () at functional/shared_ctx.c:502
14 run () at functional/shared_ctx.c:547
15 main (argc=<optimized out>, argv=<optimized out>) at functional/shared_ctx.c:629

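The problem is that vrb_post_srq() does not acquire progress->active_lock
before calling vrb_alloc_ctx(), which may result in a race condition
if multiple threads concurrently access the same SRX. This commit takes
the lock around the context allocation and the post, and releases it on
every exit path through a single unlock label.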

Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
sydidelot committed Nov 15, 2023
1 parent 0087703 commit c4bfd92
Showing 1 changed file with 8 additions and 2 deletions.
prov/verbs/src/verbs_ep.c: 10 changes (8 additions, 2 deletions)
File mode: 100644 → 100755
@@ -1533,9 +1533,12 @@ ssize_t vrb_post_srq(struct vrb_srx *srx, struct ibv_recv_wr *wr)
 	struct ibv_recv_wr *bad_wr;
 	int ret;
 
+	ofi_genlock_lock(vrb_srx2_progress(srx)->active_lock);
 	ctx = vrb_alloc_ctx(vrb_srx2_progress(srx));
-	if (!ctx)
-		return -FI_EAGAIN;
+	if (!ctx) {
+		ret = -FI_EAGAIN;
+		goto unlock;
+	}
 
 	ctx->srx = srx;
 	ctx->user_ctx = (void *) (uintptr_t) wr->wr_id;
@@ -1549,6 +1552,9 @@ ssize_t vrb_post_srq(struct vrb_srx *srx, struct ibv_recv_wr *wr)
 		vrb_free_ctx(vrb_srx2_progress(srx), ctx);
 		ret = FI_EAGAIN;
 	}
+
+unlock:
+	ofi_genlock_unlock(vrb_srx2_progress(srx)->active_lock);
 	return ret;
 }
 
