
fi_rdm_tagged_peek failures on occasional CI runs #8249

Closed
shefty opened this issue Nov 17, 2022 · 5 comments

@shefty

shefty commented Nov 17, 2022

Intel CI failure:

name:   fi_rdm_tagged_peek -p "net"
11:39:38    timestamp: 20221116-193938+0000
11:39:38    result: Fail
11:39:38    time:   2
11:39:38    server_cmd: /home/cstbuild/ofi-Install/libfabric/ofi_libfabric/PR-8246/1/dbg/bin/fi_rdm_tagged_peek -p "net"   -s ci5-eth2
11:39:38    server_stdout: |
11:39:38      Sending five tagged messages
11:39:38      Waiting for messages to complete
11:39:38    client_cmd: /home/cstbuild/ofi-Install/libfabric/ofi_libfabric/PR-8246/1/dbg/bin/fi_rdm_tagged_peek -p "net"   -s ci6-eth2 ci5-eth2
11:39:38    client_stdout: |
11:39:38      fi_rdm_tagged_peek: prov/util/src/util_buf.c:256: ofi_bufpool_destroy: Assertion `(pool->attr.flags & OFI_BUFPOOL_NO_TRACK) || !ofi_atomic_get32(&buf_region->use_cnt)' failed.

AppVeyor failure:

name:   rdm_tagged_peek -p sockets
  timestamp: Wed 11/16/2022 21:52:28.13
  result: Fail
  time:   183
  server_cmd: C:\projects\libfabric\fabtests\x64\Release-v141\rdm_tagged_peek -p sockets  -s 127.0.0.1
  server_stdout:
  client_cmd: C:\projects\libfabric\fabtests\x64\Release-v141\rdm_tagged_peek -p sockets  -s 127.0.0.1 127.0.0.1
  client_stdout

Maybe it's a coincidence, but it looks like the rdm_tagged_peek test might have a race. I've seen both failures a couple of times on PRs.

shefty added the bug label on Nov 17, 2022
@aingerson

@shefty Adding the backtrace from the net provider assertion:

12:36:00 fi_rdm_tagged_peek: prov/util/src/util_buf.c:256: ofi_bufpool_destroy: Assertion `(pool->attr.flags & OFI_BUFPOOL_NO_TRACK) || !ofi_atomic_get32(&buf_region->use_cnt)' failed.
12:36:00
12:36:00 Program received signal SIGABRT, Aborted.
12:36:00 0x00007ffff670137f in raise () from /lib64/libc.so.6
12:36:00 #0 0x00007ffff670137f in raise () from /lib64/libc.so.6
12:36:00 #1 0x00007ffff66ebdb5 in abort () from /lib64/libc.so.6
12:36:00 #2 0x00007ffff66ebc89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
12:36:00 #3 0x00007ffff66f9a76 in __assert_fail () from /lib64/libc.so.6
12:36:00 #4 0x00007ffff796fccb in ofi_bufpool_destroy (pool=0x63a750) at prov/util/src/util_buf.c:255
12:36:00 #5 0x00007ffff7a42b4b in xnet_close_progress (progress=0x63a4d8) at prov/net/src/xnet_progress.c:1367
12:36:00 #6 0x00007ffff7a2e151 in xnet_domain_close (fid=0x63a3e0) at prov/net/src/xnet_domain.c:178
12:36:00 #7 0x00000000004028f1 in fi_close (fid=0x63a3e0) at /home/cstbuild/ofi-Install/libfabric/aingerson_main/ci/70/dbg/include/rdma/fabric.h:621
12:36:00 #8 0x0000000000409181 in ft_close_fids () at common/shared.c:1642
12:36:00 #9 0x00000000004095db in ft_free_res () at common/shared.c:1695
12:36:00 #10 0x00000000004028c4 in main (argc=6, argv=0x7fffffffe858) at functional/rdm_tagged_peek.c:269
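
For context on what the assertion checks: ofi_bufpool_destroy requires that, unless the pool was created with OFI_BUFPOOL_NO_TRACK, no buffer from a region is still in use (use_cnt is zero). Frames #4 to #6 show that check firing from xnet_close_progress while fi_close was tearing down the domain, so the net provider still had at least one buffer outstanding at that point. Below is a minimal C sketch of that invariant. All sketch_* names are hypothetical and chosen for illustration; this is not libfabric's util_buf.c, which uses atomic counters and many regions per pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not libfabric's API. */
#define SKETCH_POOL_NO_TRACK 0x1u

struct sketch_region {
	int use_cnt; /* buffers handed out and not yet returned (libfabric uses an atomic) */
};

struct sketch_pool {
	unsigned int flags;
	struct sketch_region region; /* a real pool manages many regions */
};

/* Hand out a buffer and count it against the region. */
static void *sketch_alloc(struct sketch_pool *pool)
{
	void *buf = malloc(64);

	if (buf)
		pool->region.use_cnt++;
	return buf;
}

/* Return a buffer and drop the region's count. */
static void sketch_free(struct sketch_pool *pool, void *buf)
{
	free(buf);
	pool->region.use_cnt--;
}

/* Mirrors the asserted condition: tracking is off, or nothing is outstanding. */
static bool sketch_pool_can_destroy(const struct sketch_pool *pool)
{
	return (pool->flags & SKETCH_POOL_NO_TRACK) || pool->region.use_cnt == 0;
}

/* Destroying a tracked pool with buffers still in use aborts, as in the CI log. */
static void sketch_pool_destroy(struct sketch_pool *pool)
{
	assert(sketch_pool_can_destroy(pool));
}
```

In the sketch, calling sketch_pool_destroy between sketch_alloc and the matching sketch_free trips the assert, which is the same shape of failure as the SIGABRT above. The backtrace alone does not say which buffer was still held; per the last comment in this thread, the root cause turned out to be in the test itself.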

@shefty

shefty commented Dec 6, 2022

I believe I have a fix for the net provider failure. The socket issue is a separate problem.

@ooststep

ooststep commented Dec 6, 2022

> I believe I have a fix for the net provider failure. The socket issue is a separate problem.

Is that #8306?

@shefty

shefty commented Dec 6, 2022

No, the fix is in my main branch and not yet in a PR.

@shefty

shefty commented Apr 28, 2023

The problem was recently fixed. The issue was in the test itself.

shefty closed this as completed on Apr 28, 2023