remove FIXME comments #2

tromey · 2016-04-14T16:00:10Z

Upstream doesn't allow new FIXME comments. Make sure all the new ones are addressed or removed.

immediate_quit used to be necessary back when prompt_for_continue used blocking fread, but nowadays it uses gdb_readline_wrapper, which is implemented in terms of a nested event loop, which already knows how to react to SIGINT: #0 throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:324 #1 0x00000000007bab5d in throw_vquit (fmt=0x9d6d7e "Quit", ap=0x7fffffffcb88) at .../src/gdb/common/common-exceptions.c:366 #2 0x00000000007bac9f in throw_quit (fmt=0x9d6d7e "Quit") at .../src/gdb/common/common-exceptions.c:385 #3 0x0000000000773a2d in quit () at .../src/gdb/utils.c:1039 #4 0x000000000065d81b in async_request_quit (arg=0x0) at .../src/gdb/event-top.c:893 #5 0x000000000065c27b in invoke_async_signal_handlers () at .../src/gdb/event-loop.c:949 #6 0x000000000065aeef in gdb_do_one_event () at .../src/gdb/event-loop.c:280 #7 0x0000000000770838 in gdb_readline_wrapper (prompt=0x7fffffffcd40 "---Type <return> to continue, or q <return> to quit---") at .../src/gdb/top.c:873 The need for the QUIT in stdin_event_handler is then exposed by the gdb.base/double-prompt-target-event-error.exp test, which has: # We're now stopped in a pagination query while handling a # target event (printing where the program stopped). Quitting # the pagination should result in only one prompt being # output. send_gdb "\003p 1\n" Without that change we'd get: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CpQuit (gdb) 1 Undefined command: "1". Try "help". (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt ERROR: Undefined command "". UNRESOLVED: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt Vs: Continuing. ---Type <return> to continue, or q <return> to quit---PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: continue to pagination ^CQuit (gdb) p 1 $1 = 1 (gdb) PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: first prompt PASS: gdb.base/double-prompt-target-event-error.exp: ctrlc target event: continue: no double prompt gdb/ChangeLog: 2016-04-12 Pedro Alves <palves@redhat.com> * event-top.c (stdin_event_handler): Call QUIT; (prompt_for_continue): Don't run with immediate_quit set.

I see the following test fail in arm-linux with -marm and -fomit-frame-pointer, step callee () at /home/yao/SourceCode/gnu/gdb/git/gdb/testsuite/gdb.reverse/step-reverse.c:27 27 } /* RETURN FROM CALLEE */ (gdb) step main () at /home/yao/SourceCode/gnu/gdb/git/gdb/testsuite/gdb.reverse/step-reverse.c:58 58 callee(); /* STEP INTO THIS CALL */ (gdb) FAIL: gdb.reverse/step-precsave.exp: reverse step into fn call As we can see, the "step" has already stepped into the function callee, but in the last line. The second "step" attempts to step to function body, but it goes out of callee, which isn't expected. The program is compiled with -marm and -fomit-frame-pointer, the function callee is prologue-less, because nothing needs to be saved on stack, (gdb) disassemble callee Dump of assembler code for function callee: 0x00010680 <+0>: movw r3, #2364 ; 0x93c 0x00010684 <+4>: movt r3, #2 0x00010688 <+8>: ldr r3, [r3] 0x0001068c <+12>: add r2, r3, #1 0x00010690 <+16>: movw r3, #2364 ; 0x93c 0x00010694 <+20>: movt r3, #2 0x00010698 <+24>: str r2, [r3] 0x0001069c <+28>: mov r3, #0 0x000106a0 <+32>: mov r0, r3 0x000106a4 <+36>: bx lr program stops at the 0x106a0 (passed the epilogue) after the first "step". When second "step" is executed, the stepping range is [0x10680-0x106a0], which starts from the first instruction of function callee (because it doesn't have prologue). infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [LWP 2461] at 0x1069c^M infrun: prepare_to_wait^M infrun: target_wait (-1.0.0, status) =^M infrun: 2461.2461.0 [LWP 2461],^M infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP^M infrun: TARGET_WAITKIND_STOPPED^M infrun: stop_pc = 0x10698^M infrun: stepping inside range [0x10680-0x106a0] When program goes out of the range, it stops at the caller of callee, and test fails. IOW, if function callee has prologue, the stepping range won't start from the first instruction of the function, and program stops at the prologue and test passes. IMO, GDB does nothing wrong, but test shouldn't expect the program stops in callee after the second "step". I decide to fix test rather than GDB. In this patch, I change to test to do one "step", and check the program is still in callee, then, do multiple "step" until program goes out of the callee. gdb/testsuite: 2016-04-22 Yao Qi <yao.qi@linaro.org> * gdb.reverse/step-precsave.exp: Do one step and test program stops in "callee" and do multiple steps until program goes out of "callee". * gdb.reverse/step-reverse.exp: Likewise.

tromey · 2016-05-27T20:49:53Z

Nothing left to do.

Nowadays, read_memory may throw NOT_AVAILABLE_ERROR (it is done by patch http://sourceware.org/ml/gdb-patches/2013-08/msg00625.html) however, read_stack and read_code still throws MEMORY_ERROR only. This causes PR 19947, that is prologue unwinder is unable unwind because code memory isn't available, but MEMORY_ERROR is thrown, while unwinder catches NOT_AVAILABLE_ERROR. #0 memory_error (err=err@entry=TARGET_XFER_E_IO, memaddr=memaddr@entry=140737349781158) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:217 tromey#1 0x000000000065f5ba in read_code (memaddr=memaddr@entry=140737349781158, myaddr=myaddr@entry=0x7fffffffd7b0 "\340\023<\001", len=len@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:288 tromey#2 0x000000000065f7b5 in read_code_unsigned_integer (memaddr=memaddr@entry=140737349781158, len=len@entry=1, byte_order=byte_order@entry=BFD_ENDIAN_LITTLE) at /home/yao/SourceCode/gnu/gdb/git/gdb/corefile.c:363 tromey#3 0x00000000004717e0 in amd64_analyze_prologue (gdbarch=gdbarch@entry=0x13c13e0, pc=140737349781158, current_pc=140737349781165, cache=cache@entry=0xda0cb0) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2267 tromey#4 0x0000000000471f6d in amd64_frame_cache_1 (cache=0xda0cb0, this_frame=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2437 tromey#5 amd64_frame_cache (this_frame=0xda0bf0, this_cache=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2508 tromey#6 0x000000000047214d in amd64_frame_this_id (this_frame=<optimised out>, this_cache=<optimised out>, this_id=0xda0c50) at /home/yao/SourceCode/gnu/gdb/git/gdb/amd64-tdep.c:2541 tromey#7 0x00000000006b94c4 in compute_frame_id (fi=0xda0bf0) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:481 tromey#8 get_prev_frame_if_no_cycle (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1809 tromey#9 0x00000000006bb6c9 in get_prev_frame_always_1 (this_frame=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1983 tromey#10 get_prev_frame_always (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1999 tromey#11 0x00000000006bbe11 in get_prev_frame (this_frame=this_frame@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:2241 tromey#12 0x00000000006bc13c in unwind_to_current_frame (ui_out=<optimised out>, args=args@entry=0xda0b20) at /home/yao/SourceCode/gnu/gdb/git/gdb/frame.c:1485 The fix is to let read_stack and read_code throw NOT_AVAILABLE_ERROR too, in order to align with read_memory. gdb: 2016-05-04 Yao Qi <yao.qi@linaro.org> PR gdb/19947 * corefile.c (read_memory): Rename it to ... (read_memory_object): ... it. Add parameter object. (read_memory): Call read_memory_object. (read_stack): Likewise. (read_code): Likewise.

Nowadays, GDB can't insert breakpoint on the return address of the exception handler on ARM M-profile, because the address is a magic one 0xfffffff9, (gdb) bt #0 CT32B1_IRQHandler () at ../src/timer.c:67 tromey#1 <signal handler called> tromey#2 main () at ../src/timer.c:127 (gdb) info frame Stack level 0, frame at 0x200ffa8: pc = 0x4ec in CT32B1_IRQHandler (../src/timer.c:67); saved pc = 0xfffffff9 called by frame at 0x200ffc8 source language c. Arglist at 0x200ffa0, args: Locals at 0x200ffa0, Previous frame's sp is 0x200ffa8 Saved registers: r7 at 0x200ffa0, lr at 0x200ffa4 (gdb) x/x 0xfffffff9 0xfffffff9: Cannot access memory at address 0xfffffff9 (gdb) finish Run till exit from #0 CT32B1_IRQHandler () at ../src/timer.c:67 Ed:15: Target error from Set break/watch: Et:96: Pseudo-address (0xFFFFFFxx) for EXC_RETURN is invalid (GDB error?) Warning: Cannot insert hardware breakpoint 0. Could not insert hardware breakpoints: You may have requested too many hardware breakpoints/watchpoints. Command aborted. even some debug probe can't set hardware breakpoint on the magic address too, (gdb) hbreak *0xfffffff9 Hardware assisted breakpoint 2 at 0xfffffff9 (gdb) c Continuing. Ed:15: Target error from Set break/watch: Et:96: Pseudo-address (0xFFFFFFxx) for EXC_RETURN is invalid (GDB error?) Warning: Cannot insert hardware breakpoint 2. Could not insert hardware breakpoints: You may have requested too many hardware breakpoints/watchpoints. Command aborted. The problem described above is quite similar to PR 8841, in which GDB can't set breakpoint on signal trampoline, which is mapped to a read-only page by kernel. The rationale of this patch is to skip "unwritable" frames when looking for caller frames in command "finish", and a new gdbarch method code_of_frame_writable is added. This patch fixes the problem on ARM cortex-m target, but it can be used to fix PR 8841 too. gdb: 2016-05-10 Yao Qi <yao.qi@arm.com> * arch-utils.c (default_code_of_frame_writable): New function. * arch-utils.h (default_code_of_frame_writable): Declare. * arm-tdep.c (arm_code_of_frame_writable): New function. (arm_gdbarch_init): Install gdbarch method code_of_frame_writable if the target is M-profile. * frame.c (skip_unwritable_frames): New function. * frame.h (skip_unwritable_frames): Declare. * gdbarch.sh (code_of_frame_writable): New. * gdbarch.c, gdbarch.h: Re-generated. * infcmd.c (finish_command): Call skip_unwritable_frames.

As reported in PR 19998, after type ctrl-c, GDB hang there and does not send interrupt. It causes a fail in gdb.base/interrupt.exp. All targets support remote fileio should be affected. When we type ctrc-c, SIGINT is handled by remote_fileio_sig_set, as shown below, #0 remote_fileio_sig_set (sigint_func=0x4495d0 <remote_fileio_ctrl_c_signal_handler(int)>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:325 tromey#1 0x00000000004495de in remote_fileio_ctrl_c_signal_handler (signo=<optimised out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:349 tromey#2 <signal handler called> tromey#3 0x00007ffff647ed83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81 tromey#4 0x00000000005530ce in interruptible_select (n=10, readfds=readfds@entry=0x7fffffffd730, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x0) at /home/yao/SourceCode/gnu/gdb/git/gdb/event-top.c:1017 tromey#5 0x000000000061ab20 in stdio_file_read (file=<optimised out>, buf=0x12d02e0 "\n\022-\001", length_buf=16383) at /home/yao/SourceCode/gnu/gdb/git/gdb/ui-file.c:577 tromey#6 0x000000000044a4dc in remote_fileio_func_read (buf=0x12c0360 "") at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:583 tromey#7 0x0000000000449598 in do_remote_fileio_request (uiout=<optimised out>, buf_arg=buf_arg@entry=0x12c0340) at /home/yao/SourceCode/gnu/gdb/git/gdb/remote-fileio.c:1179 we don't set quit_serial_event, do { res = gdb_select (n, readfds, writefds, exceptfds, timeout); } while (res == -1 && errno == EINTR); if (res == 1 && FD_ISSET (fd, readfds)) { errno = EINTR; return -1; } return res; we can't go out of the loop above, and that is why GDB can't send interrupt. Recently, we stop throwing exception from SIGINT handler (remote_fileio_ctrl_c_signal_handler) https://sourceware.org/ml/gdb-patches/2016-03/msg00372.html, which is correct, because gdb_select is interruptible. However, in the same patch series, we add interruptible_select later as a wrapper to gdb_select, https://sourceware.org/ml/gdb-patches/2016-03/msg00375.html and it is not interruptible (because of the loop in it) unless select/poll-able file descriptors are marked. This fix in this patch is to call quit_serial_event_set, so that we can go out of the loop above, return -1 and set errno to EINTR. 2016-06-01 Yao Qi <yao.qi@linaro.org> PR remote/19998 * remote-fileio.c (remote_fileio_ctrl_c_signal_handler): Call quit_serial_event_set.

I see the following GDBserver internal error in two cases, gdb/gdbserver/linux-low.c:1922: A problem internal to GDBserver has been detected. unsuspend LWP 17200, suspended=-1 1. step over a breakpoint on fork/vfork syscall instruction, 2. step over a breakpoint on clone syscall instruction and child threads hits a breakpoint, the stack backtrace is #0 internal_error (file=file@entry=0x44c4c0 "gdb/gdbserver/linux-low.c", line=line@entry=1922, fmt=fmt@entry=0x44c7d0 "unsuspend LWP %ld, suspended=%d\n") at gdb/gdbserver/../common/errors.c:51 tromey#1 0x0000000000424014 in lwp_suspended_decr (lwp=<optimised out>, lwp=<optimised out>) at gdb/gdbserver/linux-low.c:1922 tromey#2 0x000000000042403a in unsuspend_one_lwp (entry=<optimised out>, except=0x66e8c0) at gdb/gdbserver/linux-low.c:2885 tromey#3 0x0000000000405f45 in find_inferior (list=<optimised out>, func=func@entry=0x424020 <unsuspend_one_lwp>, arg=arg@entry=0x66e8c0) at gdb/gdbserver/inferiors.c:243 tromey#4 0x00000000004297de in unsuspend_all_lwps (except=0x66e8c0) at gdb/gdbserver/linux-low.c:2895 tromey#5 linux_wait_1 (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, target_options=target_options@entry=0) at gdb/gdbserver/linux-low.c:3632 tromey#6 0x000000000042a764 in linux_wait (ptid=..., ourstatus=0x665ec0 <last_status>, target_options=0) at gdb/gdbserver/linux-low.c:3770 tromey#7 0x0000000000411163 in mywait (ptid=..., ourstatus=ourstatus@entry=0x665ec0 <last_status>, options=options@entry=0, connected_wait=connected_wait@entry=1) at gdb/gdbserver/target.c:214 tromey#8 0x000000000040b1f2 in resume (actions=0x66f800, num_actions=1) at gdb/gdbserver/server.c:2757 tromey#9 0x000000000040f660 in handle_v_cont (own_buf=0x66a630 "vCont;c:p45e9.-1") at gdb/gdbserver/server.c:2719 when GDBserver steps over a thread, other threads have been suspended, the "stepping" thread may create new thread, but GDBserver doesn't set it suspend count to 1. When GDBserver unsuspend threads, the child's suspend count goes to -1, and the assert is triggered. In fact, GDBserver has already taken care of suspend count of new thread when GDBserver is suspending all threads except the one GDBserver wants to step over by https://sourceware.org/ml/gdb-patches/2015-07/msg00946.html + /* If we're suspending all threads, leave this one suspended + too. */ + if (stopping_threads == STOPPING_AND_SUSPENDING_THREADS) + { + if (debug_threads) + debug_printf ("HEW: leaving child suspended\n"); + child_lwp->suspended = 1; + } but that is not enough, because new thread is still can be spawned in the thread which is being stepped over. This patch extends the condition that GDBserver set child's suspend count to one if it is suspending threads or stepping over the thread. gdb/gdbserver: 2016-03-03 Yao Qi <yao.qi@linaro.org> PR server/19736 * linux-low.c (handle_extended_wait): Set child suspended if event_lwp->bp_reinsert isn't zero.

Fix this GDB crash: $ gdb -ex "set architecture mips:10000" Segmentation fault (core dumped) Backtrace: Program received signal SIGSEGV, Segmentation fault. 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at /home/pedro/gdb/mygit/cxx-convertion/src/gdb/mips-tdep.c:8436 8436 if (bfd_get_flavour (info.abfd) == bfd_target_elf_flavour (top-gdb) bt #0 0x0000000000495b1b in mips_gdbarch_init (info=..., arches=0x0) at .../src/gdb/mips-tdep.c:8436 tromey#1 0x00000000007348a6 in gdbarch_find_by_info (info=...) at .../src/gdb/gdbarch.c:5155 tromey#2 0x000000000073563c in gdbarch_update_p (info=...) at .../src/gdb/arch-utils.c:522 tromey#3 0x0000000000735585 in set_architecture (ignore_args=0x0, from_tty=1, c=0x26bc870) at .../src/gdb/arch-utils.c:496 tromey#4 0x00000000005f29fd in do_sfunc (c=0x26bc870, args=0x0, from_tty=1) at .../src/gdb/cli/cli-decode.c:121 tromey#5 0x00000000005fd3f3 in do_set_command (arg=0x7fffffffdcdd "mips:10000", from_tty=1, c=0x26bc870) at .../src/gdb/cli/cli-setshow.c:455 tromey#6 0x0000000000836157 in execute_command (p=0x7fffffffdcdd "mips:10000", from_tty=1) at .../src/gdb/top.c:460 tromey#7 0x000000000071abfb in catch_command_errors (command=0x835f6b <execute_command>, arg=0x7fffffffdccc "set architecture mips:10000", from_tty=1) at .../src/gdb/main.c:368 tromey#8 0x000000000071bf4f in captured_main (data=0x7fffffffd750) at .../src/gdb/main.c:1132 tromey#9 0x0000000000716737 in catch_errors (func=0x71af44 <captured_main>, func_args=0x7fffffffd750, errstring=0x106b9a1 "", mask=RETURN_MASK_ALL) at .../src/gdb/exceptions.c:240 tromey#10 0x000000000071bfe6 in gdb_main (args=0x7fffffffd750) at .../src/gdb/main.c:1164 tromey#11 0x000000000040a6ad in main (argc=4, argv=0x7fffffffd858) at .../src/gdb/gdb.c:32 (top-gdb) We already check whether info.abfd is NULL before all other bfd_get_flavour calls in the same function. Just this one case was missing. (This was exposed by a WIP test that tries all "set architecture ARCH" values.) gdb/ChangeLog: 2016-03-07 Pedro Alves <palves@redhat.com> * mips-tdep.c (mips_gdbarch_init): Check whether info.abfd is NULL before calling bfd_get_flavour.

This patch adds some sanity check that reinsert breakpoints must be there when doing step-over on software single step target. The check triggers an assert when running forking-threads-plus-breakpoint.exp on arm-linux target, gdb/gdbserver/linux-low.c:4714: A problem internal to GDBserver has been detected.^M int finish_step_over(lwp_info*): Assertion `has_reinsert_breakpoints ()' failed. the error happens when GDBserver has already resumed a thread of process A for step-over (and wait for it hitting reinsert breakpoint), but receives detach request for process B from GDB, which is shown in the backtrace below, (gdb) bt tromey#2 0x000228aa in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4703 tromey#3 0x00025a50 in finish_step_over (lwp=0x12bbd98) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4749 tromey#4 complete_ongoing_step_over () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4760 tromey#5 linux_detach (pid=25228) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1503 tromey#6 0x00012bae in process_serial_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3974 tromey#7 handle_serial_event (err=<optimized out>, client_data=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:4347 tromey#8 0x00016d68 in handle_file_event (event_file_desc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:429 tromey#9 0x000173ea in process_event () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:184 tromey#10 start_event_loop () at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/event-loop.c:547 tromey#11 0x0000aa2c in captured_main (argv=<optimized out>, argc=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3719 tromey#12 main (argc=<optimized out>, argv=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/server.c:3804 the sanity check tries to find the reinsert breakpoint from process B, but nothing is found. It is wrong, we need to search in process A, since we started step-over of a thread of process A. (gdb) p lwp->thread->entry.id $3 = {pid = 25120, lwp = 25131, tid = 0} (gdb) p current_thread->entry.id $4 = {pid = 25228, lwp = 25228, tid = 0} This patch switched current_thread to the thread we are doing step-over in finish_step_over. gdb/gdbserver: 2016-06-17 Yao Qi <yao.qi@linaro.org> * linux-low.c (maybe_hw_step): New function. (linux_resume_one_lwp_throw): Call maybe_hw_step. (finish_step_over): Switch current_thread to lwp temporarily, and assert has_reinsert_breakpoints returns true. (proceed_one_lwp): Call maybe_hw_step. * mem-break.c (has_reinsert_breakpoints): New function. * mem-break.h (has_reinsert_breakpoints): Declare.

When I run process-dies-while-detaching.exp with GDBserver, I see many warnings printed by GDBserver, ptrace(regsets_fetch_inferior_registers) PID=26183: No such process ptrace(regsets_fetch_inferior_registers) PID=26183: No such process ptrace(regsets_fetch_inferior_registers) PID=26184: No such process ptrace(regsets_fetch_inferior_registers) PID=26184: No such process regsets_fetch_inferior_registers is called when GDBserver resumes each lwp. tromey#2 0x0000000000428260 in regsets_fetch_inferior_registers (regsets_info=0x4690d0 <aarch64_regsets_info>, regcache=0x31832020) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:5412 tromey#3 0x00000000004070e8 in get_thread_regcache (thread=0x31832940, fetch=fetch@entry=1) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/regcache.c:58 tromey#4 0x0000000000429c40 in linux_resume_one_lwp_throw (info=<optimized out>, signal=0, step=0, lwp=0x31832830) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4463 tromey#5 linux_resume_one_lwp (lwp=0x31832830, step=<optimized out>, signal=<optimized out>, info=<optimized out>) at /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:4573 The is the case that threads are disappeared when GDB/GDBserver resumes them. We check errno for ESRCH, and don't print error messages, like what we are doing in regsets_store_inferior_registers. gdb/gdbserver: 2016-08-04 Yao Qi <yao.qi@linaro.org> * linux-low.c (regsets_fetch_inferior_registers): Check errno is ESRCH or not.

… out value With something like: struct A { int bitfield:4; } var; If 'var' ends up wholly-optimized out, printing 'var.bitfield' crashes gdb here: (top-gdb) bt #0 0x000000000058b89f in extract_unsigned_integer (addr=0x2 <error: Cannot access memory at address 0x2>, len=2, byte_order=BFD_ENDIAN_LITTLE) at /home/pedro/gdb/mygit/src/gdb/findvar.c:109 tromey#1 0x00000000005a187a in unpack_bits_as_long (field_type=0x16cff70, valaddr=0x0, bitpos=16, bitsize=12) at /home/pedro/gdb/mygit/src/gdb/value.c:3347 tromey#2 0x00000000005a1b9d in unpack_value_bitfield (dest_val=0x1b5d9d0, bitpos=16, bitsize=12, valaddr=0x0, embedded_offset=0, val=0x1b5d8d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3441 tromey#3 0x00000000005a2a5f in value_fetch_lazy (val=0x1b5d9d0) at /home/pedro/gdb/mygit/src/gdb/value.c:3958 tromey#4 0x00000000005a10a7 in value_primitive_field (arg1=0x1b5d8d0, offset=0, fieldno=0, arg_type=0x16d04c0) at /home/pedro/gdb/mygit/src/gdb/value.c:3161 tromey#5 0x00000000005b01e5 in do_search_struct_field (name=0x1727c60 "bitfield", arg1=0x1b5d8d0, offset=0, type=0x16d04c0, looking_for_baseclass=0, result_ptr=0x7fffffffcaf8, [...] unpack_value_bitfield is already optimized-out/unavailable -aware: (...) VALADDR points to the contents of VAL. If the VAL's contents required to extract the bitfield from are unavailable/optimized out, DEST_VAL is correspondingly marked unavailable/optimized out. however, it is not considering the case of the value having no contents buffer at all, as can happen through allocate_optimized_out_value. gdb/ChangeLog: 2016-08-09 Pedro Alves <palves@redhat.com> * value.c (unpack_value_bitfield): Skip unpacking if the parent has no contents buffer to begin with. gdb/testsuite/ChangeLog: 2016-08-09 Pedro Alves <palves@redhat.com> * gdb.dwarf2/bitfield-parent-optimized-out.exp: New file.

I build GDB with -fsanitize=address, and see the error in tests, (gdb) PASS: gdb.linespec/ls-errs.exp: lang=C++: break 3 foo break -line 3 foo^M =================================================================^M ==4401==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000047487 at pc 0x819d8e bp 0x7fff4e4e6bb0 sp 0x7fff4e4e6ba8^M READ of size 1 at 0x603000047487 thread T0^[[1m^[[0m^M #0 0x819d8d in explicit_location_lex_one /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:502^M tromey#1 0x81a185 in string_to_explicit_location(char const**, language_defn const*, int) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:556^M tromey#2 0x81ac10 in string_to_event_location(char**, language_defn const*) /home/yao/SourceCode/gnu/gdb/git/gdb/location.c:687^ the code in question is: > /* Special case: C++ operator,. */ > if (language->la_language == language_cplus > && strncmp (*inp, "operator", 8) <--- [1] > && (*inp)[9] == ',') > (*inp) += 9; > ++(*inp); The error is caused by the access to (*inp)[9] if 9 is out of its bounds. However [1] looks odd to me, because if strncmp returns true (non-zero), the following check "(*inp)[9] == ','" makes no sense any more. I suspect it was a typo in the code we meant to "strncmp () == 0". Another problem in the code above is that if *inp is "operator,", we first increment *inp by 9, and then increment it by one again, which is wrong to me. We should only increment *inp by 8 to skip "operator", and go back to the loop header to decide where we stop. gdb: 2016-08-15 Yao Qi <yao.qi@linaro.org> * location.c (explicit_location_lex_one): Compare the return value of strncmp with zero. Don't check (*inp)[9]. Increment *inp by 8.

If I build gdb with -fsanitize=address and run tests, I get error, malformed linespec error: unexpected colon^M (gdb) PASS: gdb.linespec/ls-errs.exp: lang=C: break : break :=================================================================^M ==3266==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000051451 at pc 0x2b5797a972a8 bp 0x7fffd8e0f3c0 sp 0x7fffd8e0f398^M READ of size 2 at 0x602000051451 thread T0 #0 0x2b5797a972a7 in __interceptor_strlen (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x322a7)^M tromey#1 0x7bd004 in compare_filenames_for_search(char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:316^M tromey#2 0x7bd310 in iterate_over_some_symtabs(char const*, char const*, int (*)(symtab*, void*), void*, compunit_symtab*, compunit_symtab*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:411^M tromey#3 0x7bd775 in iterate_over_symtabs(char const*, int (*)(symtab*, void*), void*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:481^M tromey#4 0x7bda15 in lookup_symtab(char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:527^M tromey#5 0x7d5e2a in make_file_symbol_completion_list_1 /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5635^M tromey#6 0x7d61e1 in make_file_symbol_completion_list(char const*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/symtab.c:5684^M tromey#7 0x88dc06 in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:288 .... 0x602000051451 is located 0 bytes to the right of 1-byte region [0x602000051450,0x602000051451)^M mallocated by thread T0 here: #0 0x2b5797ab97ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef)^M tromey#1 0xbbfb8d in xmalloc /home/yao/SourceCode/gnu/gdb/git/gdb/common/common-utils.c:43^M tromey#2 0x88dabd in linespec_location_completer /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:273^M tromey#3 0x88e5ef in location_completer(cmd_list_element*, char const*, char const*) /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:531^M tromey#4 0x8902e7 in complete_line_internal /home/yao/SourceCode/gnu/gdb/git/gdb/completer.c:964^ The code in question is here file_to_match = (char *) xmalloc (colon - text + 1); strncpy (file_to_match, text, colon - text + 1); it is likely that file_to_match is not null-terminated. The patch is to strncpy 'colon - text' bytes and explicitly set '\0'. gdb: 2016-08-19 Yao Qi <yao.qi@linaro.org> * completer.c (linespec_location_completer): Make file_to_match null-terminated.

This test case verifies that GDB will not attempt to invoke a python unwinder recursively. At the moment, the behavior exhibited by GDB looks like this: (gdb) source py-recurse-unwind.py Python script imported (gdb) b ccc Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23. (gdb) run Starting program: py-recurse-unwind TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. TestUnwinder: Recursion detected - returning early. Breakpoint 1, ccc (arg=<unavailable>) at py-recurse-unwind.c:23 23 } (gdb) bt #-1 ccc (arg=<unavailable>) at py-recurse-unwind.c:23 Backtrace stopped: previous frame identical to this frame (corrupt stack?) [I've shortened pathnames for easier reading.] The desired / expected behavior looks like this: (gdb) source py-recurse-unwind.py Python script imported (gdb) b ccc Breakpoint 1 at 0x4004bd: file py-recurse-unwind.c, line 23. (gdb) run Starting program: py-recurse-unwind Breakpoint 1, ccc (arg=789) at py-recurse-unwind.c:23 23 } (gdb) bt #0 ccc (arg=789) at py-recurse-unwind.c:23 tromey#1 0x00000000004004d5 in bbb (arg=456) at py-recurse-unwind.c:28 tromey#2 0x00000000004004ed in aaa (arg=123) at py-recurse-unwind.c:34 tromey#3 0x00000000004004fe in main () at py-recurse-unwind.c:40 Note that GDB's problems go well beyond the fact that it invokes the unwinder recursively. In the process it messes up some internal state (the frame stash) leading to display of (only) the sentinel frame in the backtrace. gdb/testsuite/ChangeLog: * gdb.python/py-recurse-unwind.c: New file. * gdb.python/py-recurse-unwind.py: New file. * gdb.python/py-recurse-unwind.exp: New file.

…_eval This fixes the problem exercised by Kevin's test at: https://sourceware.org/ml/gdb-patches/2016-08/msg00216.html This was originally exposed by the OpenJDK Python-based unwinder. If an unwinder attempts to call parse_and_eval from within its sniffing method, GDB's unwinding machinery enters infinite recursion. However, parse_and_eval is a pretty reasonable thing to call, because Python/Scheme-based unwinders will often need to read globals out of inferior memory. The recursion happens because: - get_current_frame() is called soon after the target stops. - current_frame is NULL, and so we unwind it from the sentinel frame (which is special and has level == -1). - We reach get_prev_frame_if_no_cycle, which does cycle detection based on frame id, and thus tries to compute the frame id of the new frame. - Frame id computation requires an unwinder, so we go through all unwinder sniffers trying to see if one accepts the new frame (the current frame). - the unwinder's sniffer calls parse_and_eval(). - parse_and_eval depends on the selected frame/block, and if not set yet, the selected frame is set to the current frame. - get_current_frame () is called again. current_frame is still NULL, so ... - recurse forever. In Kevin's test at: https://sourceware.org/ml/gdb-patches/2016-08/msg00216.html gdb doesn't recurse forever simply because the Python unwinder contains code to detect and stop the recursion itself. However, GDB goes downhill from here, e.g., by showing the sentinel frame as current frame (note the -1): Breakpoint 1, ccc (arg=<unavailable>) at py-recurse-unwind.c:23 23 } (gdb) bt #-1 ccc (arg=<unavailable>) at py-recurse-unwind.c:23 Backtrace stopped: previous frame identical to this frame (corrupt stack?) That "-1" frame level comes from this: if (catch_exceptions (current_uiout, unwind_to_current_frame, sentinel_frame, RETURN_MASK_ERROR) != 0) { /* Oops! Fake a current frame? Is this useful? It has a PC of zero, for instance. */ current_frame = sentinel_frame; } which is bogus. It's never correct to set the current frame to the sentinel frame. The only reason this has survived so long is that getting here normally indicates something wrong has already happened before and we fix that. And this case is no exception -- it doesn't really matter how precisely we managed to get to that bogus code (it has to do with the the stash), because anything after recursion happens is going to be invalid. So the fix is to avoid the recursion in the first place. Observations: #1 - The recursion happens because we try to do cycle detection from within get_prev_frame_if_no_cycle. That requires computing the frame id of the frame being unwound, and that itself requires calling into the unwinders. #2 - But, the first time we're unwinding from the sentinel frame, when we reach get_prev_frame_if_no_cycle, there's no frame chain at all yet: - current_frame is NULL. - the frame stash is empty. Thus, there's really no need to do cycle detection the first time we reach get_prev_frame_if_no_cycle, when building the current frame. So we can break the recursion by making get_current_frame call a simplified version of get_prev_frame_if_no_cycle that results in setting the current_frame global _before_ computing the current frame's id. But, we can go a little bit further. As there's really no reason anymore to compute the current frame's frame id immediately, we can defer computing it to when some caller of get_current_frame might need it. This was actually how the frame id was computed for all frames before the stash-based cycle detection was added. So in a way, this patch reintroduces the lazy frame id computation, but unlike before, only for the case of the current frame, which turns out to be special. This lazyness, however, requires adjusting gdb.python/py-unwind-maint.exp, because that assumes unwinders are immediately called as side effect of some commands. I didn't see a need to preserve the behavior expected by that test (all it would take is call get_frame_id inside get_current_frame), so I adjusted the test. gdb/ChangeLog: 2016-09-05 Pedro Alves <palves@redhat.com> PR backtrace/19927 * frame.c (get_frame_id): Compute the frame id if not computed yet. (unwind_to_current_frame): Delete. (get_current_frame): Use get_prev_frame_always_1 to get the current frame and assert that that always succeeds. (get_prev_frame_if_no_cycle): Skip cycle detection if returning the current frame. gdb/testsuite/ChangeLog: 2016-09-05 Pedro Alves <palves@redhat.com> PR backtrace/19927 * gdb.python/py-unwind-maint.exp: Adjust tests to not expect that unwinders are immediately called as side effect of "source" or "disable unwinder" commands. * gdb.python/py-recurse-unwind.exp: Remove setup_kfail calls.

Previously: fmov d0, #2 would give an error: Operand 2 should be an integer register whereas the user probably just forgot to add the ".0" to make: fmov d0, #2.0 This patch reports an invalid floating point constant unless the operand is obviously a register. The FPIMM8 handling is only relevant for SVE. Without it: fmov z0, z1 would try to parse z1 as an integer immediate zero (the res2 path), whereas it's more likely that the user forgot the predicate. This is tested by the final patch. gas/ * config/tc-aarch64.c (parse_aarch64_imm_float): Report a specific low-severity error for registers. (parse_operands): Report an invalid floating point constant for if parsing an FPIMM8 fails, and if no better error has been recorded. * testsuite/gas/aarch64/diagnostic.s, testsuite/gas/aarch64/diagnostic.l: Add tests for integer operands to FMOV.

Some SVE instructions count the number of elements in a given vector pattern and allow a scale factor of [1, 16] to be applied to the result. This scale factor is written ", MUL #n", where "MUL" is a new operator. E.g.: UQINCD X0, POW2, MUL #2 This patch adds support for this kind of operand. All existing operators were shifts of some kind, so there was a natural range of [0, 63] regardless of context. This was then narrowered further by later checks (e.g. to [0, 31] when used for 32-bit values). In contrast, MUL doesn't really have a natural context-independent range. Rather than pick one arbitrarily, it seemed better to make the "shift" amount a full 64-bit value and leave the range test to the usual operand-checking code. I've rearranged the fields of aarch64_opnd_info so that this doesn't increase the size of the structure (although I don't think its size is critical anyway). include/ * opcode/aarch64.h (AARCH64_OPND_SVE_PATTERN_SCALED): New aarch64_opnd. (AARCH64_MOD_MUL): New aarch64_modifier_kind. (aarch64_opnd_info): Make shifter.amount an int64_t and rearrange the fields. opcodes/ * aarch64-tbl.h (AARCH64_OPERANDS): Add an entry for AARCH64_OPND_SVE_PATTERN_SCALED. * aarch64-opc.h (FLD_SVE_imm4): New aarch64_field_kind. * aarch64-opc.c (fields): Add a corresponding entry. (set_multiplier_out_of_range_error): New function. (aarch64_operand_modifiers): Add entry for AARCH64_MOD_MUL. (operand_general_constraint_met_p): Handle AARCH64_OPND_SVE_PATTERN_SCALED. (print_register_offset_address): Use PRIi64 to print the shift amount. (aarch64_print_operand): Likewise. Handle AARCH64_OPND_SVE_PATTERN_SCALED. * aarch64-opc-2.c: Regenerate. * aarch64-asm.h (ins_sve_scale): New inserter. * aarch64-asm.c (aarch64_ins_sve_scale): New function. * aarch64-asm-2.c: Regenerate. * aarch64-dis.h (ext_sve_scale): New inserter. * aarch64-dis.c (aarch64_ext_sve_scale): New function. * aarch64-dis-2.c: Regenerate. gas/ * config/tc-aarch64.c (SHIFTED_MUL): New parse_shift_mode. (parse_shift): Handle it. Reject AARCH64_MOD_MUL for all other shift modes. Skip range tests for AARCH64_MOD_MUL. (process_omitted_operand): Handle AARCH64_OPND_SVE_PATTERN_SCALED. (parse_operands): Likewise.

This patch adds support for the new SVE floating-point immediate operands. One operand uses the same 8-bit encoding as base AArch64, but in a different position. The others use a single bit to select between two values. One of the single-bit operands is a choice between 0 and 1, where 0 is not a valid 8-bit encoding. I think the cleanest way of handling these single-bit immediates is therefore to use the IEEE float encoding itself as the immediate value and select between the two possible values when encoding and decoding. As described in the covering note for the patch that added F_STRICT, we get better error messages by accepting unsuffixed vector registers and leaving the qualifier matching code to report an error. This means that we carry on parsing the other operands, and so can try to parse FP immediates for invalid instructions like: fcpy z0, #2.5 In this case there is no suffix to tell us whether the immediate should be treated as single or double precision. Again, we get better error messages by picking one (arbitrary) immediate size and reporting an error for the missing suffix later. include/ * opcode/aarch64.h (AARCH64_OPND_SVE_FPIMM8): New aarch64_opnd. (AARCH64_OPND_SVE_I1_HALF_ONE, AARCH64_OPND_SVE_I1_HALF_TWO) (AARCH64_OPND_SVE_I1_ZERO_ONE): Likewise. opcodes/ * aarch64-tbl.h (AARCH64_OPERANDS): Add entries for the new SVE FP immediate operands. * aarch64-opc.h (FLD_SVE_i1): New aarch64_field_kind. * aarch64-opc.c (fields): Add corresponding entry. (operand_general_constraint_met_p): Handle the new SVE FP immediate operands. (aarch64_print_operand): Likewise. * aarch64-opc-2.c: Regenerate. * aarch64-asm.h (ins_sve_float_half_one, ins_sve_float_half_two) (ins_sve_float_zero_one): New inserters. * aarch64-asm.c (aarch64_ins_sve_float_half_one): New function. (aarch64_ins_sve_float_half_two): Likewise. (aarch64_ins_sve_float_zero_one): Likewise. * aarch64-asm-2.c: Regenerate. * aarch64-dis.h (ext_sve_float_half_one, ext_sve_float_half_two) (ext_sve_float_zero_one): New extractors. * aarch64-dis.c (aarch64_ext_sve_float_half_one): New function. (aarch64_ext_sve_float_half_two): Likewise. (aarch64_ext_sve_float_zero_one): Likewise. * aarch64-dis-2.c: Regenerate. gas/ * config/tc-aarch64.c (double_precision_operand_p): New function. (parse_operands): Use it to calculate the dp_p input to parse_aarch64_imm_float. Handle the new SVE FP immediate operands.

If xmalloc fails allocating memory, usually because something tried a huge allocation, like xmalloc(-1) or some such, GDB asks the user what to do: .../src/gdb/utils.c:1079: internal-error: virtual memory exhausted. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) If the user says "n", that throws a QUIT exception, which is caught by one of the multiple CATCH(RETURN_MASK_ALL) blocks somewhere up the stack. The default implementations of operator new / operator new[] call malloc directly, and on memory allocation failure throw std::bad_alloc. Currently, if that happens, since nothing catches it, the exception escapes out of main, and GDB aborts from unhandled exception. This patch replaces the default operator new variants with versions that, just like xmalloc: #1 - Raise an internal-error on memory allocation failure. #2 - Throw a QUIT gdb_exception, so that the exact same CATCH blocks continue handling memory allocation problems. A minor complication of #2 is that operator new can _only_ throw std::bad_alloc, or something that extends it: void* operator new (std::size_t size) throw (std::bad_alloc); That means that if we let a gdb QUIT exception escape from within operator new, the C++ runtime aborts due to unexpected exception thrown. So to bridge the gap, this patch adds a new gdb_quit_bad_alloc exception type that inherits both std::bad_alloc and gdb_exception, and throws _that_. If we decide that we should be catching memory allocation errors in fewer places than all the places we currently catch them (everywhere we use RETURN_MASK_ALL currently), then we could change operator new to throw plain std::bad_alloc then. But I'm considering such a change as separate matter from this one -- it'd make sense to do the same to xmalloc at the same time, for instance. Meanwhile, this allows using new/new[] instead of xmalloc/XNEW/etc. without losing the "virtual memory exhausted" internal-error safeguard. Tested on x86_64 Fedora 23. gdb/ChangeLog: 2016-09-23 Pedro Alves <palves@redhat.com> * Makefile.in (SFILES): Add common/new-op.c. (COMMON_OBS): Add common/new-op.o. (new-op.o): New rule. * common/common-exceptions.h: Include <new>. (struct gdb_quit_bad_alloc): New type. * common/new-op.c: New file. gdb/gdbserver/ChangeLog: 2016-09-23 Pedro Alves <palves@redhat.com> * Makefile.in (SFILES): Add common/new-op.c. (OBS): Add common/new-op.o. (new-op.o): New rule.

gcc-6.2.1-2.fc24.x86_64 (gdb) backtrace 10^M (gdb) FAIL: gdb.arch/i386-signal.exp: backtrace 10 (gdb) disas/s Dump of assembler code for function main: .../gdb/testsuite/gdb.arch/i386-signal.c: 30 { 0x000000000040057f <+0>: push %rbp 0x0000000000400580 <+1>: mov %rsp,%rbp 31 setup (); 0x0000000000400583 <+4>: callq 0x400590 <setup> => 0x0000000000400588 <+9>: mov $0x0,%eax 32 } 0x000000000040058d <+14>: pop %rbp 0x000000000040058e <+15>: retq End of assembler dump. The .exp patch is an obvious typo fix I think. The regex was written to accept "ADDR in main" and I find it OK as checking .debug_line validity is not the purpose of this testfile. gcc-4.8.5-11.el7.x86_64 did not put the 'mov $0x0,%eax' instruction there at all so there was no problem with .debug_line. gdb/testsuite/ChangeLog 2016-10-05 Jan Kratochvil <jan.kratochvil@redhat.com> * gdb.arch/i386-signal.exp (backtrace 10): Fix tromey#2 typo.

Even though this was supposedly in the gdb 7.2 timeframe, the testcase in PR11094 crashes current GDB with a segfault: Program received signal SIGSEGV, Segmentation fault. 0x00000000005ee894 in event_location_to_string (location=0x0) at src/gdb/location.c:412 412 if (EL_STRING (location) == NULL) (top-gdb) bt #0 0x00000000005ee894 in event_location_to_string (location=0x0) at src/gdb/location.c:412 tromey#1 0x000000000057411a in print_breakpoint_location (b=0x18288e0, loc=0x0) at src/gdb/breakpoint.c:6201 tromey#2 0x000000000057483f in print_one_breakpoint_location (b=0x18288e0, loc=0x182cf10, loc_number=0, last_loc=0x7fffffffd258, allflag=1) at src/gdb/breakpoint.c:6473 tromey#3 0x00000000005751e1 in print_one_breakpoint (b=0x18288e0, last_loc=0x7fffffffd258, allflag=1) at src/gdb/breakpoint.c:6707 tromey#4 0x000000000057589c in breakpoint_1 (args=0x0, allflag=1, filter=0x0) at src/gdb/breakpoint.c:6947 tromey#5 0x0000000000575aa8 in maintenance_info_breakpoints (args=0x0, from_tty=0) at src/gdb/breakpoint.c:7026 [...] This is GDB trying to print the location spec of the JIT event breakpoint, but that's an internal breakpoint without one. If I add a NULL check, then we see that the JIT breakpoint is now pending (because its location has shlib_disabled set): (gdb) maint info breakpoints Num Type Disp Enb Address What [...] -8 jit events keep y <PENDING> inf 1 [...] But that's incorrect. GDB should have managed to recreate the JIT breakpoint's location for the second run. So the problem is elsewhere. The problem is that if the JIT loads at the same address on the second run, we never recreate the JIT breakpoint, because we hit this early return: static int jit_breakpoint_re_set_internal (struct gdbarch *gdbarch, struct jit_program_space_data *ps_data) { [...] if (ps_data->cached_code_address == addr) return 0; [...] delete_breakpoint (ps_data->jit_breakpoint); [...] ps_data->jit_breakpoint = create_jit_event_breakpoint (gdbarch, addr); Fix this by deleting the breakpoint and discarding the cached code address when the objfile where the previous JIT breakpoint was found is deleted/unloaded in the first place. The test that was originally added for PR11094 doesn't trip on this because: tromey#1 - It doesn't test the case of the JIT descriptor's address _not_ changing between reruns. tromey#2 - And then it doesn't do "maint info breakpoints", or really anything with the JIT at all. tromey#3 - and even then, to trigger the problem the JIT descriptor needs to be in a separate library, while the current test puts it in the main program. The patch extends the test to cover all combinations of these scenarios. gdb/ChangeLog: 2016-10-06 Pedro Alves <palves@redhat.com> * jit.c (free_objfile_data): Delete the JIT breakpoint and clear the cached code address. gdb/testsuite/ChangeLog: 2016-10-06 Pedro Alves <palves@redhat.com> * gdb.base/jit-simple-dl.c: New file. * gdb.base/jit-simple-jit.c: New file, factored out from ... * gdb.base/jit-simple.c: ... this. * gdb.base/jit-simple.exp (jit_run): Delete. (build_jit): New proc. (jit_test_reread): Recompile either the main program or the shared library, depending on what is being tested. Skip changing address if caller wants to. Compare before/after addresses. If testing standalone, explicitly load the binary. Test "maint info breakpoints". (top level): Add "standalone vs shared lib" and "change address" vs "same address" axes.

Nowadays, if we build GDB with -fsanitize=address, we can get the asan error below, (gdb) quit ================================================================= ==9723==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete) on 0x60200003bf70 #0 0x7f88f3837527 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x55527) tromey#1 0xac8e13 in __gnu_cxx::new_allocator<void (*)()>::deallocate(void (**)(), unsigned long) /usr/include/c++/4.9/ext/new_allocator.h:110 tromey#2 0xac8cc2 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::deallocate(std::allocator<void (*)()>&, void (**)(), unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:185 .... 0x60200003bf70 is located 0 bytes inside of 8-byte region [0x60200003bf70,0x60200003bf78) allocated by thread T0 here: #0 0x7f88f38367ef in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x547ef) tromey#1 0xbd2762 in operator new(unsigned long) /home/yao/SourceCode/gnu/gdb/git/gdb/common/new-op.c:42 tromey#2 0xac8edc in __gnu_cxx::new_allocator<void (*)()>::allocate(unsigned long, void const*) /usr/include/c++/4.9/ext/new_allocator.h:104 tromey#3 0xac8d81 in __gnu_cxx::__alloc_traits<std::allocator<void (*)()> >::allocate(std::allocator<void (*)()>&, unsigned long) /usr/include/c++/4.9/ext/alloc_traits.h:182 The reason for this is that we override operator new but don't override operator delete. This patch does the override if the code is NOT compiled with asan. gdb: 2016-10-25 Yao Qi <yao.qi@linaro.org> PR gdb/20716 * common/new-op.c (__has_feature): New macro. Don't override operator new if asan is used.

Currently GDB never sends more than one action per vCont packet, when connected in non-stop mode. A follow up patch will change that, and it exposed a gdbserver problem with the vCont handling. For example, this in non-stop mode: => vCont;s:p1.1;c <= OK Should be equivalent to: => vCont;s:p1.1 <= OK => vCont;c <= OK But gdbserver currently doesn't handle this. In the latter case, "vCont;c" makes gdbserver clobber the previous step request. This patch fixes that. Note the server side must ignore resume actions for the thread that has a pending %Stopped notification (and any other threads with events pending), until GDB acks the notification with vStopped. Otherwise, e.g., the following case is mishandled: tromey#1 => g (or any other packet) tromey#2 <= [registers] tromey#3 <= %Stopped T05 thread:p1.2 tromey#4 => vCont s:p1.1;c tromey#5 <= OK Above, the server must not resume thread p1.2 when it processes the vCont. GDB can't know that p1.2 stopped until it acks the %Stopped notification. (Otherwise it wouldn't send a default "c" action.) (The vCont documentation already specifies this.) Finally, special care must also be given to handling fork/vfork events. A (v)fork event actually tells us that two processes stopped -- the parent and the child. Until we follow the fork, we must not resume the child. Therefore, if we have a pending fork follow, we must not send a global wildcard resume action (vCont;c). We can still send process-wide wildcards though. (The comments above will be added as code comments to gdb in a follow up patch.) gdb/gdbserver/ChangeLog: 2016-10-26 Pedro Alves <palves@redhat.com> * linux-low.c (handle_extended_wait): Link parent/child fork threads. (linux_wait_1): Unlink them. (linux_set_resume_request): Ignore resume requests for already-resumed and unhandled fork child threads. * linux-low.h (struct lwp_info) <fork_relative>: New field. * server.c (in_queued_stop_replies_ptid, in_queued_stop_replies): New functions. (handle_v_requests) <vCont>: Don't call require_running. * server.h (in_queued_stop_replies): New declaration.

Most of the time, the trace should be in one piece. This case is handled fine by GDB. In some cases, however, there may be gaps in the trace. They result from trace decode errors or from overflows. A gap in the trace means we lost an unknown amount of trace. Gaps can be very small, such as a few instructions in the same function, or they can be rather big. We may, for example, lose a few function calls or returns. The trace may continue in a different function and we likely don't know how we got there. Even though we can't say how the program executed across a gap, higher levels may not be impacted too much by it. Let's assume we have functions a-e and a trace that looks roughly like this: a \ b b \ / c <gap> c / d d \ / e Even though we can't say for sure, it is likely that b and c are the same function instance before and after the gap. This patch is trying to connect the c and b function segments across the gap. This will add a to the back trace of b on the right hand side. The changes are reflected in GDB's internal representation of the trace and will improve: - the output of "record function-call-history /c" - the output of "backtrace" in replay mode - source stepping in replay mode will be improved indirectly via the improved back trace I don't have an automated test for this patch; decode errors will be fixed and overflows occur sporadically and are quite rare. I tested it by hacking GDB to provoke a decode error and on the expected gap in the gdb.btrace/dlopen.exp test. The issue is that we can't predict where we will be able to re-sync in case of errors. For the expected decode error in gdb.btrace/dlopen.exp, for example, we may be able to re-sync somewhere in dlclose, in test, in main, or not at all. Here's one example run of gdb.btrace/dlopen.exp with and without this patch. (gdb) info record Active record target: record-btrace Recording format: Intel Processor Trace. Buffer size: 16kB. warning: Non-contiguous trace at instruction 66608 (offset = 0xa83, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66652 (offset = 0xa9b, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66770 (offset = 0xacb, pc = 0xb7fdcc31). warning: Non-contiguous trace at instruction 66966 (offset = 0xb60, pc = 0xb7ff5ee4). warning: Non-contiguous trace at instruction 66994 (offset = 0xb74, pc = 0xb7ff5f24). warning: Non-contiguous trace at instruction 67334 (offset = 0xbac, pc = 0xb7ff5e6d). warning: Non-contiguous trace at instruction 69022 (offset = 0xc04, pc = 0xb7ff60b3). warning: Non-contiguous trace at instruction 69116 (offset = 0xc1c, pc = 0xb7ff60b3). warning: Non-contiguous trace at instruction 69504 (offset = 0xc74, pc = 0xb7ff605d). warning: Non-contiguous trace at instruction 83648 (offset = 0xecc, pc = 0xb7ff6134). warning: Decode error (-13) at instruction 83876 (offset = 0xf48, pc = 0xb7fd6380): no memory mapped at this address. warning: Non-contiguous trace at instruction 83876 (offset = 0x11b7, pc = 0xb7ff1c70). Recorded 83948 instructions in 912 functions (12 gaps) for thread 1 (process 12996). (gdb) record instruction-history 83876, +2 83876 => 0xb7fec46f <call_init.part.0+95>: call *%eax [decode error (-13): no memory mapped at this address] [disabled] 83877 0xb7ff1c70 <_dl_close_worker.part.0+1584>: nop Without the patch, the trace is disconnected and the backtrace is short: (gdb) record goto 83876 #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 #1 0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2 #2 0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2 Backtrace stopped: not enough registers or memory available to unwind further (gdb) record goto 83877 #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 #1 0xb7ff287a in _dl_close () from /lib/ld-linux.so.2 #2 0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #5 0xb7fc3d98 in dlclose () from /lib/libdl.so.2 #6 0x0804860a in test () #7 0x08048628 in main () With the patch, GDB is able to connect the trace pieces and we get a full backtrace. (gdb) record goto 83876 #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7fec46f in call_init.part () from /lib/ld-linux.so.2 #1 0xb7fec5d0 in _dl_init () from /lib/ld-linux.so.2 #2 0xb7ff0fe3 in dl_open_worker () from /lib/ld-linux.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7ff02e2 in _dl_open () from /lib/ld-linux.so.2 #5 0xb7fc3c65 in dlopen_doit () from /lib/libdl.so.2 #6 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #7 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #8 0xb7fc3d0e in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2 #9 0xb7ff28ee in _dl_runtime_resolve () from /lib/ld-linux.so.2 #10 0x0804841c in ?? () #11 0x08048470 in dlopen@plt () #12 0x080485a3 in test () #13 0x08048628 in main () (gdb) record goto 83877 #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 (gdb) backtrace #0 0xb7ff1c70 in _dl_close_worker.part.0 () from /lib/ld-linux.so.2 #1 0xb7ff287a in _dl_close () from /lib/ld-linux.so.2 #2 0xb7fc3d5d in dlclose_doit () from /lib/libdl.so.2 #3 0xb7fec354 in _dl_catch_error () from /lib/ld-linux.so.2 #4 0xb7fc43dd in _dlerror_run () from /lib/libdl.so.2 #5 0xb7fc3d98 in dlclose () from /lib/libdl.so.2 #6 0x0804860a in test () #7 0x08048628 in main () It worked nicely in this case but it may, of course, also lead to weird connections; it is a heuristic, after all. It works best when the gap is small and the trace pieces are long. gdb/ * btrace.c (bfun_s): New typedef. (ftrace_update_caller): Print caller in debug dump. (ftrace_get_caller, ftrace_match_backtrace, ftrace_fixup_level) (ftrace_compute_global_level_offset, ftrace_connect_bfun) (ftrace_connect_backtrace, ftrace_bridge_gap, btrace_bridge_gaps): New. (btrace_compute_ftrace_bts): Pass vector of gaps. Collect gaps. (btrace_compute_ftrace_pt): Likewise. (btrace_compute_ftrace): Split into this, ... (btrace_compute_ftrace_1): ... this, and ... (btrace_finalize_ftrace): ... this. Call btrace_bridge_gaps.

… frame This patch ensures that the frame id for the current frame is stashed before that of the previous frame (to the current frame). First, it should be noted that the frame id for the current frame is not stashed by get_current_frame(). The current frame's frame id is lazily computed and stashed via calls to get_frame_id(). However, it's possible for get_prev_frame() to be called without first stashing the current frame. The frame stash is used not only to speed up frame lookups, but also to detect cycles. When attempting to compute the frame id for a "previous" frame (in get_prev_frame_if_no_cycle), a cycle is detected if the computed frame id is already in the stash. If it should happen that a previous frame id is stashed which should represent a cycle for the current frame, then an assertion failure will trigger should get_frame_id() be later called to determine the frame id for the current frame. As of late 2016, with the "Tweak meaning of VALUE_FRAME_ID" patch in place, this actually occurs when running the gdb.dwarf2/dw2-dup-frame.exp test. While attempting to generate a backtrace, the python frame filter code is invoked, leading to frame_info_to_frame_object() (in python/py-frame.c) being called. That function will potentially call get_prev_frame() before get_frame_id() is called. The call to get_prev_frame() can eventually end up in get_prev_frame_if_no_cycle() which, in turn, calls compute_frame_id(), after which the frame id is stashed for the previous frame. If the frame id for the current frame is stashed, the cycle detection code (which relies on the frame stash) in get_prev_frame_if_no_cycle() will be triggered for a cycle starting with the current frame. If the current frame's id is not stashed, the cycle detecting code can't operate as designed. Instead, when get_frame_id() is called on the current frame at some later point, the current frame's id will found to be already in the stash, triggering an assertion failure. Below is an in depth examination of the failure which lead to this change. I've shortened pathnames for brevity and readability. Here's the portion of the log file showing the failure/internal error: (gdb) break stop_frame Breakpoint 1 at 0x40059a: file dw2-dup-frame.c, line 22. (gdb) run Starting program: testsuite/outputs/gdb.dwarf2/dw2-dup-frame/dw2-dup-frame Breakpoint 1, stop_frame () at dw2-dup-frame.c:22 22 } (gdb) bt gdb/frame.c:544: internal-error: frame_id get_frame_id(frame_info*): Assertion `stashed' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.dwarf2/dw2-dup-frame.exp: backtrace from stop_frame (GDB internal error) Here's a partial backtrace from the internal error, showing the frames which I think are relevant, plus several extra to provide context: #0 internal_error ( file=0x932b98 "gdb/frame.c", line=544, fmt=0x932b20 "%s: Assertion `%s' failed.") at gdb/common/errors.c:54 #1 0x000000000072207e in get_frame_id (fi=0xe5a760) at gdb/frame.c:544 #2 0x00000000004eb50d in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:390 #3 0x00000000004ef5be in bootstrap_python_frame_filters (frame=0xe5a760, frame_low=0, frame_high=-1) at gdb/python/py-framefilter.c:1453 #4 0x00000000004ef7a9 in gdbpy_apply_frame_filter ( extlang=0x8857e0 <extension_language_python>, frame=0xe5a760, flags=7, args_type=CLI_SCALAR_VALUES, out=0xf6def0, frame_low=0, frame_high=-1) at gdb/python/py-framefilter.c:1548 #5 0x00000000005f2c5a in apply_ext_lang_frame_filter (frame=0xe5a760, flags=7, args_type=CLI_SCALAR_VALUES, out=0xf6def0, frame_low=0, frame_high=-1) at gdb/extension.c:572 #6 0x00000000005ea896 in backtrace_command_1 (count_exp=0x0, show_locals=0, no_filters=0, from_tty=1) at gdb/stack.c:1834 Examination of the code in frame_info_to_frame_object(), which is in python/py-frame.c, is key to understanding this problem: if (get_prev_frame (frame) == NULL && get_frame_unwind_stop_reason (frame) != UNWIND_NO_REASON && get_next_frame (frame) != NULL) { frame_obj->frame_id = get_frame_id (get_next_frame (frame)); frame_obj->frame_id_is_next = 1; } else { frame_obj->frame_id = get_frame_id (frame); frame_obj->frame_id_is_next = 0; } I will first note that the frame id for frame has not been computed yet. (This was verified by placing a breakpoint on compute_frame_id().) The call to get_prev_frame() causes the the frame id to (eventually) be computed for the previous frame. Here's a backtrace showing how we get there: #0 compute_frame_id (fi=0x10e2810) at gdb/frame.c:496 #1 0x0000000000724a67 in get_prev_frame_if_no_cycle (this_frame=0xe5a760) at gdb/frame.c:1871 #2 0x0000000000725136 in get_prev_frame_always_1 (this_frame=0xe5a760) at gdb/frame.c:2045 #3 0x000000000072516b in get_prev_frame_always (this_frame=0xe5a760) at gdb/frame.c:2061 #4 0x000000000072570f in get_prev_frame (this_frame=0xe5a760) at gdb/frame.c:2303 #5 0x00000000004eb471 in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:381 For this particular case, we end up in the else clause of the code above which calls get_frame_id (frame). It's at this point that the frame id for frame is computed. Again, here's a backtrace: #0 compute_frame_id (fi=0xe5a760) at gdb/frame.c:496 #1 0x000000000072203d in get_frame_id (fi=0xe5a760) at gdb/frame.c:539 #2 0x00000000004eb50d in frame_info_to_frame_object (frame=0xe5a760) at gdb/python/py-frame.c:390 The test in question, dw2-dup-frame.exp, deliberately creates a broken (cyclic) stack. So, in this instance, the frame id for the prev `frame' will be the same as that for `frame'. But that particular frame id ended up in the stash during the previous frame operation. When, just a few lines later, we compute the frame id for `frame', the id in question is already in the stash, thus triggering the assertion failure. I considered two other solutions to solving this problem: We could prevent get_prev_frame() from being called before get_frame_id() in frame_info_to_frame_object(). (See above for the snippet of code where this happens.) A call to get_frame_id (frame) could be placed ahead of that code snippet above. I have tested this approach and, while it does work, I can't be certain that get_prev_frame() isn't called ahead of stashing the current frame somewhere else in GDB, but in a less obvious way. Another approach is to stash the current frame's id by calling get_frame_id() in get_current_frame(). This approach is conceptually simpler, but when importing a python unwinder, has the unwelcome side effect of causing the unwinder to be called during import. A cleaner looking fix would be to place this code after code corresponding to the "Don't compute the frame id of the current frame yet..." comment in get_prev_frame_if_no_cycle(). Sadly, this does not work though; by the time we get to this point, the frame state for the prev frame has been modified just enough to cause an internal error to occur when attempting to compute the (current) frame id for inline frames. (The unexpected failure count increases by roughly 130 failures.) Therefore, I decided to place it as early as possible in get_prev_frame(). gdb/ChangeLog: * frame.c (get_prev_frame): Stash frame id for current frame prior to computing frame id for previous frame.

This patch fixes a few problems with GDB's time handling. #1 - It avoids problems with gnulib's C++ namespace support On MinGW, the struct timeval that should be passed to gnulib's gettimeofday replacement is incompatible with libiberty's timeval_sub/timeval_add. That's because gnulib also replaces "struct timeval" with its own definition, while libiberty expects the system's. E.g., in code like this: gettimeofday (&prompt_ended, NULL); timeval_sub (&prompt_delta, &prompt_ended, &prompt_started); timeval_add (&prompt_for_continue_wait_time, &prompt_for_continue_wait_time, &prompt_delta); That's currently handled in gdb by not using gnulib's gettimeofday at all (see common/gdb_sys_time.h), but that #undef hack won't work with if/when we enable gnulib's C++ namespace support, because that mode adds compile time warnings for uses of ::gettimeofday, which are hard errors with -Werror. #2 - But there's an elephant in the room: gettimeofday is not monotonic... We're using it to: a) check how long functions take, for performance analysis b) compute when in the future to fire events in the event-loop c) print debug timestamps But that's exactly what gettimeofday is NOT meant for. Straight from the man page: ~~~ The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2). ~~~ std::chrono (part of the C++11 standard library) has a monotonic clock exactly for such purposes (std::chrono::steady_clock). This commit switches to use that instead of gettimeofday, fixing all the issues mentioned above. gdb/ChangeLog: 2016-11-23 Pedro Alves <palves@redhat.com> * Makefile.in (SFILES): Add common/run-time-clock.c. (HFILES_NO_SRCDIR): Add common/run-time-clock.h. (COMMON_OBS): Add run-time-clock.o. * common/run-time-clock.c, common/run-time-clock.h: New files. * defs.h (struct timeval, print_transfer_performance): Delete declarations. * event-loop.c (struct gdb_timer) <when>: Now a std::chrono::steady_clock::time_point. (create_timer): use std::chrono::steady_clock instead of gettimeofday. Use new instead of malloc. (delete_timer): Use delete instead of xfree. (duration_cast_timeval): New. (update_wait_timeout): Use std::chrono::steady_clock instead of gettimeofday. * maint.c: Include <chrono> instead of "gdb_sys_time.h", <time.h> and "timeval-utils.h". (scoped_command_stats::~scoped_command_stats) (scoped_command_stats::scoped_command_stats): Use std::chrono::steady_clock instead of gettimeofday. Use user_cpu_time_clock instead of get_run_time. * maint.h: Include "run-time-clock.h" and <chrono>. (scoped_command_stats): <m_start_cpu_time>: Now a user_cpu_time_clock::time_point. <m_start_wall_time>: Now a std::chrono::steady_clock::time_point. * mi/mi-main.c: Include "run-time-clock.h" and <chrono> instead of "gdb_sys_time.h" and <sys/resource.h>. (rusage): Delete. (mi_execute_command): Use new instead of XNEW. (mi_load_progress): Use std::chrono::steady_clock instead of gettimeofday. (timestamp): Rewrite in terms of std::chrono::steady_clock, user_cpu_time_clock and system_cpu_time_clock. (timeval_diff): Delete. (print_diff): Adjust to use std::chrono::steady_clock, user_cpu_time_clock and system_cpu_time_clock. * mi/mi-parse.h: Include "run-time-clock.h" and <chrono> instead of "gdb_sys_time.h". (struct mi_timestamp): Change fields types to std::chrono::steady_clock::time_point, user_cpu_time_clock::time and system_cpu_time_clock::time_point, instead of struct timeval. * symfile.c: Include <chrono> instead of <time.h> and "gdb_sys_time.h". (struct time_range): New. (generic_load): Use std::chrono::steady_clock instead of gettimeofday. (print_transfer_performance): Replace timeval parameters with a std::chrono::steady_clock::duration parameter. Adjust. * utils.c: Include <chrono> instead of "timeval-utils.h", "gdb_sys_time.h", and <time.h>. (prompt_for_continue_wait_time): Now a std::chrono::steady_clock::duration. (defaulted_query, prompt_for_continue): Use std::chrono::steady_clock instead of gettimeofday/timeval_sub/timeval_add. (reset_prompt_for_continue_wait_time): Use std::chrono::steady_clock::duration instead of struct timeval. (get_prompt_for_continue_wait_time): Return a std::chrono::steady_clock::duration instead of struct timeval. (vfprintf_unfiltered): Use std::chrono::steady_clock instead of gettimeofday. Use std::string. Use '.' instead of ':'. * utils.h: Include <chrono>. (get_prompt_for_continue_wait_time): Return a std::chrono::steady_clock::duration instead of struct timeval. gdb/gdbserver/ChangeLog: 2016-11-23 Pedro Alves <palves@redhat.com> * debug.c: Include <chrono> instead of "gdb_sys_time.h". (debug_vprintf): Use std::chrono::steady_clock instead of gettimeofday. Use '.' instead of ':'. * tracepoint.c: Include <chrono> instead of "gdb_sys_time.h". (get_timestamp): Use std::chrono::steady_clock instead of gettimeofday.

…binations This adds a test that exposes several problems fixed by earlier patches: #1 - Buffer overrun when host/target formats match, but sizes don't. https://sourceware.org/ml/gdb-patches/2016-03/msg00125.html #2 - Missing handling for FR-V FR300. https://sourceware.org/ml/gdb-patches/2016-03/msg00117.html #3 - BFD architectures with spaces in their names (v850). https://sourceware.org/ml/binutils/2016-03/msg00108.html #4 - The OS ABI names with spaces issue. https://sourceware.org/ml/gdb-patches/2016-03/msg00116.html #5 - Bogus HP/PA long double format. https://sourceware.org/ml/gdb-patches/2016-03/msg00122.html #6 - Cris big endian internal error. https://sourceware.org/ml/gdb-patches/2016-03/msg00126.html #7 - Several PowerPC bfd archs/machines not handled by gdb. https://sourceware.org/bugzilla/show_bug.cgi?id=19797 And hopefully helps catch others in the future. This started out as a test that simply did, gdb -ex "print 1.0L" to exercise #1 above. Then to cover both 32-bit target / 64-bit host and the converse, I thought of having the testcase print the floats twice, once with the architecture set to "i386" and then to "i386:x86-64". This way it wouldn't matter whether gdb was built as 32-bit or a 64-bit program. Then I thought that other archs might have similar host/target floatformat conversion issues as well. Instead of hardcoding some architectures in the test file, I thought we could just iterate over all bfd architectures and OS ABIs supported by the gdb build being tested. This is what then exposed all the other problems listed above... With an --enable-targets=all, this exercises over 14 thousand combinations. If left in a single test file, it all consistenly runs in under a minute on my machine (An Intel i7-4810MQ @ 2.8 MHZ running Fedora 23). Split in 8 chunks, as in this commit, it runs in around 25 seconds, with make -j8. To avoid flooding the gdb.sum file, it avoids calling "pass" on each tested combination/iteration. I'm explicitly not implementing that by passing an empty message to gdb_test / gdb_test_multiple, because I still want a FAIL to be logged in gdb.sum. So instead this puts the internal passes in the gdb.log file, only, prefixed "IPASS:", for internal pass. TBC, if some iteration fails, it'll still show up as FAIL in gdb.sum. If this is an approach that takes on, I can see us extending the common bits to support it for all testcases. gdb/testsuite/ChangeLog: 2016-12-09 Pedro Alves <palves@redhat.com> * gdb.base/all-architectures-0.exp: New file. * gdb.base/all-architectures-1.exp: New file. * gdb.base/all-architectures-2.exp: New file. * gdb.base/all-architectures-3.exp: New file. * gdb.base/all-architectures-4.exp: New file. * gdb.base/all-architectures-5.exp: New file. * gdb.base/all-architectures-6.exp: New file. * gdb.base/all-architectures-7.exp: New file. * gdb.base/all-architectures.exp.in: New file.

It was pointed out on the mailing list[1] that after this commit: commit b1e0126 Date: Wed Jun 21 14:18:54 2023 +0100 gdb: don't resume vfork parent while child is still running the test gdb.base/vfork-follow-parent.exp now has some failures when run with the native-gdbserver or native-extended-gdbserver boards: FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout) The reason that these failures don't show up when run on the standard unix board is that the test is only run in the default operating mode, so for Linux this will be all-stop on top of non-stop. If we adjust the test script so that it runs in the default mode and with target-non-stop turned off, then we see the same failures on the unix board. This commit includes this change. The way that the test is written means that it is not (currently) possible to turn on non-stop mode and have the test still work, so this commit does not do that. I have also updated the test script so that the vfork child performs an exec as well as the current exit. Exec and exit are the two ways in which a vfork child can release the vfork parent, so testing both of these cases is useful I think. In this test the inferior performs a vfork and the vfork-child immediately exits. The vfork-parent will wait for the vfork-child and then blocks waiting for gdb. Once gdb has released the vfork-parent, the vfork-parent also exits. In the test that fails, GDB sets 'detach-on-fork off' and then runs to the vfork. At this point the test tries to just "continue", but this fails as the vfork-parent is still selected, and the parent can't continue until the vfork-child completes. As the vfork-child is stopped by GDB the parent will never stop once resumed, so GDB refuses to resume it. The test script then sets 'schedule-multiple on' and once again continues. This time GDB, in theory, resumes both the parent and the child, the parent will be held blocked by the kernel, but the child will run until it exits, and which point GDB stops again, this time with inferior 2, the newly exited vfork-child, selected. What happens after this in the test script is irrelevant as far as this failure is concerned. To understand why the test started failing we should consider the behaviour of four different cases: 1. All-stop-on-non-stop before commit b1e0126, 2. All-stop-on-non-stop after commit b1e0126, 3. All-stop-on-all-stop before commit b1e0126, and 4. All-stop-on-all-stop after commit b1e0126. Only case #4 is failing after commit b1e0126, but I think the other cases are interesting because, (a) they inform how we might fix the regression, and (b) it turns out the behaviour of #2 changed too with the commit, but the change was harmless. For #1 All-stop-on-non-stop before commit b1e0126, what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-non-stop, every thread is resumed individually, so GDB tries to resume both the vfork-parent and the vfork-child, both of which succeed, 3. The vfork-parent is held stopped by the kernel, 4. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 5. At this point we might take two paths depending on which event GDB handles first, if GDB handles the VFORK_DONE first then: (a) As GDB is controlling both parent and child the VFORK_DONE is ignored (see handle_vfork_done), the vfork-parent will be resumed, (b) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. Alternatively, if GDB selects the EXITED event first then: (c) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. (d) At some future time the user resumes the vfork-parent, at which point the VFORK_DONE is reported to GDB, however, GDB is ignoring the VFORK_DONE (see handle_vfork_done), so the parent is resumed. For case #2, all-stop-on-non-stop after commit b1e0126, the important difference is in step (2) above, now, instead of resuming both the vfork-parent and the vfork-child, only the vfork-child is resumed. As such, when we get to step (5), only a single event, the EXITED event is reported. GDB handles the EXITED just as in (5)(c), then, later, when the user resumes the vfork-parent, the VFORKED_DONE is immediately delivered from the kernel, but this is ignored just as in (5)(d), and so, though the pattern of when the vfork-parent is resumed changes, the overall pattern of which events are reported and when, doesn't actually change. In fact, by not resuming the vfork-parent, the order of events (in this test) is now deterministic, which (maybe?) is a good thing. If we now consider case #3, all-stop-on-all-stop before commit b1e0126, then what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-all-stop, the resume is passed down to the linux-nat target, the vfork-parent is the event thread, while the vfork-child is a sibling of the event thread, 3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this causes the vfork-child to be resumed. Then in linux_nat_target::resume, the event thread, the vfork-parent, is also resumed. 4. The vfork-parent is held stopped by the kernel, 5. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 6. We are now in a situation identical to step (5) as for all-stop-on-non-stop above, GDB selects one of the events to handle, and whichever we select the user sees the correct behaviour. And so, finally, we can consider #4, all-stop-on-all-stop after commit b1e0126, this is the case that started failing. We start out just like above, in proceed, the resume_ptid is -1 (resume everything), due to schedule multiple being on. And just like above, due to the target being all-stop, we call proceed_resume_thread_checked just once, for the current thread, which, remember, is the vfork-parent thread. The change in commit b1e0126 was to avoid resuming a vfork-parent thread, read the commit message for the justification for this change. However, this means that GDB now rejects resuming the vfork-parent in this case, which means that nothing gets resumed! Obviously, if nothing resumes, then nothing will ever stop, and so GDB appears to hang. I considered a couple of solutions which, in the end, I didn't go with, these were: 1. Move the vfork-parent check out of proceed_resume_thread_checked, and place it in proceed, but only on the all-stop-on-non-stop path, this should still address the issue seen in b1e0126, but would avoid the issue seen here. I rejected this just because it didn't feel great to split the checks that exist in proceed_resume_thread_checked like this, 2. Extend the condition in proceed_resume_thread_checked by adding a target_is_non_stop_p check. This would have the same effect as idea 1, but leaves all the checks in the same place, which I think would be better, but this still just didn't feel right to me, and so, What I noticed was that for the all-stop-on-non-stop, after commit b1e0126, we only resumed the vfork-child, and this seems fine. The vfork-parent isn't going to run anyway (the kernel will hold it back), so if feels like we there's no harm in just waiting for the child to complete, and then resuming the parent. So then I started looking at follow_fork, which is called from the top of proceed. This function already has the task of switching between the parent and child based on which the user wishes to follow. So, I wondered, could we use this to switch to the vfork-child in the case that we are attached to both? Turns out this is pretty simple to do. Having done that, now the process is for all-stop-on-all-stop after commit b1e0126, and with this new fix is: 1. GDB calls proceed with the vfork-parent selected, but, 2. In follow_fork, and follow_fork_inferior, GDB switches the selected thread to be that of the vfork-child, 3. Back in proceed user_visible_resume_ptid returns -1 (everything) as the resume_ptid still, but now, 4. When GDB calls proceed_resume_thread_checked, the vfork-child is the current selected thread, this is not a vfork-parent, and so GDB allows the proceed to continue to the linux-nat target, 5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this does not resume the vfork-parent (because it is a vfork-parent), and then the vfork-child is resumed as this is the event thread, At this point we are back in the same situation as for all-stop-on-non-stop after commit b1e0126, that is, the vfork-child is resumed, while the vfork-parent is held stopped by GDB. Eventually the vfork-child will exit or exec, at which point the vfork-parent will be resumed. [1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/

After running a number of programs under Windows gdb and detaching them, I typed run in gdb, and got a hang, here: (top-gdb) bt #0 sharing_input_terminal (pid=4672) at /home/pedro/gdb/src/gdb/mingw-hdep.c:388 #1 0x00007ff71a2d8678 in sharing_input_terminal (inf=0x23bf23dafb0) at /home/pedro/gdb/src/gdb/inflow.c:269 #2 0x00007ff71a2d887b in child_terminal_save_inferior (self=0x23bf23de060) at /home/pedro/gdb/src/gdb/inflow.c:423 #3 0x00007ff71a2c80c0 in inf_child_target::terminal_save_inferior (this=0x23bf23de060) at /home/pedro/gdb/src/gdb/inf-child.c:111 #4 0x00007ff71a429c0f in target_terminal_is_ours_kind (desired_state=target_terminal_state::is_ours_for_output) at /home/pedro/gdb/src/gdb/target.c:1037 #5 0x00007ff71a429e02 in target_terminal::ours_for_output () at /home/pedro/gdb/src/gdb/target.c:1094 #6 0x00007ff71a2ccc8e in post_create_inferior (from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:245 #7 0x00007ff71a2cd431 in run_command_1 (args=0x0, from_tty=0, run_how=RUN_NORMAL) at /home/pedro/gdb/src/gdb/infcmd.c:502 #8 0x00007ff71a2cd58b in run_command (args=0x0, from_tty=0) at /home/pedro/gdb/src/gdb/infcmd.c:527 The problem is that the loop around GetConsoleProcessList looped forever, because there were exactly 10 processes to return. GetConsoleProcessList's documentation says: If the buffer is too small to hold all the valid process identifiers, the return value is the required number of array elements. The function will have stored no identifiers in the buffer. In this situation, use the return value to allocate a buffer that is large enough to store the entire list and call the function again. In this case, the buffer wasn't too small, it was exactly the right size, so we should have broken out of the loop. We didn't due to a "<" check that should have been "<=". That is fixed by this patch. Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Eli Zaretskii <eliz@gnu.org> Change-Id: I14e4909f2ac2fa83d0d9b6e64418b5831ac4e4e3

When running test-case gdb.base/add-symbol-file-attach.exp with target board unix/-m32, we run into: ... (gdb) attach 3955^M Attaching to process 3955^M Load new symbol table from "add-symbol-file-attach"? (y or n) y^M Reading symbols from add-symbol-file-attach/add-symbol-file-attach...^M Reading symbols from /lib/libm.so.6...^M Reading symbols from /usr/lib/debug/lib/libm-2.31.so-i386.debug...^M Reading symbols from /lib/libc.so.6...^M Reading symbols from /usr/lib/debug/lib/libc-2.31.so-i386.debug...^M Reading symbols from /lib/ld-linux.so.2...^M Reading symbols from /usr/lib/debug/lib/ld-2.31.so-i386.debug...^M 0xf7f53549 in __kernel_vsyscall ()^M (gdb) FAIL: gdb.base/add-symbol-file-attach.exp: attach ... The test fails because this regexp is used: ... -re ".*in \[_A-Za-z0-9\]*pause.*$gdb_prompt $" { ... The regexp attempts to detect that the exec is somewhere in pause (): ... int main (int argc, char **argv) { pause (); return 0; } ... but when the exec is blocked in pause, the backtrace is: ... (gdb) bt #0 0xf7fd2549 in __kernel_vsyscall () #1 0xf7d84966 in __libc_pause () at ../sysdeps/unix/sysv/linux/pause.c:29 #2 0x0804844c in main (argc=1, argv=0xffffce84) at /data/vries/gdb/src/gdb/testsuite/gdb.base/add-symbol-file-attach.c:26 ... We could simply extend the regexp to also match __kernel_vsyscall, but the more fundamental problem is that the test is racy. The attach can happen before the exec is blocked in pause (), somewhere in the dynamic linker resolving the call to pause, in main or even earlier. Note that for the test-case to be effective, the exec is not required to be in pause (). I added a "while (1);" loop at the start of main, reverted the patch fixing the corresponding PR and reproduced the problem it's supposed to detect. Fix this by simply matching the "Reading symbols from" line, similar to what an earlier test is doing. While we're at it, rewrite the earlier test to also use the -wrap idiom. Tested on x86_64-linux.

This commit fixes an issue that was discovered while writing the tests for the previous commit. I noticed that, when GDB restarts an inferior, the executable_changed event would trigger twice. The first notification would originate from: #0 exec_file_attach (filename=0x4046680 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x00000000006f3adb in reopen_exec_file () at ../../src/gdb/corefile.c:122 #2 0x0000000000e6a3f2 in generic_mourn_inferior () at ../../src/gdb/target.c:3682 #3 0x0000000000995121 in inf_child_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-child.c:192 #4 0x0000000000995cff in inf_ptrace_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/inf-ptrace.c:125 #5 0x0000000000a32472 in linux_nat_target::mourn_inferior (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3609 #6 0x0000000000e68a40 in target_mourn_inferior (ptid=...) at ../../src/gdb/target.c:2761 #7 0x0000000000a323ec in linux_nat_target::kill (this=0x2fe95c0 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3593 #8 0x0000000000e64d1c in target_kill () at ../../src/gdb/target.c:924 #9 0x00000000009a19bc in kill_if_already_running (from_tty=1) at ../../src/gdb/infcmd.c:328 #10 0x00000000009a1a6f in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:381 #11 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 #12 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 While the second originates from: #0 exec_file_attach (filename=0x3d7a1d0 "/tmp/hello.x", from_tty=0) at ../../src/gdb/exec.c:513 #1 0x0000000000dfe525 in reread_symbols (from_tty=1) at ../../src/gdb/symfile.c:2517 #2 0x00000000009a1a98 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_STOP_AT_MAIN) at ../../src/gdb/infcmd.c:398 #3 0x00000000009a20a5 in start_command (args=0x0, from_tty=1) at ../../src/gdb/infcmd.c:527 #4 0x000000000068dc5d in do_simple_func (args=0x0, from_tty=1, c=0x35c7200) at ../../src/gdb/cli/cli-decode.c:95 In the first case the call to exec_file_attach first passes through reopen_exec_file. The reopen_exec_file performs a modification time check on the executable file, and only calls exec_file_attach if the executable has changed on disk since it was last loaded. However, in the second case things work a little differently. In this case GDB is really trying to reread the debug symbol. As such, we iterate over the objfiles list, and for each of those we check the modification time, if the file on disk has changed then we reload the debug symbols from that file. However, there is an additional check, if the objfile has the same name as the executable then we will call exec_file_attach, but we do so without checking the cached modification time that indicates when the executable was last reloaded, as a result, we reload the executable twice. In this commit I propose that reread_symbols be changed to unconditionally call reopen_exec_file before performing the objfile iteration. This will ensure that, if the executable has changed, then the executable will be reloaded, however, if the executable has already been recently reloaded, we will not reload it for a second time. After handling the executable, GDB can then iterate over the objfiles list and reload them in the normal way. With this done I now see the executable reloaded only once when GDB restarts an inferior, which means I can remove the kfail that I added to the gdb.python/py-exec-file.exp test in the previous commit. Approved-By: Tom Tromey <tom@tromey.com>

It was pointed out on the mailing list that a recently added test (gdb.python/py-progspace-events.exp) was failing when run with the native-extended-gdbserver board. This test was added with this commit: commit 59912fb Date: Tue Sep 19 11:45:36 2023 +0100 gdb: add Python events for program space addition and removal It turns out though that the test is failing due to a existing bug in GDB, the new test just exposes the problem. Additionally, the failure really doesn't even rely on the new functionality added in the above commit. I reduced the test to a simple set of steps that reproduced the failure and tested against GDB 13, and the test passes; so the bug was introduced since then. In fact, the bug was introduced with this commit: commit a282736 Date: Fri Sep 8 15:48:16 2023 +0100 gdb: remove final user of the executable_changed observer This commit changed how the per-inferior auxv data cache is managed, specifically, when the cache is cleared, and it is this that leads to the failure. This bug is interesting because it exposes a number of issues with GDB, I'll explain all of the problems I see, though ultimately, I only propose fixing one problem in this commit, which is enough to resolve the crash we are currently seeing. The crash that we are seeing manifests like this: ... [Inferior 2 (process 3970384) exited normally] +inferior 1 [Switching to inferior 1 [process 3970383] (/tmp/build/gdb/testsuite/outputs/gdb.python/py-progspace-events/py-progspace-events)] [Switching to thread 1.1 (Thread 3970383.3970383)] #0 breakpt () at /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-progspace-events.c:28 28 { /* Nothing. */ } (gdb) step +step terminate called after throwing an instance of 'gdb_exception_error' Fatal signal: Aborted ... etc ... What's happening is that GDB attempts to refill the auxv cache as a result of the gdbarch_has_shared_address_space call in program_space::~program_space, the backtrace looks like this: #0 0x00007fb4f419a9a5 in raise () from /lib64/libpthread.so.0 #1 0x00000000008b635d in handle_fatal_signal (sig=6) at ../../src/gdb/event-top.c:912 #2 <signal handler called> #3 0x00007fb4f38e3625 in raise () from /lib64/libc.so.6 #4 0x00007fb4f38cc8d9 in abort () from /lib64/libc.so.6 #5 0x00007fb4f3c70756 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6 #6 0x00007fb4f3c7c6dc in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6 #7 0x00007fb4f3c7b6e9 in __cxa_call_terminate () from /lib64/libstdc++.so.6 #8 0x00007fb4f3c7c094 in __gxx_personality_v0 () from /lib64/libstdc++.so.6 #9 0x00007fb4f3a80c63 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1 #10 0x00007fb4f3a8154e in _Unwind_Resume () from /lib64/libgcc_s.so.1 #11 0x0000000000e8832d in target_read_alloc_1<unsigned char> (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2266 #12 0x0000000000e73dea in target_read_alloc (ops=0x408a3a0, object=TARGET_OBJECT_AUXV, annex=0x0) at ../../src/gdb/target.c:2315 #13 0x000000000058248c in target_read_auxv_raw (ops=0x408a3a0) at ../../src/gdb/auxv.c:379 #14 0x000000000058243d in target_read_auxv () at ../../src/gdb/auxv.c:368 #15 0x000000000058255c in target_auxv_search (match=0x0, valp=0x7ffdee17c598) at ../../src/gdb/auxv.c:415 #16 0x0000000000a464bb in linux_is_uclinux () at ../../src/gdb/linux-tdep.c:433 #17 0x0000000000a464f6 in linux_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/linux-tdep.c:440 #18 0x0000000000510eae in gdbarch_has_shared_address_space (gdbarch=0x409a2d0) at ../../src/gdb/gdbarch.c:4889 #19 0x0000000000bc7558 in program_space::~program_space (this=0x4544aa0, __in_chrg=<optimized out>) at ../../src/gdb/progspace.c:124 #20 0x00000000009b245d in delete_inferior (inf=0x47b3de0) at ../../src/gdb/inferior.c:290 #21 0x00000000009b2c10 in prune_inferiors () at ../../src/gdb/inferior.c:480 #22 0x00000000009c5e3e in fetch_inferior_event () at ../../src/gdb/infrun.c:4558 #23 0x000000000099b4dc in inferior_event_handler (event_type=INF_REG_EVENT) at ../../src/gdb/inf-loop.c:42 #24 0x0000000000cbc64f in remote_async_serial_handler (scb=0x4090a30, context=0x408a6b0) at ../../src/gdb/remote.c:14859 #25 0x0000000000d83d3a in run_async_handler_and_reschedule (scb=0x4090a30) at ../../src/gdb/ser-base.c:138 #26 0x0000000000d83e1f in fd_event (error=0, context=0x4090a30) at ../../src/gdb/ser-base.c:189 So this is problem #1, if we throw an exception while deleting a program_space then this is not caught, and is going to crash GDB. Problem #2 becomes evident when we ask why GDB is throwing an error in this case; the error is thrown because the remote target, operating in non-async mode, can't read the auxv data while an inferior is running and GDB is waiting for a stop reply. The problem here then, is why does GDB get into a position where it tries to interact with the remote target in this way, at this time? The problem is caused by the prune_inferiors call which can be seen in the above backtrace. In prune_inferiors we check if the inferior is deletable, and if it is, we delete it. The problem is, I think, we should also check if the target is currently in a state that would allow us to delete the inferior. We don't currently have such a check available, we'd need to add one, but for the remote target, this would return false if the remote is in async mode and the remote is currently waiting for a stop reply. With this change in place GDB would defer deleting the inferior until the remote target has stopped, at which point GDB would be able to refill the auxv cache successfully. And then, problem #3 becomes evident when we ask why GDB is needing to refill the auxv cache now when it didn't need to for GDB 13. This is where the second commit mentioned above (a282736) comes in. Prior to this commit, the auxv cache was cleared by the executable_changed observer, while after that commit the auxv cache was cleared by the new_objfile observer -- but only when the new_objfile observer is used in the special mode that actually means that all objfiles have been unloaded (I know, the overloading of the new_objfile observer is horrible, and unnecessary, but it's not really important for this bug). The difference arises because the new_objfile observer is triggered from clear_symtab_users, which in turn is called from program_space::~program_space. The new_objfile observer for auxv does this: static void auxv_new_objfile_observer (struct objfile *objfile) { if (objfile == nullptr) invalidate_auxv_cache_inf (current_inferior ()); } That is, when all the objfiles are unloaded, we clear the auxv cache for the current inferior. The problem is, then when we look at the prune_inferiors -> delete_inferior -> ~program_space path, we see that the current inferior is not going to be an inferior that exists within the program_space being deleted; delete_inferior removes the deleted inferior from the global inferior list, and then only deletes the program_space if program_space::empty() returns true, which is only the case if the current inferior isn't within the program_space to delete, and no other inferior exists within that program_space either. What this means is that when the new_objfile observer is called we can't rely on the current inferior having any relationship with the program space in which the objfiles were removed. This was an error in the commit a282736, the only thing we can rely on is the current program space. As a result of this mistake, after commit a282736, GDB was sometimes clearing the auxv cache for a random inferior. In the native target case this was harmless as we can always refill the cache when needed, but in the remote target case, if we need to refill the cache when the remote target is executing, then we get the crash we observed. And additionally, if we think about this a little more, we see that commit a282736 made another mistake. When all the objfiles are removed, they are removed from a program_space, a program_space might contain multiple inferiors, so surely, we should clear the auxv cache for all of the matching inferiors? Given these two insights, that the current_inferior is not relevant, only the current_program_space, and that we should be clearing the cache for all inferiors in the current_program_space, we can update auxv_new_objfile_observer to: if (objfile == nullptr) { for (inferior *inf : all_inferiors ()) { if (inf->pspace == current_program_space) invalidate_auxv_cache_inf (inf); } } With this change we now correctly clear the auxv cache for the correct inferiors, and GDB no longer needs to refill the cache at an inconvenient time, this avoids the crash we were seeing. And finally, we reach problem #4. Inspired by the observation that using the current_inferior from within the ~program_space function was not correct, I added some debug to see if current_inferior() was called anywhere else (below ~program_space), and the answer is yes, it's called a often. Mostly the culprit is GDB doing: current_inferior ()->top_target ()-> .... But I think all of these calls are most likely doing the wrong thing, and only work because the top target in all these cases is shared between all inferiors, e.g. it's the native target, or the remote target for all inferiors. But if we had a truly multi-connection setup, then we might start to see odd behaviour. Problem #1 I'm just ignoring for now, I guess at some point we might run into this again, and then we'd need to solve this. But in this case I wasn't sure what a "good" solution would look like. We need the auxv data in order to implement the linux_is_uclinux() function. If we can't get the auxv data then what should we do, assume yes, or assume no? The right answer would probably be to propagate the error back up the stack, but then we reach ~program_space, and throwing exceptions from a destructor is problematic, so we'd need to catch and deal at this point. The linux_is_uclinux() call is made from within gdbarch_has_shared_address_space(), which is used like: if (!gdbarch_has_shared_address_space (target_gdbarch ())) delete this->aspace; So, we would have to choose; delete the address space or not. If we delete it on error, then we might delete an address space that is shared within another program space. If we don't delete the address space, then we might leak it. Neither choice is great. A better solution might be to have the address spaces be reference counted, then we could remove the gdbarch_has_shared_address_space call completely, and just rely on the reference count to auto-delete the address space when appropriate. The solution for problem #2 I already hinted at above, we should have a new target_can_delete_inferiors() call, which should be called from prune_inferiors, this would prevent GDB from trying to delete inferiors when a (remote) target is in a state where we know it can't delete the inferior. Deleting an inferior often (always?) requires sending packets to the remote, and if the remote is waiting for a stop reply then this will never work, so the pruning should be deferred in this case. The solution for problem #3 is included in this commit. And, for problem #4, I'm not sure what the right solution is. Maybe delete_inferior should ensure the inferior to be deleted is in place when ~program_space is called? But that seems a little weird, as the current inferior would, in theory, still be using the current program_space... Anyway, after this commit, the gdb.python/py-progspace-events.exp test now passes when run with the native-extended-remote board. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30935 Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I41f0e6e2d7ecc1e5e55ec170f37acd4052f46eaf

Overview ======== Consider the following situation, GDB is in non-stop mode, the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits. This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets. This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correct for native and remote targets. Native Linux Target =================== For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited. GDB then detaches from everything except the main thread by calling detach_callback. After this the first problem is this assert: /* Only the initial process should be left right now. */ gdb_assert (num_lwps (pid) == 1); The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes: gdb_assert (num_lwps (pid) == 1 || (target_is_non_stop_p () && num_lwps (pid) == 0)); The next problem is that we do: main_lwp = find_lwp_pid (ptid_t (pid)); and then proceed assuming that main_lwp is not nullptr. In the case that the main thread has exited though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching, GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success. For Remote Targets ================== On remote targets there are two problems. First is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this: [remote] Sending packet: $z0,7f1648fe0241,1#35 [remote] Notification received: Stop:W0;process:2a0ac8 # At this point an unpatched gdbserver segfaults, and the connection # is broken. A patched gdbserver continues as below... [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff00a8,1#68 [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff132f,1#6b [remote] Packet received: E01 [remote] Sending packet: $D;2a0ac8#3e [remote] Packet received: E01 I was originally running into Segmentation Faults, from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this: #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982 #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093 #2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372 #3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498 ... The problem is that, as a result non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint -- the process has already exited -- so I think the best solution is for gdbserver just to report an error, which is what I've done. The second problem I ran into was on the gdb side, as the process has already exited, but GDB has not yet acknowledged the exit event, the detach -- the 'D' packet in the above trace -- fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception. Though clearly the detach has returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case, there's a good argument that the remote error can be ignored -- if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine. We could possibly check for a pending exit event before sending the detach packet, however, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks -- we just send the detach request, and if it fails, check to see if the process has already exited. Testing ======= In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small I've never managed to trigger this in real use -- I originally spotted this issue while working on another patch, which did manage to trigger this issue. However, if we trigger both the exit and the detach from a single Python function then we never return to GDB's event loop, as such GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call. And so that is the approach I've taken for testing this patch. Tested-By: Kevin Buettner <kevinb@redhat.com> Approved-By: Kevin Buettner <kevinb@redhat.com>

I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643

(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>

This commit extends the logic added by these two commits from a while ago: #1 7b96196 (gdbserver: hide fork child threads from GDB), #2 df5ad10 (gdb, gdbserver: detach fork child when detaching from fork parent) ... to handle thread clone events, which are very similar to (v)fork events. For #1, we want to hide clone children as well, so just update the comments. For #2, unlike (v)fork children, pending clone children aren't full processes, they're just threads, so don't detach them in handle_detach. linux-low.cc will take care of detaching them along with all other threads of the process, there's nothing special that needs to be done. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I7f5901d07efda576a2522d03e183994e071b8ffc

Running the gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase added later in the series against gdbserver, after the TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run into an infinite loop in stop_all_threads, leading to a timeout: FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout) The is really a latent bug, and it is about the fact that stop_all_threads stops listening to events from a target as soon as it sees a TARGET_WAITKIND_NO_RESUMED, ignoring that TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows how to handle delayed no-resumed events, but stop_all_threads was never taught to. In more detail, here's what happens with that testcase: #1 - Multiple threads report breakpoint hits to gdb. #2 - gdb picks one events, and it's for thread 1. All other stops are left pending. thread 1 needs to move past a breakpoint, so gdb stops all threads to start an inline step over for thread 1. While stopping threads, some of the threads that were still running report events that are also left pending. #2 - gdb steps thread 1 #3 - Thread 1 exits while stepping (it steps over an exit syscall), gdbserver reports thread exit for thread 1 #4 - Thread 1 was the last resumed thread, so gdbserver also reports no-resumed: [remote] Notification received: Stop:w0;p3445d0.3445d3 [remote] Sending packet: $vStopped#55 [remote] Packet received: N [remote] Sending packet: $vStopped#55 [remote] Packet received: OK #5 - gdb processes the thread exit for thread 1, finishes the step over and restarts threads. #6 - gdb picks the next event to process out of one of the resumed threads with pending events: [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11 #7 - This is again a breakpoint hit and the breakpoint needs to be stepped over too, so gdb starts a step-over dance again. #8 - We reach stop_all_threads, which finds that some threads need to be stopped. #9 - wait_one finally consumes the no-resumed event queue by #4. Seeing this, wait_one disable target async, to stop listening for events out of the remote target. #10 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. #11 - Because the remote target is no longer async, and there are no other targets, wait_one return no-resumed immediately without polling the remote target. #12 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. goto #11, looping forever. Fix this by explicitly enabling/re-enabling target async on targets that can async, before waiting for stops. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f

This commit adds a new extension_language_ops hook which allows an extension to handle the case where GDB can't find a separate debug information file for a particular objfile. This commit doesn't actually implement the hook for any of GDB's extension languages, the next commit will do that. This commit just adds support for the hook to extension-priv.h and extension.[ch], and then reworks symfile-debug.c to call the hook. Right now the hook will always return its default value, which means GDB should do nothing different. As such, there should be no user visible changes after this commit. I'll give a brief description of what the hook does here so that we can understand the changes in symfile-debug.c. The next commit adds a Python implementation for this new hook, and gives a fuller description of the new functionality. Currently, when looking for separate debug information GDB tries three things, in this order: 1. Use the build-id to find the required debug information, 2. Check for .gnu_debuglink section and use that to look up the required debug information, 3. Check with debuginfod to see if it can supply the required information. The new extension_language_ops::handle_missing_debuginfo hook is called if all three steps fail to find any debug information. The hook has three possible return values: a. Nothing, no debug information is found, GDB continues without the debug information for this objfile. This matches the current behaviour of GDB, and is the default if nothing is implementing this new hook, b. Install debug information into a location that step #1 or #2 above would normally check, and then request that GDB repeats steps #1 and #2 in the hope that GDB will now find the debug information. If the debug information is still not found then GDB carries on without the debug information. If the debug information is found the GDB loads it and carries on, c. Return a filename for a file containing the required debug information. GDB loads the contents of this file and carries on. The changes in this commit mostly involve placing the core of objfile::find_and_add_separate_symbol_file into a loop which allows for steps #1 and #2 to be repeated. We take care to ensure that debuginfod is only queried once, the first time through. The assumption is that no extension is going to be able to control the replies from debuginfod, so there's no point making a second request -- and as these requests go over the network, they could potentially be slow. The warnings that find_and_add_separate_symbol_file collects are displayed only once assuming that no debug information is found. If debug information is found, even after the extension has operated, then the warnings are not shown; remember, these are warnings from GDB about failure to find any suitable debug information, so it makes sense to hide these if debug information is found. Approved-By: Tom Tromey <tom@tromey.com>

On aarch64-linux, with gcc 13.2.1, I run into: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M #1 0x0000fffff7f20194 in lib2_func4 () at solib-search-lib2.c:50^M #2 0x0000fffff7f70194 in lib1_func3 () at solib-search-lib1.c:50^M #3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M #5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) FAIL: gdb.base/solib-search.exp: backtrace (with wrong libs) ... The FAIL is generated by this code in the test-case: ... if { $expect_fail } { # If the backtrace output is correct the test isn't sufficiently # testing what it should. if { $count == $total_expected } { set fail 1 } ... The test-case: - builds two versions of two shared libs, a "right" and "wrong" version, the difference being an additional dummy function (called spacer function), - uses the "right" version to generate a core file, - uses the "wrong" version to interpret the core file, and - generates a backtrace. The intent is that the backtrace is incorrect due to using the "wrong" version, but actually it's correct. This is because the spacer functions aren't large enough. Fix this by increasing the size of the spacer functions by adding a dummy loop, after which we have, as expected, an incorrect backtrace: ... (gdb) backtrace^M #0 break_here () at solib-search.c:30^M #1 0x0000fffff7f201c0 in ?? ()^M #2 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #3 0x0000fffff7f20174 in lib2_func2 () at solib-search-lib2.c:30^M #4 0x0000fffff7f70174 in lib1_func1 () at solib-search-lib1.c:30^M #5 0x00000000004101b4 in main () at solib-search.c:23^M (gdb) PASS: gdb.base/solib-search.exp: \ backtrace (with wrong libs) (data collection) PASS: gdb.base/solib-search.exp: backtrace (with wrong libs) ... Tested on aarch64-linux.

The testsuite for SCFI contains target-specific tests. When a test is executed with --scfi=experimental command line option, the CFI annotations in the test .s files are skipped altogether by the GAS for processing. The CFI directives in the input assembly files are, however, validated by running the assembler one more time without --scfi=experimental. Some testcases are used to highlight those asm constructs that the SCFI machinery in GAS currently does not support: - Only System V AMD64 ABI is supported for now. Using either --32 or --x32 with SCFI results in hard error. See scfi-unsupported-1.s. - Untraceable stack-pointer manipulation in function epilougue and prologue. See scfi-unsupported-2.s. - Using Dynamically Realigned Arguement Pointer (DRAP) register to realign the stack. For SCFI, the CFA must be only REG_SP or REG_FP based. See scfi-unsupported-drap-1.s Some testcases are used to highlight some diagnostics that the SCFI machinery in GAS currently issues, with an intent to help user correct inadvertent errors in their hand-written asm. An error is issued when GAS finds that input asm is not amenable to correct CFI synthesis. - (#1) "Warning: SCFI: Asymetrical register restore" - (#2) "Error: SCFI: usage of REG_FP as scratch not supported" - (#3) "Error: SCFI: unsupported stack manipulation pattern" In case of (#2) and (#3), SCFI generation is skipped for the respective function. Above is a subset of the warnings/errors implemented in the code. gas/testsuite/: * gas/scfi/README: New test. * gas/scfi/x86_64/ginsn-add-1.l: New test. * gas/scfi/x86_64/ginsn-add-1.s: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.l: New test. * gas/scfi/x86_64/ginsn-dw2-regnum-1.s: New test. * gas/scfi/x86_64/ginsn-pop-1.l: New test. * gas/scfi/x86_64/ginsn-pop-1.s: New test. * gas/scfi/x86_64/ginsn-push-1.l: New test. * gas/scfi/x86_64/ginsn-push-1.s: New test. * gas/scfi/x86_64/scfi-add-1.d: New test. * gas/scfi/x86_64/scfi-add-1.l: New test. * gas/scfi/x86_64/scfi-add-1.s: New test. * gas/scfi/x86_64/scfi-add-2.d: New test. * gas/scfi/x86_64/scfi-add-2.l: New test. * gas/scfi/x86_64/scfi-add-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-1.d: New test. * gas/scfi/x86_64/scfi-asm-marker-1.l: New test. * gas/scfi/x86_64/scfi-asm-marker-1.s: New test. * gas/scfi/x86_64/scfi-asm-marker-2.d: New test. * gas/scfi/x86_64/scfi-asm-marker-2.l: New test. * gas/scfi/x86_64/scfi-asm-marker-2.s: New test. * gas/scfi/x86_64/scfi-asm-marker-3.d: New test. * gas/scfi/x86_64/scfi-asm-marker-3.l: New test. * gas/scfi/x86_64/scfi-asm-marker-3.s: New test. * gas/scfi/x86_64/scfi-bp-sp-1.d: New test. * gas/scfi/x86_64/scfi-bp-sp-1.l: New test. * gas/scfi/x86_64/scfi-bp-sp-1.s: New test. * gas/scfi/x86_64/scfi-bp-sp-2.d: New test. * gas/scfi/x86_64/scfi-bp-sp-2.l: New test. * gas/scfi/x86_64/scfi-bp-sp-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-1.d: New test. * gas/scfi/x86_64/scfi-callee-saved-1.l: New test. * gas/scfi/x86_64/scfi-callee-saved-1.s: New test. * gas/scfi/x86_64/scfi-callee-saved-2.d: New test. * gas/scfi/x86_64/scfi-callee-saved-2.l: New test. * gas/scfi/x86_64/scfi-callee-saved-2.s: New test. * gas/scfi/x86_64/scfi-callee-saved-3.d: New test. * gas/scfi/x86_64/scfi-callee-saved-3.l: New test. * gas/scfi/x86_64/scfi-callee-saved-3.s: New test. * gas/scfi/x86_64/scfi-callee-saved-4.d: New test. * gas/scfi/x86_64/scfi-callee-saved-4.l: New test. * gas/scfi/x86_64/scfi-callee-saved-4.s: New test. * gas/scfi/x86_64/scfi-cfg-1.d: New test. * gas/scfi/x86_64/scfi-cfg-1.l: New test. * gas/scfi/x86_64/scfi-cfg-1.s: New test. * gas/scfi/x86_64/scfi-cfg-2.d: New test. * gas/scfi/x86_64/scfi-cfg-2.l: New test. * gas/scfi/x86_64/scfi-cfg-2.s: New test. * gas/scfi/x86_64/scfi-cfi-label-1.d: New test. * gas/scfi/x86_64/scfi-cfi-label-1.l: New test. * gas/scfi/x86_64/scfi-cfi-label-1.s: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.d: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.l: New test. * gas/scfi/x86_64/scfi-cfi-sections-1.s: New test. * gas/scfi/x86_64/scfi-cofi-1.d: New test. * gas/scfi/x86_64/scfi-cofi-1.l: New test. * gas/scfi/x86_64/scfi-cofi-1.s: New test. * gas/scfi/x86_64/scfi-diag-1.l: New test. * gas/scfi/x86_64/scfi-diag-1.s: New test. * gas/scfi/x86_64/scfi-diag-2.l: New test. * gas/scfi/x86_64/scfi-diag-2.s: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.d: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.l: New test. * gas/scfi/x86_64/scfi-dyn-stack-1.s: New test. * gas/scfi/x86_64/scfi-enter-1.d: New test. * gas/scfi/x86_64/scfi-enter-1.l: New test. * gas/scfi/x86_64/scfi-enter-1.s: New test. * gas/scfi/x86_64/scfi-fp-diag-2.l: New test. * gas/scfi/x86_64/scfi-fp-diag-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-1.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-2.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-3.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.d: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.l: New test. * gas/scfi/x86_64/scfi-indirect-mov-4.s: New test. * gas/scfi/x86_64/scfi-indirect-mov-5.s: New test. * gas/scfi/x86_64/scfi-lea-1.d: New test. * gas/scfi/x86_64/scfi-lea-1.l: New test. * gas/scfi/x86_64/scfi-lea-1.s: New test. * gas/scfi/x86_64/scfi-leave-1.d: New test. * gas/scfi/x86_64/scfi-leave-1.l: New test. * gas/scfi/x86_64/scfi-leave-1.s: New test. * gas/scfi/x86_64/scfi-pushq-1.d: New test. * gas/scfi/x86_64/scfi-pushq-1.l: New test. * gas/scfi/x86_64/scfi-pushq-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-1.d: New test. * gas/scfi/x86_64/scfi-pushsection-1.l: New test. * gas/scfi/x86_64/scfi-pushsection-1.s: New test. * gas/scfi/x86_64/scfi-pushsection-2.d: New test. * gas/scfi/x86_64/scfi-pushsection-2.l: New test. * gas/scfi/x86_64/scfi-pushsection-2.s: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.d: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.l: New test. * gas/scfi/x86_64/scfi-selfalign-func-1.s: New test. * gas/scfi/x86_64/scfi-simple-1.d: New test. * gas/scfi/x86_64/scfi-simple-1.l: New test. * gas/scfi/x86_64/scfi-simple-1.s: New test. * gas/scfi/x86_64/scfi-simple-2.d: New test. * gas/scfi/x86_64/scfi-simple-2.l: New test. * gas/scfi/x86_64/scfi-simple-2.s: New test. * gas/scfi/x86_64/scfi-sub-1.d: New test. * gas/scfi/x86_64/scfi-sub-1.l: New test. * gas/scfi/x86_64/scfi-sub-1.s: New test. * gas/scfi/x86_64/scfi-sub-2.d: New test. * gas/scfi/x86_64/scfi-sub-2.l: New test. * gas/scfi/x86_64/scfi-sub-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-3.l: New test. * gas/scfi/x86_64/scfi-unsupported-3.s: New test. * gas/scfi/x86_64/scfi-unsupported-4.l: New test. * gas/scfi/x86_64/scfi-unsupported-4.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.l: New test. * gas/scfi/x86_64/scfi-unsupported-cfg-2.s: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-drap-1.s: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.l: New test. * gas/scfi/x86_64/scfi-unsupported-insn-1.s: New test. * gas/scfi/x86_64/scfi-x86-64.exp: New file.

A review comment on the SCFI V4 series was to handle ginsn creation for certain lea opcodes more precisely. Specifically, we should preferably handle the following two cases of lea opcodes similarly: - #1 lea with "index register and scale factor of 1, but no base register", - #2 lea with "no index register, but base register present". Currently, a ginsn of type GINSN_TYPE_OTHER is generated for the case of #1 above. For #2, however, the lea insn is translated to either a GINSN_TYPE_ADD or GINSN_TYPE_MOV depending on whether the immediate for displacement is non-zero or not respectively. Change the handling in x86_ginsn_lea so that both of the above lea manifestations are handled similarly. While at it, remove the code paths creating GINSN_TYPE_OTHER altogether from the function. It makes sense to piggy back on the x86_ginsn_unhandled code path to create GINSN_TYPE_OTHER if the destination register is interesting. This was also suggested in one of the previous review rounds; the other functions already follow that model, so this keeps functions symmetrical looking. gas/ * gas/config/tc-i386.c (x86_ginsn_lea): Handle select lea ops with no base register similar to the case of no index register. Remove creation of GINSN_TYPE_OTHER from the function. gas/testsuite/ * gas/scfi/x86_64/ginsn-lea-1.l: New test. * gas/scfi/x86_64/ginsn-lea-1.s: Likewise. * gas/scfi/x86_64/scfi-x86-64.exp: Add new test.

Bug PR gdb/28313 describes attaching to a process when the executable has been deleted. The bug is for S390 and describes how a user sees a message 'PC not saved'. On x86-64 (GNU/Linux) I don't see a 'PC not saved' message, but instead I see this: (gdb) attach 901877 Attaching to process 901877 No executable file now. warning: Could not load vsyscall page because no executable was specified 0x00007fa9d9c121e7 in ?? () (gdb) bt #0 0x00007fa9d9c121e7 in ?? () #1 0x00007fa9d9c1211e in ?? () #2 0x0000000000000007 in ?? () #3 0x000000002dc8b18d in ?? () #4 0x0000000000000000 in ?? () (gdb) Notice that the addresses in the backtrace don't seem right, quickly heading to 0x7 and finally ending at 0x0. What's going on, in both the s390 case and the x86-64 case is that the architecture's prologue scanner is going wrong and causing the stack unwinding to fail. The prologue scanner goes wrong because GDB has no unwind information. And GDB has no unwind information because, of course, the executable has been deleted. Notice in the example session above we get this line in the output: No executable file now. which indicates that GDB failed to find an executable to debug. For GNU/Linux when GDB tries to find an executable for a given pid we end up calling linux_proc_pid_to_exec_file in gdb/nat/linux-procfs.c. Within this function we call `readlink` on /proc/PID/exe to find the path of the actual executable. If the `readlink` call fails then we already fallback on using /proc/PID/exe as the path to the executable to debug. However, when the executable has been deleted the `readlink` call doesn't fail, but the path that is returned points to a non-existent file. I propose that we add an `access` call to linux_proc_pid_to_exec_file to check that the target file exists and can be read. If the target can't be read then we should fall back to /proc/PID/exe (assuming that /proc/PID/exe can be read). Now on x86-64 the output looks like this: (gdb) attach 901877 Attaching to process 901877 Reading symbols from /proc/901877/exe... Reading symbols from /lib64/libc.so.6... (No debugging symbols found in /lib64/libc.so.6) Reading symbols from /lib64/ld-linux-x86-64.so.2... (No debugging symbols found in /lib64/ld-linux-x86-64.so.2) 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 (gdb) bt #0 0x00007fa9d9c121e7 in nanosleep () from /lib64/libc.so.6 #1 0x00007fa9d9c1211e in sleep () from /lib64/libc.so.6 #2 0x000000000040117e in spin_forever () at attach-test.c:17 #3 0x0000000000401198 in main () at attach-test.c:24 (gdb) which is much better. I've also tagged the bug PR gdb/29782 which concerns the test gdb.server/connect-with-no-symbol-file.exp. After making this change, when running gdb.server/connect-with-no-symbol-file.exp GDB would now pick up the /proc/PID/exe file as the executable in some cases. As GDB is not restarted for the multiple iterations of this test GDB (or rather BFD) would given a warning/error like: (gdb) PASS: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: disconnect set sysroot target: BFD: reopening /proc/3283001/exe: No such file or directory (gdb) FAIL: gdb.server/connect-with-no-symbol-file.exp: sysroot=target:: action=permission: setup: adjust sysroot What's happening is that an executable found for an earlier iteration of the test is still registered for the inferior when we are setting up for a second iteration of the test. When the sysroot changes, if there's an executable registered GDB tries to reopen it, but in this case the file has disappeared (the previous inferior has exited by this point). I did think about maybe, when the executable is /proc/PID/exe, we should auto-delete the file from the inferior. But in the end I thought this was a bad idea. Not only would this require a lot of special code in GDB just to support this edge case: we'd need to track if the exe file name came from /proc and should be auto-deleted, or we'd need target specific code to check if a path should be auto-deleted..... ... in addition, we'd still want to warn the user when we auto-deleted the file from the inferior, otherwise they might be surprised to find their inferior suddenly has no executable attached, so we wouldn't actually reduce the number of warnings the user sees. So in the end I figured that the best solution is to just update the test to avoid the warning. This is easily done by manually removing the executable from the inferior once each iteration of the test has completed. Now, in bug PR gdb/29782 GDB is clearly managing to pick up an executable from the NFS cache somehow. I guess what's happening is that when the original file is deleted /proc/PID/exe is actually pointing to a file in the NFS cache which is only deleted at some later point, and so when GDB starts up we do manage to associate a file with the inferior, this results in the same message being emitted from BFD as I was seeing. The fix included in this commit should also fix that bug. One final note: On x86-64 GNU/Linux, the gdb.server/connect-with-no-symbol-file.exp test will produce 2 core files. This is due to a bug in gdbserver that is nothing to do with this test. These core files are created before and after this commit. I am working on a fix for the gdbserver issue, but will post that separately. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28313 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29782 Approved-By: Tom Tromey <tom@tromey.com>

When running test-case gdb.dap/eof.exp, it occasionally coredumps. The thread triggering the coredump is: ... #0 0x0000ffff42bb2280 in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x0000ffff42b65800 [PAC] in raise () from /lib64/libc.so.6 #2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 #3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 #4 <signal handler called> #5 0x0000000000606080 in cli_ui_out::do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s", args=...) at gdb/cli-out.c:232 #6 0x0000000000ce6358 in ui_out::call_do_message (this=0xffff2f7ed728, style=..., format=0xffff0c002af1 "%s") at gdb/ui-out.c:584 #7 0x0000000000ce6610 in ui_out::vmessage (this=0xffff2f7ed728, in_style=..., format=0x16f93ea "", args=...) at gdb/ui-out.c:621 #8 0x0000000000ce3a9c in ui_file::vprintf (this=0xfffffbea1b18, ...) at gdb/ui-file.c:74 #9 0x0000000000d2b148 in gdb_vprintf (stream=0xfffffbea1b18, format=0x16f93e8 "%s", args=...) at gdb/utils.c:1898 #10 0x0000000000d2b23c in gdb_printf (stream=0xfffffbea1b18, format=0x16f93e8 "%s") at gdb/utils.c:1913 #11 0x0000000000ab5208 in gdbpy_write (self=0x33fe35d0, args=0x342ec280, kw=0x345c08b0) at gdb/python/python.c:1464 #12 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 #13 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 #14 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #15 0x0000ffff434d8cd0 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 #16 0x0000ffff434b9824 [PAC] in PyObject_CallOneArg () from /lib64/libpython3.12.so.1.0 #17 0x0000ffff43557674 [PAC] in PyFile_WriteObject () from /lib64/libpython3.12.so.1.0 #18 0x0000ffff435577a0 [PAC] in PyFile_WriteString () from /lib64/libpython3.12.so.1.0 #19 0x0000ffff43465354 [PAC] in thread_excepthook () from /lib64/libpython3.12.so.1.0 #20 0x0000ffff434ac6e0 [PAC] in cfunction_vectorcall_O () from /lib64/libpython3.12.so.1.0 #21 0x0000ffff434a32d8 [PAC] in PyObject_Vectorcall () from /lib64/libpython3.12.so.1.0 #22 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #23 0x0000ffff434d8d88 [PAC] in method_vectorcall () from /lib64/libpython3.12.so.1.0 #24 0x0000ffff435e0ef4 [PAC] in thread_run () from /lib64/libpython3.12.so.1.0 #25 0x0000ffff43591ec0 [PAC] in pythread_wrapper () from /lib64/libpython3.12.so.1.0 #26 0x0000ffff42bb0584 [PAC] in start_thread () from /lib64/libc.so.6 #27 0x0000ffff42c1fd4c [PAC] in thread_start () from /lib64/libc.so.6 ... The direct cause for the coredump seems to be that cli_ui_out::do_message is trying to write to a stream variable which does not look sound: ... (gdb) p *stream $8 = {_vptr.ui_file = 0x0, m_applied_style = {m_foreground = {m_simple = true, { m_value = 0, {m_red = 0 '\000', m_green = 0 '\000', m_blue = 0 '\000'}}}, m_background = {m_simple = 32, {m_value = 65535, {m_red = 255 '\377', m_green = 255 '\377', m_blue = 0 '\000'}}}, m_intensity = (unknown: 0x438fe710), m_reverse = 255}} ... The string that is being printed is: ... (gdb) p str $9 = "Exception in thread " ... so AFAICT this is a DAP thread running into an exception and trying to print it. If we look at the state of gdb's main thread, we have: ... #0 0x0000ffff42bac914 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6 #1 0x0000ffff42bafb44 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6 #2 0x0000ffff43466e9c [PAC] in take_gil () from /lib64/libpython3.12.so.1.0 #3 0x0000ffff43484fe0 [PAC] in PyEval_RestoreThread () from /lib64/libpython3.12.so.1.0 #4 0x0000000000ab8698 [PAC] in gdbpy_allow_threads::~gdbpy_allow_threads ( this=0xfffffbea1cf8, __in_chrg=<optimized out>) at gdb/python/python-internal.h:769 #5 0x0000000000ab2fec in execute_gdb_command (self=0x33fe35d0, args=0x34297b60, kw=0x34553d20) at gdb/python/python.c:681 #6 0x0000ffff434acedc in cfunction_call () from /lib64/libpython3.12.so.1.0 #7 0x0000ffff4347c500 [PAC] in _PyObject_MakeTpCall () from /lib64/libpython3.12.so.1.0 #8 0x0000ffff43488b64 [PAC] in _PyEval_EvalFrameDefault () from /lib64/libpython3.12.so.1.0 #9 0x0000ffff4353bce8 [PAC] in _PyObject_VectorcallTstate.lto_priv.3 () from /lib64/libpython3.12.so.1.0 #10 0x0000000000ab87fc [PAC] in gdbpy_event::operator() (this=0xffff14005900) at gdb/python/python.c:1061 #11 0x0000000000ab93e8 in std::__invoke_impl<void, gdbpy_event&> (__f=...) at /usr/include/c++/13/bits/invoke.h:61 #12 0x0000000000ab9204 in std::__invoke_r<void, gdbpy_event&> (__fn=...) at /usr/include/c++/13/bits/invoke.h:111 #13 0x0000000000ab8e90 in std::_Function_handler<..>::_M_invoke(...) (...) at /usr/include/c++/13/bits/std_function.h:290 #14 0x000000000062e0d0 in std::function<void ()>::operator()() const ( this=0xffff14005830) at /usr/include/c++/13/bits/std_function.h:591 #15 0x0000000000b67f14 in run_events (error=0, client_data=0x0) at gdb/run-on-main-thread.c:76 #16 0x000000000157e290 in handle_file_event (file_ptr=0x33dae3a0, ready_mask=1) at gdbsupport/event-loop.cc:573 #17 0x000000000157e760 in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:694 #18 0x000000000157d464 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:264 #19 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 #20 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 #21 0x000000000094567c in captured_main (data=0xfffffbea23e8) at gdb/main.c:1335 #22 0x0000000000945700 in gdb_main (args=0xfffffbea23e8) at gdb/main.c:1354 #23 0x0000000000423ab4 in main (argc=14, argv=0xfffffbea2578) at gdb/gdb.c:39 ... AFAIU, there's a race between the two threads on gdb_stderr: - the DAP thread samples the gdb_stderr value, and uses it a bit later to print to - the gdb main thread changes the gdb_stderr value forth and back, using a temporary value for string capture purposes The non-sound stream value is caused by gdb_stderr being sampled while pointing to a str_file object, and used once the str_file object is already destroyed. The error here is that the DAP thread attempts to print to gdb_stderr. Fix this by adding a thread_wrapper that: - catches all exceptions and logs them to dap.log, and - while we're at it, logs when exiting and using the thread_wrapper for each DAP thread. Tested on aarch64-linux. Approved-By: Tom Tromey <tom@tromey.com>

When running test-case gdb.dap/eof.exp, we're likely to get a coredump due to a segfault in new_threadstate. At the point of the core dump, the gdb main thread looks like: ... (gdb) bt #0 0x0000fffee30d2280 in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x0000fffee3085800 [PAC] in raise () from /lib64/libc.so.6 #2 0x00000000007b03e8 [PAC] in handle_fatal_signal (sig=11) at gdb/event-top.c:926 #3 0x00000000007b0470 in handle_sigsegv (sig=11) at gdb/event-top.c:976 #4 <signal handler called> #5 0x0000fffee3a4db14 in new_threadstate () from /lib64/libpython3.12.so.1.0 #6 0x0000fffee3ab0548 [PAC] in PyGILState_Ensure () from /lib64/libpython3.12.so.1.0 #7 0x0000000000a6d034 [PAC] in gdbpy_gil::gdbpy_gil (this=0xffffcb279738) at gdb/python/python-internal.h:787 #8 0x0000000000ab87ac in gdbpy_event::~gdbpy_event (this=0xfffea8001ee0, __in_chrg=<optimized out>) at gdb/python/python.c:1051 #9 0x0000000000ab9460 in std::_Function_base::_Base_manager<...>::_M_destroy (__victim=...) at /usr/include/c++/13/bits/std_function.h:175 #10 0x0000000000ab92dc in std::_Function_base::_Base_manager<...>::_M_manager (__dest=..., __source=..., __op=std::__destroy_functor) at /usr/include/c++/13/bits/std_function.h:203 #11 0x0000000000ab8f14 in std::_Function_handler<...>::_M_manager(...) (...) at /usr/include/c++/13/bits/std_function.h:282 #12 0x000000000042dd9c in std::_Function_base::~_Function_base (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:244 #13 0x000000000042e654 in std::function<void ()>::~function() (this=0xfffea8001c10, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/std_function.h:334 #14 0x0000000000b68e60 in std::_Destroy<std::function<void ()> >(...) (...) at /usr/include/c++/13/bits/stl_construct.h:151 #15 0x0000000000b68cd0 in std::_Destroy_aux<false>::__destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:163 #16 0x0000000000b689d8 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/stl_construct.h:196 #17 0x0000000000b68414 in std::_Destroy<...>(...) (...) at /usr/include/c++/13/bits/alloc_traits.h:948 #18 std::vector<...>::~vector() (this=0x2a183c8 <runnables>) at /usr/include/c++/13/bits/stl_vector.h:732 #19 0x0000fffee3088370 in __run_exit_handlers () from /lib64/libc.so.6 #20 0x0000fffee3088450 [PAC] in exit () from /lib64/libc.so.6 #21 0x0000000000c95600 [PAC] in quit_force (exit_arg=0x0, from_tty=0) at gdb/top.c:1822 #22 0x0000000000609140 in quit_command (args=0x0, from_tty=0) at gdb/cli/cli-cmds.c:508 #23 0x0000000000c926a4 in quit_cover () at gdb/top.c:300 #24 0x00000000007b09d4 in async_disconnect (arg=0x0) at gdb/event-top.c:1230 #25 0x0000000000548acc in invoke_async_signal_handlers () at gdb/async-event.c:234 #26 0x000000000157d2d4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:199 #27 0x0000000000943a84 in start_event_loop () at gdb/main.c:401 #28 0x0000000000943bfc in captured_command_loop () at gdb/main.c:465 #29 0x000000000094567c in captured_main (data=0xffffcb279d08) at gdb/main.c:1335 #30 0x0000000000945700 in gdb_main (args=0xffffcb279d08) at gdb/main.c:1354 #31 0x0000000000423ab4 in main (argc=14, argv=0xffffcb279e98) at gdb/gdb.c:39 ... The direct cause of the segfault is calling PyGILState_Ensure after calling Py_Finalize. AFAICT the problem is a race between the gdb main thread and DAP's JSON writer thread. On one side, we have the following events: - DAP's JSON reader thread reads an EOF, and lets DAP's main thread known by writing None into read_queue - DAP's main thread lets DAP's JSON writer thread known by writing None into write_queue - DAP's JSON writer thread sees the None in its queue, and calls send_gdb("quit") - a corresponding gdbpy_event is deposited in the runnables vector, to be run by the gdb main thread On the other side, we have the following events: - the gdb main thread receives a SIGHUP - the corresponding handler calls quit_force, which calls do_final_cleanups - one of the final cleanups is finalize_python, which calls Py_Finalize - quit_force calls exit, which triggers the exit handlers - one of the exit handlers is the destructor of the runnables vector - destruction of the vector triggers destruction of the remaining element - the remaining element is a gdbpy_event, and the destructor (indirectly) calls PyGILState_Ensure It's good to note that both events (EOF and SIGHUP) are caused by this line in the test-case: ... catch "close -i $gdb_spawn_id" ... where "expect close" closes the stdin and stdout file descriptors, which causes the SIGHUP to be send. So, for the system I'm running this on, the send_gdb("quit") is actually not needed. I'm not sure if we support any systems where it's actually needed. Fix this by removing the send_gdb("quit"). Tested on aarch64-linux. PR dap/31306 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31306

When building gdb with -O0 -fsanitize=address, and running test-case gdb.ada/uninitialized_vars.exp, I run into: ... (gdb) info locals a = 0 z = (a => 1, b => false, c => 2.0) ================================================================= ==66372==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000097f58 at pc 0xffff52c0da1c bp 0xffffc90a1d40 sp 0xffffc90a1d80 READ of size 4 at 0x602000097f58 thread T0 #0 0xffff52c0da18 in memmove (/lib64/libasan.so.8+0x6da18) #1 0xbcab24 in unsigned char* std::__copy_move_backward<false, true, std::random_access_iterator_tag>::__copy_move_b<unsigned char const, unsigned char>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:748 #2 0xbc9bf4 in unsigned char* std::__copy_move_backward_a2<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:769 #3 0xbc898c in unsigned char* std::__copy_move_backward_a1<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:778 #4 0xbc715c in unsigned char* std::__copy_move_backward_a<false, unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:807 #5 0xbc4e6c in unsigned char* std::copy_backward<unsigned char const*, unsigned char*>(unsigned char const*, unsigned char const*, unsigned char*) /usr/include/c++/13/bits/stl_algobase.h:867 #6 0xbc2934 in void gdb::copy<unsigned char const, unsigned char>(gdb::array_view<unsigned char const>, gdb::array_view<unsigned char>) gdb/../gdbsupport/array-view.h:223 #7 0x20e0100 in value::contents_copy_raw(value*, long, long, long) gdb/value.c:1239 #8 0x20e9830 in value::primitive_field(long, int, type*) gdb/value.c:3078 #9 0x20e98f8 in value_field(value*, int) gdb/value.c:3095 #10 0xcafd64 in print_field_values gdb/ada-valprint.c:658 #11 0xcb0fa0 in ada_val_print_struct_union gdb/ada-valprint.c:857 #12 0xcb1bb4 in ada_value_print_inner(value*, ui_file*, int, value_print_options const*) gdb/ada-valprint.c:1042 #13 0xc66e04 in ada_language::value_print_inner(value*, ui_file*, int, value_print_options const*) const (/home/vries/gdb/build/gdb/gdb+0xc66e04) #14 0x20ca1e8 in common_val_print(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1092 #15 0x20caabc in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1184 #16 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 #17 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 #18 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 #19 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 #20 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 #21 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 #22 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 #23 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 #24 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 #25 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 #26 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 #27 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 #28 0x1384080 in command_handler(char const*) gdb/event-top.c:566 #29 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 #30 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 #31 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 #32 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 #33 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 #34 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 #35 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 #36 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 #37 0x35ff9d8 in gdb_wait_for_event gdbsupport/event-loop.cc:694 #38 0x35fd284 in gdb_do_one_event(int) gdbsupport/event-loop.cc:264 #39 0x1768080 in start_event_loop gdb/main.c:408 #40 0x17684c4 in captured_command_loop gdb/main.c:472 #41 0x176cfc8 in captured_main gdb/main.c:1342 #42 0x176d088 in gdb_main(captured_main_args*) gdb/main.c:1361 #43 0xb73edc in main gdb/gdb.c:39 #44 0xffff519b09d8 in __libc_start_call_main (/lib64/libc.so.6+0x309d8) #45 0xffff519b0aac in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x30aac) #46 0xb73c2c in _start (/home/vries/gdb/build/gdb/gdb+0xb73c2c) 0x602000097f58 is located 0 bytes after 8-byte region [0x602000097f50,0x602000097f58) allocated by thread T0 here: #0 0xffff52c65218 in calloc (/lib64/libasan.so.8+0xc5218) #1 0xcbc278 in xcalloc gdb/alloc.c:97 #2 0x35f21e8 in xzalloc(unsigned long) gdbsupport/common-utils.cc:29 #3 0x20de270 in value::allocate_contents(bool) gdb/value.c:937 #4 0x20edc08 in value::fetch_lazy() gdb/value.c:4033 #5 0x20dadc0 in value::entirely_covered_by_range_vector(std::vector<range, std::allocator<range> > const&) gdb/value.c:229 #6 0xcb2298 in value::entirely_optimized_out() gdb/value.h:560 #7 0x20ca6fc in value_check_printable gdb/valprint.c:1133 #8 0x20caa8c in common_val_print_checked(value*, ui_file*, int, value_print_options const*, language_defn const*) gdb/valprint.c:1182 #9 0x196c524 in print_variable_and_value(char const*, symbol*, frame_info_ptr, ui_file*, int) gdb/printcmd.c:2355 #10 0x1d99ca0 in print_variable_and_value_data::operator()(char const*, symbol*) gdb/stack.c:2308 #11 0x1dabca0 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::operator()(gdb::fv_detail::erased_callable, char const*, symbol*) const gdb/../gdbsupport/function-view.h:305 #12 0x1dabd14 in gdb::function_view<void (char const*, symbol*)>::bind<print_variable_and_value_data>(print_variable_and_value_data&)::{lambda(gdb::fv_detail::erased_callable, char const*, symbol*)#1}::_FUN(gdb::fv_detail::erased_callable, char const*, symbol*) gdb/../gdbsupport/function-view.h:299 #13 0x1dab34c in gdb::function_view<void (char const*, symbol*)>::operator()(char const*, symbol*) const gdb/../gdbsupport/function-view.h:289 #14 0x1d9963c in iterate_over_block_locals gdb/stack.c:2240 #15 0x1d99790 in iterate_over_block_local_vars(block const*, gdb::function_view<void (char const*, symbol*)>) gdb/stack.c:2259 #16 0x1d9a598 in print_frame_local_vars gdb/stack.c:2380 #17 0x1d9afac in info_locals_command(char const*, int) gdb/stack.c:2458 #18 0xfd7b30 in do_simple_func gdb/cli/cli-decode.c:95 #19 0xfe5a2c in cmd_func(cmd_list_element*, char const*, int) gdb/cli/cli-decode.c:2735 #20 0x1f03790 in execute_command(char const*, int) gdb/top.c:575 #21 0x1384080 in command_handler(char const*) gdb/event-top.c:566 #22 0x1384e2c in command_line_handler(std::unique_ptr<char, gdb::xfree_deleter<char> >&&) gdb/event-top.c:802 #23 0x1f731e4 in tui_command_line_handler gdb/tui/tui-interp.c:104 #24 0x1382a58 in gdb_rl_callback_handler gdb/event-top.c:259 #25 0x21dbb80 in rl_callback_read_char readline/readline/callback.c:290 #26 0x1382510 in gdb_rl_callback_read_char_wrapper_noexcept gdb/event-top.c:195 #27 0x138277c in gdb_rl_callback_read_char_wrapper gdb/event-top.c:234 #28 0x1fe9b40 in stdin_event_handler gdb/ui.c:155 #29 0x35ff1bc in handle_file_event gdbsupport/event-loop.cc:573 SUMMARY: AddressSanitizer: heap-buffer-overflow (/lib64/libasan.so.8+0x6da18) in memmove ... The error happens when trying to print either variable y or y2: ... type Variable_Record (A : Boolean := True) is record case A is when True => B : Integer; when False => C : Float; D : Integer; end case; end record; Y : Variable_Record := (A => True, B => 1); Y2 : Variable_Record := (A => False, C => 1.0, D => 2); ... when the variables are uninitialized. The error happens only when printing the entire variable: ... (gdb) p y.a $2 = 216 (gdb) p y.b There is no member named b. (gdb) p y.c $3 = 9.18340949e-41 (gdb) p y.d $4 = 1 (gdb) p y <AddressSanitizer: heap-buffer-overflow> ... The error happens as follows: - field a functions as discriminant, choosing either the b, or c+d variant. - when y.a happens to be set to 216, as above, gdb interprets this as the variable having the c+d variant (which is why trying to print y.b fails). - when printing y, gdb allocates a value, copies the bytes into it from the target, and then prints the value. - gdb allocates the value using the type size, which is 8. It's 8 because that's what the DW_AT_byte_size indicates. Note that for valid values of a, it gives correct results: if a is 0 (c+d variant), size is 12, if a is 1 (b variant), size is 8. - gdb tries to print field d, which is at an 8 byte offset, and that results in a out-of-bounds access for the allocated 8-byte value. Fix this by handling this case in value::contents_copy_raw, such that we have: ... (gdb) p y $1 = (a => 24, c => 9.18340949e-41, d => <error reading variable: access outside bounds of object>) ... An alternative (additional) fix could be this: in compute_variant_fields_inner gdb reads the discriminant y.a to decide which variant is active. It would be nice to detect that the value (y.a == 24) is not a valid Boolean, and give up on choosing a variant altoghether. However, the situation regarding the internal type CODE_TYPE_BOOL is currently ambiguous (see PR31282) and it's not possible to reliably decide what valid values are. The test-case source file gdb.ada/uninitialized-variable-record/parse.adb is a reduced version of gdb.ada/uninitialized_vars/parse.adb, so it copies the copyright years. Note that the test-case needs gcc-12 or newer, it's unsupported for older gcc versions. [ So, it would be nice to rewrite it into a dwarf assembly test-case. ] The test-case loops over all languages. This is inherited from an earlier attempt to fix this, which had language-specific fixes (in print_field_values, cp_print_value_fields, pascal_object_print_value_fields and f_language::value_print_inner). I've left this in, but I suppose it's not strictly necessary anymore. Tested on x86_64-linux. PR exp/31258 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31258

From the Python API, we can execute GDB commands via gdb.execute. If the command gives an exception, however, we need to recover the GDB prompt and enable stdin, because the exception does not reach top-level GDB or normal_stop. This was done in commit commit 1ba1ac8 Author: Andrew Burgess <andrew.burgess@embecosm.com> Date: Tue Nov 19 11:17:20 2019 +0000 gdb: Enable stdin on exception in execute_gdb_command with the following code: catch (const gdb_exception &except) { /* If an exception occurred then we won't hit normal_stop (), or have an exception reach the top level of the event loop, which are the two usual places in which stdin would be re-enabled. So, before we convert the exception and continue back in Python, we should re-enable stdin here. */ async_enable_stdin (); GDB_PY_HANDLE_EXCEPTION (except); } In this patch, we explain what happens when we run a GDB command in the context of a synchronous command, e.g. via Python observer notifications. As an example, suppose we have the following objfile event listener, specified in a file named file.py: ~~~ import gdb class MyListener: def __init__(self): gdb.events.new_objfile.connect(self.handle_new_objfile_event) self.processed_objfile = False def handle_new_objfile_event(self, event): if self.processed_objfile: return print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('add-inferior -no-connection') gdb.execute('inferior 2') gdb.execute('target remote | gdbserver - /tmp/a.out') gdb.execute('inferior 1') the_listener = MyListener() ~~~ Using this Python file, we see the behavior below: $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3075406 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3075400] (/tmp/a.out)] [Switching to thread 1.1 (process 3075400)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 (gdb) [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3075400) exited normally] Note how the GDB prompt comes in-between the debugger output. We have this obscure behavior, because the executed command, "target remote", triggers an invocation of `normal_stop` that enables stdin. After that, however, the Python notification context completes and GDB continues with its normal flow of executing the 'run' command. This can be seen in the call stack below: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 #2 0x00005555561b328e in start_remote (from_tty=0) at src/gdb/infrun.c:3801 #3 0x0000555556441224 in remote_target::start_remote_1 (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5225 #4 0x000055555644166c in remote_target::start_remote (this=0x5555587882e0, from_tty=0, extended_p=0) at src/gdb/remote.c:5316 #5 0x00005555564430cf in remote_target::open_1 (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, extended_p=0) at src/gdb/remote.c:6175 #6 0x0000555556441707 in remote_target::open (name=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/remote.c:5338 #7 0x00005555565ea63f in open_target (args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0, command=0x555558589280) at src/gdb/target.c:824 #8 0x0000555555f0d89a in cmd_func (cmd=0x555558589280, args=0x55555878525e "| gdbserver - /tmp/a.out", from_tty=0) at src/gdb/cli/cli-decode.c:2735 #9 0x000055555661fb42 in execute_command (p=0x55555878529e "t", from_tty=0) at src/gdb/top.c:575 #10 0x0000555555f1a506 in execute_control_command_1 (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:529 #11 0x0000555555f1abea in execute_control_command (cmd=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:701 #12 0x0000555555f19fc7 in execute_control_commands (cmdlines=0x555558756f00, from_tty=0) at src/gdb/cli/cli-script.c:411 #13 0x0000555556400d91 in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:700 #14 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #15 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #16 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #17 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #18 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #19 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #20 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #21 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 #22 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761700) at src/gdb/python/py-newobjfileevent.c:52 #23 0x00005555563b53bc in python_new_objfile (objfile=0x555558761700) at src/gdb/python/py-inferior.c:195 #24 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 #25 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 #26 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761700) at /usr/include/c++/11/bits/std_function.h:290 #27 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761700) at /usr/include/c++/11/bits/std_function.h:590 #28 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761700) at src/gdb/../gdbsupport/observable.h:166 #29 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 #30 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d310 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 #31 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 #32 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 #33 0x0000555556539891 in enable_break (info=0x55555874e180, from_tty=0) at src/gdb/solib-svr4.c:2416 #34 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 #35 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 #36 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 #37 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 #38 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 #39 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 #40 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 #41 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 #42 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 #43 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 #44 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 #45 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 #46 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #47 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The use of the "target remote" command here is just an example. In principle, we would reproduce the problem with any command that triggers an invocation of `normal_stop`. To omit enabling the stdin in `normal_stop`, we would have to check the context we are in. Since we cannot do that, we add a new field to `struct ui` to track whether the prompt was already blocked, and set the tracker flag in the Python context before executing a GDB command. After applying this patch, the output becomes ... Reading symbols from a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 [New inferior 2] Added inferior 2 [Switching to inferior 2 [<null>] (<noexec>)] stdin/stdout redirected Process /tmp/a.out created; pid = 3032261 Remote debugging using stdio Reading /tmp/a.out from remote target... ... [Switching to inferior 1 [process 3032255] (/tmp/a.out)] [Switching to thread 1.1 (process 3032255)] #0 0x00007ffff7fe3290 in ?? () from /lib64/ld-linux-x86-64.so.2 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". [Inferior 1 (process 3032255) exited normally] (gdb) Let's now consider a secondary scenario, where the command executed from the Python raises an error. As an example, suppose we have the Python file below: def handle_new_objfile_event(self, event): ... print("loading " + event.new_objfile.filename) self.processed_objfile = True gdb.execute('print a') The executed command, "print a", gives an error because "a" is not defined. Without this patch, we see the behavior below, where the prompt is again placed incorrectly: ... Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. (gdb) [Inferior 1 (process 3980401) exited normally] This time, `async_enable_stdin` is called from the 'catch' block in `execute_gdb_command`: (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x0000555556400f0a in execute_gdb_command (self=0x7ffff43b5d00, args=0x7ffff440ab60, kw=0x0) at src/gdb/python/python.c:713 #2 0x00007ffff7a96023 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #3 0x00007ffff7a4dadc in _PyObject_MakeTpCall () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #4 0x00007ffff79e9a1c in _PyEval_EvalFrameDefault () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #5 0x00007ffff7b303af in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #6 0x00007ffff7a50358 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #7 0x00007ffff7a4f3f4 in ?? () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #8 0x00007ffff7a4f883 in PyObject_CallFunctionObjArgs () from /lib/x86_64-linux-gnu/libpython3.10.so.1.0 #9 0x00005555563a9758 in evpy_emit_event (event=0x7ffff42b5430, registry=0x7ffff42b4690) at src/gdb/python/py-event.c:104 #10 0x00005555563cb874 in emit_new_objfile_event (objfile=0x555558761410) at src/gdb/python/py-newobjfileevent.c:52 #11 0x00005555563b53bc in python_new_objfile (objfile=0x555558761410) at src/gdb/python/py-inferior.c:195 #12 0x0000555555d6dff0 in std::__invoke_impl<void, void (*&)(objfile*), objfile*> (__f=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:61 #13 0x0000555555d6be18 in std::__invoke_r<void, void (*&)(objfile*), objfile*> (__fn=@0x5555585b5860: 0x5555563b5360 <python_new_objfile(objfile*)>) at /usr/include/c++/11/bits/invoke.h:111 #14 0x0000555555d69661 in std::_Function_handler<void (objfile*), void (*)(objfile*)>::_M_invoke(std::_Any_data const&, objfile*&&) (__functor=..., __args#0=@0x7fffffffd080: 0x555558761410) at /usr/include/c++/11/bits/std_function.h:290 #15 0x0000555556314caf in std::function<void (objfile*)>::operator()(objfile*) const (this=0x5555585b5860, __args#0=0x555558761410) at /usr/include/c++/11/bits/std_function.h:590 #16 0x000055555631444e in gdb::observers::observable<objfile*>::notify (this=0x55555836eea0 <gdb::observers::new_objfile>, args#0=0x555558761410) at src/gdb/../gdbsupport/observable.h:166 #17 0x0000555556599b3f in symbol_file_add_with_addrs (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1125 #18 0x0000555556599ca4 in symbol_file_add_from_bfd (abfd=..., name=0x55555875d020 "/lib64/ld-linux-x86-64.so.2", add_flags=..., addrs=0x7fffffffd2f0, flags=..., parent=0x0) at src/gdb/symfile.c:1160 #19 0x0000555556546371 in solib_read_symbols (so=..., flags=...) at src/gdb/solib.c:692 #20 0x0000555556546f0f in solib_add (pattern=0x0, from_tty=0, readsyms=1) at src/gdb/solib.c:1015 #21 0x0000555556539891 in enable_break (info=0x55555874a670, from_tty=0) at src/gdb/solib-svr4.c:2416 #22 0x000055555653b305 in svr4_solib_create_inferior_hook (from_tty=0) at src/gdb/solib-svr4.c:3058 #23 0x0000555556547cee in solib_create_inferior_hook (from_tty=0) at src/gdb/solib.c:1217 #24 0x0000555556196f6a in post_create_inferior (from_tty=0) at src/gdb/infcmd.c:275 #25 0x0000555556197670 in run_command_1 (args=0x0, from_tty=1, run_how=RUN_NORMAL) at src/gdb/infcmd.c:486 #26 0x000055555619783f in run_command (args=0x0, from_tty=1) at src/gdb/infcmd.c:512 #27 0x0000555555f0798d in do_simple_func (args=0x0, from_tty=1, c=0x555558567510) at src/gdb/cli/cli-decode.c:95 #28 0x0000555555f0d89a in cmd_func (cmd=0x555558567510, args=0x0, from_tty=1) at src/gdb/cli/cli-decode.c:2735 #29 0x000055555661fb42 in execute_command (p=0x7fffffffe2c4 "", from_tty=1) at src/gdb/top.c:575 #30 0x000055555626303b in catch_command_errors (command=0x55555661f4ab <execute_command(char const*, int)>, arg=0x7fffffffe2c1 "run", from_tty=1, do_bp_actions=true) at src/gdb/main.c:513 #31 0x000055555626328a in execute_cmdargs (cmdarg_vec=0x7fffffffdaf0, file_type=CMDARG_FILE, cmd_type=CMDARG_COMMAND, ret=0x7fffffffda3c) at src/gdb/main.c:612 #32 0x0000555556264849 in captured_main_1 (context=0x7fffffffdd40) at src/gdb/main.c:1293 #33 0x0000555556264a7f in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1314 #34 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #35 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) Again, after we enable stdin, GDB continues with its normal flow of the 'run' command and receives the inferior's exit event, where it would have enabled stdin, if we had not done it prematurely. (top-gdb) bt #0 async_enable_stdin () at src/gdb/event-top.c:523 #1 0x00005555561c3acd in normal_stop () at src/gdb/infrun.c:9432 #2 0x00005555561b5bf1 in fetch_inferior_event () at src/gdb/infrun.c:4700 #3 0x000055555618d6a7 in inferior_event_handler (event_type=INF_REG_EVENT) at src/gdb/inf-loop.c:42 #4 0x000055555620ecdb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4316 #5 0x0000555556f33035 in handle_file_event (file_ptr=0x5555587024e0, ready_mask=1) at src/gdbsupport/event-loop.cc:573 #6 0x0000555556f3362f in gdb_wait_for_event (block=0) at src/gdbsupport/event-loop.cc:694 #7 0x0000555556f322cd in gdb_do_one_event (mstimeout=-1) at src/gdbsupport/event-loop.cc:217 #8 0x0000555556262df8 in start_event_loop () at src/gdb/main.c:407 #9 0x0000555556262f85 in captured_command_loop () at src/gdb/main.c:471 #10 0x0000555556264a84 in captured_main (data=0x7fffffffdd40) at src/gdb/main.c:1324 #11 0x0000555556264b2e in gdb_main (args=0x7fffffffdd40) at src/gdb/main.c:1343 #12 0x0000555555ceccab in main (argc=9, argv=0x7fffffffde78) at src/gdb/gdb.c:39 (top-gdb) The solution implemented by this patch addresses the problem. After applying the patch, the output becomes $ gdb -q -ex "source file.py" -ex "run" --args a.out Reading symbols from /tmp/a.out... Starting program: /tmp/a.out loading /lib64/ld-linux-x86-64.so.2 Python Exception <class 'gdb.error'>: No symbol "a" in current context. [Inferior 1 (process 3984511) exited normally] (gdb) Regression-tested on X86_64 Linux using the default board file (i.e. unix). Co-Authored-By: Oguzhan Karakaya <oguzhan.karakaya@intel.com> Reviewed-By: Guinevere Larsen <blarsen@redhat.com> Approved-By: Tom Tromey <tom@tromey.com>

This commit fixes bug PR 28942, that is, creating a conditional breakpoint in a multi-threaded inferior, where the breakpoint condition includes an inferior function call. Currently, when a user tries to create such a breakpoint, then GDB will fail with: (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ()) Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2460150)] [New Thread 0x7ffff745c700 (LWP 2460151)] [New Thread 0x7ffff6c5b700 (LWP 2460152)] [New Thread 0x7ffff645a700 (LWP 2460153)] [New Thread 0x7ffff5c59700 (LWP 2460154)] Error in testing breakpoint condition: Couldn't get registers: No such process. An error occurred while in a function called from GDB. Evaluation of the expression containing the function (return_true) will be abandoned. When the function is done executing, GDB will silently stop. Selected thread is running. (gdb) Or, in some cases, like this: (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1)) Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2461106)] [New Thread 0x7ffff745c700 (LWP 2461107)] ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. The precise error depends on the exact thread state; so there's race conditions depending on which threads have fully started, and which have not. But the underlying problem is always the same; when GDB tries to execute the inferior function call from within the breakpoint condition, GDB will, incorrectly, try to resume threads that are already running - GDB doesn't realise that some threads might already be running. The solution proposed in this patch requires an additional member variable thread_info::in_cond_eval. This flag is set to true (in breakpoint.c) when GDB is evaluating a breakpoint condition. In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is true, then GDB will only try to resume the current thread, that is, the thread for which the breakpoint condition is being evaluated. This solves the problem of GDB trying to resume threads that are already running. The next problem is that inferior function calls are assumed to be synchronous, that is, GDB doesn't expect to start an inferior function call in thread #1, then receive a stop from thread #2 for some other, unrelated reason. To prevent GDB responding to an event from another thread, we update fetch_inferior_event and do_target_wait in infrun.c, so that, when an inferior function call (on behalf of a breakpoint condition) is in progress, we only wait for events from the current thread (the one evaluating the condition). In do_target_wait I had to change the inferior_matches lambda function, which is used to select which inferior to wait on. Previously the logic was this: auto inferior_matches = [&wait_ptid] (inferior *inf) { return (inf->process_target () != nullptr && ptid_t (inf->pid).matches (wait_ptid)); }; This compares the pid of the inferior against the complete ptid we want to wait on. Before this commit wait_ptid was only ever minus_one_ptid (which is special, and means any process), and so every inferior would match. After this commit though wait_ptid might represent a specific thread in a specific inferior. If we compare the pid of the inferior to a specific ptid then these will not match. The fix is to compare against the pid extracted from the wait_ptid, not against the complete wait_ptid itself. In fetch_inferior_event, after receiving the event, we only want to stop all the other threads, and call inferior_event_handler with INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint. If we are, then all the other threads should be left doing whatever they were before. The inferior_event_handler call will be performed once the breakpoint condition has finished being evaluated, and GDB decides to stop or not. The final problem that needs solving relates to GDB's commit-resume mechanism, which allows GDB to collect resume requests into a single packet in order to reduce traffic to a remote target. The problem is that the commit-resume mechanism will not send any resume requests for an inferior if there are already events pending on the GDB side. Imagine an inferior with two threads. Both threads hit a breakpoint, maybe the same conditional breakpoint. At this point there are two pending events, one for each thread. GDB selects one of the events and spots that this is a conditional breakpoint, GDB evaluates the condition. The condition includes an inferior function call, so GDB sets up for the call and resumes the one thread, the resume request is added to the commit-resume queue. When the commit-resume queue is committed GDB sees that there is a pending event from another thread, and so doesn't send any resume requests to the actual target, GDB is assuming that when we wait we will select the event from the other thread. However, as this is an inferior function call for a condition evaluation, we will not select the event from the other thread, we only care about events from the thread that is evaluating the condition - and the resume for this thread was never sent to the target. And so, GDB hangs, waiting for an event from a thread that was never fully resumed. To fix this issue I have added the concept of "forcing" the commit-resume queue. When enabling commit resume, if the force flag is true, then any resumes will be committed to the target, even if there are other threads with pending events. A note on authorship: this patch was based on some work done by Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made some changes to their work in this version. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942 [1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com> Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Tested-By: Luis Machado <luis.machado@arm.com> Tested-By: Keith Seitz <keiths@redhat.com>

…ro linux When running test-case gdb.threads/attach-stopped.exp on aarch64-linux, using the manjaro linux distro, I get: ... (gdb) thread apply all bt^M ^M Thread 2 (Thread 0xffff8d8af120 (LWP 278116) "attach-stopped"):^M #0 0x0000ffff8d964864 in clock_nanosleep () from /usr/lib/libc.so.6^M #1 0x0000ffff8d969cac in nanosleep () from /usr/lib/libc.so.6^M #2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6^M #3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29^M #4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6^M #5 0x0000ffff8d99a5dc in ?? () from /usr/lib/libc.so.6^M ^M Thread 1 (Thread 0xffff8db62020 (LWP 278111) "attach-stopped"):^M #0 0x0000ffff8d92d2d8 in ?? () from /usr/lib/libc.so.6^M #1 0x0000ffff8d9324b8 in ?? () from /usr/lib/libc.so.6^M #2 0x0000aaaade37086c in main () at attach-stopped.c:45^M (gdb) FAIL: gdb.threads/attach-stopped.exp: threaded: attach2 to stopped bt ... The problem is that the test-case expects to see start_thread: ... gdb_test "thread apply all bt" ".*sleep.*start_thread.*" \ "$threadtype: attach2 to stopped bt" ... but lack of symbols makes that impossible. Fix this by allowing " in ?? () from " as well. Tested on aarch64-linux. PR testsuite/31451 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31451

When running test-case gdb.server/connect-with-no-symbol-file.exp on aarch64-linux (specifically, an opensuse leap 15.5 container on a fedora asahi 39 system), I run into: ... (gdb) detach^M Detaching from program: target:connect-with-no-symbol-file, process 185104^M Ending remote debugging.^M terminate called after throwing an instance of 'gdb_exception_error'^M ... The detailed backtrace of the corefile is: ... (gdb) bt #0 0x0000ffff75504f54 in raise () from /lib64/libpthread.so.0 #1 0x00000000007a86b4 in handle_fatal_signal (sig=6) at gdb/event-top.c:926 #2 <signal handler called> #3 0x0000ffff74b977b4 in raise () from /lib64/libc.so.6 #4 0x0000ffff74b98c18 in abort () from /lib64/libc.so.6 #5 0x0000ffff74ea26f4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6 #6 0x0000ffff74ea011c in ?? () from /usr/lib64/libstdc++.so.6 #7 0x0000ffff74ea0180 in std::terminate() () from /usr/lib64/libstdc++.so.6 #8 0x0000ffff74ea0464 in __cxa_throw () from /usr/lib64/libstdc++.so.6 #9 0x0000000001548870 in throw_it (reason=RETURN_ERROR, error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:203 #10 0x0000000001548920 in throw_verror (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed", ap=...) at gdbsupport/common-exceptions.cc:211 #11 0x0000000001548a00 in throw_error (error=TARGET_CLOSE_ERROR, fmt=0x16c7810 "Remote connection closed") at gdbsupport/common-exceptions.cc:226 #12 0x0000000000ac8f2c in remote_target::readchar (this=0x233d3d90, timeout=2) at gdb/remote.c:9856 #13 0x0000000000ac9f04 in remote_target::getpkt (this=0x233d3d90, buf=0x233d40a8, forever=false, is_notif=0x0) at gdb/remote.c:10326 #14 0x0000000000acf3d0 in remote_target::remote_hostio_send_command (this=0x233d3d90, command_bytes=13, which_packet=17, remote_errno=0xfffff1a3cf38, attachment=0xfffff1a3ce88, attachment_len=0xfffff1a3ce90) at gdb/remote.c:12567 #15 0x0000000000ad03bc in remote_target::fileio_fstat (this=0x233d3d90, fd=3, st=0xfffff1a3d020, remote_errno=0xfffff1a3cf38) at gdb/remote.c:12979 #16 0x0000000000c39878 in target_fileio_fstat (fd=0, sb=0xfffff1a3d020, target_errno=0xfffff1a3cf38) at gdb/target.c:3315 #17 0x00000000007eee5c in target_fileio_stream::stat (this=0x233d4400, abfd=0x2323fc40, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:467 #18 0x00000000007f012c in <lambda(bfd*, void*, stat*)>::operator()(bfd *, void *, stat *) const (__closure=0x0, abfd=0x2323fc40, stream=0x233d4400, sb=0xfffff1a3d020) at gdb/gdb_bfd.c:955 #19 0x00000000007f015c in <lambda(bfd*, void*, stat*)>::_FUN(bfd *, void *, stat *) () at gdb/gdb_bfd.c:956 #20 0x0000000000f9b838 in opncls_bstat (abfd=0x2323fc40, sb=0xfffff1a3d020) at bfd/opncls.c:665 #21 0x0000000000f90adc in bfd_stat (abfd=0x2323fc40, statbuf=0xfffff1a3d020) at bfd/bfdio.c:431 #22 0x000000000065fe20 in reopen_exec_file () at gdb/corefile.c:52 #23 0x0000000000c3a3e8 in generic_mourn_inferior () at gdb/target.c:3642 #24 0x0000000000abf3f0 in remote_unpush_target (target=0x233d3d90) at gdb/remote.c:6067 #25 0x0000000000aca8b0 in remote_target::mourn_inferior (this=0x233d3d90) at gdb/remote.c:10587 #26 0x0000000000c387cc in target_mourn_inferior ( ptid=<error reading variable: Cannot access memory at address 0x2d310>) at gdb/target.c:2738 #27 0x0000000000abfff0 in remote_target::remote_detach_1 (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6421 #28 0x0000000000ac0094 in remote_target::detach (this=0x233d3d90, inf=0x22fce540, from_tty=1) at gdb/remote.c:6436 #29 0x0000000000c37c3c in target_detach (inf=0x22fce540, from_tty=1) at gdb/target.c:2526 #30 0x0000000000860424 in detach_command (args=0x0, from_tty=1) at gdb/infcmd.c:2817 #31 0x000000000060b594 in do_simple_func (args=0x0, from_tty=1, c=0x231431a0) at gdb/cli/cli-decode.c:94 #32 0x00000000006108c8 in cmd_func (cmd=0x231431a0, args=0x0, from_tty=1) at gdb/cli/cli-decode.c:2741 #33 0x0000000000c65a94 in execute_command (p=0x232e52f6 "", from_tty=1) at gdb/top.c:570 #34 0x00000000007a7d2c in command_handler (command=0x232e52f0 "") at gdb/event-top.c:566 #35 0x00000000007a8290 in command_line_handler (rl=...) at gdb/event-top.c:802 #36 0x0000000000c9092c in tui_command_line_handler (rl=...) at gdb/tui/tui-interp.c:103 #37 0x00000000007a750c in gdb_rl_callback_handler (rl=0x23385330 "detach") at gdb/event-top.c:258 #38 0x0000000000d910f4 in rl_callback_read_char () at readline/readline/callback.c:290 #39 0x00000000007a7338 in gdb_rl_callback_read_char_wrapper_noexcept () at gdb/event-top.c:194 #40 0x00000000007a73f0 in gdb_rl_callback_read_char_wrapper (client_data=0x22fbf640) at gdb/event-top.c:233 #41 0x0000000000cbee1c in stdin_event_handler (error=0, client_data=0x22fbf640) at gdb/ui.c:154 #42 0x000000000154ed60 in handle_file_event (file_ptr=0x232be730, ready_mask=1) at gdbsupport/event-loop.cc:572 #43 0x000000000154f21c in gdb_wait_for_event (block=1) at gdbsupport/event-loop.cc:693 #44 0x000000000154dec4 in gdb_do_one_event (mstimeout=-1) at gdbsupport/event-loop.cc:263 #45 0x0000000000910f98 in start_event_loop () at gdb/main.c:400 #46 0x0000000000911130 in captured_command_loop () at gdb/main.c:464 #47 0x0000000000912b5c in captured_main (data=0xfffff1a3db58) at gdb/main.c:1338 #48 0x0000000000912bf4 in gdb_main (args=0xfffff1a3db58) at gdb/main.c:1357 #49 0x00000000004170f4 in main (argc=10, argv=0xfffff1a3dcc8) at gdb/gdb.c:38 (gdb) ... The abort happens because a c++ exception escapes to c code, specifically opncls_bstat in bfd/opncls.c. Compiling with -fexceptions works around this. Fix this by catching the exception just before it escapes, in stat_trampoline and likewise in few similar spot. Add a new template catch_exceptions to do so in a consistent way. Tested on aarch64-linux. Approved-by: Pedro Alves <pedro@palves.net> PR remote/31577 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31577

If threads are disabled, either by --disable-threading explicitely, or by missing std::thread support, you get the following ASAN error when loading symbols: ==7310==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000002128 at pc 0x00000098794a bp 0x7ffe37e6af70 sp 0x7ffe37e6af68 READ of size 1 at 0x614000002128 thread T0 #0 0x987949 in index_cache_store_context::store() const ../../gdb/dwarf2/index-cache.c:163 #1 0x943467 in cooked_index_worker::write_to_cache(cooked_index const*, deferred_warnings*) const ../../gdb/dwarf2/cooked-index.c:601 #2 0x1705e39 in std::function<void ()>::operator()() const /gcc/9/include/c++/9.2.0/bits/std_function.h:690 #3 0x1705e39 in gdb::task_group::impl::~impl() ../../gdbsupport/task-group.cc:38 0x614000002128 is located 232 bytes inside of 408-byte region [0x614000002040,0x6140000021d8) freed by thread T0 here: #0 0x7fd75ccf8ea5 in operator delete(void*, unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:177 #1 0x9462e5 in cooked_index::index_for_writing() ../../gdb/dwarf2/cooked-index.h:689 #2 0x9462e5 in operator() ../../gdb/dwarf2/cooked-index.c:657 #3 0x9462e5 in _M_invoke /gcc/9/include/c++/9.2.0/bits/std_function.h:300 It's happening because cooked_index_worker::wait always returns true in this case, which tells cooked_index::wait it can delete the m_state cooked_index_worker member, but cooked_index_worker::write_to_cache tries to access it immediately afterwards. Fixed by making cooked_index_worker::wait only return true if desired_state is CACHE_DONE, same as if threading was enabled, so m_state will not be prematurely deleted. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31694 Approved-By: Tom Tromey <tom@tromey.com>

Similar to the x86_64 testcases, some .s files contain the corresponding CFI directives. This helps in validating the synthesized CFI by running those tests with and without the --scfi=experimental command line option. GAS issues some diagnostics, enabled by default, with --scfi=experimental. The diagnostics have been added with an intent to help user correct inadvertent errors in their hand-written asm. An error is issued when GAS finds that input asm is not amenable to accurate CFI synthesis. The existing scfi-diag-*.s tests in the gas/testsuite/gas/scfi/x86_64 directory test some SCFI diagnostics already: - (#1) "Warning: SCFI: Asymetrical register restore" - (#2) "Error: SCFI: usage of REG_FP as scratch not supported" - (#3) "Error: SCFI: unsupported stack manipulation pattern" - (#4) "Error: untraceable control flow for func 'XXX'" In the newly added aarch64 testsuite, further tests for additional diagnostics have been added: - scfi-diag-1.s in this patch highlights an aarch64-specific diagnostic: (#5) "Warning: SCFI: ignored probable save/restore op with reg offset" Additionally, some testcases are added to showcase the (currently) unsupported patterns, e.g., scfi-unsupported-1.s mov x16, 4384 sub sp, sp, x16 gas/testsuite/: * gas/scfi/README: Update comment to include aarch64. * gas/scfi/aarch64/scfi-aarch64.exp: New file. * gas/scfi/aarch64/ginsn-arith-1.l: New test. * gas/scfi/aarch64/ginsn-arith-1.s: New test. * gas/scfi/aarch64/ginsn-cofi-1.l: New test. * gas/scfi/aarch64/ginsn-cofi-1.s: New test. * gas/scfi/aarch64/ginsn-ldst-1.l: New test. * gas/scfi/aarch64/ginsn-ldst-1.s: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.d: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.l: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-1.s: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.d: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.l: New test. * gas/scfi/aarch64/scfi-callee-saved-fp-2.s: New test. * gas/scfi/aarch64/scfi-cb-1.d: New test. * gas/scfi/aarch64/scfi-cb-1.l: New test. * gas/scfi/aarch64/scfi-cb-1.s: New test. * gas/scfi/aarch64/scfi-cfg-1.d: New test. * gas/scfi/aarch64/scfi-cfg-1.l: New test. * gas/scfi/aarch64/scfi-cfg-1.s: New test. * gas/scfi/aarch64/scfi-cfg-2.d: New test. * gas/scfi/aarch64/scfi-cfg-2.l: New test. * gas/scfi/aarch64/scfi-cfg-2.s: New test. * gas/scfi/aarch64/scfi-cfg-3.d: New test. * gas/scfi/aarch64/scfi-cfg-3.l: New test. * gas/scfi/aarch64/scfi-cfg-3.s: New test. * gas/scfi/aarch64/scfi-cfg-4.l: New test. * gas/scfi/aarch64/scfi-cfg-4.s: New test. * gas/scfi/aarch64/scfi-cond-br-1.d: New test. * gas/scfi/aarch64/scfi-cond-br-1.l: New test. * gas/scfi/aarch64/scfi-cond-br-1.s: New test. * gas/scfi/aarch64/scfi-diag-1.l: New test. * gas/scfi/aarch64/scfi-diag-1.s: New test. * gas/scfi/aarch64/scfi-diag-2.l: New test. * gas/scfi/aarch64/scfi-diag-2.s: New test. * gas/scfi/aarch64/scfi-diag-3.l: New test. * gas/scfi/aarch64/scfi-diag-3.s: New test. * gas/scfi/aarch64/scfi-ldrp-1.d: New test. * gas/scfi/aarch64/scfi-ldrp-1.l: New test. * gas/scfi/aarch64/scfi-ldrp-1.s: New test. * gas/scfi/aarch64/scfi-ldrp-2.d: New test. * gas/scfi/aarch64/scfi-ldrp-2.l: New test. * gas/scfi/aarch64/scfi-ldrp-2.s: New test. * gas/scfi/aarch64/scfi-ldstnap-1.d: New test. * gas/scfi/aarch64/scfi-ldstnap-1.l: New test. * gas/scfi/aarch64/scfi-ldstnap-1.s: New test. * gas/scfi/aarch64/scfi-strp-1.d: New test. * gas/scfi/aarch64/scfi-strp-1.l: New test. * gas/scfi/aarch64/scfi-strp-1.s: New test. * gas/scfi/aarch64/scfi-strp-2.d: New test. * gas/scfi/aarch64/scfi-strp-2.l: New test. * gas/scfi/aarch64/scfi-strp-2.s: New test. * gas/scfi/aarch64/scfi-unsupported-1.l: New test. * gas/scfi/aarch64/scfi-unsupported-1.s: New test. * gas/scfi/aarch64/scfi-unsupported-2.l: New test. * gas/scfi/aarch64/scfi-unsupported-2.s: New test.

On arm-linux, I run into: ... PASS: gdb.ada/mi_task_arg.exp: mi runto task_switch.break_me Expecting: ^(-stack-list-arguments 1[^M ]+)?(\^done,stack-args=\[frame={level="0",args=\[\]},frame={level="1",args=\[{name="<_task>",value="0x[0-9A-Fa-f]+"}(,{name="<_taskL>",value="[0-9]+"})?\]},frame={level="2",args=\[({name="self_id",value="(0x[0-9A-Fa-f]+|<optimized out>)"})?\]},.*[^M ]+[(]gdb[)] ^M [ ]*) -stack-list-arguments 1^M ^done,stack-args=[frame={level="0",args=[]},frame={level="1",args=[{name="<_task>",value="0x40bc48"}]},frame={level="2",args=[]}]^M (gdb) ^M FAIL: gdb.ada/mi_task_arg.exp: -stack-list-arguments 1 (unexpected output) ... The problem is that the test-case expects a level 3 frame, but there is none. This can be reproduced using cli bt: ... $ gdb -q -batch outputs/gdb.ada/mi_task_arg/task_switch \ -ex "b task_switch.break_me" \ -ex run \ -ex bt Breakpoint 1 at 0x34b4: file task_switch.adb, line 57. Thread 3 "my_caller" hit Breakpoint 1, task_switch.break_me () \ at task_switch.adb:57 57 null; #0 task_switch.break_me () at task_switch.adb:57 #1 0x00403424 in task_switch.caller (<_task>=0x40bc48) at task_switch.adb:51 #2 0xf7f95a08 in ?? () from /lib/arm-linux-gnueabihf/libgnarl-12.so Backtrace stopped: previous frame identical to this frame (corrupt stack?) ... The purpose of the test-case is printing the frame at level 1, so I don't think we should bother about the presence of the frame at level 3. Fix this by allowing the backtrace to stop at level 2. Tested on arm-linux. Approved-By: Luis Machado <luis.machado@arm.com> Approved-By: Andrew Burgess <aburgess@redhat.com>

Since commit b1da98a ("gdb: remove use of alloca in new_macro_definition"), if cached_argv is empty, we call macro_bcache with a nullptr data. This ends up caught by UBSan deep down in the bcache code: $ ./gdb -nx -q --data-directory=data-directory /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp -readnow Reading symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp... Expanding full symbols from /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.base/macscp/macscp... /home/smarchi/src/binutils-gdb/gdb/bcache.c:195:12: runtime error: null pointer passed as argument 2, which is declared to never be null The backtrace: #1 0x00007ffff619a05d in __ubsan::__ubsan_handle_nonnull_arg_abort (Data=<optimized out>) at ../../../../src/libsanitizer/ubsan/ubsan_handlers.cpp:750 #2 0x000055556337fba2 in gdb::bcache::insert (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.c:195 #3 0x0000555564b49222 in gdb::bcache::insert<char const*, void> (this=0x62d0000c8458, addr=0x0, length=0, added=0x0) at /home/smarchi/src/binutils-gdb/gdb/bcache.h:158 #4 0x0000555564b481fa in macro_bcache<char const*> (t=0x62100007ae70, addr=0x0, len=0) at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:117 #5 0x0000555564b42b4a in new_macro_definition (t=0x62100007ae70, kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:573 #6 0x0000555564b44674 in macro_define_internal (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", kind=macro_function_like, special_kind=macro_ordinary, argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:777 #7 0x0000555564b44ae2 in macro_define_function (source=0x6210000ab9e0, line=469, name=0x7fffffffa710 "__va_arg_pack", argv=std::__debug::vector of length 0, capacity 0, replacement=0x62a00003af3a "__builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/macrotab.c:816 #8 0x0000555563f62fc8 in parse_macro_definition (file=0x6210000ab9e0, line=469, body=0x62a00003af2a "__va_arg_pack() __builtin_va_arg_pack ()") at /home/smarchi/src/binutils-gdb/gdb/dwarf2/macro.c:203 This can be reproduced by running gdb.base/macscp.exp. Avoid calling macro_bcache if the macro doesn't have any arguments. Change-Id: I33b5a7c3b3a93d5adba98983fcaae9c8522c383d

Some flavors of indirect call and jmp instructions were not being handled earlier, leading to a GAS error (#1): (#1) "Error: SCFI: unhandled op 0xff may cause incorrect CFI" Not handling jmp/call (direct or indirect) ops is an error (as shown above) because SCFI needs an accurate CFG to synthesize CFI correctly. Recall that the presence of indirect jmp/call, however, does make the CFG ineligible for SCFI. In other words, generating the ginsns for them now, will eventually cause SCFI to bail out later with an error (#2) anyway: (#2) "Error: untraceable control flow for func 'XXX'" The first error (#1) gives the impression of missing functionality in GAS. So, it seems cleaner to synthesize a GINSN_TYPE_JUMP / GINSN_TYPE_CALL now in the backend, and let SCFI machinery complain with the error as expected. The handling for these indirect jmp/call instructions is similar, so reuse the code by carving out a function for the same. Adjust the testcase to include the now handled jmp/call instructions as well. gas/ * config/tc-i386-ginsn.c (x86_ginsn_indirect_branch): New function. (x86_ginsn_new): Refactor out functionality to above. gas/testsuite/ * gas/scfi/x86_64/ginsn-cofi-1.l: Adjust the output. * gas/scfi/x86_64/ginsn-cofi-1.s: Add further varieties of jmp/call opcodes.

With test-case gdb.dwarf2/dw2-lines.exp on arm-linux, I run into: ... (gdb) break bar_label^M Breakpoint 2 at 0x4004f6: file dw2-lines.c, line 29.^M (gdb) continue^M Continuing.^M ^M Breakpoint 2, bar () at dw2-lines.c:29^M 29 foo (2);^M (gdb) PASS: $exp: cv=2: cdw=32: lv=2: ldw=32: continue to breakpoint: foo $1$ ... The pass is incorrect because the continue lands at line 29 with "foo (2)" instead of line line 27 with "foo (1)". A minimal version is: ... $ gdb -q -batch dw2-lines.cv-2-cdw-32-lv-2-ldw-32 -ex "b bar_label" Breakpoint 1 at 0x4f6: file dw2-lines.c, line 29. ... where: ... 000004ec <bar>: 4ec: b580 push {r7, lr} 4ee: af00 add r7, sp, #0 000004f0 <bar_label>: 4f0: 2001 movs r0, #1 4f2: f7ff fff1 bl 4d8 <foo> 000004f6 <bar_label_2>: 4f6: 2002 movs r0, #2 4f8: f7ff ffee bl 4d8 <foo> ... So, how does this happen? In short: - skip_prologue_sal calls arm_skip_prologue with pc == 0x4ec, - thumb_analyze_prologue returns 0x4f2 (overshooting by 1 insn, PR tdep/31981), and - skip_prologue_sal decides that we're mid-line, and updates to 0x4f6. However, this is a test-case about .debug_line info, so why didn't arm_skip_prologue use the line info to skip the prologue? The answer is that the line info starts at bar_label, not at bar. Fixing that allows us to work around PR tdep/31981. Likewise in gdb.dwarf2/dw2-line-number-zero.exp. Instead, add a new test-case gdb.arch/skip-prologue.exp that is dedicated to checking quality of architecture-specific prologue analysis, without being written in an architecture-specific way. If fails on arm-linux for both marm and mthumb: ... FAIL: gdb.arch/skip-prologue.exp: f2: $bp_addr == $prologue_end_addr (skipped too much) FAIL: gdb.arch/skip-prologue.exp: f4: $bp_addr == $prologue_end_addr (skipped too much) ... and passes for: - x86_64-linux for {m64,m32}x{-fno-PIE/-no-pie,-fPIE/-pie} - aarch64-linux. Tested on arm-linux.

When GDB opens a core file, in 'core_target::build_file_mappings ()', we collection information about the files that are mapped into the core file, specifically, the build-id and the DT_SONAME attribute for the file, which will be set for some shared libraries. We then cache the DT_SONAME to build-id information on the core file bfd object in the function set_cbfd_soname_build_id. Later, when we are loading the shared libraries for the core file, we can use the library's file name to look in the DT_SONAME to build-id map, and, if we find a matching entry, we can use the build-id to validate that we are loading the correct shared library. This works OK, but has some limitations: not every shared library will have a DT_SONAME attribute. Though it is good practice to add such an attribute, it's not required. A library without this attribute will not have its build-id checked, which can lead to GDB loading the wrong shared library. What I want to do in this commit is to improve GDB's ability to use the build-ids extracted in core_target::build_file_mappings to both validate the shared libraries being loaded, and then to use these build-ids to potentially find (via debuginfod) the shared library. To do this I propose making the following changes to GDB: (1) Rather than just recording the DT_SONAME to build-id mapping in set_cbfd_soname_build_id, we should also record, the full filename to build-id mapping, and also the memory ranges to build-id mapping for every memory range covered by every mapped file. (2) Add a new callback solib_ops::find_solib_addr. This callback takes a solib object and returns an (optional) address within the inferior that is part of this library. We can use this address to find a mapped file using the stored memory ranges which will increase the cases in which a match can be found. (3) Move the mapped file record keeping out of solib.c and into corelow.c. Future commits will make use of this information from other parts of GDB. This information was never solib specific, it lived in the solib.c file because that was the only user of the data, but really, the data is all about the core file, and should be stored in core_target, other parts of GDB can then query this data as needed. Now, when we load a shared library for a core file, we do the following lookups: 1. Is the exact filename of the shared library found in the filename to build-id map? If so then use this build-id for validation. 2. Find an address within the shared library using ::find_solib_addr and then look for an entry in the mapped address to build-id map. If an entry is found then use this build-id. 3. Finally, look in the soname to build-id map. If an entry is found then use this build-id. The addition of step #2 here means that GDB is now far more likely to find a suitable build-id for a shared library. Having acquired a build-id the existing code for using debuginfod to lookup a shared library object can trigger more often. On top of this, we also create a build-id to filename map. This is useful as often a shared library is implemented as a symbolic link to the actual shared library file. The mapped file information is stored based on the actual, real file name, while the shared library information holds the original symbolic link file name. If when loading the shared library, we find the symbolic link has disappeared, we can use the build-id to file name map to check if the actual file is still around, if it is (and if the build-id matches) then we can fall back to use that file. This is another way in which we can slightly increase the chances that GDB will find the required files when loading a core file. Adding all of the above required pretty much a full rewrite of the existing set_cbfd_soname_build_id function and the corresponding get_cbfd_soname_build_id function, so I have taken the opportunity to move the information caching out of solib.c and into corelow.c where it is now accessed through the function core_target_find_mapped_file. At this point the benefit of this move is not entirely obvious, though I don't think the new location is significantly worse than where it was originally. The benefit though is that the cached information is no longer tied to the shared library loading code. I already have a second set of patches (not in this series) that make use of this caching from elsewhere in GDB. I've not included those patches in this series as this series is already pretty big, but even if those follow up patches don't arrive, I think the new location is just as good as the original location. Rather that caching the information within the core file BFD via the registry mechanism, the information used for the mapped file lookup is now stored within the core_file target directly.

The commit: commit c6b4867 Date: Thu Mar 30 19:21:22 2023 +0100 gdb: parse pending breakpoint thread/task immediately Introduce a use bug where the value of a temporary variable was being used after it had gone out of scope. This was picked up by the address sanitizer and would result in this error: (gdb) maintenance selftest create_breakpoint_parse_arg_string Running selftest create_breakpoint_parse_arg_string. ================================================================= ==2265825==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fbb08046511 at pc 0x000001632230 bp 0x7fff7c2fb770 sp 0x7fff7c2fb768 READ of size 1 at 0x7fbb08046511 thread T0 #0 0x163222f in create_breakpoint_parse_arg_string(char const*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, int*, int*, int*, std::unique_ptr<char, gdb::xfree_deleter<char> >*, bool*) ../../src/gdb/break-cond-parse.c:496 #1 0x1633026 in test ../../src/gdb/break-cond-parse.c:582 #2 0x163391b in create_breakpoint_parse_arg_string_tests ../../src/gdb/break-cond-parse.c:649 #3 0x12cfebc in void std::__invoke_impl<void, void (*&)()>(std::__invoke_other, void (*&)()) /usr/include/c++/13/bits/invoke.h:61 #4 0x12cc8ee in std::enable_if<is_invocable_r_v<void, void (*&)()>, void>::type std::__invoke_r<void, void (*&)()>(void (*&)()) /usr/include/c++/13/bits/invoke.h:111 #5 0x12c81e5 in std::_Function_handler<void (), void (*)()>::_M_invoke(std::_Any_data const&) /usr/include/c++/13/bits/std_function.h:290 #6 0x18bb51d in std::function<void ()>::operator()() const /usr/include/c++/13/bits/std_function.h:591 #7 0x4193ef9 in selftests::run_tests(gdb::array_view<char const* const>, bool) ../../src/gdbsupport/selftest.cc:100 #8 0x21c2206 in maintenance_selftest ../../src/gdb/maint.c:1172 ... etc ... The problem was caused by three lines like this one: thread_info *thr = parse_thread_id (std::string (t.get_value ()).c_str (), &tmptok); After parsing the thread-id TMPTOK would be left pointing into the temporary string which had been created on this line. When on the next line we did this: gdb_assert (*tmptok == '\0'); The value of *TMPTOK is undefined. Fix this by creating the std::string earlier in the scope. Now the contents of the string will remain valid when we check *TMPTOK. The address sanitizer issue is now resolved.

tromey added the rust label Apr 14, 2016

tromey added this to the upstream milestone Apr 14, 2016

tromey closed this as completed May 27, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove FIXME comments #2

remove FIXME comments #2

tromey commented Apr 14, 2016

tromey commented May 27, 2016

remove FIXME comments #2

remove FIXME comments #2

Comments

tromey commented Apr 14, 2016

tromey commented May 27, 2016