Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM64] Change INS_bkpt to INS_brk for INS_BREAKPOINT #892

Merged
merged 7 commits into from
Jan 6, 2020
Merged

[ARM64] Change INS_bkpt to INS_brk for INS_BREAKPOINT #892

merged 7 commits into from
Jan 6, 2020

Conversation

xiangzhai
Copy link
Contributor

#606

Cheers,
Leslie Zhai

@jkotas jkotas added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 16, 2019
@xiangzhai
Copy link
Contributor Author

xiangzhai commented Dec 16, 2019

False positive?

  • runtime-coreclr — #20191215.15 failed
    Not found failure in the Details

@BruceForstall
Copy link
Member

Since arm64 shouldn't ever use bkpt, can you also:

  • change the use of INS_bkpt in CodeGen::genCallFinally() in codegenarm64.cpp,
  • remove bkpt from the comment in emitarm64.cpp (case IF_SN_0A: // bkpt, nop)
  • remove the encoding for bkpt in instrsarm64.h:
INST1(bkpt,    "bkpt",   0, 0, IF_SN_0A,  0xD43E0000)
                                   //  brpt                         SN_0A  1101010000111110 0000000000000000   D43E 0000   0xF000

?

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

Thanks for your review!

I updated the patch as you suggested, please review it again.

Cheers,
Leslie Zhai

@xiangzhai
Copy link
Contributor Author

##[error]The HTTP request timed out after 00:01:40.

How to ci-help?

@BruceForstall
Copy link
Member

Unfortunately, it looks like the situation is a little more complicated that it should be.

I reproduced the failure and your fix on Linux, and it looks good.

I tried the existing code and your fix on Windows, and it doesn't work. With the existing code (using bkpt), setting COMPlus_JitHalt to some function and running a test, the Windows windbg debugger works as desired, and stops at that instruction. With your change, windbg shows an illegal instruction exception, and kills the process.

So, I think we need to leave the existing code, but #ifdef your fix (for defining INS_BREAKPOINT especially) under TARGET_UNIX.

@xiangzhai
Copy link
Contributor Author

So, I think we need to leave the existing code, but #ifdef your fix (for defining INS_BREAKPOINT especially) under TARGET_UNIX.

Updated as you suggested. Please review it again.

Cheers,
Leslie Zhai

@xiangzhai
Copy link
Contributor Author

##error Format job found errors, please apply the format patch.
##[error]C:\Python\python.exe failed with return code: 4294967294

Any suggestion?

@BruceForstall
Copy link
Member

@xiangzhai It looks like you ran into a pre-existing problem with the formatting jobs (see #990 for some discussion). I would normally encourage you to look at https://github.com/dotnet/jitutils and run "jit-format" over your code. However, it looks like that hasn't been properly updated to work with the new dotnet/runtime repo. For now, you might need to just wait until this gets fixed before re-testing.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change looks good to me. Thanks!

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

Thanks for your review and teaching!

Cheers,
Leslie Zhai

@TamarChristinaArm
Copy link
Contributor

Since arm64 shouldn't ever use bkpt, can you also:

  • change the use of INS_bkpt in CodeGen::genCallFinally() in codegenarm64.cpp,
  • remove bkpt from the comment in emitarm64.cpp (case IF_SN_0A: // bkpt, nop)
  • remove the encoding for bkpt in instrsarm64.h:
INST1(bkpt,    "bkpt",   0, 0, IF_SN_0A,  0xD43E0000)
                                   //  brpt                         SN_0A  1101010000111110 0000000000000000   D43E 0000   0xF000

?

Something is odd here.. I was wondering how this was working at all on Windows if this was really a bkpt instruction. Turns out it's not. This encoding is for brk so it seems that it's just the instruction label is wrong. The encoding for bkpt in instrarm64.h is the encoding for brk 0xF00 which is a valid breakpoint in AArch64.

the encoding for brk seems to be encoding brk 0x0 but both instructions should have worked in both debuggers..

0000000000000000 <.text>:
   0:   d43e0000        brk     #0xf000
   4:   d4200000        brk     #0x0

The immediate effects what kind of signal the kernel raises on Linux, but for both 0x0 and 0xF00 the kernel correctly raises SIGTRAP but gdb is choosing not to handle it.. I'm not sure why, but because of this brk 0xF00 causes a loop in the signal handler of gdb and the program "hangs".

I would suggest renaming bkpt in instrsarm64.h to something else to avoid this confusion.. My suggestion would be brk_windows and brk_linux.

@TamarChristinaArm
Copy link
Contributor

It seems the issue for GDB is that whenever it traps the SIGTRAP it loops through its list of breakpoints and because it doesn't find a breakpoint set at that point it continues the program, which triggers another SIGTRAP and so on.
For some reason it's treating brk 0x0 differently but I'm not sure why.

GDB is expecting the breakpoint to be set through ptrace with NT_ARM_HW_BREAK but in my opinion the brk should work... I'll chase this up with the GDB folks, but for the current and old versions of GDB brk 0x0 seems to be the only one that'll work.

@BruceForstall
Copy link
Member

@TamarChristinaArm Thanks for looking. I did see that windbg disassembled what the JIT calls bkpt as brk 0xf000, but I didn't propagate that information up to this issue. So for some reason, then, apparently windbg / Windows likes this one, but doesn't like brk 0x0.

@TamarChristinaArm
Copy link
Contributor

@TamarChristinaArm Thanks for looking. I did see that windbg disassembled what the JIT calls bkpt as brk 0xf000, but I didn't propagate that information up to this issue. So for some reason, then, apparently windbg / Windows likes this one, but doesn't like brk 0x0.

Yeah, for GDB it seems that brk 0x0 was hardcoded into it. They're looking if they can relax this, or at the very least not get into an infinite loop. windbg probably has a similar reason I would imagine.

@xiangzhai
Copy link
Contributor Author

xiangzhai commented Dec 19, 2019

Hi @TamarChristinaArm

Thanks for your teaching!

I reproduced the failure and your fix on Linux, and it looks good.

} Jitted Entry 0d0 at 0000007f`3d3b2360 method HelloWorldTest.Program:Main(ref) size 00000054

Thread 1 "corerun" received signal SIGTRAP, Trace/breakpoint trap.
0x0000007f3d3b2364 in ?? ()
(gdb) set $pc+=4
(gdb) x/22i 0x0000007f3d3b2364-44
   0x7f3d3b2338:	adr	x12, 0x7f3d3b2338
   0x7f3d3b233c:	ldr	x11, 0x7f3d3b2348
   0x7f3d3b2340:	br	x11
   0x7f3d3b2344:	.inst	0x00f20000 ; undefined
   0x7f3d3b2348:	tbnz	x28, #36, 0x7f3d3b57c8
   0x7f3d3b234c:	.inst	0x0000007f ; undefined
   0x7f3d3b2350:	ldr	b24, [x21,#2644]
   0x7f3d3b2354:	.inst	0x0000007f ; undefined
   0x7f3d3b2358:	ldr	b16, [x7,#2696]
   0x7f3d3b235c:	.inst	0x0000007f ; undefined
   0x7f3d3b2360:	nop
   0x7f3d3b2364:	brk	#0x0
=> 0x7f3d3b2368:	stp	x29, x30, [sp,#-32]!
   0x7f3d3b236c:	mov	x29, sp
   0x7f3d3b2370:	str	x0, [x29,#24]
   0x7f3d3b2374:	mov	x0, #0x75f8                	// #30200
   0x7f3d3b2378:	movk	x0, #0x3d5d, lsl #16
   0x7f3d3b237c:	movk	x0, #0x7f, lsl #32
   0x7f3d3b2380:	ldr	w0, [x0]
   0x7f3d3b2384:	cbz	w0, 0x7f3d3b238c
   0x7f3d3b2388:	bl	0x7f3d377bd0
   0x7f3d3b238c:	nop
(gdb) i r x0 x1 x2
x0             0x7f18007bf8	545863531512
x1             0x7fffffe4d0	549755806928
x2             0x7fffffe0c0	549755805888
(gdb) si
0x0000007f3d3b236c in ?? ()
(gdb) si
0x0000007f3d3b2370 in ?? ()
(gdb) si
0x0000007f3d3b2374 in ?? ()
(gdb) x/22i $pc
=> 0x7f3d3b2374:	mov	x0, #0x75f8                	// #30200
   0x7f3d3b2378:	movk	x0, #0x3d5d, lsl #16
   0x7f3d3b237c:	movk	x0, #0x7f, lsl #32
   0x7f3d3b2380:	ldr	w0, [x0]
   0x7f3d3b2384:	cbz	w0, 0x7f3d3b238c
   0x7f3d3b2388:	bl	0x7f3d377bd0
   0x7f3d3b238c:	nop
   0x7f3d3b2390:	mov	x0, #0x11a8                	// #4520
   0x7f3d3b2394:	movk	x0, #0x2800, lsl #16
   0x7f3d3b2398:	movk	x0, #0x7f, lsl #32
   0x7f3d3b239c:	ldr	x0, [x0]
   0x7f3d3b23a0:	bl	0x7f3d3b1c40
   0x7f3d3b23a4:	nop
   0x7f3d3b23a8:	nop
   0x7f3d3b23ac:	ldp	x29, x30, [sp],#32
   0x7f3d3b23b0:	ret

Cheers,
Leslie Zhai

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

How to retrigger the CI checkers? #990 has been merged. Could you please help me to retrigger the test?

Thanks,
Leslie Zhai

@BruceForstall
Copy link
Member

Looks like I managed to re-trigger it.

cc @CarolEidt

@BruceForstall
Copy link
Member

(If you click on "Details", there is a "Re-run" link on failing jobs)

@xiangzhai
Copy link
Contributor Author

(If you click on "Details", there is a "Re-run" link on failing jobs)

The "Re-run" link is not available to me. Perhaps I do not have the access permission?

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

For now, you might need to just wait until this gets fixed before re-testing.

#990 fixed the issue or not?

Build #20191223.30 had test failures

Regression?

Thanks,
Leslie Zhai

@xiangzhai
Copy link
Contributor Author

Merry Christmas :)

@BruceForstall
Copy link
Member

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@BruceForstall
Copy link
Member

/azp list

@BruceForstall
Copy link
Member

I'm going to close and re-open to (hopefully) get re-testing. Note that the Windows ARM jobs are still broken (#702).

@BruceForstall
Copy link
Member

Looks like there are actual formatting job issues:

clang-format: there are formatting errors in F:\workspace\_work\1\s\src\coreclr\src\jit\compiler.h
At Line 10610 Before:
const instruction INS_MULADD     = INS_madd;
After:
const instruction INS_MULADD = INS_madd;

Clang-format found formatting errors.
jit-format found formatting errors. Run:
	jit-format --fix --untidy

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

Welcome back :)

Looks like there are actual formatting job issues:

clang-format: there are formatting errors in F:\workspace\_work\1\s\src\coreclr\src\jit\compiler.h
At Line 10610 Before:
const instruction INS_MULADD     = INS_madd;
After:
const instruction INS_MULADD = INS_madd;

Before is someone might want to right alignment?

Could I change the const instruction INS_MULADD = INS_madd;?

Thanks,
Leslie Zhai

@BruceForstall
Copy link
Member

You could try that.

Or, you could set yourself up to run jit-format from the dotnet/jitutils repo -- which would be good to be able to do.

Or, you can download the patch files created by the failing CI job and apply those. From the failing Linux formatting job, it gives these instructions:

There were errors in formatting. Please run jit-format locally with: 

	jit-format -a x64 -b Checked -o Linux -c <absolute-path-to-coreclr> --verbose --fix --projects dll


Or download and apply generated patch:
1. From the GitHub 'Checks' page on the Pull Request, with the failing Formatting
   job selected (e.g., 'Formatting Linux x64'), click the 'View more details on
   Azure Pipelines' link.
3. Select the 'Summary' tab.
4. Open the 'Build artifacts published' entry.
5. Find the link to the OS/architecture appropriate format patch file.
6. Click on the link to download it.
7. Unzip the patch file.
8. git apply format.patch

Unfortunately, it looks like Azure DevOps changed their UI. But the bottom-line is: find the format.patch artifacts stored with the CI job (they will also have OS / architecture in their name), and apply them.

@xiangzhai
Copy link
Contributor Author

Hi @BruceForstall

No format issue any more.

But I have no idea about other failures. False positive?

Thanks,
Leslie Zhai

@sandreenko
Copy link
Contributor

But I have no idea about other failures. False positive?

Most likely from what I see, arm and crossgen-comparison are known. x64 failed during "Download native test artifacts" which I have not seen before, but it is clearly infra.

@xiangzhai
Copy link
Contributor Author

Hi @sandreenko

Thanks for your kind response!

The patch is accepted and ready to land?

Cheers,
Leslie Zhai

@BruceForstall
Copy link
Member

Since it looks like infrastructure issues in the CI failures, I'll go ahead and merge it.

@xiangzhai Thanks for your patience and various iterations!

@BruceForstall BruceForstall merged commit 3493718 into dotnet:master Jan 6, 2020
@xiangzhai
Copy link
Contributor Author

Hi,

You are welcome :)

Cheers,
Leslie Zhai

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants