-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nibblemapmacros do a linear scan #93550
Comments
I edited the post above to remove a reference to JVM source code. On the .NET project we have to be careful that we don't use GPL source (or have any appearance or question that it might have been used). Sorry, I know it can be awkward limitation, but we need to ensure potential contributors don't read related GPL source so there is no question about the origin of contributions. |
@fabled I am doing some investigation here, nothing too deep though. I know that @davidwrighton was also noodling on this data structure. |
@AaronRobinsonMSFT Yep, I've noodled on this a bit, and come up with a model that implements the find algorithm as a O(1) operation... Its also somewhat complicated by a desire to support the lock free nature of reads of this structure, and iteration, and I'm not convinced it entirely meets all of those needs, but it might be a good choice if we choose to do any work in this space this release. NOTE: while I've looked at this as a theoretical improvement, I have not looked into whether or not this would provide a performance improvement to details such as stack walking. In my previous explorations of the performance of that system, I identified that the linked list we had for finding the CodeHeap itself was a significant performance problem, but the nibble table walks never showed up in any of my performance tests. (See #79021 where I introduced the I would expect that this algorithm would be unlikely to significantly affect the performance of much code. However, I do understand that EBPF filters have strong guarantees of forward progress implemented by the lack of real support for iteration, which the previous theoretically unbounded iteration algorithm breaks. I think that support for being possible to write an EBPF based stack walker could be quite useful for diagnostics on Linux platforms, and efforts to make that possible seem like a good idea to me. Note the approach below here effectively implements jumps as you suggested, and I believe could be implemented as a single linear set of if statements and operations. Proposed new nibble map layout lookup algorithm and structureThe intention here is to build a data structure which permits O(1) access once we reach a lookup at the nibble map layer. The dependencies it takes, are that:
The data structure must support the following algorithms
There shall be an array of 32bit unsigned integers ( Each 32bit unsigned value may be an array of 8 4-bit nibbles, or an encoded 32bit offset from the start of the CodeHeap memory section to the start of the associated method.
The algorithm to parse this data is as follows. Be aware that this algorithm has not actually be tested/implemented, so this is likely nearly correct, but unlikely to be completely correct.
HELPER Methods
|
Do I understand correctly that this is for looking up methods from their addresses? If so, would improvements in this area solve the issue I had in #85197 where invalid pointers would crash the VM when looking them up? |
This specific improvement would not help with your issue. The data structure proposed here has same function properties as the nibble map, it just has different perf characteristics. |
@davidwrighton - I spent a little while dissecting what you have and it seemed good to me. This was my simplified mental model of it which others might find useful. I know it doesn't include all the details, but let me know if anything it does include is inaccurate.
|
This comment was marked as off-topic.
This comment was marked as off-topic.
@AaronRobinsonMSFT No, there is only 1 method start in a 32byte region in the algorithm I presented, at the 256 byte region up to 8 method starts could exist. Now that I've spent a few more moments thinking on this, I suspect we should do some analysis to determine the number of methods which are < 48 and greater than 24 bytes bytes on our 64bit platforms, as you could reduce the amount of memory needed for this sort of map so that each 32bit value represents 64bytes of executable space. If it turns out that we have very few of those, it would probably be preferable to have the system do all of its alignment in terms of 8 bytes instead of 4 bytes, working with 64 byte and 512 byte regions. |
Apologies, you're right. I misinterpreted the 256 and confused that with DWORDs and other units that are defined in the current implementation. |
I am currently researching the dotnet stack unwinding, and came across the nibblemap at https://github.com/dotnet/runtime/blob/main/src/coreclr/inc/nibblemapmacros.h.
It seems that the code is optimized to assume that JIT code generated is very small code blobs which are potentially unaligned. And on the contrast it performs poorly when mapping RIP from the end of a large code blobs. This is because a linear scan of the nibblemap is required to find the function beginning. Such linear scanning is done at e.g. https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/codeman.cpp#L4051-L4056 (but also at other places).
I am also currently looking into reimplementing all of this in ebpf code (in Linux). This puts some constraints on how many iterations of the nibblemap scanning can be done. While some of these can be worked around, I think if some other data structure could serve here better.
I believe optimizing this map would also speed up unwinding in dotnet core runtime itself too. Do you have plans to improve this map, or would you welcome contributed work in this?
The existing implementation could be just improved by adding a special encoding to indicate a jump.
I would like to discuss potential algorithmic / data structures improvements, and potentially contribute work towards making it happen if a mutually acceptable approach can be reached.
The text was updated successfully, but these errors were encountered: