add hooks to debug OpenSSL memory #101626

Closed
wants to merge 24 commits into from

wfurt
Member

@wfurt wfurt commented Apr 26, 2024

We had several cases where users complained about large memory use. For native memory it is quite difficult to figure out where the memory goes. This PR aims to make that somewhat easier.

OpenSSL provides hooks for its memory functions, so this PR adds a switch to optionally hook into them.
The one caveat is that CRYPTO_set_mem_functions works only if called before any allocations, i.e. it needs to be done very early in the process. So I ended up putting it into the initialization process, even though I originally envisioned it somewhere else.

The simple use pattern is something like

export DOTNET_SYSTEM_NET_SECURITY_OPENSSL_MEMORY_DEBUG=1
var ci = typeof(SslStream).Assembly.GetTypes().First(t => t.Name == "CryptoInitializer");


do some TLS/crypto work


using Process process = Process.GetCurrentProcess();
Console.WriteLine($"Bytes known to GC [{GC.GetTotalMemory(false)}], process working set [{process.WorkingSet64}]");
Console.WriteLine("OpenSSL memory {0}", ci.InvokeMember("TotalAllocatedMemory", BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Static, null, null, null));

That provides insight into how much memory is actually used by OpenSSL.
It allocates a little bit more memory to store the extra info, but it should be reasonably cheap.
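
As a rough illustration of why the bookkeeping is cheap, here is a minimal managed sketch of a size-prefixed allocator that keeps a running total. This is not the PR's implementation (the PR hooks the native allocator through CRYPTO_set_mem_functions); `TrackingAllocator` and `HeaderSize` are hypothetical names used only for this example.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical illustration only: each block is prefixed with a small header
// that remembers the requested size, so Free can subtract it from the total.
static class TrackingAllocator
{
    // Assumed header size; 16 bytes keeps the user pointer reasonably aligned.
    private const int HeaderSize = 16;

    private static long s_totalAllocated;

    public static long TotalAllocated => Interlocked.Read(ref s_totalAllocated);

    public static IntPtr Alloc(int size)
    {
        // Allocate room for the header plus the caller's bytes.
        IntPtr block = Marshal.AllocHGlobal(HeaderSize + size);
        Marshal.WriteInt32(block, size);
        Interlocked.Add(ref s_totalAllocated, size);
        // Hand back the address just past the header.
        return block + HeaderSize;
    }

    public static void Free(IntPtr userPtr)
    {
        if (userPtr == IntPtr.Zero)
        {
            return;
        }

        // Step back to the header to recover the original size.
        IntPtr block = userPtr - HeaderSize;
        int size = Marshal.ReadInt32(block);
        Interlocked.Add(ref s_totalAllocated, -size);
        Marshal.FreeHGlobal(block);
    }
}
```

The per-allocation cost in this sketch is one fixed-size header and one interlocked add, which is the kind of overhead the description above refers to.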

If somebody cares about more detail, they can do something like

ci.InvokeMember("EnableTracking", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static, null, null, null);

do some TLS/crypto work

Tuple<IntPtr, int, string>[] allocations = (Tuple<IntPtr, int, string>[])ci.InvokeMember("GetIncrementalAllocations", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static | BindingFlags.Instance , null, null, null);
for (int j = 0; j < allocations.Length; j++)
{
    (IntPtr ptr, int size, string src) = allocations[j];
    Console.WriteLine("Allocated {0} bytes at 0x{1:x} from {2}", size, ptr, src);
}

This would print something like

Allocated 81 bytes at 0x7f0c8013d448 from ../crypto/err/err.c:820
Allocated 3 bytes at 0x7f0c8000bd28 from ../crypto/asn1/asn1_lib.c:308
Allocated 81 bytes at 0x7f0c8c13a108 from ../crypto/err/err.c:820
Allocated 40 bytes at 0x7f0c80126df8 from ../crypto/x509/x_name.c:92
Allocated 13 bytes at 0x7f0c8013b438 from ../crypto/asn1/asn1_lib.c:308
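
With a large dump it is usually easier to look at totals per call site than at individual pointers. A minimal sketch of that aggregation, assuming the `Tuple<IntPtr, int, string>` shape returned above; `AllocationReport` and `BySource` are hypothetical helper names, not part of the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: groups tracked allocations by their "file:line" source
// string and orders the groups by total bytes, largest first.
static class AllocationReport
{
    public static List<(string Source, int Count, long TotalBytes)> BySource(
        IEnumerable<Tuple<IntPtr, int, string>> allocations)
    {
        return allocations
            .GroupBy(a => a.Item3)
            .Select(g => (Source: g.Key, Count: g.Count(), TotalBytes: g.Sum(a => (long)a.Item2)))
            .OrderByDescending(r => r.TotalBytes)
            .ToList();
    }
}
```

Passing the `allocations` array from the snippet above to `AllocationReport.BySource` would put the call sites that hold the most memory at the top of the list.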

Dumping a large allocation data set is slow and expensive. It is done under a lock, so it blocks all other OpenSSL allocations. I feel this is OK for now, but it should be used with caution. I also feel that access through reflection is OK, since this is only a last-resort debug hook, i.e. it does not need a stable API or convenient access.

@wfurt
Member Author

wfurt commented Apr 27, 2024

It looks like the build is failing because we are trying to build against OpenSSL 1.0, which has been EOS since 2019.

 -- Found OpenSSL: /crossrootfs/x64/usr/lib/x86_64-linux-gnu/libcrypto.so (found version "1.0.2g")  

I was tempted to simply disable the debug feature for that version, as the very old OpenSSL has a different prototype.
But it seems like we would lose the feature on all platforms.

Any thoughts, @bartonjs, on moving the build to at least 1.1.1? It has been EOS only since last year, and some distributions we support are still using it.

There are probably different ways to solve the build problems, but I feel it is perhaps finally time to ditch 1.0.

@jkotas
Member

jkotas commented Apr 27, 2024

the build problems

We build against Ubuntu 16.04 headers and libs currently #83428 . It is likely going to stay that way for .NET 9.

@wfurt
Member Author

wfurt commented Apr 28, 2024

the build problems

We build against Ubuntu 16.04 headers and libs currently #83428 . It is likely going to stay that way for .NET 9.

What is the reason for it, @jkotas? It seems like even 8.0 does not support 16.04: https://github.com/dotnet/core/blob/main/release-notes/8.0/supported-os.md

@jkotas
Member

jkotas commented Apr 28, 2024

It is the same deal as Windows 7. It is not supported, but we avoid intentionally breaking it to help some important customers.

@vcsjones
Member

trying to build against OpenSSL 1.0 that is EOS since 2019.

OpenSSL version support is not as simple as the OpenSSL support policy. Distros will continue to use EOL versions of OpenSSL but backport fixes under their own LTS support policy.

I don't think .NET actually "officially" supports any Linux distros with 1.0.2 anymore. However 1.0.2k/g has played an important role far past its 2019 EOL and there are a number of Linux distros that still support it today.

@vcsjones
Member

That said, since this is a diagnostic feature, I don't know that it makes sense to go to any particular lengths to get it working with 1.0.

@jkotas
Member

jkotas commented Apr 30, 2024

malloc/free can be used in more places than it is ok to run managed code. For example, you can use malloc/free in thread detach callback, but running managed code in thread detach callback is not safe/reliable. What's our confidence level that OpenSSL only ever calls malloc/free in places where it is safe to run managed code? At minimum, this should get a full CI run with this instrumentation unconditionally enabled to see whether it is going to hit any crashes.

@filipnavara
Member

recently I have seen high memory usage due to lots of memory buffers being cached by malloc internally, see #101552

OT: Turns out this was an immensely useful hint. We added tracking of the malloc metrics from mallinfo2. The data after a few days show that the memory usage growth is in the malloc arena and that the size of the free list also grows. There may not be a memory leak after all, just a lot of reserved memory from the native allocator.

Comment on lines +29 to +31
#pragma warning disable CA1823
private static readonly bool MemoryDebug = GetMemoryDebug();
#pragma warning restore CA1823
Member

Since the field is not used anywhere, can we move the call to GetMemoryDebug to cctor?

Member

This should also be moved to Interop.Crypto, because the following code

var ci = typeof(SslStream).Assembly.GetTypes().First(t => t.Name == "Crypto");
ci.InvokeMember("EnableTracking", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static, null, null, null);

will enable tracking, but later, when Interop.OpenSsl gets initialized, it turns the tracking off because of

                Interop.Crypto.EnableTracking();
                Interop.Crypto.GetIncrementalAllocations();
                Interop.Crypto.DisableTracking();

in GetMemoryDebug()

Member Author

I don't understand the comment. Based on the feedback, I specifically removed the option to enable detailed tracking via the environment, to avoid cases where managed code may be invoked on an incompatible thread. What remains is the ability to subscribe to detailed reporting later (once all the initialization is done), when the caller feels it is safe. I know this part may be tricky to describe, but so far I have failed to construct a case where it fails, i.e. all the cases I was interested in just worked and provided the info I was looking for.

Member

The following program will not print any memory addresses, although one would expect it to:

// var ossl = typeof(SslStream).Assembly.GetTypes().First(t => t.Name == "OpenSsl");
// ossl.InvokeMember("GetMemoryDebug", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static, null, null, null);
var ci = typeof(SslStream).Assembly.GetTypes().First(t => t.Name == "Crypto");
ci.InvokeMember("EnableTracking", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static, null, null, null);

HttpClient client = new HttpClient();
await client.GetAsync("https://www.microsoft.com");

using Process process = Process.GetCurrentProcess();

Console.WriteLine($"Bytes known to GC [{GC.GetTotalMemory(false)}], process working set [{process.WorkingSet64}]");
Console.WriteLine("OpenSSL memory {0}", ci.InvokeMember("GetOpenSslAllocatedMemory", BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Static, null, null, null));

Tuple<UIntPtr, int, string>[] allocations = (Tuple<UIntPtr, int, string>[])ci.InvokeMember("GetIncrementalAllocations", BindingFlags.InvokeMethod | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static | BindingFlags.Instance, null, null, null);
for (int j = 0; j < allocations.Length; j++)
{
    (UIntPtr ptr, int size, string src) = allocations[j];
    Console.WriteLine("Allocated {0} bytes at 0x{1:x} from {2}", size, ptr, src);
}

It starts to work if you uncomment the first two lines.

Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Comment on lines +263 to +270
Tuple<UIntPtr, int, string>[] allocations = new Tuple<UIntPtr, int, string>[_allocations.Count];
int index = 0;
foreach ((UIntPtr ptr, UIntPtr value) in _allocations)
{
ref MemoryEntry entry = ref *(MemoryEntry*)ptr;
allocations[index] = new Tuple<UIntPtr, int, string>(ptr + Offset, entry.Size, $"{Marshal.PtrToStringAnsi((IntPtr)entry.File)}:{entry.Line}");
index++;
}
Member

Can the _allocations.Count value change while iterating over _allocations?

Member Author

Yes. I expect the ConcurrentDictionary to handle parallel access gracefully, but you may not get a consistent snapshot. We did it a while back with a HashSet and manual locking. Since the iteration can take a while, I feel it is better to allow concurrent access and deliver what we can. Ideally, one would enable tracking, run the repro, and dump whatever remains.

Member

My point is that the new size may not necessarily match the size of the allocations array, and allocations[index] access may throw.

Member

Also, ConcurrentDictionary.Count is extremely slow.
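
One way to sidestep both concerns, sketched here as an assumption rather than as what the PR ended up doing: enumerate the dictionary into a growable list, so the result never depends on a separately read Count. `AllocationSnapshot` and `Take` are hypothetical names, and the value type is simplified to `int` for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: copies whatever the enumerator yields into a list.
// Entries added or removed concurrently may or may not be included, but the
// copy can never index past a pre-sized array.
static class AllocationSnapshot
{
    public static (UIntPtr Address, int Size)[] Take(ConcurrentDictionary<UIntPtr, int> allocations)
    {
        var snapshot = new List<(UIntPtr Address, int Size)>();
        foreach (KeyValuePair<UIntPtr, int> entry in allocations)
        {
            snapshot.Add((entry.Key, entry.Value));
        }

        return snapshot.ToArray();
    }
}
```

The trade-off is the same one described above: the result is a best-effort view rather than a consistent snapshot, but it cannot throw because the dictionary grew during the copy.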

_allocations!.Clear();
}

public static unsafe Tuple<UIntPtr, int, string>[] GetIncrementalAllocations()
Member

The original PR had IntPtr, is there some reason to use UIntPtr over IntPtr here?

@wfurt wfurt marked this pull request as draft June 25, 2024 07:27
return Array.Empty<Tuple<UIntPtr, int, string>>();
}

Tuple<UIntPtr, int, string>[] allocations = new Tuple<UIntPtr, int, string>[_allocations.Count];
Contributor

I was browsing PRs and accidentally stumbled upon this one.

Is there a specific reason this code uses obsolete Tuple over ValueTuple aka (UIntPtr, int, string)? Thanks!

@dotnet-policy-service dotnet-policy-service bot removed this from the 9.0.0 milestone Jul 27, 2024
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@wfurt wfurt reopened this Jul 28, 2024
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@karelz karelz added this to the 9.0.0 milestone Sep 3, 2024