Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PE loader/image cleanups. No-copy mapping of R2R PEs on Windows. #61938

Merged
merged 14 commits into from
Dec 14, 2021

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Nov 22, 2021

A few changes with the goal of rationalizing/simplifying handling of PE image layouts. Also adds support for no-copy mapping of R2R assemblies when need to load manually (i.e. when loading from a singlefile bundle).

  • reduced the number of layout slots per PEImage to 2 - FLAT and LOADED.
    (MAPPED was functionally the same as LOADED, the original reason for it was likely support for native resources)
  • reduced number of PEImageLayout types. There was quite a bit of overlap between them, mostly because of MAPPED.
  • implemented support for no-copy mapping of R2R images on Windows.
    (this makes startup of composite r2r singlefile apps the same as non-singlefile, which is up to 2x improvement, depending on scenario,this opens the path to making composite the default mode for singlefile r2r now)
  • added a bunch of comments that document the intent and implementation details of PEImage/PEImageLayout machinery.
  • a few small cleanups and fixes.

NOTE: requirements of memory mapping APIs on Windows in combination with desire to always produce PE files compatible with OS loader require the following:

  1. file sections of R2R files are aligned on 4K
  2. files inside singlefile bundle are aligned on 4K
    (this is only a requirement for R2R files, but done in all cases for simplicity for now, this can change in the future)

Files that do not conform with above will work as before - via making a copy when loaded in the context of a singlefile bundle.
The tradeoff is slight increase in file sizes. As an example - in case of untrimmed composite r2r singlefile HelloWrold app, the overall increase is 65841K -> 68036K, which is ~3%

Another effect of this change is making R2R to work reliably on OSX. Turns out we had issues with doing manual mapping of executable code on OSX and also with performing relocations. As a result R2R on OSX would only work in cases where doing layout by copying would get the virtual address that PE specifies. That is a fragile condition and R2R worked rarely (if at all).

With this change performing manual layouts is more straightforward and failures would cause asserts, so the issues were noticed and fixed.

@VSadov VSadov added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Nov 22, 2021
@VSadov VSadov force-pushed the peLoad branch 5 times, most recently from 94c38cc to b346156 Compare November 27, 2021 03:21
@VSadov VSadov force-pushed the peLoad branch 3 times, most recently from afa8bdf to 5e53f38 Compare December 10, 2021 08:16
@VSadov VSadov changed the title PE loader/image investigations PE loader/image cleanups. No-copy mapping of R2R PEs on Windows. Dec 10, 2021
@VSadov VSadov removed NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) NO-REVIEW Experimental/testing PR, do NOT review it labels Dec 10, 2021
@VSadov VSadov marked this pull request as ready for review December 13, 2021 17:43
@VSadov
Copy link
Member Author

VSadov commented Dec 13, 2021

I think this is ready for review. @trylek @janvorli - Could you take a look?

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me and a great cleanup too, thanks Vlad! I have a couple of minor comments and observations but it would be great if the other reviewers could also take a look, most notably I'm thinking about JanV as another pair of eyes to audit the page protection logic.

{
// To minimize wasted VA space on 32-bit systems, align file to page boundaries (presumed to be 4K)
//
// On Windows we use 4K file alignment, per requirements of memory mapping API (MapViewOfFile3, et al).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - according to the above condition this section is used both on Windows and on 32-bit Unix (most notably Linux on arm I guess); please consider making the comment more precise.

{
// To minimize wasted VA space on 32-bit systems, align file to page boundaries (presumed to be 4K)
//
// On Windows we use 4K file alignment, per requirements of memory mapping API (MapViewOfFile3, et al).
// We also want files always accepted by the native loader, thus we can't use the same approach as on Unix.
fileAlignment = 0x1000;
}

int sectionAlignment = 0x1000;
if (!target.IsWindows && is64BitTarget)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition !target.IsWindows && is64BitTarget is apparently a negation of the condition target.IsWindows || !is64BitTarget but it takes a bit of Boolean algebra to double-check that. Could we use an if-else construct or a temporary boolean variable with a sufficiently descriptive name, e.g. isWindowsOr32Bit?

@@ -372,7 +372,10 @@ CHECK PEDecoder::CheckSection(COUNT_T previousAddressEnd, COUNT_T addressStart,
CHECK(alignedSize >= VAL32(pNT->OptionalHeader.SizeOfImage));

// Check expected alignments
#if TARGET_WINDOWS
// R2R files on Unix do not align section RVAs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo - should this be 'R2R files on Windows'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, in a way the comment does hold on Unix but it's strange that it's in a TARGET_WINDOWS conditional compilation block.

Copy link
Member Author

@VSadov VSadov Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Windows we make section RVAs always a multiple of SectionAlignment. That is not an explicit requirement in the documentation, but it looks like OS loader expects that and our own validation here expects that too.

That is not the case on Unix where we keep lower 16bit of the RVA intact. I do not know why this asserts is not triggered on Unix. It should. When I tried using the same approach on Windows as on Unix, this assert was triggered.

Perhaps this style of assert does not run on Unix? (that would be mildly disturbing).

Anyways - I think the assert should be only enabled on Windows, where the condition is expected. At least as a matter of documenting the expectations.

Copy link
Member Author

@VSadov VSadov Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps I should use a better terminology to make it more clear.

I.E. on Windows section start RVAs are multiples of SectionAlignment. On Unix this is not true.

Any suggestions?

(same applies to having gaps between sections when loaded - Windows PEs are not expected to have gaps, Unix PEs may have gaps)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess that probably the only cause of my confusion was the wording - having a comment about Unix behavior under a conditional compilation section TARGET_WINDOWS at the very least looks somewhat surprising. Perhaps changing the wording to something like 'On Windows the sections are aligned blah blah blah (in contrast to Unix where they're not blah blah blah)' would be more than sufficient to address my suggestion.


static bool HavePlaceholderAPI()
{
const MapViewOfFile3Fn INVALID_ADDRESS_CENTINEL = (MapViewOfFile3Fn)1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - typo, should be SENTINEL I believe.

src/coreclr/vm/peimagelayout.cpp Show resolved Hide resolved
@@ -2228,7 +2228,15 @@ public static int Main(string[] args)
RunTest("ExplicitlySizedClassTest", ExplicitlySizedClassTest());
RunTest("GenericLdtokenTest", GenericLdtokenTest());
RunTest("ArrayLdtokenTests", ArrayLdtokenTests());
RunTest("TestGenericMDArrayBehavior", TestGenericMDArrayBehavior());

// TODO: BUG BUG: allocation of MD arrays is failing on OSX/ARM64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a GitHub bug we could reference here? Or should we create one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a bug for this and add a link to this comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2043,9 +2043,9 @@ MAPmmapAndRecord(

// Set the requested mapping with forced PROT_WRITE to ensure data from the file can be read there,
// read the data in and finally remove the forced PROT_WRITE
if ((mprotect(pvBaseAddress, len + adjust, prot | PROT_WRITE) == -1) ||
if ((mprotect(pvBaseAddress, len + adjust, PROT_WRITE) == -1) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious - why was this change necessary?

Copy link
Member Author

@VSadov VSadov Dec 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change if prot is executable, the prot | PROT_WRITE becomes WX. mprotect will accept that, but pread will fail and we would bail from this method. This would always happen in an R2R PE, since .text is executable.

We then would go on the path of doing layout via copying, which would work, but often end up not in the desired location and later fail when applying relocations, where we had similar WX issue. Failed relocations would result in R2R disabled. The code would still run - in pure IL/JIT mode.

Ultimately we would have R2R enabled on OSX only when we are very lucky.

@@ -208,15 +214,16 @@ void Assembly::Init(AllocMemTracker *pamTracker, LoaderAllocator *pLoaderAllocat
// loading it entirely.
//CacheFriendAssemblyInfo();

if (IsCollectible())
if (IsCollectible() && !pPEAssembly->IsDynamic())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are dynamic assemblies excluded here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic assemblies do not have PE layouts. The GetLoadedImageContents below would return NULL.

@@ -1192,7 +1004,12 @@ HRESULT PEImage::TryOpenFile()
{
ErrorModeHolder mode(SEM_NOOPENFILEERRORBOX | SEM_FAILCRITICALERRORS);
m_hFile=WszCreateFile((LPCWSTR)GetPathToLoad(),
GENERIC_READ,
GENERIC_READ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - this function shares 90% with the GetFileHandle. Since you are touching this code, it seems it would make sense to call TryOpenFile from the GetFileHandle to get rid of the code duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The difference is in taking a lock and in hadling the failure. I will just make one call another.

// PE relocations are applied
// native exception handlers are registered with OS.
//
// Flat layouts are sufficient for operatons that do not require running native code,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Flat layouts are sufficient for operatons that do not require running native code,
// Flat layouts are sufficient for operations that do not require running native code,

#if defined(TARGET_UNIX)
if (((pSection->Characteristics & VAL32(IMAGE_SCN_MEM_EXECUTE)) != 0))
{
#ifdef __APPLE__
dwNewProtection = PAGE_READWRITE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the macOS issue that you've mentioned in the PR description?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, one of two issues. The other one is WX case in MAPmmapAndRecord

src/tests/readytorun/crossgen2/Program.cs Outdated Show resolved Hide resolved
@VSadov
Copy link
Member Author

VSadov commented Dec 13, 2021

@trylek @janvorli - Thanks for the feedback!
I think I have addressed all the concerns. Please let me know if I've missed something or you see something else that needs adjusting.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@VSadov
Copy link
Member Author

VSadov commented Dec 14, 2021

Thanks!!

@VSadov VSadov merged commit 35e4e97 into dotnet:main Dec 14, 2021
@VSadov VSadov deleted the peLoad branch December 14, 2021 18:24
@ghost ghost locked as resolved and limited conversation to collaborators Jan 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants