Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate 280-byte memset from ReadDir iterator #103137

Merged
merged 1 commit into from
Oct 23, 2022
Merged

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Oct 17, 2022

This guy:

let mut copy: dirent64 = mem::zeroed();

It turns out libc::dirent64 is quite big—https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In #103135 this memset accounted for 0.9% of the runtime of iterating a big directory.

Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (d_ino and d_type) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.

History

This code got added in #93459 and tweaked in #94272 and #94750.

Prior to #93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.

copy 280 bytes

This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.

memset 280 bytescopy 19 bytes

However this was still buggy because it used addr_of!((*entry_ptr).d_name), which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)

The UB got fixed by #94272 by replacing addr_of with some pointer manipulation based on offset_from, but still fundamentally the same operation.

memset 280 bytescopy 19 bytes

Then #94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.

memset 280 bytescopy 19 bytescopy 9 bytes

After my PR we just grab the 9 needed bytes directly from entry_ptr.

copy 9 bytes

The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just entry_ptr->d_name. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.

References

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 17, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 17, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 17, 2022
} else {
#[allow(deref_nullptr)]
{
ptr::addr_of!((*ptr::null::<dirent64>()).$field)
Copy link
Member

@est31 est31 Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ptr::addr_of!((*ptr::null::<dirent64>()).$field)
None::<* const dirent64>.map(|p| ptr::addr_of!((*p).$field)).unwrap()

This is less scary-looking than the ub-in-dead-code version IMO. Edit: actually nevermind, the map's closure is also UB due to the special situation we are in. When given the choice between obvious on first sight UB-in-dead-code, and UB-in-dead-code where you don't see the UB-iness immediately, I think it's better to prefer former.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I also considered:

match $entry_ptr {
    entry_ptr => {
        if true {
            entry_ptr.byte_offset(OFFSET).cast::<_>()
        } else {
            ptr::addr_of!((*entry_ptr).$field)
        }
    }
}

Not sure if that seems better to you. It avoids the allow(deref_nullptr).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the UB causing version that used to be there earlier in the history, right? Yeah it might actually be good to have that as it makes it clear that this version was specifically not chosen, and gives the opportunity to put a 1-2 line comment about the UB-iness of the code. Even if it repeats the comment above, it helps bring the point across.

Also FTR, I reconsidered after my edit. None::<* const dirent64>.map(|p| ptr::addr_of!((*p).$field)).unwrap() isn't actually UB, unless my understanding of UB is wrong. All that you need is the type anyways.


let entry = dirent64_min {
d_ino: copy.d_ino as u64,
d_ino: *offset_ptr!(entry_ptr, d_ino) as u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we could avoid the awkward else branch in offset_ptr! if this used ptr::addr_of!, which I think is fine for these fields that are properly typed (and in-bounds)? Or am I missing something?

(Then offset_ptr can be char_offset_ptr and assume that we're casting to *const c_char).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote it this way because the way you suggest is (shockingly) not allowed.

use std::ptr;

#[repr(C)]
struct Small { x: u16, y: u16 }

#[repr(C)]
struct Big { x: u16, y: u16, z: u16 }

fn main() {
    unsafe {
        let small = Box::into_raw(Box::new(Small { x: 0, y: 0 }));
        let big = small as *const Big;
        let y = *ptr::addr_of!((*big).y);
        let _ = Box::from_raw(small);
        println!("{}", y);
    }
}
error: Undefined Behavior: dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
  --> src/main.rs:13:18
   |
13 |         let y = *ptr::addr_of!((*big).y);
   |                  ^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: alloc1550 has size 4, so pointer to 6 bytes starting at offset 0 is out-of-bounds
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that Small/Big code, all of the following possible behaviors for ptr::addr_of!((*big).y) are sensible and justifiable, with different tradeoffs:

(a) DGAF, it just adds 2 bytes to the value of big
(b) Requires 2 bytes starting at big to be in bounds of the same allocation
(c) Requires 4 bytes starting at big to be in bounds of the same allocation
(d) Requires 6 bytes starting at big to be in bounds of the same allocation

Currently in Rust, (d) is the only one that is definitively not UB. Personally I think (b) should be the requirement (that's the case in a C &big->y, for example) but we need to follow (d) for now until whoever makes these decisions relaxes that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yeah, that makes sense - I believe this is an area where the exact rules aren't yet nailed down and Miri is just being conservative here.

I'll go ahead and r+ this implementation, but I agree that we need a better solution here, since in practice the current requirements are overly strict. (It may also be worth having a macro or function accessors like this in libc since presumably any user of readdir ends up needing this macro).

@Mark-Simulacrum
Copy link
Member

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Oct 23, 2022

📌 Commit 0bb6eb1 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 23, 2022
@bors
Copy link
Contributor

bors commented Oct 23, 2022

⌛ Testing commit 0bb6eb1 with merge 7fcf850...

@bors
Copy link
Contributor

bors commented Oct 23, 2022

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 7fcf850 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 23, 2022
@bors bors merged commit 7fcf850 into rust-lang:master Oct 23, 2022
@rustbot rustbot added this to the 1.66.0 milestone Oct 23, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7fcf850): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean1 range count2
Regressions ❌
(primary)
0.6% [0.6%, 0.6%] 1
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [0.6%, 0.6%] 1

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Footnotes

  1. the arithmetic mean of the percent change

  2. number of relevant changes

@dtolnay dtolnay deleted the readdir branch November 1, 2022 20:34
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Eliminate 280-byte memset from ReadDir iterator

This guy:

https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589

It turns out `libc::dirent64` is quite big&mdash;https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory.

Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.

## History

This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750.

Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.

<table><tr><td>copy 280 bytes</td></tr></table>

This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)

The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table>

After my PR we just grab the 9 needed bytes directly from entry_ptr.

<table><tr><td>copy 9 bytes</td></tr></table>

The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.

## References

- https://man7.org/linux/man-pages/man3/readdir.3.html
@dtolnay dtolnay added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants