Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ReadDir iterator #103135

Closed
wants to merge 1 commit into from
Closed

Optimize ReadDir iterator #103135

wants to merge 1 commit into from

Conversation

dtolnay
Copy link
Member

@dtolnay dtolnay commented Oct 17, 2022

I was recently dealing with a directory containing a large number of files and tried to see if there was any room for optimizing std::fs::ReadDir for this situation. This commit ended up improving count() by only 8.5% which I didn't find to be enough to justify merging after taking into account how often anybody else would care about this operation. However I think some of the changes to next() are an improvement independent of performance so I am going to move those into a separate PR.

use std::{env, fs};

fn main() {
    let mut args_os = env::args_os();
    args_os.next().unwrap();
    let read_dir = fs::read_dir(args_os.next().unwrap()).unwrap();

    // Before: 4.718 sec
    // After: 4.676 sec (-0.9%)
    let count = read_dir.fold(0, |count, _| count + 1);

    // Before: 4.718 sec
    // After: 4.296 sec (-8.5%)
    let count = read_dir.count();

    println!("{}", count);
}

r? @ghost

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Oct 17, 2022
@rustbot
Copy link
Collaborator

rustbot commented Oct 17, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@dtolnay dtolnay closed this Oct 17, 2022
@dtolnay dtolnay deleted the readdir branch October 17, 2022 04:46
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-tools failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
  IMAGE: x86_64-gnu-tools
##[endgroup]
From https://github.com/rust-lang/rust
 * branch              master     -> FETCH_HEAD
Searching for toolstate changes between a501e6699ed979ef0540949211fc28256336a3f3 and 06ba4bc1531cd4eee7ef342ef3dbdf352ff95ec6
Tool subtrees were updated
##[group]Run src/ci/scripts/verify-channel.sh
src/ci/scripts/verify-channel.sh
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
env:
---
tests/pass-dep/concurrency/libc_pthread_cond.rs ... ok
tests/pass-dep/concurrency/linux-futex.rs ... ok

tests/pass-dep/shims/fs.rs FAILED:
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools-bin/miri" "--error-format=json" "--edition" "2018" "-Astable-features" "-Aunused" "-Zui-testing" "--extern" "getrandom_1=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/libgetrandom-ac993ae34a47d010.rmeta" "--extern" "getrandom_2=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/libgetrandom-20b4cd671fbdd122.rmeta" "--extern" "libc=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/liblibc-5c91b8ee2f9a761f.rmeta" "--extern" "num_cpus=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/libnum_cpus-f87ab7e9e692ab6c.rmeta" "--extern" "page_size=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/libpage_size-c9761f548003f841.rmeta" "--extern" "rand=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/librand-5e75cbc746a2ceda.rmeta" "--extern" "tokio=/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps/libtokio-664c0f42e4d71cb6.rmeta" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/log-5da2e714312fda90" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug/deps" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/deps" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/syn-3bb8f7ac0d0402d5" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/libc-19a0c4dd785bdb63" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/parking_lot_core-de8d28cfa3a1467f" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/tokio-5b679bca69988a36" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/proc-macro2-eeb167ea843a342f" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/lock_api-2402edfbc2151f87" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/quote-bbb475710904fa81" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/x86_64-unknown-linux-gnu/debug" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/getrandom-dd8529f5bb54559b" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2-tools/miri/debug/build/memchr-3aa21f28168e754b" "tests/pass-dep/shims/fs.rs" "-Zmiri-disable-isolation"
Pass got exit status: 1

actual output differed from expected
--- tests/pass-dep/shims/fs.stderr
--- tests/pass-dep/shims/fs.stderr
+++ <stderr output>
-hello dup fd
+error: Undefined Behavior: dereferencing pointer failed: ALLOC has size 31, so pointer to 280 bytes starting at offset 0 is out-of-bounds
+  --> RUSTLIB/std/src/sys/PLATFORM/fs.rs:LL:CC
+   |
+LL |                 let name = CStr::from_ptr(ptr::addr_of!((*entry_ptr).d_name) as *const c_char);
+   |                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: ALLOC has size 31, so pointer to 280 bytes starting at offset 0 is out-of-bounds
+   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
+   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
+   = note: BACKTRACE:
+   = note: inside `<std::sys::PLATFORM::fs::ReadDir as std::iter::Iterator>::next` at RUSTLIB/core/src/ptr/mod.rs:LL:CC
---
-hello dup fd


There were 1 unmatched diagnostics that occurred outside the testfile and had no pattern
    Error: Undefined Behavior: dereferencing pointer failed: alloc59726 has size 31, so pointer to 280 bytes starting at offset 0 is out-of-bounds
full stderr:
full stderr:
error: Undefined Behavior: dereferencing pointer failed: alloc59726 has size 31, so pointer to 280 bytes starting at offset 0 is out-of-bounds
   |
   |
LL |                 let name = CStr::from_ptr(ptr::addr_of!((*entry_ptr).d_name) as *const c_char);
   |                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ dereferencing pointer failed: alloc59726 has size 31, so pointer to 280 bytes starting at offset 0 is out-of-bounds
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `<std::sys::unix::fs::ReadDir as std::iter::Iterator>::next` at /checkout/library/core/src/ptr/mod.rs:2005:5

bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 23, 2022
Eliminate 280-byte memset from ReadDir iterator

This guy:

https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589

It turns out `libc::dirent64` is quite big&mdash;https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory.

Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.

## History

This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750.

Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.

<table><tr><td>copy 280 bytes</td></tr></table>

This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)

The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table>

After my PR we just grab the 9 needed bytes directly from entry_ptr.

<table><tr><td>copy 9 bytes</td></tr></table>

The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.

## References

- https://man7.org/linux/man-pages/man3/readdir.3.html
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Eliminate 280-byte memset from ReadDir iterator

This guy:

https://github.com/rust-lang/rust/blob/1536ab1b383f21b38f8d49230a2aecc51daffa3d/library/std/src/sys/unix/fs.rs#L589

It turns out `libc::dirent64` is quite big&mdash;https://docs.rs/libc/0.2.135/libc/struct.dirent64.html. In rust-lang#103135 this memset accounted for 0.9% of the runtime of iterating a big directory.

Almost none of the big zeroed value is ever used. We memcpy a tiny prefix (19 bytes) into it, and then read just 9 bytes (`d_ino` and `d_type`) back out. We can read exactly those 9 bytes we need directly from the original entry_ptr instead.

## History

This code got added in rust-lang#93459 and tweaked in rust-lang#94272 and rust-lang#94750.

Prior to rust-lang#93459, there was no memset but a full 280 bytes were being copied from the entry_ptr.

<table><tr><td>copy 280 bytes</td></tr></table>

This was not legal because not all of those bytes might be initialized, or even allocated, depending on the length of the directory entry's name, leading to a segfault. That PR fixed the segfault by creating a new zeroed dirent64 and copying just the guaranteed initialized prefix into it.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

However this was still buggy because it used `addr_of!((*entry_ptr).d_name)`, which is considered UB by Miri in the case that the full extent of entry_ptr is not in bounds of the same allocation. (Arguably this shouldn't be a requirement, but here we are.)

The UB got fixed by rust-lang#94272 by replacing `addr_of` with some pointer manipulation based on `offset_from`, but still fundamentally the same operation.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td></tr></table>

Then rust-lang#94750 noticed that only 9 of those 19 bytes were even being used, so we could pick out only those 9 to put in the ReadDir value.

<table><tr><td>memset 280 bytes</td><td>copy 19 bytes</td><td>copy 9 bytes</td></tr></table>

After my PR we just grab the 9 needed bytes directly from entry_ptr.

<table><tr><td>copy 9 bytes</td></tr></table>

The resulting code is more complex but I believe still worthwhile to land for the following reason. This is an extremely straightforward thing to accomplish in C and clearly libc assumes that; literally just `entry_ptr->d_name`. The extra work in comparison to accomplish it in Rust is not an example of any actual safety being provided by Rust. I believe it's useful to have uncovered that and think about what could be done in the standard library or language to support this obvious operation better.

## References

- https://man7.org/linux/man-pages/man3/readdir.3.html
@dtolnay dtolnay added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Dec 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants