Optimize Seek::stream_len impl for File #125087

Open

tbu-
Contributor

@tbu- tbu- commented May 13, 2024

It uses the file metadata on Unix with a fallback for files incorrectly reported as zero-sized. It uses GetFileSizeEx on Windows.

This reduces the number of syscalls needed for determining the file size of an open file from 3 to 1.
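As a rough illustration of the difference described above, here is a minimal sketch comparing the seek-based approach (query the position, seek to the end, seek back) with a metadata-based one. The function names and the `scratch_file` helper are hypothetical, and the shape of the zero-size fallback is an assumption, not the code in this PR.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};

/// Length via seeking: query the position, seek to the end, then seek back
/// if the position changed (up to three seek operations).
fn len_via_seek(file: &mut File) -> io::Result<u64> {
    let old_pos = file.stream_position()?;
    let len = file.seek(SeekFrom::End(0))?;
    if old_pos != len {
        file.seek(SeekFrom::Start(old_pos))?;
    }
    Ok(len)
}

/// Length via metadata, falling back to seeking when the reported size is
/// zero. The fallback shape is an assumption made for this sketch.
fn len_via_metadata(file: &mut File) -> io::Result<u64> {
    let len = file.metadata()?.len();
    if len == 0 {
        len_via_seek(file)
    } else {
        Ok(len)
    }
}

/// Hypothetical helper: writes `contents` to a scratch file in the temp
/// directory and reopens it read-only.
fn scratch_file(name: &str, contents: &[u8]) -> io::Result<File> {
    let path = std::env::temp_dir().join(format!("{}_{}", std::process::id(), name));
    File::create(&path)?.write_all(contents)?;
    File::open(&path)
}

fn main() -> io::Result<()> {
    let mut file = scratch_file("stream_len_sketch.bin", b"hello world")?;
    file.seek(SeekFrom::Start(3))?;
    println!("via seek:     {}", len_via_seek(&mut file)?);
    println!("via metadata: {}", len_via_metadata(&mut file)?);
    println!("position:     {}", file.stream_position()?);
    Ok(())
}
```

Both functions leave the stream position where it was; the metadata path simply never moves it when the reported size is non-zero.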

@rustbot
Collaborator

rustbot commented May 13, 2024

r? @ChrisDenton

rustbot has assigned @ChrisDenton.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-hermit Operating System: Hermit O-solid Operating System: SOLID O-unix Operating system: Unix-like O-wasi Operating system: Wasi, Webassembly System Interface O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 13, 2024
@bors
Contributor

bors commented Jul 5, 2024

☔ The latest upstream changes (presumably #127360) made this pull request unmergeable. Please resolve the merge conflicts.

@the8472
Member

the8472 commented Jul 13, 2024

> with a fallback for files incorrectly reported as zero-sized.

Alas, this is insufficient. There are other filesystems with questionable stat impls that return incorrect values that aren't 0. For example sysfs.

$ stat -c %s /sys/kernel/oops_count 
4096
$ wc -c /sys/kernel/oops_count 
2 /sys/kernel/oops_count
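The same comparison can be made from Rust. This is a minimal sketch, with `reported_vs_read` as a hypothetical helper name, that returns the size reported by the file's metadata next to the number of bytes actually obtained by reading to the end:

```rust
use std::fs::File;
use std::io;

/// Returns (size reported by metadata, bytes actually obtained by reading).
fn reported_vs_read(path: &str) -> io::Result<(u64, u64)> {
    let mut file = File::open(path)?;
    let reported = file.metadata()?.len();
    // Copying into a sink counts the bytes without keeping them in memory.
    let read = io::copy(&mut file, &mut io::sink())?;
    Ok((reported, read))
}

fn main() {
    // Path taken from the example above; it only exists on Linux.
    let path = "/sys/kernel/oops_count";
    match reported_vs_read(path) {
        Ok((reported, read)) => {
            println!("{path}: metadata reports {reported}, reading yields {read}")
        }
        Err(e) => println!("{path}: {e}"),
    }
}
```

On an ordinary on-disk file the two numbers should match; a mismatch is the kind of inconsistency shown by the `stat` and `wc` output.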

@tbu-
Contributor Author

tbu- commented Jul 13, 2024

> There are other filesystems with questionable stat impls that return incorrect values that aren't 0. For example sysfs.

Amazing, I did not foresee this.

> Alas, this is insufficient.

For this particular example, this doesn't look insufficient, since it doesn't regress anything:

use std::fs::File;
use std::io;
use std::io::Seek as _;
use std::io::SeekFrom;

fn main() -> io::Result<()> {
    let file = File::open("/sys/kernel/oops_count")?;
    println!("{}", (&file).seek(SeekFrom::End(0))?);
    Ok(())
}

This outputs 4096 on my machine, the same as the stat result.

[…]
openat(AT_FDCWD, "/sys/kernel/oops_count", O_RDONLY|O_CLOEXEC) = 3
lseek(3, 0, SEEK_END)                   = 4096
write(1, "4096\n", 5)                   = 5
close(3)                                = 0

@tbu-
Contributor Author

tbu- commented Jul 13, 2024

It seems that wc falls back to reading the entire file if the file size is a multiple of the page size (4096 on my machine). I don't think we want to do that in Rust; it sounds horribly inefficient for files whose size happens to be a multiple of 4096.

https://github.com/coreutils/coreutils/blob/74ef0ac8a56b36ed3d0277c3876fefcbf434d0b6/src/wc.c#L357-L365

EDIT: Seems like I was incorrect. It uses some heuristic to lseek to a bit before the end of the file and reads from there. For a 1 GiB file, it does the following:

openat(AT_FDCWD, "a", O_RDONLY)         = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1073741824, ...}) = 0
lseek(3, 1073618943, SEEK_CUR)          = 1073618943
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 8193
read(3, "", 16384)                      = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x3), ...}) = 0
write(1, "1073741824 a\n", 13)          = 13
close(3)                                = 0
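The pattern visible in the trace (trust the reported size for most of the file, then read only the tail) can be sketched roughly as follows. This is a simplified illustration, not coreutils' actual formula: the `PAGE_SIZE` constant, the `block_size` parameter, and the rule of leaving exactly one block unread are all assumptions made for the sketch.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};

/// Assumed page size; real code would query the platform instead.
const PAGE_SIZE: u64 = 4096;

/// Counts the bytes in `file`, reading only a tail when the reported size
/// looks suspicious. Leaves the stream position at the end of the file
/// whenever it had to read.
fn count_bytes(file: &mut File, block_size: u64) -> io::Result<u64> {
    let reported = file.metadata()?.len();
    if reported % PAGE_SIZE != 0 {
        // Not page-aligned: treat the reported size as trustworthy.
        return Ok(reported);
    }
    // Page-aligned (including zero): skip ahead, but leave a tail to read
    // so that a filesystem reporting a placeholder size is still handled.
    let start = reported.saturating_sub(block_size);
    file.seek(SeekFrom::Start(start))?;
    let tail = io::copy(file, &mut io::sink())?;
    Ok(start + tail)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("count_bytes_{}.bin", std::process::id()));
    std::fs::write(&path, vec![0u8; 8192])?;
    let mut file = File::open(&path)?;
    println!("{}", count_bytes(&mut file, 4096)?);
    Ok(())
}
```

The cost is bounded by `block_size` bytes of reading per call rather than the whole file, which is the trade-off the trace suggests.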

@the8472
Member

the8472 commented Jul 13, 2024

Ok, maybe seeking is nonsense on those files too and they're only meant to be used via read and write.

But that still leaves FUSE drivers which can implement basically arbitrary behavior. stat and seek aren't required to be consistent there either and seek might be the method doing the right thing.

And any approach distinguishing good from bad filesystems would require additional syscalls.

@tbu-
Contributor Author

tbu- commented Jul 13, 2024

> But that still leaves FUSE drivers which can implement basically arbitrary behavior. stat and seek aren't required to be consistent there either and seek might be the method doing the right thing.

What is Rust's platform support for bad FUSE drivers? Who decides which of the methods to trust (fstat vs lseek)?

I can kinda see not crashing the Rust program if a FUSE driver decides to return EBADF for close. But this seems like a case of garbage in, garbage out, and it's on the user to not use file systems that lie to programs.

@the8472
Member

the8472 commented Jul 13, 2024

FUSE is just the canary in the coal mine. sysfs shows that in-kernel filesystems do strange things too. If we have 2 examples already then I assume there's some network filesystem that will also do weird things when weird remote machines are involved.

So I'd say the trust hierarchy when it comes to "how many bytes does this file contain" is read > seek > stat.

It's unfortunate that statx has a return field that's supposed to indicate what is actually supported by the kernel, but they're not using it to signal that reporting the size is unsupported on sysfs.

@tbu-
Contributor Author

tbu- commented Jul 14, 2024

> FUSE is just the canary in the coal mine. sysfs shows that in-kernel filesystems do strange things too. If we have 2 examples already then I assume there's some network filesystem that will also do weird things when weird remote machines are involved.

I had assumed lseek would error out for procfs, but turns out it doesn't:

$ python -c 'import os; print(open("/proc/1/cmdline").seek(0, os.SEEK_END))'
0

So after checking again, it seems that even for procfs lseek offers no advantage over fstat. This means the two examples (procfs, sysfs) we found are not examples where the two methods behave differently.
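A small sketch for checking this on other paths: `seek_vs_stat` (a hypothetical helper name) returns the offset obtained by seeking to the end next to the size from the file's metadata, so the two can be compared on any file of interest.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};

/// Returns (offset reported by seeking to the end, size reported by metadata).
fn seek_vs_stat(path: &str) -> io::Result<(u64, u64)> {
    let mut file = File::open(path)?;
    let by_seek = file.seek(SeekFrom::End(0))?;
    let by_stat = file.metadata()?.len();
    Ok((by_seek, by_stat))
}

fn main() {
    // Paths taken from the discussion; they only exist on Linux, and
    // opening them may need additional permissions.
    for path in ["/sys/kernel/oops_count", "/proc/1/cmdline"] {
        match seek_vs_stat(path) {
            Ok((by_seek, by_stat)) => println!("{path}: seek {by_seek}, stat {by_stat}"),
            Err(e) => println!("{path}: {e}"),
        }
    }
}
```

If the two values differ for some path, that path would be a case where switching from seeking to metadata changes behavior.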

@ChrisDenton
Member

Sorry, just catching up. Given the above, @the8472 are you happy that this is at least no worse than the status quo? I.e. if seek or stat is returning wrong or nonsense values then that's a problem in either case.

@the8472
Member

the8472 commented Sep 9, 2024

I think it would be better to only do this for regular files. Non-regular ones have weird behavior in general. E.g. directories have a size according to stat but reading or seeking would error, so they don't have a meaningful stream length.
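One way this restriction could look, as a minimal sketch: use the metadata size only when the metadata says the handle refers to a regular file, and otherwise fall back to the seek-based approach. The function name is hypothetical, and this is not the code in the PR.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom};

/// Uses the metadata size for regular files only; everything else goes
/// through the seek-based path and surfaces whatever error seeking returns.
fn stream_len_regular_fast_path(file: &mut File) -> io::Result<u64> {
    let meta = file.metadata()?;
    if meta.is_file() {
        return Ok(meta.len());
    }
    // Fallback: query the position, seek to the end, seek back if needed.
    let old_pos = file.stream_position()?;
    let len = file.seek(SeekFrom::End(0))?;
    if old_pos != len {
        file.seek(SeekFrom::Start(old_pos))?;
    }
    Ok(len)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("regular_only_{}.bin", std::process::id()));
    std::fs::write(&path, b"twelve bytes")?;
    let mut file = File::open(&path)?;
    println!("{}", stream_len_regular_fast_path(&mut file)?);
    Ok(())
}
```

The file-type check comes from the same metadata call that provides the size, so it adds no extra syscall on the fast path.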

Even with that restriction I think someone will still eventually run into issues around FUSE, network filesystems or weird filesystems.

That said, stream_len is an unstable API and it's a convenience method, so we don't necessarily have to provide ironclad guarantees. Maybe we should just rephrase the documentation of Seek::stream_len. Currently it says:

> This method is implemented using up to three seek operations.

This would be better:

> The default implementation uses up to three seek operations.

I mean, it is always true of trait methods that impls can override them and provide different behavior, so this isn't special. But perhaps it's better to not suggest that implementations will only ever behave exactly like that.
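To illustrate the point about defaults and overrides, here is a minimal sketch using a hypothetical stand-in trait, `StreamLenSketch`, rather than `Seek::stream_len` itself (which is described above as unstable). The `KnownLen` type is also invented for the example.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

/// Hypothetical stand-in for a trait method with a default body.
trait StreamLenSketch: Seek {
    /// Default: up to three seek operations.
    fn len_sketch(&mut self) -> io::Result<u64> {
        let old_pos = self.stream_position()?;
        let len = self.seek(SeekFrom::End(0))?;
        if old_pos != len {
            self.seek(SeekFrom::Start(old_pos))?;
        }
        Ok(len)
    }
}

// This impl keeps the default, seek-based behavior.
impl StreamLenSketch for Cursor<Vec<u8>> {}

/// Invented type that already knows its length.
struct KnownLen {
    inner: Cursor<Vec<u8>>,
    len: u64,
}

impl Seek for KnownLen {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        self.inner.seek(pos)
    }
}

// This impl overrides the default and performs no seeks at all.
impl StreamLenSketch for KnownLen {
    fn len_sketch(&mut self) -> io::Result<u64> {
        Ok(self.len)
    }
}

fn main() -> io::Result<()> {
    let mut cursor = Cursor::new(vec![0u8; 10]);
    let mut known = KnownLen { inner: Cursor::new(vec![0u8; 10]), len: 10 };
    println!("default:  {}", cursor.len_sketch()?);
    println!("override: {}", known.len_sketch()?);
    Ok(())
}
```

Callers see the same result from both impls, which is why documentation worded around "the default implementation" describes the situation more accurately.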
