Less syscalls for the copy_file_range probe #122079

Merged
merged 3 commits into from
May 26, 2024
Changes from 2 commits
69 changes: 48 additions & 21 deletions library/std/src/sys/pal/unix/kernel_copy.rs
@@ -560,6 +560,12 @@ pub(super) fn copy_regular_files(reader: RawFd, writer: RawFd, max_len: u64) ->
     // We store the availability in a global to avoid unnecessary syscalls
     static HAS_COPY_FILE_RANGE: AtomicU8 = AtomicU8::new(NOT_PROBED);

+    let mut have_probed = match HAS_COPY_FILE_RANGE.load(Ordering::Relaxed) {
+        NOT_PROBED => false,
+        UNAVAILABLE => return CopyResult::Fallback(0),
+        _ => true,
+    };
+
     syscall! {
         fn copy_file_range(
             fd_in: libc::c_int,
@@ -571,26 +577,6 @@ pub(super) fn copy_regular_files(reader: RawFd, writer: RawFd, max_len: u64) ->
         ) -> libc::ssize_t
     }

-    match HAS_COPY_FILE_RANGE.load(Ordering::Relaxed) {
-        NOT_PROBED => {
-            // EPERM can indicate seccomp filters or an immutable file.
-            // To distinguish these cases we probe with invalid file descriptors which should result in EBADF if the syscall is supported
-            // and some other error (ENOSYS or EPERM) if it's not available
-            let result = unsafe {
-                cvt(copy_file_range(INVALID_FD, ptr::null_mut(), INVALID_FD, ptr::null_mut(), 1, 0))
-            };
-
-            if matches!(result.map_err(|e| e.raw_os_error()), Err(Some(EBADF))) {
-                HAS_COPY_FILE_RANGE.store(AVAILABLE, Ordering::Relaxed);
-            } else {
-                HAS_COPY_FILE_RANGE.store(UNAVAILABLE, Ordering::Relaxed);
-                return CopyResult::Fallback(0);
-            }
-        }
-        UNAVAILABLE => return CopyResult::Fallback(0),
-        _ => {}
-    };
-
     let mut written = 0u64;
     while written < max_len {
         let bytes_to_copy = cmp::min(max_len - written, usize::MAX as u64);
@@ -604,6 +590,11 @@ pub(super) fn copy_regular_files(reader: RawFd, writer: RawFd, max_len: u64) ->
             cvt(copy_file_range(reader, ptr::null_mut(), writer, ptr::null_mut(), bytes_to_copy, 0))
         };

+        if !have_probed && copy_result.is_ok() {
+            have_probed = true;
+            HAS_COPY_FILE_RANGE.store(AVAILABLE, Ordering::Relaxed);
+        }
+
         match copy_result {
             Ok(0) if written == 0 => {
                 // fallback to work around several kernel bugs where copy_file_range will fail to
@@ -619,7 +610,43 @@ pub(super) fn copy_regular_files(reader: RawFd, writer: RawFd, max_len: u64) ->
                 return match err.raw_os_error() {
                     // when file offset + max_length > u64::MAX
                     Some(EOVERFLOW) => CopyResult::Fallback(written),
-                    Some(ENOSYS | EXDEV | EINVAL | EPERM | EOPNOTSUPP | EBADF) if written == 0 => {
+                    Some(raw_os_error @ (ENOSYS | EXDEV | EINVAL | EPERM | EOPNOTSUPP | EBADF))
+                        if written == 0 =>
+                    {
+                        if !have_probed {
+                            let available = match raw_os_error {
+                                EPERM => {
+                                    // EPERM can indicate seccomp filters or an
+                                    // immutable file. To distinguish these
+                                    // cases we probe with invalid file
+                                    // descriptors which should result in EBADF
+                                    // if the syscall is supported and EPERM or
+                                    // ENOSYS if it's not available.
+                                    match unsafe {
+                                        cvt(copy_file_range(
+                                            INVALID_FD,
+                                            ptr::null_mut(),
+                                            INVALID_FD,
+                                            ptr::null_mut(),
+                                            1,
+                                            0,
+                                        ))
+                                        .map_err(|e| e.raw_os_error())
+                                    } {
+                                        Err(Some(EPERM | ENOSYS)) => UNAVAILABLE,
+                                        Err(Some(EBADF)) => AVAILABLE,
+                                        Ok(_) => panic!("unexpected copy_file_range probe success"),
+                                        // Treat other errors as the syscall
+                                        // being unavailable.
+                                        Err(_) => UNAVAILABLE,
+                                    }
+                                }
+                                ENOSYS => UNAVAILABLE,
Member

Is it possible for a FUSE filesystem to return ENOSYS to the user even if the kernel supports the syscall?

Contributor Author

I assume so, but what does the author of the FUSE FS expect to happen when they do this? I think we can pretend it doesn't happen.

Member

We don't have to be robust against the kernel misbehaving, but IMO we should be robust against other processes on the system misbehaving.
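
To make the concern concrete, here is a minimal sketch, not the std implementation. It assumes a simplified, hypothetical `copy_once` helper in which a closure stands in for `copy_file_range`, hard-codes the Linux errno value so no `libc` crate is needed, and leaves out the EPERM probe. Under those assumptions, a single ENOSYS on the first call, whatever its source, turns off the fast path for every later copy:

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// State names mirror the diff; the numeric values are illustrative.
const NOT_PROBED: u8 = 0;
const UNAVAILABLE: u8 = 1;
const AVAILABLE: u8 = 2;

// Linux errno value, hard-coded for this sketch.
const ENOSYS: i32 = 38;

#[derive(Debug, PartialEq)]
enum Outcome {
    Copied(usize),
    Fallback,
}

/// Hypothetical stand-in for `copy_regular_files`. `syscall` plays the role
/// of `copy_file_range` and returns `Err(errno)` on failure.
fn copy_once(state: &AtomicU8, mut syscall: impl FnMut() -> Result<usize, i32>) -> Outcome {
    let have_probed = match state.load(Ordering::Relaxed) {
        NOT_PROBED => false,
        // Cached verdict: the syscall is not attempted again.
        UNAVAILABLE => return Outcome::Fallback,
        _ => true,
    };
    match syscall() {
        Ok(n) => {
            if !have_probed {
                state.store(AVAILABLE, Ordering::Relaxed);
            }
            Outcome::Copied(n)
        }
        Err(errno) => {
            if !have_probed {
                // Simplification: only ENOSYS marks the syscall as missing.
                let verdict = if errno == ENOSYS { UNAVAILABLE } else { AVAILABLE };
                state.store(verdict, Ordering::Relaxed);
            }
            Outcome::Fallback
        }
    }
}
```

In this model, once the state is `UNAVAILABLE` the closure is never invoked again, while an ENOSYS that arrives after a successful first call leaves the cached `AVAILABLE` verdict untouched.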

Contributor Author

I'd like to challenge that. Where does it end? You can modify the results of arbitrary syscalls using ptrace. We can't defend against that. Why is FUSE special?

Member

The probe-first approach is immune to any such file-dependent questions because it uses invalid FDs so it never touches any filesystem, it only checks kernel support.
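
The trade-off between the two strategies can be sketched with a mock kernel that counts calls. Everything here is hypothetical (`MockKernel`, `probe_first`, `probe_lazily`, the descriptor numbers 3 and 4); only the errno values are the real Linux ones, and the mock always behaves like a kernel that supports the syscall:

```rust
use std::cell::Cell;

// Linux errno values, hard-coded so the sketch needs no `libc` crate.
const EPERM: i32 = 1;
const EBADF: i32 = 9;

const INVALID_FD: i32 = -1;
// Arbitrary stand-ins for valid descriptors.
const READER: i32 = 3;
const WRITER: i32 = 4;

/// Mock kernel: counts calls and returns `real_result` for valid descriptors.
struct MockKernel {
    calls: Cell<u32>,
    real_result: Result<usize, i32>,
}

impl MockKernel {
    fn new(real_result: Result<usize, i32>) -> Self {
        MockKernel { calls: Cell::new(0), real_result }
    }

    fn copy_file_range(&self, fd_in: i32, fd_out: i32) -> Result<usize, i32> {
        self.calls.set(self.calls.get() + 1);
        if fd_in == INVALID_FD || fd_out == INVALID_FD {
            // A supported syscall rejects the bad descriptors.
            Err(EBADF)
        } else {
            self.real_result
        }
    }
}

/// Old strategy: probe with invalid descriptors, then do the real call.
fn probe_first(kernel: &MockKernel) -> Option<usize> {
    if kernel.copy_file_range(INVALID_FD, INVALID_FD) != Err(EBADF) {
        return None;
    }
    kernel.copy_file_range(READER, WRITER).ok()
}

/// New strategy: do the real call and probe only if it returns EPERM.
fn probe_lazily(kernel: &MockKernel) -> Option<usize> {
    match kernel.copy_file_range(READER, WRITER) {
        Ok(n) => Some(n),
        Err(EPERM) => {
            // The real code stores this verdict in the global; here it is dropped.
            let _supported = kernel.copy_file_range(INVALID_FD, INVALID_FD) == Err(EBADF);
            None
        }
        Err(_) => None,
    }
}
```

In this model, a first copy that succeeds costs two calls with the up-front probe and one with the lazy probe. The lazy variant pays for the extra probe only when the first real call returns EPERM. Later copies skip the probe in both variants because the verdict is cached.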

Contributor Author (@tbu-, Mar 11, 2024)

It's not only checking kernel support, our program might be ptraced.

> The probe-first approach is immune to any such file-dependent questions because it uses invalid FDs so it never touches any filesystem, it only checks kernel support.

This can obviously be listed as part of the requirements, then it can also be done with the current approach.

Member

> You can modify the results of arbitrary syscalls using ptrace.

On Ubuntu (and any other distro that configures the Yama LSM this way), you can only ptrace a program you spawned yourself.

Contributor Author

Other code in std behaves the same way:

if err.raw_os_error() == Some(libc::ENOSYS) {
    STATX_SAVED_STATE.store(STATX_STATE::Unavailable as u8, Ordering::Relaxed);
    return None;
}

I think this is not a problem of this particular PR.

+                                _ => AVAILABLE,
+                            };
+                            HAS_COPY_FILE_RANGE.store(available, Ordering::Relaxed);
+                        }
+
                         // Try fallback io::copy if either:
                         // - Kernel version is < 4.5 (ENOSYS¹)
                         // - Files are mounted on different fs (EXDEV)
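
The availability decision in the last hunk can be restated as a small pure function, which makes the mapping from error code to cached state easier to check in isolation. This is a sketch under assumptions, not code from the PR: `Availability` and `classify` are hypothetical names, the errno constants are the Linux values hard-coded to avoid the `libc` crate, and `probe` is a closure standing in for the invalid-descriptor call.

```rust
// Linux errno values, hard-coded for this sketch.
const EPERM: i32 = 1;
const EBADF: i32 = 9;
const ENOSYS: i32 = 38;

#[derive(Debug, PartialEq, Clone, Copy)]
enum Availability {
    Available,
    Unavailable,
}

/// `first_errno` is the error of the first real call. `probe` runs the
/// invalid-descriptor call on demand and yields its errno, if any.
fn classify(
    first_errno: i32,
    probe: impl FnOnce() -> Result<isize, Option<i32>>,
) -> Availability {
    match first_errno {
        // Ambiguous: a seccomp filter or an immutable file. Ask again
        // with descriptors that cannot refer to any file.
        EPERM => match probe() {
            Err(Some(EBADF)) => Availability::Available,
            Ok(_) => panic!("unexpected copy_file_range probe success"),
            // EPERM, ENOSYS, and anything else count as unavailable.
            Err(_) => Availability::Unavailable,
        },
        ENOSYS => Availability::Unavailable,
        // Any other error in the fallback set implies the syscall exists.
        _ => Availability::Available,
    }
}
```

The probe closure runs only on the EPERM branch, which matches the intent of the change: the extra syscall is paid only when the first result is ambiguous.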