Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CopyFileEx for fs::copy on Windows #26751

Merged
merged 1 commit into from
Jul 10, 2015

Conversation

retep998
Copy link
Member

@retep998 retep998 commented Jul 3, 2015

Using the OS mechanism for copying files allows the OS to optimize the transfer using stuff such as Offloaded Data Transfers (ODX).
Also preserves a lot more information, including NTFS File Streams, which the manual implementation threw away.
In addition, it is an atomic operation, unlike the manual implementation which has extra calls for copying over permissions.

r? @alexcrichton

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jul 3, 2015
@alexcrichton
Copy link
Member

My only worry here is that the semantics of CopyFile may be subtly different than the semantics of the unix implementation, as they should in theory be generally the same for the majority of corner cases. That being said, after reading the documentation for CopyFile I believe that they're equivalent, so I'm not too too worried.

@Stebalien
Copy link
Contributor

The size is kind of a lie (it's racy) although fs::copy doesn't even document what that u64 is in the first place.

@retep998
Copy link
Member Author

retep998 commented Jul 3, 2015

Another option is to use CopyFileEx and use a progress callback which will receive the total file size. If that is desired, just tell me and I'll do it up.

@retep998
Copy link
Member Author

retep998 commented Jul 3, 2015

A further advantage to using CopyFile/CopyFileEx compared to the manual implementation is that it preserves all the File Streams. Yay NTFS!

@retep998
Copy link
Member Author

retep998 commented Jul 5, 2015

I wrote up an alternative version that uses CopyFileEx so it can get the total bytes transferred using a callback.
retep998@7bf491d

@Stebalien
Copy link
Contributor

I wrote up an alternative version that uses CopyFileEx so it can get the total bytes transferred using a callback.

(actually, I was kind of hoping the devs would chime in and say "we don't care" so I could submit a patch that uses the BTRFS clone ioctl on Linux which doesn't provide any way of getting this information. IMO, fs::copy shouldn't return the number of bytes "copied" anyways.)

@retep998
Copy link
Member Author

retep998 commented Jul 5, 2015

@Stebalien fs::copy is a stable function though, so to change the semantics of the return value or even change it entirely would be a breaking change. Not that I'm opposed to such a breaking change, but it would need to be evaluated to determine if anything would break from that. Perhaps do a crater run of a simple change from Result<u64> to Result<()> to determine if any code relied on being able to get that value?

@Stebalien
Copy link
Contributor

@retep998 I know. However, the rust docs don't actually say what the return value of fs::copy means.

try!(cvt(unsafe {
c::CopyFileW(pfrom.as_ptr(), pto.as_ptr(), libc::FALSE)
}));
stat(to).map(|attr| attr.size())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about this implementation because it means that the file could be successfully copied but then this later call to stat could fail, causing the entire operation to be considered as failing. Basically the atomicity of this operation means that it can be somewhat significantly different from the unix implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm well no nevermind, there's a few steps taken in the unix implementation (e.g. copy then set permissions), so this may not be so bad after all.

@alexcrichton alexcrichton added the I-needs-decision Issue: In need of a decision. label Jul 6, 2015
@alexcrichton
Copy link
Member

Can you also add some tests for the various error modes here? For example copying from a file onto a directory, copying from nothing onto something, etc. It'd be good to just ensure that some common error cases are consistent across platforms.

@retep998
Copy link
Member Author

retep998 commented Jul 8, 2015

Implementation switched to the CopyFileEx version and a couple more tests added.

@retep998 retep998 changed the title Use CopyFile for fs::copy on Windows Use CopyFileEx for fs::copy on Windows Jul 8, 2015
@retep998 retep998 force-pushed the copy-that-floppy branch 2 times, most recently from 3ccf636 to 19a3a84 Compare July 8, 2015 11:23
@@ -1814,6 +1814,18 @@ mod tests {
check!(fs::set_permissions(&out, attr.permissions()));
}

#[cfg(windows)]
#[test]
fn copy_file_preserves_streams() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems surprising! Could you elaborate on what's going on here? E.g. how come this test is only enabled for Windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In windows NTFS), every file name maps to multiple actual files called streams. By default, reading/writing/deleting operates on the "anonymous" stream but you can operate on other streams. Alternate streams are kind of like extended attributes but more flexible.

https://technet.microsoft.com/en-us/sysinternals/bb897440.aspx

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the test specific to Windows since it depends on file streams which are specific to NTFS, and Windows is the only OS we support that consistently uses NTFS. Also, it depends on the implementation properly copying over all the file streams, which the manual implementation does not do, so even if another OS did support NTFS file streams, this test would fail on such an OS at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate further on what's going on, I'm creating the file in.txt but instead of writing to the default data stream, I'm instead writing to the bunny stream.
If I copy the file to a new location it brings all the streams with it, and the number of bytes copied (what I assume the return value from fs::copy means) is the same as the number of bytes I wrote to the bunny stream since its the only stream with data.
When getting the size of the file, I'm getting that information on the default data stream, which is empty so its size is 0. If I wanted information on the other streams I'd have to explicitly specify them by name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for the explanations!

@alexcrichton
Copy link
Member

Ok, I'm comfortable with this. We should keep our eyes peeled for any possible breakage though as this may end up turning up something.

cc @brson for the keeping eyes peeled aspect.

@bors: r+ 19a3a840831ad0d7705a9ae11816dc7d1545b954

@alexcrichton alexcrichton removed the I-needs-decision Issue: In need of a decision. label Jul 10, 2015
@bors
Copy link
Contributor

bors commented Jul 10, 2015

⌛ Testing commit 19a3a84 with merge af09f40...

@bors
Copy link
Contributor

bors commented Jul 10, 2015

💔 Test failed - auto-mac-32-opt

@retep998 retep998 force-pushed the copy-that-floppy branch 3 times, most recently from 8b8a562 to 1d20269 Compare July 10, 2015 08:53
Adds a couple more tests for fs::copy

Signed-off-by: Peter Atashian <retep998@gmail.com>
@retep998
Copy link
Member Author

Okay, fixed the build errors so it builds according to @dotdash

@dotdash
Copy link
Contributor

dotdash commented Jul 10, 2015

@bors r=alexcrichton 1d20269

bors added a commit that referenced this pull request Jul 10, 2015
Using the OS mechanism for copying files allows the OS to optimize the transfer using stuff such as [Offloaded Data Transfers (ODX)](https://msdn.microsoft.com/en-us/library/windows/desktop/hh848056%28v=vs.85%29.aspx).
Also preserves a lot more information, including NTFS [File Streams](https://msdn.microsoft.com/en-us/library/windows/desktop/aa364404%28v=vs.85%29.aspx), which the manual implementation threw away.
In addition, it is an atomic operation, unlike the manual implementation which has extra calls for copying over permissions.

r? @alexcrichton
@bors
Copy link
Contributor

bors commented Jul 10, 2015

⌛ Testing commit 1d20269 with merge d0d3707...

@bors bors merged commit 1d20269 into rust-lang:master Jul 10, 2015
@brson
Copy link
Contributor

brson commented Jul 10, 2015

👍

stephaneyfx added a commit to stephaneyfx/rust that referenced this pull request Sep 28, 2017
On Windows with the NTFS filesystem, `fs::copy` would return the sum of the
lengths of all streams, which can be different from the length reported by
`metadata` and thus confusing for users unaware of this NTFS peculiarity.

This makes `fs::copy` return the same length `metadata` reports which is the
value it used to return before PR rust-lang#26751. Note that alternate streams are still
copied; their length is just not included in the returned value.

This change relies on the assumption that the stream with index 1 is always the
main stream in the `CopyFileEx` callback. I could not find any official
document confirming this but empirical testing has shown this to be true,
regardless of whether the alternate stream is created before or after the main
stream.

Resolves rust-lang#44532
bors added a commit that referenced this pull request Oct 3, 2017
Made `fs::copy` return the length of the main stream

On Windows with the NTFS filesystem, `fs::copy` would return the sum of the
lengths of all streams, which can be different from the length reported by
`metadata` and thus confusing for users unaware of this NTFS peculiarity.

This makes `fs::copy` return the same length `metadata` reports which is the
value it used to return before PR #26751. Note that alternate streams are still
copied; their length is just not included in the returned value.

This change relies on the assumption that the stream with index 1 is always the
main stream in the `CopyFileEx` callback. I could not find any official
document confirming this but empirical testing has shown this to be true,
regardless of whether the alternate stream is created before or after the main
stream.

Resolves #44532
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants