Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace fcntl-based file lock with flock #72161

Merged
merged 3 commits into from
May 22, 2020
Merged

Conversation

nbdd0121
Copy link
Contributor

WSL1 does not support fcntl-based lock and will always report success,
therefore creating a race condition when multiple rustc processes are
modifying shared data such as search-index.js. WSL1 does however
support flock.

flock is supported by all unix-like platforms. The only caveat is that
Linux 2.6.11 or earlier does not support flock on NFS mounts, but as
the minimum supported Linux version is 2.6.18, it is not an issue.

Fixes #72157

WSL1 does not support `fcntl`-based lock and will always report success,
therefore creating a race condition when multiple rustc processes are
modifying shared data such as `search-index.js`. WSL1 does however
support `flock`.

`flock` is supported by all unix-like platforms. The only caveat is that
Linux 2.6.11 or earlier does not support `flock` on NFS mounts, but as
the minimum supported Linux version is 2.6.18, it is not an issue.

Fixes rust-lang#72157
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @davidtwco (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 13, 2020
@nbdd0121
Copy link
Contributor Author

r? @cuviper

@rust-highfive rust-highfive assigned cuviper and unassigned davidtwco May 13, 2020
@cuviper
Copy link
Member

cuviper commented May 15, 2020

flock is supported by all unix-like platforms.

I think this is overstated -- it's a syscall of BSD origin, and my Linux man-page says it appears on most UNIX systems, but in particular it seems Solaris still does not have it.

Since there's nothing really wrong with the current fcntl approach, and it's POSIX functionality, I suggest we keep using that on most unix targets. We can change to flock for linux alone, and specifically comment that we're doing this for WSL1's sake.

(I know I suggested switching all unix in #72157, but now I know it is a portability hazard.)

@cuviper cuviper added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 15, 2020
@cuviper
Copy link
Member

cuviper commented May 15, 2020

I found that Cargo uses flock, but it has a caution about blocking on NFS:

https://github.com/rust-lang/cargo/blob/13bded9a335a855a15b95f54779d7f8a889404dd/src/cargo/util/flock.rs#L284-L291

    // File locking on Unix is currently implemented via `flock`, which is known
    // to be broken on NFS. We could in theory just ignore errors that happen on
    // NFS, but apparently the failure mode [1] for `flock` on NFS is **blocking
    // forever**, even if the "non-blocking" flag is passed!
    //
    // As a result, we just skip all file locks entirely on NFS mounts. That
    // should avoid calling any `flock` functions at all, and it wouldn't work
    // there anyway.
    //
    // [1]: https://github.com/rust-lang/cargo/issues/2615

They use this for all unix, except on Solaris they just ignore locking:

https://github.com/rust-lang/cargo/blob/13bded9a335a855a15b95f54779d7f8a889404dd/src/cargo/util/flock.rs#L284-L291

    #[cfg(target_os = "solaris")]
    fn flock(file: &File, flag: libc::c_int) -> Result<()> {
        // Solaris lacks flock(), so simply succeed with a no-op
        Ok(())
    }

cc @pfmooney who rewrote that recently.

@cuviper
Copy link
Member

cuviper commented May 15, 2020

cc @alexcrichton too, who reviewed that cargo change.

@pfmooney
Copy link
Contributor

cc @pfmooney who rewrote that recently.

Much of that locking code was lifted out of fs2, which had become unsuitable due to maintenance concerns (not accepting patches to support new systems/targets). Porting over the fcntl()-based locking for Solaris targets was considered, but was skipped for the sake of simplicity since concurrent (and conflicting) cargo operations were identified as infrequent and not a high priority.

A flock() implementation was prioritized in illumos (drawing its lineage back to OpenSolaris) due to the differing semantics between flock() and fcntl()-based locks, specifically around fork() behavior. I'm not sure how much those differences would relate to your change here.

@nbdd0121
Copy link
Contributor Author

flock is supported by all unix-like platforms.

I think this is overstated -- it's a syscall of BSD origin, and my Linux man-page says it appears on most UNIX systems, but in particular it seems Solaris still does not have it.

Ah indeed. So here is what I originally do to ensure it's okay:

I should probably framed it as "all unix-like platforms" supported by rustc.

Solaris isn't supported by rustc currently. I am not sure if that is a convincing enough reason to not keep the fcntl version?

I found that Cargo uses flock, but it has a caution about blocking on NFS:

https://github.com/rust-lang/cargo/blob/13bded9a335a855a15b95f54779d7f8a889404dd/src/cargo/util/flock.rs#L284-L291

    // File locking on Unix is currently implemented via `flock`, which is known
    // to be broken on NFS. We could in theory just ignore errors that happen on
    // NFS, but apparently the failure mode [1] for `flock` on NFS is **blocking
    // forever**, even if the "non-blocking" flag is passed!
    //
    // As a result, we just skip all file locks entirely on NFS mounts. That
    // should avoid calling any `flock` functions at all, and it wouldn't work
    // there anyway.
    //
    // [1]: https://github.com/rust-lang/cargo/issues/2615

This is interesting. However Linux man page suggested that on Linux, flock is emulated by fcntl lock on NFS drives (I double checked Linux source and it is true). So the "hang" would happen with the current implementation as well. That particular issue is likely a misconfiguration which enabled lock feature on non-NFSv4 drives.

@nbdd0121
Copy link
Contributor Author

I tested flock-based implementation on NFSv4 and it works fine.

BTW the current approach used in Cargo does affect people as well by not locking on NFS, see rust-lang/cargo#4629.

@alexcrichton
Copy link
Member

FWIW we treat locking as quite optional in Cargo due to it being too restrictive if we require it everywhere. We try to use it everywhere we can but it's empirically had a lot of bugs and had to adjust a lot over time. I doubt Cargo is doing "the best thing" for all platforms under the sun still, and it probably still has issues that need to be worked through.

@nbdd0121
Copy link
Contributor Author

@cuviper Is Solaris a platform that rustc ever intends to run on? If so I can bring back fcntl version and use it only for Solaris/Illumos. For other Unices I personally would use flock version because its nicer semantics but I'm okay with either method.

@pfmooney
Copy link
Contributor

@cuviper Is Solaris a platform that rustc ever intends to run on? If so I can bring back fcntl version and use it only for Solaris/Illumos. For other Unices I personally would use flock version because its nicer semantics but I'm okay with either method.

flock() is present on illumos, so the fcntl()-based locking is not necessary there. The solaris target is currently being used by illumos systems as they transition over to the somewhat recently created illumos target. (We're still waiting on it to be enabled as a CI build)

@cuviper
Copy link
Member

cuviper commented May 20, 2020

I tested flock-based implementation on NFSv4 and it works fine.

The thing I'm worried about from that comment is the "failure mode" aspect, which I inferred to mean the behavior when the network disappears. But maybe it's also problematic for NFSv3?

@cuviper Is Solaris a platform that rustc ever intends to run on?

I don't know -- it seems the present folks really care about illumos, not solaris proper, which will be getting an independent target as mentioned.

My feeling is that we should be really careful about changing this, given the known hazards. We do need to care about it more than Cargo does itself, because rustc and rustdoc are definitely going to be called in parallel on a regular basis. In the extreme, we'd only change at all for WSL where we know fcntl doesn't work right -- can we tell from uname or something?

@nbdd0121
Copy link
Contributor Author

nbdd0121 commented May 20, 2020

The thing I'm worried about from that comment is the "failure mode" aspect, which I inferred to mean the behavior when the network disappears. But maybe it's also problematic for NFSv3?

It's up to the configuration of NFS. If NFS is configured to be uninterruptable and infinite retry, then a network disconnection will cause everything to hang. In that case it does not matter how locks behave, because we will run into issues when doing other IOs as well. Most NFS-related issues are actually poor configuration.

If anything can break flock on Linux it'll be NFS. But Linux essentially emulates flock using fcntl on NFS drives, so if flock breaks, fcntl would break as well. I'm not sure about BSD's handling on this - but I think convert fcntl to flock on Linux is pretty safe.

My feeling is that we should be really careful about changing this, given the known hazards. We do need to care about it more than Cargo does itself, because rustc and rustdoc are definitely going to be called in parallel on a regular basis. In the extreme, we'd only change at all for WSL where we know fcntl doesn't work right -- can we tell from uname or something?

We could check if uname contains Microsoft.

@cuviper
Copy link
Member

cuviper commented May 21, 2020

@bors r+

@bors
Copy link
Contributor

bors commented May 21, 2020

📌 Commit a05acbf has been approved by cuviper

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 21, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request May 22, 2020
Rollup of 4 pull requests

Successful merges:

 - rust-lang#71610 (InvalidUndefBytes: Track size of undef region used)
 - rust-lang#72161 (Replace fcntl-based file lock with flock)
 - rust-lang#72306 (Break tokens before checking if they are 'probably equal')
 - rust-lang#72325 (Always generated object code for `#![no_builtins]`)

Failed merges:

r? @ghost
@bors bors merged commit a8018e2 into rust-lang:master May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rustc file lock not working on WSL
7 participants