Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unix: RwLock is incorrect #53127

Closed
RalfJung opened this issue Aug 6, 2018 · 17 comments · Fixed by #71889
Closed

Unix: RwLock is incorrect #53127

RalfJung opened this issue Aug 6, 2018 · 17 comments · Fixed by #71889
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2018

I think the sys::unix::rwlock implementation is incorrect in the sense that it has undefined behavior, for two reasons:

  • The access to write_locked is not properly synchronized: In read, we access write_locked even if pthread_rwlock_rdlock failed. This is fixed.
  • Worse, POSiX read-write locks have UB when the thread holding the write lock attempts to acquire it again -- and yet nothing is stopping exactly that from happening in write. If we really want to use POSIX rwlocks, I think we have to implement a reentrancy detector. (And then maybe we also want to use that for mutex, so that we can use the static initializer and the most efficient code path?)
@RalfJung
Copy link
Member Author

RalfJung commented Aug 6, 2018

I based my statements on http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_rwlock_wrlock.html. @sfackler points out that at https://linux.die.net/man/3/pthread_rwlock_wrlock, behavior for re-acquiring the write lock is defined as "may deadlock" -- but not UB. However, we also use this implementation on non-Linux platforms, e.g. macOS.

@sfackler
Copy link
Member

sfackler commented Aug 6, 2018

macOS does explicitly identify the behavior as undefined.

@RalfJung
Copy link
Member Author

FWIW, RwLock is also UB on Windows -- actually, even Mutex is wrong there.

I think we should stop using the platform-specific implementations, and just roll our own based on thread (un)parking.

@sfackler sfackler added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 29, 2018
@sfackler
Copy link
Member

cc @alexcrichton - we talked about this a bit during RustConf.

@alexcrichton
Copy link
Member

I agree that we're basically at the point that there's a ridiculous amount of constraints from OS primitives that the only program which doesn't have undefined behavior is correct ones, and we all know how often we write correct programs.

All these discussions are spread out over a number of issues though and PRs, and it's really difficult to keep it all in my own head at least. "This is UB" is a very strong motivation for reimplementing the standard library's primitives, but I think the scale of a change such as this would probably want to go through the RFC process. Such an RFC would be a great location to centralize all the various hazards of OS primitives and how unreasonable it is for us to try to appease all possible implementations. That in turns provides excellent motivation for alternative strategies.

@retep998
Copy link
Member

You mean like this RFC? rust-lang/rfcs#1632

@RalfJung
Copy link
Member Author

Correction: Even some correct programs are UB. Reentrancy on the read lock of an RwLock is a perfectly reasonable thing to do, and yet Windows declares it UB.

Whether reentrancy that leads to deadlocks is "correct" is open for interpretation, I guess ;) but anyway, it is safe, and hence "correct enough" for UB to be critical.

@alexcrichton
Copy link
Member

I've opened an internals post to continue more long-form discussion on the topic of continuing to fix these issues.

bors added a commit that referenced this issue Apr 3, 2019
Use the parking_lot locking primitives

This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations.

This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9

Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course.

Fixes #35836
Fixes #53127
bors added a commit that referenced this issue Apr 19, 2019
Use the parking_lot locking primitives

This PR adds the [`parking_lot`](https://crates.io/crates/parking_lot) code to libstd and uses it for the `sys_common::{mutex,rwlock,condvar,remutex}` implementations.

This has been discussed in https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439/9

Thanks @Amanieu for mentoring when doing this, and basically all the code is his as well of course.

Fixes #35836
Fixes #53127
@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2020

I just noticed that the latest version of the POSIX docs mandate

If a deadlock condition occurs or the calling thread already owns the read-write lock for writing or reading, the call shall either deadlock or return [EDEADLK].

So some time between the 1997 version and the 2018 version, this seems to have changed.

Can we expect Apple (and generally all our POSIX platforms) to follow a recent enough POSIX to rely on this?

@Amanieu
Copy link
Member

Amanieu commented May 4, 2020

Older versions of glibc (pre-2.25) use hardware lock elision on Intel CPUs which makes recursive locking succeed when it should otherwise deadlock.

@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2020

Looks like even the 2004 version already says

The calling thread acquires the write lock if no other thread (reader or writer) holds the read-write lock rwlock. Otherwise, the thread shall block until it can acquire the lock. The calling thread may deadlock if at the time the call is made it holds the read-write lock (whether a read or write lock).

"may deadlock" here could be interpreted as "or cause UB", but then saying "may deadlock" would be redundant as of course a deadlock is a valid implementation of UB. So arguably, UB was not permitted any more for 15 years now.

@Amanieu
So that's pre-2016 glibc versions, if I got the release date right.

Bummer, this means we still need to keep the extra reentrant-rwlock-checks around in libstd (but we should have comments explaining that this is just to defend against non-compliant implementations).

@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2020

For the question of whether we can rely on macOS using a recent enough POSIX, I found this "conformance statement" but could not figure out which POSIX version it says macOS conforms with.

@Amanieu
Copy link
Member

Amanieu commented May 4, 2020

Relevant issue: #33770

@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2020

Oh, that was for Mutex as well, not rust RwLock?

This is particularly scary... do they still do that? I would think if PTHREAD_MUTEX_DEFAULT == PTHREAD_MUTEX_NORMAL one can assume default mutexes to behave the same as "normal" mutexes... obviously not. Okay, I'll have to figure out a way to make Miri check that.

@Amanieu
Copy link
Member

Amanieu commented May 4, 2020

I believe they still do that, but I haven't checked.

@RalfJung
Copy link
Member Author

RalfJung commented May 4, 2020

For Mutex only though, right? Arguably that is compliant with POSIX, the standard doesn't say that PTHREAD_MUTEX_DEFAULT == PTHREAD_MUTEX_NORMAL implies anything, it just seems like a reasonable assumption (that the glibc people seemingly disagree with).

For RwLock though, if they still do lock elision (and don't have any reentrancy guards themselves), that's a clear POSIX violation -- right?

@Amanieu
Copy link
Member

Amanieu commented May 4, 2020

Yes, only for Mutex and only if you don't use pthread_mutexattr_settype to explicitly select PTHREAD_MUTEX_NORMAL. So they are still standard-compliant (ish).

For RwLock they currently don't do elision so they are standard compliant. But old glibc is not standard compliant.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 5, 2020
Explain our RwLock implementation

Turns out that [with the latest POSIX docs](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_wrlock.html), our `RwLock` implementation is actually correct. However, we cannot fully rely on that due to bugs in older glibc (fix released in 2016). Update the comments to explain that.

I also clarified our Mutex docs a bit and fixed another instance of rust-lang#55865.

r? @Amanieu
Fixes rust-lang#53127
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 5, 2020
Explain our RwLock implementation

Turns out that [with the latest POSIX docs](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_wrlock.html), our `RwLock` implementation is actually correct. However, we cannot fully rely on that due to bugs in older glibc (fix released in 2016). Update the comments to explain that.

I also clarified our Mutex docs a bit and fixed another instance of rust-lang#55865.

r? @Amanieu
Fixes rust-lang#53127
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 5, 2020
Explain our RwLock implementation

Turns out that [with the latest POSIX docs](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_wrlock.html), our `RwLock` implementation is actually correct. However, we cannot fully rely on that due to bugs in older glibc (fix released in 2016). Update the comments to explain that.

I also clarified our Mutex docs a bit and fixed another instance of rust-lang#55865.

r? @Amanieu
Fixes rust-lang#53127
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue May 5, 2020
Explain our RwLock implementation

Turns out that [with the latest POSIX docs](https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_rwlock_wrlock.html), our `RwLock` implementation is actually correct. However, we cannot fully rely on that due to bugs in older glibc (fix released in 2016). Update the comments to explain that.

I also clarified our Mutex docs a bit and fixed another instance of rust-lang#55865.

r? @Amanieu
Fixes rust-lang#53127
@bors bors closed this as completed in e4bda61 May 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
5 participants