Introduce RefCell::try_borrow_unguarded #59211

nox · 2019-03-15T14:32:38Z

Come sit next to the fireplace with me, this is going to be a long story.

So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with !Sync data types such as RefCell<T>.

The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some RefCell<T> prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour.

This PR reinstates RefCell::borrow_state so that this method can make use of it and return None if the cell was mutably borrowed.

Cc @SimonSapin

This reverts commit 313aab8.

This reverts commit dc2d505.

rust-highfive · 2019-03-15T14:32:41Z

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

Mark-Simulacrum · 2019-03-15T18:03:02Z

I might be misunderstanding, but is there a reason you can't have the function call borrow or try_borrow? And then do the unsafe cast to extend the lifetime of the reference returned as appropriate?

nox · 2019-03-15T18:11:10Z

We call DomRefCell::borrow_for_layout from multiple threads in parallel. The layout threads. Using RefCell::try_borrow would potentially change the borrow flag from multiple threads at once non-atomically. I need to observe it without changing it.

Mark-Simulacrum · 2019-03-15T19:58:03Z

Ah, I see. On the other hand, RefCell doesn't store the state atomically/via locking so I'm not sure it's safe to read the borrow state without some form of locking either?

I think the only reason the as_ptr call technically doesn't cause any sort of problem is that we're only returning a shared reference.

cc @RalfJung as well, might have relevance for stacked borrows and UCG in general

nox · 2019-03-15T20:03:35Z

To send a value over a channel to a different thread, that channel already must do some synchronisation to ensure that memory accesses aren't observed in the wrong order etc, that's independent from what RefCell::borrow_state would allow me to do here.

nox · 2019-03-15T20:24:13Z

Let me try to explain the use case in depth:

We can't get `&mut` values for the entire DOM at once

In Servo, all of DOM (Node, Element, NodeList, etc) is written in Rust but the actual struct values are stored and owned by JavaScript objects, which obviously are owned by the JavaScript engine. The lifecycle of the values is totally managed by the engine, and they will be freed only when no other JavaScript value points to them; furthermore, a lot of these values define cycles on the Rust side (children point to parents and parents point to children), let alone on the JS side where the sky is the limit. I hope that makes it clear that we can't avoid inner mutability.

Layout needs Sync data

Layout is a core piece of a Web browser, and one of the things that define Servo is that our layout component does its job parallelising a damn lot of things, which by definition means it needs Sync data. The problem is that one of its inputs is the DOM itself.

DOM cannot use expensive atomic data structures

DOM is the damn thing that JS accesses whenever it uses Element.getAttribute etc, and attributes and many other things in DOM are mutable. Given how crucial it is for DOM operations to be fast, we cannot afford to use atomic data structures when the JS itself is single-threaded and we only need Sync things when running layout.

So what do we do?

Servo currently defines a lot of Sync wrappers around non-Sync values, and provides methods on these wrappers such that the invariant maintained is "when those wrappers are used from the layout threads, we are sure script doesn't go behind our back and starts doing things again with the DOM".

Why do you need `borrow_state` then?

Because bugs can happen, and there is a single scenario that scares me:

JavaScript calls DOM code.
DOM code mutably borrows something in the DOM and calls back into JS.
JS calls DOM code which triggers a layout operation.
layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

During step 4, script is ensured to not run anymore and thus cannot ever change any refcell's borrow state while layout runs, but layout cannot do anything defensively to avoid the UB caused by accessing a refcell that was mutably borrowed prior to running layout.

Does that make sense to you as to why I need RefCell::borrow_state now?

Mark-Simulacrum · 2019-03-15T20:37:02Z

I think my confusion lies not in the need for borrow_state -- I see that -- but why would borrow_state be considered safe?

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

nox · 2019-03-15T20:40:49Z

I'm envisioning that the layout thread calls borrow_state to check if it's currently mutably borrowed, and during that call/after it JS goes in and borrow_mut's the RefCell. Maybe that can't happen though?

That's the responsibility of the script/layout interface that ensures that script doesn't run while layout is going on. It indeed can't happen.

Mark-Simulacrum · 2019-03-15T20:43:02Z

Sounds good.

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

nox · 2019-03-15T20:47:59Z

I think the concern/part of the reason we deprecated and removed this originally is that getting it right is hard; and this API is actually not useful for anyone who's not doing correct but hard to get right (i.e., using RefCell from multiple threads) things.

Yeah I can sympathise with that. I'm not even entirely sure that we 100% got it correctly in Servo, given there is too much unsafe code all over the place related to the script/layout invariant I described earlier, and nothing to try to avoid the specific scenario I told of. On a side note that's why I'm working on inert these days.

Maybe we should add to the docs on borrow_state that "you really shouldn't use this unless you really know what you are doing."

Sounds good to me, though if someone is calling it in multiple threads at once they already should be aware of being careful because they must have written an unsafe Sync implementation somewhere.

KodrAus · 2019-03-15T22:52:59Z

Can we think of any other use-case for this that doesn't involve pretending RefCell is Sync?

src/libcore/cell.rs

Centril · 2019-03-16T04:25:19Z

As an alternative approach re. #59211 (comment), you may consider two `methods:

fn will_borrow_succeed(&self) -> bool { ... }
fn will_borrow_mut_succeed(&self) -> bool { ... }

These would correspond to the actual checks done for borrow and borrow_mut.

RalfJung · 2019-03-16T09:56:32Z

The method itself seems fine, just from a "is this a safe extension of RefCell" standpoint. But of course that's not the question here.

However, I'm afraid I don't understand enough about how Servo uses this to evaluate whether Servo's use of this is correct. Is it possible to produce a small amount of code demonstrating this?

JavaScript calls DOM code.
DOM code mutably borrows something in the DOM and calls back into JS.
JS calls DOM code which triggers a layout operation.
layout calls DomRefCell::borrow_for_layout on the thing that was mutably borrowed in step 2.

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

nox · 2019-03-16T11:12:55Z

In the last step, which thread will this call happen in? You mentioned layout is in a different thread, so it seems to me that there's a race here between DOM code releasing the mutable borrow and layout calling borrow_state. You said above it is the responsibility of the layout code to make sure that does not happen, but then it seems like this check does not actually help to avoid UB. What am I missing?

There is no race condition because we prevent them by doing the necessary synchronisation. That method prevents UB in the case we had a bug where a refcell was mutably borrowed before layout started. This check does avoid UB, because we wouldn't look into a refcell that was been mutably borrowed.

RalfJung · 2019-03-18T10:51:59Z

I see, so the "doing the synchronization" part is separate from the "prevent reentrant accesses" part, and borrow_state lets you catch bugs in the latter, but you still rely on the former?

nox · 2019-03-18T10:54:59Z

@RalfJung Yes.

SimonSapin · 2019-03-18T14:46:48Z

This method has existed before, and was removed because:

We've added/stabilized try_borrow and try_borrow_mut, these should no longer be needed

But now there is a new use case that was not known at the time and that is not covered by the try_ methods. (More on this below.) Usually we tend to accept PRs for new unstable APIs without more process, but since this reverts a previous decision let’s get team sign off:

@rfcbot fcp merge

Having two methods that return booleans instead of one that returns an enum sounds fine in terms of functionality. (Or maybe just one? If there is no use case without UB for the other one…) Naming them might be awkward, but .borrow() would succeed might be easier to document than “state” semantics of each enum variant.

Imagine that some code has access to &mut RefCell<T>. By definition, no access to the refcell other than through this borrow is possible, as long as this borrow exists. Through .get_mut(), we can obtain &mut T and from that &T. If T: Sync, we can send that &T to multiple threads and operate on the value from there.

The Servo use case is similar: in that we want to do multi-threaded computation on &T with T: Sync, but different in that we start from only &RefCell<T>. We have external synchronization ensuring that no other access to that refcell are going to be made for the duration of this computation, because the thread that the refcell belongs to is blocked while waiting for those computation results. But it remains that accessing &T is still UB if there is cell::RefMut<T> (effectively &mut T) somewhere on the stack of the original thread. There shouldn’t be, but bugs happen. A runtime check helps catch this kind of bug before it causes UB.

Additionally there isn’t just one refcell, but a whole tree of nodes that each contain multiple refcells. Tree traversal is where we have parallelism, so we can’t do the runtime check on each refcell on the original thread. (What we send to computation threads is struct WrapperWithCarefullyLimitedApi<'a>(&'a Node).) Therefore, to avoid a need for atomic access or other synchronization, the check can only read the refcell’s borrow flag/counter, without writing like refcell.try_borrow().is_ok() would.

rfcbot · 2019-03-18T14:46:49Z

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Mark-Simulacrum · 2019-03-18T16:40:21Z

I'd be against the method names that @Centril proposed.

The primary reason is that will_borrow_succeed is misleading in that the "true" result is meaningless.

There seems to be no reason for the mut variant (will_borrow_mut_succeed) as there's no case where you can obtain a RefMut from a RefCell but can't use borrow_mut or try_borrow_mut.

Perhaps the API should instead be unsafe fn borrow_unchecked(&self) -> &T, and would panic if currently borrowed but not increment the borrow counter? That seems to fit Servo's use case and is more obviously not the API to use, unlike borrow_state.

I'm not a fan of the "unchecked" since it does check that it is currently safe; but because it doesn't increment the ref-count it would potentially be unsafe in the future.

To be clear, I'm opposed to re-adding borrow_state as-is because it feels like introducing a safe API that has no use case in safe code, as well as having a bunch of edge-cases that are difficult to properly document. The unsafe fn borrow_unchecked(&self) -> &T API seems better suited for our purposes.

SimonSapin · 2019-03-18T17:22:17Z

This borrow is not completely unchecked, the point is to check that the refcell is in an appropriate state when the borrow starts. But then we don’t track this new borrow or when it ends.

Maybe unsafe fn borrow_untracked(&self) -> Option<&T>?

SimonSapin · 2019-03-20T23:36:31Z

Can you say more about what bindgen did with the borrow sate? Could it have used try_borrow instead, if that was available at the time? Or was it something more complex?

emilio · 2019-03-20T23:50:46Z

This is the commit that removed the RefCell: rust-lang/rust-bindgen@cfdf15f

Looking at the BorrowState usage in the deleted files (rust-lang/rust-bindgen@cfdf15f#diff-2c09afcdc3c420ab0678ba9b5e83959cL421) looks like it wanted try_borrow_mut.

rfcbot · 2019-03-21T21:39:58Z

🔔 This is now entering its final comment period, as per the review above. 🔔

diwic · 2019-03-22T17:58:15Z

@nox

DOM code mutably borrows something in the DOM and calls back into JS.

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

nox · 2019-03-22T18:01:12Z

I'm not sure what that "something" is, but it sounds like a recipe for disaster in itself, because that JS code can in turn call into DOM trying to change the very same thing, causing a RefCell panic?

Something here is indeed any DOM call, and that is indeed a bug if we call JS while a RefCell from within the DOM is mutably borrowed, because that's a case that will eventually cause a panic as you describe. This PR is about being able to panic from layout too instead of causing straight UB if the JS code calls a DOM method that calls layout.

nox · 2019-03-22T18:04:06Z

Err, misread your comment, "something" here is anything that needs to be mutated sometimes in DOM, be it element's attributes, ids, urls, whatever. Indeed, if we keep that mutably borrowed while we reenter into JS code, that will ultimately lead to a panic. This is by design and not going to change any time soon.

diwic · 2019-03-22T18:53:48Z

@nox

Okay, so when Layout threads call try_borrow_unguarded, they would typically unwrap the result, causing a panic rather than UB, since mutably borrowing something and then calling into JS would be a bug anyway?

Fwiw, if my two cents are worth anything, it seems to me like what you want to do (putting RefCell into some parallel-sync-mode when script is not running) is so niche that maybe you need something else than RefCell to suit your use case. (Maybe something where the borrow flag can be in "Sync" mode as well as "Reading" and "Writing" modes? Not sure.) And if you like to implement this on top of RefCell like you do today with DOMRefCell then borrow_state is at least a little less niche than try_borrow_unguarded.

nox · 2019-03-22T18:56:25Z

I'm 100% fine with your two cents, and I would be ok with exposing borrow_state directly, I changed it to try_borrow_unguarded because other people said borrow_state is too vague and too general-without-a-purpose, if I may say.

rfcbot · 2019-03-31T21:56:55Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

KodrAus · 2019-04-11T03:49:50Z

Thanks for your patience @nox! I'll re-open the tracking issue and push this one through.

@bors r+

bors · 2019-04-11T03:49:51Z

📌 Commit 38811a1 has been approved by KodrAus

bors · 2019-04-11T12:49:56Z

⌛ Testing commit 38811a1 with merge 8509127...

@SimonSapin

Introduce RefCell::try_borrow_unguarded *Come sit next to the fireplace with me, this is going to be a long story.* So, you may already be aware that Servo has weird design constraints that forces us developers working on it to do weird things. The thing that interests us today is that we do layout on a separate thread with its own thread pool to do some things in parallel, whereas the data it uses comes from the script thread, which implements the entire DOM and related pieces, with `!Sync` data types such as `RefCell<T>`. The invariant we maintain is that script does not do anything ever with the DOM data as long as layout is doing its job. That's all nice and all, but one thing we don't ensure is that we don't actually know if script was currently mutably borrowing some `RefCell<T>` prior to starting layout, which may lead to aliasing mutable memory and obviously undefined behaviour. This PR reinstates `RefCell::borrow_state` so that [this method](https://github.com/servo/servo/blob/master/components/script/dom/bindings/cell.rs#L23-L30) can make use of it and return `None` if the cell was mutably borrowed. Cc @SimonSapin

bors · 2019-04-11T15:44:06Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: KodrAus
Pushing 8509127 to master...

nox · 2019-04-13T08:27:58Z

Just wanted to say that try_borrow_unguarded is now used in Servo and, believe it or not, all tests passed without a problem, which means that the soundness issue this method prevents is not currently present in our codebase, woohoo.

Servo has been using this since servo/servo#23196 to add a runtime check to some unsafe code, as discussed in PR rust-lang#59211. Stabilizing would help do more of the same in libraries that also have users on Stable.

Stabilize RefCell::try_borrow_unguarded Servo has been using this since servo/servo#23196 to add a runtime check to some unsafe code, as discussed in PR rust-lang#59211. Stabilizing would help do more of the same in libraries that also have users on Stable.

oconnor663 · 2022-10-04T18:00:41Z

Apologies for pinging an old issue, but I saw try_borrow_unguarded in the official docs, and I wound up here trying to understand exactly what problem it solves. After reading the whole thread, I'm still unsure why exactly the following would've been unsound. Here I'll use Python's "global interpreter lock (GIL)" as a metaphor for whatever synchronization these Servo threads are doing. (Is that a reasonable metaphor?) What's unsound about this:

Thread A                 Thread B

guard = X.borrow_mut()
release the GIL
go to sleep
                         wake up
                         acquire the GIL
                         X.try_borrow()  // Err

If I'm reading the thread above correctly, there's some soundness problem with this arrangement that try_borrow_unguarded solves. Could anyone help me pinpoint what's unsound here?

nox added 2 commits March 15, 2019 15:01

Revert "Remove RefCell::borrow_state"

1c365ce

This reverts commit 313aab8.

Revert "Deprecate std::cell::RefCell::borrow_state"

fab71dd

This reverts commit dc2d505.

rust-highfive assigned KodrAus Mar 15, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 15, 2019

nox added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 15, 2019

Centril requested a review from RalfJung March 16, 2019 02:46

Centril reviewed Mar 16, 2019

View reviewed changes

src/libcore/cell.rs Outdated Show resolved Hide resolved

src/libcore/cell.rs Outdated Show resolved Hide resolved

Tweak documentation of RefCell::borrow_state

a9388c2

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Mar 18, 2019

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Mar 21, 2019

rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label Mar 31, 2019

rfcbot removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Mar 31, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 11, 2019

KodrAus mentioned this pull request Apr 11, 2019

Tracking issue for borrow_state stabilization #27733

Closed

1 task

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 11, 2019

bors merged commit 38811a1 into rust-lang:master Apr 11, 2019

nox deleted the refcell-borrow-state branch April 13, 2019 08:28

SimonSapin mentioned this pull request May 15, 2019

Stabilize RefCell::try_borrow_unguarded #60850

Merged

Introduce RefCell::try_borrow_unguarded #59211

Introduce RefCell::try_borrow_unguarded #59211

Conversation

nox commented Mar 15, 2019

rust-highfive commented Mar 15, 2019

Mark-Simulacrum commented Mar 15, 2019

nox commented Mar 15, 2019

Mark-Simulacrum commented Mar 15, 2019

nox commented Mar 15, 2019

nox commented Mar 15, 2019 • edited Loading

We can't get &mut values for the entire DOM at once

Layout needs Sync data

DOM cannot use expensive atomic data structures

So what do we do?

Why do you need borrow_state then?

Mark-Simulacrum commented Mar 15, 2019

nox commented Mar 15, 2019

Mark-Simulacrum commented Mar 15, 2019

nox commented Mar 15, 2019

KodrAus commented Mar 15, 2019

Centril commented Mar 16, 2019

RalfJung commented Mar 16, 2019 • edited Loading

nox commented Mar 16, 2019

RalfJung commented Mar 18, 2019

nox commented Mar 18, 2019

SimonSapin commented Mar 18, 2019

rfcbot commented Mar 18, 2019 • edited by dtolnay Loading

Mark-Simulacrum commented Mar 18, 2019

SimonSapin commented Mar 18, 2019

SimonSapin commented Mar 20, 2019

emilio commented Mar 20, 2019

rfcbot commented Mar 21, 2019

diwic commented Mar 22, 2019

nox commented Mar 22, 2019 • edited Loading

nox commented Mar 22, 2019

diwic commented Mar 22, 2019

nox commented Mar 22, 2019

rfcbot commented Mar 31, 2019

KodrAus commented Apr 11, 2019

bors commented Apr 11, 2019

bors commented Apr 11, 2019

bors commented Apr 11, 2019

nox commented Apr 13, 2019

oconnor663 commented Oct 4, 2022 • edited Loading

nox commented Mar 15, 2019 •

edited

Loading

We can't get `&mut` values for the entire DOM at once

Why do you need `borrow_state` then?

RalfJung commented Mar 16, 2019 •

edited

Loading

rfcbot commented Mar 18, 2019 •

edited by dtolnay

Loading

nox commented Mar 22, 2019 •

edited

Loading

oconnor663 commented Oct 4, 2022 •

edited

Loading