Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose rebuild and implement drain_filter for BinaryHeap #104210

Closed
wants to merge 7 commits into from

Conversation

dis-da-moe
Copy link

@dis-da-moe dis-da-moe commented Nov 9, 2022

@rustbot
Copy link
Collaborator

rustbot commented Nov 9, 2022

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon.

Please see the contribution instructions for more information.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 9, 2022
@rustbot
Copy link
Collaborator

rustbot commented Nov 9, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@dis-da-moe
Copy link
Author

@rustbot label +T-libs-api -T-lib

@rustbot rustbot added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 9, 2022
@Mark-Simulacrum
Copy link
Member

BinaryHeap doesn't expose iter_mut today either (see #98522 -- this is largely intentional) -- it seems like there's not an API for mutably accessing arbitrary elements. It seems like the implementation here would not leave a correct heap behind if elements are removed (unless I'm missing something)? If that's the intended API, it doesn't seem like a huge benefit over converting into Vec and then calling drain_filter on that.

Can you say more about your use case for this?

r? libs-api -- I'm personally not convinced we should have this API, so shifting to libs-api for review.

@rustbot rustbot assigned dtolnay and unassigned Mark-Simulacrum Nov 13, 2022
@Mark-Simulacrum Mark-Simulacrum removed the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Nov 13, 2022
@dis-da-moe
Copy link
Author

Seeing the issues with IterMut, should the closure accept & T instead? I'm not sure what the use case for this method is since I only saw the issue and decided to implement it. This is my first contribution so advice is appreciated.

@Mark-Simulacrum
Copy link
Member

I think &T/&mut T help, but the broader issue is that drain_filter will leave the BinaryHeap in a non-heap state unless we do a full rebuild afterwards, presuming a subset of the elements are removed. (I think this PR may be missing that rebuild).

If we do a full rebuild, there's no real advantage to having the API on BinaryHeap vs. on Vec, and it seems like an unusual thing to want to do, so that further decreases in my eyes the desire to add it.

The (new) assigned reviewer or libs-api in general will likely weigh in within the next week or two on whether this is an API we want on BinaryHeap or not.

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation/example code as currently written in the PR is definitely misleading as to the implemented behavior and the impact on BinaryHeap's heap invariant and all its other methods. In particular it appears as though the initial data in the example code has been cherry-picked so that the heap coincidentally remains a heap after the drain_filter operation (notice the specific omitted integers in the 1..=15 range). The same code with different initial data would not behave in the way that is implied by the PR:

let mut a = BinaryHeap::from(vec![4, 1, 3]);
let mut _evens = a.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
let odds = a.into_sorted_vec();
assert_eq!(odds, vec![1, 3]); // fail, it's actually [3, 1] which is not sorted

I would prefer not to add this method, nor the alternative discussed above where it rebuilds the heap during Drop of the iterator. My counterproposal is that users should use either into_vec() (if they own the BinaryHeap) or mem::take (if they have &mut), then do their drain_filter on the Vec, then use BinaryHeap::from to rebuild the heap invariant.

I think it could be reasonable for BinaryHeap to expose methods that allow temporarily breaking the heap invariant and later cleaning it up, as long as someone proposes a naming convention or other API approach that makes it clear when a method is destructive toward the heap invariant. The "later cleaning it up" part is already potentially useful even without adding a drain_filter-like method, since heaps that contain interior mutability (like BinaryHeap<Cell<i32>>) can already end up broken just using the current API. This is described as a logic error here:

/// It is a logic error for an item to be modified in such a way that the
/// item's ordering relative to any other item, as determined by the [`Ord`]
/// trait, changes while it is in the heap. This is normally only possible
/// through [`Cell`], [`RefCell`], global state, I/O, or unsafe code. The
/// behavior resulting from such a logic error is not specified, but will
/// be encapsulated to the `BinaryHeap` that observed the logic error and not
/// result in undefined behavior. This could include panics, incorrect results,
/// aborts, memory leaks, and non-termination.

but suppose we document which methods in BinaryHeap's API do not assume the heap invariant as a precondition (like iter and into_vec), and then say it's only a logic error to call methods outside of that set while the heap invariant is messed up, and change rebuild() to public (the same operation used by From<Vec<T>> for BinaryHeap<T>) so that the user can restore the heap invariant in-place after a sequence of destructive operations, or mutating Cells or whatever they've done.

Once that's figured out I think it would be reasonable to talk about reintroducing a drain_filter operation that is documented as leaving the heap invariant violated.

@dtolnay dtolnay added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 22, 2022
@dis-da-moe
Copy link
Author

I didn't mean to cherry-pick the data. The omitted integers were a typo. I also didn't know that the heap was left in an invalid state.

Would it be useful to encode whether the BinaryHeap is valid or not in the type system instead of relying on documentation? For example a drain_filter would return an InvalidBinaryHeap which would support the rebuild method to a regular BinaryHeap.

@dtolnay
Copy link
Member

dtolnay commented Nov 22, 2022

An InvalidBinaryHeap is just Vec<T>. I don't expect there would be sufficient utility in having another type.

The use case for rebuild() is for &mut self methods where doing the mem::take/into_vec()/BinaryHeap::from/mem::replace dance is awkward and verbose. It doesn't seem like adding a type reduces the awkward/verbosity in this situation, whereas allowing BinaryHeap to be temporarily invalid does.

pub struct Stuff {
    heap: BinaryHeap<T>,
}

impl Stuff {
    pub fn method(&mut self) {
        // temporarily violate heap invariant in place
        ...
        // then
        self.heap.rebuild();
    }
}

@dis-da-moe dis-da-moe changed the title implement drain_filter for BinaryHeap expose rebuild and implement drain_filter for BinaryHeap Nov 22, 2022
@dtolnay dtolnay added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Dec 12, 2022
@Dylan-DPC Dylan-DPC added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 15, 2022
@bors
Copy link
Contributor

bors commented Dec 28, 2022

☔ The latest upstream changes (presumably #106209) made this pull request unmergeable. Please resolve the merge conflicts.

@m-ou-se m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Jan 10, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 15, 2023
Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met

In the libs-api team's discussion around rust-lang#104210, some of the team had hesitations around exposing malformed BinaryHeaps of an element type whose Ord and Drop impls are trusted, and which does not contain interior mutability.

For example in the context of this kind of code:

```rust
use std::collections::BinaryHeap;
use std::ops::Range;
use std::slice;

fn main() {
    let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
    let cut_points = BinaryHeap::from(vec![4, 2, 7]);
    println!("{:?}", chop(slice, cut_points));
}

// This is a souped up slice::split_at_mut to split in arbitrary many places.
//
// usize's Ord impl is trusted, so 1 single bounds check guarantees all those
// output slices are non-overlapping and in-bounds
fn chop<T>(slice: &mut [T], mut cut_points: BinaryHeap<usize>) -> Vec<&mut [T]> {
    let mut vec = Vec::with_capacity(cut_points.len() + 1);
    let max = match cut_points.pop() {
        Some(max) => max,
        None => {
            vec.push(slice);
            return vec;
        }
    };

    assert!(max <= slice.len());

    let len = slice.len();
    let ptr: *mut T = slice.as_mut_ptr();
    let get_unchecked_mut = unsafe {
        |range: Range<usize>| &mut *slice::from_raw_parts_mut(ptr.add(range.start), range.len())
    };

    vec.push(get_unchecked_mut(max..len));
    let mut end = max;
    while let Some(start) = cut_points.pop() {
        vec.push(get_unchecked_mut(start..end));
        end = start;
    }
    vec.push(get_unchecked_mut(0..end));
    vec
}
```

```console
[['7', '8', '9'], ['4', '5', '6'], ['2', '3'], ['0', '1']]
```

In the current BinaryHeap API, `peek_mut()` is the only thing that makes the above function unsound.

```rust
let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
let mut cut_points = BinaryHeap::from(vec![4, 2, 7]);
{
    let mut max = cut_points.peek_mut().unwrap();
    *max = 0;
    std::mem::forget(max);
}
println!("{:?}", chop(slice, cut_points));
```

```console
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['2', '3'], ['0', '1']]
```

Or worse:

```rust
let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
let mut cut_points = BinaryHeap::from(vec![100, 100]);
{
    let mut max = cut_points.peek_mut().unwrap();
    *max = 0;
    std::mem::forget(max);
}
println!("{:?}", chop(slice, cut_points));
```

```console
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '\u{1}', '\0', '?', '翾', '?', '翾', '\0', '\0', '?', '翾', '?', '翾', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '翾', '\0', '\0', '񤬐', '啿', '\u{5}', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\u{8}', '\0', '`@',` '\0', '\u{1}', '\0', '?', '翾', '?', '翾', '?', '翾', '
thread 'main' panicked at 'index out of bounds: the len is 33 but the index is 33', library/core/src/unicode/unicode_data.rs:319:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

---

This PR makes `peek_mut()` use leak amplification (https://doc.rust-lang.org/1.66.0/nomicon/leaking.html#drain) to preserve the heap's invariant even in the situation that `PeekMut` gets leaked.

I'll also follow up in the tracking issue of unstable `drain_sorted()` (rust-lang#59278) and `retain()` (rust-lang#71503).
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this proposal with the library API team, and came away not on board with exposing invalid heaps. The preference was in the opposite direction (#105851) by closing up all the API gaps that currently allow invalid heaps to occur.

A drain_filter which involves rebuilding the heap in Drop would perhaps be okay (there wasn't consensus about this; some saw drain_filter as just not a good fit for a heap data structure, and the heap rebuild operation is already available via the conversion from Vec) but only if leaking the DrainFilter iterator cannot possibly leave the heap in the invalid state.

@dis-da-moe
Copy link
Author

Alright. I don't think I'll go through with this since it seems like not the best design. Thanks for the experience.

@dis-da-moe dis-da-moe closed this Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add drain_filter for BinaryHeap
8 participants