Implement placement-in protocol for HashMap #40390

Merged
merged 1 commit into from
Mar 12, 2017

Contributor

F001 commented Mar 9, 2017

CC #30172

r? @nagisa

Member

nagisa commented Mar 9, 2017

While this works technically, the implementation is not correct. The point of the placement-in protocol is to put the value directly into some place, in this case into the HashMap, in such a way that copies are avoided. So Place::pointer should return a pointer to some place directly inside the HashMap's allocated storage, and not to a field in EntryPlace.

To implement this you will likely need to make some internal changes to the Entry(-ies), so that it is possible to obtain a pointer for both Vacant and Occupied entries.
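
For readers unfamiliar with the protocol, here is a minimal sketch of the idea on stable Rust. The `Place` and `InPlace` traits below are simplified, hypothetical stand-ins for the unstable traits discussed in this thread, and `OneSlot`, `SlotPlace` and `place_value` are invented for illustration; none of this is the actual HashMap code. The point it shows is the one made above: `pointer` returns an address inside the container's own storage, so the value is written there directly.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Hypothetical stand-in for the unstable `Place` trait: hands out the
/// address where the new value must be written.
trait Place<T> {
    fn pointer(&mut self) -> *mut T;
}

/// Hypothetical stand-in for `InPlace`: called once the value is written.
trait InPlace<T>: Place<T> {
    type Owner;
    unsafe fn finalize(self) -> Self::Owner;
}

/// Toy container with a single heap slot (not the real HashMap layout).
struct OneSlot<T> {
    storage: Box<MaybeUninit<T>>,
    full: bool,
}

impl<T> OneSlot<T> {
    fn new() -> Self {
        OneSlot { storage: Box::new(MaybeUninit::uninit()), full: false }
    }

    /// Vacates the slot (dropping any old value) and returns a place for it.
    fn make_place(&mut self) -> SlotPlace<'_, T> {
        if self.full {
            // Mark the slot vacant first so a panic cannot cause a second drop.
            self.full = false;
            unsafe { ptr::drop_in_place(self.storage.as_mut_ptr()) };
        }
        SlotPlace { slot: self }
    }

    fn get(&self) -> Option<&T> {
        if self.full { Some(unsafe { &*self.storage.as_ptr() }) } else { None }
    }
}

impl<T> Drop for OneSlot<T> {
    fn drop(&mut self) {
        if self.full {
            unsafe { ptr::drop_in_place(self.storage.as_mut_ptr()) };
        }
    }
}

struct SlotPlace<'a, T> {
    slot: &'a mut OneSlot<T>,
}

impl<'a, T> Place<T> for SlotPlace<'a, T> {
    fn pointer(&mut self) -> *mut T {
        // Points into the container's own allocation, not into the place.
        self.slot.storage.as_mut_ptr()
    }
}

impl<'a, T> InPlace<T> for SlotPlace<'a, T> {
    type Owner = ();
    unsafe fn finalize(self) {
        self.slot.full = true;
    }
}

/// Roughly the steps a `place <- value` expression performs.
fn place_value<T>(slot: &mut OneSlot<T>, make: impl FnOnce() -> T) {
    let mut place = slot.make_place();
    let dst = place.pointer();
    unsafe {
        // If `make()` panics, the slot simply stays vacant.
        ptr::write(dst, make());
        place.finalize();
    }
}
```

Calling `place_value(&mut slot, || String::from("first"))` leaves the string stored in the boxed slot without passing through an intermediate field of the place type.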

Member

nrc commented Mar 9, 2017

@bors: delegate @nagisa

nrc self-assigned this Mar 9, 2017

Contributor Author

F001 commented Mar 9, 2017

Thank you for the review comment!

cc @arthurprs Please correct me if anything is wrong.

I used a temporary field to store the value because of panic safety.

AFAIK, if Place::pointer needs to return a pointer to some place directly inside the HashMap's allocated storage, I have to run robin_hood first, in the make_place phase. This will affect existing elements in the HashMap. If a panic occurs later, I don't know of any rollback mechanism to restore a valid state. This is the main difference between the placement-in implementations for VecDeque and HashMap.

I'm looking forward to your suggestions.

Contributor

arthurprs commented Mar 10, 2017

Your suggestion sounds ok to me. It will avoid unnecessary V copies for Entry::Vacant. To avoid any unnecessary V copies for Entry::Occupied you probably need a variant of robin_hood that will make space without copying the uninitialized V into the bucket.

For rollback you can implement Drop for EntryPlace (drop still runs in case of panics) and use pop_internal to fix the table if it comes to that (forget what it returns). BinaryHeap uses a similar strategy to avoid corrupting the structure if a T comparison panics.
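
The rollback idea can be sketched with a small drop guard. This is only an illustration under assumed names: `Table`, `PendingEntry`, `insert_with` and `demo` are hypothetical, and the "table" is a plain `Vec` rather than the real hash table. It uses an explicit `committed` flag, similar in spirit to the `panicked` flag in the first revision of this PR.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Toy table: a key plus a value that may not have been produced yet.
type Table = Vec<(u32, Option<String>)>;

/// Hypothetical guard: reserves an entry and removes it again if it is
/// dropped before being committed (for example while unwinding).
struct PendingEntry<'a> {
    table: &'a mut Table,
    committed: bool,
}

impl<'a> PendingEntry<'a> {
    fn reserve(table: &'a mut Table, key: u32) -> Self {
        table.push((key, None)); // the table is now half-updated
        PendingEntry { table, committed: false }
    }

    fn commit(mut self, value: String) {
        self.table.last_mut().expect("reserved entry").1 = Some(value);
        self.committed = true;
    }
}

impl<'a> Drop for PendingEntry<'a> {
    fn drop(&mut self) {
        // Drop also runs during unwinding, so this is the rollback hook.
        if !self.committed {
            self.table.pop();
        }
    }
}

fn insert_with(table: &mut Table, key: u32, make: impl FnOnce() -> String) {
    let pending = PendingEntry::reserve(table, key);
    let value = make(); // may panic; `pending` is then dropped while unwinding
    pending.commit(value);
}

/// Inserts one entry successfully, then one whose producer panics.
fn demo() -> Table {
    let mut table = Table::new();
    insert_with(&mut table, 1, || "one".to_string());
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        insert_with(&mut table, 2, || panic!("boom"));
    }));
    assert!(result.is_err());
    table
}
```

After `demo()` the table holds only the entry for key 1: the reservation for key 2 was undone by the guard's destructor.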

Contributor Author

F001 commented Mar 10, 2017

Thanks for your suggestion. I have updated the implementation. For now, it avoids the unnecessary V copy for Entry::Vacant. I'll continue to investigate further optimizations.

issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
bucket: Option<FullBucketMut<'a, K, V>>,
panicked: Cell<bool>,
Contributor

I suggest using a finalized flag instead. Also, the flag should probably be the last field as it may save 7 bytes of stack.

Member

You are unlikely to have more than one of these per hashmap alive at a time, so this is not very concerning. Also we’re getting field reordering soon, which will do this for everybody automatically.

Contributor Author

As suggested below, using forget can avoid the flag.

reason = "struct name and placement protocol is subject to change",
issue = "30172")]
pub struct EntryPlace<'a, K: 'a, V: 'a> {
bucket: Option<FullBucketMut<'a, K, V>>,
Contributor

I'm probably missing something obvious but do we really need to wrap the bucket with Option?

Contributor Author

I was being lazy and wanted to reuse the existing FullBucket::take to remove the entry. It takes self by value, but the drop method only has &mut self, so the bucket field can't be moved out.

This is fixed by adding another method, FullBucket::remove, which takes &mut self. The drop method can now call remove directly.

impl<'a, K, V> InPlace<V> for EntryPlace<'a, K, V> {
type Owner = ();

unsafe fn finalize(self) {
Contributor

arthurprs commented Mar 10, 2017

Thinking a bit more about this you can forget(self) here, avoiding the flag altogether.

Contributor Author

Good suggestion. Fixed.
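
A minimal sketch of the `forget(self)` suggestion, with hypothetical names (`Rollback`, `run`, `demo`) and a log vector standing in for the table: because `finalize` consumes the guard and forgets it, the destructor only ever runs on the failure path, so no flag is needed.

```rust
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical guard whose destructor performs the rollback.
struct Rollback<'a> {
    log: &'a mut Vec<&'static str>,
}

impl<'a> Rollback<'a> {
    /// Success path: skip the destructor entirely instead of checking a flag.
    fn finalize(self) {
        mem::forget(self);
    }
}

impl<'a> Drop for Rollback<'a> {
    fn drop(&mut self) {
        // Only reached when the guard was not finalized, e.g. on unwind.
        self.log.push("rolled back");
    }
}

fn run(log: &mut Vec<&'static str>, should_panic: bool) {
    let guard = Rollback { log };
    if should_panic {
        panic!("boom");
    }
    guard.finalize();
}

/// Returns the log of a successful run and the log of a panicking run.
fn demo() -> (Vec<&'static str>, Vec<&'static str>) {
    let mut ok_log = Vec::new();
    run(&mut ok_log, false);

    let mut panic_log = Vec::new();
    let _ = panic::catch_unwind(AssertUnwindSafe(|| run(&mut panic_log, true)));

    (ok_log, panic_log)
}
```

The successful run leaves its log empty, while the panicking run records exactly one rollback.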

Member

nagisa left a comment

Nice improvement over the previous version! @arthurprs’ notes seem very relevant (and they are also much more familiar with the HashMap code), so these should be fixed.

Member

nagisa commented Mar 10, 2017

I realised there’s one possible alternative in behaviour. Current implementation tries to recover the previous value if the placement expression fails, however it is not obvious to me whether this is a better approach compared to, say, simply making the key vacant in case of panic.

Here are some points in favour of leaving the entry vacant instead of restoring the value if panic happens:

  1. Saving the old value involves a copy, thus negating most/all of the point of placement-in (as few copies as possible);
  2. Panicking is a very exceptional situation that is not supposed to be recovered from. This means that all the Drop should be responsible for is restoring HashMap into a state that’s safe to Drop, that’s all. Making entry vacant seems equivalent to restoring the old value in that sense.

@arthurprs
Contributor

Very good points, leaving a previously filled bucket empty on panic sounds reasonable.

Member

nagisa commented Mar 10, 2017

cc @rust-lang/libs

Member

aturon commented Mar 10, 2017

cc @rust-lang/libs, anyone have feedback on @nagisa's last comment?

@sfackler
Member

I agree that the precise state of the value being modified doesn't matter too much.

self.table.size -= 1;
unsafe {
*self.raw.hash = EMPTY_BUCKET;
ptr::read(self.raw.pair); // drop right now
Member

nagisa commented Mar 11, 2017

This is possibly incorrect. I think you’ll notice why if you add a test that looks like this (you should probably add one similar to it):

struct Banana<'a>(&'a mut bool);
impl<'a> Drop for Banana<'a> {
    fn drop(&mut self) {
        if !*self.0 { panic!("double drop!"); }
        *self.0 = false;
    }
}

// `can_drop` is declared first so that it outlives the map borrowing it.
let mut can_drop = true;
let mut hm = HashMap::new();
hm.insert(0, Banana(&mut can_drop));
hm.entry(0) <- panic!("boom");
// The first drop happens in `make_place`, where the `Banana(true)` gets dropped and `can_drop` is set to false.
// Then `*place.pointer() = panic!("boom")` is executed, which unwinds, thus dropping the place.
// The place destructor drops the `Banana(false)`, so a double panic occurs and the process aborts.
//
// In other words, the current implementation of Drop reads uninitialized memory.

Member

Note that the code might not reproduce exactly the way I described it, but it is still reading uninitialized memory.

Contributor Author

Yeah! Good point. Fixed.
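
The hazard described in this thread can be checked on stable Rust with a drop counter. The sketch below is not the PR's code: `Slot`, `Counted`, `replace_with` and `demo` are hypothetical names, and a single `MaybeUninit` slot stands in for an occupied bucket. It marks the slot vacant before dropping the old value, so a panic while producing the new value cannot lead to a second drop, and the slot is simply left vacant, as proposed earlier in the discussion.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::panic::{self, AssertUnwindSafe};
use std::ptr;
use std::rc::Rc;

/// Test helper: counts how many times it has been dropped.
struct Counted(Rc<Cell<usize>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Toy stand-in for one bucket's value storage.
struct Slot<T> {
    value: MaybeUninit<T>,
    occupied: bool,
}

impl<T> Slot<T> {
    fn vacant() -> Self {
        Slot { value: MaybeUninit::uninit(), occupied: false }
    }

    fn replace_with(&mut self, make: impl FnOnce() -> T) {
        if self.occupied {
            // Record the slot as vacant BEFORE dropping, so that nothing
            // later treats the dead value as live.
            self.occupied = false;
            unsafe { ptr::drop_in_place(self.value.as_mut_ptr()) };
        }
        let dst = self.value.as_mut_ptr();
        // If `make()` panics here, the slot just stays vacant.
        unsafe { ptr::write(dst, make()) };
        self.occupied = true;
    }
}

impl<T> Drop for Slot<T> {
    fn drop(&mut self) {
        if self.occupied {
            unsafe { ptr::drop_in_place(self.value.as_mut_ptr()) };
        }
    }
}

/// Returns (number of drops observed, whether the slot was vacant after the panic).
fn demo() -> (usize, bool) {
    let drops = Rc::new(Cell::new(0));
    let mut slot: Slot<Counted> = Slot::vacant();
    slot.replace_with(|| Counted(drops.clone()));

    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        slot.replace_with(|| panic!("boom"));
    }));
    assert!(result.is_err());

    let vacant_after_panic = !slot.occupied;
    drop(slot);
    (drops.get(), vacant_after_panic)
}
```

A drop count other than 1 would indicate either a leak or the double drop discussed above.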

Member

nagisa left a comment

2 more, likely final, tweaks.

let b = match self {
Occupied(mut o) => {
let uninit = unsafe { mem::uninitialized() };
o.insert(uninit);
Member

You can avoid doing this mem::uninitialized dance by simply doing a

std::ptr::drop_in_place(o.elem.bucket.read_mut().1)

Contributor Author

Fixed.
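
A small sketch of the `drop_in_place` approach, using a hypothetical `Bucket` type rather than the real table internals: the old value is destroyed where it lives and the new one is written over it, without first inserting a dummy uninitialized value. (`mem::uninitialized` has since been deprecated in favor of `MaybeUninit`.) The `ManuallyDrop` wrapper is an assumption of this toy so that the struct never drops the value a second time.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

/// Toy bucket: a key plus a value whose destructor is run by hand.
/// The value is leaked unless taken out with `into_parts`.
struct Bucket<K, V> {
    key: K,
    value: ManuallyDrop<V>,
}

impl<K, V> Bucket<K, V> {
    fn new(key: K, value: V) -> Self {
        Bucket { key, value: ManuallyDrop::new(value) }
    }

    /// Destroys the old value in place, then writes the new one over it.
    fn overwrite_value(&mut self, new_value: V) {
        let slot: *mut V = &mut *self.value;
        unsafe {
            ptr::drop_in_place(slot); // runs V's destructor, moves nothing
            ptr::write(slot, new_value); // no drop of the (dead) old contents
        }
    }

    fn into_parts(self) -> (K, V) {
        let Bucket { key, value } = self;
        (key, ManuallyDrop::into_inner(value))
    }
}

fn demo() -> (u32, String) {
    let mut bucket = Bucket::new(7, String::from("old"));
    bucket.overwrite_value(String::from("new"));
    bucket.into_parts()
}
```

In the placement case the `ptr::write` step is what the placement expression performs later through `Place::pointer`, which is why the state between the two steps has to be handled carefully.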

issue = "30172")]
impl<'a, K, V> Drop for EntryPlace<'a, K, V> {
fn drop(&mut self) {
self.bucket.remove();
Contributor

I think this will drop an uninitialized V, as you only inserted the key?

Contributor Author

Yes, nagisa has mentioned this. I'm fixing it.

Member

nagisa left a comment

I’ve only got nits left. Marking the internal functions as unsafe makes sense, as they leave around uninitialized data which the caller should handle appropriately.

r=me once nits are fixed

assert_eq!(map.len(), 9);
assert!(!map.contains_key(&100));

// correctly drop
Member

This can probably be factored out into a separate test. (i.e. a different #[test] function)

Contributor Author

Fixed.


/// Remove this bucket's key and value from the hashtable.
/// Only used for inplacement insertion.
pub fn remove_key(&mut self) {
Member

Similarly here, whole function unsafe.

Contributor Author

Fixed.


/// Puts given key, remain value uinitialized.
/// It is only used for inplacement insertion.
pub fn put_key(mut self, hash: SafeHash, key: K) -> FullBucket<K, V, M> {
Member

I’d probably make this whole function unsafe (i.e. pub unsafe fn put_key).

Contributor Author

Fixed.

}

// Only used for InPlacement insert. Avoid unnecessary value copy.
fn insert_key(self) -> FullBucketMut<'a, K, V> {
Member

This should probably be unsafe fn too.

Contributor Author

Fixed.

Member

nagisa commented Mar 11, 2017

@bors r+

Member

nagisa commented Mar 11, 2017

Oh, bors didn’t notice the delegation above :/

Member

eddyb commented Mar 12, 2017

@bors delegate=nagisa

Contributor

bors commented Mar 12, 2017

✌️ @nagisa can now approve this pull request

Member

nagisa commented Mar 12, 2017

@bors r+

Contributor

bors commented Mar 12, 2017

📌 Commit 584c798 has been approved by nagisa

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Mar 12, 2017
Implement placement-in protocol for `HashMap`

CC rust-lang#30172

r? @nagisa
bors added a commit that referenced this pull request Mar 12, 2017
Rollup of 5 pull requests

- Successful merges: #40369, #40390, #40426, #40449, #40453
- Failed merges:
@bors bors merged commit 584c798 into rust-lang:master Mar 12, 2017