Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Set.choose return the fastest available element #751

Merged
merged 1 commit into from
May 5, 2017

Conversation

rixed
Copy link
Contributor

@rixed rixed commented May 3, 2017

I've always assumed Set.choose was returning the quickest element available and
that it had complexity O(1). Instead, I just discovered that it's actually
looking for the smallest element, for no good reason that I can see. A comment
from @thelema suggests that he also expected that: "I'd rather this chose the
root, but okay".

Maybe there were some good reasons at the time not to return the root? In case
those reasons do not apply any more this patch makes it return the root.

@gasche
Copy link
Member

gasche commented May 3, 2017

The specification says that choose will return equal values for equal set/map. I am not convinced that the root element is equal for all possible balancing of the same set of elements. If you can prove that it is, or if you can implement a function (relying on the ranks of the subtree for example) that gets an element in a way that is robust to rebalancing and is faster than just taking the min, go for it. Note that taking the mean is not too slow in practice as it's only a logarithmic distance away.

@UnixJunkie
Copy link
Member

we need a new function for what rixed asked. I think it could be called any, and we should say
in the specification that it may not return the same element for equal sets. I guess it will
return same element only if the two sets were constructed with the same sequence of operations.
I know some people who were also interested by such a functionality (Jun Furuse, in the stdlib if I remember well).

@gasche
Copy link
Member

gasche commented May 3, 2017

Would it be interesting to also return "the rest" of the tree, or have a function for that? (If I want to recursively traverse the elements in no particular order, I would first start on the root, then look at the left and right subtrees.) Or can we convince ourselves that calling split on the any-returned value is just as efficient? (It does a compare call first, though, but we could optimize this with a physical equality first.)

@UnixJunkie
Copy link
Member

Your suggestion of also returning the rest of the tree makes for a more generic functionality.
That would cover use cases we don't see yet but that people may ask in the future.
Sounds good to me.

@rixed
Copy link
Contributor Author

rixed commented May 4, 2017

The specification says that choose will return equal values for equal set/map.

Indeed. The root would not be equal (empty + 0 + 1 will have 0 at its root while empty + 1 + 0 will have 1):

# module S = Set.Make(Int);;
# S.empty |> S.add 0 |> S.add 1 |> S.choose;;
- : S.elt = 0
# S.empty |> S.add 1 |> S.add 0 |> S.choose;;
- : S.elt = 1

Adding a test for that, and implemented any as suggested.

@c-cube
Copy link
Member

c-cube commented May 4, 2017

Agreed with @gasche , a function pop_top : t -> (elt * t * t) option would be the most general and safe. Note that this more or less constrains the implementation to be a balanced tree (probably already the case from other functions, I couldn't tell).

@rixed
Copy link
Contributor Author

rixed commented May 4, 2017

Updated the commit to implement the any suggested by @UnixJunkie.
pop_top would we useful but not when you want a quick glance at an element of the set.

Copy link
Member

@UnixJunkie UnixJunkie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test for any.

@UnixJunkie
Copy link
Member

@c-cube maybe pop_root is a better name. I like if there are two functions; one very specialized
that @rixed submitted code for already, plus the more general one which returns an option triplet.

@UnixJunkie
Copy link
Member

But we have some pop_ functions already, they have a different semantic: an (element, set) pair are returned...

@gasche
Copy link
Member

gasche commented May 4, 2017

The documentation should be clearer than "non-deterministic". You should have a full sentence pointing out that sets that have the same elements could still return different values. (This does not mean that calling the same function several time on the same value may give different result, which is what the current wording suggests.). As a minor detail, I would rather use @raise ... on a separate line (as in other specifications) because it gives a clearer ocamldoc output.

I would also be interested in having a similar function implemented in Map, returning a key/value pair: there is no reason not to have the same interface for both.

Finally, I think there remain the question of whether we want to return an 'a, or an 'a option. A function returning an option could be called some. Maybe we want both?

@UnixJunkie
Copy link
Member

I personally would like both.
And @gascher is right (as usual) that the Map module would welcome the same operations.

@rixed
Copy link
Contributor Author

rixed commented May 5, 2017

Updated the doc and rebased.

Finally, I think there remain the question of whether we want to return an 'a, or an 'a option. A function returning an option could be called some. Maybe we want both?

Why isn't Exceptionless.any not enough?

@rixed
Copy link
Contributor Author

rixed commented May 5, 2017

Map.choose already is O(1) but adding any as an alias for the sake of completeness.

@gasche
Copy link
Member

gasche commented May 5, 2017

Thanks for noticing this! If I understand correctly, the current definition of Map.choose is buggy and it should use min_elt instead.

@rixed
Copy link
Contributor Author

rixed commented May 5, 2017

Indeed, ocaml's stdlib uses min_binding for choose.
Fixing in this PR as it is somewhat related.

@gasche
Copy link
Member

gasche commented May 5, 2017

It may look like this is never-ending, but I think we are almost done!

  • you need a changelog entry
  • in BatSplay you also need choose to be turned into min_binding, given that it uses the same signature. The current implementation of choose is the right one for any.

Choose must return equal elements for equal sets/maps/splay-maps.  Therefore
we need another function to return (in constant time) any element from a
container with no such constraint.
@rixed
Copy link
Contributor Author

rixed commented May 5, 2017

Did all this, added a couple more tests and a Changelog entry.
Also, squashed everything into a single commit.

@gasche gasche merged commit c10c65a into ocaml-batteries-team:master May 5, 2017
@gasche
Copy link
Member

gasche commented May 5, 2017

Merged, thanks! Thanks to @UnixJunkie and @c-cube for their feedback as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants