Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split for BatArray #443

Closed
UnixJunkie opened this issue Sep 9, 2013 · 15 comments
Closed

split for BatArray #443

UnixJunkie opened this issue Sep 9, 2013 · 15 comments

Comments

@UnixJunkie
Copy link
Member

I'd like a split for arrays.
It would do roughly what split does for sets.
PRECONDITION: the array is sorted in increasing order

@ghost ghost assigned UnixJunkie Sep 22, 2013
c-cube added a commit to c-cube/batteries-included that referenced this issue Dec 4, 2013
@UnixJunkie
Copy link
Member Author

The code is interesting, I hope this won't get forgotten.

@gasche
Copy link
Member

gasche commented Dec 10, 2013

Thinking about split more (thanks for the reminder), I'm not sure we need several cases -- maybe we can just return a couple of integers each time. I may be wrong, though.

The defining property of split should be the ability to get the original array back by extracting sub-arrays indicated by split and concatenating them back together. So if split ord t returns the pair (i, j), I would expect:

  • sub t 0 i to be the elements strictly lower than our element
  • sub t j (j-i) to be the elements exactly equal to our element
  • sub j (length t - j) to be the elements strictly above our element

Given that sub t i 0 returns the empty array, I believe we could return (0, 0) in the AllBigger case, and (length t, length t) in the AllLower case, without inconsistency nor loss of information. Does that seem reasonable?

(If we can do that, we must; it is strictly better as in most situations it will free our user from the duty of handling edge cases.)

@UnixJunkie
Copy link
Member Author

I also think that returning a pair then using BatArray.sub would be nice.

I think split should split in two, not in three. Cf. for BatSet:

val split_lt: elt -> t -> t * t
(** [split_lt x s] returns a pair of sets [(l, r)], such that
[l] is the subset of [s] with elements < [x];
[r] is the subset of [s] with elements >= [x]. *)

val split_le: elt -> t -> t * t
(** [split_le x s] returns a pair of sets [(l, r)], such that
[l] is the subset of [s] with elements <= [x];
[r] is the subset of [s] with elements > [x]. *)

I still think it could use BatArray.bsearch to be implemented (bsearch is in another
pull request from Simon Cruanes).

@c-cube
Copy link
Member

c-cube commented Dec 11, 2013

The idea of splitting the split functions into several functions is interesting, but actually I think it may be better to let the user compute the two "edges" (frontier under which elements are strictly lower than the pivot, frontier above which elements are strictly higher). Again, a sorted array is a multiset and not a set, so its split function has more information to return than Set.split (imho).

@gasche: at first my code would return a pair, but corner cases were awkward. Maybe with your definition (returning information needed by sub) it would be simpler and cleaner, so I'm interested.

c-cube added a commit to c-cube/batteries-included that referenced this issue Dec 16, 2013
@UnixJunkie
Copy link
Member Author

OK, I understand now.
I think only the case where bsearch returned : `At of int
some post processing work is needed to implement split.
Maybe during this week I can try this: implementing split with
bsearch, If someone doesn't do it before me...

@c-cube
Copy link
Member

c-cube commented Dec 16, 2013

@UnixJunkie the problem imho with using bsearch is the pathological case where you have an array of 10,000 times the same element, and you split w.r.t this given element. In this case, bsearch can return any index (any is valid), but you'd potentially have to explore the whole array to find edges (for split) if you don't use binary search for each edge.

@UnixJunkie
Copy link
Member Author

Indeed.

@gasche
Copy link
Member

gasche commented Dec 16, 2013

I gave the version returning just a pair of integers a try in the following commit. Any feedback would be welcome.

I have the (light) intuition that it is better, but I'll try not to force my opinion on this. I think that we should think about whether the user needs to be aware of the edge-cases and test for them (in which case Simon's version is definitely better as it forces this special-logic by typing), or whether in natural situations the edge-cases would "do the right thing" with my implementation (in which case it's preferrable to use it rather than bother the users with them). My strong confidence in theorems would make me say that, if the property using "sub" looks so right, it's probably that this code does the right thing.

@gasche
Copy link
Member

gasche commented Dec 16, 2013

I just noticed that we don't have a testcase agains the Invalid_argument situation. This will need to be taken care of before any version goes upstream.

@gasche gasche mentioned this issue Dec 17, 2013
@c-cube
Copy link
Member

c-cube commented Dec 19, 2013

I think @gasche 's solution, that just returns a pair of integers, is actually simpler (has a simple and well defined semantics based on Array.sub). So we should merge his code.

@UnixJunkie
Copy link
Member Author

I also like the semantic with integer pairs.

@gasche
Copy link
Member

gasche commented Apr 27, 2016

split was just merged in master, will be part of 3.0.

@gasche gasche closed this as completed Apr 27, 2016
gasche pushed a commit to gasche/batteries-included that referenced this issue May 8, 2016
rixed pushed a commit to rixed/batteries-included that referenced this issue Oct 1, 2019
@rixed
Copy link
Contributor

rixed commented Oct 1, 2019

Reopening as we have too many Array.splits now: this one, that's been waiting on the v3 branch for a while and for a reason that eludes me, and another one added on master more recently that mimick List.split in that it splits an array of pairs into a pair of arrays.

To solve this conflict of name, I suggest @UnixJunkie and @c-cube go fight a pistol-duel next Sunday morning and we keep the survivor's implementation, or I merely rename this split into something like "pivot_split", because that's what it evokes to me, unless someone has a better suggestion?

@rixed rixed reopened this Oct 1, 2019
rixed added a commit to rixed/batteries-included that referenced this issue Oct 1, 2019
Also and add it to the Cap submodule.

Closes ocaml-batteries-team#443
rixed added a commit to rixed/batteries-included that referenced this issue Oct 1, 2019