split for BatArray #443

UnixJunkie · 2013-09-09T01:16:30Z

I'd like a split for arrays.
It would do roughly what split does for sets.
PRECONDITION: the array is sorted in increasing order

UnixJunkie · 2013-12-10T01:56:45Z

The code is interesting, I hope this won't get forgotten.

gasche · 2013-12-10T08:59:25Z

Thinking about split more (thanks for the reminder), I'm not sure we need several cases -- maybe we can just return a couple of integers each time. I may be wrong, though.

The defining property of split should be the ability to get the original array back by extracting sub-arrays indicated by split and concatenating them back together. So if split ord t returns the pair (i, j), I would expect:

sub t 0 i to be the elements strictly lower than our element
sub t j (j-i) to be the elements exactly equal to our element
sub j (length t - j) to be the elements strictly above our element

Given that sub t i 0 returns the empty array, I believe we could return (0, 0) in the AllBigger case, and (length t, length t) in the AllLower case, without inconsistency nor loss of information. Does that seem reasonable?

(If we can do that, we must; it is strictly better as in most situations it will free our user from the duty of handling edge cases.)

UnixJunkie · 2013-12-11T00:53:10Z

I also think that returning a pair then using BatArray.sub would be nice.

I think split should split in two, not in three. Cf. for BatSet:

val split_lt: elt -> t -> t * t
(** [split_lt x s] returns a pair of sets [(l, r)], such that
[l] is the subset of [s] with elements < [x];
[r] is the subset of [s] with elements >= [x]. *)

val split_le: elt -> t -> t * t
(** [split_le x s] returns a pair of sets [(l, r)], such that
[l] is the subset of [s] with elements <= [x];
[r] is the subset of [s] with elements > [x]. *)

I still think it could use BatArray.bsearch to be implemented (bsearch is in another
pull request from Simon Cruanes).

c-cube · 2013-12-11T08:57:38Z

The idea of splitting the split functions into several functions is interesting, but actually I think it may be better to let the user compute the two "edges" (frontier under which elements are strictly lower than the pivot, frontier above which elements are strictly higher). Again, a sorted array is a multiset and not a set, so its split function has more information to return than Set.split (imho).

@gasche: at first my code would return a pair, but corner cases were awkward. Maybe with your definition (returning information needed by sub) it would be simpler and cleaner, so I'm interested.

UnixJunkie · 2013-12-16T09:10:51Z

OK, I understand now.
I think only the case where bsearch returned : `At of int
some post processing work is needed to implement split.
Maybe during this week I can try this: implementing split with
bsearch, If someone doesn't do it before me...

c-cube · 2013-12-16T09:24:34Z

@UnixJunkie the problem imho with using bsearch is the pathological case where you have an array of 10,000 times the same element, and you split w.r.t this given element. In this case, bsearch can return any index (any is valid), but you'd potentially have to explore the whole array to find edges (for split) if you don't use binary search for each edge.

UnixJunkie · 2013-12-16T10:04:33Z

Indeed.

gasche · 2013-12-16T16:53:50Z

I gave the version returning just a pair of integers a try in the following commit. Any feedback would be welcome.

I have the (light) intuition that it is better, but I'll try not to force my opinion on this. I think that we should think about whether the user needs to be aware of the edge-cases and test for them (in which case Simon's version is definitely better as it forces this special-logic by typing), or whether in natural situations the edge-cases would "do the right thing" with my implementation (in which case it's preferrable to use it rather than bother the users with them). My strong confidence in theorems would make me say that, if the property using "sub" looks so right, it's probably that this code does the right thing.

gasche · 2013-12-16T16:55:21Z

I just noticed that we don't have a testcase agains the Invalid_argument situation. This will need to be taken care of before any version goes upstream.

c-cube · 2013-12-19T14:51:46Z

I think @gasche 's solution, that just returns a pair of integers, is actually simpler (has a simple and well defined semantics based on Array.sub). So we should merge his code.

UnixJunkie · 2013-12-20T00:45:03Z

I also like the semantic with integer pairs.

gasche · 2016-04-27T02:42:11Z

split was just merged in master, will be part of 3.0.

rixed · 2019-10-01T06:11:20Z

Reopening as we have too many Array.splits now: this one, that's been waiting on the v3 branch for a while and for a reason that eludes me, and another one added on master more recently that mimick List.split in that it splits an array of pairs into a pair of arrays.

To solve this conflict of name, I suggest @UnixJunkie and @c-cube go fight a pistol-duel next Sunday morning and we keep the survivor's implementation, or I merely rename this split into something like "pivot_split", because that's what it evokes to me, unless someone has a better suggestion?

Also and add it to the Cap submodule. Closes ocaml-batteries-team#443

ghost assigned UnixJunkie Sep 22, 2013

c-cube added a commit to c-cube/batteries-included that referenced this issue Dec 4, 2013

Array.split implemented, closing ocaml-batteries-team#443

61976d8

c-cube added a commit to c-cube/batteries-included that referenced this issue Dec 16, 2013

Array.split implemented, as requested in ocaml-batteries-team#443

6dd6d67

gasche mentioned this issue Dec 17, 2013

Batarray #470

Closed

UnixJunkie added the for next major release label Nov 12, 2015

gasche mentioned this issue Mar 23, 2016

Promote to 4.03 #670

Closed

gasche pushed a commit that referenced this issue Apr 27, 2016

Array.split implemented, as requested in #443

07a4c2d

gasche closed this as completed Apr 27, 2016

gasche pushed a commit to gasche/batteries-included that referenced this issue May 8, 2016

Array.split implemented, as requested in ocaml-batteries-team#443

4b159c7

gasche pushed a commit that referenced this issue May 12, 2016

Array.split implemented, as requested in #443

9f98e82

gasche pushed a commit that referenced this issue Oct 28, 2016

Array.split implemented, as requested in #443

8455db0

gasche pushed a commit that referenced this issue Dec 6, 2016

Array.split implemented, as requested in #443

eac76e9

gasche pushed a commit that referenced this issue Feb 2, 2017

Array.split implemented, as requested in #443

05ec469

gasche pushed a commit that referenced this issue Feb 2, 2017

Array.split implemented, as requested in #443

5f0062d

rixed pushed a commit to rixed/batteries-included that referenced this issue Oct 1, 2019

Array.split implemented, as requested in ocaml-batteries-team#443

6740e7c

rixed reopened this Oct 1, 2019

rixed added a commit to rixed/batteries-included that referenced this issue Oct 1, 2019

BatArray: rename v3's split into pivot_split

b7c4e50

Also and add it to the Cap submodule. Closes ocaml-batteries-team#443

rixed added a commit to rixed/batteries-included that referenced this issue Oct 1, 2019

BatArray: rename v3's split into pivot_split

219cd73

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

split for BatArray #443

split for BatArray #443

UnixJunkie commented Sep 9, 2013

UnixJunkie commented Dec 10, 2013

gasche commented Dec 10, 2013

UnixJunkie commented Dec 11, 2013

c-cube commented Dec 11, 2013

UnixJunkie commented Dec 16, 2013

c-cube commented Dec 16, 2013

UnixJunkie commented Dec 16, 2013

gasche commented Dec 16, 2013

gasche commented Dec 16, 2013

c-cube commented Dec 19, 2013

UnixJunkie commented Dec 20, 2013

gasche commented Apr 27, 2016

rixed commented Oct 1, 2019

split for BatArray #443

split for BatArray #443

Comments

UnixJunkie commented Sep 9, 2013

UnixJunkie commented Dec 10, 2013

gasche commented Dec 10, 2013

UnixJunkie commented Dec 11, 2013

c-cube commented Dec 11, 2013

UnixJunkie commented Dec 16, 2013

c-cube commented Dec 16, 2013

UnixJunkie commented Dec 16, 2013

gasche commented Dec 16, 2013

gasche commented Dec 16, 2013

c-cube commented Dec 19, 2013

UnixJunkie commented Dec 20, 2013

gasche commented Apr 27, 2016

rixed commented Oct 1, 2019