Skip to content

Commit

Permalink
Create C api for two-locus branch stats
Browse files Browse the repository at this point in the history
Adds a C implementation of two-locus branch statistics. It mirrors the
python implementation except where we iterate over edge differences and
collect them for updating the stat. We use tree_seek_index to seek to
arbitrary positions and tsk_tree_next to move from tree to tree. The
tricky part was getting backwards iteration correct. All tests agree
with the python prototype.

In addition, I had to fix a bug in the python implementation where node
ids were being added to our TreeState object instead of sample ids
(encoded in the sample index map). The python tests have also been
updated to remove the slow naive version (after validating that it
agrees with the python and c implementation -- on test cases where the
runtime was reasonable).

The CPython code has been updated to parse positions in addition to
sites.

I also found the need to clean up some of the bounds checking code to
return reasonable error messages to the user (also updated in the
two-locus site stats).

Finally, I added a few unbiased statistics for use in validating this
code.
  • Loading branch information
lkirk committed Aug 28, 2024
1 parent f0bc3aa commit 71a10d2
Show file tree
Hide file tree
Showing 8 changed files with 1,227 additions and 330 deletions.
240 changes: 199 additions & 41 deletions c/tests/test_stats.c

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions c/tskit/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -226,6 +226,9 @@ tsk_strerror_internal(int err)
ret = "One of the kept rows in the table refers to a deleted row. "
"(TSK_ERR_KEEP_ROWS_MAP_TO_DELETED)";
break;
case TSK_ERR_POSITION_OUT_OF_BOUNDS:
ret = "Position out of bounds. (TSK_ERR_POSITION_OUT_OF_BOUNDS)";
break;

/* Edge errors */
case TSK_ERR_NULL_PARENT:
Expand Down Expand Up @@ -502,6 +505,24 @@ tsk_strerror_internal(int err)
ret = "Times must be strictly increasing. (TSK_ERR_UNSORTED_TIMES)";
break;

/* Two locus errors */
case TSK_ERR_STAT_UNSORTED_POSITIONS:
ret = "The provided positions are not sorted in strictly increasing "
"order. (TSK_ERR_STAT_UNSORTED_POSITIONS)";
break;
case TSK_ERR_STAT_DUPLICATE_POSITIONS:
ret = "The provided positions contain duplicates. "
"(TSK_ERR_STAT_DUPLICATE_POSITIONS)";
break;
case TSK_ERR_STAT_UNSORTED_SITES:
ret = "The provided sites are not sorted in strictly increasing position "
"order. (TSK_ERR_STAT_UNSORTED_SITES)";
break;
case TSK_ERR_STAT_DUPLICATE_SITES:
ret = "The provided sites contain duplicated entries. "
"(TSK_ERR_STAT_DUPLICATE_SITES)";
break;

/* Mutation mapping errors */
case TSK_ERR_GENOTYPES_ALL_MISSING:
ret = "Must provide at least one non-missing genotype. "
Expand Down
21 changes: 21 additions & 0 deletions c/tskit/core.h
Original file line number Diff line number Diff line change
Expand Up @@ -370,6 +370,11 @@ One of the rows in the retained table refers to a row that has been
deleted.
*/
#define TSK_ERR_KEEP_ROWS_MAP_TO_DELETED -212
/**
A genomic position was less than zero or greater equal to the sequence
length
*/
#define TSK_ERR_POSITION_OUT_OF_BOUNDS -213

/** @} */

Expand Down Expand Up @@ -710,6 +715,22 @@ The vector of quantiles is out of bounds or in nonascending order.
Times are not in ascending order
*/
#define TSK_ERR_UNSORTED_TIMES -917
/*
The provided positions are not provided in strictly increasing order
*/
#define TSK_ERR_STAT_UNSORTED_POSITIONS -918
/**
The provided positions are not unique
*/
#define TSK_ERR_STAT_DUPLICATE_POSITIONS -919
/**
The provided sites are not provided in strictly increasing position order
*/
#define TSK_ERR_STAT_UNSORTED_SITES -920
/**
The provided sites are not unique
*/
#define TSK_ERR_STAT_DUPLICATE_SITES -921
/** @} */

/**
Expand Down
Loading

0 comments on commit 71a10d2

Please sign in to comment.