Refactor tests #131

Merged: 2 commits merged into tskit-dev:main from refactor-tests, Jan 26, 2021

Conversation

@hyanwong (Member) commented Jan 24, 2021

This is only a draft of what can be done, so let me know if it looks right @petrelharp, and I can try rolling it out to the other tests too. Only the test_tree_sequence.py::TestSlimTime test has been properly altered to use the basic_slim_recipe fixture, but it does work entirely in parallel, both over the tests and over the recipes, e.g. if you do

pytest test_tree_sequence.py::TestSlimTime -n 4

for all the recipes, or just

pytest test_tree_sequence.py::TestSlimTime -k nonWF_recipe

for one of them. Annoyingly, the way we decided to do it means that we call the test function for every script, but simply do nothing if the script doesn't need this test running on it. I suspect that means the total count of tests run will be over-inflated. Perhaps there's a pytest way to decide, programmatically within the test, to say "skip this test and don't count it in the totals" (pytest-dev/pytest#3730 seems relevant)?
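
(For reference, the obvious in-test mechanism is pytest.skip(), roughly as sketched below; the basic_slim_recipe keys used here are illustrative, not the actual fixture contents. A skipped test still shows up in the run totals as "skipped", which is exactly the problem.)

```
import pytest

def test_needs_pedigree(basic_slim_recipe):
    # Bail out early for recipes that don't produce a pedigree file; pytest
    # reports the test as skipped rather than omitting it from the totals.
    if "pedigree" not in basic_slim_recipe:
        pytest.skip("recipe does not produce a pedigree file")
    pedigree = basic_slim_recipe["pedigree"]
    # ... assertions on the pedigree would go here ...
```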

The general setup is now mostly done in conftest.py, and is similar to what I did in SLiM. The slim_outfiles() function defines the file names, SLiM variable names, and a function to apply if the file is produced. Instead of the tree files having the name of the recipe, by default they are all called out.trees, but they are stored in a directory named by the recipe instead. I pass two SLiM variable names in, one for the trees file and another for the pedigree, but the second isn't always used by the script (which is fine, I would have thought).
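
To make the shape of that concrete, slim_outfiles() provides something along these lines (a minimal sketch only, not the actual conftest.py: the namedtuple, the pedigree file name, and the loader functions are illustrative), which matches how the outputs get consumed later, i.e. ret = {o.key: o.post_process(o.path) for o in outfiles if os.path.isfile(o.path)}:

```
import collections
import os

import pyslim  # assuming pyslim.load() is what reads the .trees output back in

def read_pedigree_file(path):
    # Illustrative stand-in: read the whitespace-separated pedigree table into rows.
    with open(path) as f:
        return [line.split() for line in f]

# Each output file records the dict key to store the result under, the SLiM variable
# that names the file, the path inside the per-recipe directory, and a post-processing
# function applied only if the file was actually produced.
Outfile = collections.namedtuple("Outfile", ["key", "slim_name", "path", "post_process"])

def slim_outfiles(out_dir):
    return [
        Outfile("ts", "TREES_FILE", os.path.join(out_dir, "out.trees"), pyslim.load),
        # "out.pedigree" is a made-up file name, for illustration only
        Outfile("pedigree", "PEDIGREE_FILE", os.path.join(out_dir, "out.pedigree"),
                read_pedigree_file),
    ]
```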

There's a FIXME because I'm not sure where in the code we should normally set the random seed if these tests are all parallelized.

The only fixture so far is basic_slim_recipe. I intend to have another one for restarted_slim_recipe, but I haven't yet worked out how I'll do the restarting thing, as it will rely on the previous recipe having been run.

Also, I have had to change the names of all the recipes to put "recipe" as a suffix, rather than a prefix (i.e. nonWF_recipe rather than, as previously, recipe_nonWF), because the -k flag is always a partial matching thing, so that if you do -k recipe_nonWF it will run both recipe_nonWF and recipe_nonWF_early. I hope the name change is OK.

@hyanwong force-pushed the refactor-tests branch 2 times, most recently from 0585a30 to 8b9bcc1 on January 25, 2021 at 14:40
@hyanwong (Member, Author)

Perhaps there's a pytest way to decide, programmatically within the test, to say "skip this test and don't count it in the totals" (pytest-dev/pytest#3730 seems relevant)?

Actually, I've just found a much nicer way to do all this, using @pytest.mark.parametrize to restrict which recipes we run the tests on:

class TestHasIndividualParents(tests.PyslimTestCase):

    pedigree_recipes = [k for k, v in basic_slim_recipe_specs.items() if "pedigree" in v]

    @pytest.mark.parametrize('basic_slim_recipe', pedigree_recipes, indirect=True)
    def test_pedigree_has_parents(self, basic_slim_recipe):
        # this will be called only for recipes listed with {'pedigree': True}
        ts = basic_slim_recipe["ts"]
        # ts is a ts with an associated pedigree file

@@ -0,0 +1,229 @@
generation stage individual age parent1 parent2
Contributor

let's not commit these temporary files produced by tests!

Member Author

Oops. Well spotted. Must have been left from a previous trial. Sorry!

tests/README.md Outdated
@@ -0,0 +1,25 @@
Install the dependencies using

`pip install -r requirements/CI/requirements.txt`
Contributor

if you're saying python3 below don't you need to also say pip3?

Member Author

Ah, possibly. No harm in that. Will change.

tests/README.md Outdated
To restrict to specific recipes in the examples dir, the syntax is:

```
python3 -m pytest -k nonWF_recipe
```
Contributor

very nice. But, what is nonWF_recipe? There aren't any example slim scripts named this?

Member Author

I guess it's obvious that these can be combined?

python3 -m pytest tests/test_tree_sequence.py::test_slim_generation -k nonWF_recipe

Contributor

Ah, ok. But, that's recipe_nonWF, not nonWF_recipe.

Contributor

oh wait maybe you renamed all the recipes?

Contributor

What's the motivation for that? I thought you were removing the repetitive recipe_ at the start... but it's at the end, now?

Member Author

The -k flag matches its arg anywhere in the test string. Since there are recipes named recipe_nonWF.slim and recipe_nonWF_early.slim, running pytest -k recipe_nonWF would run both. Putting _recipe on the end serves as a useful terminator to stop this.

Member Author

I guess the other way would be to require the .slim suffix to be passed: pytest -k recipe_nonWF.slim, and then I would have to cut off the .slim before checking for the settings in slim_recipe_specs.py. Perhaps you prefer this? It's a bit more fiddly though.

Member Author

Hmm, I forgot that the prefix isn't unique either. So we have WF_early_recipe and nonWF_early_recipe, both of which will now match with -k WF_early_recipe. Argh. Looks like we need recipe_ back on the front and .slim on the end too.

@petrelharp (Contributor) Jan 25, 2021

arg, that's fiddly. But it makes it a bit more clear to have the full file name, IMO.

# These SLiM scripts read in an existing trees file; the "input" gives a .trees file produced
# by recipes above that is appropriate for starting this script.
restarted_slim_recipe_specs = {
    "nucleotides_restart": {"WF": True, "nucleotides": True, "no_op": True, "input": f"tests/examples/recipe_nucleotides.trees"},
Contributor

this input looks different than the others

Member Author

Could be. I haven't tested this yet. I just want to get some feedback as to whether the general approach was OK.

os.makedirs(out_dir, exist_ok=True)
outfiles = run_slim(recipe=recipe_name, out_dir=out_dir)

ret = {o.key: o.post_process(o.path) for o in outfiles if os.path.isfile(o.path)}
Contributor

very slick!

@petrelharp (Contributor)

Gee, this is great! Looks good. (And, sorry I got confused about the file names before I read your whole comment. :sheepish:)

There's a FIXME because I'm not sure where in the code we should normally set the random seed if these tests are all parallelized.

I'm not sure either.

No other concerns! I do wonder if passing in the TREES_FILE and PEDIGREE_FILE is unnecessary complexity, since we could just pass in OUTDIR instead, but no big deal.

@hyanwong (Member, Author)

Gee, this is great! Looks good. (And, sorry I got confused about the file names before I read your whole comment. :sheepish:)

No probs. I'll try to get the whole thing working then.

There's a FIXME because I'm not sure where in the code we should normally set the random seed if these tests are all parallelized.

I'm not sure either.

I'll leave it as a FIXME for the time being then, and maybe ask Ben.

No other concerns! I do wonder if passing in the TREES_FILE and PEDIGREE_FILE is unnecessary complexity, since we could just pass in OUTDIR instead, but no big deal.

If we explicitly pass the full file names, we only define them in one place. Otherwise we need to decide on a filename (e.g. out.trees) in SLiM, and redefine the same names in the Python code too (and make sure they stay in sync). We would also need to do the filename munging in SLiM, and the code to create the dir+filename in SLiM would be repeated in all the recipes, so I suspect it would result in more code duplication.
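
Roughly, the idea is for the Python side to own the paths and hand them to SLiM, something like the sketch below (assuming run_slim() shells out to the slim binary and passes the paths via SLiM's -d flag, and reusing the hypothetical slim_outfiles() sketched above; this isn't the PR's exact code):

```
import subprocess

def run_slim(recipe, out_dir):
    # One -d definition per output file, e.g. -d "TREES_FILE='/tmp/.../out.trees'",
    # so the full paths are defined in exactly one place (here, in Python).
    outfiles = slim_outfiles(out_dir)
    defines = []
    for o in outfiles:
        defines += ["-d", f"{o.slim_name}='{o.path}'"]
    subprocess.run(["slim", *defines, recipe], check=True)
    return outfiles
```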

@petrelharp (Contributor)

Oh - about the seed - I think I do know the answer. The test set-up doesn't do any random stuff, so it doesn't need a seed set. And all the randomness we do in tests should have the seed specified, like

def test_whatever():
    random.seed(1234)
    a = random.sample(x, k)  # x and k as appropriate for the test
    ...

... as we do in tskit. So no need to put it there. (It's possible that some of my tests don't set the seed, but they should.)

@hyanwong force-pushed the refactor-tests branch 4 times, most recently from ed88aca to cf8d4db on January 26, 2021 at 01:24
@hyanwong marked this pull request as ready for review on January 26, 2021 at 01:24
@hyanwong (Member, Author)

Hoorah! The tests all seem to be passing now. You can fully parallelise all the tests & recipes, the SLiM simulation output is all saved to tmp dirs (and each simulation is only run once, even when multi-threaded, I hope), and there's a slick way to call the restarted tests while making sure that the sims on which they depend have been run:

    @pytest.mark.parametrize(
        'restart_name, basic_recipe', restarted_recipe_eq("no_op"), indirect=["basic_recipe"])

Then use basic_recipe["ts"], which you are now sure is available, and run SLiM on the output. The fancy stuff is all done in restarted_recipe_eq() at the bottom of recipe_specs.py. I can explain how it works if it's not clear.
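
For anyone reading along, restarted_recipe_eq() does something along these lines (a rough sketch inferred from the usage above; the spec dictionary and its key names are made up, not the real recipe_specs.py):

```
# A made-up spec dictionary, standing in for the real recipe_specs.py contents.
restarted_recipe_specs = {
    "restart_nucleotides.slim": {"no_op": True, "basic_recipe": "recipe_nucleotides.slim"},
}

def restarted_recipe_eq(*flags):
    # Yield (restart_name, basic_recipe_name) pairs for every restart recipe whose spec
    # has all the requested flags set. Because "basic_recipe" is an indirect parameter,
    # pytest routes the second element through the basic_recipe fixture, which ensures
    # the underlying simulation has been run before the restart test uses its output.
    for restart_name, spec in restarted_recipe_specs.items():
        if all(spec.get(flag) for flag in flags):
            yield restart_name, spec["basic_recipe"]
```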

Weirdly the restart_and_run_WF.slim and restart_and_run_nonWF.slim seem not to have slim recipes in tests/examples, so I've commented them out for the time being.

The pytest command will also now work from whatever dir you want to run it in.

Ready for final review, I reckon @petrelharp. Then I can try to move on to the "retain"ing PR.

@hyanwong (Member, Author)

P.S. there's not much left in __init__.py - it's mostly been moved to conftest.py. In fact, some of the stuff in __init__.py, like assertTablesEqual, can probably be swapped for functions that are now built into tskit (I think?)

@codecov-io commented Jan 26, 2021

Codecov Report

Merging #131 (b52d1c4) into main (04ab7ea) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #131   +/-   ##
=======================================
  Coverage   89.39%   89.39%           
=======================================
  Files           5        5           
  Lines         886      886           
  Branches      166      166           
=======================================
  Hits          792      792           
  Misses         64       64           
  Partials       30       30           



def test_load_without_provenance(self):
@pytest.mark.parametrize(
'restart_name, basic_recipe', restarted_recipe_eq("no_op"), indirect=["basic_recipe"])
Contributor

So, what's indirect do? Makes it cached? The docs are kinda cryptic...

Member Author

I think it feeds the parameter value through the fixture function. So passing ("restart_WF.slim", "recipe_WF.slim") to "restart_name, basic_recipe", ..., indirect=["basic_recipe"] will call the basic_recipe fixture with "recipe_WF.slim" and return its result as the basic_recipe value in the test, whereas the restart_name parameter in the test function just contains the plain value "restart_WF.slim".
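
Schematically, the mechanism looks like this (an illustration of the pytest behaviour only, not the actual conftest.py fixture; run_recipe() is a made-up stand-in):

```
import pytest

def run_recipe(recipe_name):
    # Made-up stand-in for running SLiM and loading its output.
    return {"name": recipe_name}

@pytest.fixture
def basic_recipe(request):
    # With indirect parametrization, the parametrized value arrives as request.param,
    # and the test receives whatever this fixture returns.
    return run_recipe(request.param)  # e.g. request.param == "recipe_WF.slim"

@pytest.mark.parametrize(
    'restart_name, basic_recipe',
    [("restart_WF.slim", "recipe_WF.slim")],
    indirect=["basic_recipe"])
def test_restart(restart_name, basic_recipe):
    # restart_name is the plain string; basic_recipe is the fixture's return value.
    assert restart_name == "restart_WF.slim"
    assert basic_recipe["name"] == "recipe_WF.slim"
```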

Contributor

oh!! gotcha.

tests/test_metadata.py Outdated (resolved)
@petrelharp (Contributor)

Wow, that was a lot of work! And, it works!! Thanks for the wonderful solutions, not to mention the cleaning up after me in a few places. I have been over this with a fairly coarse comb, and have muddled through these fixture things enough to convince myself it's doing all the stuff it did before. (although I may need you to explain how it all works sometime soon, while it's fresh...) I only found two very minor changes in tests/examples/recipe_nonWF.slim, and one multiply-defined test; otherwise I'm happy to merge.

(And, I've done a diff of the tests run before and after this PR, and other than that multiply-defined test, they're all being run!)

@hyanwong (Member, Author)

Wow, that was a lot of work! And, it works!!

🎉

I may need you to explain how it all works sometime soon, while it's fresh...

Yeah, it's not immediately obvious, sadly. But I think that's OK, the intention is clear to a reader.

I only found two very minor changes in tests/examples/recipe_nonWF.slim, and one multiply-defined test; otherwise I'm happy to merge.

Done. Merge away!

(And, I've done a diff of the the tests run before and after this PR, and other than that multiply-defined test, they're all being run!)

Great. How do you do that, by the way?

@petrelharp (Contributor)

Yay!

I did

python3 -m pytest tests -n 1 -v &> temp_old
git checkout refactor-tests
python3 -m pytest tests -n 1 -v &> temp_new
diff <(cat temp_old | grep "^tests" | sort | uniq) <(cat temp_new | grep "^tests" | sed -e 's/\[.*\]//' | sort | uniq)

It's not checking whether the tests are being run on exactly the same scripts, but all the tests are being run, and none are being run on the wrong scripts, so I think it's all good.

@petrelharp merged commit 4fa846b into tskit-dev:main on Jan 26, 2021