feat: migrate to nox, run tests in parallel, revamp test infra #632

Open · wants to merge 35 commits into master

Conversation

@agriyakhetarpal (Collaborator) commented on Aug 24, 2024

Description

This PR makes several changes to the testing infrastructure:

  1. Migration from tox's DSL to Pythonic nox, with three sessions: one for linting, one for tests, and one that builds a source distribution plus a wheel and validates them. This mirrors the earlier tox environments with a few minor changes and one major one: SciPy is installed as a test dependency instead of having a separate environment for it. The linting session hasn't been activated yet, because it would raise several warnings that still need to be fixed throughout the codebase. A minimal sketch of the session layout follows this list.
  2. The workflow run now cancels pending/running jobs when new commits are pushed, thereby saving time on repeated pushes. This is enabled through the concurrency: key.
  3. The original testing configuration (as it was before this PR) is retained, without needing separate tox -e test and tox -e test-scipy runs. The macOS and Windows PyPy jobs are excluded from the GHA matrix instead, because SciPy does not yet provide wheels for PyPy and building it from source takes a long time (and, while still doable, is slightly trickier on Windows and macOS than on Linux – we could set up a Miniconda environment to test them if needed).
  4. uv is used in the nox sessions and in CI, via https://github.com/yezz123/setup-uv. We don't need extra caching for our dependencies, because downloading them is faster than retrieving them from a cache.
  5. A [test] extra has been added, with a minimal set of test dependencies.
  6. We now use pytest-xdist to run tests in parallel, which brings the suite from ~35 seconds in serial mode down to about 4 seconds on my machine – a roughly 9x speedup. We see similar gains in CI.
  7. I noticed that some tests failed intermittently, which suggested that the testing was not deterministic. I added a fixed NumPy random seed as a pytest fixture that runs before every test. This can be improved further when we refactor the tests in the future.
  8. nox -s nightly-tests uses environment variables understood by uv to download nightly builds of NumPy (and SciPy where applicable) and installs them before testing. Corresponding jobs have been added to the CI matrix; see the sketch below.
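As a rough sketch of how points 1, 6, and 8 fit together (the session names, dependency lists, index URL, and uv environment variables below are illustrative assumptions, not the exact contents of this PR's noxfile.py):

# noxfile.py – minimal sketch, not the exact file added in this PR
import nox

nox.options.default_venv_backend = "uv"  # assumed way of enabling the uv backend

NIGHTLY_INDEX = "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple"


@nox.session
def tests(session):
    """Install the [test] extras plus SciPy and run the suite in parallel."""
    session.install("-e", ".[test]", "scipy")
    session.run("pytest", "-n", "auto", *session.posargs)


@nox.session(name="nightly-tests")
def nightly_tests(session):
    """Run the suite against NumPy/SciPy nightly wheels via uv environment variables."""
    session.install("-e", ".[test]")
    # Pull only NumPy/SciPy from the nightly index; the variables here are assumptions.
    session.install(
        "--upgrade", "numpy", "scipy",
        env={"UV_INDEX_URL": NIGHTLY_INDEX, "UV_PRERELEASE": "allow"},
    )
    session.run("pytest", "-n", "auto", *session.posargs)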

There are some more changes needed across the codebase, such as fixing some missing coverage and deciding between Codecov and a plain HTML coverage report (see https://hynek.me/articles/ditch-codecov-python/), so that the coverage collected by the tests is actually put to use. Some of the missing coverage comes from Python 2-era code, though, so it might be better to remove that code entirely once we get to linting and autoformatting the entire codebase.

Closes #631; related to #630.

@agriyakhetarpal (Collaborator, Author) commented on Aug 24, 2024

I now see why the PyPy tests were taking so long – SciPy is being built from its source distribution because it doesn't publish wheels for PyPy, while NumPy does, of course (so I've answered my own question). To keep CI time low, I could add another nox session for PyPy (edit: or I'll just keep them in the same session, but perhaps add a separate job for better separation of logs?).
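One way to keep everything in a single session while avoiding a slow SciPy source build on PyPy (just a sketch of the idea – the Python version list is illustrative, not necessarily what the noxfile ends up using) is to branch on the session's interpreter:

import nox


@nox.session(python=["3.10", "3.11", "pypy3.10"])
def tests(session):
    """Run the test suite; install SciPy only where wheels exist."""
    session.install("-e", ".[test]")
    if not str(session.python).startswith("pypy"):
        # SciPy publishes no PyPy wheels, so skip it there instead of building from source.
        session.install("scipy")
    session.run("pytest", "-n", "auto", *session.posargs)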

@agriyakhetarpal (Collaborator, Author) commented on Aug 24, 2024

It looks like some of the tests have not been entirely deterministic: failures due to precision differences between Intel and M-series macOS machines were being reported (only in CI, though – I didn't hit any of them locally). Setting a fixed random seed seems to have resolved the problem. This PR enables testing on both Intel and ARM macOS runners, which we can keep until the Intel runners reach their EoL date.
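For reference, the fixed seed can be applied before every test with an autouse fixture in conftest.py – the fixture name and seed value here are illustrative, not necessarily what this PR uses:

# conftest.py – illustrative sketch of a fixed-seed autouse fixture
import numpy as np
import pytest


@pytest.fixture(autouse=True)
def fixed_numpy_seed():
    """Seed NumPy's global RNG before every test so results are deterministic."""
    np.random.seed(42)  # the exact seed value is arbitrary here
    yield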

@agriyakhetarpal marked this pull request as ready for review on August 24, 2024 23:55
@agriyakhetarpal changed the title from "WIP: migrate to nox, parallel tests, revamp test infra" to "migrate to nox, run tests in parallel, revamp test infra" on Aug 25, 2024
@agriyakhetarpal changed the title from "migrate to nox, run tests in parallel, revamp test infra" to "feat: migrate to nox, run tests in parallel, revamp test infra" on Aug 28, 2024
@agriyakhetarpal (Collaborator, Author) commented

Just added some additional changes that I had missed – this is ready for review whenever you have the time, @fjosw and @j-towns! I separated out the SciPy tests so that they run on non-PyPy runners in 19dab2a, which gives us faster CI. The pre-commit check will fail until #634 gets merged, so we can safely ignore it.

@agriyakhetarpal (Collaborator, Author) commented

I have also added tests against the NumPy and SciPy nightlies. Both of them support Python 3.10–3.13, which explains the failures across platforms for Python 3.8 and 3.9. We will be dropping 3.8 in October anyway, but in the meantime we could explore ways to make these jobs fail more gracefully, or limit our nightly testing to the Python versions the nightlies support (or just the latest Python version). Please let me know your thoughts here.
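For instance (purely a sketch – the session name and version list are assumptions), one option is to restrict the session's python list to the supported versions; another, sketched below, is to keep the full matrix but skip gracefully:

import nox

NIGHTLY_PYTHONS = ("3.10", "3.11", "3.12", "3.13")  # assumed versions with nightly wheels


@nox.session(name="nightly-tests", python=["3.8", "3.9", "3.10", "3.11", "3.12", "3.13"])
def nightly_tests(session):
    """Bail out gracefully on interpreters the nightly wheels don't cover."""
    if str(session.python) not in NIGHTLY_PYTHONS:
        session.skip("NumPy/SciPy nightlies only ship wheels for Python 3.10+")
    ...  # install the nightlies and run pytest as in the regular session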

@fjosw (Collaborator) left a comment

Hey @agriyakhetarpal, I have no experience with nox, but I skimmed your code and things look good to me! Do you already have a feeling for how robust the nightly setup is? I think for a project like this, which is mostly in maintenance mode, we would want to prevent too many "false positives" from the nightly tests.

@agriyakhetarpal (Collaborator, Author) commented

Thanks for going through the changes, @fjosw! nox is essentially the same as tox, but I hope the Pythonic API makes it at least slightly easier to read and to maintain your own sessions.

I'm not sure I have a definitive answer for how robust the nightly tests will be – I think it depends a lot on our availability to land fixes at the right time, since breakages will be critical for our users. At the same time, an upper pin on NumPy could, in rare circumstances, conflict with our ability to run the nightly tests. Say we pin NumPy to <2.4 and NumPy 2.4.0-dev wheels get published to the nightly index: we won't be able to install those wheels, because dependency resolution will fail with an unsatisfied requirement. If we bump the pin to <2.5, we risk a new NumPy release landing on PyPI later and breaking our code (if we aren't able to notice and fix it in time). It is a very low-risk situation overall, though, since we are both actively making changes to the repository; plus, NumPy makes it easier for us by deprecating things with warnings at least two releases before they are removed, and we, as downstream maintainers, are urged to try out the (one, or in some cases two) release candidates when they become available.

However, without a thorough refactoring of the original code, we can expect problems in the coming months and years. For example, if something breaks in the NumPy 2.3.0 nightly wheels while 2.2.0 is the current stable release, we'll need to add code patterns like this:

import numpy as _np

if _np.lib.NumpyVersion(_np.__version__) >= "2.3.0":
    ...  # some logic for NumPy >= 2.3.0
elif _np.lib.NumpyVersion(_np.__version__) >= "2.2.0":
    ...  # some other logic for the 2.2.x series
else:
    ...  # yet more logic for older releases

Such branches are more of a code smell than a code pattern, and they will surely not be maintainable for us. At the same time, we don't want a lot of red in our CI either (or at least not too much when it does appear). Moreover, it is too early to support NumPy > 2, as we discussed in #628 (review).

With that extra context, I'm open to more suggestions on how we can improve our testing, since I'm out of ideas :) We still need to add testing against the minimum supported NumPy version (1.x) and decide on a lower bound, which I think can be handled with uv pip install --resolution lowest (see https://docs.astral.sh/uv/concepts/resolution/#lower-bounds). I will do that in another PR, of course.
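A sketch of what that could look like (assuming nox's uv backend, so that the --resolution flag is passed through to uv pip install; the session name is made up):

import nox

nox.options.default_venv_backend = "uv"  # assumed; --resolution is a uv flag, not a pip one


@nox.session(name="minimum-versions")
def minimum_versions(session):
    """Resolve dependencies to their lowest allowed versions, then run the tests."""
    session.install("--resolution", "lowest", "-e", ".[test]")
    session.run("pytest", "-n", "auto", *session.posargs)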

I just triggered CI again with the updated config. While I can disable the current nightly tests for Python 3.8 and 3.9, since there are no nightly wheels for them (we'll drop 3.8 in October anyway), I wonder if you have suggestions on the actual test failures? It could be a NumPy bug or a bug in our own codebase, but I'm not confident either way. I thought the test was flaky and re-triggered it in CI, but that didn't help – it fails with the same values.
