
Add py311, remove py37 #320

Merged: 11 commits into stanfordmlgroup:master, Jan 29, 2024
Conversation

jack-mcivor
Contributor

Fixes #319

It is easiest to implement NEP-29. All current versions of numpy, scipy, scikit-learn are built for 3.8-3.11 (and do not support Python 3.7)

Pandas can be removed from the requirements, as it is only a transitive dependency.

np.bool was deprecated in NumPy 1.20.
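
For reference, the fix is mechanical: `np.bool` was a deprecated alias for the builtin `bool` (deprecated in NumPy 1.20, removed in 1.24), so the builtin is the drop-in replacement. A minimal sketch, not the actual ngboost diff:

```python
import numpy as np

# np.bool was a deprecated alias for the builtin bool (removed in NumPy 1.24);
# use bool (or np.bool_) wherever np.bool appeared as a dtype.
flag = np.zeros(3, dtype=bool)  # instead of dtype=np.bool
assert flag.dtype == np.bool_
```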

@jack-mcivor
Contributor Author

I could do with a bit of help on this one. Two tests failing:

============================================================================================== short test summary info ==============================================================================================
FAILED tests/test_distns.py::test_dists_runs_on_examples_logscore[learner0-T] - ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
FAILED tests/test_distns.py::test_dists_runs_on_examples_logscore[learner1-T] - ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

It looks like this is caused by an upgrade of scipy, although I'm not clear why.

  • scipy==1.7.3 (passes)
  • scipy==1.8.1 (passes)
  • scipy==1.9.3 (fails)
  • scipy==1.10.1 (fails)

It doesn't seem to matter which versions of numpy and scikit-learn are used. The failure mode varies with the random seed, but it looks like it comes down to failures in calculating the gradient (either a singular-matrix error or NaNs in the result).

There are in general quite a few errors when running the tests, not sure what to make of them.

@tim-x-y-z

It doesn't seem to matter which versions of numpy and scikit-learn are used. The failure mode varies with the random seed, but it looks like it comes down to failures in calculating the gradient (either a singular-matrix error or NaNs in the result).

I've had that issue for a while with the multivariate normal distribution. In my case I turned natural_gradient off and the issue seemed to go away. Not sure if it's relevant at all, but I thought I'd share.

I can try to have a deeper look later today !

@ryan-wolbeck
Collaborator

I agree with removing support for 3.7, as it's end of life for Python as well. I'm working to get through some other PRs, and I'll help figure out the failures here. Thanks for the PR.

@ryan-wolbeck
Collaborator

ryan-wolbeck commented Sep 8, 2023

@jack-mcivor can you re-run checks here? Logs have expired, would like to see the errors. Thanks

@jack-mcivor
Contributor Author

@ryan-wolbeck those logs should be available now

@ryan-wolbeck
Collaborator

ryan-wolbeck commented Oct 18, 2023

Here's what I'm working on testing here for an update:

  • The gradient computation for the T distribution with LogScore is leading to a singular matrix error. I'm testing a change to add a regularization term to the diagonal of the matrix, ensuring it's non-singular.
  • The target values (y) contain NaN values in the tests. Working on a change to handle the null values, something like:

# Check for NaN values in y_train and drop the corresponding rows
nan_mask = np.isnan(y_train)
if np.any(nan_mask):
    X_train = X_train[~nan_mask]
    y_train = y_train[~nan_mask]

ngb.fit(X_train, y_train)
  • I'm also looking at the runtime overflow warnings, testing numerically stable functions to avoid instability. By constraining the input to the exponential function we can keep the output within a range that float64 can represent, making the computation "stable" in the sense that it avoids overflow.
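
The diagonal-regularization idea from the first bullet could look something like this; `solve_with_ridge` and the `eps` value are illustrative, not ngboost code:

```python
import numpy as np

def solve_with_ridge(A, b, eps=1e-8):
    """Solve A x = b, nudging the diagonal so a singular A still factors."""
    return np.linalg.solve(A + eps * np.eye(A.shape[0]), b)

# A rank-deficient matrix that makes plain np.linalg.solve raise LinAlgError
A = np.array([[1.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0])
x = solve_with_ridge(A, b)
assert np.all(np.isfinite(x))
```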

I'm planning to work through these updates this week and will hopefully push some code here before the end of the week.
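
The overflow point above can be sketched as a clipped exponential. This is a common trick, not necessarily what ngboost will use; the bound comes from float64's exp overflow threshold near 709:

```python
import numpy as np

def stable_exp(x, bound=700.0):
    # np.exp overflows float64 just above x ~ 709.78, so clip inputs first
    return np.exp(np.clip(x, -bound, bound))

vals = stable_exp(np.array([-1000.0, 0.0, 1000.0]))
assert np.all(np.isfinite(vals))
```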

@ryan-wolbeck
Collaborator

@jack-mcivor I've been working through tests of the SciPy and supported Python versions at https://github.com/stanfordmlgroup/ngboost/tree/pr-320, but I've not been successful: even pushing SciPy beyond 1.8.1 (which requires Python > 3.8), I still run into the same issues and seem to be stuck.

In order to keep moving forward, I think we should create a new PR that just fixes the np.bool issue, so we can push a usability fix first. Once that is merged into main, we can then work towards this removal of 3.7 (and maybe 3.8, though about 15% of downloads still come from 3.8). I'm worried that in this PR we are going to struggle to find the root cause, though I think you are likely right that it's a SciPy issue.

Would you mind creating another PR just to address the np.bool issue since it should already be within the code you added here? If you don't have time I can also work towards splitting it out myself as well.

@DumasCharles

Hello,

Thank you for your work on NGBoost, and answering my comment on #336 😊

As mentioned in this issue, the bug with np.bool prevents fitting a model with an Exponential distribution, which can be a major blocker... As you suggested, would it be possible to open a PR specific to this bug?
I'm happy to create the PR if that helps.

@jack-mcivor
Contributor Author

#337 is to fix the bool deprecation

@jack-mcivor
Contributor Author

@ryan-wolbeck it's been a while since I looked at this, but is there anything else we can do?

I think the failing test is not specific to Python versions at all, but to scipy. Under the current constraints, a user could install scipy>=1.9 and get failing tests. So I think we should either xfail this test or, if it is a real problem, constrain scipy<1.9.

What do you think?
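
An xfail along these lines is what I have in mind. This is a sketch with a stand-in test body, not the actual parametrization in tests/test_distns.py:

```python
import numpy as np
import pytest

# Sketch: expect the T/LogScore case to fail with a singular Fisher matrix
# under scipy>=1.9, without failing the whole suite.
@pytest.mark.xfail(
    reason="singular matrix in T-distribution gradient with scipy>=1.9",
    raises=np.linalg.LinAlgError,
)
def test_t_logscore_stub():
    # stand-in for fitting NGBRegressor(Dist=T, Score=LogScore)
    np.linalg.solve(np.ones((2, 2)), np.ones(2))
```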

@ryan-wolbeck
Collaborator

@jack-mcivor let's merge the latest changes from master so we can build the tests faster. From there I'll pull this again and test out the versions to dig in further.

@jack-mcivor
Contributor Author

@ryan-wolbeck done, still seems to exhibit the same behaviour. A minimal & fast reproducer is

pytest 'tests/test_distns.py::test_dists_runs_on_examples_logscore[learner0-T]'

@ryan-wolbeck
Collaborator

@jack-mcivor

The error message indicates that a numpy.linalg.LinAlgError: Singular matrix error occurred. This error is raised when a function in the numpy.linalg module, such as numpy.linalg.solve, is applied to a singular (or non-invertible) matrix.

In this case, the error occurred in the grad method of the Score class in ngboost/scores.py. This method tries to solve a linear system using the numpy.linalg.solve function, which fails if the input matrix is singular.

The test_dists_runs_on_examples_logscore test is failing because it's trying to fit an NGBRegressor with a T distribution, which seems to be causing the singular matrix error.

I think you are right that we should just allow the T case to fail here (skip/xfail the test). I tested an implementation at https://github.com/stanfordmlgroup/ngboost/tree/pr-320-2 and got things to build. I did play around with the versions a bit as well, but I don't think that's actually necessary: the xfail alone allows the failure to occur due to the singular-matrix error.

Can you push a similar implementation here?

@alejandroschuler do you have any concerns with us allowing this case to fail in order to support 3.11?

@alejandroschuler
Collaborator

@ryan-wolbeck sure let's let that test fail, nbd

@jack-mcivor
Contributor Author

@ryan-wolbeck thanks, that should be ready now

@ryan-wolbeck ryan-wolbeck merged commit c482aab into stanfordmlgroup:master Jan 29, 2024
4 checks passed
Linked issue: Python 3.11 support (#319). 5 participants.