Feature/historical retrain on condition #1139
Conversation
Looks quite good, thanks! I have a couple of comments.
In the case of ``int``: the model is retrained every `retrain` iterations.
In the case of ``Callable``: the model is retrained whenever callable returns `True`.
Notice that the arguments passed to the callable are as follows:
- `pred_time (pd.Timestamp)`: next timestamp to predict (retraining happens before)
Suggested change:
- `pred_time (pd.Timestamp)`: next timestamp to predict (retraining happens before)
+ `pred_time (pd.Timestamp or int)`: timestamp of forecast time (end of the training series)
In the case of ``Callable``: the model is retrained whenever callable returns `True`.
Notice that the arguments passed to the callable are as follows:
- `pred_time (pd.Timestamp)`: next timestamp to predict (retraining happens before)
- `series (TimeSeries)`: train series up to `pred_time`
How about calling it `train_series`?
This parameter supports 3 different datatypes: ``bool``, (positive) ``int``, and
``Callable`` (returning a ``bool``).
In the case of ``bool``: retrain the model at each step (`True`), or never retrains the model (`False`).
Not all models support setting `retrain` to `False`.
Could you move these two sentences down in the `retrain` doc? And slightly modify them, e.g.:

Note: some models do require being retrained every time
and do not support anything else than `retrain=True`.
@@ -336,3 +338,43 @@ def test_statsmodels_dual_models(self):
        # check backtesting with retrain=False
        model: TransferableDualCovariatesForecastingModel = model_cls(**kwargs)
        model.backtest(series1, future_covariates=exog1, retrain=False)

    def test_backtest_retrain(self):
You could also add a test to make sure `retrain` is being called as expected (with expected arguments, e.g., past/future covariates etc.) when it is a `Callable`. I think this should be pretty easy to do with `unittest.mock`; you can check out examples here: https://docs.python.org/3/library/unittest.mock.html
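As a rough sketch of this suggestion, `unittest.mock.Mock` can record how a retrain callable is invoked. The loop below is a hypothetical stand-in for the backtest iteration (not darts' actual implementation), and the argument names follow the docstring in this PR:

```python
from unittest import mock

# Hypothetical stand-in for the backtest loop: it invokes `retrain`
# once per iteration with the arguments named in the docstring.
def fake_backtest_loop(retrain, timestamps, series):
    for t in timestamps:
        retrain(pred_time=t, series=series,
                past_covariates=None, future_covariates=None)

retrain = mock.Mock(return_value=True)
fake_backtest_loop(retrain, ["t0", "t1", "t2"], "dummy_series")

# The mock recorded every call and its keyword arguments.
assert retrain.call_count == 3
retrain.assert_called_with(pred_time="t2", series="dummy_series",
                           past_covariates=None, future_covariates=None)
```

The same pattern would let a real test assert both the number of retrain checks and the exact arguments passed at each step.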
I wasn't able to use `Mock`, yet implemented a test with "fake" `train_series`, `past_covariates` and `future_covariates` all equal to the series itself, but testing on some "proper" condition.
This can also work; however, your current test does not test correctness, or even that the function has been called. If you want to do it this way (using conditions on the series within the function), I would maybe cook up a small toy example where the result of the backtest is not the same depending (for instance) on quantile quantities computed on the covariates series. This way you could then test that:
- the output of the backtesting procedure is not the same if different `future_covariates` are used;
- the output is as you expect in some pre-computed case (e.g., the historical forecast is a series containing values [x, y, z]).

That being said, I still think it might be easier to simply test whether the function is called by using mocking. You can find an example of how we are doing it in other tests, for instance here, where we then check here whether a given function has been called or not.
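A minimal sketch of such a toy case, in pure Python: `toy_backtest` is a made-up stand-in for the real backtest, and the quantile condition is simplified to a fixed threshold on the covariates. Both assertions in the reviewer's list are checked:

```python
def toy_backtest(series, covariates, threshold):
    """Per-step forecasts: mean of the values since the last 'retrain'.
    A 'retrain' is triggered whenever a covariate exceeds `threshold`."""
    forecasts, window_start = [], 0
    for i in range(1, len(series)):
        if covariates[i - 1] > threshold:
            window_start = i - 1  # condition fired: refit on recent data only
        window = series[window_start:i]
        forecasts.append(sum(window) / len(window))
    return forecasts

series = [1.0, 2.0, 3.0, 4.0, 5.0]
cov_a = [0.0, 0.0, 0.0, 0.0, 0.0]  # condition never fires
cov_b = [0.0, 0.0, 9.0, 0.0, 0.0]  # condition fires at step 3

# 1) different covariates -> different backtest output
assert toy_backtest(series, cov_a, 1.0) != toy_backtest(series, cov_b, 1.0)
# 2) output matches a pre-computed expectation
assert toy_backtest(series, cov_b, 1.0) == [1.0, 1.5, 3.0, 3.5]
```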
Just to explain myself: it's easy to mock a retrain function (i.e. a callable returning a boolean), but the function itself is called at each iteration (and I am able to check for that). I would be more interested to verify that the `_fit_wrapper` method is called multiple times. However, when trying to patch/mock `_fit_wrapper` and `_predict_wrapper`, I get all sorts of errors.
@hrzn do you have any new suggestions on this?
Then if mocking does not work, my suggestion would be to at least test that the output of `model_cls.backtest()` is what you expect it to be for a couple of simple cases (e.g., when you retrain each 1st of the month).
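One way to set up such a simple case is a retrain callable, with the signature described in this PR's docstring, that fires only on the first of the month. This sketch (the name `retrain_on_first_of_month` is illustrative) checks the condition over a date range using stdlib `datetime`; a `pd.Timestamp` exposes the same `.day` attribute:

```python
from datetime import date, timedelta

def retrain_on_first_of_month(pred_time, series, past_covariates, future_covariates):
    # Retrain only when the forecast time falls on the 1st of a month;
    # this particular condition ignores the series arguments.
    return pred_time.day == 1

# 60 consecutive days starting mid-January
dates = [date(2022, 1, 15) + timedelta(days=i) for i in range(60)]
flags = [retrain_on_first_of_month(t, None, None, None) for t in dates]

# This window contains exactly two first-of-month dates (Feb 1 and Mar 1),
# so the condition should fire exactly twice.
assert sum(flags) == 2
```

A test could then compare the backtest output under this condition to a pre-computed expectation.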
Finally I was able to make the test run using `mock.patch`: a few tricks were needed. Looking forward to the review.
…drop_after' calls, added test with past and future covariates
Codecov Report
Base: 93.61% // Head: 93.61%

@@ Coverage Diff @@
## master    #1139   +/- ##
=======================================
  Coverage   93.61%   93.61%
=======================================
  Files          81       81
  Lines        8328     8330    +2
=======================================
+ Hits         7796     7798    +2
  Misses        532      532

☔ View full report at Codecov.
Thanks, the tests look better with the mocking!
There were a couple more comments to address and then I think we will be able to merge. Also please solve the merge conflicts first.
# resets patch_retrain_func call_count to 0
retrain.call_count = 0
retrain.side_effect = [True, False] * (len(series) // 2)
# retrain.return_value = True
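For context on the snippet above: assigning a list to `side_effect` makes a `Mock` return successive values on consecutive calls, and `reset_mock()` is the idiomatic alternative to setting `call_count = 0` directly. A standalone sketch:

```python
from unittest import mock

retrain = mock.Mock()
retrain.side_effect = [True, False] * 3  # alternate retrain / skip

# Each call consumes the next value from side_effect.
results = [retrain() for _ in range(6)]
assert results == [True, False] * 3
assert retrain.call_count == 6

# reset_mock() clears call_count (and recorded calls) between scenarios
retrain.reset_mock()
assert retrain.call_count == 0
```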
this line can be removed
Added os, shutil and tempfile imports
Looks good! I just have 4 tiny commit suggestions and then I think we'll be good.
Co-authored-by: Julien Herzen <j.herzen@gmail.com>
…del.py Co-authored-by: Julien Herzen <j.herzen@gmail.com>
Thanks @FBruzzesi ! I'll merge it :)
Addresses #135 and #623
Summary
Implemented support for new types in the `retrain` argument of the `historical_forecasts` method. It now accepts `bool`, (positive) `int` and `Callable` (returning a `bool`) data types.

The new behaviour is as follows (copying from the updated docstring):

- `bool`: retrain the model at each step (`True`), or never retrain the model (`False`). Not all models support setting `retrain` to `False`. Notably, this is supported by neural networks based models.
- `int`: the model is retrained every `retrain` iterations.
- `Callable`: the model is retrained whenever the callable returns `True`. Notice that the arguments passed to the callable are as follows:
  - `pred_time (pd.Timestamp)`: next timestamp to predict (retraining happens before)
  - `series (TimeSeries)`: train series up to `pred_time`
  - `past_covariates (TimeSeries)`: past_covariates series up to `pred_time`
  - `future_covariates (TimeSeries)`: future_covariates series up to `pred_time + series.freq * forecast_horizon`
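Put together, a callable passed as `retrain` follows the signature above. A hedged sketch (`retrain_on_monday` is an illustrative name, not part of the PR; only the argument names come from the docstring):

```python
import pandas as pd

def retrain_on_monday(pred_time, series, past_covariates, future_covariates):
    # Retrain only when the next prediction time falls on a Monday;
    # this condition ignores the series arguments entirely.
    return pred_time.weekday() == 0

assert retrain_on_monday(pd.Timestamp("2022-08-01"), None, None, None)      # a Monday
assert not retrain_on_monday(pd.Timestamp("2022-08-02"), None, None, None)  # a Tuesday
```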
Other Information
- Implemented `_retrain_wrapper`, in `utils/utils.py`, which passes to the wrapped function only the original signature arguments, and raises a `ValueError` in the case that the provided function doesn't return a boolean value.
- Added `test_backtest_retrain`, in `test_local_forecasting_models.py`, as suggested in #623