
Feat/output chunk length regression model #761

Merged · 30 commits · Feb 5, 2022

Conversation

brunnedu
Contributor

Addresses #618

Summary

RegressionModels:

  • Implemented an output_chunk_length parameter for all RegressionModels, allowing much more flexible prediction with less covariate data
  • Deleted now-unused functions (_get_prediction_data, _shift_matrices, _update_min_max)
  • Turned lags into a dictionary to simplify the code
  • Updated the check of covariate input dimensions
  • Updated the checks for sufficient covariate data (the requirements are now minimal and can differ between covariate types)
  • Wrap the model with MultiOutputRegressor only when necessary, i.e. when the model does not support multi-output regression natively
  • No longer relies on torch datasets
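The core idea behind lags plus output_chunk_length can be sketched as follows. This is a simplified illustration, not the darts implementation: make_training_matrix, the fixed lag list, and the toy series are all made up for the example.

```python
import numpy as np

def make_training_matrix(series, lags, output_chunk_length):
    # builds (X, Y): each row of X holds the lagged past values of the
    # series, and the matching row of Y holds the next
    # `output_chunk_length` values, which the model predicts jointly
    max_lag = max(lags)
    X, Y = [], []
    for t in range(max_lag, len(series) - output_chunk_length + 1):
        X.append([series[t - lag] for lag in lags])
        Y.append(series[t : t + output_chunk_length])
    return np.array(X), np.array(Y)

series = np.arange(10.0)
X, Y = make_training_matrix(series, lags=[1, 2, 3], output_chunk_length=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
```

With output_chunk_length > 1 the target Y has several columns, which is why a single-output estimator then needs a multi-output wrapper.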

RegressionModel Tests:

  • Deleted now unused tests (test_models_denoising_multi_input, test_models_denoising, test_shift_matrices)
  • Added new tests (test_models_accuracy_*, test_multioutput_wrapper)
  • Added checks for output_chunk_length=1 and output_chunk_length=5 in some already existing tests
  • Updated test_not_enough_covariates to check that only the minimal amount of covariate data is required

Other Information

  • Changed the __str__ of LinearRegressionModel, RandomForest and LightGBMModel
  • Should we consider deleting LinearRegressionModel, since it behaves the same as RegressionModel(model=None)?

codecov-commenter commented Jan 31, 2022

Codecov Report

Merging #761 (770b63a) into master (c2d91e0) will decrease coverage by 0.10%.
The diff coverage is 97.40%.


@@            Coverage Diff             @@
##           master     #761      +/-   ##
==========================================
- Coverage   90.77%   90.67%   -0.11%     
==========================================
  Files          67       67              
  Lines        6756     6711      -45     
==========================================
- Hits         6133     6085      -48     
- Misses        623      626       +3     
Impacted Files Coverage Δ
darts/models/forecasting/random_forest.py 100.00% <ø> (ø)
darts/models/forecasting/regression_model.py 96.61% <97.26%> (-1.14%) ⬇️
darts/models/forecasting/gradient_boosted_model.py 100.00% <100.00%> (ø)
...arts/models/forecasting/linear_regression_model.py 100.00% <100.00%> (ø)
...ts/models/forecasting/regression_ensemble_model.py 100.00% <100.00%> (ø)
darts/utils/data/inference_dataset.py 94.54% <0.00%> (-1.82%) ⬇️


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@hrzn (Contributor) left a comment:

This is really awesome work @brunnedu, thanks!
You are bringing our RegressionModels to the next step! 🚀

Snippet under review:

    f"`lags_future_covariates`: {regression_model.lags_future_covariates}."
    ),
    # check lags of the regression model
    raise_if_not(
👍

Snippet under review:

    and self.model._get_tags().get("multioutput")
    ):
        # if not, wrap model with MultiOutputRegressor
        self.model = MultiOutputRegressor(self.model, n_jobs=1)

I wonder whether there could be some cases where n_jobs > 1 might provide significant gains...
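For reference, the n_jobs knob in question can be exercised in isolation like this. This is a standalone sklearn sketch, not darts code; the random data, the target width of 6, and the estimator settings are made up for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
Y = rng.normal(size=(200, 6))  # e.g. one row of 6 targets per sample, as with output_chunk_length=6

# MultiOutputRegressor fits one clone of the base estimator per output
# column; n_jobs=-1 fits them in parallel across all cores, which can
# pay off when the base estimator is expensive to train
model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=20), n_jobs=-1)
model.fit(X, Y)
print(model.predict(X[:3]).shape)  # (3, 6)
```

Whether parallelism helps in practice depends on the base estimator's training cost relative to the process-spawning overhead.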

@brunnedu (Contributor, Author) replied:

right, I will test it out 👍

Snippet under review:

    input_chunk_length=max(0, -self.min_lag),
    output_chunk_length=1,
    )
    if cov.has_datetime_index:

Ouch, it hurts that we have to differentiate the two cases here 😱

@brunnedu (Contributor, Author) replied:

Yes, maybe it would be nice to have a slice_inclusive in TimeSeries that always includes the start and the stop for both datetime and integer indices? 🤔
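The suggested helper could look roughly like this on plain pandas objects. This is a hypothetical sketch, not the TimeSeries API; slice_inclusive is only the name proposed above.

```python
import pandas as pd

def slice_inclusive(series: pd.Series, start, stop):
    # hypothetical helper: label-based, end-inclusive slicing that behaves
    # the same for datetime and integer indices (pandas .loc is already
    # end-inclusive for label slices, so this just normalizes the call site)
    return series.loc[start:stop]

s_int = pd.Series(range(5), index=range(5))
s_dt = pd.Series(range(5), index=pd.date_range("2022-01-01", periods=5))
print(len(slice_inclusive(s_int, 1, 3)))                       # 3
print(len(slice_inclusive(s_dt, "2022-01-02", "2022-01-04")))  # 3
```

The underlying friction is that positional slicing on an integer index is end-exclusive while label-based slicing is end-inclusive, so a single inclusive entry point would hide that asymmetry.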

@hrzn replied:

Yes maybe. I think our slicing is at least consistent with Pandas, but it's not the first time we find this a bit frustrating.

@brunnedu brunnedu merged commit 0e5b5ad into master Feb 5, 2022
@madtoinou madtoinou deleted the feat/output-chunk-length-regression-model branch July 5, 2023 21:52