support progress bar in parallel forecast #849

Merged
merged 3 commits into from
May 30, 2024

Conversation

jmoralez
Member

@jmoralez jmoralez commented May 29, 2024

Changes the forecast method to use a concurrent.futures.ProcessPoolExecutor, which processes series one at a time instead of in chunks. This allows two things:

  • We can now have a progress bar, because we can update it as each series is processed.
  • The work can be better balanced. We previously assumed that each partition would take about the same time, but that wasn't always true in practice, so some CPUs would sit idle once their partitions were done while others were still being processed. This means the process can now be faster. I feared the serialization would become a bottleneck, but it doesn't seem to be.
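
For readers unfamiliar with the pattern, here is a minimal sketch of the idea: submit one task per series and update progress as each one completes. It is not the code in this PR. The names naive_forecast, forecast_all, executor_cls and on_progress are hypothetical, and the "model" just repeats the last value.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def naive_forecast(values, h):
    """Hypothetical stand-in for a real model: repeat the last observed value."""
    return [values[-1]] * h


def forecast_all(series, h, n_jobs=2, executor_cls=ProcessPoolExecutor, on_progress=None):
    """Submit one task per series and report progress as each one completes."""
    results = {}
    with executor_cls(max_workers=n_jobs) as executor:
        # One future per series, so an idle worker picks up the next pending
        # series instead of waiting on a fixed partition.
        futures = {
            executor.submit(naive_forecast, values, h): uid
            for uid, values in series.items()
        }
        # as_completed yields futures in completion order, which is what
        # makes a per-series progress update possible.
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if on_progress is not None:
                on_progress(done, len(futures))
    return results


if __name__ == "__main__":
    data = {"H1": [1.0, 2.0, 3.0], "H2": [10.0, 12.0], "H3": [5.0]}
    forecasts = forecast_all(
        data,
        h=2,
        on_progress=lambda done, total: print(f"{done}/{total} series done"),
    )
    print(forecasts)
```

The on_progress callback runs in the parent process, so it can drive any progress bar without being pickled. The cost is that each series is serialized separately, which is the overhead the benchmark below checks.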

I ran the following experiment on a c5.4xlarge instance, comparing the current version against the one in this PR, to verify that there isn't a slowdown:

from datasetsforecast.m4 import M4, M4Info

from statsforecast import StatsForecast
from statsforecast.models import AutoETS

group = 'Hourly'
series, *_ = M4.load('data', group=group)
series['ds'] = series['ds'].astype('int64')
h = M4Info[group].horizon
sf = StatsForecast(
    models=[AutoETS(season_length=24)],
    freq=1,
    n_jobs=16,
    verbose=True,
)
sf.forecast(df=series, h=h, level=[80, 90], fitted=True)

and it took 1min 36s (current main) vs 1min 32s (this PR).

@jmoralez jmoralez marked this pull request as ready for review May 30, 2024 00:28
@jmoralez jmoralez merged commit 625460a into main May 30, 2024
16 of 17 checks passed
@jmoralez jmoralez deleted the multiproc-pbar branch May 30, 2024 00:33