Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve pooled progress output for BatchDownloader #12925

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cosmicexplorer
Copy link
Contributor

Split out from #12923.

Problem

We have a really nice BatchDownloader for pooled downloads from metadata-only resolves, but it currently does the same thing as the synchronous Downloader! We have more information about what we need to download, so let's display it to the user and give them an idea of overall progress!

Note: this PR does not perform parallel downloads. That is performed in #12923, on top of this change.

Solution

  • Make --progress-bar correspond to a new enum class ProgressBarType.
  • Create a BatchedProgress abstract class to codify how to respond to async notifications of multiple parallel download tasks instead of a single download task.
    • Create a BatchedRawProgressBar to print log messages for individual subtasks as well as a repeated (rate-limited) log message for overall progress in downloading all bytes.
    • Create a BatchedRichProgressBar using the full gamut of the vendored rich library to produce colorful and useful output for multiple parallel download tasks.

Result

We have a nice progress bar for multiple downloads at once!

# delete wheel cache and http download cache
> rm -rf ~/.cache/pip/{http{,-v2},wheels}/
> time PYTHONPATH="$(readlink -f src)" python -m pip install --dry-run --ignore-installed --report test.json --use-feature=fast-deps 'numpy>=1.19.5' 'keras==2.11.0' 'mtcnn' 'pillow>=7.0.0' 'bleach>=2.1.0' 'tensorflow-gpu==2.11.0'
# ...
╭────────────────────────────────────────── Download Progress ───────────────────────────────────────────╮
│ total downloads | ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   2%  1/46                                  │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────── Individual Request Progress ──────────────────────────────────────╮
│total bytes             ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3% 19.5/734.1 MB 6.7 MB/s eta 0:01:47│
│tensorflow_gpu-2.11.... ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   3% 17.8/588.3 MB 6.4 MB/s eta 0:01:30│
│numpy-2.1.0-cp310-cp... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/16.3 MB   ?        eta -:--:--│
│mtcnn-0.1.1-py3-none... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/2.3 MB    ?        eta -:--:--│
│pillow-10.4.0-cp310-... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/4.5 MB    ?        eta -:--:--│
│bleach-6.1.0-py3-non... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/162.8 kB  ?        eta -:--:--│
│absl_py-2.1.0-py3-no... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/133.7 kB  ?        eta -:--:--│
│astunparse-1.6.3-py2... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/12.7 kB   ?        eta -:--:--│
│flatbuffers-24.3.25-... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/26.8 kB   ?        eta -:--:--│
│gast-0.4.0-py3-none-... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/9.8 kB    ?        eta -:--:--│
│google_pasta-0.2.0-p... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/57.5 kB   ?        eta -:--:--│
│grpcio-1.65.5-cp310-... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/5.7 MB    ?        eta -:--:--│
│h5py-3.11.0-cp310-cp... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/5.3 MB    ?        eta -:--:--│
│libclang-18.1.1-py2.... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/24.5 MB   ?        eta -:--:--│
│opencv_python-4.10.0... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/62.5 MB   ?        eta -:--:--│
│opt_einsum-3.3.0-py3... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/65.5 kB   ?        eta -:--:--│
│protobuf-3.19.6-cp31... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/1.1 MB    ?        eta -:--:--│
│six-1.16.0-py2.py3-n... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/11.1 kB   ?        eta -:--:--│
│tensorboard-2.11.2-p... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0.0/6.0 MB    ?        eta -:--:--│
# ...

- use more specific types for BatchDownloader#__call__
- calculate byte lengths with a HEAD request
- quiet all progress output from -q
- don't write colored output with --no-color
- write a lot more documentation for the new progress bar logic
- use ProgressBarType enum for --progress-bar CLI flag
@cosmicexplorer cosmicexplorer marked this pull request as ready for review August 20, 2024 18:03
@notatallshaw
Copy link
Member

notatallshaw commented Aug 21, 2024

I've not looked at the code at all, but I have some thoughts on the user experience.

I think what makes the current pip downloader progress bar really portable and not garnered too many user complaints is that it's easy to read, doesn't repeat information, and almost never suffers from "reflow" issues.

Taking a look at what that means in practise compared to your screenshot:

image

  • Each column is a different colour, which make it easier to distinguish
  • Outside the progress bar there's no duplicate information, where as you have % and X MB / Total MB
  • There's no layout lines (which will become a mess on reflow)

I personally would advise:

  • Add colours back
  • Remove table lines
  • Remove % column
  • Increase the length of the filename columns a little bit and decrease the length of the bar a little bit
  • Add column headers, and move "eta" to a column header
  • Perhaps Max download / size shoud be maxed to 4 digits including decimal, e.g. display 588 MB instead of 588.3 MBs

@arenasys
Copy link
Contributor

arenasys commented Aug 23, 2024

The batched raw outputs should match the existing raw ones. Percentage isn't needed, raw mode is intended to supply unprocessed information (for wrappers).
Progress {pcnt}% {self._total_progress} of {total_fmt} bytes
vs
Progress {current} of {total_fmt} bytes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants