Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scanning directories and loading animated images too slow #10

Closed
leo-arch opened this issue Jan 30, 2022 · 23 comments
Closed

Scanning directories and loading animated images too slow #10

leo-arch opened this issue Jan 30, 2022 · 23 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@leo-arch
Copy link

Description
First of all, great work. I'm really interested on it. Now, everything goes fine when you open a single file or several files using a glob expression (e.g. ~/Downloads/*.png). However, whenever you want to browse a directory, it takes too long (like 20 secs or more) to load the TUI viewer (while recursively scanning image files in the given directory). Once the TUI finally loads, it is also a bit slow whenever you hover a directory (a few seconds depending on the directory).

To Reproduce
term-img DIR or term-img DIR/*

Expected behavior
term-img should be quicker and smoother when recursively browsing directories

Desktop:

  • OS: Arch
  • Kernel version: Linux 5.15.12 (libre-hardened)

Package info:

  • Python version: 3.10.2
  • Package version: 0.1.1
  • Installation method: clone repo and then pip install

Terminal Emulator:

  • Name: Tested on Xterm (370) and Alacritty (0.10.0)
@leo-arch leo-arch added the bug Something isn't working label Jan 30, 2022
@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

Thanks for the feedback. 😃

Concerning directory searching:

Since you didn't use the -r/--recursive option, then the search shouldn't take up to a second because all the search does is to identify at least one image file immediately in the directory and that's all.

Using the -r option, The duration of the search is dependent on:

  1. The amount of subdirectories.
  2. The amount of non-image files encountered.

This is due to the fact that it identifies image files by their content (just the header, first few bytes) and not by filename extension.

Concerning Loading directories in the TUI:

Note: The TUI itself doesn't take time to show up, it's loading (and partially rendering) the images that takes a while.

When the a directory is loaded, all files in the directory are checked and the non-images discarded.

Rendering only affects partially because only the image(s) currently in view is/are rendered.

I'll see what I can do about making the loading faster. Any ideas?

@leo-arch
Copy link
Author

leo-arch commented Jan 30, 2022

Thanks for your quick answer! Here's a more precise description of the issue:

I run: ./term-img ~/Downloads, and this is the output:

Checking directory '/home/user/Downloads/'...
... Done!

It gets stuck here for exactly 46 secs, taking 50% of my CPU.

Once it finally loads the TUI, it gets stuck there (hovering on Downloads/) for another 40 secs or so (taking 50% of the CPU again) before I can actually do something.

As to recommendations to make the loading thing faster, I didn't actually look at the code, so I cannot recommend anything useful. Maybe I could take a look at it to see what could be causing this issue.

EDIT: It seems to happen mostly with my Downloads folder. With other directories it works much faster (almost as expected, I'd say). My bet is that whatever image previewer you're using is having trouble dealing with some specific file type.

@leo-arch
Copy link
Author

Well, after some tests I conclude that the source of the issue is actually related to some file types. Previewing postscripts files could be a bit slow. In my case, the main obstacle was a really big GIF file (2187 frames!). I guess you could provide a command line switch to disable automatic previews for GIF files, or, even better, find a programmatic way to not automatically reproduce GIF files with more than, say, 100 frames.

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

Wow! That's a lot of time... It took way less than that for the entire DCIM directory (with thousands of images) on my (not recent) android device.

It seems to be an edge case like you've described. I'll try getting a sample image of that kind to test.

EDIT: I suppose the file will also be really large... among the planned features is an option to specify a maximum file size which I plan to set to a reasonable default.

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

As for the code responsible...

term_img.tui.main.scan_dir() scans a directory and creates Image widgets for every image in it and a generator for subdirectories.

term_img.cli.check_dir() handle the scanning of directories to determine if they contain images or subdirectories that to.

@leo-arch
Copy link
Author

Whichever way you find to tackle this issue will be fine.

This is a principle I follow whenever I write some new feature for my programs: no matter how cool I think it is, if it might noticeable reduce the program's performance (or somehow produce a negative impact on the user), just disable it by default and allow the user to enable it via some command line switch or option in the config file. Users get quickly disappointed and give up as soon as they experience some issue with your program, and that's something that we, as developers, need to prevent at all costs.

Please let me know if you find some way to improve this loading issue.

@AnonymouX47
Copy link
Owner

I really appreciate your quick reply... that's one golden principle.

I'm currently looking into it. 😃
Thank you very much.

@AnonymouX47 AnonymouX47 self-assigned this Jan 30, 2022
@AnonymouX47
Copy link
Owner

Well, after some tests I conclude that the source of the issue is actually related to some file types. Previewing postscripts files could be a bit slow. In my case, the main obstacle was a really big GIF file (2187 frames!). I guess you could provide a command line switch to disable automatic previews for GIF files, or, even better, find a programmatic way to not automatically reproduce GIF files with more than, say, 100 frames.

Any idea where I can get such a GIF?

@leo-arch
Copy link
Author

This is the big GIF file I was talking about:

https://mega.nz/file/x4p1ma4B#PpYiUtwWFl5kUKOkyGavV9lXLuD4qsHB9uLRs1rfZvM

@AnonymouX47
Copy link
Owner

Thanks

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

After investigating... turned out it's actually the large number of frames that's responsible.

During initialization, the number of frames is stored for subsequent use later. 👇🏾
https://github.com/AnonymouX47/term-img/blob/926b4d1433a7b572bdc6236b4642153787b14aad/term_img/image.py#L100

It's getting the number of frames (image.n_frames) that takes forever. (Note: n_frames is a descriptor not just a data attribute, so it actually executes code that returns the required value)
There's also some lesser delay when performing seek operations on the PIL Image instance, the magnitude of the delay depends on the distance between the current position and the new position.

Looking up the specification of the GIF format, there's no portion of it that specifies the number of frames in the image, so it has to be determined by reading almost all through the entire image.

This being the case, I don't think selecting GIFs to display based on frame count will work, since it's not possible without first determining the frame count.
I think a "max size" option is the only way I can see at the moment, since the file size should be proportional to the frame count and the file size is actually what causes the program to spend so long reading the file to determine the frame count.

As for other file types... I think a "max size" option and the existing "max pixels" option should suffice.

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

Concerning scanning directories to get all images in it... I'm considering some form of concurrency or parallelism where a separate thread or process preloads directories up to a certain depth (like a "look forward").

Then for the first directory or top-level (i.e containing the sources passed at the command-line), loading images could be gradual, checking the files starting with the smallest in size... and they get added to the menu/grid as they're ready.

Note: I haven't implemented any of these yet, still theoretical. 😃

@leo-arch
Copy link
Author

At least in the case of this huge GIF file, the size approach should work: its size is 8mb, a lot for a simple image file. However I'm not sure about what would be a sane default value for max size. A postscript file could take 3mb easily, and previewing it is quite CPU intensive. Maybe 1mb is a good default value.

@AnonymouX47
Copy link
Owner

Hmm... True. A sane default could be difficult to come by but in the end, the user could simply change the config value. 🤷🏾‍♂️
Maybe the size could apply only to certain image types. 🤔

@leo-arch
Copy link
Author

As to the idea of gradually loading/displaying images, I guess that's a nice approach, provided the loading process runs on a separate thread and the TUI stuff on another one. The process should be as smooth as possible, even if this implies not previewing some files at all. Maybe a message in the previewing panel warning the user about the file not being displayed and allowing him/her to forcefully load the image by means of some keyboard shortcut. Just thinking aloud.

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Jan 30, 2022

Exactly the same thoughts here.

As for the "forced load", I already implemented something similar for images above the "max pixels" so I'll simply extend that.

@leo-arch
Copy link
Author

leo-arch commented Jan 30, 2022

Maybe the size could apply only to certain image types

Absolutely. Simple image files like PNG and JPG do not seem to cause any issue (as of now). For the time being, both GIF and postscript should be taken into account regarding max size.

@AnonymouX47
Copy link
Owner

Good. I'll approach it the problem this way then.

Thanks so much.

@AnonymouX47 AnonymouX47 changed the title Recursive search is too slow Scanning directories and loading animated images is too slow Feb 21, 2022
@AnonymouX47 AnonymouX47 changed the title Scanning directories and loading animated images is too slow Scanning directories and loading animated images too slow Feb 21, 2022
AnonymouX47 added a commit that referenced this issue Mar 5, 2022
- Fix: Eliminated delay when switching to directory entries with much image contents.
  - Addresses #10.
  - Switching between directory entries is now just as smooth as for image entries.
- Add: Implemented concurrent grid scanning and update.
  - Added `.tui.main.scan_dir_grid()`.
  - Added a spearate thread "GridScanner" for scanning grid directories.
- Change: `update_pipe` file descriptor is now exported to `.tui.main` and used from the module namespace.
- Change: Moved `update_screen()` from within `.tui.main.scan_dir_menu()` to the module scope to avoid code replication.
- Change: The directory is now scanned, and the grid updated, concurrently.
  - Updated `.tui.main.display_images()` to work with GridScanner.
  - Grid cells are added gradually, though quickly.
  - The grid is now always scanned afresh whenever the switching to a directory entry, even if the path is the same as that of the last grid.
    - This is to allow for smooth switching between directory entries, as initializing a grid from a previous scan result all at once will result in delays for large grids.
- Change: Menu listing speed-up.
  - When opening a directory entry, the menu list is initialized using the contents already scanned for the grid.
  - If the scanning is not yet done, the MenuScanner thread picks up from where it stopped.
- Change: `.tui.main.display_images()` now uses `content["/"]` to determine the emptiness of a directory's grid.
- Change: Updated `GridLisBox` re-rendering process.
  - The focus positions are now transffered as long as the grid direcory path is the same, even if the number of cells is different.
  - Ensure the focus position is not lost when a new cell is added by GridScanner.
- Change: Updated grid cell render caching.
  - Grid cell canvases are now cached using the name of the image in the cell instead of the image instance since:
    - all names in the same directory must be unique.
    - it supports the new grid directory scanning approach where a grid is always scanned afresh.
- Change: Updated docstrings and comments.
AnonymouX47 added a commit that referenced this issue Mar 5, 2022
- Change: Increased fuildity and responsiveness when switching menus.
  - Futher adresses #10.
- Change: Eliminated unnecessary waits/blocks.
  - Removed while loop blocks and replaced with `Event.wait()` where applicable.
- Change: MenuScanner is now a daemon thread.
  - Eliminates the need for `interrupted` and `quitting` event status checks while scanning.
- Change: Re-organized top-level assignments in `.tui.main`.
- Change: Updated comments.
AnonymouX47 added a commit that referenced this issue Mar 10, 2022
- Change: Caching of animated image frames in the TUI is now based on image file size instead of frame count.
- Adresses #10.

NOTE: This change is temporary and will be reverted as soon as a Pillow version including the improvements in python-pillow/Pillow#6077 is released.
@AnonymouX47
Copy link
Owner

AnonymouX47 commented Mar 14, 2022

Hello @leo-arch !

It's definitely been a while 😃 but the good news is... time hasn't passed in vain.

Here is an highlight of the performance-related fixes and improvements I've implemented since, as regards this issue:

For the library:

  • Deferred, on-demand computation of TermImage.n_frames [1].
  • Implemented an ImageIterator class for efficient iteration through frames of animated images, optionally with frame caching.
    • Works without calculating the number of frames first.
  • Improved animated image display performance using the new ImageIterator class.

For the CLI:

  • Parallel directory checks i.e multiple directory sources are checked at the same time, using multiprocessing.
  • Concurrent processing of file and URL sources, via multithreading, all happening in parallel with directory checks.
  • Animation frames are cached if the file size is <= 2MiB. [2]
  • Implemented logging across all subprocesses.

For the TUI:

  • Improved animated image display performance using the new ImageIterator class.
    • Frames are cached if the file size is <= 2MiB. [2]
    • Animation no longer resets upon resizing the screen.
  • Concurrent (multithreaded) directory scanning for menu lists and directory image grids.
    • Entries/cells are added to the menu/grid as available.
    • Menu scans continue from where the grid scan for that directory stopped.
    • No more need to wait till the entire directory is scanned.
    • Can move on from a menu or grid at any point, whether scanning is done or not.
  • Parallel/Concurrent rendering of grid cells.
    • Multiprocessing is utilized on platforms that support it or multithreading if otherwise.
    • No more need to wait till all cells are rendered.
    • Can move on from the directory entry at any point, even while the grid cells are still being rendered.
  • Implemented logging across all subprocesses.

Footnotes

  1. I reported the issue to the Pillow developers and opened a PR to fix the issues, though my changes would've broken some key things. 😞
    So, one of the core developers implemented the improvements in a better way that wouldn't break things. See GIF seek performance improvements python-pillow/Pillow#6077
    The changes have been merged and will be released with the next feature release 9.1.0 by April 1.
    When this version is out, I plan to revert the changes to TermImage.n_frames because the performance will be smooth enough.

  2. When Pillow 9.1.0 is out, I'll be extending the caching criteria of ImageIterator to cache frames based on a given maximum number of frames, which is a better metric than file size.


If you don't mind, please update your installation from the main branch to test the changes.

I'll gladly appreciate your feedback. Thanks 😃

@AnonymouX47
Copy link
Owner

AnonymouX47 commented Mar 14, 2022

In addition to these... I'll also be implementing the following before releasing 0.2.0:

  • Parallel image rendering in the TUI, to give a smoother experience when scrolling through images and reducing resources used for rendering.
  • A loading indicator in the TUI.
  • Improving the rendering performance of the library (using a C extension module)... Though, not yet sure if this will be shipping with 0.2.0 or not 🤔.

@AnonymouX47 AnonymouX47 added this to the 0.2.0 milestone Mar 29, 2022
@leo-arch
Copy link
Author

leo-arch commented Apr 4, 2022

Hey @AnonymouX47! Sorry for the delay.

It's much better now! Congrats and thanks! I really like this. I guess PIL internally uses something along the lines of cacalib or chafa (i.e., ASCII/ANSI rendering). Do you plan to add support for ueberzug, w3img, sixel or kitty? These protocols are really good at displaying images on the terminal, and besides they are more and more widely adopted.

I myself implemented a files previewer (not only images, but also PDF, document, postscript files, and even sound files) using fzf and ueberzug for my CliFM, but only as a plugin (a shell script indeed): it's nice and all, but far from ideal. A third party utility able to do this smoothly, and able to be integrated into CliFM, would be really nice. And term-img is on the right track for sure.

Keep up the good work!

@AnonymouX47
Copy link
Owner

It's much better now! Congrats and thanks!

Great, happy to hear this!

I guess PIL internally uses something along the lines of cacalib or chafa (i.e., ASCII/ANSI rendering)

Oh, no. PIL only decodes the images, the rendering is implemented here in term-img.

Do you plan to add support for ueberzug, w3img, sixel or kitty?

Another developer using the library also suggested the likes of this.
I'm currently looking at adding support for some other character-based styles and sixel, kitty and iterm2 protocols.
I had checked out ueberzug before but not yet decided if I want to add support... I remember it has Python bindings, so I guess is shouldn't much of a hassle. 🤔
I'll look into w3img also.

Though before any of these will be implemented, a major API change is required as regards the sizing unit (See #16).

I myself implemented a files previewer (not only images, but also PDF, document, postscript files, and even sound files) using fzf and ueberzug for my CliFM.

Yeah, I did check it out 😃... Great work 👍

A third party utility able to do this smoothly, and able to be integrated into CliFM, would be really nice. And term-img is on the right track for sure.

True, I'll put more work into it. 😃

Thanks so much, I really appreciate your feedback and suggestions... would love to see more of that.

I'll update you on the protocol support issue as I progress.

@leo-arch
Copy link
Author

leo-arch commented Apr 4, 2022

I'll update you on the protocol support issue as I progress

Thanks! That would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants