
File picker has high latency on large projects #1707

Closed
Superty opened this issue Feb 24, 2022 · 17 comments · Fixed by #7814
Labels
A-helix-term Area: Helix term improvements C-bug Category: This is a bug

@Superty
Contributor

Superty commented Feb 24, 2022

Reproduction steps

On large projects like LLVM, the file picker is quite slow to open, and typing in it is unpleasant because each keystroke has noticeable latency. fzf, on the other hand, handles the same project very smoothly. Here is an asciinema. I first show fzf and then helix. In the helix part I press space and then f immediately. Not sure how visible the typing latency is in this recording, but you can see that it takes a while for the file picker to come up, whereas fd | fzf opens immediately. (Sorry for it being cut off on the right. You can still see that the menu opens and then it takes a while for the file picker to open.)

Environment

  • Platform: Linux
  • Terminal emulator: alacritty
  • Helix version: v0.5.0-785-g806cc1c3

(helix.log doesn't have any lines from today)

@Superty Superty added the C-bug Category: This is a bug label Feb 24, 2022
@Aloso
Contributor

Aloso commented Feb 25, 2022

The relevant parts are here:

let files = walk_builder.build().filter_map(|entry| {
    let entry = entry.ok()?;
    // Path::is_dir() traverses symlinks, so we use it over DirEntry::is_dir
    if entry.path().is_dir() {
        // Will give a false positive if metadata cannot be read (eg. permission error)
        return None;
    }
    let time = entry.metadata().map_or(time::UNIX_EPOCH, |metadata| {
        metadata
            .accessed()
            .or_else(|_| metadata.modified())
            .or_else(|_| metadata.created())
            .unwrap_or(time::UNIX_EPOCH)
    });
    Some((entry.into_path(), time))
});

I think walking the directory tree on multiple threads could speed this up.

files.sort_by_key(|file| std::cmp::Reverse(file.1));

This can be replaced with sort_unstable_by_key, and the sorting could be performed on another thread if it turns out to be a bottleneck.
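
For illustration, the first change at the call site above (a minimal sketch, assuming files has already been collected into a Vec):

// sort_unstable_by_key skips the allocation a stable sort needs; the relative
// order of entries with equal timestamps may change, which is fine here.
files.sort_unstable_by_key(|file| std::cmp::Reverse(file.1));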

While typing, the slowest part is probably fuzzy-searching. This is done here, using fuzzy-matcher:

self.matcher
    .fuzzy_match(&text, pattern)
    .map(|score| (index, score))
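
For context, a minimal self-contained sketch of that fuzzy-matcher call (the path and pattern here are made up for illustration):

use fuzzy_matcher::skim::SkimMatcherV2;
use fuzzy_matcher::FuzzyMatcher;

fn main() {
    let matcher = SkimMatcherV2::default();
    // fuzzy_match returns Some(score) when the pattern matches, None otherwise.
    let score = matcher.fuzzy_match("helix-term/src/ui/picker.rs", "uipick");
    println!("{:?}", score);
}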

I'm curious what fzf does differently to be this fast.

@archseer
Member

I think walking the directory tree on multiple threads would be possible to speed this up.

WalkBuilder does support build_parallel to do this.
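
For reference, a minimal sketch of the documented build_parallel pattern from the ignore crate; funneling results through a channel is my assumption here, not Helix's code:

use ignore::{WalkBuilder, WalkState};
use std::sync::mpsc;

fn main() {
    let (tx, rx) = mpsc::channel();
    WalkBuilder::new("./").build_parallel().run(|| {
        // This outer closure runs once per worker thread; each worker gets
        // its own Sender clone.
        let tx = tx.clone();
        Box::new(move |entry| {
            if let Ok(entry) = entry {
                if entry.file_type().map_or(false, |ft| ft.is_file()) {
                    tx.send(entry.into_path()).ok();
                }
            }
            WalkState::Continue
        })
    });
    drop(tx); // close the channel so the receiving iterator terminates
    let files: Vec<_> = rx.into_iter().collect();
    println!("{} files", files.len());
}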

The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion. fzf etc. will continuously update and stream more files as it searches the tree. We want to do that down the line but it's not as straightforward.

I'm actually impressed the current implementation processes your 100k files in less than a second.

I'll look at the improvements suggested and do some measurements with cargo flamegraph.

@kirawi kirawi added the A-helix-term Area: Helix term improvements label Feb 26, 2022
@archseer
Member

Findings so far (I have some changes staged locally):

Initialization

Running the walker in parallel is actually detrimental to performance because it adds overhead from threads and atomics used in channels to communicate results.

The biggest blocker is the number of syscalls: fs::metadata is called for every entry. We had even more calls because we were also reading the mtime and doing an inefficient sort + collect to order entries by modification time. I'm getting rid of that, because I find sorting by mtime unreliable anyway.
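
A sketch of the kind of change that drops the per-entry metadata call, reworking the walker loop quoted earlier (hedged: the actual staged change may differ):

let files = walk_builder.build().filter_map(|entry| {
    let entry = entry.ok()?;
    // Use the file type the directory iterator already carries instead of
    // entry.path().is_dir(), which stats the path again (and follows symlinks).
    entry.file_type()?.is_file().then(|| entry.into_path())
});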

Initial invocation is still fairly slow (2-4s for me for /nix limited to 100k entries), but subsequent calls hit the kernel cache and take about 360ms (down from 522ms).

For git repositories where the gitignore filter is enabled, we can query the git index directly rather than the kernel. This is much faster since it avoids syscalls. We could also index the current workspace in memory on first instantiation and use notify to listen for changes, to avoid querying the fs.
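
As an illustration, listing tracked paths from the index takes only a few lines with the git2 crate (my choice of crate for the sketch; the comment above doesn't name one):

use git2::Repository;

fn main() -> Result<(), git2::Error> {
    let repo = Repository::discover(".")?;
    let index = repo.index()?;
    // One in-memory iteration over the index; no per-file stat syscalls.
    for entry in index.iter() {
        println!("{}", String::from_utf8_lossy(&entry.path));
    }
    Ok(())
}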

Match scoring

There were some inefficiencies there too:

  • Added a fast path for an empty search pattern. It was reasonably fast before, though.
  • Scoring now only happens when the search pattern changes; previously we would re-score even while just moving the cursor.
  • If the new pattern starts with the old pattern, we can avoid re-scoring the whole set and instead re-score only the current matches (sketched below).
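
A minimal sketch of that last point; Match, score, and refilter are illustrative names, and the scorer is a stand-in for fuzzy-matcher:

struct Match { index: usize, score: i64 }

// Stand-in scorer; SkimMatcherV2::fuzzy_match fills this role in helix.
fn score(text: &str, pattern: &str) -> Option<i64> {
    text.contains(pattern).then(|| -(text.len() as i64))
}

fn refilter(matches: &mut Vec<Match>, all: &[String], old: &str, new: &str) {
    if !old.is_empty() && new.starts_with(old) {
        // Narrowing pattern: only entries that already match can keep matching.
        matches.retain_mut(|m| match score(&all[m.index], new) {
            Some(s) => { m.score = s; true }
            None => false,
        });
    } else {
        // Arbitrary edit: fall back to re-scoring the whole set.
        *matches = all.iter().enumerate()
            .filter_map(|(i, t)| score(t, new).map(|s| Match { index: i, score: s }))
            .collect();
    }
    matches.sort_unstable_by_key(|m| std::cmp::Reverse(m.score));
}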

Scoring should only start after an idle timer. We already have a system for this for auto-completion, and it can be extended to cover both the scoring and syntax highlighting of the file previews.
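
For illustration, an idle-timer debounce can be as small as this tokio sketch (the names and the 250 ms value are assumptions, not Helix's actual event system):

use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

const IDLE: Duration = Duration::from_millis(250);

async fn debounce(mut keys: mpsc::Receiver<char>) {
    let mut pattern = String::new();
    let mut dirty = false;
    loop {
        match timeout(IDLE, keys.recv()).await {
            // A keystroke arrived: extend the pattern and restart the idle timer.
            Ok(Some(c)) => { pattern.push(c); dirty = true; }
            // Input channel closed: stop.
            Ok(None) => break,
            // The idle period elapsed with pending input: run scoring for `pattern` here.
            Err(_) if dirty => { dirty = false; }
            // Idle but nothing new since the last scoring pass: keep waiting.
            Err(_) => {}
        }
    }
}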

@Aloso
Contributor

Aloso commented Feb 28, 2022

@archseer AFAIK the git index only contains committed or staged files. New files that haven't been staged yet should also be suggested by helix.

@archseer
Member

archseer commented Mar 3, 2022

Pushed some of the changes in 78fba86

@pppKin
Contributor

pppKin commented Mar 4, 2022

The biggest blocker is that we don't do this loading asynchronously and in a streaming fashion

This is so useful. Hopefully one day we'll get there!

@simonsolnes

Coming from #4076, I want to mention that it might be because of iCloud Drive.

@timrekelj

Is there any update on this issue? It is the one thing that is preventing me from switching from vim.

@ksandvik

ksandvik commented Nov 14, 2022

Do you have external file systems mounted in that directory? Those usually slow down the file browser for various reasons. But hx / on a Linux system with lots of files is usually pretty snappy.

I tried hx inside my iCloud directory; it took about 3 seconds to open a directory totaling about 4 GB of iCloud data. Meanwhile, hx in my $HOME dir (M1 MacBook Pro) takes forever thanks to the OneDrive and iCloud mount points.

@antoyo
Contributor

antoyo commented Nov 14, 2022

I can confirm it's still laggy on the Rust project on Linux.

@timrekelj

timrekelj commented Nov 14, 2022

No, I don't have any external file systems; the only file system is the default one on a single NVMe drive. And the slowdown happens on a hello-world Rust project, which is weird as there are only 16 files in the directory.

edit: I tried it on a C++ project (with 23 files) too and it freezes

@jonahd-g

I encounter a lot of blocking on a large monorepo mounted in a virtual filesystem. It would be nice if the blocking lookup happened in a background thread so that I could at least keep typing.

@l4l
Contributor

l4l commented May 10, 2023

Tried running the file picker on a big Rust project locally with gitignore disabled, around ~200k files in total; typing each character in the picker takes ~400-700 ms to process. fzf searches smoothly but takes much longer to load (apparently because it uses shell commands, wtf?). Profiled with perf and got the following numbers:

  48.14%    [.] fuzzy_matcher::skim::SkimMatcherV2::fuzzy
  12.95%    [.] fuzzy_matcher::skim::CharType::of
   7.06%    [.] fuzzy_matcher::util::cheap_matches
   6.03%    [.] <std::path::Components as core::iter::traits::iterator::Iterator>::next
   3.07%    [.] <core::str::lossy::Utf8Chunks as core::iter::traits::iterator::Iterator>::next
   2.95%    [.] std::path::compare_components
   1.52%    [.] std::path::Path::_strip_prefix
   1.07%    [.] std::path::Components::parse_next_component_back
   1.04%    [.] core::slice::sort::recurse
   0.95%    [.] core::slice::memchr::memchr_aligned
...

Helix built with release + debug=true at current master.

So I guess replacing the unmaintained fuzzy_matcher is the best option here. The fzf algorithm isn't trivial, but it fits into ~1k lines of code if anyone is interested in re-implementing it: source.

UPD: the call-graph profile is a little different, but fuzzy_matcher is still among the top entries; see the attached profile (view it with https://profiler.firefox.com/, for example). perf.script.gz

@archseer
Member

The problem isn't the matching itself but that fzf matches asynchronously. fuzzy_matcher isn't necessarily unmaintained; there just isn't a lot of work left to do on an algorithm after it's implemented.

@pascalkuthe
Member

Yeah, skim can beat fzf in performance. The fuzzy matchers are very well optimized. But skim and fzf:

  • do the matching in parallel
  • stream entries in asynchronously (which also avoids blocking the UI thread on slow IO)

@l4l
Contributor

l4l commented May 10, 2023

I believe there should be no difference between doing that sync vs. async. The only concern might arise from big I/O latencies (which is not the case I reported). In my case fzf doesn't perform any visible reordering (which implies it handles input more quickly), nor does it use threads (checked in top, and got the same result with cpulimit). Moreover, async imposes overhead: sending data between tasks, scheduling, and lightweight (yet real) context switches. Therefore I think it should be possible to achieve similar speed without major code reorganization (i.e. without bringing in asynchronicity).

@pascalkuthe
Member

The main reason fzf is smooth is that it scores the entries asynchronously and streams the results in. In almost all cases your current top results will be your new top results, so you don't notice, but this very much happens and is critical for good performance. fzf also always uses multiple threads, with no option to turn them off. Even if the matching algorithm were slower, the difference would not be noticeable in practical applications. skim is fast enough for all practical use cases and uses the same algorithm as helix.

The fzf or fzy algorithm might be better, but including a fuzzy matching algorithm in helix is out of scope and won't fix the performance issues. If a well-maintained fuzzy matching crate comes around that contains detailed benchmarks demonstrating better performance than fuzzy-matcher, then we could migrate, but this is not something I would want to put effort into.

@pascalkuthe pascalkuthe self-assigned this Aug 1, 2023