fd without pattern is much slower than find #304

Closed
vandenoever opened this issue Jun 14, 2018 · 15 comments

@vandenoever

vandenoever commented Jun 14, 2018

I often use find to list files, and doing the same with fd is much slower. Here are some results for listing all files in my home directory with a warm and a cold cache. The directory is on btrfs on Linux 4.15.15. find is about twice as fast in both cases and uses about 10% of the CPU cycles that fd uses.

$ hyperfine --warmup 3 'fd -HI -t f . ~' 'find ~ -type f'
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):      5.668 s ±  0.151 s    [User: 8.523 s, System: 12.736 s]
 
  Range (min … max):    5.282 s …  5.788 s
 
Benchmark #2: find ~ -type f

  Time (mean ± σ):      2.640 s ±  0.222 s    [User: 779.7 ms, System: 1837.9 ms]
 
  Range (min … max):    2.407 s …  3.060 s
 
Summary

  'find ~ -type f' ran
    2.15x faster than 'fd -HI -t f . ~'

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . ~' 'find ~ -type f' 
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):     28.027 s ±  1.714 s    [User: 10.159 s, System: 34.685 s]
 
  Range (min … max):   24.161 s … 29.585 s
 
Benchmark #2: find ~ -type f

  Time (mean ± σ):     15.151 s ±  0.482 s    [User: 1.386 s, System: 4.884 s]
 
  Range (min … max):   14.414 s … 15.703 s
 
Summary

  'find ~ -type f' ran
    1.85x faster than 'fd -HI -t f . ~'

Here is a screenshot of CPU and I/O activity during the cold-cache run.

[screenshot: CPU and I/O activity during the cold-cache run]

@BurntSushi

@vandenoever Is there any way you can reproduce these results on a public corpus?

@vandenoever
Author

I can try if you have a suggestion for a corpus.

Some statistics on my files: 1.4M files totaling 363 GB, so the average file size is 264 kB. See the histogram.

[histogram of file sizes]

@BurntSushi

I'd try the Linux kernel. If that fails to show a difference, then I'd try the Chromium source tree.

@vandenoever
Author

Here are the measurements for the linux-4.17 source code. The effect with a cold cache is less pronounced, but still large.

Warm cache:

$ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     156.2 ms ±   3.0 ms    [User: 270.4 ms, System: 298.3 ms]
 
  Range (min … max):   151.3 ms … 164.7 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     353.9 ms ±  10.5 ms    [User: 258.3 ms, System: 281.8 ms]
 
  Range (min … max):   338.9 ms … 370.6 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      76.7 ms ±   1.3 ms    [User: 24.0 ms, System: 52.2 ms]
 
  Range (min … max):    75.3 ms …  80.7 ms
 
Summary

  'find . -type f' ran
    2.04x faster than 'fd -HI -t f . .'
    4.61x faster than 'fd -HI -j1 -t f . .'

Cold cache:

$ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     987.8 ms ±  15.8 ms    [User: 394.4 ms, System: 1279.4 ms]
 
  Range (min … max):   962.6 ms … 1007.4 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):      1.788 s ±  0.016 s    [User: 413.6 ms, System: 920.6 ms]
 
  Range (min … max):    1.758 s …  1.806 s
 
Benchmark #3: find . -type f

  Time (mean ± σ):     736.7 ms ±  11.2 ms    [User: 49.0 ms, System: 174.7 ms]
 
  Range (min … max):   725.2 ms … 756.9 ms
 
Summary

  'find . -type f' ran
    1.34x faster than 'fd -HI -t f . .'
    2.43x faster than 'fd -HI -j1 -t f . .'

@sharkdp
Owner

sharkdp commented Jun 14, 2018

Thank you very much for reporting this. Which version of fd are you using? There have been some performance improvements regarding "-HI searches" in 6.2 (see #191). Also, I have reported a performance regression in the ignore crate (see BurntSushi/ripgrep#835), which definitely affects fd in "-HI searches". The regression has since been fixed; however, the fix has not been released yet, so I have not been able to integrate it into fd.

I cannot reproduce these results on my home directory:

▶ fd --version
fd 7.0.0

▶ hyperfine --warmup 3 'fd -HI -t f . ~' 'find . ~ -type f' -i
Benchmark #1: fd -HI -t f . ~

  Time (mean ± σ):      1.395 s ±  0.045 s    [User: 4.666 s, System: 5.592 s]
 
  Range (min … max):    1.346 s …  1.469 s
 
Benchmark #2: find . ~ -type f

  Time (mean ± σ):      2.095 s ±  0.055 s    [User: 613.8 ms, System: 1455.1 ms]
 
  Range (min … max):    2.030 s …  2.187 s
 
  Warning: Ignoring non-zero exit code.
 
Summary

  'fd -HI -t f . ~' ran
    1.50x faster than 'find . ~ -type f'

▶ fd -HI -t f . ~ | wc -l
849174

I will try the Linux kernel soon. Did you clone https://github.com/torvalds/linux or just download the 4.17 source code?

@vandenoever
Author

vandenoever commented Jun 14, 2018

I'm running 7.0.0. I got 4.17 as a tar.xz from kernel.org. What's your average file size?

Perhaps it's related to the number of files per directory. My home directory has 1.4M files in 150k directories, so about 10 files per directory on average.

Why does your benchmark command have a non-zero exit code?

@vandenoever
Author

vandenoever commented Jun 14, 2018

@sharkdp: is fd also faster with -j1? Even though it's faster, the CPU cost is 5x higher (10.3 s vs. 2.1 s).

@sharkdp
Owner

sharkdp commented Jun 15, 2018

> Why does your benchmark command have a non-zero exit code?

This can be ignored. I have a single test folder (for fd) that is read-only and owned by root, which is why find returns a non-zero exit code. It does not affect the benchmark (fd can't enter that directory either, and it's empty anyway).

> Perhaps it's related to the number of files per directory. My home directory has 1.4M files in 150k directories, so about 10 files per directory on average.

Mine is 920k files in 180k folders (about 5 files per directory). Why do you think it could be related to the files-to-directories ratio (I'm not saying it isn't)?

> I got 4.17 as a tar.xz from kernel.org.

I ran the benchmarks for the same dataset. While the effect is less dramatic, I can reproduce your results for a warm cache:

▶ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):      86.5 ms ±   7.2 ms    [User: 293.6 ms, System: 235.9 ms]
 
  Range (min … max):    73.7 ms …  96.8 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     285.3 ms ±   3.5 ms    [User: 191.8 ms, System: 252.3 ms]
 
  Range (min … max):   280.4 ms … 291.7 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      68.9 ms ±   1.0 ms    [User: 25.4 ms, System: 43.0 ms]
 
  Range (min … max):    67.5 ms …  71.3 ms
 
Summary

  'find . -type f' ran
    1.25x faster than 'fd -HI -t f . .'
    4.14x faster than 'fd -HI -j1 -t f . .'

However, for a cold cache, I get the following:

▶ hyperfine --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches' 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):     254.3 ms ±   6.7 ms    [User: 298.8 ms, System: 612.6 ms]
 
  Range (min … max):   245.6 ms … 268.7 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):      1.220 s ±  0.017 s    [User: 375.6 ms, System: 587.6 ms]
 
  Range (min … max):    1.186 s …  1.245 s
 
Benchmark #3: find . -type f

  Time (mean ± σ):     638.9 ms ±  26.8 ms    [User: 54.8 ms, System: 169.1 ms]
 
  Range (min … max):   604.4 ms … 684.7 ms
 
Summary

  'fd -HI -t f . .' ran
    2.51x faster than 'find . -type f'
    4.80x faster than 'fd -HI -j1 -t f . .'

Interestingly, the Linux kernel has a files/directories ratio of about 14, so there might very well be a correlation.

fd is really designed for a multi-core setup. I believe it will always be at a disadvantage against truly single-threaded programs in a -j1 setting, because it still spawns a separate thread for searching, which then sends results to a receiver thread that outputs them. Also, personally, I don't really care about the total number of CPU cycles, but rather about the overall execution time.
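
For illustration, a minimal sketch of that sender/receiver split, using a plain std::sync::mpsc channel (my sketch, not fd's actual code):

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of the sender/receiver split described above: a search thread
// sends results over a channel to a dedicated receiver thread that prints.
fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // Even with "-j1" there are still two threads: one searching, one printing.
    let printer = thread::spawn(move || {
        for path in rx {
            println!("{}", path);
        }
    });

    let searcher = thread::spawn(move || {
        // A real search would traverse the filesystem here; we just send a
        // placeholder result. Dropping `tx` at the end of this closure
        // closes the channel and lets the printer loop finish.
        tx.send("./example.txt".to_string()).unwrap();
    });

    searcher.join().unwrap();
    printer.join().unwrap();
}
```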

Still, this seems to be a case where find is definitely faster than fd. I'm happy to investigate this further, but I will be away from home for two weeks, so this might take some time.

@vandenoever
Author

There are two actions going on: reading a directory and reading file metadata. fd seems to be slower when there are more files per directory. That might mean that something expensive happens when reading those directories, e.g. allocating a larger Vec to hold the entries.
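
For reference, here is what those two per-directory costs look like in a minimal single-threaded walk (my sketch based on std::fs — fd's actual traversal goes through the ignore crate):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Minimal walk showing the two costs named above:
// (1) reading each directory's entries, (2) querying metadata per entry.
fn walk(dir: &Path) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {        // (1) one readdir stream per directory
        let entry = entry?;
        let file_type = entry.file_type()?;  // (2) per-entry metadata; on most
                                             // platforms this comes from the dirent,
                                             // avoiding an extra stat() call
        if file_type.is_dir() {
            walk(&entry.path())?;
        } else {
            println!("{}", entry.path().display());
        }
    }
    Ok(())
}
```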

I'm not in a hurry to have this fixed. It was the first thing I noticed when I tried out fd.

Since fd uses more CPU, find is probably faster more often on slower machines, even if they're multi-core.

Perhaps using a Mutex to coordinate the output instead of using channels would be faster, because you could then reuse buffers and avoid allocations.
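
Something along these lines, as a sketch (this assumes workers write to stdout directly; it is not how fd is implemented):

```rust
use std::io::{self, Write};
use std::sync::Mutex;

// Each worker batches results into a reusable local buffer and takes the
// lock only to flush it, avoiding per-result channel sends and allocations.
fn emit_results(stdout: &Mutex<io::Stdout>, results: impl Iterator<Item = String>) {
    let mut buf: Vec<u8> = Vec::with_capacity(64 * 1024); // reused across flushes
    for path in results {
        buf.extend_from_slice(path.as_bytes());
        buf.push(b'\n');
        if buf.len() >= 64 * 1024 {
            stdout.lock().unwrap().write_all(&buf).unwrap();
            buf.clear(); // keep the allocation, drop the contents
        }
    }
    if !buf.is_empty() {
        stdout.lock().unwrap().write_all(&buf).unwrap();
    }
}
```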

@sharkdp
Owner

sharkdp commented Jun 15, 2018

Thank you for the useful tips!

I think the main issue might be the actual printing of the results in output::print_entry. Given a search pattern that limits the number of output lines (e.g. ^usbnet with 2 results), fd is suddenly much faster:

▶ hyperfine --warmup 5 'fd -HI' 'fd -HI ^usbnet'  
Benchmark #1: fd -HI

  Time (mean ± σ):      72.4 ms ±   3.8 ms    [User: 243.2 ms, System: 83.6 ms]
 
  Range (min … max):    65.1 ms …  81.0 ms
 
Benchmark #2: fd -HI ^usbnet

  Time (mean ± σ):      28.1 ms ±   1.5 ms    [User: 124.8 ms, System: 67.0 ms]
 
  Range (min … max):    27.0 ms …  37.7 ms

... while find is slightly slower with a search pattern:

Benchmark #1: find .

  Time (mean ± σ):      67.8 ms ±   1.2 ms    [User: 24.2 ms, System: 43.1 ms]
 
  Range (min … max):    66.4 ms …  70.8 ms
 
Benchmark #2: find -iname "usbnet*"

  Time (mean ± σ):      79.2 ms ±   1.5 ms    [User: 33.5 ms, System: 45.2 ms]
 
  Range (min … max):    77.1 ms …  83.0 ms

I'm pretty sure we can find some ways to optimize output::print_entry. In the meantime, I hope that most of your searches actually have just a few results. After all, listing all files in a huge directory is probably not the most interesting use case 😄

Edit: another thing I have noticed in the past (although it does not affect the benchmarks above): if you really have a lot of search results, fd's output is slowed down when terminal colors are enabled. This is due to (1) fd having to do additional work, because it splits each path into multiple colored parts, and (2) the terminal emulator having more work to do. Try comparing fd and fd --color=never in a huge directory. If you want to measure terminal output speed as well, you can use hyperfine's new --show-output option.
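
To illustrate point (1), a rough sketch of per-component coloring with a made-up color scheme (not fd's actual output code):

```rust
use std::io::{self, Write};
use std::path::Path;

// Why colored output costs more: instead of one write per path, every
// component is wrapped in its own ANSI escape sequences.
fn print_colored(out: &mut impl Write, path: &Path) -> io::Result<()> {
    let components: Vec<_> = path.components().collect();
    for (i, c) in components.iter().enumerate() {
        let name = c.as_os_str().to_string_lossy();
        if i + 1 < components.len() {
            write!(out, "\x1b[34m{}\x1b[0m/", name)?; // directory part, blue
        } else {
            write!(out, "{}", name)?; // final component, default color
        }
    }
    writeln!(out)
}
```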

@mqudsi
Contributor

mqudsi commented Jun 15, 2018

I know we've ruled it out as the bottleneck, but to share a bit of greybeard folklore: back in my sysadmin days (and this was before microbenchmarking was the norm), I was always told to use ^ instead of . because PCRE (or whatever library was king at the time) supposedly matched the former a tiny bit faster.

@BurntSushi

Glancing at the printing functions, there is definitely some room to optimize, I think. For my own use cases, I found that most of the standard path functions in std::path were noticeable on a profile, and dropping down into raw bytes (on Unix at least) made a difference. Generally speaking, I only ever use the standard path functions (and to_string_lossy) on Windows.
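
A sketch of that raw-bytes approach on Unix (assuming the goal is simply to print each path followed by a newline):

```rust
use std::io::{self, Write};
use std::path::Path;

// Unix-only: write the path's raw bytes directly, skipping the UTF-8
// validation and lossy conversion that display()/to_string_lossy() imply.
#[cfg(unix)]
fn write_path(out: &mut impl Write, path: &Path) -> io::Result<()> {
    use std::os::unix::ffi::OsStrExt;
    out.write_all(path.as_os_str().as_bytes())?;
    out.write_all(b"\n")
}
```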

@sharkdp
Owner

sharkdp commented Jun 16, 2018

@BurntSushi Exactly, that was what I was thinking about. As a bonus, this would also solve #295.

@sharkdp
Owner

sharkdp commented Aug 19, 2018

I found a very simple optimization in 50a2bab which (possibly together with some other improvements) brings the multi-core version of fd 7.1.0 up to the same speed as find in the Linux kernel benchmark:

▶ hyperfine --warmup 3 'fd -HI -t f . .' 'fd -HI -j1 -t f . .' 'find . -type f'
Benchmark #1: fd -HI -t f . .

  Time (mean ± σ):      69.1 ms ±   2.9 ms    [User: 226.5 ms, System: 87.7 ms]
 
  Range (min … max):    61.5 ms …  76.9 ms
 
Benchmark #2: fd -HI -j1 -t f . .

  Time (mean ± σ):     102.4 ms ±   4.3 ms    [User: 93.2 ms, System: 59.5 ms]
 
  Range (min … max):    98.9 ms … 115.6 ms
 
Benchmark #3: find . -type f

  Time (mean ± σ):      68.9 ms ±   1.3 ms    [User: 25.8 ms, System: 42.7 ms]
 
  Range (min … max):    67.3 ms …  72.7 ms
 
Summary

  'find . -type f' ran
    1.00x faster than 'fd -HI -t f . .'
    1.49x faster than 'fd -HI -j1 -t f . .'

I still need to look at optimizations in output::print_entry, though.

@sharkdp
Owner

sharkdp commented Feb 13, 2019

I'm going to close this for now. The current version of fd is still on par with find on this benchmark and much faster if the user is actually searching for something (and not just listing all of the 65,598 directory entries).
