Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--color=always ~10x slower #720

Closed
Exr0n opened this issue Jan 27, 2021 · 4 comments · Fixed by #853
Closed

--color=always ~10x slower #720

Exr0n opened this issue Jan 27, 2021 · 4 comments · Fixed by #853
Assignees
Milestone

Comments

@Exr0n
Copy link

Exr0n commented Jan 27, 2021

Describe the bug you encountered:

I understand that outputting the escape sequences for colors is more text, but I didn't expect it to be 10x slower. Is this expected behavior?

Describe what you expected to happen:

Expected color=always to not be 10x slower, although maybe I'm simply misguided.

image

What version of fd are you using?
fd 8.2.1 (fd-8.2.1-1 on the AUR)

Which operating system / distribution are you on?

Linux 5.10.10-arch1-1 x86_64

and lsb_release but I don't think it's very helpful :)

LSB Version:	1.4
Distributor ID:	Arch
Description:	Arch Linux
Release:	rolling
Codename:	n/a

Ryzen 3900x (24 threads) but it seemed during the bench mark only ~80% cpu was being used.

@Exr0n Exr0n added the bug label Jan 27, 2021
@tavianator
Copy link
Collaborator

--color can be much more expensive because it needs to stat() more often, since colors depend on things like permissions and link counts that aren't known from just reading directories. However, the sheer number of stat() calls seems like a bug:

$ strace -cf fd --color=always -HI -j1 >/dev/null
strace: Process 51592 attached
strace: Process 51594 attached
strace: Process 51593 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 64.09    3.465955        6211       558         4 futex
 25.69    1.389121           2    500392         1 statx
  4.53    0.244950           2    115031           write
  2.64    0.142739           8     16900           getdents64
  1.08    0.058637           6      8455           openat
  0.98    0.052769           6      8454           fstat
  0.97    0.052676           6      8457           close
...
$ strace -cf fd -HI -j1 >/dev/null
strace: Process 51606 attached
strace: Process 51607 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 57.20    1.086218        1792       606        29 futex
 23.77    0.451380           3    115031           write
  9.00    0.170929          10     16900           getdents64
  4.33    0.082204           9      8455           openat
  3.10    0.058842           6      8457           close
  2.59    0.049160           5      8454           fstat
...

500392 calls to statx() is much more than necessary for only 115031 total files. (Also, 1 write() per file is suboptimal too.) It looks like fd is calling statx() for every path component, which results in lots of repeated calls that could be cached:

$ mkdir -p 1/2/3/4/5/6/7/8/9
$ strace -f -e statx fd -HI -j1 . 1
...
[pid 57576] statx(AT_FDCWD, "1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
1/2
[pid 57576] statx(AT_FDCWD, "1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2/3", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
1/2/3
[pid 57576] statx(AT_FDCWD, "1", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2/3", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
[pid 57576] statx(AT_FDCWD, "1/2/3/4", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFDIR|0750, stx_size=2, ...}) = 0
1/2/3/4
...

This happens because fd colors each path component separately. In bfs I just color the whole thing up to the basename as a directory (even though some components might be links). The extra precision of fd here is costly without any caching.

@tavianator
Copy link
Collaborator

tavianator commented Feb 2, 2021

The extra write() calls are because Rust's Stdout is always line-buffered, even if stdout is not a tty :O

rust-lang/rust#60673

@sharkdp sharkdp added performance and removed bug labels Feb 13, 2021
@sharkdp
Copy link
Owner

sharkdp commented Feb 13, 2021

Thank you for reporting this. I was aware of this performance problem but never got around to fixing it. Note that the "10x slower" is only true if there are LOTS of search results.

Related:

It looks like fd is calling statx() for every path component, which results in lots of repeated calls that could be cached:

True.

This happens because fd colors each path component separately. In bfs I just color the whole thing up to the basename as a directory (even though some components might be links). The extra precision of fd here is costly without any caching.

Thank you for the information. Colorizing everything up to the basename also sounds like a reasonable alternative. We won't be able to do this anymore:
image
But I guess that's okay. A cache might still be of value if there are multiple results in the same folder. This is definitely something that we should address.

Also, 1 write() per file is suboptimal too.

Could you go into more detail here? You mean we could buffer results if not writing to a TTY? (if we write to a TTY, we want results to be visible immediately).

@tavianator
Copy link
Collaborator

Could you go into more detail here? You mean we could buffer results if not writing to a TTY? (if we write to a TTY, we want results to be visible immediately).

Yeah, I expect stdout to be line buffered when going to a TTY and fully buffered otherwise. That's how it is in C anyway. I thought Rust would do the same thing but it's just always line buffered.

@tavianator tavianator self-assigned this Aug 31, 2021
tavianator added a commit to tavianator/fd that referenced this issue Sep 16, 2021
It can be expensive to color each path component separately, requiring a
stat() call on each component.  For deep hierarchies this can result in
quadratic overhead.  Instead, just color the path up to the basename as
a directory.

Fixes sharkdp#720.
tavianator added a commit to tavianator/fd that referenced this issue Sep 17, 2021
It can be expensive to color each path component separately, requiring a
stat() call on each component.  For deep hierarchies this can result in
quadratic overhead.  Instead, just color the path up to the basename as
a directory.

Fixes sharkdp#720.
tavianator added a commit to tavianator/fd that referenced this issue Sep 20, 2021
It can be expensive to color each path component separately, requiring a
stat() call on each component.  For deep hierarchies this can result in
quadratic overhead.  Instead, just color the path up to the basename as
a directory.

Fixes sharkdp#720.
sharkdp pushed a commit to tavianator/fd that referenced this issue Nov 14, 2021
It can be expensive to color each path component separately, requiring a
stat() call on each component.  For deep hierarchies this can result in
quadratic overhead.  Instead, just color the path up to the basename as
a directory.

Fixes sharkdp#720.
@sharkdp sharkdp added this to the fd 8.3 milestone Nov 14, 2021
sharkdp pushed a commit that referenced this issue Nov 14, 2021
It can be expensive to color each path component separately, requiring a
stat() call on each component.  For deep hierarchies this can result in
quadratic overhead.  Instead, just color the path up to the basename as
a directory.

Fixes #720.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants