Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't hash FileTime value in PathSource::fingerprint #6835

Open
alexcrichton opened this issue Apr 9, 2019 · 3 comments
Open

Don't hash FileTime value in PathSource::fingerprint #6835

alexcrichton opened this issue Apr 9, 2019 · 3 comments
Labels
A-rebuild-detection Area: rebuild detection and fingerprinting C-cleanup Category: cleanup within the codebase S-triage Status: This issue is waiting on initial triage.

Comments

@alexcrichton
Copy link
Member

Currently here in Cargo we will insert the FileTime value, stringified, into a Fingerprint for a package. This is only ever used for build scripts which do not mention rerun-if-* directives.

Hashing FileTime is particularly brittle because filesystems are weird. The primary use case today is that Docker will rewrite file times to zero out the nanosecond field. This means that the hash for a build script changes between docker, even though nothing actually changed.

We should instead thread through the set of files considered input to a build script to the Fingerprint. These files would then be stored in the Fingerprint (and deterministically hashed) where we would later determine at runtime what the FileTime modification time is for each file for recompilation purposes.

I'm spawning this off of #6832 for a separable unit of work. We'll know this is fixed when this test has its branch on already_zero removed, because we should always consider everything fresh.

@fluffysquirrels
Copy link
Contributor

fluffysquirrels commented Apr 28, 2019

Sorry if this is obvious, but should we hash source files to check freshness only when their modified time has changed? Otherwise we would need to re-hash all source files for path dependencies on every null build, correct? That seems like it might hurt null build performance.

In case it helps, I believe Git only rehashes files when their modified time (stored in the index) is changed.

I'm happy to take a look at this; would a PR be useful?

@alexcrichton
Copy link
Member Author

Hashing the contents of files is definitely something we're interested in exploring! That's tracked at #6529 and you're right in that it's not initially done because of performance reasons, but using a combination of mtime + content hash seems not entirely unreasonable.

@ehuss ehuss added the A-rebuild-detection Area: rebuild detection and fingerprinting label Sep 23, 2019
@epage epage added C-cleanup Category: cleanup within the codebase S-triage Status: This issue is waiting on initial triage. labels Oct 31, 2023
@alexpyattaev
Copy link

Scons uses content hash successfully for years now, and works faster than most build systems for c++.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rebuild-detection Area: rebuild detection and fingerprinting C-cleanup Category: cleanup within the codebase S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

5 participants