Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intern more kinds of things #6207

Closed
2 of 3 tasks
Eh2406 opened this issue Oct 22, 2018 · 10 comments
Closed
2 of 3 tasks

Intern more kinds of things #6207

Eh2406 opened this issue Oct 22, 2018 · 10 comments
Assignees
Labels
A-cargo-api Area: cargo-the-library API and internal code issues

Comments

@Eh2406
Copy link
Contributor

Eh2406 commented Oct 22, 2018

Cargo sometimes makes a lot of copies of data structures. For example the string "serde", was copied for the name of each dependency on that crate, as the name of each version of that package, and for each feature that enables that package. This got fixed by adding a InternedString data type that deduplicates the data and leaks it into a &'static reference. We have other data structures which use Arc/Rc to make cloning cheaper.

We should experiment with Interning them to see if it is a Speed or Memory or Ergonomic win.

Off the top of my head:

  • PackageId is already using Arc, is copied and hashed and compared all over the place in hot code.
  • SourceId is already using Arc, has a manual cache for crates.io and will probably be leaked anyway by caching PackageId.
  • semver::Version/semver::VersionReq are used a lot, probably often with the same value, and are bigger structures then they seem.
@Eh2406 Eh2406 added E-help-wanted A-cargo-api Area: cargo-the-library API and internal code issues labels Oct 22, 2018
@alexcrichton
Copy link
Member

Note that {Package,Source}Id internally using Arc already was a "poor man's" form of interning early on. I definitely agree we should switch to an InternedString style approach, and in doing so it should be fine to remove the internal Arc

@derekdreery
Copy link
Contributor

derekdreery commented Oct 31, 2018

There is interning infrastructure used in the html5ever crate. Might be useful here - you can pre-intern anything you want at compile-time

EDIT the crates are string_cache and string_cache_codegen

@dwijnand dwijnand self-assigned this Nov 18, 2018
@dwijnand
Copy link
Member

For semver::Version and semver::VersionReq should we find a cargo-local solution (I'm thinking of wrapping all the Version::parse calls), or should we request this support upstream?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Nov 19, 2018

I think we find a cargo-local solution. I was thinking of making an InternedVersion{Req} to make sure we are always interning it, but if we only have a few constructors then maybe that is overkill.

@alexcrichton
Copy link
Member

Agreed this should probably be Cargo-specific for now, we can always upstream later if it's useful!

@dwijnand
Copy link
Member

I was thinking of making an InternedVersion{Req} to make sure we are always interning it, but if we only have a few constructors then maybe that is overkill.

I think the box-leaking, interning solution only gives us access to a &'static Version{Req} so we'd have to change all the instances of owned values to borrows. At that point I'm not sure if it's best to use:

  • &'static Version{Req},
  • InternedVersion{Req}, or a
  • InternedVersion{Req} which is a type alias of &'static Version{Req}.

Either way, I'll try and finish up this ticket on Sunday.

@dwijnand
Copy link
Member

@alexcrichton if you get a chance could you write down those hashing shenanigans you were explaining to me are happening with package id and source id? The part I'm most interested in is the ideal setup we'd like to be in, in terms of interned values and their hashes.

@alexcrichton
Copy link
Member

Sure thing! I think you ended up finding it on the SourceId PR. The Hash for SourceId leaves out many fields and canonicalizes others, but the actual instances are still distinct and have different fields. Now that I think about it I think PackageId is pretty bland, it's just SourceId that can be "equal" yet have different values for each field.

@dwijnand
Copy link
Member

Follow-up task: remove all the no longer necessary .clone() calls to all the Copy-able interned typed.

bors added a commit that referenced this issue Nov 25, 2018
bors added a commit that referenced this issue Nov 26, 2018
@dwijnand
Copy link
Member

#6468 implements the last bit about semver, but there were some uncertainty on whether it's a desired change which halted the PR. So I'm going to consider this done for the time being.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cargo-api Area: cargo-the-library API and internal code issues
Projects
None yet
Development

No branches or pull requests

4 participants