Rustdoc files have naming collisions on case-insensitive file systems #76922

Closed
jyn514 opened this issue Sep 19, 2020 · 6 comments
Labels
C-bug Category: This is a bug. O-macos Operating system: macOS O-windows Operating system: Windows T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@jyn514
Member

jyn514 commented Sep 19, 2020

Rust is case-sensitive, but some filesystems (especially on Windows) are not.
This could cause overlaps in names for the files generated by rustdoc, such as the following:

struct Command; // page currently generated at `struct.Command.html`
struct command {} // page currently generated at `struct.command.html`

This is somewhat mitigated by the non_camel_case_types lint, which warns
if you don't use the customary capitalization for the items.

@Nemo157 has kindly conducted a survey of the docs.rs documentation and found
that about 700,000 items currently overlap, out of 308,064,859 total items in the inventory, so roughly 0.23% of files conflict.

It's not clear how many users this impacts. Are there lots of people generating docs for crates with an overlap, and are they documenting on Windows? @retep998 said that in that case, whichever file is generated first determines the name and whichever is generated last determines the contents.
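
To make the overlap concrete, here is a rough sketch of how one could look for colliding page names. It is not rustdoc's code: `case_collisions` is a hypothetical helper, and `str::to_lowercase` is only an approximation of the case folding a real file system applies.

```rust
use std::collections::BTreeMap;

/// Groups file names that differ only by case.
/// Assumption: lowercasing approximates what a case-insensitive
/// file system does; real file systems have their own folding rules.
fn case_collisions<'a>(names: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &name in names {
        // Names that lowercase to the same key would share one file.
        groups.entry(name.to_lowercase()).or_default().push(name);
    }
    // Keep only the keys that more than one name maps to.
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```

Calling it on `["struct.Command.html", "struct.command.html", "struct.Child.html"]` reports the first two names as one colliding group and leaves `struct.Child.html` out.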

@jyn514 jyn514 added O-windows Operating system: Windows T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Sep 19, 2020
@the8472
Member

the8472 commented Sep 19, 2020

Perhaps the directories could be marked as case-sensitive? Although that's a relatively recent Windows 10 feature for WSL.

@camelid
Member

camelid commented Sep 19, 2020

This is an issue on macOS too! The URLs at doc.rust-lang.org/std are case-sensitive, but docs from my local filesystem (file://...) are case-insensitive. E.g., trait.Write.html and trait.write.html both work!

@rustbot modify labels: O-macos

@rustbot rustbot added the O-macos Operating system: macOS label Sep 19, 2020
@CAD97
Contributor

CAD97 commented Sep 19, 2020

Copying over a comment from i-rl-o:

The non_ascii_idents RFC punts on non-ascii file paths due to file system concerns, which is tangentially related, especially for one key reason:

It'd be nice if the same mangling scheme(s) could be used both for names/paths that don't work on the filesystem/URLs due to non-ascii paths and for case-sensitivity within ASCII.

The "standard" web solution for non-ascii domain names is punycode. I tested Firefox, and it currently "demangles" punycode in the domain name but not in the path part of the URL (and I have no idea whether standards say one way or the other whether punycode de-mangling should or shouldn't be done on specific parts of the URL), but we still could theoretically reuse the same mangling scheme if Unicode names are a problem.

Unfortunately, punycode (deliberately) is no help in the ASCII range, so this probably isn't much help to the actual tracked issue here 🙃

@CAD97
Contributor

CAD97 commented Sep 19, 2020

One other comment from the same thread, from @ogoffart:

A good potential solution is to emit a disambiguation page at struct.Command.html (only when deploying to a case-insensitive platform? always with an opt-out for deployment to a case-sensitive server?) that points at both Command and command, which are then mangled to a form that preserves case information.


(This part is my comment)

If stability of the specific page weren't a goal, the disambiguated pages could just be struct.command-1.html and struct.command-2.html. If stability of pages is a concern, then a punycode-like technique could be used to encode case information rather than unicode symbol insertions (e.g. struct.command.1s.html would encode the capitalization of 1000000₂). (Note: stability of pages only really should matter through semver-compatible changes, so if removing one of the command names also removes the mangled pages, that is presumably fine.)
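
A minimal sketch of that case-encoding idea, under my own reading of the example: one bit per letter (1 for uppercase, first letter as the most significant bit), with the resulting number written in base 36. For Command that bitmask is 1000000 in binary, i.e. 64, which is `1s` in base 36. The names `mangle_case` and `to_base36` are made up for illustration, and the sketch only handles ASCII names of up to 128 characters.

```rust
/// Writes `n` in base 36 using digits 0-9 then a-z.
fn to_base36(mut n: u128) -> String {
    const DIGITS: &[u8; 36] = b"0123456789abcdefghijklmnopqrstuvwxyz";
    if n == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while n > 0 {
        out.push(DIGITS[(n % 36) as usize]);
        n /= 36;
    }
    out.reverse(); // digits were produced least significant first
    String::from_utf8(out).expect("base-36 digits are ASCII")
}

/// Hypothetical mangling: lowercased name plus a case bitmask suffix.
/// Returns None for names this sketch does not handle
/// (non-ASCII, or longer than the 128 bits of the mask).
fn mangle_case(name: &str) -> Option<String> {
    if !name.is_ascii() || name.len() > 128 {
        return None;
    }
    let mut bits: u128 = 0;
    for b in name.bytes() {
        // Shift in one bit per character: 1 if it is an uppercase letter.
        bits = (bits << 1) | u128::from(b.is_ascii_uppercase());
    }
    Some(format!("{}.{}", name.to_ascii_lowercase(), to_base36(bits)))
}
```

With this scheme `mangle_case("Command")` gives `command.1s` and `mangle_case("command")` gives `command.0`, so the two pages no longer differ only by case. Since the lowercased name fixes the length, the bitmask is enough to recover the original capitalization.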

@Nemo157
Member

Nemo157 commented Sep 19, 2020

One worry I had after non-ascii was brought up is Unicode normalization in file systems. Luckily, it looks like rustc already does normalization when comparing identifiers (EDIT: and the RFC specifies that this must be NFC), so as long as the compiler and file system do the same normalization, file systems that support Unicode but normalize it should work fine:

error[E0428]: the name `Foó` is defined multiple times
 --> src/lib.rs:3:1
  |
2 | pub struct Foó; // "Fo\u{f3}"
  | --------------- previous definition of the type `Foó` here
3 | pub struct Foó ; // "Foo\u{301}"
  | ^^^^^^^^^^^^^^^ `Foó` redefined here
  |
  = note: `Foó` must be defined only once in the type namespace of this module

@ollie27
Member

ollie27 commented Sep 20, 2020

Duplicate of #25879

@ollie27 ollie27 marked this as a duplicate of #25879 Sep 20, 2020
@ollie27 ollie27 closed this as completed Sep 20, 2020

7 participants