Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add case insensitive file systems support in rustdoc #83612

Closed

Conversation

GuillaumeGomez
Copy link
Member

Fixes #25879.
Fixes #80504.

This PR adds the support for case insensitive file system. To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

Now for the front-end: we generate a file for the conflicted URL which contains JS which will load the expected file. The conflicted files are put in non-conflict file paths (simply adding a number depending on their position in the vector).

If you want to test it locally even on linux, you can check the Self and self keywords.

@GuillaumeGomez GuillaumeGomez added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. A-rustdoc-ui Area: rustdoc UI (generated HTML) labels Mar 28, 2021
@rust-highfive
Copy link
Collaborator

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez

@rust-highfive
Copy link
Collaborator

r? @ollie27

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 28, 2021
@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

I think we should do this unconditionally, not just when the current filesystem happens to be case-insensitive. The error in #83154 comes from trying to install linux docs on MacOS - if you keep the linux docs the same, they'll still give the same error.

Comment on lines +249 to +254
format!(
"var script = document.createElement('script');\
script.src = {:?};\
document.body.appendChild(script);",
script_src,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be dynamic with createElement('script')? Couldn't you just add the JS to the page directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could, but that would make the page a lot heavier (minimum twice the normal size since there at least two "items" to be loaded).

Copy link
Member

@jyn514 jyn514 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

@GuillaumeGomez
Copy link
Member Author

GuillaumeGomez commented Mar 29, 2021

To do so, we first detect if we are on a case insensitive file system (writing in files with same "name" if lowercased and then check if we have both different content or just one). This can be overriden with the --generate-case-insensitive option (which I used to test and for the tests).

I think we should do this unconditionally, not just when the current filesystem happens to be case-insensitive. The error in #83154 comes from trying to install linux docs on MacOS - if you keep the linux docs the same, they'll still give the same error.

I'm not sure to understand: currently, this is checked everytime you run rustdoc, unless you use --generate-case-insensitive which will then enforce to check for conflicts. So you can get linux docs on macos and as long as you generate them on your mac, it'll work locally (or with --generate-case-insensitive option if you are on a case sensitive file system and want to export it).

EDIT: Oh, I think I just understood what you meant. Unfortunately, the current solution doesn't work on case sensitive file systems because if you have "a" and "A", it'll be two different files for the system. I can go around this limitation by creating a link to the first file if you want? Or simply duplicating it, as you prefer.

@kinnison
Copy link
Contributor

I think it should always generate safe filenames.

@GuillaumeGomez
Copy link
Member Author

They are safe on case sensitive file systems. I think we shouldn't do it automatically if unnecessary. But the case insensitive support should work on both kind of systems, so I'll fix the current issue.

@kinnison
Copy link
Contributor

If you intend to make the stdlib docs to always be generated with the flag to force case-insensitive support then okay. There is an issue that installing the linux docs on a macos filesystem (for use in a cross-compilation docker) fails because the Linux docs are built assuming a case-sensitive FS.

@GuillaumeGomez
Copy link
Member Author

I don't mind us using this flag when generating std docs so it can be redistributed everywhere without issue. But for websites like docs.rs, it makes more sense to just keep the current system (less files to handle, less JS, etc).

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind us using this flag when generating std docs so it can be redistributed everywhere without issue. But for websites like docs.rs, it makes more sense to just keep the current system (less files to handle, less JS, etc).

I don't really like special casing the standard library docs more than we have to - it seems useful to me to have the docs redistributable anywhere, regardless of the project. docs.rs doesn't have downloadable docs for now, so it's less of an issue, but it may in the future. Projects that want to distribute their own docs, like windows-rs and rocket, would also benefit from having this.

If the current solution is too JS heavy, I would rather work on making that more efficient, since I expect people to read the standard library docs at least as much as they read docs.rs.

Comment on lines +249 to +254
format!(
"var script = document.createElement('script');\
script.src = {:?};\
document.body.appendChild(script);",
script_src,
)
Copy link
Member

@jyn514 jyn514 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

@GuillaumeGomez
Copy link
Member Author

I don't really like special casing the standard library docs more than we have to - it seems useful to me to have the docs redistributable anywhere, regardless of the project. docs.rs doesn't have downloadable docs for now, so it's less of an issue, but it may in the future. Projects that want to distribute their own docs, like windows-rs and rocket, would also benefit from having this.

It's not special casing, it's simply about enabling an option. I guess the debate here is wether or not we want to enable it by default. I'd prefer not though because some parts (not much but still) of the documentation wouldn't work without JS enabled.

If the current solution is too JS heavy, I would rather work on making that more efficient, since I expect people to read the standard library docs at least as much as they read docs.rs.

The original page is very small, so that part isn't an issue from what I could see. The parsing of the JS is also very quick since it's mostly strings.

That's not my point - the idea here is you want to put the JS in a separate file so the main page is smaller, right? Why not add <script src="{script_src}"> rather than add that dynamically?

Because if we do that, the content will be loaded in all cases (if it's part of the DOM, it's loaded). So if we have 3 conflicts, we'll load the three items even though we only need one.

@jyn514
Copy link
Member

jyn514 commented Mar 29, 2021

Because if we do that, the content will be loaded in all cases (if it's part of the DOM, it's loaded). So if we have 3 conflicts, we'll load the three items even though we only need one.

Ok I see, I misunderstood how it works - it takes the entire page of the item and loads it in the current page using JS. That's also why it needs JS to work.

Rather than doing that, can you redirect to the disambiguated page? That shouldn't need any JS at all, just <meta http-refresh>, and avoids the issue with dynamic loading (which @pietroalbini will be happy about).

@est31
Copy link
Member

est31 commented Mar 29, 2021

I'd prefer not though because some parts (not much but still) of the documentation wouldn't work without JS enabled.

Which parts of the std libs documentation have collisions right now? It seems to me that they are quite rare.

@jyn514
Copy link
Member

jyn514 commented Mar 29, 2021

@est31 #80504 (it's in the PR description)

@GuillaumeGomez
Copy link
Member Author

Rather than doing that, can you redirect to the disambiguated page? That shouldn't need any JS at all, just <meta http-refresh>, and avoids the issue with dynamic loading (which @pietroalbini will be happy about).

And yet another thing I didn't know about. If it works, it'll be so awesome! Definitely gonna give it a try. If this work, I'll remove the split between sensitive and insensitive case like we currently have.

@GuillaumeGomez
Copy link
Member Author

I've been thinking even more about distributing the case insensitive by default, but it'd be problematic: on case sensitive file systems, we'll need to create file links to the file to handle all URLs (which isn't a problem). However, those file links might override the original file on the case insensitive systems. So independently of the solution suggested by @jyn514, how could we solve this problem?

@est31
Copy link
Member

est31 commented Mar 29, 2021

Just did a quick check in the browser console, apparently it's indeed the only doubling in the std docs:

{ let b = new Map(); searchIndex.std.i.map(function(v) {return [v[1].toLowerCase(), v[1], v[0]];}).filter(args => {let [lc, nc, v] = args; const k = lc+v; if (b.has(k)) { if (b.get(k) !== nc) { return true; } } b.set(k, nc); return false;});}

Outputs:

[
  [
    "self",
    "Self",
    21
  ]
]

@est31
Copy link
Member

est31 commented Mar 29, 2021

So independently of the solution suggested by @jyn514, how could we solve this problem?

One way might be to generate files for each case version, but each file generated contains a hidden div for each casing. On load, there is a little bit of js that looks at window.location.href and auto expands the correct div. That way, if the docs get messed up by a case insensitive file system, there is still a recovery path. To support non-js users, one can make the div that corresponds to the intended casing of the file be shown by default, and only be hidden by the js.

It's probably a nightmare though as I guess most of the content of the page has to be in that div :).

@jyn514
Copy link
Member

jyn514 commented Mar 29, 2021

I've been thinking even more about distributing the case insensitive by default, but it'd be problematic: on case sensitive file systems, we'll need to create file links to the file to handle all URLs (which isn't a problem). However, those file links might override the original file on the case insensitive systems. So independently of the solution suggested by @jyn514, how could we solve this problem?

I don't understand this problem, can you describe it more? How would it go wrong?

@GuillaumeGomez
Copy link
Member Author

Let's say you don't generate links on case sensitive file system: if you have a capital letter, the file won't be found. Which is why you need links if case sensitive. But links remain files, so on case insensitive, they might overwrite the "good" file (the conflict file like I call it). If that makes more sense?

@jyn514
Copy link
Member

jyn514 commented Mar 29, 2021

We discussed this in DMs - the issue is that keyword.self.html and keyword.Self.html both still need to exist on case-sensitive file systems. So we end up with the same issue as before, there needs to be a file in both cases, which breaks on case-insensitive file-systems. The only fix if we want docs generated on ext4 to work on NTFS is to change the names of the files we emit. @GuillaumeGomez is hoping to write up an RFC sometime soon.

@bors
Copy link
Contributor

bors commented Apr 2, 2021

☔ The latest upstream changes (presumably #83781) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514
Copy link
Member

jyn514 commented Apr 2, 2021

@GuillaumeGomez IIUC we're currently planning to take a different approach, right? Does it make sense to keep this PR open?

@GuillaumeGomez
Copy link
Member Author

No. I guess that the FS kind detection part will be useful though. :)

@GuillaumeGomez GuillaumeGomez deleted the case-insensitive branch April 2, 2021 17:15
@GuillaumeGomez GuillaumeGomez restored the case-insensitive branch September 1, 2021 10:24
@GuillaumeGomez GuillaumeGomez deleted the case-insensitive branch August 19, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-ui Area: rustdoc UI (generated HTML) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
8 participants