Bundle local images in rustdoc output #3397

Open: wants to merge 3 commits into `master`.

`text/000-rustdoc-bundle-local-resources.md`: 118 additions, 0 deletions

Rustdoc: Bundle local images

- Feature Name: NONE
- Start Date: 2023-02-06
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#32104](https://github.com/rust-lang/rust/issues/32104)

# Summary
[summary]: #summary

This RFC proposes to allow the bundling of local images in rustdoc HTML output. A draft implementation is available as [#107640](https://github.com/rust-lang/rust/pull/107640).

# Motivation
[motivation]: #motivation

Doc authors want to produce docs that are consistent across local `cargo doc` output, `docs.rs`, and self-hosted docs. They would also like to include images (like logos and diagrams), and scripts (like KaTeX for rendering math symbols). Both doc authors and doc readers would like for those resources to not be subject to link-rot, which means it should be possible to build docs for an old version of a crate and have the images and scripts reliably available. Doc readers would like for `cargo doc` output to be rendered correctly by their browsers even when they are offline.

Right now, there are attributes that can set a logo and a favicon for documentation, but they must point to an absolute URL, which prevents bundling the logo and favicon in the source repository. Also, while `<script>` tags are allowed in rustdoc, they have a similar problem: if they load a script from some URL, that URL needs to be absolute or it won't work consistently across `cargo doc` and `docs.rs`. While using custom `<script>` tags is a valid use case, this RFC only focuses on images.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

This RFC proposes to allow rustdoc to include local images in the generated documentation by copying them into the output directory.

This would be done by allowing users to specify the path of a local resource file in doc comments. The resource file would be copied into a `doc.files` folder at the top level of the rustdoc output directory (at the same level as the `static.files` and `src` folders).

The only local resources considered will be the ones referenced with the markdown image syntax, `![resource title](path)`, where `path` is the path of the resource file (relative paths are resolved against the source file).

Member

How does this solve the `<script>` use case?

Member Author

It doesn't cover this case at all.

Member

hmm... then I guess it might be better not to mention it in the "Motivation" section, or explicitly mention that this RFC does not cover this case :)

Member Author (@GuillaumeGomez, Jun 2, 2023)

On this line it mentions "The only local resources considered will be the ones in the markdown image syntax". But it's true that a bit above we have "They would also like to include images (like logos and diagrams), and scripts", which is confusing. I'll remove this part. Otherwise, where is it not clear enough, so I can make it more obvious?

Member

Maybe something like "While using custom `<script>` tags is a valid use case, this RFC only focuses on the use case of dealing with images" at the end of the Motivation section.

Member Author

👍
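To make the image syntax described above concrete, here is a deliberately naive scanner that extracts the destination of each inline `![alt](dest)` image. It is a minimal sketch under stated assumptions: `image_destinations` is a hypothetical name, and it ignores escapes, code spans, nested brackets, reference-style images and `"title"` suffixes, all of which a real implementation would leave to a proper Markdown parser. Plain links such as `[text](url)` are not matched, because only `![` starts an image.

```rust
/// Returns the destination of every inline Markdown image (`![alt](dest)`)
/// found in `markdown`, in order of appearance.
///
/// Hypothetical helper: a naive scan that does not handle escapes, code spans,
/// nested brackets, reference-style images, or `"title"` suffixes.
fn image_destinations(markdown: &str) -> Vec<&str> {
    let mut found = Vec::new();
    let mut rest = markdown;
    while let Some(start) = rest.find("![") {
        let after_bang = &rest[start + 2..];
        // `](` ends the alt text and opens the destination.
        let Some(open) = after_bang.find("](") else { break };
        let dest_and_rest = &after_bang[open + 2..];
        // The first `)` closes the destination.
        let Some(close) = dest_and_rest.find(')') else { break };
        found.push(dest_and_rest[..close].trim());
        rest = &dest_and_rest[close + 1..];
    }
    found
}
```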


The path could be any relative or absolute file path. For example, to include an image generated by [`build.rs`](https://doc.rust-lang.org/cargo/reference/build-scripts.html), concatenate a path with the `OUT_DIR` environment variable:

```rust
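/// Using a local image
// `OUT_DIR` is only set when the crate has a build script (`build.rs`).
#[doc = concat!("![with absolute path](", env!("OUT_DIR"), "/local/image.png)")]
///
/// Using a local image ![with relative path](../local/image.png)
pub struct Example; // hypothetical item: doc comments must be attached to an item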
```
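Resolving a destination could look roughly like the sketch below: relative paths are interpreted against the directory of the source file that contains the doc comment, while absolute paths (such as one built from `OUT_DIR`) are used unchanged. `resolve_resource` is a hypothetical name, and a real implementation would presumably also need to normalize the result so that one file reached through different paths is treated as a single resource.

```rust
use std::path::{Path, PathBuf};

/// Resolves an image destination found in a doc comment of `source_file`.
/// Hypothetical helper: absolute paths are kept as-is; relative paths are
/// joined onto the directory that contains `source_file`.
fn resolve_resource(source_file: &Path, dest: &str) -> PathBuf {
    let dest = Path::new(dest);
    if dest.is_absolute() {
        dest.to_path_buf()
    } else {
        // `parent()` is `None` only for a root or empty path; fall back to "".
        source_file.parent().unwrap_or(Path::new("")).join(dest)
    }
}
```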

Since all local images are copied into the same folder, an image used by several crates won't be duplicated: every copy gets the same name and the same hash.

If the path doesn't refer to a file, a warning will be emitted and rustdoc will leave the path unchanged in the generated documentation.
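The fallback for paths that don't refer to a file can be sketched as a simple check on the resolved path. `ImageAction` and `bundle_or_keep` are hypothetical names, and the warning goes to stderr purely for illustration; rustdoc would presumably report it through its regular diagnostics instead.

```rust
use std::path::Path;

/// What to do with one image destination (hypothetical type).
#[derive(Debug, PartialEq)]
enum ImageAction {
    /// The file exists and should be copied into `doc.files`.
    Bundle,
    /// Not a file: leave the destination unchanged in the output.
    KeepUnchanged,
}

/// Hypothetical helper: decides whether `resolved` can be bundled, warning
/// (here via `eprintln!`) when it does not refer to a file.
fn bundle_or_keep(dest: &str, resolved: &Path) -> ImageAction {
    if resolved.is_file() {
        ImageAction::Bundle
    } else {
        eprintln!("warning: `{dest}` does not refer to a file; leaving it unchanged");
        ImageAction::KeepUnchanged
    }
}
```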
Will relative links to rustdoc pages that aren't intra-doc links trigger this warning? Warning for these is fine, but ideally it should say "this looks like an intra-doc link, use one instead."

Member Author

Sorry, I don't understand what you mean. Intra-doc links are a feature used on links, whereas this feature only applies to image links. For example:

```rust
/// ![image](../relative-link-to-image)
///
/// [intra_doc_link_type]
```


For published crates, `docs.rs` builds the contents of the `.crate` package in a sandbox with no internet access. Make sure any resources your docs need are [included](https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields) in the package.

Local resource files are not affected by the `--resource-suffix` option.

The impact on `docs.rs` would also be minimal, as the size of a published crate's resources is limited to a few megabytes. The only change needed would be to handle the new `doc.files` folder.

Member

For images this might be true, but I'm a bit worried about JS files. While docs.rs currently allows running arbitrary JS from third-party servers, the situation might become even worse if docs.rs itself served those potentially malicious files.

or am I missing something?

Member Author

The `![]` syntax generates an `<img>` HTML tag and nothing else. So the idea behind this RFC is to bundle the resource (if local) into the generated rustdoc output.

If you put a `.js` file in it, it'll be bundled and referenced from an `<img>` (so not displayed), and I suppose you'd then be able to load it with a `<script>` tag. But the same can already be done today with a file of any extension, loaded as JS in a `<script>`. So in this regard, nothing changes, I suppose?


Two existing options are affected by this RFC: the favicon and the logo, which can respectively be set through:

```rust
#![doc(html_favicon_url = "some_path", html_logo_url = "some_other_path")]
```

They will follow the same rule as for other images: if this is a local path, the local file will be copied and the paths to it will be rewritten.
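Deciding whether `html_favicon_url` or `html_logo_url` is "a local path" requires some rule for telling URLs apart from file paths. One plausible rule, assumed here rather than specified by this RFC, is to treat anything that starts with a URL scheme (`https:`, `data:`, and so on) or a protocol-relative `//` as remote, and everything else as a local path. `is_local_path` is a hypothetical name.

```rust
/// Hypothetical rule: a value is remote if it starts with `//` or with a URL
/// scheme (`scheme:`); otherwise it is treated as a local file path.
/// Single-letter "schemes" are treated as Windows drive letters (`C:\...`).
fn is_local_path(value: &str) -> bool {
    if value.starts_with("//") {
        return false; // protocol-relative URL, e.g. `//example.com/logo.png`
    }
    match value.split_once(':') {
        Some((scheme, _)) => {
            // RFC 3986 scheme: a letter, then letters, digits, `+`, `-` or `.`.
            let mut chars = scheme.chars();
            let valid_scheme = chars.next().is_some_and(|c| c.is_ascii_alphabetic())
                && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
            // A valid scheme of two or more characters means this is a URL.
            !(valid_scheme && scheme.len() > 1)
        }
        None => true,
    }
}
```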

To support `#[doc(inline)]` on items from other crates that use local resources, rustdoc will rely on the `-Zrustdoc-map` option.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

A new rustdoc pass will be added that goes through all documentation and gathers local resources into a map.

Then, during HTML generation, local resource paths will be replaced with paths pointing to their copies in the output directory.
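The pass and the rewriting step could be sketched as a map from each resolved source path to its bundled file name, plus a lookup that turns an entry into a link relative to the page being rendered. `ResourceMap` and `href` are hypothetical names, and so is the assumption that a page `depth` directories below the output root reaches `doc.files` through `depth` repetitions of `../`.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical map filled by the new pass: resolved source path of a local
/// resource -> its file name inside the top-level `doc.files` folder.
#[derive(Default)]
struct ResourceMap {
    bundled: HashMap<PathBuf, String>,
}

impl ResourceMap {
    /// Records a local resource found while walking the documentation.
    fn insert(&mut self, source: PathBuf, bundled_name: String) {
        self.bundled.insert(source, bundled_name);
    }

    /// Rewritten link for a page located `depth` directories below the output
    /// root (assumption: `my_crate/struct.Foo.html` has depth 1). Returns `None`
    /// for paths the pass did not record, which stay unchanged in the HTML.
    fn href(&self, source: &Path, depth: usize) -> Option<String> {
        let name = self.bundled.get(source)?;
        let prefix = "../".repeat(depth);
        Some(format!("{prefix}doc.files/{name}"))
    }
}
```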

Contributor

Suggested change
Then in HTML documentation generation, the local resources pathes will be replaced by their equivalent linking to the output directory instead.
Then in HTML documentation generation, the local resources paths will be replaced by their equivalent linking to the output directory instead.


Local resource files will be renamed as `{original filename}-{hash}{extension}`, where `{hash}` is computed from the file's content.
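The renaming scheme can be sketched as follows. The RFC does not name a hash function; this sketch assumes 64-bit FNV-1a only because it is tiny, dependency-free and fully specified, so identical content always yields the same name (which is what lets identical images from different crates share one file). `bundled_name` and `fnv1a_64` are hypothetical names, and treating `{original filename}` as the file stem and `{extension}` as including its leading dot are assumptions.

```rust
use std::path::Path;

/// 64-bit FNV-1a. Assumed here only because it is small and fixed: the same
/// bytes always hash to the same value, on every platform and Rust version.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical helper producing `{original filename}-{hash}{extension}`,
/// taking `{original filename}` to be the file stem and `{extension}` to
/// include its leading dot. Returns `None` for paths without a UTF-8 stem.
fn bundled_name(original: &Path, content: &[u8]) -> Option<String> {
    let stem = original.file_stem()?.to_str()?;
    let extension = match original.extension() {
        Some(ext) => format!(".{}", ext.to_str()?),
        None => String::new(),
    };
    let hash = fnv1a_64(content);
    Some(format!("{stem}-{hash:016x}{extension}"))
}
```

Because the hash depends only on the bytes, the same image referenced from two crates maps to one file name, while an edited image gets a new name and cannot be served stale from a cache.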

You can look at what the implementation could look like in [#107640](https://github.com/rust-lang/rust/pull/107640).

When an image is included in an item that gets inlined across a crate, rustdoc will treat it like a cross-crate intra-doc link, using `--extern-html-root-url` so that `docs.rs` can hotlink the image from the crate that holds a copy of the image. This has a few upsides and downsides compared to the approach where the image itself is copied into the crate with the inlined docs.

* reduces the number of duplicated images that `docs.rs` has to store
* doesn't require the source code for the source crate when inlining
* only requires storing the hash of the file in the `.rmeta`, not the whole image
* requires rustc to look at the doc comments and hash the image(s)

Contributor (@notriddle, May 18, 2023)

@jsha

While doing this, can it also produce a lint warning if the image is bigger than 50KiB or so?

Of course, any value chosen here will be arbitrary and contentious, but so is adding operational complexity to crates.io, docs.rs, and third parties that want to run rustdoc builds in sandboxes (how would hosting images separately fit into Bazel?)

Perhaps; or perhaps the lint should be something like "the total of all your non-code resources is >10% of the crate size." It's still not really satisfying though, since it doesn't fully solve the problem. If you want to include lots of resources, or large resources, the only solution is going to be to refer to them using absolute URLs.

Out of curiosity I spot-checked a couple of crates, regex and rayon. They're 248 kB and 169 kB respectively, so 10% would be 24.8 kB and 16.9 kB.

Also it's worth mentioning one counterpoint to my concern: source files that are bundled into .crate files are not minified and comments aren't removed. That's important specifically so docs can be generated from them (including the source file view within rustdoc). So in some sense published crates already do contain bytes that are only useful for documentation.

> So in some sense published crates already do contain bytes that are only useful for documentation.

Examples and tests are also arguably superfluous files — they benefit crater, and those who read the source code of the downloaded crate, but not the typical downloader.

Contributor

Doing it with percentages seems unnecessarily complicated for something that’s just an arbitrary metric pulled out of a hat anyway.

- as long as you’re using ordinary markdown, you won’t be able to use responsive images, so mobile viewers on metered data plans reading docs.rs will be downloading full-sized images
- docs.rs isn’t a very good image CDN, since it doesn’t reencode to WebP or anything
- anyone building docs locally will be downloading the image even if they don’t look at it

In other words, it’s not just about cargo build. Coping with large images is just a chore.

What I’m trying to do is gesture in the direction of “try to use small images.” Not just solve the problem for plain cargo builds.

Contributor

Random musings on cargo's behavior

To protect against zip bombs, cargo had a 512 MB limit when extracting crates and people ran into this (rust-lang/cargo#11151) and we updated it to take compression size into account (rust-lang/cargo#11337) so we are less likely to hit the 512 MB limit unintentionally.

We also now report crate size on package/publish (rust-lang/cargo#11270). One future possibility mentioned in that PR is to warn when binary files are included (rust-lang/cargo#9058). In general, cargo has been hesitant about adding warnings because we haven't had a way for people to disable them, but with `[lints]` (rust-lang/cargo#12115), we have the possibility to add `[lints.cargo]` support.

Contributor

We should be cognizant of the cost to users of including files that aren't strictly necessary:

- Slower download of the .crate file
- Extra disk space taken up (currently twice: once for the .crate and once for the extracted directory)
- Extra time to extract the .crate

Some things that would help (and are currently being discussed):

- Garbage collection of .crate files and extracted .crate files
- Building directly from .crate files (this is a bit more out there)

Dependent CI jobs will never use this content.

I wonder how many users are like me that almost exclusively use docs.rs rather than cargo doc and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.

Member Author

Just answering:

> I wonder how many users are like me that almost exclusively use docs.rs rather than cargo doc and any missing images wouldn't be a big deal. Or in other words, we should consider if the benefit to the community outweighs the cost to the community.

It'd be needed for docs.rs too unfortunately.

Contributor

> It'd be needed for docs.rs too unfortunately.

How so? I'm not seeing anything listed in the Motivation that benefits us docs.rs-exclusive users (speaking as someone who also has crates that include images).

Member Author

Unless you host the images on a website, you can't actually use them in the documentation generated by docs.rs. There are some ways to work around this limitation, for example generating the base64 equivalent of the image and including it directly in your crate's doc comment. But it's not great. Hence this RFC. But the problem, as you mentioned, is that it would impact the community by downloading more content that wouldn't actually be used.

A potential solution you mentioned ("Building directly from .crate files (this is a bit more out there)") would very likely lessen the negative impact. But it's better to take things slowly and be sure we don't miss anything. :)

* produces broken images when `cargo doc --no-deps` is run locally
* requires the URLs generated for these images to be stable across rustdoc versions

Embedding images in `.rmeta` files is probably a bad idea; requiring access to a dependency's source code when documenting a crate that depends on it would work poorly with some third-party build systems; and not supporting images in cross-crate inlined docs would be inconsistent and weird.
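The hotlinking approach described above could be sketched as a small URL builder. It assumes that `doc.files` sits directly under the documentation root of the crate that owns the image, both in a shared local output directory and under the root passed with `--extern-html-root-url`; `DocsLocation` and `image_url` are hypothetical names, and the docs.rs-style root in the checks is only an example value.

```rust
/// Where the documentation of the crate that owns an image lives.
/// Hypothetical type for this sketch.
enum DocsLocation<'a> {
    /// Same output directory as the current page, which is `depth`
    /// directories below the output root.
    Local { depth: usize },
    /// Elsewhere, at the root given by `--extern-html-root-url`.
    Extern { root: &'a str },
}

/// Hypothetical helper: URL of an image bundled as `bundled_name` in the
/// `doc.files` folder of the owning crate's documentation.
fn image_url(owner: &DocsLocation<'_>, bundled_name: &str) -> String {
    match owner {
        DocsLocation::Local { depth } => {
            let prefix = "../".repeat(*depth);
            format!("{prefix}doc.files/{bundled_name}")
        }
        DocsLocation::Extern { root } => {
            // Accept roots with or without a trailing slash.
            let root = root.trim_end_matches('/');
            format!("{root}/doc.files/{bundled_name}")
        }
    }
}
```

In the `Local` case, if the owning crate's docs were never generated (for example with `cargo doc --no-deps`), the link points at a file that does not exist, which matches the broken-image downside listed above.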

# Drawbacks
[drawbacks]: #drawbacks

Allowing local resources in rustdoc output could lead to large output if users include large resource files. This could slow down builds and increase the size of the generated documentation (especially with very large local resources).

Another problem is that people will add images to their published crates, increasing the package size even though the images are only used for documentation.

# Prior art
[prior-art]: #prior-art

- [sphinx](https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-latex_additional_files)
- [haddock](https://haskell-haddock.readthedocs.io/en/latest/invoking.html?highlight=image#cmdoption-theme): the documentation for this option states that local files in the given directory are copied into the generated output directory.
- [doxygen](https://doxygen.nl/manual/commands.html#cmdimage): supported through `\image`.
- [embed-doc-image](https://docs.rs/embed-doc-image/latest/embed_doc_image/): a proc-macro-based approach that embeds the image directly into the generated documentation as a base64 string.

Another approach to this feature:

[ePUB packages](https://www.w3.org/publishing/epub3/epub-packages.html#sec-pkg-manifest) use explicitly declared manifests. This allows a fallback chain (trying each resource for an entry until an available one is found).

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Currently, users must either reference resources through external URLs or, where possible (for example with `svg` images), inline them directly into the documentation. External URLs avoid the problem of large output files, but they require users to upload their resources to a web server to make them available everywhere.

# Unresolved Questions
[unresolved-questions]: #unresolved-questions

- Should we put a size limit on the local resources?
- Should we keep the original local resource filename instead of just using a number?
- Should we use this feature for the logo if it's a local file?
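One possible answer to the size-limit question, along the lines floated in the review discussion, is a lint rather than a hard limit. The sketch below flags resources over a per-file threshold; the 50 KiB value only echoes the "50KiB or so" mentioned in review and is not a decided number, and `MAX_RESOURCE_BYTES` and `oversized_resources` are hypothetical names.

```rust
/// Hypothetical threshold, echoing the "50KiB or so" suggested in review.
const MAX_RESOURCE_BYTES: u64 = 50 * 1024;

/// Returns the names of resources whose size exceeds `limit` bytes, so that a
/// lint could warn about each of them. `resources` pairs a name with its size.
fn oversized_resources<'a>(resources: &[(&'a str, u64)], limit: u64) -> Vec<&'a str> {
    resources
        .iter()
        .filter(|&&(_, size)| size > limit)
        .map(|&(name, _)| name)
        .collect()
}
```

A per-file threshold is the simpler of the two options raised in review; the alternative of comparing total resource size with the crate size would need the packaged crate size as an extra input.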

# Possible extensions
[possible-extensions]: #possible-extensions

This feature could be extended to HTML elements that reference local resources. It would require parsing HTML tag attributes. For example:

```rust
/// <video src="../some-video.mp4"></video>
```