Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create real parser for search queries #90630

Merged
merged 28 commits into from
Apr 21, 2022

Conversation

GuillaumeGomez
Copy link
Member

@GuillaumeGomez GuillaumeGomez commented Nov 5, 2021

You can test it here.

This PR adds a real parser for the query engine in rustdoc. The parser is quite simple but it allows to makes query handling much easier. I added a new testsuite to ensure it works as expected and ran fuzzing checks on it for a few hours without problems.

So about the parser: as you can see in the screenshot, it handles recursive generics parsing. It also allows to set which item should use exact matching by adding double-quotes around it (look for exact_search in the screenshot).

Now about the query engine itself: I simplified it a lot thanks to the parsed query. It behaves mostly the same when there is only one argument, but is much more powerful when there are more than one.

When making this change, we also removed the support for multi-query.

PS: A big part of the PR is tests and test-related code. :)

r? @camelid

@GuillaumeGomez GuillaumeGomez added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. A-type-based-search Area: Searching rustdoc pages using type signatures A-rustdoc-search Area: Rustdoc's search feature A-rustdoc-js Area: Rustdoc's front-end labels Nov 5, 2021
@rust-highfive
Copy link
Collaborator

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 5, 2021
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@GuillaumeGomez
Copy link
Member Author

Ok, seems like I fixed all tidy issues. :)

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wish you'd talked this over with the team before adding lots more custom syntax. I was under the impression you were fixing bugs, not adding entirely new features.

src/bootstrap/builder.rs Outdated Show resolved Hide resolved
Comment on lines 864 to 865
} else {
builder.info("No nodejs found, skipping \"src/test/rustdoc-js-parser\" tests");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not force-skip this. Make the default condition whether node is installed instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

src/librustdoc/html/static/js/main.js Outdated Show resolved Hide resolved
<code>* -&gt; vec</code>)",
"Search multiple things at once by splitting your query with comma (e.g., \
<code>str,u8</code> or <code>String,struct:Vec,test</code>)",
"You can look for functions based on its arguments (e.g., \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need special syntax? There's already a "function parameters" tab.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a special syntax or anything. Maybe I badly wrote it but it's simply that you can look at function arguments specifically if you want. Extra explanations:

item

This will look for item pretty much everywhere: in the items' name, in their arguments if they have any and in the returned type (if there is any).

item(arg)

It'll now look for item only in item's name and will look for arg only in arguments.

And so this:

(arg)

Will look for arg only in arguments. This is what I meant here.

@GuillaumeGomez
Copy link
Member Author

I really wish you'd talked this over with the team before adding lots more custom syntax. I was under the impression you were fixing bugs, not adding entirely new features.

Except I didn't add any new syntax in here... All this was already supported before in very hackish ways. Test it on the current docs and you'll see. I just wrote a proper parser over the already existing syntax.

@jyn514
Copy link
Member

jyn514 commented Nov 6, 2021

Except I didn't add any new syntax in here... All this was already supported before in very hackish ways.

... this seems pretty clearly not true? Rustdoc doesn't currently support double-quotes to mean "exact match", nor the tuple (u8, Vec) syntax, nor whatever the macro<f>:foo syntax is doing.

@GuillaumeGomez
Copy link
Member Author

Oh you're right: it doesn't support specifying arguments directly (with ()). Sorry, I was really sure it already did...

However, it supports double quotes, it supports name<generic> and it supports ->.

@camelid
Copy link
Member

camelid commented Nov 6, 2021

I have neither the time nor the knowledge of rustdoc's JS to review this, but I can give some feedback on the UI :)

@jyn514 I'm assigning to you for now since you've started reviewing, but feel free to re-assign. r? @jyn514

@rust-highfive rust-highfive assigned jyn514 and unassigned camelid Nov 6, 2021
@GuillaumeGomez
Copy link
Member Author

@camelid Then please tell what you think of the UI. ;)

@camelid
Copy link
Member

camelid commented Nov 6, 2021

To allow users to see how the queries are "understood" by the query engine, I added the following:

I really don't think we should be adding this. It exposes way too much of rustdoc's internals, and it'll likely confuse users. Instead, we should try to make rustdoc's search "Just Work" the way users expect.

@camelid
Copy link
Member

camelid commented Nov 6, 2021

Also, the search doesn't seem to be working? String -> PathBuf doesn't work.

@GuillaumeGomez
Copy link
Member Author

Also, the search doesn't seem to be working? String -> PathBuf doesn't work.

Indeed... If we keep this approach, I'll check what's wrong.

I really don't think we should be adding this. It exposes way too much of rustdoc's internals, and it'll likely confuse users. Instead, we should try to make rustdoc's search "Just Work" the way users expect.

This the big question: what do users expect? Maybe the approach I took here to allow to have more advanced check isn't the right way. Maybe we should simplify this as follows: generating the items "signature" and then simply run levenshtein on all of them and sort the result. But that approach would have so many downsides as well.

Or did you simply suggest to not display this?

@camelid
Copy link
Member

camelid commented Nov 6, 2021

Or did you simply suggest to not display this?

Yes, that's what I suggest :)

@GuillaumeGomez
Copy link
Member Author

Fine by me!

src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Show resolved Hide resolved
src/librustdoc/html/static/js/search.js Outdated Show resolved Hide resolved
@jsha
Copy link
Contributor

jsha commented Nov 7, 2021

Trying to understand quotes for exact match better. On current nightly, if you search for str, you'll get a lot of results, including substring matches like test::test::TestExecTime::to_string, test::test::TestResult, std::ffi::CStr, and std::ffi::OsStr:

https://doc.rust-lang.org/nightly/settings.html?search=str

If you add quotation marks, you'll only get matches where the last component of the path matches "str" case-insensitively: str (primitive type), alloc::str, core::str, std::str:

https://doc.rust-lang.org/nightly/settings.html?search=%22str%22

That seems like pretty reasonable behavior. Am I right in guessing that we use quotes for this exact match meaning because that's how Google indicates exact match? https://support.google.com/websearch/answer/2466433?hl=en. Google's use of quotes for exact match originates in text search: normally a search for [foo bar] means "return all documents that have foo somewhere and bar somewhere." But a search for ["foo bar"] means "return all documents that have foo immediately before bar". As stemming and other approximate-match searches became more prominent, it became useful for quotes to also mean "turn off stemming (etc)". So a search for ["foo"] means "return all documents that have foo, and foos, fooing, etc don't count."

It's useful to know that history, because our use case for quotes is different: we never have multi-word searches (so far), and we don't do stemming. We do do sub-token matching, which is not something traditional search engines usually do because it's fairly expensive. For instance str can match TestResult.

It looks like this PR intends to allow quotes for exact match in all positions. For instance, I could search for Vec<"&str">. That would find all types that include somewhere in their signature a Vec<&str>, or superstrings of Vec like Moravec<&str>, but not superstrings of str like Vec<&TestResult>.

That seems like overkill, feature-wise. Instead, we should say:

  • For simple searches containing only letters, we do substring matching (and levenshtein matching, which we do today).
  • However, as soon as the query get any more complicated, like searching for function signatures or including generics, we do exact match on each token.
  • For simple searches, substring matching can be overridden by putting a quote in the first position and in the last position.
  • Quotes are disallowed in any other position.

By the way, other reviewers: you may notice that searches on the demo link are slower to load than on doc.rust-lang.org. I believe that's because I haven't set up automatic compression on rustdoc.crud.net, so the search JS is larger than it would be in real life. I'll try to fix that later this week.

@jyn514 jyn514 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 8, 2021
@GuillaumeGomez
Copy link
Member Author

Updated (with the correct handling of whitespaces for type filter this time...).

@rust-log-analyzer

This comment has been minimized.

@notriddle
Copy link
Contributor

--- rustdoc.json	
+++ rustdoc-search-ref.json	
@@ -3,7 +3,7 @@
   "foundElems": 0,
   "original": "mod: :",
   "returned": [],
-  "typeFilter": 0,
+  "typeFilter": -1,
   "userQuery": "mod: :",
-  "error": null
+  "error": "ERROR"
 }
\ No newline at end of file
Error: "FAIL"
Query: mod: :

@GuillaumeGomez
Copy link
Member Author

And fixed this one as well.

@notriddle
Copy link
Contributor

Well, all four-token checks pass when run against the fuzzer. They seem to be parsing the same language now!

@GuillaumeGomez
Copy link
Member Author

Awesome! Thanks a lot for writing the eBNF fuzzer!

@notriddle
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Apr 20, 2022

📌 Commit 4d26bde has been approved by notriddle

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2022
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 21, 2022
Rollup of 5 pull requests

Successful merges:

 - rust-lang#90630 (Create real parser for search queries)
 - rust-lang#96193 ([fuchsia] Add implementation for `current_exe`)
 - rust-lang#96196 (Remove assertion that all paths in `ShouldRun` exist)
 - rust-lang#96228 (Fix locations for intrinsics impls and change to links)
 - rust-lang#96236 (Add an explicit `Span` field to `OutlivesConstraint`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 976c6b2 into rust-lang:master Apr 21, 2022
@rustbot rustbot added this to the 1.62.0 milestone Apr 21, 2022
@GuillaumeGomez GuillaumeGomez deleted the improve-rustdoc-search branch March 1, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-js Area: Rustdoc's front-end A-rustdoc-search Area: Rustdoc's search feature A-type-based-search Area: Searching rustdoc pages using type signatures S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.