Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a check for swapped words when we can't find an identifier #67849

Merged
merged 3 commits into from
Jan 8, 2020

Conversation

cjkenn
Copy link
Contributor

@cjkenn cjkenn commented Jan 4, 2020

Fixes #66968

Couple things here:

  1. The matches take the precedence of case insensitive match, then levenshtein match, then swapped words match. Doing this allows us to not even check for swapped words unless the other checks return None.
  2. I've assumed that the swapped words check is not held to the limits of the max levenshtein distance threshold (ie. we want to try and find a match even if the levenshtein distance is very high). This means that we cannot perform this check in the fold that occurs after the filter_map call, because the candidate will be filtered out. So, I've split this into two separate fold calls, and had to collect the original iterator into a vec so it can be copied (I don't think we want to change the function signature to take a vec or require the Copy trait). An alternative implemenation may be to remove the filter_map, fold over the entire iterator, and do a check against max_dist inside the relevant cases there.

r? @estebank

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 4, 2020
@rust-highfive

This comment has been minimized.

@Centril
Copy link
Contributor

Centril commented Jan 4, 2020

Please add some regression tests. :)

Copy link
Contributor

@estebank estebank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a src/test/ui/suggestions/ test.

Really happy to see this!

@@ -52,14 +52,16 @@ where
T: Iterator<Item = &'a Symbol>,
{
let max_dist = dist.map_or_else(|| cmp::max(lookup.len(), 3) / 3, |d| d);
let name_vec: Vec<&Symbol> = iter_names.collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid this collect by cloning the iterator instead and passing that to find_match_by_sorted_words, if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure how to do that. If we call cloned(), iter_names is moved and we can't use that iterator again, and we still can't use the cloned one twice. Is there another way to clone it so that we can still use the original iter_names afterwards?

@the8472
Copy link
Member

the8472 commented Jan 4, 2020

Similar to #66849, wouldn't it make sense to switch from the levenshtein distance to damerau-levenshtein or something similar that considers whole-word transpositions a distance of 1 instead of word-length.

Adding more and more alternative algorithms means that the benefit of each is mutually exclusive. Folding the operations into a single algorithm allows the detection of two or more different kinds of misspellings.

@estebank
Copy link
Contributor

estebank commented Jan 8, 2020

@the8472 I think that would make sense as follow up work to this.

@bors r+

@bors
Copy link
Contributor

bors commented Jan 8, 2020

📌 Commit e01e8b9 has been approved by estebank

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 8, 2020

fn find_match_by_sorted_words<'a>(iter_names: Vec<&'a Symbol>, lookup: &str) -> Option<Symbol> {
iter_names.iter().fold(None, |result, candidate| {
if sort_by_words(&candidate.as_str()) == sort_by_words(lookup) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you're only doing comparison on the sorted elements...

fn sort_by_words(name: &str) -> String {
let mut split_words: Vec<&str> = name.split('_').collect();
split_words.sort();
split_words.join("_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I wonder if it wouldn't be more efficient to return a Vec<&str> here.

That being said, this is only done in a case where we're already emitting an error, so it should be fine.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Jan 8, 2020
Add a check for swapped words when we can't find an identifier

Fixes rust-lang#66968

Couple things here:
1. The matches take the precedence of case insensitive match, then levenshtein match, then swapped words match. Doing this allows us to not even check for swapped words unless the other checks return `None`.
2. I've assumed that the swapped words check is not held to the limits of the max levenshtein distance threshold (ie. we want to try and find a match even if the levenshtein distance is very high). This means that we cannot perform this check in the `fold` that occurs after the `filter_map` call, because the candidate will be filtered out. So, I've split this into two separate `fold` calls, and had to collect the original iterator into a vec so it can be copied (I don't think we want to change the function signature to take a vec or require the `Copy` trait). An alternative implemenation may be to remove the `filter_map`, `fold` over the entire iterator, and do a check against `max_dist` inside the relevant cases there.

r? @estebank
bors added a commit that referenced this pull request Jan 8, 2020
Rollup of 10 pull requests

Successful merges:

 - #67774 (Try statx for all linux-gnu target.)
 - #67781 (Move `is_min_const_fn` query to librustc_mir.)
 - #67798 (Remove wrong advice about spin locks from `spin_loop_hint` docs)
 - #67849 (Add a check for swapped words when we can't find an identifier)
 - #67875 (Distinguish between private items and hidden items in rustdoc)
 - #67887 (`Option::{expect,unwrap}` and `Result::{expect, expect_err, unwrap, unwrap_err}` have `#[track_caller]`)
 - #67955 (rustdoc: Remove more `#[doc(cfg(..))]` duplicates)
 - #67977 (Updates for VxWorks)
 - #67985 (Remove insignificant notes from CStr documentation)
 - #68003 (ci: fix wrong shared.sh import for publish_toolstate)

Failed merges:

 - #67820 (Parse the syntax described in RFC 2632)
 - #67979 (Move `intravisit` => `rustc_hir` + misc cleanup)

r? @ghost
bors added a commit that referenced this pull request Jan 8, 2020
Rollup of 10 pull requests

Successful merges:

 - #67774 (Try statx for all linux-gnu target.)
 - #67781 (Move `is_min_const_fn` query to librustc_mir.)
 - #67798 (Remove wrong advice about spin locks from `spin_loop_hint` docs)
 - #67849 (Add a check for swapped words when we can't find an identifier)
 - #67875 (Distinguish between private items and hidden items in rustdoc)
 - #67887 (`Option::{expect,unwrap}` and `Result::{expect, expect_err, unwrap, unwrap_err}` have `#[track_caller]`)
 - #67955 (rustdoc: Remove more `#[doc(cfg(..))]` duplicates)
 - #67977 (Updates for VxWorks)
 - #67985 (Remove insignificant notes from CStr documentation)
 - #68003 (ci: fix wrong shared.sh import for publish_toolstate)

Failed merges:

 - #67820 (Parse the syntax described in RFC 2632)
 - #67979 (Move `intravisit` => `rustc_hir` + misc cleanup)

r? @ghost
@bors bors merged commit e01e8b9 into rust-lang:master Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On identifier not found, detect swapped words
6 participants