feat(YouTube - Keyword filter): Add syntax to match whole keywords and not substrings #3592

LisoUseInAIKyrios
Contributor

@LisoUseInAIKyrios LisoUseInAIKyrios commented Aug 28, 2024

Adds syntax to match keywords on word boundaries.
This syntax allows short keywords such as ai, which previously could not be used.

How to use:

Surround any keyword or phrase with double quotes, such as:

Keyword: "ai"

Example videos that will be hidden:

  • Is AI bad?
  • Are you getting on AI's hype train?
  • New #ai1337 model release!

But unrelated videos will not be hidden:

  • Guide to DMCA fair use
  • What air quality means to you
     

Keyword: "vice news"

Example videos that will be hidden:

  • The bankruptcy of Vice News

But unrelated videos will not be hidden:

  • Today's Advice Newscast

 
If you have keywords in languages that do not use spaces between words (Chinese, Japanese, Burmese, etc.), then using whole word syntax is not recommended. For all other languages it is recommended whenever possible, since the filtering is much more accurate.

Using quotes will interfere with most pluralized words, as "fox" will no longer match foxes, so multiple variations may be needed for some words. However, punctuation and other non-letters are ignored, so "fox" will match and hide fox's, fox?, and #fox123.
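
The word-boundary rule described above can be sketched roughly as follows. This is a minimal Python illustration only, not the code from this PR. The function name `matches_whole_word` is hypothetical, and the sketch assumes that a keyword counts as a whole word when the characters directly before and after it are not letters, which is consistent with the examples listed above.

```python
def matches_whole_word(title: str, keyword: str) -> bool:
    """Return True if keyword occurs in title with no letter directly
    before or after it (hypothetical helper, case-insensitive)."""
    text = title.lower()
    word = keyword.lower()
    if not word:
        return False
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # A boundary is the start/end of the text or any non-letter
        # character (space, punctuation, digit, '#', ...).
        before_ok = start == 0 or not text[start - 1].isalpha()
        after_ok = end == len(text) or not text[end].isalpha()
        if before_ok and after_ok:
            return True
        # Keep scanning: a later occurrence may sit on word boundaries.
        start = text.find(word, start + 1)
    return False
```

In this sketch `matches_whole_word("New #ai1337 model release!", "ai")` is true while `matches_whole_word("Guide to DMCA fair use", "ai")` is false. Python's `str.isalpha()` also treats CJK characters as letters, so a keyword embedded in an unspaced sentence would not match here, which mirrors the caveat about languages that do not use spaces between words.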

Integration changes

@oSumAtrIX
Member

The user may not intuitively understand this behavior. The description should mention that only entire words are matched. Perhaps it is possible to add a toggle to control whether to match only whole words or also substrings.

@LisoUseInAIKyrios
Contributor Author

LisoUseInAIKyrios commented Aug 28, 2024

Yes, the description should mention that both singular and plural keywords can be added.

I thought about adding a toggle to enable/disable whole word matching. But if it is toggled off, then it's back to the old 3 character minimum length, and the under 3 character toast or the settings description would be a lot more complicated. The alternative is to remove the character limit even when whole word matching is off, but that would likely hide far too much without the user realizing it.

I think requiring plurals might be the simplest, and a side effect of that is always using the more accurate keyword matching.

…erence into two sections to reduce text clipping of translations
@LisoUseInAIKyrios
Contributor Author

I realized that whole word matching will act weird or fail when used with Chinese and other languages that do not normally put spaces between words.

So whole word matching should be an on/off toggle.

@LisoUseInAIKyrios LisoUseInAIKyrios changed the title feat(YouTube - Keyword filter): Remove keyword minimum length. Match whole keywords and not substrings feat(YouTube - Keyword filter): whole keywords and not substrings Aug 29, 2024
@LisoUseInAIKyrios LisoUseInAIKyrios changed the title feat(YouTube - Keyword filter): whole keywords and not substrings feat(YouTube - Keyword filter): Add option to match whole keywords and not substrings Aug 29, 2024
@KobeW50
Contributor

KobeW50 commented Aug 29, 2024

Maybe whole word matching can be set with an operator, such as %, similar to how the $ and ^ operators were used in #2682

This will allow users to specify which words should be filtered only by whole-word matching and which should be filtered through the general matching scheme. This solves the issue with languages that do not use spaces between words, and means users don't need to add the plural version of any word that they don't use whole-word matching on.

Example:

%ai (will only filter "ai" and not a word such as "rain")
robot (will filter "robot" and "robots")

@LisoUseInAIKyrios
Contributor Author

LisoUseInAIKyrios commented Aug 29, 2024

@KobeW50 Yeah that sounds interesting.

Would need to use syntax that nobody would ever want as part of a keyword itself (or would need to support escaping the syntax).

How would it apply to a multi-word phrase rather than a single word, such as ai show, where you don't want to hide unrelated stuff like Thai showbiz?
 

Edit: The more complex quoting scheme below was not pursued, and only simple outer quotes were used.

Maybe use quotes as that is easy to understand and it implies a literal meaning. It also would allow matching each side of a multi-word phrase:

"ai" hides only ai.
It does not match any substring of another word.

"ai show" hides only ai show.
It does not match fail show, ai showbiz, or anything else.

"ai" show hides ai show, ai shows, ai showbiz, etc.
But does not match fail show.

ai "show" hides fail show, Thai show, etc.
But does not match ai showbiz.

Quotes that are not at the start or end of a phrase would be ignored, since they would effectively do nothing.
 

Or just keep it really simple and only support quotes around the entire word/phrase. That might be the simplest and maybe all that's needed. Only allowing quotes around the entire word/phrase also removes the issue of supporting literal quotes inside a keyword, because then "I love "air quotes"" would match the exact string I love "air quotes".

@LisoUseInAIKyrios
Contributor Author

I'll give the quotes idea a go and see how that works.

@oSumAtrIX
Member

oSumAtrIX commented Aug 29, 2024

You can employ escaping in order to allow the user to use quotes as the string to filter.

Escaping " with \ could simply be treated as part of the string to filter.
In the ReVanced CLI dev branch I have written a parser (see the test package) to convert strings into any primitive type and lists of these types. There I used " to treat integers as literal strings, for example, but using \ I can escape the " and count it as part of the value.
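
The escaping idea could look something like the sketch below. It is purely illustrative Python and is not the parser from ReVanced CLI. The name `parse_escaped_keyword` is hypothetical, and treating `\"` as the only escape sequence is an assumption made for this example.

```python
def parse_escaped_keyword(raw: str) -> tuple[str, bool]:
    """Return (keyword, whole_word) for a user-entered string.

    Assumption for this sketch: unescaped outer quotes enable whole-word
    matching, and a backslash-escaped quote is a literal quote character.
    """
    whole_word = (
        len(raw) > 2
        and raw.startswith('"')
        and raw.endswith('"')
        and not raw.endswith('\\"')  # a trailing \" is literal, not syntax
    )
    body = raw[1:-1] if whole_word else raw
    # Turn every escaped quote into a plain quote in the final keyword.
    return body.replace('\\"', '"'), whole_word
```

With this sketch, `"ai"` becomes the keyword `ai` with whole-word matching, while `\"ai\"` becomes the literal keyword `"ai"` with ordinary substring matching.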

@LisoUseInAIKyrios
Contributor Author

I will try using quotes around an entire word/phrase, as I think the selective whole word matching is too complex and not needed.

If quotes on the outside are the only valid place for syntax, then there is no need for escaping because plain quotes anywhere else are left as-is and treated as part of the keyword/phrase.
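
A minimal sketch of that outer-quotes-only rule, written in Python for illustration and not taken from the merged code. The name `parse_keyword` is hypothetical, and rejecting an empty quoted string is an assumption of this sketch.

```python
def parse_keyword(raw: str) -> tuple[str, bool]:
    """Split a user-entered keyword into (phrase, whole_word).

    Only a quote at the very start and very end is treated as syntax;
    quotes anywhere else stay part of the phrase, so no escaping is needed.
    """
    if len(raw) > 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1], True
    return raw, False
```

For example, `parse_keyword('"I love "air quotes""')` yields the phrase `I love "air quotes"` with whole-word matching enabled, while `parse_keyword('say "hi" now')` is returned unchanged as a plain substring keyword.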

@LisoUseInAIKyrios LisoUseInAIKyrios changed the title feat(YouTube - Keyword filter): Add option to match whole keywords and not substrings feat(YouTube - Keyword filter): Add syntax to match whole keywords and not substrings Aug 29, 2024
@LisoUseInAIKyrios LisoUseInAIKyrios merged commit f5fb351 into ReVanced:dev Aug 30, 2024
2 checks passed
@LisoUseInAIKyrios LisoUseInAIKyrios deleted the feat/keyword_filters_whole_words branch August 30, 2024 21:39
revanced-bot pushed a commit that referenced this pull request Aug 30, 2024
# [4.14.0-dev.3](v4.14.0-dev.2...v4.14.0-dev.3) (2024-08-30)

### Features

* **YouTube - Keyword filter:** Add syntax to match whole keywords and not substrings ([#3592](#3592)) ([f5fb351](f5fb351))
E85Addict pushed a commit to E85Addict/revanced-patches that referenced this pull request Sep 9, 2024
# [4.14.0-dev.1](v4.13.3...v4.14.0-dev.1) (2024-09-09)

### Bug Fixes

* **Pixiv - Hide ads:** Fix for latest version ([ReVanced#3616](https://github.com/E85Addict/revanced-patches/issues/3616)) ([98956e8](98956e8))
* **SwissID:** Rename `Remove Google Play Integrity Integrity check` to `Remove Google Play Integrity check` ([ReVanced#3558](https://github.com/E85Addict/revanced-patches/issues/3558)) ([0f5a771](0f5a771))
* **YouTube - ReturnYouTubeDislike:** Show estimated like count for videos with hidden likes ([ReVanced#3601](https://github.com/E85Addict/revanced-patches/issues/3601)) ([005be82](005be82))
* **YouTube - SponsorBlock:** Handle if the user enters an invalid number into any SB settings ([37b3dd1](37b3dd1))

### Features

* Add `Change data directory location` patch ([ReVanced#3602](https://github.com/E85Addict/revanced-patches/issues/3602)) ([5998029](5998029))
* Add `Check environment` patch ([ReVanced#3610](https://github.com/E85Addict/revanced-patches/issues/3610)) ([fbcbdaf](fbcbdaf))
* **Duolingo:** Add `Disable ads` and `Enable debug menu` patch ([ReVanced#3422](https://github.com/E85Addict/revanced-patches/issues/3422)) ([d0a8599](d0a8599))
* **YouTube - Keyword filter:** Add syntax to match whole keywords and not substrings ([ReVanced#3592](https://github.com/E85Addict/revanced-patches/issues/3592)) ([f5fb351](f5fb351))
* **YouTube - Spoof client:** Allow forcing AVC codec with iOS ([ReVanced#3570](https://github.com/E85Addict/revanced-patches/issues/3570)) ([1a49d1f](1a49d1f))
* **YouTube:** Support versions 19.17 thru 19.30 ([a69c4f3](a69c4f3))
revanced-bot pushed a commit that referenced this pull request Sep 18, 2024
# [4.14.0](v4.13.3...v4.14.0) (2024-09-18)

### Bug Fixes

* **Pixiv - Hide ads:** Fix for latest version ([#3616](#3616)) ([98956e8](98956e8))
* **Soundcloud - Hide ads:** Support latest version ([#3628](#3628)) ([66e7e33](66e7e33))
* **SwissID:** Rename `Remove Google Play Integrity Integrity check` to `Remove Google Play Integrity check` ([#3558](#3558)) ([0f5a771](0f5a771))
* **YouTube - ReturnYouTubeDislike:** Show estimated like count for videos with hidden likes ([#3601](#3601)) ([005be82](005be82))
* **YouTube - SponsorBlock:** Add summary text to 'view my segments' button ([df80b9f](df80b9f))
* **YouTube - SponsorBlock:** Handle if the user enters an invalid number into any SB settings ([37b3dd1](37b3dd1))
* **YouTube:** Fix issues related to playback by replace streaming data ([#3582](#3582)) ([dfa94d7](dfa94d7))

### Features

* Add `Change data directory location` patch ([#3602](#3602)) ([5998029](5998029))
* Add `Check environment` patch ([#3610](#3610)) ([fbcbdaf](fbcbdaf))
* **Duolingo:** Add `Disable ads` and `Enable debug menu` patch ([#3422](#3422)) ([d0a8599](d0a8599))
* **Sync for Reddit:** Add `Fix /user/ endpoint` patch ([46d11f3](46d11f3))
* **Sync for Reddit:** Rename patch to `Use /user/ endpoint` ([98ead49](98ead49))
* **YouTube - Hide Shorts components:** Hide 'Use this sound' button ([#3647](#3647)) ([33fc090](33fc090))
* **YouTube - Keyword filter:** Add syntax to match whole keywords and not substrings ([#3592](#3592)) ([f5fb351](f5fb351))
* **YouTube - Spoof client:** Allow forcing AVC codec with iOS ([#3570](#3570)) ([1a49d1f](1a49d1f))
* **YouTube Music:** Make working patches compatible with latest versions ([#3556](#3556)) ([12f6f19](12f6f19))
* **YouTube:** Add donation link to settings about screen ([#3626](#3626)) ([0684ab5](0684ab5))
E85Addict pushed a commit to E85Addict/revanced-patches that referenced this pull request Sep 18, 2024
# [4.14.0](v4.13.3...v4.14.0) (2024-09-18)

### Bug Fixes

* **Pixiv - Hide ads:** Fix for latest version ([ReVanced#3616](https://github.com/E85Addict/revanced-patches/issues/3616)) ([98956e8](98956e8))
* **Soundcloud - Hide ads:** Support latest version ([ReVanced#3628](https://github.com/E85Addict/revanced-patches/issues/3628)) ([66e7e33](66e7e33))
* **SwissID:** Rename `Remove Google Play Integrity Integrity check` to `Remove Google Play Integrity check` ([ReVanced#3558](https://github.com/E85Addict/revanced-patches/issues/3558)) ([0f5a771](0f5a771))
* **YouTube - ReturnYouTubeDislike:** Show estimated like count for videos with hidden likes ([ReVanced#3601](https://github.com/E85Addict/revanced-patches/issues/3601)) ([005be82](005be82))
* **YouTube - SponsorBlock:** Add summary text to 'view my segments' button ([df80b9f](df80b9f))
* **YouTube - SponsorBlock:** Handle if the user enters an invalid number into any SB settings ([37b3dd1](37b3dd1))
* **YouTube:** Fix issues related to playback by replace streaming data ([ReVanced#3582](https://github.com/E85Addict/revanced-patches/issues/3582)) ([dfa94d7](dfa94d7))

### Features

* Add `Change data directory location` patch ([ReVanced#3602](https://github.com/E85Addict/revanced-patches/issues/3602)) ([5998029](5998029))
* Add `Check environment` patch ([ReVanced#3610](https://github.com/E85Addict/revanced-patches/issues/3610)) ([fbcbdaf](fbcbdaf))
* **Duolingo:** Add `Disable ads` and `Enable debug menu` patch ([ReVanced#3422](https://github.com/E85Addict/revanced-patches/issues/3422)) ([d0a8599](d0a8599))
* **Sync for Reddit:** Add `Fix /user/ endpoint` patch ([46d11f3](46d11f3))
* **Sync for Reddit:** Rename patch to `Use /user/ endpoint` ([98ead49](98ead49))
* **YouTube - Hide Shorts components:** Hide 'Use this sound' button ([ReVanced#3647](https://github.com/E85Addict/revanced-patches/issues/3647)) ([33fc090](33fc090))
* **YouTube - Keyword filter:** Add syntax to match whole keywords and not substrings ([ReVanced#3592](https://github.com/E85Addict/revanced-patches/issues/3592)) ([f5fb351](f5fb351))
* **YouTube - Spoof client:** Allow forcing AVC codec with iOS ([ReVanced#3570](https://github.com/E85Addict/revanced-patches/issues/3570)) ([1a49d1f](1a49d1f))
* **YouTube Music:** Make working patches compatible with latest versions ([ReVanced#3556](https://github.com/E85Addict/revanced-patches/issues/3556)) ([12f6f19](12f6f19))
* **YouTube:** Add donation link to settings about screen ([ReVanced#3626](https://github.com/E85Addict/revanced-patches/issues/3626)) ([0684ab5](0684ab5))

### Performance Improvements

* Personal Logo && Add upstream sync ([2e4ef0a](2e4ef0a))
Development

Successfully merging this pull request may close these issues.

feat(YouTube - Keyword filter): Allow two letter keywords
3 participants