Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix regex for uBO (addresses https://github.com/DandelionSprout/adfil… #134630

Closed
wants to merge 1 commit into from

Conversation

Yuki2718
Copy link
Collaborator

@Yuki2718 Yuki2718 commented Nov 11, 2022

…t/discussions/163#discussioncomment-4117315)

Creating the pull request

Please include a summary of the change and which issue is fixed
If the related issue does not exist in our repository, please create it before making pull request
It is highly recommended to use our Web Reporting Tool instead of creating an issue on GitHub directly
Please note, that we verify every pull request manually, so it may take time to apply it

Prerequisites

To avoid invalid pull requests, please check and confirm following terms

  • This is not an ad/bug report;
  • My code follows the guidelines and syntax of this project;
  • I have performed a self-review of my own changes;
  • My changes do not break web sites, apps and files structure.

What problem does the pull request fix?

If the problem does not fall under any category that is listed here, please write a comment below in corresponding section

  • Missed ads or ad leftovers;
  • Website or app doesn't work properly;
  • AdGuard gets detected on a website;
  • Missed analytics or tracker;
  • Social media buttons — share, like, tweet, etc;
  • Annoyances — pop-ups, cookie warnings, etc;
  • Filters maintenance.

What issue is being fixed?

Enter the issue address

DandelionSprout/adfilt#163 (reply in thread)
It turned out uBO requires - to be escaped if it's at the end.

Terms

  • By submitting this issue, I agree that pull request does not contain private info and all conditions are met

@Alex-302 Alex-302 added the A: In progress Work on the issue is in progress label Nov 14, 2022
@Alex-302
Copy link
Member

Our regexes are valid. The most of filters have rules with -] in regexes.

@Yuki2718
Copy link
Collaborator Author

Yuki2718 commented Nov 14, 2022

Then how about NOT_PLATFORM(ext_ublock)? uBF has counterparts of these regex rules anyway.

The most of filters have rules with -] in regexes.

Yeah, found on Annoyances as well and I won't fix them all. Not too serious issue.

@scripthunter7
Copy link
Member

I think that this issue can also occur in the case of many filter lists, since -] is a valid regex, so this PR does not solve the problem itself. Maybe a better solution would be to replace -] with \-] internally before the regexp is passed to the analyzer.

@Yuki2718
Copy link
Collaborator Author

@scripthunter7
Copy link
Member

So move to https://github.com/AdguardTeam/FiltersCompiler ?

I think this issue is related to the uBlock parser:
https://github.com/gorhill/uBlock/blob/master/src/js/static-filtering-parser.js#L26

I guess uBO also uses lists that don't go through the AG's Filters Compiler, and those lists can easily use -] in regexps.

@Yuki2718
Copy link
Collaborator Author

@gorhill

@scripthunter7
Copy link
Member

By the way, here is the problematic part in the analyzer library:

https://github.com/foo123/RegexAnalyzer/blob/807f106914ed8b650d792d7a158f63040fd579c3/src/js/Regex.js#L1079-L1099

When line 1094 checks whether the currently iterated character is a hyphen, you need a lookahead in which you check whether the next character sequence is whitespaces + closing bracket.

But this seems like a rather outdated / low documented / low tested library, I think there are much more modern ones.

@scripthunter7
Copy link
Member

Without any warranty, something like this: https://github.com/scripthunter7/RegexAnalyzer/blob/master/src/js/Regex.js

@gorhill
Copy link

gorhill commented Nov 14, 2022

We discussed this a long time ago, I said I would fix this eventually, we should not bother AdGuard's repo just because I take long to fix stuff which is one thing among all the rest I need to work on.

@Yuki2718 Yuki2718 closed this Nov 14, 2022
@Yuki2718 Yuki2718 deleted the fix_regex_for_uBO branch November 14, 2022 13:55
gorhill added a commit to gorhill/uBlock that referenced this pull request Nov 17, 2022
Fixed flawed extraction of tokens with optional sequences, i.e.
when quantifier could be zero.
Related issue:
- uBlockOrigin/uBlock-issues#2367

Ignore look-around sequences as suggested when normalizing into
tokenizable string.
Related issue:
- uBlockOrigin/uBlock-issues#2368

Fix regex analyzer throwing with trailing `-` in character
class sequence.
Related issue:
- AdguardTeam/AdguardFilters#134630
@danny0838
Copy link

danny0838 commented Nov 17, 2022

@gorhill About the commit (I cannot leave a comment in the commit page, seems to be an issue of GitHub?):

I wonder if modifying the regex.js library without populating to the upstream is a good idea. Also this doesn't seem to deal with the [-abc] case.

I would recommend preprocessing regex strings by escaping starting and ending - with \- before sending to the analyzer instead, until the library is fixed.

@krystian3w
Copy link
Contributor

krystian3w commented Nov 21, 2022

(I cannot leave a comment in the commit page, seems to be an issue of GitHub?)

That is limitation to disable open issues in repository of add-on code.

Sometimes someone can turn it off - once I managed to give a reaction to PR and after a few days someone blocked it (undo/new actions was down).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: In progress Work on the issue is in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants