Add $path modifier of Adguard or any equivalents of this #1690

Yuki2718 · 2021-08-21T07:20:45Z

Prerequisites

I verified that this is not a filter issue (MUST be reported at filter issue tracker)
This is not a support issue or a question
I performed a cursory search of the issue tracker to avoid opening a duplicate issue
The issue is not present after wholly disabling uBlock Origin ("uBO") in the browser
I checked the documentation to understand that the issue I report is not a normal behavior

I tried to reproduce the issue when...

uBO is the only extension
uBO with default lists/settings
using a new, unmodified browser profile

Description

AdguardTeam/CoreLibs#124 (comment)
The modifier is useful at least on Japanese sites and blog.livedoor.jp is the best example where thousands of different structure of pages share the same domain. This resulted in tons of verbose filters in my personal lists to avoid false positive, otherwise a filter working for a page breaks another page. An actual case reported to me (https://jbbs.shitaraba.net/bbs/read.cgi/internet/25463/1598352715/206) is that blog.livedoor.jp###containerWrap > #container > .blog-title-outer + #content.hfeed made for http://blog.livedoor.jp/sexykpopidol/ breaks http://blog.livedoor.jp/kknao81/. Currently for this very reason I ignore some ad-placeholder on blog.livedoor.jp in AdGuard Japanese, which can fully be resolved if the modifier becomes available.

(Not worth your time but in https://blog.livedoor.com/ranking/ you can pick many blog.livedoor.jp sites with different structure each.)

A specific URL where the issue occurs

blog.livedoor.jp

Steps to Reproduce

NA

Expected behavior

NA

Actual behavior

NA

uBlock Origin version

1.37.2

Browser name and version

Irrelevant

Operating System and version

Irrelevant

The text was updated successfully, but these errors were encountered:

uBlock-user · 2021-08-21T07:37:58Z

We wouldn't need to use $shide to partially disable cosmetic filtering on google shopping pages(uBlockOrigin/uAssets#9642, uBlockOrigin/uAssets#9694) if we could use $path to narrow blocking google product ads to google search engine pages only.

Likewise for bing ?

gorhill · 2021-08-21T12:04:39Z

An actual case reported to me

This works now without having to further complicate the code base and neither have to wait:

blog.livedoor.jp##link[rel="alternate"][href*="sexykpopidol"]:upward(html) #containerWrap > #container > .blog-title-outer + #content.hfeed

Yuki2718 · 2021-08-21T12:14:45Z

@gorhill I already fixed the issue and the point is not that I can't address them within the current arsenal. I can fix each case, but every time the rules become more complex and verbose.

Yuki2718 · 2021-08-21T12:25:48Z

But okay it seems link[rel="alternate"][href*=...]:upward(html) can always be used in case of blog.livedoor.jp. Since in the current uBO procedural engine will only be triggered if the first matching node exists, it can hardly be a matter. How about another point made by @uBlock-user ?

gorhill · 2021-08-21T12:34:04Z

I could add a procedural operator like so:

blog.livedoor.jp##:match-path(/sexykpopidol) #containerWrap > #container > .blog-title-outer + #content.hfeed

This way implementation becomes quite trivial without having to further complicate the code base for all cosmetic filters. The disadvantage is that this is a procedural cosmetic filters, but it's opened to optimisation, i.e. a filtering engine could have special treatment for such filter in order to have them injected early along with other plain CSS selector-based filters.

Also an advantage is that we just keep reusing the same now familiar operator-based approach, which was the entire point of these operators, that is easy to extend with little to no impact on codebase complexity.

To calrify "code complexity", the suggested [path=...] approach by AdGuard adds quite more complexity to the parsing/storing/lookup of cosmetic filters, while a mere new procedural operator requires no refactoring, procedural operators were designed to naturally be extensible with no added widespread complication.

uBlock-user · 2021-08-21T12:38:26Z

Would it be possible to have whitelist approach with :match-path ? I would like simply skip a path such as /shop at google/bing.

gorhill · 2021-08-21T12:39:35Z

If argument to :match-path() can be a regex, than you can just use a negative lookahead?

Or even :not(:match-path(...)) ... should work I gather?

Yuki2718 · 2021-08-21T12:43:04Z

Sounds great! The reason AG use [path=...] is that they already use a similar [app=...] but probably they can transform [path=...] to :match-path() just like they do for other filters, so I have no problem.

uBlock-user · 2021-08-21T12:44:27Z

:not(:match-path(/shop)) would do.

gorhill · 2021-08-21T12:50:16Z

Trying to come out with a name I won't feel like changing in the future, please advise. Should it be:

:matches-path(...)
:match-path(...)
:path-has-text(...)
?

Keep in mind this is the list of supported operators for now: https://github.com/gorhill/uBlock/blob/b44d9219c321cf42379b1b7e8711728321ef0a16/src/js/static-filtering-parser.js#L1818-L1838. Consistency is important.

uBlock-user · 2021-08-21T12:57:41Z

Third can be changed to :has-path() as :path-has-text() sounds un-necessarilly longer.

Yuki2718 · 2021-08-21T13:04:30Z

Trying to come out with a name I won't feel like changing in the future, please advise. Should it be:

:matches-path(...)

:match-path(...)

:path-has-text(...)

?

Consistency is important.

I'm not English-native but it appears match-path is inconsistent with matches-css.

gorhill · 2021-08-21T13:07:40Z

I am also not English-native, and I wonder whether :matches-css() was the right choice, so I wouldn't want to repeat the same mistake, and if this was a mistake, I can always deprecate :matches-css() and instead make :match-css() official. I often make mistakes with those trailing s in English, so I am not a reference on when to properly use these.

uBlock-user · 2021-08-21T13:09:18Z

Both are fine, no need to change anything regarding :matches-css()

Related issue: - uBlockOrigin/uBlock-issues#1690 New procedural operator: `:matches-path(...)` Description: this is a all-or-nothing passthrough operator, which on/off behavior is dictated by whether the argument match the path of the current location. The argument can be either plain text to be found at any position in the path, or a literal regex against which the path is tested. Whereas cosmetic filters can be made specific to whole domain, the new `:matches-path()` operator allows to further narrow the specificity according to the path of the current document lcoation. Typically this procedural operator is used as first operator in a procedural cosmetic filter, so as to ensure that no further matching work is performed should there be no match against the current path of the current document location. Example of usage: example.com##:matches-path(/shop) p Will hide all `p` elements when visiting `https://example.com/shop/stuff`, but not when visiting `https://example.com/` or any other page on `example.com` which has no instance of `/shop` in the path part of the URL.

gorhill · 2021-08-21T14:09:12Z

:matches-path() will be in next dev build, 1.37.3b12.

If you look at the commit, you can see the required changes to the code were very little (even with an opportunistic unrelated change in there), meaning risk of regression is virtually zero -- so I far prefer this approach than supporting a new spec which would require a whole lot of work and assuming I would be fine with all the specs, I would work on this only in a far, undetermined future, with a lot of potential regression (including in performance) as a result.

Related issue: - uBlockOrigin/uBlock-issues#1690

uBlock-user · 2021-08-21T15:30:14Z

@gorhill on https://www.google.com/search?q=Under+Armour+Sportstyle+Logo+Tank&newwindow=1&sxsrf=ALeKk012vqIPWbQDpDef2IwbreyIoADXPw:1627344963115&source=lnms&tbm=shop&sa=X&ved=2ahUKEwin-s-z_IHyAhVPUxUIHW_cCCQQ_AUoAXoECBMQAw&biw=2560&bih=1297

Filter www.google.*##:not(:matches-path(/\/search\?q\=\*tbm=shop|\/shopping/)) [href^="/aclk"][href$="adurl="]:not(> *:upward([class^="sh-sr__shop-result-group"])):upward(3) gets applied even though we're excluding /search?q=*tbm=shop from the path via :not.

To repro add the following filters --

www.google.*##:not(:matches-path(/\/search\?q\=\*tbm=shop|\/shopping/)) [href^="/aclk"][href$="adurl="]:not(>  *:upward([class^="sh-sr__shop-result-group"])):upward(3)
www.google.*#@#[href^="/aclk"][href$="adurl="]:not(:scope > :upward([class^="sh-sr__shop-result-group"])):upward(3)
@@||www.google.*/search?q=*tbm=shop$shide,badfilter

and then visit https://www.google.com/search?q=Under+Armour+Sportstyle+Logo+Tank&newwindow=1&sxsrf=ALeKk012vqIPWbQDpDef2IwbreyIoADXPw:1627344963115&source=lnms&tbm=shop&sa=X&ved=2ahUKEwin-s-z_IHyAhVPUxUIHW_cCCQQ_AUoAXoECBMQAw&biw=2560&bih=1297

Am I falling short somewhere ?

gorhill · 2021-08-21T15:41:57Z

I missed this comment:

I guess we'd better cover path+query right away and not wait for it to be requested from us.

So I went with path only, but I will modify to match against path + query.

Related feedback: - uBlockOrigin/uBlock-issues#1690 (comment)

uBlock-user · 2021-08-21T16:09:57Z

Still happens on 1.37.3b13 build as per the above repro...

gorhill · 2021-08-21T16:24:43Z

You regex is:

/\/search\?q\=\*tbm=shop|\/shopping/

You are escaping the *, meaning you are trying to match a literal asterisk in the URL, while there is no such character. Is this more what you intended?

/^/search\?q=.*?tbm=shop|^/shopping/

uBlock-user · 2021-08-21T16:28:25Z

You are escaping the *, meaning you are trying to match a literal asterisk in the URL,

Yeah, need to correct that.

tbm=shop is not always accompanied with ? sometimes it's & btw...

Thanks, works now as expected.

gorhill · 2021-08-21T16:37:53Z

tbm=shop is not always accompanied with ? sometimes it's & btw...

The ? in q=.*?tbm=shop is not a literal ?, .*? means non-greedy match to make regex faster.

uBlock-user · 2021-08-21T16:45:30Z

Yeah, I realised it after I commented, not used to crafting regexes everyday, thanks for helping out.

gorhill · 2022-06-03T16:51:08Z

this won't work

You are using invalid syntax.

subdomain.example.com should be unaffected

Exclude subdomain.example.com:

example.com,~subdomain.example.com##...

uBlock-user added the enhancement New feature or request label Aug 21, 2021

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 21, 2021

Fix normalization of matches-path operator

c0c70e2

Related issue: - uBlockOrigin/uBlock-issues#1690

gorhill added a commit to gorhill/uBlock that referenced this issue Aug 21, 2021

Include query string when evaluating matches-path()

9d4006f

Related feedback: - uBlockOrigin/uBlock-issues#1690 (comment)

uBlock-user closed this as completed Aug 21, 2021

uBlock-user added the fixed issue has been addressed label Aug 21, 2021

Yuki2718 mentioned this issue Aug 23, 2021

Convert $path to :matches-path() for uBlock version of lists AdguardTeam/FiltersCompiler#117

Closed

Yuki2718 mentioned this issue Sep 7, 2021

blog.livedoor.jp AdguardTeam/AdguardFilters#93604

Closed

Yuki2718 referenced this issue in AdguardTeam/AdguardFilters Dec 14, 2022

Fix #137364 blog.livedoor.jp

18024a3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add $path modifier of Adguard or any equivalents of this #1690

Add $path modifier of Adguard or any equivalents of this #1690

Yuki2718 commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 •

edited

Loading

Yuki2718 commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Jun 3, 2022

Add $path modifier of Adguard or any equivalents of this #1690

Add $path modifier of Adguard or any equivalents of this #1690

Comments

Yuki2718 commented Aug 21, 2021 • edited Loading

Prerequisites

I tried to reproduce the issue when...

Description

A specific URL where the issue occurs

Steps to Reproduce

Expected behavior

Actual behavior

uBlock Origin version

Browser name and version

Operating System and version

uBlock-user commented Aug 21, 2021 • edited Loading

gorhill commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

gorhill commented Aug 21, 2021 • edited Loading

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 • edited Loading

Yuki2718 commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 • edited Loading

uBlock-user commented Aug 21, 2021

Yuki2718 commented Aug 21, 2021

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 • edited Loading

uBlock-user commented Aug 21, 2021 • edited Loading

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Aug 21, 2021 • edited Loading

uBlock-user commented Aug 21, 2021 • edited Loading

gorhill commented Aug 21, 2021

uBlock-user commented Aug 21, 2021

gorhill commented Jun 3, 2022

Yuki2718 commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading

gorhill commented Aug 21, 2021 •

edited

Loading

uBlock-user commented Aug 21, 2021 •

edited

Loading