Add an ability to apply cosmetic rules to specific URLs only #124

Closed
ameshkov opened this issue Nov 28, 2017 · 23 comments

@ameshkov
Member

ameshkov commented Nov 28, 2017

Initially, it was discussed here:
AdguardTeam/AdguardForWindows#1992

We need the ability to limit cosmetic rules to specific locations or pages on websites.
To do this, we first need to be able to specify "modifiers" for cosmetic rules.

For instance, we could do something like this:
[path=/this/is/some/page]example.org##banner

In this example, the path modifier value is a basic filtering rule pattern: it accepts the same special characters (||, ^, *) and is matched against the page URL.
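A minimal Python sketch of how such a pattern could be turned into a regular expression, assuming the usual basic-rule meanings: || for the scheme plus any subdomains, | for a start or end anchor, * for any characters, and ^ for a separator (anything other than a letter, a digit, _, -, . or %, or the end of the URL). The name basic_pattern_to_regex is hypothetical, and this is not AdGuard's actual implementation.

```python
import re

# Anything outside this set (or the end of the URL) counts as a ^ separator.
_WORD_CHARS = r"A-Za-z0-9_\-.%"


def basic_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a basic-rule pattern (||, |, ^, *) into a compiled regex (sketch)."""
    prefix = suffix = ""
    if pattern.startswith("||"):
        # Scheme, "://", then optionally any subdomains ending with a dot.
        prefix = r"^[a-z][a-z0-9+.-]*://(?:[^/?#]*\.)?"
        pattern = pattern[2:]
    elif pattern.startswith("|"):
        prefix = "^"  # anchor at the very start of the URL
        pattern = pattern[1:]
    if pattern.endswith("|"):
        suffix = "$"  # anchor at the very end of the URL
        pattern = pattern[:-1]
    pieces = []
    for ch in pattern:
        if ch == "*":
            pieces.append(".*")
        elif ch == "^":
            pieces.append(f"(?:[^{_WORD_CHARS}]|$)")
        else:
            pieces.append(re.escape(ch))  # everything else is literal
    # Basic rules are case-insensitive by default.
    return re.compile(prefix + "".join(pieces) + suffix, re.IGNORECASE)
```

For the rule above, basic_pattern_to_regex("/this/is/some/page") finds a match anywhere in https://example.org/this/is/some/page, while a || pattern is pinned to the host part of the URL.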

What are your thoughts on this?

@sxgunchenko

sxgunchenko commented May 24, 2021

Here is a proposal for the path modifier specification:

path

path limits the rule application area to specific locations or pages on websites.

Syntax

path=pattern

where pattern is a path mask to which the rule is restricted. Its syntax and behaviour are pretty much the same as those of the pattern of basic rules. The special characters can also be used, except for || because it makes no sense in this case (see the examples below).

Please note that the path modifier matches the query string as well.
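Since the value is matched against the path together with the query string (the same part of the URL that goes into the HTTP request path), here is a small Python sketch of extracting that string from a page URL. The helper name path_modifier_target is hypothetical, and dropping the #fragment is an assumption, since fragments are never sent to the server.

```python
from urllib.parse import urlsplit


def path_modifier_target(url: str) -> str:
    """Return the path+query part of a page URL that a $path value is matched against.

    Assumption: the #fragment is ignored, and an empty path is treated as "/".
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        return f"{path}?{parts.query}"
    return path
```

For example, https://example.com/page.html?a=1#top yields /page.html?a=1, so a $path value can see the query string but not the fragment.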

The path modifier supports regular expressions in the same way the basic rules do.

Examples

  • [$path=page.html]##.textad - hides a div with a class textad at /page.html or /page.html?<query> or /sub/page.html or /another_page.html
  • [$path=/page.html]##.textad - hides a div with a class textad at /page.html or /page.html?<query> or /sub/page.html of any domain but not at /another_page.html
  • [$path=|/page.html]##.textad - hides a div with a class textad at /page.html or /page.html?<query> of any domain but not at /sub/page.html
  • [$path=/page.html|]##.textad - hides a div with a class textad at /page.html or /sub/page.html of any domain but not at /page.html?<query>
  • [$path=/page*.html]example.com##.textad - hides a div with a class textad at /page1.html or /page2.html or any other path matching /page<...>.html of example.com
  • [$domain=example.com,path=/page.html]##.textad - hides a div with a class textad at /page.html of example.com and all its subdomains but not at /another_page.html
  • [$path=/\\/(sub1|sub2)\\/page\\.html/]##.textad - hides a div with a class textad at both /sub1/page.html and /sub2/page.html of any domain (please note the escaped special characters, see https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#syntax-8)
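To make the examples above concrete, here is a minimal Python sketch of a $path matcher. It assumes the value has already been unescaped (so the last example arrives as /\/(sub1|sub2)\/page\.html/) and that it is matched against the path plus the query string. The function names are hypothetical; this illustrates the described semantics and is not AdGuard's code.

```python
import re

# Characters that do NOT count as a ^ separator (assumed to mirror basic rules).
_WORD_CHARS = r"A-Za-z0-9_\-.%"


def path_value_to_regex(value: str) -> re.Pattern:
    """Compile an (already unescaped) $path value into a regular expression.

      /.../  the inner text is used as a regular expression
      |...   anchored at the start of path+query
      ...|   anchored at the end of path+query
      *      any sequence of characters
      ^      a separator character or the end of the string
    Anything else is matched literally, anywhere in the string.
    """
    if len(value) > 1 and value.startswith("/") and value.endswith("/"):
        return re.compile(value[1:-1])
    prefix = suffix = ""
    if value.startswith("|"):
        prefix, value = "^", value[1:]
    if value.endswith("|"):
        suffix, value = "$", value[:-1]
    pieces = []
    for ch in value:
        if ch == "*":
            pieces.append(".*")
        elif ch == "^":
            pieces.append(f"(?:[^{_WORD_CHARS}]|$)")
        else:
            pieces.append(re.escape(ch))
    return re.compile(prefix + "".join(pieces) + suffix)


def path_matches(value: str, path_and_query: str) -> bool:
    """True if the $path value matches a string such as '/page.html?a=1'."""
    return path_value_to_regex(value).search(path_and_query) is not None
```

With this sketch, path_matches("/page.html|", "/sub/page.html") is True while path_matches("/page.html|", "/page.html?a=1") is False, because the end anchor applies to the whole path+query string.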

@ameshkov
Member Author

How would we escape [?

@sxgunchenko

Just the same, for example:
[$path=/\\/\[a|b\]\\/page\\.html/]...

@ameshkov
Member Author

Questions:

  1. Does url_path include the query string? If it does, we'd better rename it to [uri=/blahblah]. If it does not, path would do.
  2. Does it support wildcards and/or regular expressions?

@sxgunchenko

  1. I think the path modifier should not match the query string. If it's ok, I will add a note about it in the spec.

  2. As for regular expressions, they are described in the proposal:

If the value has the form /<url_path>/, <url_path> is treated as a regular expression.
Examples
...
[$path=/\/(sub1|sub2)\/page\.html/]##.textad

As for wildcards, they can be expressed with regular expressions, so I think we don't need to support them.

@ameshkov
Member Author

ameshkov commented Jun 1, 2021

  1. Is it due to some technical limitation? Most likely, at some point, someone will request this feature. Will we extend it with a new query parameter then? Is it better than having a single one?
  2. The lack of wildcards may seem strange to filter devs as they're accustomed to having them in basic rules.

@sxgunchenko

  1. No, it's not. I just thought that the query string is usually dynamic, so it makes no sense to match it. But that wouldn't be much of a problem if one used a wildcard or regular expression, so we can match the query string as well.
  2. Ok, I'll add it to the spec.

@ameshkov
Member Author

ameshkov commented Jun 1, 2021

  1. If there are no technical limitations, I guess we'd better cover path+query right away rather than wait for someone to request it.

@sxgunchenko

Ok, I've updated the spec #124 (comment)

@ameshkov
Member Author

ameshkov commented Jun 1, 2021

@sxgunchenko probably the modifier should be named differently then, since it now matches the URI.

@sfionov
Member

sfionov commented Jun 1, 2021

@ameshkov But it is taken entirely from the "path" HTTP field :) IMHO "uri" may be confusing because it typically means the entire URI, including the authority, and any URN like "urn:ietf:rfc:9000" is a URI too :)

@ameshkov
Member Author

ameshkov commented Jun 1, 2021

@sfionov well, okay then, let it be path after all.

@sxgunchenko

After review, the spec #124 (comment) has been updated

@DandelionSprout
Member

This would've been a game changer back in 2018, when humans still used Internet Explorer 11 to a small extent. That way I could've used entries like [$app=iexplore.exe]vg.no##.ExampleEntry. But appealing to IE setups has become an effort with diminishing returns in 2021, especially after they removed Flash support in it.

@mjethani

uri is also probably fine, because it's a relative URI. Maybe you want to support something like [uri=||bad-server.*.example.com/foo/]example.com##.annoying-ad. In other words, even though a full URL is not supported, in the future you might want to add support.

About the comma as a separator for multiple options: How would a literal comma be represented in this case?

Also, I was wondering whether someone might want support for exceptions, e.g. [uri=@@/foo/]example.com##.annoying-ad.

@mjethani

About the comma as a separator for multiple options: How would a literal comma be represented in this case?

The same question about a literal ] (closing bracket).

@uBlock-user

Would this syntax support a whitelist mode? For example, I would like to block Google product ads everywhere except on paths such as www.google.*/shopping/*, so the syntax would be something like [$domain=www.google.*,path=!/shopping]##.textad.

@ameshkov
Member Author

ameshkov commented Aug 23, 2021

@uBlock-user this can be achieved with two rules:

google.*##.textad
[$path=/shopping]google.*#@#.textad
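A rough Python sketch of how those two rules interact: the hiding rule applies everywhere on google.*, and the exception only applies where the path matches, so .textad stays hidden except under /shopping. CosmeticRule, domain_matches and selectors_to_hide are hypothetical names; the $path value is treated as a plain substring and the google.* wildcard check is deliberately simplified (a real implementation would consult the public suffix list).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CosmeticRule:
    domain: str                 # e.g. "google.*"
    selector: str               # e.g. ".textad"
    path: Optional[str] = None  # $path value (plain substring here); None = any path
    exception: bool = False     # True for #@# rules


def domain_matches(pattern: str, host: str) -> bool:
    """Simplified: "name.*" accepts any host containing the label "name." at a label start."""
    if pattern.endswith(".*"):
        base = pattern[:-1]  # "google.*" -> "google."
        return host.startswith(base) or ("." + base) in host
    return host == pattern or host.endswith("." + pattern)


def selectors_to_hide(rules: list, host: str, path_and_query: str) -> set:
    """Return the selectors that stay hidden after exception rules are applied."""
    applicable = [
        r for r in rules
        if domain_matches(r.domain, host)
        and (r.path is None or r.path in path_and_query)
    ]
    hidden = {r.selector for r in applicable if not r.exception}
    allowed = {r.selector for r in applicable if r.exception}
    return hidden - allowed


rules = [
    CosmeticRule("google.*", ".textad"),                                    # google.*##.textad
    CosmeticRule("google.*", ".textad", path="/shopping", exception=True),  # [$path=/shopping]google.*#@#.textad
]
```

Here selectors_to_hide(rules, "www.google.com", "/search?q=shoes") returns {".textad"}, while the same call with "/shopping?q=shoes" returns an empty set.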

@sxgunchenko can we add a unit test for this use case?

@sxgunchenko

can we add a unit test for this use case?

@ameshkov We already have one in the corresponding PR

@mjethani

In other words, even though a full URL is not supported, in the future you might want to add support.

I think it would be better to add a separate modifier that supports full URLs.

As for the special characters, if you want them to be a part of a modifier value, they must be escaped with \ (https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#syntax-8)
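A minimal Python sketch of parsing a [$...] prefix under that rule, assuming a backslash escapes whichever character follows it (so \, \] \[ and \\ all work). This is also how the earlier questions about a literal comma or ] could be handled. The function names are hypothetical, and this is not AdGuard's parser.

```python
def _split_unescaped(text: str, sep: str) -> list:
    """Split text on sep, skipping separators escaped with a backslash.

    Escape sequences are kept intact so that _unescape can handle them later.
    """
    parts, current, i = [], "", 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current += text[i:i + 2]
            i += 2
        elif text[i] == sep:
            parts.append(current)
            current = ""
            i += 1
        else:
            current += text[i]
            i += 1
    parts.append(current)
    return parts


def _unescape(value: str) -> str:
    """Remove the backslash in front of every escaped character."""
    out, i = "", 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            i += 1  # skip the backslash, keep the character after it
        out += value[i]
        i += 1
    return out


def parse_modifier_prefix(rule: str):
    """Split "[$name=value,...]rest" into ({name: value}, rest)."""
    if not rule.startswith("[$"):
        return {}, rule
    i = 2
    while i < len(rule):
        if rule[i] == "\\":
            i += 2  # an escaped character can never close the list
        elif rule[i] == "]":
            break
        else:
            i += 1
    else:
        raise ValueError("no unescaped ']' closes the modifier list")
    modifiers = {}
    for item in _split_unescaped(rule[2:i], ","):
        name, _, value = item.partition("=")
        modifiers[name] = _unescape(value)
    return modifiers, rule[i + 1:]
```

For instance, parsing the rule [$path=/\\/\[a|b\]\\/page\\.html/]##.textad gives the path value /\/[a|b]\/page\.html/ and leaves ##.textad as the rest of the rule.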

@mjethani

@sxgunchenko thanks.

The same question about a literal ] (closing bracket).

It also occurs to me now that the closing bracket would have to be percent-encoded in a URL, which means the value specified here would be encoded the same way. It would never occur in a filter between the [ and the ].

Are IPv6 addresses supported in cosmetic filters?

[2001:db8::8a2e:370:7334]##.ad

@sxgunchenko

It would never occur in a filter between the [ and the ]

It seems to depend on the software that makes the request. For example, browsers (at least Chrome and Firefox) won't encode the request to http://example.com/?some=1[2, whether it is entered in the address bar or encountered on an HTML page (like <script src="//example.net/[script].js"></script>).

Are IPv6 addresses supported in cosmetic filters?

They should be, but because of a bug they aren't matched
#1505
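Because both the modifier list and an IPv6 literal start with [, a parser has to tell them apart. A small Python sketch, assuming a [$ prefix always means modifiers and anything else in brackets counts as an IPv6 domain only if it parses as one; leading_ipv6_literal is a hypothetical name, not the code involved in the bug above.

```python
import ipaddress


def leading_ipv6_literal(rule: str):
    """If a cosmetic rule starts with an IPv6 literal like '[2001:db8::1]', return it.

    A '[$' prefix is a modifier list instead; bracketed text that is not a valid
    IPv6 address is treated as 'not an IPv6 literal' and None is returned.
    """
    if not rule.startswith("[") or rule.startswith("[$"):
        return None
    end = rule.find("]")
    if end == -1:
        return None
    try:
        return ipaddress.IPv6Address(rule[1:end])
    except ValueError:  # AddressValueError is a subclass of ValueError
        return None
```

With this check, [2001:db8::8a2e:370:7334]##.ad is recognised as an IPv6 domain, while [$path=/page.html]##.textad falls through to modifier parsing.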

@zhelvis

zhelvis commented Sep 29, 2021

@sxgunchenko I propose to update the specification:

[$path=/page.html|]##.textad - hides a div with a class textad at /page.html of any domain but not at /sub/page.html or /page.html?<query>

This example description is not correct. According to the docs, /sub/page.html will not be excluded.

Also it seems that using a $domain modifier with a list of domains is impractical. This can be added to the spec too.

@sxgunchenko

This example description is not correct

Yup, it seems to be a leftover from the previous description of the logic. The spec has been updated.

Also it seems that using a $domain modifier with a list of domains is impractical. This can be added to the spec too.

It is already there: https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#non-basic-rules-modifiers-domain

Projects
None yet
Development

No branches or pull requests

8 participants