Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Decode regexFilter matches in Declarative Net Request as query parameters #636

Open
oliverdunk opened this issue Jun 11, 2024 · 7 comments
Labels
needs-triage: safari Safari needs to assess this issue for the first time supportive: chrome Supportive from Chrome supportive: firefox Supportive from Firefox topic: dnr Related to declarativeNetRequest

Comments

@oliverdunk
Copy link
Member

oliverdunk commented Jun 11, 2024

As mentioned in #302, many sites add an additional redirect to external links. These interstitials often include the destination URL as a query parameter with URI component encoding:

Examples

https://www.google.com/url?q=https%3A%2F%2Fexample.com%2F%3Ftest
https://l.facebook.com/l.php?u=https%3A%2F%2Fexample.com%2F%3Ftest
https://steamcommunity.com/linkfilter/?u=https%3A%2F%2Fexample.com%2F%3Ftest
https://forum.donanimhaber.com/externallinkredirect?url=https%3A%2F%2Fexample.com%2F%3Ftest

Current Situation

A simple browser.declarativeNetRequest rule to skip these redirects would be as follows:

{
  "id": 1,
  "action": {
    "type": "redirect",
    "redirect": {
      "regexSubstitution": "\\1"
    }
  },
  "condition": {
    "regexFilter": "https://steamcommunity.com/linkfilter/\\?u=([^&]*).*",
    "resourceTypes": ["main_frame"]
  }
}

However, this does not decode the query parameter and therefore fails to properly redirect. For example, visiting https://steamcommunity.com/linkfilter/?u=https://example.com?test%26hello does not properly decode the %26 in the destination URL.

Proposal

Add a decodeRegexMatches property in the Redirect or RuleCondition dictionaries. This would process each match through decodeURIComponent before performing the regexSubstitution.

Limitations

This does not address all possible types of encoding. However, I wasn't able to find many exceptions - only one in fact - and this covers the sites which extensions like Privacy Badger were handling in Manifest V2.

Ideally we would have a more complete API, supporting things like only decoding certain matches or applying different types of processing, perhaps in a given order. However, this proposal was written with the hope of keeping the implementation complexity down so this can move forward sooner.

@github-actions github-actions bot added needs-triage: chrome Chrome needs to assess this issue for the first time needs-triage: firefox Firefox needs to assess this issue for the first time needs-triage: safari Safari needs to assess this issue for the first time labels Jun 11, 2024
@oliverdunk oliverdunk added supportive: chrome Supportive from Chrome and removed needs-triage: chrome Chrome needs to assess this issue for the first time labels Jun 11, 2024
@ghostwords
Copy link

Thanks for the proposal! This is a solution to the problem in #302.

Add a decodeRegexMatches property in the Redirect or RuleCondition dictionaries.

decodeRegexMatches should probably live together with regexSubstitution, because that's the one place it matters. So, add the property to the Redirect dictionary only?

Also, for some future-proofing, we should have decodeRegexMatches take an array of strings, where at the moment there is one possible value:

    "redirect": {
      "regexSubstitution": "\\1",
      "decodeRegexMatches": ["decodeURIComponent"]
    }

This looks good to me otherwise.

@ghostwords
Copy link

This would process each match through decodeURIComponent before performing the regexSubstitution.

This would process each captured group referenced by regexSubstitution?

@oliverdunk
Copy link
Member Author

So, add the property to the Redirect dictionary only?

No strong feelings on this, I think that's a reasonable justification.

Also, for some future-proofing, we should have decodeRegexMatches take an array of strings

I also don't feel strongly here. If we don't do it to begin with, I think we could quite easily add that later without causing too many issues.

This would process each captured group referenced by regexSubstitution?

Right, so \\1 would now be the equivalent of decodeURIComponent(\\1). I think that is sufficient but URL encoding is hard so if anyone notices certain decoding we would want to do differently, definitely keen to hear it. Ultimately I think we just want to match the encoding these sites are using.

@tophf
Copy link

tophf commented Jun 12, 2024

It might make sense to indicate the specific captured groups that need to be decoded e.g. "decodeURL": [1].

P.S. Suggesting to replace Declarative Net Request with declarativeNetRequest in the description to make this issue discoverable via search. Might also make sense to add a DNR label to all issues that contain Declarative, DNR, declarativeNetRequest.

@oliverdunk oliverdunk added the topic: dnr Related to declarativeNetRequest label Jun 12, 2024
@oliverdunk
Copy link
Member Author

Thanks for the thoughts @tophf!

It might make sense to indicate the specific captured groups that need to be decoded e.g. "decodeURL": [1].

I briefly mention this under "Limitations". I'm supportive of the ability to do that, but I think we might want to leave it out for the MVP. Otherwise it opens other questions (can you apply different decodings to different matches etc.) and I think this quickly becomes a larger proposal.

Updated the description and added an existing DNR label.

@Rob--W
Copy link
Member

Rob--W commented Jun 13, 2024

Add a decodeRegexMatches property in the Redirect or RuleCondition dictionaries.

This should be part of rule.action.redirect, not of rule.condition. The only way that it could influence the condition is if the resulting transformation would become an invalid URL, in which case you may want to fall back. But the possibility of 30000 "try to transform the URL and fall back to the next" seems very heavy, whereas I would be more comfortable with implementation complexity (and associated memory/CPU) when it is part of the (final) (redirect) action.

To enable future extensions, I would recommend a syntax that allows specifying separate substitutions for each individual match. The bare minimum is to support this through numbered capturing groups (\\1 in your example). Another option could be to support this through named capturing groups ((?<name>pattern)). Named capturing groups is supported by RE2 (source: re2 wiki: Syntax, used by Chromium) and RegExp in JavaScript (source: mdn: Named capturing group, used by Firefox (source) and Safari (source)). Note: the named capturing group feature specifies the syntax used for the matching part, not for the substitution. For substitution, you could draw inspiration from the named backreference syntax, \k<name> (mdn). Although documented as not supported by RE2, this does not have to be an implementation-limiting factor, because the actual backreference substitution can be implemented independently of the regular expression engine (which Firefox and Safari do, see the sources linked before). Chrome does currently use RE2's replace implementation (source), but does not have to. In fact, if it wanted to implement the capability requested here, it most likely has to re-implement substitution independently of the RE2 engine.

Here is an example that can be extended in the future. I'm suggesting an array syntax here instead of a dictionary in case (future) operations want to be dependent on the result of previous transformations.

    "redirect": {
      "regexSubstitution": "\\1",
      "regexSubstitutionTransforms": [
        { "group": 1, "operation": "decodeURIComponent" }
      ]
    }

@Rob--W Rob--W added supportive: firefox Supportive from Firefox and removed needs-triage: firefox Firefox needs to assess this issue for the first time labels Jun 13, 2024
@oliverdunk
Copy link
Member Author

@Rob--W, I spoke to the Chrome team about your updated syntax. We're a little hesitant due to the additional verbosity, but can also see the benefit, so we'd be comfortable with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-triage: safari Safari needs to assess this issue for the first time supportive: chrome Supportive from Chrome supportive: firefox Supportive from Firefox topic: dnr Related to declarativeNetRequest
Projects
None yet
Development

No branches or pull requests

4 participants