This repository has been archived by the owner on Nov 15, 2017. It is now read-only.

Net request filtering: overview

Raymond Hill edited this page May 9, 2014 · 95 revisions

The matrix filtering engine and the ABP filtering engine can be turned on or off independently. Whether matrix filtering and/or ABP filtering is on or off is a per-scope setting. When both filtering engines are turned off, HTTPSB can still be useful as a reporting tool, as it will still report in the matrix and request log all the net connections a web page makes.

Great care has been taken to write CPU- and memory-efficient code for both filtering engines.

Matrix filtering

Matrix filtering uses an inheritance model to transfer the block or allow status of a node to a lower-precedence node. A node is identified by a hostname and a request type.

The type corresponds to the type of the request, which currently can be: cookie, stylesheet/web font, image, plug-in, script, xmlhttprequest, iframe, and other (HTML5 video, HTML5 audio, SVG, and some other uncategorized requests). To use a concrete example, let's take the following URL: https://www.example.com/image.png.

The hostname and the type of the request are extracted from the URL:

  • Hostname: www.example.com
  • Type: image
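As a rough illustration of this extraction step, here is a minimal sketch. The function name `describeRequest` and the extension-to-type table are hypothetical and only serve the example; an actual extension would normally get the request type from the browser rather than guess it from the file extension.

```javascript
// Hypothetical extension-to-type table, for illustration only.
const TYPE_BY_EXTENSION = new Map([
  ["png", "image"],
  ["jpg", "image"],
  ["gif", "image"],
  ["css", "stylesheet"],
  ["js", "script"],
]);

// Returns the hostname and a guessed request type for a URL.
function describeRequest(url) {
  const parsed = new URL(url);
  // Take the file extension from the path, if there is one.
  const match = /\.([a-z0-9]+)$/i.exec(parsed.pathname);
  const extension = match ? match[1].toLowerCase() : "";
  return {
    hostname: parsed.hostname,
    // Anything not in the table falls into the "other" category.
    type: TYPE_BY_EXTENSION.get(extension) || "other",
  };
}

const described = describeRequest("https://www.example.com/image.png");
console.log(described.hostname, described.type);
```

For the example URL, this yields the hostname www.example.com and the type image.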

To evaluate whether the request should be blocked or allowed, matrix filtering will first look up the block/allow status using exactly the hostname and type, then go up the hierarchy when no explicit rule is found:

  • Is image from www.example.com whitelisted?
      • Yes: request is allowed
      • No: Is image from www.example.com blacklisted?
          • Yes: request is blocked
          • No: Go up one step in the precedence hierarchy
  • Is image from example.com whitelisted?
      • Yes: request is allowed
      • No: Is image from example.com blacklisted?
          • Yes: request is blocked
          • No: Go up one step in the precedence hierarchy
  • Is everything from example.com whitelisted?
      • Yes: request is allowed
      • No: Is everything from example.com blacklisted?
          • Yes: request is blocked
          • No: Go up one step in the precedence hierarchy
  • Is image from anywhere whitelisted?
      • Yes: request is allowed
      • No: Is image from anywhere blacklisted?
          • Yes: request is blocked
          • No: Go up one step in the precedence hierarchy
  • Is everything from anywhere whitelisted?
      • Yes: request is allowed
      • No: Is everything from anywhere blacklisted?
          • Yes: request is blocked
          • No: request is blocked
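The walk above can be sketched in a few lines of code. This is only an illustration of the simplified lookup order in the list, not HTTPSB's actual implementation: the function name `evaluateRequest`, the `"hostname|type"` key format, and the use of `"*"` for "everything" or "anywhere" are all assumptions made for the example. The domain is passed in explicitly, since deriving example.com from www.example.com reliably is a separate problem.

```javascript
// Rules are stored as "hostname|type" strings (a hypothetical format).
// "*" stands for "everything" (type) or "anywhere" (hostname).
function evaluateRequest(whitelist, blacklist, hostname, domain, type) {
  // Lookup order taken from the list above, most specific first.
  const lookupOrder = [
    [hostname, type],
    [domain, type],
    [domain, "*"],
    ["*", type],
    ["*", "*"],
  ];
  for (const [host, requestType] of lookupOrder) {
    const key = `${host}|${requestType}`;
    // At each step, the whitelist is consulted before the blacklist.
    if (whitelist.has(key)) {
      return "allow";
    }
    if (blacklist.has(key)) {
      return "block";
    }
    // No explicit rule here: go up one step in the hierarchy.
  }
  // Nothing matched anywhere: the request is blocked.
  return "block";
}

const whitelist = new Set(["example.com|*"]);
const blacklist = new Set(["*|image"]);
const verdict = evaluateRequest(
  whitelist, blacklist, "www.example.com", "example.com", "image"
);
console.log(verdict);
```

With these sample rules the verdict is "allow": no rule exists for image on www.example.com or example.com, but everything from example.com is whitelisted, and that node is reached before the blacklisted image-from-anywhere node.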

Actually, things are a bit more complicated, as the pseudo-code above doesn't take into account that type/domain (or type/subdomain) nodes inherit from two ancestors. HTTPSB deals with the ambiguity by allowing such requests if and only if both ancestor nodes evaluate as "allowed". However, if strict blocking is disabled, the status of the domain (or subdomain) ancestor node has precedence over the type ancestor node.

The hierarchical evaluation in matrix filtering allows a user to easily toggle a whole set of block/allow permissions by just blacklisting or whitelisting a single node. For instance, whitelisting (or blacklisting) the "all" node turns all graylisted descendant nodes into allow (or block) mode.

All requests are evaluated in real time against the current state of the matrix, so even future requests which have not been seen yet are affected by any change made to the matrix. For instance, this addresses a problem often raised by NoScript users: with HTTPSB, if a user whitelists (or blacklists) the all cell in the matrix, all graylisted descendant cells will inherit the allow (or block) status of the all cell.

Note that matrix filtering is used to evaluate more than just whether net requests are to be allowed or blocked. It is also used to evaluate:

  • Whether JavaScript execution should be blocked
  • Whether cookies should be stripped from outgoing HTTP headers
  • Whether HTTP referer information should be stripped from outgoing HTTP headers
  • Whether HTML5 localStorage should be emptied for a particular hostname

ABP filtering

Starting with version 0.8.4.0, HTTPSB supports parsing and enforcing ABP filter syntax.

ABP filters allow for more granular control than matrix filtering when it comes to blocking net requests, as they are based on finding patterns in the whole URL, rather than just looking at the hostname and type of a net request.

At the time of writing, not all ABP filters are supported, though the most common ones are. Not supported yet:

  • Any filter which has options other than the third-party option
  • Any filter whose purpose is to hide elements

Still, almost 90% of all ABP filters are parsed and enforced.

ABP filtering takes place after matrix filtering. Therefore, if a specific net request is blocked through matrix filtering, ABP filtering will not be used, as the net request has already been evaluated as blocked.

Any net request which is evaluated as "allowed" by matrix filtering will however be further evaluated through the ABP filtering engine.
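The ordering of the two engines can be sketched as follows. This is a minimal illustration under stated assumptions, not HTTPSB's code: `shouldBlock`, the `scope` flags `matrixFilteringOn` and `abpFilteringOn`, and the two engine callbacks `matrixAllows` and `abpBlocks` are hypothetical names standing in for the per-scope on/off settings and the two filtering engines.

```javascript
// scope:   { matrixFilteringOn: boolean, abpFilteringOn: boolean }
// engines: { matrixAllows(request): boolean, abpBlocks(request): boolean }
// All of these names are hypothetical placeholders.
function shouldBlock(request, scope, engines) {
  // Matrix filtering runs first. A request it blocks is final:
  // the ABP engine is never consulted for it.
  if (scope.matrixFilteringOn && !engines.matrixAllows(request)) {
    return true;
  }
  // Requests allowed by the matrix are further evaluated by ABP filters.
  if (scope.abpFilteringOn) {
    return engines.abpBlocks(request);
  }
  // Both engines off (or nothing objected): the request goes through.
  return false;
}
```

One consequence of this ordering is that the ABP engine only does work for requests the matrix lets through, which is why turning matrix filtering off is a worst case for the ABP engine.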

Great care has been taken to implement an efficient ABP filtering engine in HTTPSB (code was written from scratch), and the result is such that it consumes considerably less memory and CPU cycles than the official ABP extension on Chromium-based browsers.

To give a glimpse of the better performance of HTTPSB over ABP in handling ABP-compatible filters, I measured that on average, for the benchmark described below, ABP evaluates 119 filters/URL, while for the same benchmark HTTPSB evaluates 8 filters/URL. (The result in the case of HTTPSB was a worst-case scenario, as matrix filtering had been turned off to ensure proper measurement: no requests were blocked by the matrix filtering engine, therefore all filtering duty fell onto the ABP filtering engine.)

Also, whereas ABP uses regular expressions internally to test for a filter match, HTTPSB uses simpler plain string comparisons whenever it is more efficient to do so, which is true for the great majority of filters (see http://jsperf.com/regexp-vs-indexof-abp-miss/3 and http://jsperf.com/regexp-vs-indexof-abp-hit/3).
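To make the idea concrete, here is a small sketch of choosing between a plain string comparison and a regular expression per filter. It is not HTTPSB's implementation. The names `compileFilter` and `escapeRegExp` are made up for the example, and the sketch only understands the `*` wildcard: other special characters of the ABP syntax, such as `|` and `^`, are treated as literal characters here, which is a simplification and not ABP behavior.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a filter into a function that tests a URL against it.
function compileFilter(filter) {
  if (!filter.includes("*")) {
    // Plain filter: a substring search is enough, no RegExp needed.
    return (url) => url.indexOf(filter) !== -1;
  }
  // Wildcard filter: escape the literal parts and join them with ".*".
  const pattern = filter.split("*").map(escapeRegExp).join(".*");
  const re = new RegExp(pattern);
  return (url) => re.test(url);
}

const matchesAds = compileFilter("/ads/");
console.log(matchesAds("https://example.com/ads/banner.png"));
```

The decision is made once, when the filter is compiled, so the per-request cost for a plain filter is a single `indexOf` call.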


Memory footprint

In the above screenshot, Adblock Plus 1.7.4, Adblock 2.6.28, and HTTPSB 0.8.9.2 were set to use EasyList without element hiding and EasyPrivacy. So-called "acceptable ads" was disabled in ABP.


CPU footprint: Average time spent to process each single net request: ABP vs. HTTPSB (for the CPU benchmark, matrix filtering was enabled in HTTPSB in allow all/block exceptionally mode)

Now regarding the above results, an important fact: HTTPSB was using an extra 56,000+ blocked hosts as matrix-filtering rules (those rules are enabled out-of-the-box), and yet despite this, HTTPSB runs much leaner and faster than ABP, as seen above.[1]

The test was run on Google Chrome 34 for Linux, on Linux Mint 16 64-bit. The benchmark was the same as the one used in "Comparative benchmarks against widely used blockers: Top 15 Most Popular News Websites" (except repeat was set to 2), then all the tabs were closed (except for the Extensions tab), and the browser was left idling for over 20 minutes to ensure the browser's garbage collector cleared unused memory from the extensions. The extensions were benchmarked alone, with no other extension present.


[1] I still have an extra improvement in store to further reduce memory footprint regarding the implementation of ABP filters, but I do not consider this a priority at this point.
