Filter redundancy check/optimizations

Various discussions related to Adblock Plus development
cmmc
Posts: 158
Joined: Fri Apr 22, 2011 8:15 pm

Filter redundancy check/optimizations

Post by cmmc »

I just ran my filter lists, including custom filters, through https://arestwo.org/famlam/redundantRuleChecker.html and found 1461 redundant rules, plus 82 errors, warnings, or optimizations.

An option to optimize all your filter lists as a whole would be very helpful. Maybe as an extra addon?
lewisje
Posts: 2743
Joined: Mon Jun 14, 2010 12:07 pm

Re: Filter redundancy check/optimizations

Post by lewisje »

I can imagine this pre-processing won't be such a performance problem in ordinary use, but it probably would increase storage (a new overall optimized list, in addition to the originals of each list, because the lists aren't all updated at once) and make the UI unresponsive while saving changes to custom filters (to re-optimize the combined list).
There's a buzzin' in my brain I really can't explain; I think about it before they make me go to bed.
cmmc
Posts: 158
Joined: Fri Apr 22, 2011 8:15 pm

Re: Filter redundancy check/optimizations

Post by cmmc »

Can't see why, since all that's needed is to prune patterns.ini, either manually, when updating, or at preset intervals. Currently, mine's about 3 MB; optimized, it'd be about 2 to 2.5 MB. Plus, the UI is already unresponsive as it is, not because of ABP lists but because of FF's bad script handling, so anything that reduces processing requirements can only improve overall UI responsiveness.
lewisje
Posts: 2743
Joined: Mon Jun 14, 2010 12:07 pm

Re: Filter redundancy check/optimizations

Post by lewisje »

Let me make it clearer: Your subscriptions will not all update at the same time; this means that if you want an optimized version of all subscriptions combined every time any subscription updates, you will need the cached versions of all the other subscriptions.

If the extension just re-updates all subscriptions whenever any one of them is due for an update, that's using the authors' bandwidth more than they're asking for. If it instead keeps the portions of the optimized list corresponding to the non-updated subscriptions, it could go out of sync with the originals (perhaps a filter in one of the lists that was considered redundant would no longer be redundant once another list is updated, but it would have been removed in the previous optimization pass).

---

I'll try to show a simple example below...

ListA has the following three filters:
||domain.tld^$script
||new.tld^$third-party
||lol.omg^$domain=ro.fl


ListB has the following three filters:
||domain.tld^js/$script
||new.tld^
||wtf.bbq^$image


The optimized, combined list would look like this:
||domain.tld^$script
||lol.omg^$domain=ro.fl

||new.tld^
||wtf.bbq^$image


Now let's imagine that the first rule (||domain.tld^$script) was dropped from ListA for causing too many false positives, and that ListA then expired and was updated before ListB did; if the ListB portion were not updated, the intermediate list (before the new optimization pass) would look like this:
||new.tld^$third-party
||lol.omg^$domain=ro.fl

||new.tld^
||wtf.bbq^$image


Then after optimization, it would look like this:
||lol.omg^$domain=ro.fl
||new.tld^
||wtf.bbq^$image

---
However, if the extension had cached the original versions of ListA and ListB, it would combine them like this:
||new.tld^$third-party
||lol.omg^$domain=ro.fl

||domain.tld^js/$script
||new.tld^
||wtf.bbq^$image


and then the new optimized list would look like this:
||lol.omg^$domain=ro.fl
||domain.tld^js/$script
||new.tld^
||wtf.bbq^$image


---

The filter ||domain.tld^js/$script is one that was originally considered redundant but, after another list was updated, no longer is; if it were just stripped out entirely rather than being cached, it wouldn't show up in the new list until ListB were updated. A similar effect would happen if ListB then removed ||new.tld^ for causing false positives and were updated before ListA: If the original ListA were not cached, then ||new.tld^$third-party, although no longer redundant, would not be in the re-optimized list until its own update time, which could be a few days out.
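
The scenario above can be replayed with a short Python sketch. It is only an illustration: the `covers` rule is a toy assumption (filter A makes filter B redundant when B's pattern starts with A's pattern and A either has no options or has exactly the same options), not the logic of the actual redundancy checker or of Adblock Plus, and the helper names are made up for this example.

```python
def covers(a: str, b: str) -> bool:
    """Toy rule (an assumption): does filter a make filter b redundant?"""
    a_pat, _, a_opt = a.partition("$")
    b_pat, _, b_opt = b.partition("$")
    # a must be a different filter, match at least everything b matches
    # (prefix check), and be unrestricted or restricted identically.
    return a != b and b_pat.startswith(a_pat) and a_opt in ("", b_opt)


def optimize(filters):
    """Drop every filter that some other filter in the same set covers."""
    unique = list(dict.fromkeys(filters))  # remove exact duplicates, keep order
    return [f for f in unique if not any(covers(g, f) for g in unique)]


list_a = ["||domain.tld^$script", "||new.tld^$third-party", "||lol.omg^$domain=ro.fl"]
list_b = ["||domain.tld^js/$script", "||new.tld^", "||wtf.bbq^$image"]

optimized = optimize(list_a + list_b)

# ListA drops its first rule and is updated before ListB.
new_list_a = list_a[1:]

# Case 1: only the optimized remainder of ListB was kept.
stale_b = [f for f in optimized if f in list_b]
out_of_sync = optimize(new_list_a + stale_b)

# Case 2: the original ListB was cached.
in_sync = optimize(new_list_a + list_b)

print("||domain.tld^js/$script" in out_of_sync)  # filter is lost
print("||domain.tld^js/$script" in in_sync)      # filter is restored
```

Under these assumptions the two results differ by exactly one filter, which is the mismatch described above.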

Of course, this mismatching could be avoided if the user manually updated all subscriptions, but that's not expected behavior: It should only be done to debug issues with a subscription that have just been fixed, not as a regular matter of course, using the authors' bandwidth more often than they ask.

I actually support this idea, but I just wanted to point out there are minor downsides: a minor extra hiccup whenever subscriptions are updated, and a few extra MB to cache the pre-optimized copies of filter lists.
mapx
Posts: 21940
Joined: Thu Jan 06, 2011 2:01 pm

Re: Filter redundancy check/optimizations

Post by mapx »

A p2p implementation would save a lot of bandwidth
cmmc
Posts: 158
Joined: Fri Apr 22, 2011 8:15 pm

Re: Filter redundancy check/optimizations

Post by cmmc »

@lewisje: OK, now I get it, but the solution for that is actually very simple: instead of deleting the redundant filters, just move them to a new (local) 'redundancies' list. This list would never be processed by ABP, but would always be checked by the redundancy checker. That way you'd always be verifying the full set of filters, so redundant filters could be retrieved at any time, and ABP would still only process the unique filters, reducing its workload.
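
A rough Python sketch of that 'redundancies' list idea follows. Everything in it is hypothetical: the `covers` rule is a deliberately simple stand-in for the real checker (prefix match on the pattern, with identical or no options), and `split` and `update` are made-up names. Each entry remembers which subscription it came from so that an update can replace only that subscription's filters.

```python
def covers(a: str, b: str) -> bool:
    """Toy rule (an assumption): does filter a make filter b redundant?"""
    a_pat, _, a_opt = a.partition("$")
    b_pat, _, b_opt = b.partition("$")
    return a != b and b_pat.startswith(a_pat) and a_opt in ("", b_opt)


def split(entries):
    """Split (source, filter) entries into (active, parked) lists.

    'active' is what the blocker would process; 'parked' is the local
    redundancies list, which is only consulted by the checker.
    """
    texts = [text for _, text in entries]
    active, parked = [], []
    for entry in entries:
        redundant = any(covers(other, entry[1]) for other in texts)
        (parked if redundant else active).append(entry)
    return active, parked


def update(active, parked, source, new_filters):
    """Replace one subscription's filters, then re-check the full set."""
    kept = [e for e in active + parked if e[0] != source]
    return split(kept + [(source, text) for text in new_filters])


entries = [("ListA", "||domain.tld^$script"),
           ("ListA", "||new.tld^$third-party"),
           ("ListB", "||domain.tld^js/$script"),
           ("ListB", "||new.tld^")]
active, parked = split(entries)

# ListA drops ||domain.tld^$script; the parked ListB filter becomes active again.
active, parked = update(active, parked, "ListA", ["||new.tld^$third-party"])
print(active)
print(parked)
```

Note that in this sketch the active and parked entries together still hold every subscribed filter, so the storage cost is comparable to caching the original lists; the gain is only in what gets processed.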

Another somewhat related suggestion I'd like to put forward, which would also require changes to the lists' structure, concerns comments in filter lists. As you know, these comments are currently added on a separate line above the filters, so unless you open the actual file you'll never even know they're there, and even then you won't know exactly which filters they apply to: the next 1, 2, 10, 20 filters? My suggestion is to put comments after the filters, on the same line, so they can be shown to users in a new 'comments' column, for whatever purpose they may be useful.
lewisje
Posts: 2743
Joined: Mon Jun 14, 2010 12:07 pm

Re: Filter redundancy check/optimizations

Post by lewisje »

That would require a change to the way comments work (emphasis mine): en/filters#comments
Adblock Plus wrote: Any rule that starts with an exclamation mark is considered a comment.
I'm thinking that if list authors were able to put comments on the same line as the filters they comment on, they would have done so.
cmmc
Posts: 158
Joined: Fri Apr 22, 2011 8:15 pm

Re: Filter redundancy check/optimizations

Post by cmmc »

Which is why I'm proposing it here. (x2).

Personally, I just think it could be useful - it's the way it should've been done, imv.
lewisje
Posts: 2743
Joined: Mon Jun 14, 2010 12:07 pm

Re: Filter redundancy check/optimizations

Post by lewisje »

One good idea might be to add a comment specifier (or re-use !), to go into the same area as things like document and domain, after the $, which must be the final specifier (for blocking or whitelist filters), and support for C-style comments (/* comment */) in hiding and hiding-whitelist filters.

The reason it's not a good idea to just have ! specify a comment by itself is that it's a valid character in a URL.
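
To make the trailing-specifier idea concrete, here is a small Python sketch for blocking filters only. The `comment=` option name is purely hypothetical (Adblock Plus has no such option), as is the `split_comment` helper; the sketch also assumes the first `$` on the line starts the options, which is a simplification.

```python
def split_comment(line: str):
    """Return (filter_text, comment) for a hypothetical final comment= option.

    The comment must be the last option after '$'; everything following
    'comment=' is taken as the comment, commas included.
    """
    pattern, dollar, options = line.partition("$")
    if not dollar:
        return line, None  # no options at all, so no comment
    head, marker, comment = options.partition("comment=")
    # 'comment=' must start an option: at the very start or right after a comma.
    if not marker or (head and not head.endswith(",")):
        return line, None
    head = head.rstrip(",")
    filter_text = f"{pattern}${head}" if head else pattern
    return filter_text, comment


print(split_comment("||ads.example^$script,comment=breaks login, see forum"))
print(split_comment("||example.com/hello!world"))  # '!' in the URL is left alone
```

Because the comment lives in the options area, a `!` inside the URL part of a filter is never mistaken for the start of a comment.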