Adblock Plus and (a little) more

And we proudly present: The Deregifier! · 2006-08-23 00:26 by Wladimir Palant

After years of regexp mania we now have to go back to simple filters: starting with Adblock Plus 0.7, those are more efficient than regular expressions. This doesn’t mean that regular expressions got any slower; rather, simple filters got very fast. A dozen simple filters also have other advantages over one long regexp. They are simply easier to read, the “effective filter” in the blockable items tooltip presents more relevant information (one rule instead of the whole regexp), the hit counts we get for every filter are more relevant as well, and we can deactivate each simple filter separately if it causes problems.

Yet a number of regular expressions already exist in various filter lists, e.g. in Filterset.G. Splitting them up again is a boring, time-consuming and error-prone job. To make it somewhat simpler I wrote a new web tool: The Deregifier. It takes a filter list as input and tries to translate regular expressions back into simple filters. Of course, not all regular expressions can be translated; the effects of the Adblock list optimizer, however, can always be reversed. As to Filterset.G: the tool managed to get rid of 24 regexps, increasing filter matching performance by 35%. 25 regular expressions are still left, but two of those couldn’t be converted simply because they contain dots that G forgot to escape.
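
To illustrate the kind of translation involved, here is a minimal Python sketch. It is not the Deregifier's actual code: the names `deregify` and `NotTranslatable` are made up for this example, and it assumes that a regexp filter is written between slashes and that only literals, escaped punctuation and `(a|b)` groups are expanded. Anything else, including an unescaped dot, leaves the filter untouched.

```python
from itertools import product

# Regexp metacharacters this sketch refuses to translate when unescaped.
UNSUPPORTED = set(".*+?[]{}^$")
# Assumption: these characters are special inside a simple filter,
# so even an escaped occurrence is left alone.
SIMPLE_FILTER_SPECIAL = set("*|")


class NotTranslatable(Exception):
    """Raised when a pattern uses features this sketch cannot expand."""


def _parse_alternation(pattern, pos):
    """Parse 'a|b|c' from pos; return (list of literal strings, new pos)."""
    results = []
    while True:
        branch, pos = _parse_sequence(pattern, pos)
        results.extend(branch)
        if pos < len(pattern) and pattern[pos] == "|":
            pos += 1
        else:
            return results, pos


def _parse_sequence(pattern, pos):
    """Parse one branch; its strings are the product of all its parts."""
    parts = []
    while pos < len(pattern) and pattern[pos] not in "|)":
        ch = pattern[pos]
        if ch == "\\":
            escaped = pattern[pos + 1 : pos + 2]
            # Reject a trailing backslash, classes like \d, and \* or \|.
            if not escaped or escaped.isalnum() or escaped in SIMPLE_FILTER_SPECIAL:
                raise NotTranslatable(pattern)
            parts.append([escaped])
            pos += 2
        elif ch == "(":
            inner, pos = _parse_alternation(pattern, pos + 1)
            if pattern[pos : pos + 1] != ")":
                raise NotTranslatable(pattern)
            parts.append(inner)
            pos += 1
        elif ch in UNSUPPORTED:
            raise NotTranslatable(pattern)
        else:
            parts.append([ch])
            pos += 1
    return ["".join(combo) for combo in product(*parts)], pos


def deregify(filter_text):
    """Return simple filters for a /regexp/ filter, or the filter unchanged."""
    if len(filter_text) > 2 and filter_text.startswith("/") and filter_text.endswith("/"):
        body = filter_text[1:-1]
        try:
            expanded, pos = _parse_alternation(body, 0)
            if pos == len(body):  # a stray ')' would stop parsing early
                return expanded
        except NotTranslatable:
            pass
    return [filter_text]


print(deregify(r"/ad(s|banner)\.gif/"))  # ['ads.gif', 'adbanner.gif']
print(deregify("/ads.gif/"))             # ['/ads.gif/'] (unescaped dot)
```

The second call shows why a forgotten backslash matters: an unescaped dot matches any character, so the regexp cannot be replaced by a fixed string.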

Comment [3]

  1. ecjs · 2006-08-23 11:06 · #

    This is a great tool. Thanks!

    However, I am going to test it a bit more, as it does not seem to work for filters taken from http://adblock.free.fr

    Reply from Wladimir Palant:

    No, it doesn’t – because filters in this list use regexp features that are not translatable back into simple filters.

  2. ecjs · 2006-08-23 11:13 · #

    Alright, there is just the “dot trick” to mind in order to make it work correctly. Changing . into \. is sufficient.

  3. chewey · 2006-09-06 01:05 · #

    What would you think of the idea of deregifying list subscriptions in ABP itself?

    I played around a little bit (and used some manual deregifying on more complex rules) and ended up with a filter list seven times bigger than my “regular” list.

    Taking the effect of HTTP 304 responses into account, this would about triple my traffic, coming frighteningly close to my monthly limit.

    This and better readability (at least for myself) due to less redundancy in the list are the main reasons for me not (yet) to switch to a regex-free (or at least regex-reduced) list.

    Reply from Wladimir Palant:

    I thought about this. There are two reasons why I won’t do it:

    1. “Deregifying” requires a considerable amount of time and memory. It is ok if you choose to optimize your list every once in a while but doing this all the time in an extension is bad.

    2. Automatic expansion of regexps sometimes creates too many filters, e.g. I got several thousand for fanboy’s list. While this isn’t a problem for the matching, often not every single combination from the regexp is meant to be a filter. Furthermore, the resulting filters are sometimes too short, which makes them just as slow as regexps. So you can’t make this process entirely automatic (yet?).

    The second issue should also be the problem in your case. Usually “deregifying” doesn’t increase the list size by more than ~20% – you probably have regexps with too many combinations there.
