[Done] Faster filter matching algorithm

Various discussions related to Adblock Plus development
Wladimir Palant

[Done] Faster filter matching algorithm

Post by Wladimir Palant »

In forum/viewtopic.php?p=38095#p38095 lyricconch proposed a different filter matching algorithm that can significantly speed up filter matching in Adblock Plus. Filter matching is pretty fast already but it is always possible to improve. According to my measurements we cat get almost 50% improvement on Firefox 3.6 and more than 100% improvement on Firefox 4. The cost of that is slightly slower initialization, I measured exactly 3.5 ms difference in both Firefox versions - this is not worth mentioning compared to the total initialization time of Adblock Plus, and I plan to cache this data anyway so that it won't slow down browser startup. So, sounds like a great win and simple as well.

There is a catch however: with the new algorithm a different category of filters is designated as "slow". Right now a filter needs "eight unbroken text characters" to be fast. This requirement is replaced by "it should be possible to extract a keyword from a filter". A keyword should consist of three or more alphanumerical characters (% is also allowed to include things like %D1%82%D0%B5%D1%81%D1%82). And: the keyword needs to end in the filter. Some examples of what this means:

*/foo/bar_* contains two keywords: "foo" and "bar"
*foo/bar* contains no keyword because the actual address it matches could be "http://example.com/adfoo/barbas" - so the actual keywords would be "adfoo" and "barbas"
*.html| contains the keyword "html" (the anchor after it prevents that the keyword continues)
*/html^ also contains the keyword "html", separator placeholder (which could be address end, a slash or ? in the real address) also makes sure the keyword doesn't continue

The result is that filter subscription will need to adapt - while it is generally advisable to add some context when blocking on a keyword not all filter subscriptions currently do that consistently. Which is why I am scheduling this for Adblock Plus 1.4, should be enough time to adapt to the changes. I plan to land this change soon after Adblock Plus 1.3 release so that it will be possible to use development builds for testing.
Michael
Posts: 1361
Joined: Sat Dec 19, 2009 12:29 pm

Re: [Roadmap] Faster filter matching algorithm

Post by Michael »

This sounds like a great algorithm not only to speed up filter processing, but also to encourage good filter practices. Does the keyword have to be unique within the subscription?
Wladimir Palant

Re: [Roadmap] Faster filter matching algorithm

Post by Wladimir Palant »

No, the uniqueness requirement goes away - and with it the non-deterministic behavior of the slow filters mark. However, it is better if the list doesn't contain lots of filters that use the same keyword and there is no other keyword to choose.
Michael
Posts: 1361
Joined: Sat Dec 19, 2009 12:29 pm

Re: [Roadmap] Faster filter matching algorithm

Post by Michael »

Is there a patched version of Adblock Plus available for testing?
Wladimir Palant

Re: [Roadmap] Faster filter matching algorithm

Post by Wladimir Palant »

IceDogg
Posts: 909
Joined: Fri Jun 09, 2006 11:22 pm

Re: [Roadmap] Faster filter matching algorithm

Post by IceDogg »

Sorry if stupid question, but should this be tested by just users like myself?
Wladimir Palant

Re: [Roadmap] Faster filter matching algorithm

Post by Wladimir Palant »

Not yet. It's just a quick patch to test the idea, not finished yet. There will be time for end user testing after this change lands in the development builds.
User avatar
fanboy
Posts: 3446
Joined: Sun Jun 17, 2007 4:45 am
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by fanboy »

I'm trying to decode the "slow filters", using the experimental xpi, Why is it so large? Given that the standard 8char was fine till now. Annoys me somewhat.

Code: Select all

-120x600c
-130x600
-160X600b
-235x58p
-468x060
-fleshlight
.ad.footer
.adbanner
.adcontent
/160x600
/468X60t
/468by60
/468x060
/468x280
/468x60a
/468x60b
/468x60l
/468x60p
/468x60w
/468x80b
/468x80g
/470x030
/480x030
/728x90D
/728x90F
/728x90b
/728x90c
/728x90n
/760x120
/AD_Rotator
/AdHandler
/AdPostInject
/AdaptvAdserver
/BelowNewsAd
/GetAdvertImage
/GetArticleAdvert
/OASAdConnector
/ShowFlashAd
/ad-boxes
/ad-html
/ad-indicator
/ad-layer
/ad-leaderboard
/ad-middle
/ad-skyscraper
/ad-square
/ad-topban
/ad-vertical
/ad/468x
/ad/bann
/ad/code
/ad/frame
/ad/google
/ad/realclick
/ad728x15
/ad?sponsor
/adHeader
/adManagementAdv
/adStreamJS
/ad_350x
/ad_banner
/ad_bottom
/ad_common
/ad_footer
/ad_label
/ad_leaderboard
/ad_onbit
/ad_slide
/ad_wrapper
/adclick
/adcompany
/adengage
/adfetch
/adfshow
/adfunction
/adimage
/adinclude
/adjuggler
/adlayer
/admedia
/adpeeps
/ads160x600
/adsRotator
/adsWrapperIntl
/adsensev
/adtext4
/adtology
/adtrack
/adunitt
/advertis$image,script,subdocument
/advision
/awebanner
/banner160x
/banner468
/bannerad
/bannerframeopenads
/bannerserv
/banniere
/bizbannerre
/blogads
/eroadvertorial
/facebooksex
/footertextad
/geitonpop
/getadvertiserimage
/googlead
/headerAdvert
/iplgadshow
/js.ad/s
/js.ng/c
/js.ng/p
/js.ng/s
/js.ng/t
/js/adFrequency
/js/adlink
/js/adsense
/jsAdScript
/mr_tb_ad.
/netspiderads
/pagepeel
/promotools
/relatedads
/rotatingtextad
/sd/ad.php
/showadvert
/showbanner
/skyframeopenads
/spacedesc
/sponsoradds
/sponsorads
/sponsoredlinks
/st?ad_type
/valueclick
/virtuagirl
/webMailAd
/wp_ad.js
100x200.gif
120x500.gif
120x600.gif
120x600.htm
120x600.jpg
120x600.png
120x600.swf
125x125.gif
125x600.gif
125x600.swf
133x394.gif
160x300.gif
160x600.gif
160x600.htm
160x600.jpg
160x600.php
160x600.png
160x600.swf
160x6001
175x175.gif
180x300.gif
180x600-
20ad.gif
234x60.gif
246x120.swf
250X250.swf
250x210.jpg
300x250.gif
300x250.htm
300x250.jpeg$domain=~pics.tata.ru
300x250.jpg$domain=~mydamnchannel.com|~fanhouse.com
300x250.png
300x250.swf
300x250B.swf
300x250ad
300x60.gif
302x52.swf
316x250.swf
320x250_
336x280.gif
336x280.jpg
450x55.jpg
460x70.jpg
468-60.gif
468-60.swf
46860.gif
468_60.gif
468_60_1
468x060.
468x060_
468x100_
468x60.gif
468x60.htm
468x60.jpg
468x60.php
468x60.swf
468x60_1
468x60_2.
468x60ad
468x60banner
468x60nologo.swf
468x80.gif
470x60.gif
470x60.jpg
470x60.swf
480x80.jpg
486x60.js
522x62.jpg
700200.jpg
700x200.gif
728x290.gif
728x90.gif
728x90.htm
728x90.jpg
728x90.php
728x90.png
728x90.swf
728x90_2.
750x80.swf
750x90.gif
790x250.gif
800x120.gif
80x468.jpg
972_70.gif
;sz=300x60
=298x40px
=468x60%
?adunitid
@@/BannerAd$domain=careerone.com.au
@@/adiframe$script,domain=theteacherscorner.net
@@/advertisem$script,domain=mircscripts.org|tvnz.co.nz|skimtube.com|multiup.org|ratebeer.com
@@://10.1.1.
@@_140x140$domain=guardian.co.uk
@@coremetrics$domain=qvc.com
Ad468x60
LargeFlashAd.js
_120x600a
_120x800A
_148x154
_160x600
_300x250v
_468x060
_728x90p
_728x90r
_728x90v
_ad_advert
_ad_footer
_ad_placeholder
_minibanner
ads/468x60
advert_bottom
advert_cent
advert_left
advert_right
affiliate/banner
anner/728x
banman.asp
banmanpro
banner-468x
banner.cgi
banner300x
banner_id=
banner_vert
bannerad.
bannerad_
bannerid=
d468x60-
dhtmlpop
house_ad
innerads
left-ads
link-exchange
paypopup
pop_under
popinads
popunder$script
server/AAMALL
server/CCID
sz=160x200
||x7.to/ad/
User avatar
Hubird
Posts: 2850
Joined: Thu Oct 26, 2006 2:59 pm
Location: Australia
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by Hubird »

So using the experimental build it should be safe for filter list authors to start modifying / removing any slow filters ?

I guess I'm just looking for 100% confirmation that this is the way things are going to be :)
User avatar
fanboy
Posts: 3446
Joined: Sun Jun 17, 2007 4:45 am
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by fanboy »

This new algorithm pretty much kills the dimensions filter :(

Code: Select all

-120x600c
-130x600
-160X600b
-235x58p
-468x060
/160x600
/468X60t
/468by60
/468x060
/468x280
/468x60a
/468x60b
/468x60l
/468x60p
/468x60w
/468x80b
/468x80g
/470x030
/480x030
/728x90D
/728x90F
/728x90b
/728x90c
/728x90n
/760x120
100x200.gif
120x500.gif
120x600.gif
120x600.htm
120x600.jpg
120x600.png
120x600.swf
125x125.gif
125x600.gif
125x600.swf
133x394.gif
160x300.gif
160x600.gif
160x600.htm
160x600.jpg
160x600.php
160x600.png
160x600.swf
160x6001
175x175.gif
180x300.gif
180x600-
20ad.gif
234x60.gif
246x120.swf
250X250.swf
250x210.jpg
300x250.gif
300x250.htm
300x250.jpeg$domain=~pics.tata.ru
300x250.jpg$domain=~mydamnchannel.com|~fanhouse.com
300x250.png
300x250.swf
300x250B.swf
300x250ad
300x60.gif
302x52.swf
316x250.swf
320x250_
336x280.gif
336x280.jpg
450x55.jpg
460x70.jpg
468-60.gif
468-60.swf
46860.gif
468_60.gif
468_60_1
468x060.
468x060_
468x100_
468x60.gif
468x60.htm
468x60.jpg
468x60.php
468x60.swf
468x60_1
468x60_2.
468x60ad
468x60banner
468x60nologo.swf
468x80.gif
470x60.gif
470x60.jpg
470x60.swf
480x80.jpg
486x60.js
522x62.jpg
700200.jpg
700x200.gif
728x290.gif
728x90.gif
728x90.htm
728x90.jpg
728x90.php
728x90.png
728x90.swf
728x90_2.
750x80.swf
750x90.gif
790x250.gif
800x120.gif
80x468.jpg
972_70.gif
;sz=300x60
=298x40px
=468x60%
User avatar
Hubird
Posts: 2850
Joined: Thu Oct 26, 2006 2:59 pm
Location: Australia
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by Hubird »

I've been having issues with filters not matching when using the experimental build.

I use a filter suggested by the filter composer but when I refresh the page the item is still there :?
User avatar
Hubird
Posts: 2850
Joined: Thu Oct 26, 2006 2:59 pm
Location: Australia
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by Hubird »

Wladimir Palant wrote:I plan to land this change soon after Adblock Plus 1.3 release so that it will be possible to use development builds for testing.
The sooner the better really, filter list authors will want to make sure that any new filters are not slow. The easiest way to do this is to watch out for warnings in the filter composer.

I was using the experimental build but was altered to an updated version which I'm guessing does not use the new filter matching algorithm.
IceDogg
Posts: 909
Joined: Fri Jun 09, 2006 11:22 pm

Re: [Roadmap] Faster filter matching algorithm

Post by IceDogg »

Wladimir Palant wrote:Not yet. It's just a quick patch to test the idea, not finished yet. There will be time for end user testing after this change lands in the development builds.
Ok, thanks for letting me/us know. Sounding like it might need that testing too. I hope it goes well.
User avatar
fanboy
Posts: 3446
Joined: Sun Jun 17, 2007 4:45 am
Contact:

Re: [Roadmap] Faster filter matching algorithm

Post by fanboy »

Not sure why this is considered slow:

Code: Select all

||1.im.cz/ad/

vs

Code: Select all

http://1.im.cz/ad/
which is apprently better? :?
Wladimir Palant

Re: [Roadmap] Faster filter matching algorithm

Post by Wladimir Palant »

fanboy wrote:Not sure why this is considered slow:

Code: Select all

||1.im.cz/ad/
Because all keywords here ("1", "im", "cz", "ad") are too short, Adblock Plus needs at least 3 characters.
fanboy wrote:

Code: Select all

http://1.im.cz/ad/
which is apprently better? :?
Here "http" can be used as a keyword - which is only marginally better because it can be found in almost every URL. So probably better leave that filter as is - that should be a rare case where there is no meaningful way to fix it.
Locked