Page 1 of 3

[Done] Faster filter matching algorithm

Posted: Fri Oct 01, 2010 9:33 pm
by Wladimir Palant
In forum/viewtopic.php?p=38095#p38095 lyricconch proposed a different filter matching algorithm that can significantly speed up filter matching in Adblock Plus. Filter matching is pretty fast already but it is always possible to improve. According to my measurements we cat get almost 50% improvement on Firefox 3.6 and more than 100% improvement on Firefox 4. The cost of that is slightly slower initialization, I measured exactly 3.5 ms difference in both Firefox versions - this is not worth mentioning compared to the total initialization time of Adblock Plus, and I plan to cache this data anyway so that it won't slow down browser startup. So, sounds like a great win and simple as well.

There is a catch however: with the new algorithm a different category of filters is designated as "slow". Right now a filter needs "eight unbroken text characters" to be fast. This requirement is replaced by "it should be possible to extract a keyword from a filter". A keyword should consist of three or more alphanumerical characters (% is also allowed to include things like %D1%82%D0%B5%D1%81%D1%82). And: the keyword needs to end in the filter. Some examples of what this means:

*/foo/bar_* contains two keywords: "foo" and "bar"
*foo/bar* contains no keyword because the actual address it matches could be "http://example.com/adfoo/barbas" - so the actual keywords would be "adfoo" and "barbas"
*.html| contains the keyword "html" (the anchor after it prevents that the keyword continues)
*/html^ also contains the keyword "html", separator placeholder (which could be address end, a slash or ? in the real address) also makes sure the keyword doesn't continue

The result is that filter subscription will need to adapt - while it is generally advisable to add some context when blocking on a keyword not all filter subscriptions currently do that consistently. Which is why I am scheduling this for Adblock Plus 1.4, should be enough time to adapt to the changes. I plan to land this change soon after Adblock Plus 1.3 release so that it will be possible to use development builds for testing.

Re: [Roadmap] Faster filter matching algorithm

Posted: Fri Oct 01, 2010 9:59 pm
by Michael
This sounds like a great algorithm not only to speed up filter processing, but also to encourage good filter practices. Does the keyword have to be unique within the subscription?

Re: [Roadmap] Faster filter matching algorithm

Posted: Fri Oct 01, 2010 10:07 pm
by Wladimir Palant
No, the uniqueness requirement goes away - and with it the non-deterministic behavior of the slow filters mark. However, it is better if the list doesn't contain lots of filters that use the same keyword and there is no other keyword to choose.

Re: [Roadmap] Faster filter matching algorithm

Posted: Fri Oct 01, 2010 10:54 pm
by Michael
Is there a patched version of Adblock Plus available for testing?

Re: [Roadmap] Faster filter matching algorithm

Posted: Fri Oct 01, 2010 11:12 pm
by Wladimir Palant

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 12:55 am
by IceDogg
Sorry if stupid question, but should this be tested by just users like myself?

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 2:43 am
by Wladimir Palant
Not yet. It's just a quick patch to test the idea, not finished yet. There will be time for end user testing after this change lands in the development builds.

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 4:58 am
by fanboy
I'm trying to decode the "slow filters", using the experimental xpi, Why is it so large? Given that the standard 8char was fine till now. Annoys me somewhat.

Code: Select all

-120x600c
-130x600
-160X600b
-235x58p
-468x060
-fleshlight
.ad.footer
.adbanner
.adcontent
/160x600
/468X60t
/468by60
/468x060
/468x280
/468x60a
/468x60b
/468x60l
/468x60p
/468x60w
/468x80b
/468x80g
/470x030
/480x030
/728x90D
/728x90F
/728x90b
/728x90c
/728x90n
/760x120
/AD_Rotator
/AdHandler
/AdPostInject
/AdaptvAdserver
/BelowNewsAd
/GetAdvertImage
/GetArticleAdvert
/OASAdConnector
/ShowFlashAd
/ad-boxes
/ad-html
/ad-indicator
/ad-layer
/ad-leaderboard
/ad-middle
/ad-skyscraper
/ad-square
/ad-topban
/ad-vertical
/ad/468x
/ad/bann
/ad/code
/ad/frame
/ad/google
/ad/realclick
/ad728x15
/ad?sponsor
/adHeader
/adManagementAdv
/adStreamJS
/ad_350x
/ad_banner
/ad_bottom
/ad_common
/ad_footer
/ad_label
/ad_leaderboard
/ad_onbit
/ad_slide
/ad_wrapper
/adclick
/adcompany
/adengage
/adfetch
/adfshow
/adfunction
/adimage
/adinclude
/adjuggler
/adlayer
/admedia
/adpeeps
/ads160x600
/adsRotator
/adsWrapperIntl
/adsensev
/adtext4
/adtology
/adtrack
/adunitt
/advertis$image,script,subdocument
/advision
/awebanner
/banner160x
/banner468
/bannerad
/bannerframeopenads
/bannerserv
/banniere
/bizbannerre
/blogads
/eroadvertorial
/facebooksex
/footertextad
/geitonpop
/getadvertiserimage
/googlead
/headerAdvert
/iplgadshow
/js.ad/s
/js.ng/c
/js.ng/p
/js.ng/s
/js.ng/t
/js/adFrequency
/js/adlink
/js/adsense
/jsAdScript
/mr_tb_ad.
/netspiderads
/pagepeel
/promotools
/relatedads
/rotatingtextad
/sd/ad.php
/showadvert
/showbanner
/skyframeopenads
/spacedesc
/sponsoradds
/sponsorads
/sponsoredlinks
/st?ad_type
/valueclick
/virtuagirl
/webMailAd
/wp_ad.js
100x200.gif
120x500.gif
120x600.gif
120x600.htm
120x600.jpg
120x600.png
120x600.swf
125x125.gif
125x600.gif
125x600.swf
133x394.gif
160x300.gif
160x600.gif
160x600.htm
160x600.jpg
160x600.php
160x600.png
160x600.swf
160x6001
175x175.gif
180x300.gif
180x600-
20ad.gif
234x60.gif
246x120.swf
250X250.swf
250x210.jpg
300x250.gif
300x250.htm
300x250.jpeg$domain=~pics.tata.ru
300x250.jpg$domain=~mydamnchannel.com|~fanhouse.com
300x250.png
300x250.swf
300x250B.swf
300x250ad
300x60.gif
302x52.swf
316x250.swf
320x250_
336x280.gif
336x280.jpg
450x55.jpg
460x70.jpg
468-60.gif
468-60.swf
46860.gif
468_60.gif
468_60_1
468x060.
468x060_
468x100_
468x60.gif
468x60.htm
468x60.jpg
468x60.php
468x60.swf
468x60_1
468x60_2.
468x60ad
468x60banner
468x60nologo.swf
468x80.gif
470x60.gif
470x60.jpg
470x60.swf
480x80.jpg
486x60.js
522x62.jpg
700200.jpg
700x200.gif
728x290.gif
728x90.gif
728x90.htm
728x90.jpg
728x90.php
728x90.png
728x90.swf
728x90_2.
750x80.swf
750x90.gif
790x250.gif
800x120.gif
80x468.jpg
972_70.gif
;sz=300x60
=298x40px
=468x60%
?adunitid
@@/BannerAd$domain=careerone.com.au
@@/adiframe$script,domain=theteacherscorner.net
@@/advertisem$script,domain=mircscripts.org|tvnz.co.nz|skimtube.com|multiup.org|ratebeer.com
@@://10.1.1.
@@_140x140$domain=guardian.co.uk
@@coremetrics$domain=qvc.com
Ad468x60
LargeFlashAd.js
_120x600a
_120x800A
_148x154
_160x600
_300x250v
_468x060
_728x90p
_728x90r
_728x90v
_ad_advert
_ad_footer
_ad_placeholder
_minibanner
ads/468x60
advert_bottom
advert_cent
advert_left
advert_right
affiliate/banner
anner/728x
banman.asp
banmanpro
banner-468x
banner.cgi
banner300x
banner_id=
banner_vert
bannerad.
bannerad_
bannerid=
d468x60-
dhtmlpop
house_ad
innerads
left-ads
link-exchange
paypopup
pop_under
popinads
popunder$script
server/AAMALL
server/CCID
sz=160x200
||x7.to/ad/

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 5:50 am
by Hubird
So using the experimental build it should be safe for filter list authors to start modifying / removing any slow filters ?

I guess I'm just looking for 100% confirmation that this is the way things are going to be :)

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 6:42 am
by fanboy
This new algorithm pretty much kills the dimensions filter :(

Code: Select all

-120x600c
-130x600
-160X600b
-235x58p
-468x060
/160x600
/468X60t
/468by60
/468x060
/468x280
/468x60a
/468x60b
/468x60l
/468x60p
/468x60w
/468x80b
/468x80g
/470x030
/480x030
/728x90D
/728x90F
/728x90b
/728x90c
/728x90n
/760x120
100x200.gif
120x500.gif
120x600.gif
120x600.htm
120x600.jpg
120x600.png
120x600.swf
125x125.gif
125x600.gif
125x600.swf
133x394.gif
160x300.gif
160x600.gif
160x600.htm
160x600.jpg
160x600.php
160x600.png
160x600.swf
160x6001
175x175.gif
180x300.gif
180x600-
20ad.gif
234x60.gif
246x120.swf
250X250.swf
250x210.jpg
300x250.gif
300x250.htm
300x250.jpeg$domain=~pics.tata.ru
300x250.jpg$domain=~mydamnchannel.com|~fanhouse.com
300x250.png
300x250.swf
300x250B.swf
300x250ad
300x60.gif
302x52.swf
316x250.swf
320x250_
336x280.gif
336x280.jpg
450x55.jpg
460x70.jpg
468-60.gif
468-60.swf
46860.gif
468_60.gif
468_60_1
468x060.
468x060_
468x100_
468x60.gif
468x60.htm
468x60.jpg
468x60.php
468x60.swf
468x60_1
468x60_2.
468x60ad
468x60banner
468x60nologo.swf
468x80.gif
470x60.gif
470x60.jpg
470x60.swf
480x80.jpg
486x60.js
522x62.jpg
700200.jpg
700x200.gif
728x290.gif
728x90.gif
728x90.htm
728x90.jpg
728x90.php
728x90.png
728x90.swf
728x90_2.
750x80.swf
750x90.gif
790x250.gif
800x120.gif
80x468.jpg
972_70.gif
;sz=300x60
=298x40px
=468x60%

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 10:21 am
by Hubird
I've been having issues with filters not matching when using the experimental build.

I use a filter suggested by the filter composer but when I refresh the page the item is still there :?

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 12:57 pm
by Hubird
Wladimir Palant wrote:I plan to land this change soon after Adblock Plus 1.3 release so that it will be possible to use development builds for testing.
The sooner the better really, filter list authors will want to make sure that any new filters are not slow. The easiest way to do this is to watch out for warnings in the filter composer.

I was using the experimental build but was altered to an updated version which I'm guessing does not use the new filter matching algorithm.

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 6:53 pm
by IceDogg
Wladimir Palant wrote:Not yet. It's just a quick patch to test the idea, not finished yet. There will be time for end user testing after this change lands in the development builds.
Ok, thanks for letting me/us know. Sounding like it might need that testing too. I hope it goes well.

Re: [Roadmap] Faster filter matching algorithm

Posted: Sat Oct 02, 2010 11:41 pm
by fanboy
Not sure why this is considered slow:

Code: Select all

||1.im.cz/ad/

vs

Code: Select all

http://1.im.cz/ad/
which is apprently better? :?

Re: [Roadmap] Faster filter matching algorithm

Posted: Sun Oct 03, 2010 3:45 pm
by Wladimir Palant
fanboy wrote:Not sure why this is considered slow:

Code: Select all

||1.im.cz/ad/
Because all keywords here ("1", "im", "cz", "ad") are too short, Adblock Plus needs at least 3 characters.
fanboy wrote:

Code: Select all

http://1.im.cz/ad/
which is apprently better? :?
Here "http" can be used as a keyword - which is only marginally better because it can be found in almost every URL. So probably better leave that filter as is - that should be a rare case where there is no meaningful way to fix it.