[Done] Extending filter syntax

Various discussions related to Adblock Plus development
Fox
Posts: 300
Joined: Sat Jun 10, 2006 3:05 pm
Location: Finland

Post by Fox »

Was the $frame syntax not added?
I tried */ads/*$frame
here. There are no frames there, but it blocks the Cars picture.
So it seems to be an invalid rule, and it is then treated like */ads/*

And is it intended that rules with an invalid option, like:
*/ads/*$imaggge
behave like: */ads/*
Wladimir Palant

Post by Wladimir Palant »

It is */ads/*$subdocument
And unknown options are ignored - it is just the same as if you didn't specify them at all.
Fox

Post by Fox »

Thank you.
ecjs
Posts: 170
Joined: Sun Jun 11, 2006 7:39 pm

Re: Opinions requested: extending filter syntax

Post by ecjs »

Wladimir Palant wrote: I would like to specify what types of elements filters apply to.
Could you please tell us which types we can use? As I use a French build, I cannot use the French words I've found in Adblock.

For instance, there are these:
*$link
*$object
*$background
*$script
*$img
*$style

Wladimir Palant wrote: Furthermore I would like to treat third-party images/scripts/etc specially.
Is it already usable?
Wladimir Palant

Post by Wladimir Palant »

I'm sorry, this really needs to be documented. The possible type options are:

other
script
image
stylesheet
object
subdocument
document
link
background

They are *not* localized. You can combine them like this:

Code: Select all

*/ads/*$script,image,object
You can also negate them:

Code: Select all

*/ads/*$~stylesheet
Finally there is one additional option which is match-case:

Code: Select all

*/Ads/*$match-case
This will match "http://server.com/Ads/something.gif" but not "http://server.com/ads/something.gif". The third-party option is not yet implemented; there are some issues with it.
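The option rules described in this post (comma-separated type options, negation with ~, match-case, and unknown options being ignored) can be sketched roughly as follows. This is only an illustration, not the actual Adblock Plus code: the function name `parse_filter` and its return shape are made up, and it assumes for simplicity that the last $ in the filter always starts the option list.

```python
# Type options as listed in the post above; everything else is hypothetical.
TYPE_OPTIONS = {
    "other", "script", "image", "stylesheet", "object",
    "subdocument", "document", "link", "background",
}


def parse_filter(text):
    """Split a filter into (pattern, included, excluded, match_case)."""
    pattern, sep, option_text = text.rpartition("$")
    if not sep:
        # No "$" at all: the whole text is the pattern.
        return text, set(), set(), False

    included, excluded, match_case = set(), set(), False
    for option in option_text.split(","):
        option = option.strip().lower()
        if option == "match-case":
            match_case = True
        elif option.startswith("~") and option[1:] in TYPE_OPTIONS:
            excluded.add(option[1:])
        elif option in TYPE_OPTIONS:
            included.add(option)
        # Anything else (for example "imaggge") is an unknown option
        # and is ignored, as if it had not been specified at all.
    return pattern, included, excluded, match_case
```

With this sketch, `parse_filter("*/ads/*$imaggge")` gives the same result as `parse_filter("*/ads/*")`, which mirrors the behaviour asked about earlier in the thread.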
chewey
Posts: 501
Joined: Wed Jun 14, 2006 10:34 pm
Location: somewhere in Europe

Post by chewey »

Wladimir Palant wrote: I'm sorry, this really needs to be documented. The possible type options are:
[...]
Ah, perfect, thanks for clarifying this - I was actually going to ask the same thing. Now I can include those in my documentation.
Dr. Evil

Post by Dr. Evil »

I think the idea is good, but I think it would become a problem if advertisers used URIs containing a $ symbol and/or \d. (I think there's the same problem with the pipe at the end of URIs.) I guess some escaping syntax could be useful.

The market share of Adblock Plus users is not very high, of course. But I remember that the much less known and less used extension Layerblock already had problems with some advertisers (layer-ads.de) constantly modifying their layers slightly so that they weren't blocked anymore. As it wouldn't hurt advertisers to change their URIs in some way, I think some will do so.
Wladimir Palant

Post by Wladimir Palant »

The dollar sign $ is only "special" if followed by a list and the pipe | is only treated specially if found at the start or end of the filter - not a problem yet. Btw, I've already seen somebody using "*.html" in the address - without much success :)

Still, a way to escape things would in fact be good; I'm looking for solutions here.
Dr. Evil

Post by Dr. Evil »

It just occurred to me that you could use the same escape syntax as HTTP URLs. I haven't decided yet whether it's a good or a bad idea; I just thought I'd share it with you.

In the end this would mean using this:

Code: Select all

%7c	|
%23	#
%24	$
%2a	*
%5c	\
%2b	+
%25	%
The problem with this is surely that only advanced users would know of it and that it'd always be a problem to find out what the hex value of e.g. the # sign was again.
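To make the table above concrete, here is a small sketch of how such escapes could be decoded, using Python's standard `urllib.parse.unquote`. The helper name `decode_filter_text` is made up for this illustration, and nothing here is part of Adblock Plus.

```python
from urllib.parse import unquote

# Escape sequences from the table above and the characters they stand for.
ESCAPES = {
    "%7c": "|",
    "%23": "#",
    "%24": "$",
    "%2a": "*",
    "%5c": "\\",
    "%2b": "+",
    "%25": "%",
}


def decode_filter_text(text):
    """Replace %xx escapes in a filter with the characters they encode."""
    return unquote(text)


# Every entry of the table decodes to the expected character.
for escaped, literal in ESCAPES.items():
    assert decode_filter_text(escaped) == literal
```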


Maybe a different approach to adding the extra information would be best. Instead of modifying the filter directly, use another textbox to the right of it, called "special" or something similar, where you type in this information (@@, |, ##, $ or anything coming in the future). This could also be extended to a dialog that allows novice users to use these options.
This doesn't solve the problem with * (which matches itself anyway) and \d+ though.
Wladimir Palant

Post by Wladimir Palant »

I can't use this escaping because URLs are already escaped in this way :) And anyways, it is far too complicated.

My idea so far was somewhere along these lines:

Code: Select all

{4} => .{4}
{4,8} => .{4,8}
{d4,8} => \d{4,8}
{d*} => \d*
{*} => \*
{$} => \$
{[a-z]} => [a-z]
{[a-z]2} => [a-z]{2}
{x7B}{x7D} => \{\}
I don't like creating a second regular expression syntax (the last escaping sequence in particular is anything but perfect), but I will need some of these for new features - and plain regular expressions are too difficult to parse.
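For illustration, the mapping above could be turned into regular expressions with something like the following sketch. It only covers the nine example forms listed; the names `translate` and `translate_token` are invented here, and how text outside the braces should be handled is left open (this sketch passes it through unchanged).

```python
import re

# One {...} token; the body may not itself contain braces.
TOKEN = re.compile(r"\{([^{}]*)\}")
# Optional character class ("d" or "[...]") followed by an optional quantifier.
BODY = re.compile(r"(d|\[[^\]]+\])?(\d+(?:,\d+)?|\*)?")


def translate_token(body):
    """Translate the text between one pair of braces into a regex fragment."""
    if body in ("*", "$"):
        # {*} and {$} stand for the literal characters.
        return "\\" + body
    hex_match = re.fullmatch(r"x([0-9A-Fa-f]{2})", body)
    if hex_match:
        # {xHH} stands for the literal character with that hex code.
        return re.escape(chr(int(hex_match.group(1), 16)))
    match = BODY.fullmatch(body)
    if not body or not match:
        raise ValueError("unsupported token: {%s}" % body)
    char_class, quantifier = match.groups()
    # No class means "any character"; "d" means a digit; [...] is kept as is.
    atom = {"d": r"\d", None: "."}.get(char_class, char_class)
    if quantifier is None:
        return atom
    if quantifier == "*":
        return atom + "*"
    return atom + "{" + quantifier + "}"


def translate(text):
    """Replace every {...} token in text with its regex equivalent."""
    return TOKEN.sub(lambda m: translate_token(m.group(1)), text)
```

Under these assumptions, `translate("{d4,8}")` yields `\d{4,8}` and `translate("{x7B}{x7D}")` yields `\{\}`, matching the table.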
Dr. Evil
Posts: 194
Joined: Fri Sep 08, 2006 3:51 pm

Post by Dr. Evil »

Well, the easiest mechanism from a user point of view would be the old backslash thing, I think. It's not that easy to program since combinations like \\\\\\\d have to become \\\[0-9] but it's possible with a few lines of code...
Wladimir Palant

Post by Wladimir Palant »

We finally have documentation on the options: http://adblockplus.org/en/filters#options
Lucas Malor
Posts: 72
Joined: Wed Aug 23, 2006 7:34 am

Post by Lucas Malor »

I would like to ask whether $third-party will be implemented in the future. Above all, I'm interested in filters like this:

Code: Select all

www.site1.com/someannoyingstuff$~third-party
so I could apply a specific filter only for one domain, speeding up the filtering for the other domains that don't need that filter.
Wladimir Palant

Post by Wladimir Palant »

Yes, I intend to implement third-party, though maybe not for 0.7.2 (very short on time here). However, it won't give you any speedup the way everything works right now.
Lucas Malor

Post by Lucas Malor »

Wladimir Palant wrote: However, it won't give you any speedup the way everything works right now.
Is that because 50 or 500 filters have the same filtering speed with the new system?