
[Done] Extending filter syntax

Posted: Fri Jun 09, 2006 12:53 am
by Wladimir Palant
I was thinking of adding another extension to the filter syntax - annotations. I would like to specify what types of elements filters apply to. Furthermore, I would like to treat third-party images/scripts/etc. specially. Examples:

*banner*$third-party - block anything containing the word "banner" but only if it is third party (thanks to Antares for this feature request)
*banner*$image,object - block images and objects containing the word "banner"
*/ads/*$~style - block anything containing "/ads/" unless it is a stylesheet
/banner\d+/$image,third-party - block third-party images containing "banner" followed by a number
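
For illustration, here is a minimal Python sketch of how the four example filters above might be split into a pattern and its annotations. The names parse_filter, ParsedFilter and OPTION_LIST are hypothetical and not taken from Adblock Plus. The sketch assumes the annotations follow the last "$" in the filter and does not cover the domain= idea.

```python
import re
from typing import FrozenSet, NamedTuple, Optional

# Assumption: an annotation list is comma-separated names, each optionally
# prefixed with "~". Anything else after "$" is treated as part of the pattern.
OPTION_LIST = re.compile(r"^~?[\w-]+(,~?[\w-]+)*$")


class ParsedFilter(NamedTuple):
    pattern: str
    include_types: FrozenSet[str]
    exclude_types: FrozenSet[str]
    third_party: Optional[bool]  # True, False, or None if not specified


def parse_filter(text: str) -> ParsedFilter:
    """Split a filter into its pattern and its annotations (hypothetical helper)."""
    pattern, sep, tail = text.rpartition("$")
    if not sep or not OPTION_LIST.match(tail):
        # No "$", or the "$" belongs to the pattern (e.g. a regexp anchor).
        return ParsedFilter(text, frozenset(), frozenset(), None)

    include, exclude, third_party = set(), set(), None
    for option in tail.split(","):
        negated = option.startswith("~")
        name = option[1:] if negated else option
        if name == "third-party":
            third_party = not negated
        elif negated:
            exclude.add(name)
        else:
            include.add(name)
    return ParsedFilter(pattern, frozenset(include), frozenset(exclude), third_party)
```

With this sketch, parse_filter("*/ads/*$~style") yields the pattern "*/ads/*" with "style" as the only excluded type, and a filter without "$" is returned unchanged as a bare pattern.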

What do you think? Is it worth implementing?

Note: This is an advanced feature targeted at advanced users and filter list maintainers (especially the latter). However, adding general blocking filters for all third-party images, scripts etc. (e.g. *$image,third-party) should be possible via the menu.

PS: And a totally weird idea - *$image,domain=server.com would allow us to import entries from the permission manager. This syntax would be inconsistent, however: element hiding specifies domains in a different way.

Re: Opinions requested: extending filter syntax

Posted: Fri Jun 09, 2006 1:06 am
by VF
Wladimir Palant wrote:*banner*$image,object - block images and objects containing the word "banner"
Good idea. We already had some problems with Yahoo putting an important CSS file in their banner directory.

Posted: Fri Jun 09, 2006 1:22 am
by IceDogg
I like the idea. Will it slow page rendering any further?

Posted: Fri Jun 09, 2006 2:06 am
by Wladimir Palant
@IceDogg: I think this should have minimal impact on performance if at all. On the one hand we have an additional (cheap) check for type and/or third-party, on the other hand we might save an expensive regexp check.
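
The ordering argument above can be sketched in Python as follows. This is only an illustration under assumptions: the Filter class, its fields and the regex_checks counter are hypothetical and do not reflect the actual Adblock Plus code. The point is that the cheap type and third-party comparisons run first, so the regular expression is only evaluated when they pass.

```python
import re


class Filter:
    """Hypothetical filter: a compiled regexp plus optional annotations."""

    def __init__(self, regex, types=None, third_party=None):
        self.regex = re.compile(regex)
        self.types = types              # set of allowed element types, or None
        self.third_party = third_party  # True, False, or None (don't care)
        self.regex_checks = 0           # counts how often the regexp actually ran

    def matches(self, url, element_type, is_third_party):
        # Cheap checks first: plain comparisons and a set lookup.
        if self.third_party is not None and self.third_party != is_third_party:
            return False
        if self.types is not None and element_type not in self.types:
            return False
        # Only now pay for the comparatively expensive regexp search.
        self.regex_checks += 1
        return self.regex.search(url) is not None


f = Filter(r"banner\d+", types={"image"}, third_party=True)
first = f.matches("http://ads.example/banner12.png", "image", True)
second = f.matches("http://ads.example/banner12.js", "script", True)
```

In this example the second call is rejected by the type check, so the regular expression runs only once for the two requests.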

@VF: Yes, that's exactly what I meant by */ads/*$~style :)

Posted: Fri Jun 09, 2006 2:16 am
by IceDogg
Thanks WP, that was the only concern I had with it. So, with that out of the way, I say it's worth doing, IMO.

Posted: Fri Jun 09, 2006 10:18 am
by Fox
If the filter *$script,third-party blocks all third-party scripts, then this is a feature I have always wanted :)

Posted: Fri Jun 09, 2006 10:32 am
by Wladimir Palant
@Fox: This feature is planned and it will be there regardless. The only question is whether it should be generalized in this way.

Re: Opinions requested: extending filter syntax

Posted: Fri Jun 09, 2006 10:42 am
by chewey
Wladimir Palant wrote:I was thinking of adding another extension to the filter syntax - annotations. I would like to specify what types of elements filters apply to. Furthermore, I would like to treat third-party images/scripts/etc. specially.
I like it :-)

Just one thing: Can we have negation too?
i.e. filter /foo/ except when it is third-party/image/script...

Posted: Fri Jun 09, 2006 10:49 am
by Fox
Is it then possible to use whitelisting with those? I mean...
@@|http://www.example.com/*$~object
So example.com is whitelisted, but first- or third-party objects are not,
and objects there are then blocked if blocking rules block them.

If that is possible, then maybe this helps those that think
@@|http://www.mozillazine.org/
should only whitelist mozillazine, but not google ads.
So if it's possible, then they can use something like
@@|http://www.mozillazine.org/*$~third-party
So mozillazine.org is then whitelisted, but not third-party stuff,
and that is blocked if blocking rules block it.

Posted: Fri Jun 09, 2006 11:00 am
by Wladimir Palant
@chewey: Yes, negation is one of the examples - I suggested a tilde (~) before the annotation to negate it. Just not exactly sure what to do when regular and negated types are mixed: *$image,~object or even *$object,~object (in the first case "image" will probably have no effect and in the second the filter should be equivalent to "*").

@Fox: Yes, whitelisting filters can use all the same syntax, always.

Posted: Fri Jun 09, 2006 1:19 pm
by chewey
Wladimir Palant wrote:@chewey: Yes, negation is one of the examples
Aw bugger - that'll teach me to stop reading examples after the second line. Sorry for that.
Wladimir Palant wrote:Just not exactly sure what to do when regular and negated types are mixed: *$image,~object or even *$object,~object (in the first case "image" will probably have no effect and in the second the filter should be equivalent to "*")
Treat them strictly boolean. Sure, it allows for useless things to be done (as in your "object, ~object" example), but I think this is OK - especially as this is an "advanced users only" syntax. You should definitely know what you are doing when using this.

However, I'm not sure your examples are logically consistent:

*banner*$image,object - block images and objects containing the word "banner"
This treats the separating "," as an OR.

/banner\d+/$image,third-party - block third-party images containing "banner" followed by a number
Here, according to your description, it is an AND.

Maybe we could extend the syntax to "...$image+third-party" for AND and "...$image, third-party" for OR?
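
A small Python sketch of how this explicit syntax could be evaluated, for illustration only. The function name matches_explicit is hypothetical, and the sketch assumes that "+" binds more tightly than ",", which the proposal above does not state.

```python
def matches_explicit(options, element_type, is_third_party):
    """Evaluate an annotation string where ',' means OR and '+' means AND.

    Assumption: '+' has higher precedence than ',', so "a+b, c" reads as
    (a AND b) OR c.
    """
    def term_holds(term):
        negated = term.startswith("~")
        name = term[1:] if negated else term
        if name == "third-party":
            value = is_third_party
        else:
            value = (name == element_type)
        # A leading "~" flips the result of the plain term.
        return value != negated

    return any(
        all(term_holds(term.strip()) for term in group.split("+"))
        for group in options.split(",")
    )
```

Under these assumptions "image+third-party" matches only third-party images, while "image, third-party" matches any image as well as any third-party element.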

Posted: Fri Jun 09, 2006 1:25 pm
by Wladimir Palant
@chewey: Yep, I'm aware of that. Any types listed are OR'ed. *Negated* types are AND'ed. third-party or ~third-party is always AND'ed to the rest.

As there is only one way to do this that makes sense - do we need to reflect this in the syntax?
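
The implicit rule described in this post could be sketched in Python like this. The function options_match and its parameters are hypothetical. How listed and negated types combine with each other is not spelled out here, so the sketch assumes they are OR'ed, following the earlier remark that "image" has no effect in *$image,~object and that *$object,~object is equivalent to "*".

```python
def options_match(include, exclude, third_party, element_type, is_third_party):
    """Apply the implicit combination rule to one element (illustrative only).

    include      -- set of listed types, OR'ed together
    exclude      -- set of negated types, AND'ed together
    third_party  -- True, False, or None if the filter does not mention it
    """
    # third-party / ~third-party is always AND'ed with the rest.
    if third_party is not None and third_party != is_third_party:
        return False
    # No type annotations at all: the filter applies to every type.
    if not include and not exclude:
        return True
    listed = element_type in include
    # "~a,~b" means "not a AND not b", i.e. the type is in neither.
    not_negated = bool(exclude) and element_type not in exclude
    # Assumption: the listed group and the negated group are OR'ed.
    return listed or not_negated
```

With include={"image"} and exclude={"object"}, a script still matches, which is exactly the "image has no effect" case; readers who expect a plain AND would get a different result, which is the source of the confusion discussed in this thread.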

Posted: Fri Jun 09, 2006 1:47 pm
by chewey
Wladimir Palant wrote:Any types listed are OR'ed. *Negated* types are AND'ed. third-party or ~third-party is always AND'ed to the rest.
Ah, I see.

Wladimir Palant wrote:As there is only one way to do this that makes sense - do we need to reflect this in the syntax?
I think so.

This breaks the "strictly boolean" rule I made up previously:

Although your definition is unambiguous, it changes the way a "," works according to its surroundings by introducing additional (invisible!) boolean properties for some of the selectors. This can be rather confusing (it already did confuse me :-) ). This syntax would heavily rely on documentation (and people actually reading it - I'm not sure whether this is a bad thing... ;-) ) while producing unexpected results for users not aware of the special properties of some of the type selectors.

My personal opinion is that the syntax should assist human-readability.
The status quo looks a little obfuscated to me.

Posted: Fri Jun 09, 2006 1:55 pm
by Guest
I do not know what I should think about such "features". On the one hand it would help filterset authors. But on the other hand it makes filter list creation and the filter possibilities much more complex. Even today creating filters is not easy, because many people do not use wildcards. If this were introduced, it would be complete overkill; even skilled users would be confused.
And your aim is to keep Adblock Plus simple, not complex.

And because I like the idea, I have a suggestion:
Make a new column where such options can be saved. Generally it would be empty or set to "always". But the user can edit this additional field and set a limitation (e.g. "image" or "image,object"). And to make this feature accessible to new users as well, you could build a wizard for creating new filters, which you have already planned. There these limitations could be set using checkboxes or similar.

Posted: Fri Jun 09, 2006 2:05 pm
by Peng
Anonymous wrote:And because I like the idea, I have a suggestion:
Make a new column where such options can be saved. Generally it would be empty or set to "always". But the user can edit this additional field and set a limitation (e.g. "image" or "image,object"). And to make this feature accessible to new users as well, you could build a wizard for creating new filters, which you have already planned. There these limitations could be set using checkboxes or similar.
The thing about columns is that they take up space on every single line, even if they don't apply to that line.

And also, columns wouldn't work for filterset subscriptions, unless this column was just a UI for adding "$image,object" or whatever.