[Done] More flexible anchors

Posted: Fri Jan 02, 2009 11:19 pm
by Wladimir Palant
A suggestion that came from the Mozilla Russia forum - make anchors more flexible so that one can omit the protocol and subdomains. It boils down to filters like "||example.com/foo.gif" that would match "http://example.com/foo.gif", "https://bar.example.com/foo.gif" but not "http://oneexample.com/foo.gif" or "http://redirect.com/?http://example.com/foo.gif". Sounds like a good generalization of the anchors, would help with the Malware Domains list for example.

Internally, a filter like "||example.com/" would be translated into /^[\w\-]+:\/\/(?:[^\/]+\.)?example\.com\//
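
As an illustration only, the translation above can be sketched in Python. The function name `flexible_anchor_regex` is made up for this example, wildcards and other filter syntax are ignored, and this is not the actual Adblock Plus code. It only checks the four URLs mentioned in the first paragraph.

```python
import re

# Regex prefix that a leading "||" is translated into, as quoted in the post:
# any protocol, then an optional run of subdomains ending in a dot.
FLEXIBLE_PREFIX = r"^[\w\-]+:\/\/(?:[^\/]+\.)?"


def flexible_anchor_regex(filter_text):
    """Compile a '||...' filter into a regex (hypothetical helper, no wildcard support)."""
    if not filter_text.startswith("||"):
        raise ValueError("expected a filter starting with '||'")
    # Everything after the anchor is treated as literal text in this sketch.
    return re.compile(FLEXIBLE_PREFIX + re.escape(filter_text[2:]))


pattern = flexible_anchor_regex("||example.com/foo.gif")
for url in [
    "http://example.com/foo.gif",                       # True
    "https://bar.example.com/foo.gif",                  # True
    "http://oneexample.com/foo.gif",                    # False
    "http://redirect.com/?http://example.com/foo.gif",  # False
]:
    print(url, bool(pattern.search(url)))
```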

Thoughts, opinions?

Posted: Mon Jan 05, 2009 7:14 pm
by Fox
I would like that.

Would a filter like:
@@||example.com/

be a site whitelisting rule or an item whitelisting rule?

Posted: Mon Jan 05, 2009 7:33 pm
by Wladimir Palant
@Fox: Site whitelisting rules should specify the $document flag explicitly. This is done automatically for filters starting with http://, but that's mostly for backwards compatibility - I would rather not make it more complicated.

Posted: Mon Jan 05, 2009 7:39 pm
by MonztA
Fox wrote: I would like that.
I second that. :)

Posted: Fri Jan 09, 2009 9:07 pm
by Wladimir Palant
Just got a mail asking for improvement of the "Disable on foo.com" menu item - it shouldn't require disabling on each subdomain. So maybe add a third option there: "Disable on *.foo.com" that will add the filter "@@||foo.com/$document", what do you think? It should offer disabling on the effective first-level domain meaning *.foo.com if the user is on bar.foo.com and *.foo.co.uk if he is on bar.foo.co.uk.
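
A rough sketch of how such a filter could be derived from the current host, for illustration only. The names `KNOWN_SUFFIXES`, `base_domain` and `site_whitelist_filter` are invented here, and the two-entry suffix set is a stand-in for whatever real suffix data the browser provides.

```python
# Hypothetical stand-in for a real public suffix list; two entries for illustration.
KNOWN_SUFFIXES = {"com", "co.uk"}


def base_domain(host):
    """Return the known suffix plus one label, e.g. foo.co.uk for bar.foo.co.uk."""
    labels = host.lower().split(".")
    # Check the longest candidate suffix first so "co.uk" is found before "uk".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            # Keep one label in front of the suffix, if there is one.
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the host itself.
    return host.lower()


def site_whitelist_filter(host):
    """Build the '@@||domain/$document' filter for the 'Disable on *.domain' option."""
    return "@@||" + base_domain(host) + "/$document"


print(site_whitelist_filter("bar.foo.com"))    # @@||foo.com/$document
print(site_whitelist_filter("bar.foo.co.uk"))  # @@||foo.co.uk/$document
```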

I dislike adding more options there but this new option really cannot replace any of the existing options.

Posted: Wed Jan 21, 2009 11:50 am
by Ervin
How about |protocol|domain|path? This would even allow subdomain detection. So

@@|https||
would unblock any https://* URLs,

||example.com|
would block any domain and subdomain of example.com, and

|||*banner*
would block any URL that contains "banner" but not in the domain or protocol.

One thing remains: what if a nonstandard port is used, as in http://www.example.com:8080/index.html?

Posted: Wed Jan 21, 2009 12:02 pm
by Ervin
Ervin wrote: How about |protocol|domain|path?
Or "|protocol|domain|port|path". That would solve the port problem for an extra pipe.
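
To make the four boundaries concrete, here is a small Python sketch that splits the example URL into the parts each pipe would delimit. The helper name `url_parts` is made up for this illustration; it only shows the URL pieces and does not implement the proposed filter syntax.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol, domain, port, path); port is None when the URL has none."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port, parts.path)


print(url_parts("http://www.example.com:8080/index.html"))
# ('http', 'www.example.com', 8080, '/index.html')
```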

Posted: Wed Jan 21, 2009 1:17 pm
by Wladimir Palant
Ervin, I think you are overcomplicating things now. I don't see a real use case for your suggestion.

Posted: Wed Jan 21, 2009 2:06 pm
by Ervin
Wladimir Palant wrote: Ervin, I think you are overcomplicating things now. I don't see a real use case for your suggestion.
Agreed, but I do think this syntax would be much cleaner. If you think in anchors, each pipe character would mean a specific boundary in the URL. Also this is a superset of the original suggestion. But this was just my thought.

Posted: Sat Jan 24, 2009 8:45 pm
by Stupid Head
Fox wrote: I would like that.
Me too.

Posted: Fri May 01, 2009 2:39 am
by Ares2
Is this on the ABP 1.1 to-do list? :)

Posted: Fri May 01, 2009 1:29 pm
by Wladimir Palant
Yes, it is.

Posted: Fri May 01, 2009 1:38 pm
by Fox
Would it be a good idea to have a trailing || too?
A trailing || would stand for either
: or /
so a filter like:
.example.com||
would block these:
.example.com:8080
.example.com:81
.example.com/

But not:
.example.com.au

Posted: Tue May 05, 2009 8:14 am
by Wladimir Palant
Done: http://hg.mozdev.org/adblockplus/rev/cf31fbc930ab

@Fox: Not sure whether this should be done, will think about it.

Posted: Wed May 06, 2009 9:05 am
by Wladimir Palant
I was too eager to mark this as "done". There is still work left:

* "Disable on foo.com" should create a filter with flexible anchor
* Filter composer should be able to use flexible anchors (should that be the default for all suggestions?)
* Filter export from preferences should set ABP version to 1.1 if flexible anchors are found

Concerning flexible anchors at the end of the filter, I thought that the following definition would make sense:

foo|| means that "foo" should either be at the end of the address or it should be followed by a separator character. Separator characters are all characters but letters (need to recognize international letters somehow), digits, underscore, period, -, %. So || will be translated into something like:

([^\w\.\-%]|$)

So "||example.com||" will match "http://example.com/foo" and "http://example.com:1234/foo" but not "http://example.company.com/" and not "http://example.com.com/". Similarly "||example.com/foo||" will match "http://example.com/foo" and "http://example.com/foo/bar" but not "http://example.com/foobar".
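
A quick way to check these examples is a Python sketch like the one below. The function name `filter_to_regex` is invented for this illustration, wildcards are not handled, and the leading-anchor translation is the one quoted in the first post. One difference from JavaScript is worth keeping in mind: Python's `\w` already covers non-ASCII letters for text patterns, while JavaScript's `\w` is ASCII-only, so the international letters question does not show up here.

```python
import re

# Leading "||": any protocol plus optional subdomains (from the first post).
LEADING = r"^[\w\-]+:\/\/(?:[^\/]+\.)?"
# Trailing "||": a separator character or the end of the address.
TRAILING = r"(?:[^\w.\-%]|$)"


def filter_to_regex(filter_text):
    """Translate leading/trailing '||' anchors; the rest is literal text (sketch only)."""
    prefix = suffix = ""
    body = filter_text
    if body.startswith("||"):
        prefix, body = LEADING, body[2:]
    if body.endswith("||"):
        suffix, body = TRAILING, body[:-2]
    return re.compile(prefix + re.escape(body) + suffix)


domain_only = filter_to_regex("||example.com||")
print(bool(domain_only.search("http://example.com/foo")))       # True
print(bool(domain_only.search("http://example.com:1234/foo")))  # True
print(bool(domain_only.search("http://example.company.com/")))  # False
print(bool(domain_only.search("http://example.com.com/")))      # False

with_path = filter_to_regex("||example.com/foo||")
print(bool(with_path.search("http://example.com/foo")))         # True
print(bool(with_path.search("http://example.com/foo/bar")))     # True
print(bool(with_path.search("http://example.com/foobar")))      # False
```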

Opinions?