More strict separators

Postby Ares2 » Sat Mar 19, 2011 5:09 pm

From the documentation:
Often you need to accept any separator character in a filter.[...]Separator character is anything but a letter, a digit, or one of the following: _ - . %


Looking at the available subscriptions, and especially now that the new shortcut algorithm requires non-alphanumeric delimiters, it could make things a lot easier (not yet tested for false positives) if there were a separator that also covers underscore, hyphen and dot. I guess % should remain an exception due to its special meaning in URLs.

Example:
/exampleads/*
/exampleads-
/exampleads.
_exampleads.
_exampleads_
?exampleads=

None of these structures are theoretical; they are all pretty common. The subscriptions contain many examples of such "duplicates", which proves it. Yet currently only the first and the last can be combined into ^exampleads^, which limits real-world use to a few rare situations (like ports in URLs).
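To illustrate why only the first and the last structure fit ^exampleads^, here is a minimal Python sketch of the separator rule as quoted from the documentation. It is not Adblock Plus code: the function name, the simplified translation (only ^ and * are handled, ASCII only) and the example.com addresses are assumptions made up for this illustration.

```python
import re

# Separator class as quoted from the documentation: anything except a
# letter, a digit, or one of _ - . %  (ASCII letters and digits only here).
SEPARATOR = r"[^A-Za-z0-9_\-.%]"


def filter_to_regex(filter_text):
    """Translate a simplified filter into a compiled regex.

    Handles only '^' (separator) and '*' (wildcard); every other
    character is matched literally. Hypothetical helper, not ABP's own.
    """
    parts = []
    for ch in filter_text:
        if ch == "^":
            parts.append(SEPARATOR)
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# Made-up addresses, one per structure from the list above.
ADDRESSES = [
    "http://example.com/exampleads/banner.gif",
    "http://example.com/exampleads-top.gif",
    "http://example.com/exampleads.js",
    "http://example.com/img_exampleads.gif",
    "http://example.com/img_exampleads_top.gif",
    "http://example.com/page?exampleads=1",
]

pattern = filter_to_regex("^exampleads^")
matched = [url for url in ADDRESSES if pattern.search(url)]

for url in ADDRESSES:
    print("match   " if url in matched else "no match", url)
```

In this sketch only the addresses where exampleads is surrounded by / or by ? and = are matched; the hyphen, dot and underscore variants fall through because those characters are excluded from the separator class.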

Therefore I'm suggesting either modifying ^ or introducing a more inclusive separator.
Ares2
 
Posts: 1275
Joined: Fri Feb 15, 2008 1:47 pm

Re: Feature request: Separator without exceptions

Postby Weirdo » Sun Mar 20, 2011 1:35 am

*BUMP*
I'm not a fan of posts not being replied to.
Weirdo
 

Re: Feature request: Separator without exceptions

Postby Hubird » Sun Mar 20, 2011 2:11 am

Weirdo, I have no objections to posts being bumped if need be but you could at least give people time to respond (you bumped it the same day).
Hubird
 
Posts: 2850
Joined: Thu Oct 26, 2006 2:59 pm
Location: Australia

Re: Feature request: Separator without exceptions

Postby Ares2 » Sun Mar 20, 2011 3:03 am

Weirdo wrote: *BUMP*
I'm not a fan of posts not being replied to.

Can you please stop abusing my post to get your point across?

Now to be constructive: You posted your Songbird question on Friday evening Central European Time, and Wladimir is known to be off during the weekend, so please wait until Monday 12:00 UTC. Bumping is not going to achieve anything.

Re: Feature request: Separator without exceptions

Postby Wladimir Palant » Mon Mar 21, 2011 1:29 pm

@Ares2: The main use case for separators is domain names, hence this list of exceptions (dots, hyphens and underscores are allowed in domain names). This is of course suboptimal for other scenarios; we already discussed a very similar proposal in the Mozilla Russia forum. My main concern is that once I add another separator, it will turn out to be suboptimal in yet other scenarios, and we will keep serving special cases. Not to mention that multiple kinds of separators will be confusing.

Let's move this to Future Development and see what other people think.
Wladimir Palant
ABP Developer
 
Posts: 8395
Joined: Fri Jun 09, 2006 6:59 pm
Location: Cologne, Germany

Re: More strict separators

Postby Wladimir Palant » Mon Mar 21, 2011 1:34 pm

I think the most straightforward thing to implement would be ^^ separators (with filters like ^^example^^) that are more strict than the single separator symbol: hyphen, underscore and dot would count as separators as well. Question: would that be enough to cover the majority of problematic cases? Or are more special character classes required (in which case we should rather consider something more generic)?
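A rough Python sketch of how such a ^^ separator could behave next to the existing ^ follows. Nothing here is actual Adblock Plus code: the ^^ syntax is only the proposal from this post, and the character classes, the function name and the example.com addresses are assumptions chosen for illustration.

```python
import re

# '^' as documented: anything but a letter, a digit, or one of _ - . %
LOOSE = r"[^A-Za-z0-9_\-.%]"
# Proposed '^^' (hypothetical): hyphen, underscore and dot also count as
# separators; '%' is still treated as an exception in this sketch.
STRICT = r"[^A-Za-z0-9%]"


def compile_filter(text):
    """Compile a simplified filter that may contain '^' and '^^'."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("^^", i):
            out.append(STRICT)
            i += 2
        elif text[i] == "^":
            out.append(LOOSE)
            i += 1
        else:
            out.append(re.escape(text[i]))
            i += 1
    return re.compile("".join(out))


# Made-up addresses covering the structures from the first post.
ADDRESSES = [
    "http://example.com/exampleads/banner.gif",
    "http://example.com/exampleads-top.gif",
    "http://example.com/exampleads.js",
    "http://example.com/img_exampleads.gif",
    "http://example.com/img_exampleads_top.gif",
    "http://example.com/page?exampleads=1",
]

loose = compile_filter("^exampleads^")
strict = compile_filter("^^exampleads^^")

loose_hits = [u for u in ADDRESSES if loose.search(u)]
strict_hits = [u for u in ADDRESSES if strict.search(u)]
print(len(loose_hits), "of", len(ADDRESSES), "matched by ^exampleads^")
print(len(strict_hits), "of", len(ADDRESSES), "matched by ^^exampleads^^")
```

Under these assumptions the single filter ^^exampleads^^ covers all six structures, while a longer word that merely contains the keyword, such as /myexampleads/, is still not matched because a letter precedes it.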

Re: More strict separators

Postby Ares2 » Mon Mar 21, 2011 6:03 pm

Well, in combination with the new shortcut rules, we could finally create general rules that deserve the name. It's clear that if we add a rule like "/adsframe.", we actually consider "adsframe" a bad word that can be filtered; there just isn't a way to express that yet without falling back to regexp.

Wladimir Palant wrote: Question: would that be enough to cover the majority of problematic cases?

There is one common real-world case that wouldn't be covered: adsframe2. But I really don't know whether including digits as well would be good or too much (also, shortcut creation would then have to be limited to letters only).
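The adsframe2 case can be sketched in Python as follows. Both character classes are assumptions for illustration only (the second one, which also treats digits as separators, is just the "what if" raised here), and the helper name and the example.com address are made up.

```python
import re

# Hypothetical strict separator: hyphen, underscore and dot count,
# digits do not.
STRICT = r"[^A-Za-z0-9%]"
# Hypothetical variant that additionally treats digits as separators.
STRICT_WITH_DIGITS = r"[^A-Za-z%]"


def keyword_regex(word, separator_class):
    """Regex for a keyword enclosed by one separator on each side."""
    return re.compile(separator_class + re.escape(word) + separator_class)


url = "http://example.com/adsframe2.html"

without_digits = keyword_regex("adsframe", STRICT).search(url)
with_digits = keyword_regex("adsframe", STRICT_WITH_DIGITS).search(url)

print("strict separator matches:", without_digits is not None)
print("strict separator incl. digits matches:", with_digits is not None)
```

With digits excluded, the trailing 2 counts as part of the word and the keyword is not matched; with digits included it is, at the price of a much broader class.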

Wladimir Palant wrote: Or are more special character classes required (in which case we should rather consider something more generic)?

Wouldn't that be a problem with the shortcuts? Are you thinking of viewtopic.php?t=1069 again?

Re: More strict separators

Postby Wladimir Palant » Mon Mar 21, 2011 6:09 pm

Yes, I am not too keen on adding generic character classes. If we can avoid it we should.

Re: More strict separators

Postby LorenzoC » Tue Mar 22, 2011 2:10 pm

I guess this topic is again related to performance.
I mean, what is the difference in performance between having:
.adserver_
.adserver/*
.adserver-
.adserver.
/adserver.
-adserver/*
etc etc
compared to something like:
!adserver!
where ! is the hypothetical character that covers all of the above cases?
LorenzoC
 

Re: More strict separators

Postby Wladimir Palant » Tue Mar 22, 2011 3:08 pm

No real performance difference. This is mostly about maintainability of filter lists.

