More strict separators

Various discussions related to Adblock Plus development
Ares2
Posts: 1275
Joined: Fri Feb 15, 2008 12:47 pm

More strict separators

Post by Ares2 »

From the documentation:
"Often you need to accept any separator character in a filter. [...] Separator character is anything but a letter, a digit, or one of the following: _ - . %"
Looking at the available subscriptions, and especially now that the new shortcut algorithm requires non-alphanumeric delimiters, it could make things a lot easier (not yet tested for false positives) if there were a separator that also includes underscore, hyphen and dot. I guess % should remain an exception due to its special meaning in URLs.

Example:

/exampleads/*
/exampleads-
/exampleads.
_exampleads.
_exampleads_
?exampleads=
None of these structures is theoretical; they are all pretty common. Looking at the subscriptions, you can find many examples of these "duplicates" that prove it. Yet currently only the first and the last can be combined into ^exampleads^, which limits the real-world use to a few rare situations (like ports in URLs).

Therefore I'm suggesting either modifying ^ or introducing a more inclusive separator.
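
To make the limitation concrete, here is a minimal Python sketch (not Adblock Plus code) that checks which of the six patterns above a separator-delimited filter would cover. CURRENT_SEP follows the quoted documentation; STRICT_SEP is a hypothetical class for the proposed separator, and the helper name covers() and the sample strings are assumptions made up for illustration.

```python
import re

# Separator as described in the quoted documentation: anything but
# a letter, a digit, or one of _ - . %
CURRENT_SEP = r"[^A-Za-z0-9_\-.%]"
# Hypothetical stricter separator: _ - . count as separators too,
# while % stays an exception.
STRICT_SEP = r"[^A-Za-z0-9%]"


def covers(sep_class, word, text):
    """Return True if `word` occurs in `text` with a separator on both sides."""
    pattern = sep_class + re.escape(word) + sep_class
    return re.search(pattern, text) is not None


# URL fragments corresponding to the six filters in the post.
samples = [
    "/exampleads/",
    "/exampleads-",
    "/exampleads.",
    "_exampleads.",
    "_exampleads_",
    "?exampleads=",
]

current = [s for s in samples if covers(CURRENT_SEP, "exampleads", s)]
strict = [s for s in samples if covers(STRICT_SEP, "exampleads", s)]

print("covered by current separator:", current)
print("covered by stricter separator:", strict)
```

Under these assumptions the current separator only covers the first and the last fragment, while the stricter class covers all six.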
Weirdo

Re: Feature request: Separator without exceptions

Post by Weirdo »

*BUMP*
I'm not a fan of posts not being replied to.
Hubird
Posts: 2850
Joined: Thu Oct 26, 2006 2:59 pm
Location: Australia

Re: Feature request: Separator without exceptions

Post by Hubird »

Weirdo, I have no objections to posts being bumped if need be but you could at least give people time to respond (you bumped it the same day).
Ares2
Posts: 1275
Joined: Fri Feb 15, 2008 12:47 pm

Re: Feature request: Separator without exceptions

Post by Ares2 »

Weirdo wrote: *BUMP*
I'm not a fan of posts not being replied to.
Can you please stop abusing my post to get your point across?

Now to be constructive: you posted your Songbird question on Friday evening Central European Time, and Wladimir is known to be off during the weekend, so please wait until Monday 12:00 UTC. Bumping is not going to achieve anything.
Wladimir Palant

Re: Feature request: Separator without exceptions

Post by Wladimir Palant »

@Ares2: The main use case for separators is domain names, hence this list of exceptions (dots, hyphens and underscores are allowed in domain names). This is of course suboptimal for other scenarios, and we already discussed a very similar proposal in the Mozilla Russia forum. My main concern is that once I add another separator, it will turn out to be suboptimal in yet other scenarios, and we will keep adding special cases. Not to mention that multiple kinds of separators will be confusing.

Let's move this to Future Development and see what other people think.
Wladimir Palant

Re: More strict separators

Post by Wladimir Palant »

I think the most straightforward thing to implement would be ^^ separators (with filters like ^^example^^) that are more strict than the single separator symbol - hyphen, underscore and dot would count as a separator as well. Question: would that be enough to cover the majority of problematic cases? Or are more special character classes required (in which case we should rather consider something more generic)?
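
As a rough illustration of the idea (a sketch only, not how Adblock Plus implements filters), the following Python translates a filter containing ^, ^^ and * into a regular expression. The function name filter_to_regex, the two character classes, and the choice to let the end of the address count as a separator are assumptions made for this example.

```python
import re

# Single separator as in the documentation quoted earlier in the thread;
# accepting the end of the address as well is an assumption of this sketch.
SEP = r"(?:[^A-Za-z0-9_\-.%]|$)"
# Hypothetical strict separator for ^^: hyphen, underscore and dot count too.
STRICT_SEP = r"(?:[^A-Za-z0-9%]|$)"


def filter_to_regex(filter_text):
    """Translate a simplified filter (^, ^^ and * only) into a compiled regex."""
    parts = []
    # "^^" is listed before "^" so the double form is recognised first.
    for token in re.findall(r"\^\^|\^|\*|.", filter_text):
        if token == "^^":
            parts.append(STRICT_SEP)
        elif token == "^":
            parts.append(SEP)
        elif token == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


single = filter_to_regex("^example^")
strict = filter_to_regex("^^example^^")

url = "http://host/ads_example.gif"
print(bool(single.search(url)))  # "_" and "." are not separators for ^
print(bool(strict.search(url)))  # both count as separators for ^^
```

With this translation, ^^example^^ matches wherever ^example^ does, plus the cases delimited by hyphen, underscore or dot.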
Ares2
Posts: 1275
Joined: Fri Feb 15, 2008 12:47 pm

Re: More strict separators

Post by Ares2 »

Well, in combination with the new shortcut rules, we could finally create general rules that deserve the name: It's clear that if we add a rule like "/adsframe.", we actually consider "adsframe" a bad word that can be filtered, there just isn't a way to express that yet without falling back to regexp.
Wladimir Palant wrote:Question: would that be enough to cover the majority of problematic cases?
There is one common real-world thing that wouldn't be covered - adsframe2, but I really don't know if it would be good or too much if digits are included as well (also, the shortcut creation would have to be limited to letters only).
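
The adsframe2 case can be sketched in Python as follows. Both character classes are hypothetical: STRICT_SEP is the stricter separator discussed above, and LETTERS_ONLY_SEP additionally treats digits as separators. The helper hits() and the URLs are invented for illustration.

```python
import re

# Hypothetical strict separator: anything but a letter, a digit or %.
STRICT_SEP = r"[^A-Za-z0-9%]"
# Hypothetical variant where digits count as separators as well.
LETTERS_ONLY_SEP = r"[^A-Za-z%]"


def hits(sep_class, word, url):
    """Return True if `word` appears in `url` delimited by `sep_class`."""
    return re.search(sep_class + re.escape(word) + sep_class, url) is not None


url = "http://example.com/adsframe2.html"
print(hits(STRICT_SEP, "adsframe", url))        # the trailing "2" is not a separator
print(hits(LETTERS_ONLY_SEP, "adsframe", url))  # the trailing "2" is a separator

# The broader class also matches short words next to digits, which is
# where the "too much" concern comes from.
print(hits(LETTERS_ONLY_SEP, "top", "http://example.com/top10/"))
```

Whether the extra matches are acceptable would need testing against real filter lists; the sketch only shows where the two classes differ.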
Wladimir Palant wrote:Or are more special character classes required (in which case we should rather consider something more generic)?
Wouldn't that be a problem with the shortcuts? Are you thinking of forum/viewtopic.php?t=1069 again?
Wladimir Palant

Re: More strict separators

Post by Wladimir Palant »

Yes, I am not too keen on adding generic character classes. If we can avoid it we should.
LorenzoC

Re: More strict separators

Post by LorenzoC »

I guess this topic is again related to performance.
I mean, what is the difference in performance between having:
.adserver_
.adserver/*
.adserver-
.adserver.
/adserver.
-adserver/*
etc etc
compared to something like:
!adserver!
where ! is a hypothetical character that covers all of the above cases?
Wladimir Palant

Re: More strict separators

Post by Wladimir Palant »

No real performance difference. This is mostly about maintainability of filter lists.