[Done] Protecting subscription downloads

Various discussions related to Adblock Plus development
Wladimir Palant

Post by Wladimir Palant »

No, the space before "Checksum" doesn't matter. The regular expression used to recognize the checksum is:

Code:

/!\s*checksum[\s\-:]+([\w\+\/]+)/
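
To try the expression outside the extension, here is a minimal Python sketch. The name `find_checksum` is made up for this example, and the case-insensitive flag is an assumption: the expression as quoted has no flag, but the lists in this thread write "Checksum" with a capital C.

```python
import re

# Python translation of the expression quoted above.
# ASSUMPTION: re.IGNORECASE is added so the capitalized "! Checksum:" form matches.
CHECKSUM_RE = re.compile(r"!\s*checksum[\s\-:]+([\w+/]+)", re.IGNORECASE)


def find_checksum(line):
    """Return the checksum found in a comment line, or None (hypothetical helper)."""
    match = CHECKSUM_RE.search(line)
    return match.group(1) if match else None
```

Because `\s*` also matches zero characters, `!Checksum:` and `! Checksum:` are treated alike. Trailing "=" padding is not part of the captured group.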
Stupid Head
Posts: 214
Joined: Sat Aug 26, 2006 8:11 pm
Location: USA

Post by Stupid Head »

I'm not a big fan of Perl, so I thought I'd post these alternatives as a reference. DATA is a [added: UTF-8] string containing the entire adblock.txt [added: with normalized Unix line breaks]. For some reason, the base64 output from PHP and Python ends in two trailing equal signs, while the output from Perl does not; the code below strips them.

PHP:

Code:

echo "! Checksum: ".rtrim(base64_encode(md5($DATA, true)), "=")."\n";
Python (2.5 and later):

Code:

import base64, hashlib
print "! Checksum: " + base64.b64encode(hashlib.md5(DATA).digest()).rstrip("=")
If there are no problems, I'm going to add the checksum to my list soon.
Last edited by Stupid Head on Thu Oct 30, 2008 10:09 pm, edited 3 times in total.
What, me worry?
Wladimir Palant

Post by Wladimir Palant »

Yes, Digest::MD5 has its own base64 variation, without the "=" signs at the end. That's documented.

The checksum generators look correct - but they should also normalize line breaks: strip every CR ("\r") and collapse each run of consecutive LF ("\n") characters into a single one. That prevents irrelevant changes to the file (converting between line ending styles, inserting empty lines) from changing the checksum.

And, of course, the generators should always be applied to UTF-8 encoded data. Maybe I should extend the reference script with a check for valid UTF-8.
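
Putting the points above together (strip CR, collapse repeated LF, hash the UTF-8 bytes), a generator could look roughly like this in Python 3. This is a sketch rather than the reference script; the function names `normalize` and `checksum` are made up for the example.

```python
import base64
import hashlib
import re


def normalize(text):
    """Strip CR characters and collapse runs of LF into a single LF."""
    text = text.replace("\r", "")
    return re.sub(r"\n+", "\n", text)


def checksum(text):
    """Unpadded base64 of the MD5 digest of the normalized UTF-8 text."""
    data = normalize(text).encode("utf-8")  # always hash the UTF-8 bytes
    digest = hashlib.md5(data).digest()
    # Drop the "=" padding to match what Digest::MD5 produces.
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

With this normalization, `checksum("a\r\nb\r\n")` and `checksum("a\n\n\nb\n")` both equal `checksum("a\nb\n")`, so converting line endings or inserting empty lines leaves the checksum unchanged.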
Stupid Head

Post by Stupid Head »

So Adblock Plus assumes UTF-8... That explains why Korean text displays correctly in ABP even though the EasyList server serves it as Latin-1.
Wladimir Palant

Post by Wladimir Palant »

I was actually wondering about that as well. That is not something I did; apparently XMLHttpRequest uses UTF-8 by default. You should be able to set a different character set explicitly, but that doesn't change the fact that the checksum has to be calculated for the UTF-8 representation of the text.
lovelywcm
Posts: 5
Joined: Sun Jun 22, 2008 7:20 am
Location: Beijing, China

Post by lovelywcm »

Although it is unlikely, the checksum itself could be changed or simply removed.

Is it possible to sign a list with GPG and then verify it after download (shipping the maintainer's public key along with adblockplus.xpi)?
Wladimir Palant

Post by Wladimir Palant »

This is not about malicious manipulations - if you worry about that you should use HTTPS (yes, StartSSL has HTTPS certificates for free). The point is simply to make sure that various antiviruses, firewalls and bad proxy servers don't interfere with the download. That's also the reason why the checksum is the very first "filter" - if the download is cut off the checksum will still be there.
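
To illustrate the truncation case, here is a rough Python 3 sketch of a verifier. It makes assumptions not stated in this thread: the checksum line is removed before the hash is recomputed, the checksum sits directly after the list's first line, and the sample list contents are invented. All function names are made up for the example.

```python
import base64
import hashlib
import re

# ASSUMPTION: case-insensitive match; the whole checksum line is removed before hashing.
CHECKSUM_LINE = re.compile(
    r"^[ \t]*!\s*checksum[\s\-:]+([\w+/]+).*\n", re.IGNORECASE | re.MULTILINE
)


def compute(text):
    """MD5 over normalized UTF-8 text, as unpadded base64."""
    data = re.sub(r"\n+", "\n", text.replace("\r", "")).encode("utf-8")
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii").rstrip("=")


def add_checksum(text):
    """Insert the checksum right after the first line (assumed to be the header)."""
    header, rest = text.split("\n", 1)
    return header + "\n! Checksum: " + compute(text) + "\n" + rest


def is_intact(downloaded):
    """True/False if a checksum is present, None if there is nothing to verify."""
    match = CHECKSUM_LINE.search(downloaded)
    if match is None:
        return None
    body = CHECKSUM_LINE.sub("", downloaded, count=1)
    return compute(body) == match.group(1)


# Invented sample list, for illustration only.
sample = "[Adblock Plus 1.0]\n||example.com^\n||ads.example.net^\n"
signed = add_checksum(sample)
truncated = signed[:-10]  # simulate a download that was cut off
```

Here `is_intact(signed)` is true, while `is_intact(truncated)` is false: the checksum near the top survives the cut, but the remaining body no longer matches it.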
lovelywcm

Post by lovelywcm »

Well, Google Code, where ChinaList is hosted, doesn't allow anonymous users to download a raw file via HTTPS.

Thank God, so far ChinaList hasn't been attacked, so I don't need to move it to another place.