[Done] Protecting subscription downloads

Posted: Wed Jul 02, 2008 8:31 am
by Wladimir Palant
Apparently, corrupted subscription downloads aren't too uncommon - be it because of bad proxies, firewalls or whatever. Adblock Plus should make sure the download doesn't get altered - invalid downloads have to be rejected.

What I want to implement: if Adblock Plus finds a line like "! Hash: 123456" in a subscription, the hash value should be an MD5 hash of the subscription without this line and with all line breaks converted to Unix format. If hash validation fails the download will be considered as failed.
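A minimal sketch of the validation proposed above, in Python rather than the extension's own code. The hex encoding of the MD5 value, the UTF-8 encoding of the text, and the names HASH_LINE and validate_subscription are assumptions made for illustration; the post only fixes the "! Hash:" line, MD5, and Unix line breaks.

```python
import hashlib
import re

# Hypothetical pattern for the proposed "! Hash: <value>" comment line.
HASH_LINE = re.compile(r"^[ \t]*![ \t]*Hash:[ \t]*(\S+)[ \t]*$")

def validate_subscription(text):
    """Return False if a hash line is present and does not match, else True."""
    # Convert all line breaks to Unix format before hashing.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    expected = None
    kept = []
    for line in normalized.split("\n"):
        match = HASH_LINE.match(line)
        if match and expected is None:
            # Remember the claimed hash and leave this line out of the input.
            expected = match.group(1).lower()
        else:
            kept.append(line)
    if expected is None:
        # No hash line: nothing to validate against.
        return True
    # Assumption: the hash is written as a hex digest of the UTF-8 bytes.
    actual = hashlib.md5("\n".join(kept).encode("utf-8")).hexdigest()
    return actual == expected
```

A download that is cut off after the hash line no longer matches, so it would be treated as failed.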

I think this is the best we can do short of serving subscriptions from HTTPS - the user should not get corrupted filters. Of course the hash line should be placed at the beginning of the file, so that it gets through even if the file is cut off.

Re: Protecting subscription downloads

Posted: Wed Jul 02, 2008 9:26 am
by chewey
Very good idea indeed. And easily integratable with my existing list generation automation :-)

Some sort of validator (via a web form?) might be useful though.

Posted: Wed Jul 02, 2008 2:13 pm
by Dr. Evil
hm... why not just check whether the Content-Length header and the length of the downloaded string* match?
* (converted back to a byte stream probably, but I think that's necessary for md5 as well)
chewey wrote:And easily integratable with my existing list generation automation :-)
What are you using for that? (I use my Adblock Plus Filter Uploader extension, another extension that adds rudimentary synchronization, and a PHP script on the server.)

Posted: Wed Jul 02, 2008 3:14 pm
by Wladimir Palant
Dr. Evil wrote:hm... why not just check whether the Content-Length header and the length of the downloaded string* match?
Because I don't trust proxy servers. I already found out that some of them messed up the Expires header, which caused Adblock Plus to download subscriptions hourly. It would be easy for them to corrupt the download but "fix up" the Content-Length header - or simply remove it.

PS: In some cases a firewall "censored" the contents of the download - I doubt that it changed the file length when it did that.
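To illustrate the point about length checks, here is a small Python sketch. The filter lines and the "censoring" replacement are made up; the only claim is that an alteration which keeps the byte count passes a Content-Length comparison but not an MD5 comparison.

```python
import hashlib

# Made-up subscription body as the server would send it.
original = b"||ads.example.com^\n||tracker.example.net^\n"
# Simulate a firewall overwriting part of the payload without changing its size.
censored = original.replace(b"tracker", b"XXXXXXX")

def length_check(body, content_length):
    """Accept the download if its size equals the Content-Length value."""
    return len(body) == content_length

def md5_check(body, expected_hex):
    """Accept the download if its MD5 hex digest equals the expected one."""
    return hashlib.md5(body).hexdigest() == expected_hex

content_length = len(original)
expected = hashlib.md5(original).hexdigest()

print(length_check(censored, content_length))  # the altered body passes
print(md5_check(censored, expected))           # the altered body is rejected
```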

Posted: Wed Jul 02, 2008 3:16 pm
by chewey
Dr. Evil wrote:
chewey wrote:And easily integratable with my existing list generation automation :-)
What are you using for that?
A homebrew shell script, nothing really worth sharing because it is very specific to my situation.

When I have made changes in my "list development" SeaMonkey profile, I just type adftp in a shell.

This automatically extracts a properly formatted file from the patterns.ini, adds a time stamp, makes a copy to my "filter list history", gzips the final adblock.txt and pushes it to my web server.

I'm thinking about including some kind of comment management as well, so I can archive reasons for the changes - I don't do that as of yet.

Posted: Wed Jul 02, 2008 3:23 pm
by Wladimir Palant
chewey, will a Perl script to insert the MD5 hash do it for you? I'll have to write it anyway, for EasyList.

Posted: Thu Jul 03, 2008 1:55 am
by chewey
Wladimir Palant wrote:chewey, will a Perl script to insert the MD5 hash do it for you?
I would've gone for an extension of my multiple hacky shell pipes. ;-)

But it might be a good idea to rewrite my ugly shell stuff in Perl anyway, so yeah, sounds useful.

Posted: Thu Jul 03, 2008 7:57 am
by Wladimir Palant
Well, you can run Perl from a shell script if you don't want to rewrite everything ;)

Posted: Sun Oct 26, 2008 11:47 am
by Dr. Evil
I don't know if this is a good idea or not, but I thought I'd throw it in here...
The last eight bytes of a gzip file contain a CRC-32 checksum (of the uncompressed data) and the uncompressed size. Firefox doesn't care about this when decoding, but if you stripped the "Content-Encoding" header from the channel and did the decoding yourself (or rather passed it to nsIStreamConverterService yourself), you could enforce that these fields match.
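As a rough illustration of the trailer check described above, here is a Python sketch, not anything tied to Firefox or nsIStreamConverterService. It assumes a single-member gzip stream with a plain 10-byte header (no optional fields); verify_gzip is a hypothetical helper name. The trailer layout (CRC-32, then size modulo 2^32, both little-endian) follows the gzip format.

```python
import gzip
import struct
import zlib

def verify_gzip(compressed):
    """Inflate manually and compare the result against the gzip trailer.

    Returns (data, ok). Assumes a 10-byte header without optional fields.
    """
    if compressed[:2] != b"\x1f\x8b" or compressed[3] != 0:
        raise ValueError("unsupported gzip header for this sketch")
    # Raw deflate decoding: no header or trailer handling, no CRC check.
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    data = inflater.decompress(compressed[10:])
    # Whatever follows the deflate stream is the 8-byte trailer.
    trailer = inflater.unused_data[:8]
    if len(trailer) < 8:
        return data, False  # stream was cut off before the trailer
    stored_crc, stored_size = struct.unpack("<II", trailer)
    crc_ok = (zlib.crc32(data) & 0xFFFFFFFF) == stored_crc
    size_ok = (len(data) & 0xFFFFFFFF) == stored_size
    return data, crc_ok and size_ok

payload = b"[Adblock]\n||example.com^\n"
data, ok = verify_gzip(gzip.compress(payload))
```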

Posted: Mon Oct 27, 2008 11:06 am
by Wladimir Palant
I don't think decoding gzip data myself is a good idea. And I doubt it is possible to verify the checksum after the data was already uncompressed. But maybe Firefox can be changed to make the checksum available even if it doesn't consider it...

Posted: Mon Oct 27, 2008 5:15 pm
by Dr. Evil
Wladimir Palant wrote:I don't think decoding gzip data myself is a good idea.
It's not that much work. I'm doing the opposite (encoding) in the Filter Uploader.
Wladimir Palant wrote:But maybe Firefox can be changed to make the checksum available even if it doesn't consider it...
I wouldn't know any other way besides a faked http header. And that doesn't sound very clean.

Posted: Tue Oct 28, 2008 7:36 am
by Wladimir Palant
It is really not about the amount of work - with so many quirks around HTTP and broken server implementations I trust the browser (which had decades of development put into it) with things like this much more than I trust myself.

Posted: Wed Oct 29, 2008 4:26 pm
by Wladimir Palant
Done: http://hg.mozdev.org/adblockplus/rev/5fbd5e590515

There is also a reference script to add a checksum to a filter subscription: http://hg.mozdev.org/adblockplus/file/t ... hecksum.pl
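For readers who want to see the general shape of such a script, here is a hedged Python sketch. It is not a port of the linked Perl script: the hex digest encoding, the placement of the line directly after the first line of the file, and the name add_checksum are all assumptions made for illustration.

```python
import hashlib
import re

# Matches an existing "! Checksum: ..." comment line, including its line break.
CHECKSUM_LINE = re.compile(r"^[ \t]*![ \t]*Checksum:.*\n?", re.MULTILINE)

def add_checksum(text):
    """Return the subscription text with a fresh checksum line inserted."""
    # Convert all line breaks to Unix format.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop any old checksum line so it is not part of the hashed data.
    body = CHECKSUM_LINE.sub("", normalized)
    # Assumption: MD5 over the UTF-8 bytes, written as a hex digest.
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    # Assumption: the checksum goes right after the first line of the file.
    first, _, rest = body.partition("\n")
    return first + "\n! Checksum: " + digest + "\n" + rest
```

Running the function on its own output leaves the file unchanged, which makes it safe to call from an automated list-generation step.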

I also want to make "Export filters" insert a checksum automatically. This checksum shouldn't be considered for "Import filters" (people who simply back up their filters might also change the file manually) but will be considered if the file is uploaded as a filter subscription.

Edit: All downloads from easylist.adblockplus.org now get the checksum added automatically.

Edit2: "Export filters" now inserts a checksum - http://hg.mozdev.org/adblockplus/rev/0ca1488c074a

Posted: Wed Oct 29, 2008 5:45 pm
by rick752
Cool!

Hope this resolves those corrupt subscription downloads now :D

Posted: Wed Oct 29, 2008 6:34 pm
by Ares2
Just wanted to say addChecksum.pl is working fine here. :D

It doesn't matter if I remove the space before 'Checksum', does it?