Not a big fan of Perl, so I thought I'd post these alternatives as a reference. DATA is a UTF-8 string of the entire adblock.txt with normalized Unix line breaks. For some reason, the checksums from PHP and Python end in two trailing equals signs, but the one from Perl does not.
Yes, Digest::MD5 has its own base64 variation, without the "=" signs at the end. That's documented.
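For reference, the padding difference is easy to reproduce in Python. This is only a sketch: the `md5_base64` helper and the sample filter text are made up for illustration and are not taken from the reference script. An MD5 digest is 16 bytes, so standard base64 always ends in "==", and stripping that gives the unpadded form.

```python
import base64
import hashlib


def md5_base64(data: bytes, padded: bool = True) -> str:
    """Return the base64-encoded MD5 digest of data (hypothetical helper)."""
    digest = hashlib.md5(data).digest()  # always 16 raw bytes
    encoded = base64.b64encode(digest).decode("ascii")  # 24 chars, ends in "=="
    # Dropping the padding mimics the unpadded output described above.
    return encoded if padded else encoded.rstrip("=")


# Illustrative sample only, not a real filter list.
sample = "[Adblock Plus 1.1]\n||example.com^\n".encode("utf-8")
padded = md5_base64(sample)
unpadded = md5_base64(sample, padded=False)
print(padded)
print(unpadded)
```

The two strings differ only in the trailing "==", so comparing checksums across languages works once the padding is stripped on one side.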
The checksum generators look correct, but they should also normalize line breaks: strip every CR ("\r") and collapse each run of consecutive LF ("\n") characters into a single one. That prevents irrelevant changes to the file (converting between line ending styles, inserting empty lines) from changing the checksum.
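The normalization described above could look roughly like this in Python. It is a minimal sketch, with `normalize_line_breaks` and `digest_of` as hypothetical names and the two sample strings invented for the demonstration.

```python
import hashlib
import re


def normalize_line_breaks(text: str) -> str:
    """Strip CR characters and collapse runs of LF into a single LF."""
    text = text.replace("\r", "")
    return re.sub(r"\n+", "\n", text)


def digest_of(text: str) -> str:
    """MD5 hex digest of the normalized, UTF-8 encoded text (illustration only)."""
    normalized = normalize_line_breaks(text)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Same content, once with Unix line endings and once with
# Windows line endings plus an extra empty line.
unix_text = "[Adblock Plus 1.1]\n||example.com^\n"
windows_text = "[Adblock Plus 1.1]\r\n\r\n||example.com^\r\n"

print(digest_of(unix_text) == digest_of(windows_text))
```

Both variants normalize to the same string, so they hash identically no matter which editor or server touched the line endings.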
And, of course, the generators should always be applied to UTF-8 encoded data. Maybe I should extend the reference script with a check for valid UTF-8.
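A validity check of that kind is simple to sketch in Python. This is not the reference script, and `is_valid_utf8` is a hypothetical name; it just relies on strict decoding raising an error on malformed input.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# The same character is valid as UTF-8 but not when stored as Latin-1.
print(is_valid_utf8("é".encode("utf-8")))
print(is_valid_utf8("é".encode("latin-1")))
```

Running the check on the raw bytes before hashing would catch a file saved in a legacy encoding before it produces a checksum that never matches.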
I was actually wondering about that as well. That is not something I did; apparently XMLHttpRequest uses UTF-8 by default. You should be able to set a different character set explicitly, which doesn't change the fact that the checksum has to be calculated over the UTF-8 representation of the text.
This is not about malicious manipulations. If you worry about that, you should use HTTPS (yes, StartSSL has HTTPS certificates for free). The point is simply to make sure that various antivirus programs, firewalls and bad proxy servers don't interfere with the download. That's also the reason why the checksum is the very first "filter": if the download is cut off, the checksum will still be there.