How reliable are Mozilla's performance measurements? · 2011-04-07 15:38 by Wladimir Palant

One of the replies to my previous blog post prompted me to take a closer look at the raw data from the add-on performance measurements. Can the numbers displayed on the blame list be trusted at all?

The good news: they are not entirely wrong. However, there are obvious issues:

Add to this the two issues I already mentioned last time (results being skewed on Windows for some extensions that aren’t being extracted correctly, and extensions being tested uninitialized) and you get the idea. There are probably quite a few Top 100 extensions that either didn’t make the list or got a better score due to testing framework bugs. And some other extensions probably got a worse score because they were run unpacked, which in reality never happens in Firefox 4.

Now to the actual data. Since I couldn’t compare Adblock Plus results, I looked for extensions on the list whose current version was released before March 26th, so that both test runs tested the same extension version. I didn’t consider earlier test runs because they were performed with Firefox 3.6. FlashGot (50% slowdown) was the first extension at the top of the list to meet the criteria, and Download Statusbar (14% slowdown) sits near the middle of the list.

| Test run | Reference time (no extensions) | FlashGot 1.2.8.5 | Download Statusbar 0.9.8 |
|---|---|---|---|
| Windows 7, March 26th | 548.89 | 617.89 (+12.6%) | 625.00 (+13.9%) |
| Windows 7, April 2nd | 541.89 | 617.63 (+14.0%) | 625.63 (+15.4%) |
| Windows XP, March 26th | 399.79 | 473.11 (+18.3%) | 482.47 (+20.7%) |
| Windows XP, April 2nd | 401.21 | 471.32 (+17.5%) | 489.05 (+21.9%) |
| Mac OS X, March 26th | 694.79 | 1677.47 (+141.4%) | 706.21 (+1.6%) |
| Mac OS X, April 2nd | 699.58 | 1706.16 (+143.9%) | 722.05 (+3.2%) |
| Fedora Linux, March 26th | 498.37 | 642.53 (+28.9%) | 593.89 (+19.2%) |
| Fedora Linux, April 2nd | 495.95 | 621.21 (+25.3%) | 588.89 (+18.7%) |

The most interesting part here is of course the Mac OS X results. I’m not sure where the horrible FlashGot performance on OS X comes from. The test log contains a bunch of error messages that indicate a bug in the extension, probably triggered by the unusual configuration of the test machines (if I read it correctly, FlashGot is trying to write to the temp directory, which is probably forbidden). It would have been useful if AMO provided extension developers with links to the test logs; otherwise, finding and fixing such bugs is a very difficult task.

The Download Statusbar results on OS X are also interesting: they are unrealistically low. Either Firefox on OS X does some things radically differently than the Windows/Linux version, or the extension is simply broken and doesn’t do anything. Or maybe that’s another case of an extension being tested while disabled. No idea.

If you take OS X out of the equation, however, it is notable that the slowdowns caused by extensions don’t seem to be proportional to Firefox start-up times at all; the absolute slowdowns are almost the same on all platforms. Consequently, you get higher percentages on platforms where the reference startup time is lower. This even stays true on different hardware: when I tested Adblock Plus on my laptop, the slowdown it introduced was the same as on the Talos machines (even somewhat lower), while Firefox start-up time almost doubled. Under these conditions, does it even make sense to express these slowdowns as a proportion of Firefox startup time? Wouldn’t it make more sense to give users an idea of the absolute scale?
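To make this concrete, here is a quick sketch (using the March 26th FlashGot numbers from the table above, Windows/Linux only) showing how the percentage figure depends on the platform’s baseline startup time as well as on the extension’s own overhead:

```python
# March 26th figures from the table above: reference startup time
# vs. startup time with FlashGot 1.2.8.5 installed.
runs = {
    "Windows 7":    (548.89, 617.89),
    "Windows XP":   (399.79, 473.11),
    "Fedora Linux": (498.37, 642.53),
}

for platform, (reference, with_flashgot) in runs.items():
    delta = with_flashgot - reference     # absolute overhead
    pct = 100.0 * delta / reference       # the "X% slower" figure
    print(f"{platform}: +{delta:.2f} (+{pct:.1f}%)")
```

The same extension yields noticeably different percentages simply because the reference time shifts underneath it, which is the problem with publishing a percentage-only “blame list”.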

All that said, the results of the two runs are remarkably similar. So once all the issues are fixed, the numbers on the “blame list” shouldn’t be off by more than 2%. Still, it would have been nice if the obvious issues had been fixed before going public with the results.


Comment [4]

  1. Taras Glek · 2011-04-07 20:41 · #

    Hey,
    So the numbers you are testing with are warm startup. The really interesting startup numbers are cold startup which require you to flush the OS caches. Best way to achieve that is to reboot…or put firefox on a removable drive + unplug/replug it between test runs.

    Reply from Wladimir Palant:

    Yes, that’s another point that I didn’t even mention yet: Talos is measuring warm startup. Cold startup might even be the more common scenario, however, and it will be radically different. I would guess that extension initialization times would increase only very little for cold startup (at least when testing properly, with the extension inside a single XPI file) while Firefox startup times increase quite a bit. Which is one more reason this “X% slower startup” metric is misleading.

  2. Haploid · 2011-04-08 02:17 · #

    It seems intuitive that the slowdowns caused by extensions would be proportional to the startup time of Firefox, so I can see how someone would choose that as a metric to publish. Mining statistical data is an art form in itself, and I’m skeptical that you can really choose a single easy-to-understand metric to express a complex set of variables like this.

    One has to give props to the Mozilla guys for trying to move towards a benchmarking system for addons. With a bit more transparency and community feedback, I’m sure it will be a useful tool.

    Reply from Wladimir Palant:

    Intuitive, maybe. That doesn’t make it true, however. The proportion between extension initialization and Firefox startup time varies wildly depending on platform, hardware and cold/warm startup. The current metric makes the overhead sound scarier than it really is, which makes an informed decision hard.

    I am all for moving forward. But I would like to see a little more consideration and some respect for the efforts of the add-on developers. These test results have been published as fact, not as work in progress. There was close to no useful information for the add-on authors on where these results come from and how to improve them (see FlashGot as an extreme example). Of course the results have been picked up by the media and have already caused reputation damage for some add-ons. And now it turns out that the results have obvious bugs, of the kind that should have been recognized if anybody had ever bothered to verify them. Sorry, I call that irresponsible.

  3. Marc · 2011-04-08 04:07 · #

    Whilst start-up performance has some importance, many reports in the media seem to be fudging this under a “slow extensions” headline. These figures only refer to warm start-up, and in all honesty this is less important than the effect the extension[s] have on run-time performance in my opinion.

    If an extension consistently slowed start-up by 20% due to pre-execution optimisations, but only caused a run-time penalty of 2% [or even a performance improvement, due to blocking content], then that is better behaviour than a poorly optimised program that runs slower at run-time instead (say, a consistent 20% run-time hit). Surely the former matters more to user experience.

    They seem to have this issue turned on its head.

    Clearly their metrics are dodgy too, as you state in your analysis when they conclude that an add-on is appreciably slowing start-up time despite it being disabled.

  4. Haploid · 2011-04-08 12:00 · #

    @Marc Well, a few years ago I moved away from Firefox 3.something to Chromium for the reason that Firefox startup was ridiculously slower. Even without addons, so it was mostly a Firefox core problem, but I just wanted to indicate that startup time is a real factor.

    @Wladimir Granted, the Mozilla team should have shown more consideration for the addon developers. The volume and quality of addons available for Firefox are the single strongest selling point for the browser, and I hope they will remember that as they further develop their benchmarking suite.

Commenting is closed for this article.