Thoughts on using asm.js for performance bottlenecks in browser extensions · 2013-06-26 13:55 by Wladimir Palant
The advent of asm.js seems to make these considerations obsolete. That C++ library can be compiled with Emscripten into asm.js. The same code could then run on any platform and would be fast in both Firefox and Chrome. And best of all: it would still run fine in older Firefox versions, just not quite as fast. So much for the theory; now it’s time to dive into the gory details.
Trying out Emscripten
At first glance, Emscripten is a collection of Python scripts that can be “installed” simply by checking out the repository. Unfortunately, the list of requirements names several other packages that need to be installed: LLVM, Clang and node.js. Installing these on Linux is simple enough, but the versions available for my Linux distribution were too old. Not a big issue for me of course, but imposing these requirements on anyone who wants to build Adblock Plus is a sure way to end up with no contributors.
emcc -O2 -s ASM_JS=1 test.cpp generates rather big files, at least 60 kB even for trivial code. A quick glance shows that most of it is boilerplate code, including lots of code to adapt to various environments: node.js, web page, web worker, shell. Can’t the target environment (shell is probably most appropriate for extension code) be specified at compile time? I couldn’t find any setting to do that.
As soon as you touch functionality provided by C++ the code size balloons. In particular, doing anything with std::string generates more than 500 kB of code. This probably means that the standard library is off-limits for most extensions, as the extension package would otherwise simply grow too big. Unfortunately, this makes the entire solution significantly less convenient.
Next I had to figure out how other extension code can communicate with the asm.js code. Emscripten documentation lists two options: the less convenient approach via Module.cwrap() and the more convenient
The important question for me was: how do you pass a string into the compiled code? It turned out that the string parameter type is somewhat counter-intuitively mapped to char* in C++ rather than
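To make the cost of that mapping concrete: whatever binding is used, a JavaScript string has to be copied into the heap as a NUL-terminated byte sequence before C++ code can see it as a char*. The following is only a sketch of that copy, assuming a Uint8Array view of the heap and UTF-8 encoding. The helpers writeCString and readCString are hypothetical names for illustration and not part of Emscripten's API.

```javascript
// Hypothetical helpers for illustration; not part of Emscripten's API.
var encoder = new TextEncoder();
var decoder = new TextDecoder("utf-8");

// Copy `str` into `heap` at offset `ptr` as UTF-8 plus a trailing NUL byte.
// Returns the number of bytes written, including the terminator.
function writeCString(heap, ptr, str) {
  var bytes = encoder.encode(str);
  if (ptr + bytes.length + 1 > heap.length) {
    throw new RangeError("string does not fit into the heap");
  }
  heap.set(bytes, ptr);
  heap[ptr + bytes.length] = 0;
  return bytes.length + 1;
}

// Read a NUL-terminated UTF-8 string starting at offset `ptr`.
function readCString(heap, ptr) {
  var end = ptr;
  while (end < heap.length && heap[end] !== 0) {
    end++;
  }
  return decoder.decode(heap.subarray(ptr, end));
}

// Usage: the offset 16 is arbitrary, a real caller would get it from an allocator.
var heap = new Uint8Array(new ArrayBuffer(0x10000));
var written = writeCString(heap, 16, "||example.com^");
console.log(written, readCString(heap, 16));
```

Every call that passes a string pays for one encoding pass and one copy, and every returned string pays for the reverse.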
A look at the raw asm.js
Looking into the specification, the biggest challenge here seems to be working with the heap. Variables in asm.js can only use primitive integer and floating-point types. Anything beyond that (arrays, structured types, strings) needs to be allocated in the typed array representing the heap. And “allocation” means manual memory management: you essentially have to reimplement the
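As an illustration of what hand-written heap management involves, here is a minimal sketch of a module in asm.js style with a bump allocator that only ever hands out memory and never frees it. The names HeapModule, alloc and strlen are made up for this example, the 64 kB heap size is an arbitrary assumption, and globalThis stands in for whatever global object the environment provides. The code is meant to follow the asm.js typing rules, but even if an engine rejects it as asm.js it still runs as ordinary JavaScript.

```javascript
// Illustrative module; HeapModule, alloc and strlen are made-up names.
function HeapModule(stdlib, foreign, buffer) {
  "use asm";
  var HEAPU8 = new stdlib.Uint8Array(buffer);
  var top = 8;        // next free byte; offset 0 is reserved as "null"
  var limit = 65536;  // assumption: must match the size of the buffer passed in

  // Bump allocator: returns an 8-byte aligned offset, or 0 if out of memory.
  function alloc(size) {
    size = size | 0;
    var ptr = 0;
    if ((size | 0) <= 0) return 0;
    if ((size | 0) > ((limit - top) | 0)) return 0;
    ptr = top;
    top = (top + size + 7) & -8;
    return ptr | 0;
  }

  // Length of the NUL-terminated byte string starting at ptr.
  function strlen(ptr) {
    ptr = ptr | 0;
    var end = 0;
    end = ptr;
    while (HEAPU8[end >> 0] | 0) {
      end = (end + 1) | 0;
    }
    return (end - ptr) | 0;
  }

  return { alloc: alloc, strlen: strlen };
}

// Usage: the caller shares the buffer and writes data through its own view.
var heapBuffer = new ArrayBuffer(0x10000);
var heapModule = HeapModule(globalThis, {}, heapBuffer);
var heapBytes = new Uint8Array(heapBuffer);

var text = "hello";
var textPtr = heapModule.alloc(text.length + 1);
for (var i = 0; i < text.length; i++) {
  heapBytes[textPtr + i] = text.charCodeAt(i); // ASCII only in this sketch
}
heapBytes[textPtr + text.length] = 0;
console.log(textPtr, heapModule.strlen(textPtr));
```

A usable allocator would also need to free and reuse blocks, which is where most of the real work lies.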
The other gotcha: heap size is fixed. asm.js code has to get along with a single buffer and cannot change its size (that affects Emscripten-generated code as well, of course). The problem with that: I have no way of telling how much memory Adblock Plus will need, as it depends largely on the user settings. It might be possible to automatically create new asm.js modules with different heaps if the memory in the original module runs low, but coordinating between different modules handling the same data won’t be easy.
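To show why a fixed buffer is awkward, here is a small sketch of the only kind of "growth" available from the outside: allocating a larger ArrayBuffer and copying the old contents over. The helper name growHeap is hypothetical, and this is a workaround for illustration, not something the asm.js specification provides.

```javascript
// Hypothetical helper: "grow" a heap by copying it into a larger buffer.
function growHeap(oldBuffer, newByteLength) {
  if (newByteLength <= oldBuffer.byteLength) {
    throw new RangeError("new heap must be larger than the old one");
  }
  var newBuffer = new ArrayBuffer(newByteLength);
  new Uint8Array(newBuffer).set(new Uint8Array(oldBuffer));
  return newBuffer;
}

// Usage: the two buffers are independent once the copy is made.
var smallBuffer = new ArrayBuffer(0x10000);
var smallView = new Uint8Array(smallBuffer);
smallView[100] = 42;

var largeBuffer = growHeap(smallBuffer, 0x20000);
var largeView = new Uint8Array(largeBuffer);
largeView[100] = 43; // does not affect smallView

console.log(smallBuffer.byteLength, largeBuffer.byteLength, smallView[100], largeView[100]);
```

A module that was linked against the old buffer keeps reading and writing the old buffer, so a new module instance would have to be created for the larger one, and any state kept in module-level variables would have to be carried over separately.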
Finally, something that might not be a big deal for other people but is one for Adblock Plus: there are no hash tables. Not using them in Adblock Plus is impossible, since they are an essential part of the algorithms. This means, however, that hash tables need to be reimplemented in asm.js. And I definitely wouldn’t consider that fun.
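For a sense of what such a reimplementation involves, here is a sketch of a fixed-capacity hash table with open addressing and linear probing, stored entirely in an Int32Array. It is plain JavaScript over a typed array rather than validated asm.js, it only handles integer keys, and all function names are invented for this example. String keys would additionally need a hash function and a byte-wise key comparison in the heap.

```javascript
// Slot i uses table[2 * i] for the key and table[2 * i + 1] for the value.
// Key 0 marks an empty slot, so 0 itself cannot be stored as a key.
function createTable(capacity) {
  if (capacity < 1 || (capacity & (capacity - 1)) !== 0) {
    throw new RangeError("capacity must be a power of two");
  }
  return new Int32Array(capacity * 2);
}

// Returns the slot holding `key`, the first empty slot on its probe
// sequence, or -1 if the table is full and the key is absent.
function findSlot(table, key) {
  var capacity = table.length >> 1;
  var mask = capacity - 1;
  var slot = (Math.imul(key, 0x9E3779B1) >>> 0) & mask;
  for (var probes = 0; probes < capacity; probes++) {
    var stored = table[2 * slot];
    if (stored === key || stored === 0) return slot;
    slot = (slot + 1) & mask;
  }
  return -1;
}

// Insert or overwrite; returns false if the table is full.
function tableSet(table, key, value) {
  key = key | 0;
  if (key === 0) throw new RangeError("key 0 is reserved");
  var slot = findSlot(table, key);
  if (slot < 0) return false;
  table[2 * slot] = key;
  table[2 * slot + 1] = value;
  return true;
}

// Returns the stored value, or undefined if the key is absent.
function tableGet(table, key) {
  key = key | 0;
  var slot = key === 0 ? -1 : findSlot(table, key);
  if (slot >= 0 && table[2 * slot] === key) return table[2 * slot + 1];
  return undefined;
}

// Usage
var table = createTable(8);
tableSet(table, 12345, 1);
tableSet(table, -7, 2);
tableSet(table, 12345, 3); // overwrites the earlier value
console.log(tableGet(table, 12345), tableGet(table, -7), tableGet(table, 99));
```

Note that this table cannot grow or delete entries. Both would be needed in practice, and growing runs straight into the fixed heap size.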
All in all, making asm.js work to speed up extensions seems to be a complicated task. It’s hard to imagine that Emscripten will find much use in extensions at the moment: the generated code is simply way too large and the string conversion costs too much performance. What I would rather expect to work is a compiler for a relatively simple superset of asm.js that breaks it down to regular asm.js. Not much is needed to make asm.js usable in hand-written code: type declarations for variables, structured types and namespaces/objects seem to be the most important omissions. A few generic helpers to manage the heap automatically and to provide hash tables should then allow usable code to be produced. Still, not being able to allocate additional memory is something that I see as a critical issue that needs to be addressed in the standard.