From: Alon Zakai <alonzakai <at> gmail.com>
Subject: Prototype of an LLVM IR => C compiler ("c backend")
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Monday 7th April 2014 23:55:51 UTC
Hi everyone,

I wrote a small proof-of-concept "C backend" (see later for why the quotes
are there). It allows compiling LLVM IR into C. It works on top of the
Emscripten asm.js backend, basically doing an AST-level transform on the
asm.js output and turning that into C, so the process is

C/C++  =>  LLVM IR  =>  asm.js  =>  C

Hence the quotes above: it is not currently written as an LLVM backend.
However, if there is interest, this quick hack could be refactored into a
backend. Basically Emscripten's current backend, which emits asm.js, could
be refactored to emit either asm.js or C (all that is needed is to allow
customization of the output code in the areas where asm.js and C look
different; the differences are almost all purely superficial).
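
To give a rough idea of how close the two forms are, here is a hand-written
sketch (not actual output of the prototype). The asm.js version is shown in
the comment, and the C below it mirrors it line for line. The names HEAP,
HEAP_SIZE, load_i32, store_i32, sum and demo are made up for illustration,
and the byte-array heap with memcpy-based loads is just one assumed way to
model the asm.js typed-array views in portable C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * asm.js-style input (illustrative, hand-written):
 *
 *   function sum(ptr, n) {
 *     ptr = ptr | 0;
 *     n = n | 0;
 *     var total = 0, i = 0;
 *     for (; (i | 0) < (n | 0); i = (i + 1) | 0) {
 *       total = (total + (HEAP32[(ptr + (i << 2)) >> 2] | 0)) | 0;
 *     }
 *     return total | 0;
 *   }
 */

/* Hypothetical stand-in for the single large asm.js heap. */
#define HEAP_SIZE (1u << 16)
static uint8_t HEAP[HEAP_SIZE];

/* Read a 32-bit integer at a byte offset into the heap. */
static int32_t load_i32(uint32_t addr) {
  int32_t v;
  assert(addr <= HEAP_SIZE - sizeof v);
  memcpy(&v, &HEAP[addr], sizeof v);
  return v;
}

/* Write a 32-bit integer at a byte offset into the heap. */
static void store_i32(uint32_t addr, int32_t v) {
  assert(addr <= HEAP_SIZE - sizeof v);
  memcpy(&HEAP[addr], &v, sizeof v);
}

/* C counterpart of the asm.js function: "pointers" are 32-bit heap offsets. */
static int32_t sum(uint32_t ptr, int32_t n) {
  uint32_t total = 0; /* unsigned so overflow wraps, like (x + y) | 0 */
  for (int32_t i = 0; i < n; i++) {
    total += (uint32_t)load_i32(ptr + ((uint32_t)i << 2));
  }
  return (int32_t)total;
}

/* Stores 1, 2, 3, 4 at heap offset 64 and sums them. */
static int32_t demo(void) {
  for (int32_t i = 0; i < 4; i++) {
    store_i32(64u + 4u * (uint32_t)i, i + 1);
  }
  return sum(64u, 4);
}
```

Calling demo() from a main function returns 10. The point of the sketch is
only that the type coercions (| 0) and the heap view indexing are the parts
that need a different spelling in C; the control flow carries over unchanged.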

The goal of the project was to check feasibility: while the Emscripten
backend emits asm.js, which closely parallels C, it has a particular form
that might in theory prevent efficient compilation to C. For example, all
memory accesses are inside a single large array. The good news is that it
looks like those issues are not showstoppers, and performance is quite good
as well:

benchmark    x slower than original
copy                  1.03
corrections           1.00
fannkuch              1.17
fasta                 0.81
memops                1.00
primes                1.01
skinning              1.06
box2d                 1.04
zlib                  1.19

Numbers are how much slower the C backend output is, when compiled
natively, compared to the original source also compiled natively. So 1.03
means the C backend output is 3% slower, etc.
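
As a small worked example of how to read the table, the sketch below turns a
ratio into a percentage and averages a list of ratios. The function names
slowdown_percent and mean_ratio and the array RATIOS are hypothetical helpers
for this illustration only; the values are copied from the table above.

```c
#include <stddef.h>

/* Ratios from the table: (time of C backend output) / (time of original). */
static const double RATIOS[] = {
  1.03, /* copy */
  1.00, /* corrections */
  1.17, /* fannkuch */
  0.81, /* fasta */
  1.00, /* memops */
  1.01, /* primes */
  1.06, /* skinning */
  1.04, /* box2d */
  1.19  /* zlib */
};
static const size_t NUM_RATIOS = sizeof RATIOS / sizeof RATIOS[0];

/* Converts a ratio to a percentage: positive is slower, negative is faster. */
static double slowdown_percent(double ratio) {
  return (ratio - 1.0) * 100.0;
}

/* Arithmetic mean of n ratios; returns 0.0 for an empty list. */
static double mean_ratio(const double *ratios, size_t n) {
  double total = 0.0;
  if (n == 0) {
    return 0.0;
  }
  for (size_t i = 0; i < n; i++) {
    total += ratios[i];
  }
  return total / (double)n;
}
```

For instance, slowdown_percent(1.03) is roughly 3, while a ratio below 1 such
as fasta's 0.81 gives a negative value, meaning a speedup.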

I'm not sure why fasta becomes 19% faster; that is quite puzzling (I
verified the output is correct, so it isn't just running a different code
path). The other results show slowdowns between 0% and 19%, averaging
roughly 6% across those eight benchmarks. So the C++->LLVM IR->asm.js->C route
preserves performance very close to the original, and in theory this could
allow things like compiling C++11 code to C, which can then run on platforms
without C++11 support (like the Xbox).

Some limitations:

 * asm.js is a 32-bit arch, so the output is 32-bit. It compiles with -m32
on 64-bit systems though.
 * Emscripten's output uses the Emscripten system headers, portable libc,
etc., and those are used in the output as well. It connects to the native
libc for actual printf and stuff like that, but handles almost all libc
stuff itself. This could be changed though.
 * The generated C code is standalone; it can't be linked with other C or
C++ code.
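
To illustrate the second point, here is a guess at what a thin bridge from
the emitted code to the native libc could look like. This is a sketch under
assumptions, not the prototype's actual code: HEAP, HEAP_SIZE, place_string
and shim_puts are hypothetical names. The idea is that strings live at 32-bit
offsets inside the single heap array, so the shim has to translate an offset
into a native pointer before calling the real puts.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the single large heap used by the output. */
#define HEAP_SIZE (1u << 16)
static uint8_t HEAP[HEAP_SIZE];

/* Copies a native string into the heap at addr; returns 1 on success, 0 if
   it does not fit. Only used here to set up the example. */
static int place_string(uint32_t addr, const char *s) {
  size_t len = strlen(s) + 1; /* include the terminating NUL */
  if (addr >= HEAP_SIZE || len > HEAP_SIZE - addr) {
    return 0;
  }
  memcpy(&HEAP[addr], s, len);
  return 1;
}

/* Bridge to the native libc: takes a heap offset, not a native pointer. */
static int shim_puts(uint32_t str_addr) {
  if (str_addr >= HEAP_SIZE) {
    return EOF;
  }
  /* Refuse strings that are not NUL-terminated inside the heap. */
  if (memchr(&HEAP[str_addr], '\0', HEAP_SIZE - str_addr) == NULL) {
    return EOF;
  }
  return puts((const char *)&HEAP[str_addr]);
}
```

After place_string(128, "hello"), calling shim_puts(128) prints the string
through the native puts. Everything else (formatting, string handling, and so
on) would stay in the portable libc that is compiled along with the program.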

The Emscripten LLVM backend is still a work in progress and not upstream,
but if there is interest in a C backend based on it, we could work towards
that and hopefully upstream both at some point. What the proof of concept
shows so far is that overall this compilation approach works well enough to
be the basis for a C backend with decent performance.

Code is in Emscripten's 'c_backend' branch.

Thoughts and feedback welcome; I hope this is interesting to some people.

- Alon