Title: Clang versus GCC: binary size
Date: 2017-07-08
Modified: 2017-07-08
Category: Blog
Slug: clang-vs-gcc-binary-size
Tags: c, clang, gcc
Summary: In which I compare sizes (of binaries) and find, as ever, that mileages vary

Currently, there is hardly a less resolvable argument than the old 'GCC versus Clang' one, unless you count Emacs versus Vim (or perhaps Linux-based versus BSD-based). The reasons for this include the constant work on both compilers to try and one-up each other, the fact that much of the debate is social rather than technical, and the fact that benchmarks frequently lie or aren't representative of your particular use case. Despite this, the argument continues, and several sites (such as Phoronix) do regular performance shoot-outs between the two compilers whenever one or the other has a new release.
These shoot-outs tend to focus on speed, and to nobody's great surprise, the
results are both highly varied and quite narrow margin-wise.
Recently, however, I've been looking into compiling the smallest possible
binaries. There are several reasons for this in my particular case, but in
general, small is beautiful, and can even be faster. Due to mostly social
issues (availability of tooling), I tend to prefer Clang, and seemingly just to
please obsessives like me, it even has an -Oz optimization option, designed
to produce even smaller binaries than -Os would. On this basis, I've been
using it for my own work up to now. However, I'm not a diehard Clang fanboy, and
have switched between the two several times in the last few years.
Not too long ago, I decided to try and compile the smallest possible
decompressor (for reasons that aren't too significant to any of this),
eventually finding miniLZO. However, in the process of doing this testing
with various decompressors, I found that Clang with -Oz did not necessarily
produce the smallest binary; in some cases, GCC would win, and by a non-trivial
amount, even though it has no corresponding -Oz flag. A little light
searching showed me that others had stumbled across similar outcomes.
With this new information, I decided that some testing of it might be in order.
This is partly to see how much of a difference it can make (turns out that the
answer is 'quite a bit'), and whether GCC or Clang tend to do better (turns out
that the answer is 'Clang, but sorta-kinda'); but it is also to bring some
attention to this, and hopefully get some answers from people who know a bit
more about both compilers (and maybe to even see some improvements, who knows).
Coming from a position of relative ignorance, I can't really speculate why any of my results came out the way they did. However, I hope that maybe someone might be able to chime in and tell me. With that in mind, here is my experiment and what happened.
As per Morgan Spurlock, there shall be a few rules:
wc -c (i.e. the number of bytes in the resulting executable or library).
gcc --version) and Clang version 4.0.1 (as per clang --version).
The particular codebases I chose are as follows:
8589333
I chose these for several reasons: they're the kind of thing you'd want to make
small, they are of varying sizes and do different things, and I happen to like
them. For each of these, I will compile them using both GCC and Clang, given the
rules above, and check their byte counts. As a further test, I will also compile
them with Clang using -Oz to see if this makes any difference.
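
The procedure just described can be sketched roughly as follows. This is a minimal illustration and not the script used for these measurements: the helper names (`build_commands`, `size_in_bytes`, `measure`), the single-source-file build, and the output naming scheme are all assumptions. The real projects are built through their own build systems, typically by overriding the compiler and flags there.

```python
import os
import subprocess

# The three configurations compared in this post.
CONFIGS = [("gcc", "-Os"), ("clang", "-Os"), ("clang", "-Oz")]


def build_commands(source, out_prefix):
    """Return one compiler command line per configuration (hypothetical helper)."""
    commands = {}
    for compiler, flag in CONFIGS:
        output = f"{out_prefix}-{compiler}{flag}"
        commands[(compiler, flag)] = [compiler, flag, source, "-o", output]
    return commands


def size_in_bytes(path):
    """Byte count of a file, i.e. the same number `wc -c` reports."""
    return os.path.getsize(path)


def measure(source, out_prefix):
    """Build every configuration and return {(compiler, flag): size in bytes}."""
    sizes = {}
    for config, argv in build_commands(source, out_prefix).items():
        subprocess.run(argv, check=True)  # raises if the build fails
        sizes[config] = size_in_bytes(argv[-1])
    return sizes
```

Calling `measure("main.c", "out")` (with a hypothetical `main.c`) needs both compilers on the `PATH`; the smallest build is then just the key with the minimum value in the returned dictionary.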
So, without further ado, here are the results. I've bolded the best result.
| Codebase | gcc -Os | clang -Os | clang -Oz |
| ------------------- | ----------- | ------------- | --------------|
| Lua interpreter | **212560** | 215432 | 221472 |
| Lua statlib | 368324 | **366828** | 369980 |
| Chibi interpreter | 202768 | 62752 | **60376** |
| Chibi dynalib | 867520 | 655552 | **620360** |
| miniLZO tester | 14384 | **14360** | **14360** |
| MuPDF executable | **35488120** | 35858928 | 35858928 |
| cmus executable | **344016** | 344816 | 344752 |
| SQLite amalgamation | **646704** | 726272 | 697872 |
Here is the same data, as a delta from the best:
| Codebase | gcc -Os | clang -Os | clang -Oz |
| ------------------- | ----------- | ------------- | --------------|
| Lua interpreter | 0 | 2872 | 8912 |
| Lua statlib | 1496 | 0 | 3152 |
| Chibi interpreter | 142392 | 2376 | 0 |
| Chibi dynalib | 247160 | 35192 | 0 |
| miniLZO tester | 24 | 0 | 0 |
| MuPDF executable | 0 | 370808 | 370808 |
| cmus executable | 0 | 800 | 736 |
| SQLite amalgamation | 0 | 79568 | 51168 |
And the same deltas, only this time as percentages of the best:
| Codebase | gcc -Os | clang -Os | clang -Oz |
| ------------------- | ----------- | ------------- | --------------|
| Lua interpreter | 0% | ~1.4% | ~4.2% |
| Lua statlib | ~0.4% | 0% | ~0.9% |
| Chibi interpreter | ~235.8% | ~4% | 0% |
| Chibi dynalib | ~39.8% | ~5.7% | 0% |
| miniLZO tester | ~0.2% | 0% | 0% |
| MuPDF executable | 0% | ~1% | ~1% |
| cmus executable | 0% | ~0.2% | ~0.2% |
| SQLite amalgamation | 0% | ~12.3% | ~7.9% |
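
For anyone wanting to check the arithmetic, the two derived tables can be reproduced from the raw sizes along these lines. This is only a sketch: the function names are invented for illustration, only three of the rows are copied in, and rounding to one decimal place is an assumption about how the percentages were produced.

```python
# Sizes in bytes, copied from the first table: (gcc -Os, clang -Os, clang -Oz).
SIZES = {
    "Lua interpreter": (212560, 215432, 221472),
    "Chibi interpreter": (202768, 62752, 60376),
    "SQLite amalgamation": (646704, 726272, 697872),
}


def deltas(row):
    """Bytes over the smallest entry in the row."""
    best = min(row)
    return tuple(size - best for size in row)


def percentages(row):
    """Deltas as a percentage of the smallest entry, rounded to one decimal."""
    best = min(row)
    return tuple(round(100 * (size - best) / best, 1) for size in row)


for name, row in SIZES.items():
    print(name, deltas(row), percentages(row))
```

For the Chibi interpreter row this gives deltas of 142392, 2376 and 0 bytes, which matches the tables above.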
At a glance, we basically have every possible outcome represented here: in some
cases, GCC is the best (Lua interpreter, SQLite amalgamation, MuPDF and cmus); in
some cases, Clang with -Os is (Lua static library); in some cases, Clang with
-Oz is (Chibi in general); and sometimes, both -Os and -Oz are the same (miniLZO
tester). The differences between Clang and GCC also vary, ranging from a
difference of less than a percentage point (cmus) to over three times bigger
(Chibi interpreter). However, when Clang loses, it doesn't lose as badly as GCC:
if we compare -Os performance, Clang only added about 12% at worst (about 8% if
we look at -Oz), while GCC inflated the Chibi interpreter by a hilarious 235% or
so. The results also show that -Oz isn't always better than -Os: it came out
larger for both Lua targets, by a bit under 1% for the static library and just
under 3% for the interpreter.
There does not appear to be any obvious reason for these differences, although it is broadly true that Clang appears to compile slightly larger executables in most cases. However, GCC suffers the biggest blowup in size on an executable (and a pretty thin one actually, considering that most of Chibi's interpreter functionality comes from its dynamic library). Library-wise, Clang appears to be better, but as I only tested two libraries (one static, one dynamic), this result may not be typical or representative.
It's worth mentioning that these tests were done only on x86_64 (because that's
what Sebastian happens to be), and on such platforms, space is rarely an issue.
What these results would look like on platforms where space could be of
concern (such as ARM) is unknown (although I might end up doing that on
something like my microserver or tablet just to see what happens). Additionally,
I didn't compare these for speed with the typical settings used by each of these
projects (usually -O2); it would be instructive to do this, but it requires
considerably more complex experimental design, which I'm not currently up to.
Overall, counting wins, we have a tie (four codebases apiece), but at least for these cases, I believe GCC loses. The worst inflation for Clang is only between 8 and 12 percent, while GCC's worst result is over three times the size of the best; additionally, GCC's worst blowup is on a very simple bit of code, but Clang's worst one is on a fairly complex single-file amalgamation.
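
The tie and the worst-case figures can be re-derived from the first table. What follows is only a sketch of that arithmetic: the function names are invented for illustration, and counting a win by either Clang setting as a win for Clang is an assumption about how to keep score.

```python
# (gcc -Os, clang -Os, clang -Oz) sizes in bytes, from the first table.
CONFIGS = ("gcc -Os", "clang -Os", "clang -Oz")
SIZES = {
    "Lua interpreter": (212560, 215432, 221472),
    "Lua statlib": (368324, 366828, 369980),
    "Chibi interpreter": (202768, 62752, 60376),
    "Chibi dynalib": (867520, 655552, 620360),
    "miniLZO tester": (14384, 14360, 14360),
    "MuPDF executable": (35488120, 35858928, 35858928),
    "cmus executable": (344016, 344816, 344752),
    "SQLite amalgamation": (646704, 726272, 697872),
}


def worst_inflation(sizes):
    """For each configuration, the largest percentage over the per-row best."""
    worst = {}
    for index, config in enumerate(CONFIGS):
        candidates = [
            (codebase, 100 * (row[index] - min(row)) / min(row))
            for codebase, row in sizes.items()
        ]
        codebase, pct = max(candidates, key=lambda item: item[1])
        worst[config] = (codebase, round(pct, 1))
    return worst


def wins_by_compiler(sizes):
    """Count rows where GCC, or either Clang setting, gave the smallest binary."""
    gcc = sum(1 for row in sizes.values() if row[0] == min(row))
    clang = sum(1 for row in sizes.values() if min(row[1:]) == min(row))
    return {"gcc": gcc, "clang": clang}


print(wins_by_compiler(SIZES))
print(worst_inflation(SIZES))
```

Scored this way, each compiler takes four of the eight rows, and the worst case for each configuration lands on the Chibi interpreter (GCC) and the SQLite amalgamation (both Clang settings).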
What I believe this highlights above all is that you should test your assumptions thoroughly. Unless you're feature-bound to a single one of GCC or Clang (and the number of such features is relatively small, and tends to impact very few projects), you should try both, with different settings, and see which gives the best results. This has been known to be true of optimizations for speed for a very long time; these results suggest that much the same can be said for size.
This also highlights the importance of writing portable code. If you don't rely on features that are unique to a specific compiler (or that step outside the standard generally), you can always switch to another one if you find it produces better results, whether on your particular platform or in general. You can chalk this up to the benefits of portability, in addition to everything else that this brings. While GCC and Clang are neck and neck with regard to a lot of things, this can only be said in general: your specific results may vary, and from what I've seen here, potentially by a whole damn lot.