Title: Clang versus GCC: binary size
Date: 2017-07-08
Modified: 2018-07-01
Category: Blog
Slug: clang-vs-gcc-binary-size
Tags: c, clang, gcc
Summary: In which I compare sizes (of binaries) and find, as ever, that mileages vary

EDIT: Thanks to a friendly GCC developer, it turns out that the difference in sizes between GCC and Clang on Chibi was down to debug information. Apparently, GCC adds much more debug info than Clang on the settings being used in Chibi's makefile, which produced the considerable increase in size. Thank you for the clarification - and sorry it took me so long to make this edit!

Currently, there is hardly a less resolvable argument than the old 'GCC versus Clang' one, unless you count Emacs versus Vim (or perhaps Linux-based versus BSD-based). The reasons for this include the constant work on both compilers to one-up each other, the fact that much of the debate is social rather than technical, and the fact that benchmarks frequently lie or aren't representative of your particular use case. Despite this, the argument continues, and several sites do regular performance shoot-outs between the two compilers whenever one or the other has a new release.

These shoot-outs tend to focus on speed and, to nobody's great surprise, their results are both highly varied and quite narrow margin-wise. Recently, however, I've been looking into compiling the smallest possible binaries. There are several reasons for this in my particular case, but in general, small is beautiful, and can even be faster. Due to mostly social issues (availability of tooling), I tend to prefer Clang, and seemingly just to please obsessives like me, it even has an -Oz optimization option, designed to produce even smaller binaries than -Os would. On this basis, I've been using it for my own work up to now. However, I'm not a diehard Clang fanboy, and have switched between the two several times in the last few years.

Not too long ago, I decided to try and compile the smallest possible decompressor (for reasons that aren't too significant to any of this), eventually finding miniLZO. However, in the process of testing various decompressors, I found that Clang with -Oz did not necessarily produce the smallest binary; in some cases, GCC would win, and by a non-trivial amount, even though it has no corresponding -Oz flag. A little light searching showed me that others had stumbled across similar outcomes. With this new information, I decided that some testing might be in order. This is partly to see how much of a difference it can make (turns out that the answer is 'quite a bit'), and whether GCC or Clang tends to do better (turns out that the answer is 'Clang, but sorta-kinda'); but it is also to bring some attention to this, and get some answers from people who know a bit more about both compilers (and maybe even to see some improvements, who knows).

Coming from a position of relative ignorance, I can't really speculate why any of my results came out the way they did. However, I hope that maybe someone might be able to chime in and tell me. With that in mind, here is my experiment and what happened.

Setup

As per Morgan Spurlock, there shall be a few rules:

  1. Only pure C will be considered (I have no interest in C++).
  2. GCC and Clang will both be given identical optimization flag sets.
  3. Binary size will be measured using wc -c (i.e. the number of bytes in the resulting executable or library).
  4. I will test using Sebastian (my Arch main work machine), using GCC version 7.1.1 (as per gcc --version) and Clang version 4.0.1 (as per clang --version).
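
The rules above can be sketched as a small measurement harness. This is a minimal sketch, not the scripts actually used for this post: the helper names `build_commands` and `byte_count` are made up for illustration, `example.c` is a placeholder file name, and a single-file build is assumed, whereas the real projects were built through their own makefiles.

```python
import os

# The three configurations compared in this post.
CONFIGS = [("gcc", "-Os"), ("clang", "-Os"), ("clang", "-Oz")]


def build_commands(source, out_prefix):
    """Return one compiler command line (as an argument list) per configuration.

    Hypothetical helper: assumes the whole program is a single C source file.
    """
    commands = []
    for compiler, opt in CONFIGS:
        output = f"{out_prefix}-{compiler}{opt}"
        commands.append([compiler, opt, "-o", output, source])
    return commands


def byte_count(path):
    """Size of a regular file in bytes, the same number `wc -c` reports."""
    return os.path.getsize(path)


if __name__ == "__main__":
    # Print the command lines instead of running them, so the sketch
    # works even on a machine without both compilers installed.
    for cmd in build_commands("example.c", "out"):
        print(" ".join(cmd))
```

To actually build, each argument list could be passed to `subprocess.run(cmd, check=True)`, followed by `byte_count` on the output file.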

The particular codebases I chose are as follows:

  • The Lua interpreter and static library, version 5.3.4
  • The Chibi Scheme interpreter and dynamic library, commit 8589333
  • miniLZO test program, version 2.10
  • MuPDF executable, version 1.11 (for a real OS, not Windows)
  • cmus executable, version 2.7.1
  • SQLite amalgamation, version 3190300

I chose these for several reasons: they're the kind of thing you'd want to make small, they are of varying sizes and do different things, and I happen to like them. For each of these, I will compile them using both GCC and Clang, given the rules above, and check their byte counts. As a further test, I will also compile them with Clang using -Oz to see if this makes any difference.

Results

So, without further ado, here are the results. I've bolded the best result.

| Codebase            | gcc -Os      | clang -Os  | clang -Oz  |
| ------------------- | ------------ | ---------- | ---------- |
| Lua interpreter     | **212560**   | 215432     | 221472     |
| Lua statlib         | 368324       | **366828** | 369980     |
| Chibi interpreter   | 202768       | 62752      | **60376**  |
| Chibi dynalib       | 867520       | 655552     | **620360** |
| miniLZO tester      | 14384        | **14360**  | **14360**  |
| MuPDF executable    | **35488120** | 35858928   | 35858928   |
| cmus executable     | **344016**   | 344816     | 344752     |
| SQLite amalgamation | **646704**   | 726272     | 697872     |

Here is the same data, as a delta from the best:

| Codebase            | gcc -Os | clang -Os | clang -Oz |
| ------------------- | ------- | --------- | --------- |
| Lua interpreter     | 0       | 2872      | 8912      |
| Lua statlib         | 1496    | 0         | 3152      |
| Chibi interpreter   | 142392  | 2376      | 0         |
| Chibi dynalib       | 247160  | 35192     | 0         |
| miniLZO tester      | 24      | 0         | 0         |
| MuPDF executable    | 0       | 370808    | 370808    |
| cmus executable     | 0       | 800       | 736       |
| SQLite amalgamation | 0       | 79568     | 51168     |

And the same deltas, only this time as percentages of the best:

| Codebase            | gcc -Os | clang -Os | clang -Oz |
| ------------------- | ------- | --------- | --------- |
| Lua interpreter     | 0%      | ~1.4%     | ~4.2%     |
| Lua statlib         | ~0.4%   | 0%        | ~0.9%     |
| Chibi interpreter   | ~235.8% | ~4%       | 0%        |
| Chibi dynalib       | ~39.8%  | ~5.7%     | 0%        |
| miniLZO tester      | ~0.2%   | 0%        | 0%        |
| MuPDF executable    | 0%      | ~1%       | ~1%       |
| cmus executable     | 0%      | ~0.2%     | ~0.2%     |
| SQLite amalgamation | 0%      | ~12.3%    | ~7.9%     |
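
For anyone who wants to check the arithmetic, here is a minimal sketch of how the delta and percentage tables follow from the raw byte counts. The function names are my own for illustration; the numbers are copied from the results table, in the order gcc -Os, clang -Os, clang -Oz. Percentages are rounded to one decimal place, so a few differ slightly from the coarser rounding in the table (3.9 rather than ~4, for instance).

```python
# Raw sizes in bytes from the results table: (gcc -Os, clang -Os, clang -Oz).
SIZES = {
    "Lua interpreter": (212560, 215432, 221472),
    "Lua statlib": (368324, 366828, 369980),
    "Chibi interpreter": (202768, 62752, 60376),
    "Chibi dynalib": (867520, 655552, 620360),
    "miniLZO tester": (14384, 14360, 14360),
    "MuPDF executable": (35488120, 35858928, 35858928),
    "cmus executable": (344016, 344816, 344752),
    "SQLite amalgamation": (646704, 726272, 697872),
}


def deltas(sizes):
    """Extra bytes of each result relative to the smallest one."""
    best = min(sizes)
    return [size - best for size in sizes]


def percent_over_best(sizes):
    """The same deltas, as a percentage of the smallest result."""
    best = min(sizes)
    return [round(100 * (size - best) / best, 1) for size in sizes]


if __name__ == "__main__":
    for name, sizes in SIZES.items():
        print(name, deltas(sizes), percent_over_best(sizes))
```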

Analysis

At a glance, we basically have every possible outcome represented here: in some cases, GCC is the best (Lua interpreter, SQLite amalgamation, MuPDF and cmus); in some cases, Clang with -Os is (Lua static library); in some cases, Clang with -Oz is (Chibi in general); and sometimes, both -Os and -Oz are the same (miniLZO tester). The differences between Clang and GCC also vary, ranging from less than a percentage point (cmus) to over three times bigger (Chibi interpreter). However, when Clang loses, it doesn't lose as badly as GCC: if we compare -Os performance, Clang only added about 12% at worst (about 8% if we look at -Oz), while GCC inflated the Chibi interpreter by a hilarious 235% or so. The results also show that -Oz isn't always better than -Os: it comes out larger for both Lua targets. However, the difference is fairly small (under 3% in both cases).

There does not appear to be any obvious reason for these differences, although it is broadly true that Clang appears to compile slightly larger executables in most cases. However, GCC suffers the biggest blowup in size on an executable (and a pretty thin one actually, considering that most of Chibi's interpreter functionality comes from its dynamic library). Library-wise, Clang appears to be better, but as I only tested two libraries (one static, one dynamic), this result may not be typical or representative.
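
As the edit note at the top of this post explains, the Chibi gap turned out to come from debug information. One way to check for that kind of thing is to compare a binary's size before and after removing its debug info. The sketch below assumes GNU binutils' `strip` is on the PATH (its `--strip-debug` and `-o` options are what is used here); the function names are hypothetical, and this is not something I ran as part of the original experiment.

```python
import os
import subprocess
import tempfile


def strip_debug_command(src, dst):
    """Command line asking `strip` to write a debug-free copy of src to dst."""
    return ["strip", "--strip-debug", "-o", dst, src]


def debug_info_bytes(binary):
    """Bytes saved by removing debug info; the original file is left untouched.

    Assumes GNU strip is installed and `binary` is an object file it understands.
    """
    with tempfile.TemporaryDirectory() as tmp:
        stripped = os.path.join(tmp, "stripped")
        subprocess.run(strip_debug_command(binary, stripped), check=True)
        return os.path.getsize(binary) - os.path.getsize(stripped)
```

Calling `debug_info_bytes` on the GCC-built and Clang-built versions of the same program would show how much of a size gap is debug information rather than generated code.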

It's worth mentioning that these tests were done only on x86_64 (because that's what Sebastian happens to be), and on such platforms, space is rarely an issue. What these results would look like on platforms where space could be of concern (such as ARM) is unknown (although I might end up doing that on something like my microserver or tablet just to see what happens). Additionally, I didn't compare these for speed with the typical settings used by each of these projects (usually -O2); it would be instructive to do this, but it requires considerably more complex experimental design, which I'm not currently up to.

Conclusion

Overall, we have a tie on wins (four codebases each), but at least for these cases, I believe GCC loses. The worst inflation for Clang is only between 8 and 12 percent, while GCC's worst result is over three times the size of the best; additionally, GCC's worst blowup is on a simple bit of code, but Clang's worst one is on a fairly complex single-file amalgamation.

What I believe this highlights above all is that you should test your assumptions thoroughly. Unless you're feature-bound to a single one of GCC or Clang (and the number of such features is relatively small, and tends to impact relatively few projects), you should try both, with different settings, and see which gives the best results. This has been known to be true of optimizations for speed for a long time; these results suggest that much the same can be said for size.

This also highlights the importance of writing portable code. If you don't rely on features that are unique to a specific compiler (or that step outside the standard generally), you can always switch to another one if you find it produces better results, whether on your particular platform or in general. You can chalk this up to the benefits of portability, in addition to everything else that this brings. While GCC and Clang are neck-and-neck for a lot of things, this can only be said in general: your specific results may vary, and from what I've seen here, potentially by a whole damn lot.