How to speed up clang-tidy with unity builds

At Method Park, building software in C++ with Qt/QML is one of our go-to stacks for embedded devices with a touch GUI. When we started working with one of our earliest customers, QML was not even a thing yet, so it was C++ all the way down. Today, C++ still plays a significant role in many of our projects.
With thorough build automation and verification of every change we make, running clang-tidy as a static analysis step has proven to be an invaluable part of our pipelines. Unfortunately, as you might have already experienced, for any non-trivial amount of code, clang-tidy tends to account for the largest chunk of time spent on CI checks. Parallelization and more computing power can only help so much, and at the end of the day, it comes down to a trade-off between cost and turnaround time.

The problem with clang-tidy

Since we cannot usually share customer code in our blog, I’m going to showcase this on herbstluftwm, an open-source window manager for X11 written in about 16k lines of C++. This is clearly not a huge amount of code, yet it is enough to feel the need for faster builds. The baseline for our measurements here is commit 470c9757, using clang-tidy-10 in a Docker container based on Ubuntu 20.04.

With the conventional approach of exporting the CMake compile commands and then running the run-clang-tidy.py helper script, my machine takes around 1m22s to check all 53 C++ modules (source files). That figure is for only 2 parallel jobs, a number chosen to stay close to what open-source projects usually have available on platforms like Travis CI or GitHub Actions.
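
For reference, the baseline run can be sketched roughly like this. The build directory name, the TIDY_RUNNER variable, and the function name are illustrative assumptions and not taken from the herbstluftwm repository; depending on your distribution, the helper script may be installed as run-clang-tidy, run-clang-tidy-10, or run-clang-tidy.py.

```shell
#!/usr/bin/env bash
# Baseline sketch: one clang-tidy invocation per source file.
# All three defaults below are illustrative assumptions.
BUILD_DIR="${BUILD_DIR:-build}"
TIDY_RUNNER="${TIDY_RUNNER:-run-clang-tidy}"
JOBS="${JOBS:-2}"

tidy_baseline() {
    # Configure the project and export compile_commands.json,
    # so clang-tidy knows the compiler flags for every source file.
    cmake -S . -B "$BUILD_DIR" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON || return 1
    # -p: directory containing compile_commands.json, -j: parallel jobs.
    "$TIDY_RUNNER" -p "$BUILD_DIR" -j "$JOBS"
}
```

In bash, running `time tidy_baseline` from the project root gives a wall-clock figure you can compare against your own unity build run.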

CMake to the rescue

Intuitively, clang-tidy is likely to spend a large amount of its execution time looking at headers, both from within the project and from the system. And these headers are mostly the same for each module. So why not try to avoid the duplication of work here? In November 2019, CMake 3.16 was released with newly added support for unity builds. The idea of unity builds is to batch up all modules into a single compilation unit instead of compiling each source file individually and linking them together in the end. Naturally, this is expected to consume more peak memory, and it also throws any benefits of incremental builds out the window. The latter does not really matter for automated CI builds running in a clean workspace anyway. And for dealing with memory usage, CMake thankfully offers a configurable maximum batch size for splitting the unity build into smaller chunks that fit into the available resources.
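
To make the idea concrete: a unity source is essentially a file that does nothing but #include the real source files. CMake generates these files on its own, so the helper below is only an illustration of the concept, and its name is made up for this post.

```shell
#!/usr/bin/env bash
# Illustration only: CMake writes its own unity sources.
# Prints one #include line per source file passed as an argument,
# which is all a hand-rolled unity source needs to contain.
make_unity_source() {
    local src
    for src in "$@"; do
        printf '#include "%s"\n' "$src"
    done
}
```

Calling `make_unity_source a.cpp b.cpp > unity_all.cpp` would produce a single file whose compilation covers both modules, with every shared header parsed only once.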

So how to do this? Add the following flags to your CMake invocation, and you’re good to go:

-DCMAKE_UNITY_BUILD=ON
-DCMAKE_UNITY_BUILD_BATCH_SIZE=0
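
Put together, a complete configure step might look like the following sketch. The build directory name and the function name are assumptions for illustration; the three -D options are standard CMake variables, and a batch size of 0 asks CMake to put all sources of a target into one unity file.

```shell
#!/usr/bin/env bash
# Sketch of a unity-build configure step for clang-tidy.
# BUILD_DIR is an illustrative default, not a project convention.
BUILD_DIR="${BUILD_DIR:-build-unity}"

configure_unity() {
    # Export the compilation database, enable unity builds, and
    # use batch size 0 so that sources are not split into chunks.
    cmake -S . -B "$BUILD_DIR" \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
        -DCMAKE_UNITY_BUILD=ON \
        -DCMAKE_UNITY_BUILD_BATCH_SIZE=0
}
```

One thing to check, stated here as an assumption about your setup rather than something measured in this post: the original .cpp files now reach clang-tidy as included files rather than as the main file, so the header filter (-header-filter or HeaderFilterRegex) may need to cover them for their diagnostics to show up.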

The JSON compilation database should now contain only a single compile invocation for a generated file that includes all the source files. If you have preprocessor flags defined for individual source files (e.g., with set_property()), then CMake might be forced to split those files off to specifically pass the requested flags. For herbstluftwm, this happens for two files, which does not change much in the grand scheme of things, so we accept it.
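
A quick way to see how many compile invocations ended up in the database is to count its entries. The helper below is a rough sketch with a made-up name; it assumes the layout CMake writes, with one "file" key per line, and is not a real JSON parser.

```shell
#!/usr/bin/env bash
# Count the entries of a compile_commands.json file.
# Assumes one "file": key per line, as in CMake's output.
count_compile_entries() {
    grep -c '"file":' "$1"
}
```

For example, `count_compile_entries build-unity/compile_commands.json` (with a hypothetical build directory) should print a much smaller number than the same call on a conventional build directory.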

Now that we’ve turned 53 clang-tidy invocations into just 3, the entire clang-tidy run is down to an impressive 18 seconds, compared to 1m22s before. In addition, you get lower resource usage, since almost everything runs on a single core, freeing up compute for other CI tasks. With that in mind, a more technically correct (yet less pragmatic) comparison would measure both with no parallelization at all, which yields 22 seconds for the unity build vs. 2m37s for the conventional approach.
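
Expressed as ratios, the timings above work out to a speedup of roughly 4.6x with two jobs and 7.1x without parallelization. The small helper below just redoes that arithmetic; its name is made up for this post.

```shell
#!/usr/bin/env bash
# Speedup factor = time before / time after, rounded to one decimal.
speedup() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", before / after }'
}

speedup 82 18    # 1m22s vs. 18s, both with 2 jobs: prints 4.6
speedup 157 22   # 2m37s vs. 22s, no parallelization: prints 7.1
```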

Caveats

The sad news is that compiling your program as a unity build might not work at first. If you’ve been lax with scoping your definitions, name collisions can occur even if everything normally compiles just fine and without warnings, because file-local names such as static variables from different source files now end up in the same translation unit. We ran into only one such issue in herbstluftwm, where two modules defined the same static global variable, which in itself was an undesired state that was ripe for refactoring anyway.

Of course, the benefit of the entire approach depends mostly on how much the conventional setup is parallelized. If you are already distributing your clang-tidy runs over multiple free-tier CI machines or have your own machine with many cores, you will get diminishing returns on a unity build setup. Give it a shot and let us know in the comments how it worked out for you!