Software Exploring: Reduce Compilation times with extern Template

Originally published on Simplify C++!, Arne Mertz’s blog on clean and maintainable C++.

Over the past few years, the compilation times of C++ projects have increased significantly, despite the availability of fast computers with multiple CPUs/cores and more RAM.

This escalation can be attributed largely to:

The shift of certain elaborations from run-time to compile-time via templates and constexpr.
The rise in the number of header-only libraries.

While the first factor is unavoidable and desirable, the second is a questionable trend, driven primarily by the convenience of distributing header-only libraries. Since I’m guilty myself of having developed a few header-only libraries, however, I won’t delve into this issue here :-)

In some scenarios, build times can be reduced with techniques such as improving modularity, disabling optimizations, using the pimpl idiom, relying on forward declarations, and using precompiled headers, among others.

Additionally, C++11 introduced extern template declarations (N1448), formally called explicit instantiation declarations, which can help speed up compilation to some extent. The concept is akin to an extern data declaration: it tells the compiler not to instantiate the template in the current translation unit, because the instantiation is provided elsewhere.
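
To make the analogy with extern data concrete, here is a minimal single-file sketch. The names Twice, g_counter and DoubledCounter are hypothetical and used only for illustration; in a real project the declarations and the definitions would sit in different translation units.

```cpp
// Hypothetical function template, used for illustration only.
template <typename T>
T Twice(T value)
{
    return value + value;
}

// Ordinary extern data declaration:
// "this object is defined somewhere else".
extern int g_counter;

// Explicit instantiation declaration (C++11):
// "Twice<int> is instantiated somewhere else, do not instantiate it here".
extern template int Twice<int>(int);

// The matching definitions. Normally each appears in exactly one .cpp file;
// when both forms share a translation unit, the definition must come
// after the declaration, as it does here.
int g_counter = 21;
template int Twice<int>(int);   // explicit instantiation definition

// Uses both the extern object and the explicitly instantiated function.
int DoubledCounter()
{
    return Twice(g_counter);
}
```

In both cases the extern form only promises that a definition exists; the program still needs that definition to be provided exactly once.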

How Does extern Template Declaration Work?

The simplest way to see how extern template declarations work is to walk through a code snippet. Consider these files:


// bigfunction.h

template<typename T>
void BigFunction()
{
    // function body
}

// f1.cpp

#include "bigfunction.h"

void f1()
{
    ...
    BigFunction<int>();
}

// f2.cpp

#include "bigfunction.h"

void f2()
{
    ...
    BigFunction<int>();
}

Compiling these files produces the following object files (on Linux, you can inspect them with the nm utility):


> nm -g -C --defined-only *.o

f1.o:
00000000 W void BigFunction<int>()
00000000 T f1()

f2.o:
00000000 W void BigFunction<int>()
00000000 T f2()

When these two object files are linked together, one copy of BigFunction<int>() is discarded (this is what the “W” symbol type, a weak symbol, that nm prints next to the function allows). The compilation time spent generating BigFunction<int>() more than once is therefore wasted.

To avoid this redundancy, an extern template declaration can be added to f2.cpp:


// bigfunction.h

template<typename T>
void BigFunction()
{
    // function body
}

// f1.cpp

#include "bigfunction.h"

void f1()
{
    ...
    BigFunction<int>();
}

// f2.cpp

#include "bigfunction.h"

extern template void BigFunction<int>();

void f2()
{
    ...
    BigFunction<int>();
}

This results in:


> nm -g -C --defined-only *.o

f1.o:
00000000 W void BigFunction<int>()
00000000 T f1()

f2.o:
00000000 T f2()

The same principle extends to class templates, using this syntax:



// bigclass.h

template<typename T>
class BigClass
{
    // class implementation
};

// f1.cpp

#include "bigclass.h"

void f1()
{
    ...
    BigClass<int> bc;
}

// f2.cpp

#include "bigclass.h"

extern template class BigClass<int>;

void f2()
{
    ...
    BigClass<int> bc;
}

Missing Pieces

Unfortunately, it’s not as straightforward as it appears.

For instance, when you compile the code above with optimization enabled (e.g., -O2 on gcc or clang), the linker might report that BigFunction<int>() is undefined. Why?

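The problem is that, with optimization enabled, the compiler expands the function template inline at the call site in f1.cpp and does not emit an out-of-line copy of BigFunction<int>(). The call in f2.cpp, where the extern template declaration suppressed the instantiation, therefore refers to a symbol that no object file defines. Relying on another translation unit’s implicit instantiation is fragile for exactly this reason.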

You can use the nm utility again to check the symbols exported by the object files and confirm that inline expansion of the function is the cause:


> nm -g -C --defined-only *.o

f1.o:
00000000 T f1()

f2.o:
00000000 T f2()

In f1.o the symbol is missing because of the optimization, and in f2.o it is missing because of the extern template declaration.

If you’re using gcc, you can get further evidence of this by trying:


// bigfunction.h

template<typename T>
void __attribute__ ((noinline)) BigFunction()
{
    // body
}

Here, the noinline attribute (a GNU extension) prevents inline expansion, so an out-of-line BigFunction<int>() is emitted in f1.o and the linker no longer complains.

A Global Approach

The gcc-specific attribute noinline is obviously not the ultimate solution to our problem.

A point worth noting here is that any strategy to reduce compilation time applies to the project as a whole, and so does the use of extern template declarations.

A project-wide strategy that takes advantage of the extern template mechanism, while ensuring that all the code needed for linking is generated, might involve:

Including a header file with the extern template clause in every translation unit where the template appears.
Adding a source file to the project containing explicit instantiation.


// bigfunction.h

template<typename T>
void BigFunction()
{
    // function body
}

extern template void BigFunction<int>();


// bigfunction.cpp

#include "bigfunction.h"

template void BigFunction<int>();

// f1.cpp

#include "bigfunction.h"

void f1()
{
    ...
    BigFunction<int>();
}

// f2.cpp

#include "bigfunction.h"

void f2()
{
    ...
    BigFunction<int>();
}
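
The same layout carries over to a class template. What follows is a minimal single-file sketch, not the original BigClass: the member functions Add and Size and the helper CountAfterTwoAdds are hypothetical, and the comments mark the file each part would normally live in.

```cpp
#include <cstddef>
#include <vector>

// --- bigclass.h -----------------------------------------------------------

// Hypothetical class template with out-of-line member definitions.
template <typename T>
class BigClass
{
public:
    void Add(const T& value);
    std::size_t Size() const;

private:
    std::vector<T> items_;
};

template <typename T>
void BigClass<T>::Add(const T& value)
{
    items_.push_back(value);
}

template <typename T>
std::size_t BigClass<T>::Size() const
{
    return items_.size();
}

// Every translation unit that includes the header sees this declaration,
// so none of them implicitly instantiates the members of BigClass<int>.
extern template class BigClass<int>;

// --- bigclass.cpp ---------------------------------------------------------

// The one explicit instantiation definition: all members of BigClass<int>
// are generated here, whether or not this file uses them.
template class BigClass<int>;

// --- f1.cpp ---------------------------------------------------------------

// Ordinary client code; it links against the members generated above.
std::size_t CountAfterTwoAdds()
{
    BigClass<int> bc;
    bc.Add(1);
    bc.Add(2);
    return bc.Size();
}
```

Because the explicit instantiation definition generates every member of the class, the linker finds them regardless of what the optimizer inlines in the client files.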

This approach also applies when the function or class template belongs to a third-party library. In that case, it is enough to add your own header that includes the library header and then adds the extern template declaration.


// third_party_bigfunction.h

template<typename T>
void BigFunction()
{
    // function body
}

// bigfunction.h

#include <third_party_bigfunction.h>

extern template void BigFunction<int>();

// bigfunction.cpp

#include "bigfunction.h"

template void BigFunction<int>();

// f1.cpp

#include "bigfunction.h"

void f1()
{
    ...
    BigFunction<int>();
}

// f2.cpp

#include "bigfunction.h"

void f2()
{
    ...
    BigFunction<int>();
}

Summary

Reducing compile times with extern template is a project-wide strategy. Identify the most expensive templates that are used in many translation units, and find a way to tell the build system to compile them just once.

But let’s consider for a moment what we did in the previous section.

We had a function or class template. To minimize build time, we decided to instantiate it only once for a given template argument. In doing so, we had to force the compiler to generate the function or class exactly once for that argument, preventing inline expansion (and possibly giving up a run-time optimization). However, if the compiler decided to inline a function, chances are that it was not so big, which means that, after all, we don’t save much build time by compiling it only once.

Anyway, if you’re determined to save both the goats and the cabbages, you can try enabling link-time optimization (-flto on gcc): it performs global optimizations (e.g., inlining) with visibility of the whole project. Of course, this in turn slows down the build, but your function template gets inlined and is still instantiated only once.

Bottom line: programming inevitably involves trade-offs between conflicting goals. You should measure carefully whether a function template is slowing down your build (e.g., because it’s instantiated with the same argument in many translation units) or your run-time execution (e.g., because it’s called in just one location but in a tight loop) and, above all, consider your priorities.

After all, the observation “premature optimization is the root of all evil” and the rule that immediately follows from it, “measure before optimizing”, also apply to compile times. By carefully measuring the impact of extern template on both build times and run times, you can make an informed decision about the balance between compilation speed and execution speed.
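
For the run-time side of that measurement, a rough sketch with std::chrono might look like the following. SumUpTo, ElapsedMilliseconds and TimeTightLoop are hypothetical names invented for this example, and a proper benchmark needs more care (warm-up, repetitions, a benchmarking library); build times are better measured with the build system or the compiler’s own timing reports.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for an expensive function template.
template <typename T>
T SumUpTo(T limit)
{
    T total = 0;
    for (T i = 1; i <= limit; ++i)
    {
        total += i;
    }
    return total;
}

// Hypothetical helper: calls `work` `iterations` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Work>
double ElapsedMilliseconds(Work work, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times a tight loop of 10000 calls to SumUpTo<std::int64_t>(1000).
// The results are accumulated into `sink` so that the calls have an
// observable effect and are not simply dead code.
double TimeTightLoop(std::int64_t& sink)
{
    return ElapsedMilliseconds(
        [&sink] { sink += SumUpTo<std::int64_t>(1000); }, 10000);
}
```

Running this once with the template inlined and once with it explicitly instantiated in a separate translation unit gives a first indication of whether the extern template setup costs anything at run time.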

At the end of the day, we inevitably have to decide whether to optimize for compilation or for execution. After all, that’s exactly what I wrote at the very beginning of this article: one of the ways to speed up build times is to turn off optimizations :-)

Wednesday, October 25, 2023
