Pruning Dynamic Rebuilds With libabigail

MongoDB

Engineering Blog

Complex C++ projects frequently struggle with lengthy build times. Splitting a project into multiple dynamically linked components can give developers faster incremental rebuilds and shorter edit-compile-test cycles than static linking, especially when there are a large number of test binaries. However, build systems usually do not realize all of the possible gains in dynamic incremental rebuilds because of how they handle transitive library dependencies. Red Hat's ABI introspection library libabigail offers one possible path to eliminating unnecessary transitive re-linking for some classes of source modifications.

The problem

Consider the following toy project containing two support libraries, libserver and libclient. The server library libserver depends on the client library libclient for wire protocol code, and both support libraries depend on a library of common utilities, libcommon. The client and server executables each use their associated support library.

We can see a more complete picture of our dependency graph by considering the header files and source files that are used to build these libraries, as well as the intermediate targets such as object files. We will assume that each library has one header and one source file. The dependency graph now looks like the following:

Finally, we assume a build system that can use content signatures to skip rebuilds when a dependency is regenerated with identical results. A build system that only uses timestamps cannot capitalize on the technique outlined below because regenerated dependencies always have newer timestamps.
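To make the distinction concrete, here is a minimal Python sketch of the two kinds of up-to-date check. The function names are hypothetical and are not SCons APIs; SCons performs its own content-signature bookkeeping internally. The demo regenerates a dependency with identical bytes but a newer modification time.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from pathlib import Path


def content_signature(path: Path) -> str:
    """Digest of the file's bytes; any stable hash works, MD5 is used here."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def stale_by_timestamp(target: Path, deps: list[Path]) -> bool:
    """Timestamp rule: rebuild if the target is missing or any dependency is newer."""
    if not target.exists():
        return True
    built_at = target.stat().st_mtime
    return any(dep.stat().st_mtime > built_at for dep in deps)


def stale_by_content(deps: list[Path], recorded: dict[str, str]) -> bool:
    """Content rule: rebuild only if a dependency's bytes differ from the
    signature recorded when the target was last built."""
    return any(recorded.get(str(dep)) != content_signature(dep) for dep in deps)


def demo() -> tuple[bool, bool]:
    """Regenerate a dependency with identical bytes but a newer timestamp."""
    with tempfile.TemporaryDirectory() as d:
        dep, target = Path(d, "libcommon.so"), Path(d, "client")
        dep.write_bytes(b"identical link output")
        target.write_bytes(b"linked executable")
        recorded = {str(dep): content_signature(dep)}
        # "Relink" the dependency: same bytes, later modification time.
        dep.write_bytes(b"identical link output")
        later = target.stat().st_mtime + 10
        os.utime(dep, (later, later))
        return stale_by_timestamp(target, [dep]), stale_by_content([dep], recorded)
```

`demo()` returns `(True, False)`: the timestamp rule reports the target as stale, while the content rule does not. That second behavior is the property the rest of this post relies on.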

In this environment, what will be rebuilt if we make a meaningful change to libcommon.hpp or libcommon.cpp, and ask for the client and server binaries to be built?

Well, changing libcommon.hpp is a disaster!

We need to recompile libcommon.cpp, generating a new libcommon.o, and therefore a new libcommon.[a|so]. Similarly, since both libclient.cpp and libserver.cpp depend on libcommon.hpp, they need to be recompiled and the associated libraries rebuilt. Since the libserver and libclient support libraries were relinked, the executables are now out of date, so they also get relinked. The only work we avoided doing was recompiling client.cpp and server.cpp, since they don’t directly depend on libcommon.hpp. Ouch. Well, that’s C++ for you. Perhaps C++ modules will improve this situation, but we don’t live in that world yet.

The following diagrams demonstrate this graphically, where:

  • The darkest red box is the entity which was directly changed.

  • The intermediate red indicates an entity that is rebuilt because one of its direct dependencies is seen by the build system as changed.

  • The lightest red is an unaltered entity that is seen as out of date by the build system due to a change in an implicit dependency like a header inclusion.
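The rebuild sets described above follow mechanically from the dependency graph. Here is a minimal Python sketch that computes everything downstream of a changed file, which is what a timestamp-driven build would rebuild. It assumes a hypothetical graph for the toy dynamic build. In that graph, libserver.cpp includes libclient.hpp, and each link line also lists the transitive libraries.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy dynamic build: target -> direct inputs.
# Header edges model #include relationships; .so edges model link inputs.
GRAPH: dict[str, list[str]] = {
    "libcommon.o": ["libcommon.cpp", "libcommon.hpp"],
    "libcommon.so": ["libcommon.o"],
    "libclient.o": ["libclient.cpp", "libclient.hpp", "libcommon.hpp"],
    "libclient.so": ["libclient.o", "libcommon.so"],
    "libserver.o": ["libserver.cpp", "libserver.hpp", "libclient.hpp", "libcommon.hpp"],
    "libserver.so": ["libserver.o", "libclient.so", "libcommon.so"],
    "client.o": ["client.cpp", "libclient.hpp"],
    "client": ["client.o", "libclient.so", "libcommon.so"],
    "server.o": ["server.cpp", "libserver.hpp"],
    "server": ["server.o", "libserver.so", "libclient.so", "libcommon.so"],
}


def rebuilt_targets(graph: dict[str, list[str]], changed: str) -> set[str]:
    """Return every target that transitively depends on `changed`."""
    # Invert the edges so we can walk from an input to the targets using it.
    dependents: dict[str, list[str]] = {}
    for target, inputs in graph.items():
        for inp in inputs:
            dependents.setdefault(inp, []).append(target)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over dependents
        node = queue.popleft()
        for target in dependents.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Changing libcommon.hpp marks eight targets and leaves only client.o and server.o untouched. Changing libcommon.cpp spares the lib{client,server}.o objects but still relinks both libraries and both executables.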

Changing just libcommon.cpp isn’t much better. We avoid needing to recompile lib{client,server}.cpp, but we still do a bunch of relinking. Here is how that looks for a static build:

Note that we have cheated a bit in our diagram: in a static build lib{client,server}.a don’t really depend on libcommon.a. Instead, server and client depend on it directly. So the fact that libcommon.a changed doesn’t require us to re-run the archiver for libclient.a and libserver.a. But drawing it that way makes the diagram a lot messier.

The dynamic build is actually worse here, because in that case we do need to relink libclient.so and libserver.so since their link-time dependency libcommon.so is newer/changed.

Potentially, you might get away with not relinking client and server in the dynamic case. Relinking libclient.so and libserver.so may very well produce identical results in this situation, and a content-signature-based build system would notice that. In practice, however, libcommon.so is likely to appear on the link lines for client and server as well, because static builds won’t work otherwise. So unless you have written or generated different library dependency lists for static and dynamic builds, client and server become additional link-time casualties.
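One way to produce such differing lists is to derive both from a single table of direct library dependencies. The sketch below assumes a hypothetical `LIB_DEPS` table for the toy project and leaves out how the lists reach the linker. The dynamic variant also assumes each target calls only into the libraries it names.

```python
from __future__ import annotations

# Hypothetical direct library dependencies for the toy project.
LIB_DEPS: dict[str, list[str]] = {
    "server": ["libserver"],
    "client": ["libclient"],
    "libserver": ["libclient", "libcommon"],
    "libclient": ["libcommon"],
    "libcommon": [],
}


def static_link_line(target: str, deps: dict[str, list[str]] = LIB_DEPS) -> list[str]:
    """Archives only resolve symbols already seen, so every transitively
    required archive must appear, each ahead of the archives it depends on.
    Reverse post-order of a depth-first walk gives exactly that order."""
    order: list[str] = []
    visited: set[str] = set()

    def visit(lib: str) -> None:
        if lib in visited:
            return
        visited.add(lib)
        for dep in deps[lib]:
            visit(dep)
        order.append(lib)

    for lib in deps[target]:
        visit(lib)
    return list(reversed(order))


def dynamic_link_line(target: str, deps: dict[str, list[str]] = LIB_DEPS) -> list[str]:
    """A shared library records its own dependencies (DT_NEEDED on ELF), so a
    target only needs to name the libraries it uses directly."""
    return list(deps[target])
```

With these lists, `static_link_line("server")` yields `["libserver", "libclient", "libcommon"]`, while `dynamic_link_line("server")` yields only `["libserver"]`. That keeps libcommon.so off the server's dynamic link line.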

An insight

Let us imagine that the change to libcommon.cpp was something small and innocuous, maybe fixing a typo in an internal string constant that gets logged. In a small example like this, it isn’t too painful that we needed to relink so many things. But in a larger project it can definitely hurt. It feels wrong to do so much linking for such a little change. Especially in the dynamic build, a small change deep in the library dependency graph can lead to a long chain of transitive relinking, even when many of the libraries are completely unaltered. Can we do better?

With static linking, no, not really. That updated string constant needs to exist in both executables, so we really need to relink them so that the new string constant is extracted from libcommon.a.

With dynamic linking, it turns out that we can do better. The key observation is to consider what would happen if we rebuilt only libcommon.so, and intentionally didn’t relink the other dynamic libraries or executables (even though the build system thinks we should), and then tried to run the executables. Would they work, and work as expected?

For the case of our proposed private string constant modification in libcommon.cpp, the answer is a definite yes. Changing that internal string constant didn’t alter the Application Binary Interface (ABI) of libcommon.so in any way, and when the executables are run, the updated value of the string constant will be reflected in the output because the string constant wasn’t copied into the executables: it lives in the now replaced libcommon.so.

We got away with this because our change didn’t alter the ABI of libcommon.so. If we had made a change that altered its ABI, rebuilt libcommon.so, and then tried running the executables without relinking them, we would probably be looking at a very subtle runtime crash. No fun.

In theory, then, you could minimize relinking by naming individual targets to build whenever you knew that your changes preserved the ABI. In practice, that is clearly error prone and a terrible idea. But if we had a tool that could tell us when the ABI of a library had changed, we could teach our build system to invoke it during its dependency walk and automatically skip any unnecessary relinks for ABI-preserving modifications.

A solution

Fortunately, Red Hat has provided just such a tool as part of their new library libabigail. As they describe it:

the project aims at providing a library to manipulate ABI corpora, compare them, provide detailed information about their differences and help build tools to infer interesting conclusions about these differences.

The abidw tool that comes with libabigail reads a shared library, consults the associated ELF and DWARF information which together encode all information relevant to the ABI, and emits an XML document that describes the library ABI.

Taking advantage of the flexibility of SCons, we can augment it to invoke abidw on a library immediately after we build it, compute a hash of the resulting ABI XML, and store that hash in a file alongside the library. When another target declares that it links to the library, we tell SCons to record a dependency on the ABI hash file instead of on the library itself. As a result, if the library is relinked but its ABI doesn’t change, the ABI hash file will have the same contents. Since SCons uses content signatures to detect whether targets are out of date, the ABI hash file is seen as up to date even if it was regenerated, and so the depending target is also considered up to date. ABI-preserving modifications to a library no longer cause dependents to relink!
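As a rough, standalone illustration of the first half of this mechanism (not the SCons tool itself), the Python sketch below runs abidw and hashes the XML it writes to standard output when no `--out-file` is given. It then writes the digest to a hash file. The function names, the choice of MD5, and the hash file path are assumptions. Registering the hash file as the dependency of linking targets is the SCons-specific part and is not shown.

```python
from __future__ import annotations

import hashlib
import subprocess
from pathlib import Path


def abi_hash(library: Path, abidw: str = "abidw") -> str:
    """Run abidw on a shared library and return an MD5 digest of the ABI XML
    it prints on standard output."""
    result = subprocess.run(
        [abidw, str(library)],
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError instead of hashing partial output
    )
    return hashlib.md5(result.stdout).hexdigest()


def write_abi_hash_file(library: Path, hash_file: Path, abidw: str = "abidw") -> None:
    """Store the digest next to the library; dependents would depend on this
    file rather than on the library itself."""
    hash_file.write_text(abi_hash(library, abidw) + "\n")
```

Running abidw without a shell and with `check=True` means a failing abidw raises an error instead of silently producing the hash of an empty or truncated document.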

The following diagram updates our original to include the associated ABI hash files and strips out some now unhelpful sources and headers. We now also distinguish between a dependency relationship (solid line), and a links-to/requires relationship (dashed line):

Now, if we make an ABI-affecting change to libcommon.so, we see that libclient.so and libserver.so are relinked. But libclient.so and libserver.so only use libcommon.so internally, so their ABI has not changed. The client and server executables do not need to be relinked:

On the other hand, if we make a change to libcommon.so that does not affect the ABI, then nothing else gets relinked:
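The two outcomes above can be reproduced with a toy model of a content-signature build. This is a hypothetical simulation rather than SCons, and it makes three assumptions. First, `X.abi` stands for the ABI hash file of `X.so`. Second, each executable depends only on its own support library's hash. Third, any relinked binary comes out with different bytes.

```python
from __future__ import annotations

# Hypothetical toy graph, listed in dependency order. "X.abi" stands for the
# ABI hash file of X.so; dependents list it instead of the library itself.
GRAPH: dict[str, list[str]] = {
    "libcommon.so": ["libcommon.o"],
    "libcommon.abi": ["libcommon.so"],
    "libclient.so": ["libclient.o", "libcommon.abi"],
    "libclient.abi": ["libclient.so"],
    "libserver.so": ["libserver.o", "libclient.abi", "libcommon.abi"],
    "libserver.abi": ["libserver.so"],
    "client": ["client.o", "libclient.abi"],
    "server": ["server.o", "libserver.abi"],
}


def rebuild(edited: str, abi_changed: set[str]) -> list[str]:
    """Return the targets regenerated after `edited` changes. `abi_changed`
    names the libraries whose ABI differs once they are rebuilt."""
    changed = {edited}
    regenerated: list[str] = []
    for target, inputs in GRAPH.items():
        if not any(inp in changed for inp in inputs):
            continue  # every input matches its recorded signature: skip
        regenerated.append(target)
        if target.endswith(".abi"):
            # The hash file's bytes differ only if the library's ABI differs.
            if target[: -len(".abi")] + ".so" in abi_changed:
                changed.add(target)
        else:
            # Conservatively assume a relinked binary always has new bytes.
            changed.add(target)
    return regenerated
```

With `rebuild("libcommon.o", {"libcommon.so"})`, both support libraries relink and their hash files are regenerated with unchanged contents, so client and server are skipped. With `rebuild("libcommon.o", set())`, the chain stops at libcommon.abi.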

Correctness

Is it safe to do this? We believe so: we have not yet thought of any case where it is not. Moreover, a false positive (i.e. a claim that the ABI changed when it didn’t) only costs us a missed optimization. A false negative would be harmful, but would represent a serious bug in libabigail. Finally, we currently offer this facility only as an opt-in for developer builds; the builds that we ship to customers do not use it.

There are a few important correctness issues to be aware of, however:

  • There is a poor interaction with the -gsplit-dwarf flag for debug fission. libabigail uses the elfutils library for its DWARF processing, and elfutils as yet doesn’t know how to reach out to the .dw{o,p} files that -gsplit-dwarf and the associated tooling creates. Since libabigail relies on the DWARF info to identify the ABI, running abidw on a library built from objects built with -gsplit-dwarf gives incorrect results. So you can’t use both ABI driven linking and debug fission at the same time. Presumably, this limitation will be lifted as support for the new DWARF 5 standard is incorporated into elfutils.

  • The libabigail library is new, and we have had several instances where it crashed when working with our libraries. Dodji Seketeli, the author of libabigail, has been very helpful and responsive in addressing those crashes, but you will need a fairly bleeding-edge version of abidw if you want this technique to work well in practice.

  • Taking full advantage of the technique requires that symbol visibility annotations be correctly applied to type and function definitions, and typically that all code can be built with -fvisibility=hidden. Otherwise, entities that are not actually part of the library’s interface are still exported, and are therefore counted as part of the ABI by libabigail, leading to spurious relinking.

Performance

Is this solution performant? Unfortunately, the answer right now is a resounding “it depends”.

The mongo::Status class is compiled into a library on which almost all other libraries and executables in the MongoDB server project depend. After making a non-ABI-altering edit to its implementation file, a rebuild of the “all” target on my machine is 40% faster when using abidw to skip relinks than when not. That is a fairly compelling win, but it is also the best-case scenario.

The worst case scenario is pretty bad. There is a large cost to running abidw on each library. For some complex libraries it can be quite slow: running abidw on the SpiderMonkey JS engine takes upwards of 30 seconds. In terms of total compute time, a full relink with abidw takes about twice as much total CPU time as a full relink without. Another way to look at it is that running abidw is about as expensive as linking a second time. So using abidw may be worth it if you are working deep in the link graph, and your work admits a high degree of link avoidance, but it may not be worth it if you are doing work that will cause lots of ABI changes. Unfortunately it is hard to know up front which you are likely to do. On the other hand, if you have subsets of the tree that change very infrequently, the cost for those subsets is amortized over many builds.
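A rough CPU-time model makes this trade-off explicit. It is a back-of-the-envelope sketch, not a measurement. It takes the figure above that running abidw costs about one extra link and treats every link as equally expensive. It also ignores parallelism and the edited library itself, and assumes abidw runs on every relinked target. The function names are hypothetical.

```python
from __future__ import annotations


def cost_without_abidw(dependents: int, link: float = 1.0) -> float:
    """Every transitive dependent of the edited library is relinked."""
    return dependents * link


def cost_with_abidw(relinked: int, link: float = 1.0, abidw_ratio: float = 1.0) -> float:
    """Only `relinked` targets are relinked, and each also pays for an abidw
    run costing `abidw_ratio` links."""
    return relinked * link * (1.0 + abidw_ratio)


def break_even(dependents: int, abidw_ratio: float = 1.0) -> float:
    """Number of relinks below which using abidw saves CPU time."""
    return dependents / (1.0 + abidw_ratio)
```

With `abidw_ratio = 1.0`, a full relink costs twice as much with abidw as without, which is consistent with the measurement above. Under these assumptions, abidw only pays off when a change lets the build skip more than half of the relinks it would otherwise perform.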

Overall, further work on performance is likely required.

Or, maybe it isn’t…

If proper header discipline is fully honored in a codebase, where every ABI-relevant function or object has a unique declaration in a header, it should be impossible to make a source code change that leads to an ABI variation in a library but does not cause all dependent libraries or programs to be rebuilt. In such a world, it should then be possible to weaken the build system rules for linking shared libraries to induce an order-only relationship rather than a strict dependency. That would entirely obviate the need for using libabigail to detect ABI variation. Is it reasonable to expect such discipline? Are there ways to mechanically enforce it? Are there ways to subvert ABI compatibility despite such discipline? Would C++ modules offer that capability? Depending on the answers to those and similar questions, it might be better to invest time implementing that approach and associated tooling, rather than relying on ABI metadata.

Future directions

If using ABI metadata does prove to be the correct approach, there are some areas where the current implementation could be improved:

  • Per the discussion above, general improvements to the speed of abidw are necessary. Other potential avenues to improve performance might include writing a compiler or linker plugin that could emit an ABI description analogous to that produced by abidw concurrently with executing the link step, obviating the need for a second pass by abidw.

  • Our current practice of using pipes and file redirection in the Command body of the SCons tool is somewhat dangerous. We could instead emit the full XML into the .abidw file via the abidw --out-file option, and let SCons’s internal signature generation mechanisms compute the hash. However, this would end up writing hundreds of megabytes of information we don’t actually care about to disk, and would clutter up the SCons cache. Adding a compression option to abidw could be an effective remediation: the total size of abidw data generated by a full build of MongoDB is 203 MB, but a simple gzip of each file brings it down to 14 MB.

  • Another option to improve performance would be to eliminate XML generation entirely. We are currently doing a lot more work than needed, because we generate XML that is then just fed straight into MD5 to compute a signature. If the libabigail library had an abihash program that emitted a signature directly, we could probably improve the performance of the tool somewhat.

  • We could probably eliminate even more rebuilds by going beyond detecting whether the ABI changed. Other libabigail utilities like abidiff can perform ABI compatibility detection, so we could relink only when a dependency has an ABI-incompatible change. This would allow new functions to be added to a library without requiring libraries that do not depend on those new functions to relink. We will investigate this in the future, but it would probably require a significantly more complex build system integration.

  • The libabigail library currently only works on ELF platforms. It might be possible to make it work on macOS, because its debug info is also DWARF, but it would require significant effort to make it work with the Mach-O parts of the binary. There is definitely no support for Windows, though I’m curious whether the import library generated as part of DLL construction may contain enough information to identify the ABI. If you know, please reach out and let me know, or reply on the StackOverflow question on that topic.
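For the abidiff idea in the list above, the relink decision could start from abidiff’s exit status, which the libabigail documentation describes as a bit field. This is a minimal sketch that assumes the previous ABI description of each library is kept around for comparison. The helper names are hypothetical, and the build system integration is not shown.

```python
from __future__ import annotations

import subprocess

# Bit values of abidiff's exit status, per the libabigail abidiff manual.
ABIDIFF_ERROR = 1
ABIDIFF_USAGE_ERROR = 2
ABIDIFF_ABI_CHANGE = 4
ABIDIFF_ABI_INCOMPATIBLE_CHANGE = 8


def needs_relink(status: int) -> bool:
    """Decide from abidiff's exit status whether dependents must relink.
    Errors, including abidiff being killed by a signal (negative status),
    are treated conservatively as 'relink'."""
    if status < 0 or status & (ABIDIFF_ERROR | ABIDIFF_USAGE_ERROR):
        return True
    # A change abidiff classifies as compatible sets ABIDIFF_ABI_CHANGE
    # without the incompatible bit, so it does not force a relink here.
    return bool(status & ABIDIFF_ABI_INCOMPATIBLE_CHANGE)


def dependents_must_relink(old_abi: str, new_abi: str, abidiff: str = "abidiff") -> bool:
    """Compare the previous and new ABI descriptions (shared libraries or
    abidw XML files) and apply needs_relink to abidiff's exit status."""
    status = subprocess.run([abidiff, old_abi, new_abi],
                            stdout=subprocess.DEVNULL).returncode
    return needs_relink(status)
```

Treating errors as “relink” keeps a crash in the tool from turning into a false negative, at the cost of a missed optimization.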

Conclusion

Overall, has this approach to relink avoidance been successful? Our current view is that the tool is not enough of a consistent win to deploy as the default for developer builds, but that the potential gains are compelling enough that we will continue to pursue the performance improvements and future directions outlined above.

Should libabigail prove to be the right approach, we intend to invest time into the SCons integration to address some of the deficiencies and limitations identified above. If those issues can be resolved satisfactorily, we ultimately hope to see the tool merged into the SCons mainline. We also hope to work with the libabigail maintainers to further improve its feature set and performance.

And, finally, even if the specific approach of using abidw to skip relinks proves non-viable, we are pleased with the insights that were incidentally developed regarding header discipline and linking, which may ultimately provide a zero cost way to achieve the same goal.

If you are interested in experimenting with the tool, it is available as an Apache 2 licensed drop-in tool for the SCons build system. Thoughts, feedback, and bug reports are most welcome.

I’d like to thank Dodji Seketeli of Red Hat for writing libabigail, for his prompt responses to all of the issues that I have opened over the past year, and for his help reviewing this blog post. I’d also like to thank Mathias Stearn for his review of this post and thoughts about using header discipline to entirely eliminate the need for ABI metadata.