Succeeding With ClangFormat, Part 3: Persisting The Change
If you’ve been following our series on succeeding with ClangFormat, you already know all about why we did it and the steps we took to ensure the migration went well. In this concluding post, we’ll talk about how to succeed after the integration and reformat are complete. We learned some valuable lessons about what happens in the immediate aftermath of bringing ClangFormat into our system and have been refining our workflows ever since. Here’s a look at our occasionally bumpy road and how you might have a smoother one.
Rescuing stranded changes
In the previous post, we reformatted and committed changes to existing code on what we called “flag day.” But while that was most of the work, it wasn’t all of it. After flag day, you need a way for developers to migrate changes from pre-format topic branches to the post-format code base. Because the most recent commit to master was a huge set of format-only changes to just about every file, the usual process of rebasing a topic branch onto master would produce complicated, noisy, and confusing diffs.
Conceptually, the process we need is almost a rebase. But instead of applying each commit’s patch atop master, which would yield those messy merge conflicts, we can run ClangFormat on all the updated files in the topic branch and overwrite the corresponding files in master. This is safe as long as we can guarantee that, apart from the topic branch’s own changes, formatting is the only difference between the files in the topic branch and those in master. We can guarantee that by first rebasing the topic branch atop the commit immediately prior to the reformat commit. Here’s that process, distilled into an algorithm:
Given an orphaned topic branch O, the reformat commit R, and P, the commit immediately prior to R,
and given that you have manually rebased O atop P, yielding branch T:
1. Validate that branch T has been rebased atop P.
2. Start with the base set to R.
3. For each commit A after P, up to and including the tip of T, oldest first:
   a. Check out commit A.
   b. Reformat the files touched by A with ClangFormat.
   c. Commit the changes, yielding B.
   d. Check out the base (R for the first commit, otherwise the previous C).
   e. Copy each file changed in B on top of the base.
   f. Commit the changes as C, and make C the new base.
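To make the algorithm concrete, here is a minimal Python sketch. Rather than driving git, it models each commit as an in-memory snapshot (a dict from path to file contents) and takes the formatter as a plain function standing in for ClangFormat. The names `reformat_branch` and `changed_paths` are hypothetical and are not the interface of clang_format.py. The sketch assumes R is exactly P with every file formatted, which is what flag day produces.

```python
from typing import Callable, Dict, List

# A snapshot maps file paths to file contents; each commit is modeled as a full snapshot.
Snapshot = Dict[str, str]


def changed_paths(parent: Snapshot, child: Snapshot) -> List[str]:
    """Paths added, modified, or deleted between two snapshots (like `git diff --name-only`)."""
    return sorted(p for p in set(parent) | set(child) if parent.get(p) != child.get(p))


def reformat_branch(
    p_snapshot: Snapshot,
    topic_commits: List[Snapshot],
    r_snapshot: Snapshot,
    fmt: Callable[[str], str],
) -> List[Snapshot]:
    """Replay topic commits (already rebased atop P) onto the reformat commit R.

    For each commit A, format only the files A touched (this is B), then copy
    those formatted files over the current base: R for the first commit, and
    the previously created commit C after that.
    """
    new_commits: List[Snapshot] = []
    parent = p_snapshot
    base = r_snapshot
    for a in topic_commits:
        touched = changed_paths(parent, a)
        c = dict(base)  # start C from the current base
        for path in touched:
            if path in a:
                c[path] = fmt(a[path])  # B's version of the file overwrites the base's copy
            else:
                c.pop(path, None)  # the file was deleted in A, so delete it in C too
        new_commits.append(c)
        parent, base = a, c
    return new_commits
```

Because R already holds the formatted version of every file the topic branch never touched, the last replayed snapshot equals the topic branch’s final state with every file formatted. That property is what lets this approach sidestep the merge conflicts a plain rebase would produce.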
[Figure: the algorithm, diagrammed in four panels: initial state; orphaned branch O rebased atop P, yielding T; files for each commit A in T get formatted and replace the corresponding files in R; finished reformatting rebase.]
We initially implemented this as a one-off script, because stranded branches were a condition we had not planned for! We invested a lot of time and energy into how to handle flag day and how to use ClangFormat day to day, but we overlooked how to get past the initial reformat bump. We considered reverting the reformat commits to deal with the unexpected hiccup, but after a few minutes’ panic, cooler heads prevailed and we realized we could forge ahead. We quickly coded up our script and opened a ticket to enhance clang_format.py to do this work as a subcommand. It was the biggest snafu of the project for us. (Learn from our error, and your reformat will be even easier than ours.) We have since completed the enhancement, so you can now run clang_format.py reformat-branch T R (using T and R from the algorithm above) to bring a stranded topic branch forward after a reformat commit.
Holding the line and daily development
So far I’ve only addressed the issues involved in formatting your code base once. All this work is for naught if the tool is not built into the day-to-day development process. This is where analyzing your workflow is key.
Editing code
Most developers write code using IDEs or text editors, and we use a wide variety at MongoDB: Vim, Emacs, Visual Studio, Eclipse, Sublime, Xcode, and Geany. Because ClangFormat is open source, the LLVM repository contains plugins for Vim, Emacs, Sublime, and Visual Studio, and third-party plugins are freely available for many other editors. Editor integration is fantastic because it spares developers a separate formatting step after editing. Many of these editors even support a “format file on save” option, so code is always formatted correctly without changing how a developer works. This has made our developers very happy, since it saves them time and effort.
Our continuous integration system Evergreen (which we discussed in a recent post) makes sure code is formatted as planned by using clang_format.py in validate mode.
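Validate mode boils down to asking ClangFormat what it would produce and comparing that with what is on disk. Here is a minimal sketch of such a check, assuming a `clang-format` binary on the PATH and a `.clang-format` file that `-style=file` picks up. `find_unformatted` and `clang_format_text` are hypothetical names, not functions from clang_format.py.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, List


def clang_format_text(path: str) -> str:
    """Return what clang-format would produce for `path`, leaving the file untouched."""
    # Without -i, clang-format writes the formatted result to stdout.
    # -style=file uses the nearest .clang-format configuration file.
    result = subprocess.run(
        ["clang-format", "-style=file", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def find_unformatted(
    paths: Iterable[str],
    formatter: Callable[[str], str] = clang_format_text,
) -> List[str]:
    """Return the paths whose current contents differ from the formatter's output."""
    unformatted = []
    for path in paths:
        original = Path(path).read_text()
        if formatter(path) != original:
            unformatted.append(path)
    return unformatted
```

A CI step could call `find_unformatted` on the project’s source files and fail the build when the list is non-empty. Taking the formatter as a parameter keeps the comparison logic testable without ClangFormat installed.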
Checking in code
Code format is best enforced before check-in, via a code review tool like Gerrit, so that only conforming code is ever checked in. For historical reasons, though, we use Rietveld, and all developers can push directly to our repos. So instead we rely on Evergreen as a safety valve after check-in. This suffices for us because we treat any scons lint failure as a compile break, and fixing the problem is as simple as running the tool. On the rare occasions this happens, it gets fixed quickly, generally by the responsible developer but otherwise by a teammate who catches the error. In the future we’d like to catch nonconforming code before it is ever checked in.
Reviewing code
Finally, we need to ensure that formatting has been applied to any changes a developer submits for code review. Because our Rietveld code review tool does not provide pluggable server-side hooks, we instead added a hook to our code review upload tool: when a developer runs it, it runs ClangFormat against their changes to validate that they are formatted correctly. We want developers to be confident that any changes they review from their peers match the style guidelines. That way they do not have to spend time checking for things a tool can fix, and can focus on content. In addition, since all the code uses the same format, it is easier for developers to read and review.
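The upload hook only needs to look at the files a change touches. Below is a sketch of that selection step. It assumes the change is compared against `origin/master` and that C/C++ sources are identified by file extension; the function names, the base ref, and the extension list are illustrative assumptions, not what our upload tool actually uses.

```python
import re
import subprocess
from typing import Callable, List

# Assumed set of extensions the hook checks; adjust for your tree.
SOURCE_FILE_RE = re.compile(r"\.(c|cc|cpp|cxx|h|hpp)$")


def changed_files(base: str = "origin/master") -> List[str]:
    """Paths changed relative to `base`; --diff-filter=d excludes deleted files."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", base],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def files_to_check(paths: List[str]) -> List[str]:
    """Keep only the C/C++ sources that ClangFormat should validate."""
    return [p for p in paths if SOURCE_FILE_RE.search(p)]


def blocking_files(paths: List[str], is_formatted: Callable[[str], bool]) -> List[str]:
    """Return the sources that should stop the upload until they are reformatted."""
    return [p for p in files_to_check(paths) if not is_formatted(p)]
```

In a hook, `blocking_files(changed_files(), is_formatted)`, with `is_formatted` wrapping a ClangFormat comparison, would run before the patch is sent for review; a non-empty result would abort the upload and tell the developer which files to reformat.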
Tool versioning
You have to ensure that everyone uses the same version of ClangFormat, or eventually you will run into config incompatibilities, and maybe even silent formatting discrepancies.
At present, we require all developers to use version 3.6.0 of ClangFormat (the most recent version at the time of our initial reformat). Our script clang_format.py validates the installed version of ClangFormat and, if 3.6.0 is not found, installs the right version. Since developers at MongoDB use several different Linux distributions (Arch, Fedora, OpenSUSE, and Ubuntu), we even build our own copy of ClangFormat on RHEL 5.5 against glibc 2.5, a glibc old enough that the binary runs on all of those distributions.
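A version check like this can be as small as parsing the output of `clang-format --version`. The sketch below only detects the version; downloading or building the right binary is left out. The regular expression assumes the usual `clang-format version X.Y.Z` wording, which distribution builds may prefix with a vendor name. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

REQUIRED_VERSION = "3.6.0"
# Matches e.g. "clang-format version 3.6.0 (tags/RELEASE_360/final)".
VERSION_RE = re.compile(r"clang-format version (\d+\.\d+\.\d+)")


def parse_version(output: str) -> Optional[str]:
    """Extract the X.Y.Z version from `clang-format --version` output, if present."""
    match = VERSION_RE.search(output)
    return match.group(1) if match else None


def installed_version(binary: str = "clang-format") -> Optional[str]:
    """Return the installed version, or None if the binary is missing or fails."""
    try:
        out = subprocess.run(
            [binary, "--version"], check=True, capture_output=True, text=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(out)


def has_required_version(binary: str = "clang-format") -> bool:
    """True only when the exact pinned version is installed."""
    return installed_version(binary) == REQUIRED_VERSION
```

Pinning an exact version, rather than a minimum, is deliberate: even minor ClangFormat releases can format the same input differently.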
Impact on project
The end result of all this work was better than we expected. Developers remark that it has changed how they write code. One of the managers told me, “I can no longer tell about the craftsmanship of the code based on the formatting.”
Two weeks prior to flag day, I nervously presented the proposed formatting changes and the rollout plan to the team in a standing-room-only meeting. We walked everyone through how this would affect their day-to-day development, as well as the changes to our official coding style. We got applause and cheers when we announced that we were changing the formatting around else to follow the Google style instead of the Stroustrup style.
Conclusion
In short, using an automated tool for code formatting has fantastic benefits. While the initial integration is expensive, it pays for itself in the long term through reduced costs of writing and reviewing code. None of this is specific to our first project, where we brought systematic formatting to the MongoDB Server. We have since applied the same pattern to our Legacy C++ Driver and our new C++11 Driver, whose Travis CI builds check every pull request for style.
While running ClangFormat is easy, the hard work and the payoff lie in integrating it into your processes and ensuring that the code base never slips backward. Developers will happily use the tool if it helps them be more productive and assures them that everyone is held to the same standard.
August 19, 2016