Succeeding With ClangFormat, Part 1: Pitfalls And Planning

MongoDB


Last year, MongoDB began using ClangFormat to apply a globally consistent format to our C++ codebase, and has maintained that uniformity ever since. The most important factor in our success wasn’t deciding on the particular format or handling git issues. It was making it effortless for developers to produce properly formatted code, and integrating automated checks into every phase of our development process.

I was the developer in charge of designing our ClangFormat implementation and integrating it into our process, as well as “chief cat herder” responsible for reaching consensus on code format. Planning and rolling out a formatting tool is not too hard, but it requires forethought, coordination, and a commitment to enabling and enforcing its use. It can be time consuming, but the end result is that everyone has only one format to grok. Afterward, the time once wasted on formatting code, or on arguing about it, disappears. Maybe you know entirely different kinds of developers than I do, but in my experience, that's a lot of time saved.

The difficulty of maintaining consistent formatting

MongoDB is a large open source codebase with over half a million lines of code, scores of full-time developers, and many community contributors. But even on smaller projects, most developers discover the problems of working without an agreed-upon format the very first time they work on a team. This irritation can lead to religious arguments over the merits of various formatting choices, but mature engineers know that having a standard matters more than which standard you pick.


An unfair burden on code reviews

The problem was never a lack of a code formatting standard: MongoDB had one long before we started using ClangFormat. The problem was that enforcing it placed an extraordinary burden on developers. When we examined our code review practices, we concluded that a significant portion of review time went to formatting, first while preparing code for review, and then during the review itself. This included minutiae such as spacing around parentheses, line width, header include order, and many other rules. We also found that enforcement was inconsistent: in spite of a clear style guide, style often varied between files belonging to different components because of personal taste. Given the hassle, it’s no wonder the formatting drifted from the standard over time.

Blame the system, not the tool (or the engineers)

This phenomenon appears widespread. Many teams think of inconsistent formatting as unfortunate but unavoidable. Considering that code formatting is a purely mechanical process, for which just about every editor can provide automatic assistance via plug-ins, it seems odd that this should be the case. The tools to format code exist, they are highly configurable, and they do their jobs quite well.
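To make "purely mechanical" concrete, here is a minimal C++ sketch that checks two of the rules mentioned above: a maximum line width and sorted `#include` order. The function names `overlongLines` and `includesSorted`, and the 100-column limit used in the usage note, are hypothetical illustrations. ClangFormat enforces rules like these through configuration options such as `ColumnLimit` and `SortIncludes` rather than through code like this.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based line numbers of lines longer than `limit` characters.
// Assumes one byte per character (plain ASCII source); real tools are more careful.
std::vector<std::size_t> overlongLines(const std::string& source, std::size_t limit) {
    std::vector<std::size_t> result;
    std::istringstream in(source);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        if (line.size() > limit) {
            result.push_back(lineNo);
        }
    }
    return result;
}

// Returns true if every run of consecutive #include lines is in lexicographic order.
// A blank or non-include line ends the current run, so separately grouped
// include blocks are checked independently of each other.
bool includesSorted(const std::string& source) {
    std::istringstream in(source);
    std::string line;
    std::string previous;  // last #include line in the current run; empty if none
    while (std::getline(in, line)) {
        if (line.rfind("#include", 0) == 0) {  // line starts with "#include"
            if (!previous.empty() && line < previous) {
                return false;
            }
            previous = line;
        } else {
            previous.clear();  // any other line starts a new run
        }
    }
    return true;
}
```

For example, `overlongLines(text, 100)` lists the lines a reviewer would otherwise have to flag by hand. Checks like these only report problems; the value of a formatter is that it rewrites the code so that nobody has to act on such reports manually.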

When we made our first attempt at using such tools, we failed, and learned that the key to success lies not only in the choice of tool, but also in building it into every step of your development workflow.

Five years ago, we added support for Artistic Style (astyle) to our build system. We formatted our existing code and, for a while at least, used the tool to clean up code before commits. But because running Artistic Style was a manual step, it was often skipped. Skipped runs meant more bulk re-formatting commits over time, which cluttered our commit history. Beyond the clutter, though, we had another problem: trust. We were nervous about automated code formatting, so we enabled only a few rules. That meant we had no guarantee that code conformed to our standard. A formatting process which is both manual and insufficient is doomed to be abandoned.

Putting the learning to work

Given the poor results of our first attempt at automating our way to a consistent code format, it was another four years before we looked at this issue again. By then, our engineering team had grown 400%, entire new teams had come into existence, and the inconsistencies had become far more painful. It was time to try again, but this time we took a holistic look at how we interact with code throughout our development process. Working one phase at a time, we analyzed the proper way to involve a formatting tool at each step.

ClangFormat

Beyond examining the process-related problems with our last attempt, we also chose a new tool. ClangFormat is an LLVM tool for formatting code that uses the clang tokenizer to process C, C++, Objective-C, Java, and JavaScript files. Because it uses the clang tokenizer, it supports the same C++ constructs as the clang compiler, which we use. This ensures the formatter does not constrain how we use C++. Other tools with separate parsers don’t offer that guarantee.


Given our ongoing interest in the ecosystem of clang tools (clang-tidy, clang-modernize, etc.), ClangFormat was an easy choice for us. For the same reason, we have more confidence in the long-term viability of the project than in that of other tools. And as part of a compiler toolkit, it can use the same front-end components to lex code, which appears to give it a deeper understanding of what it is formatting. For example, it can wrap long lines whether they consist of code, strings, comments, or arrays, something Artistic Style has no way of doing.

Doing it right

Running a tool like ClangFormat is relatively straightforward. The challenging part is rolling it out across a large development team and ensuring that all checked-in code adheres to the format going forward. Doing that right requires thoughtful planning.
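One common way to hold that line is a pre-commit hook that runs the formatter in check-only mode on staged C++ files. This part of the series does not describe MongoDB's actual mechanism, so the following is only a hypothetical C++ sketch: it filters a list of staged paths (as `git diff --cached --name-only` would print them) down to C and C++ sources, then assembles a `clang-format --dry-run --Werror` command line. In clang-format 10 and later, `--dry-run` reports violations without editing files and `--Werror` turns them into a non-zero exit status. The extension list and the function names `cppSources` and `checkCommand` are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Returns true if `path` ends with `suffix`.
bool endsWith(const std::string& path, const std::string& suffix) {
    return path.size() >= suffix.size() &&
           path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Keeps only paths that look like C or C++ sources (hypothetical extension list).
std::vector<std::string> cppSources(const std::vector<std::string>& stagedPaths) {
    static const std::vector<std::string> kExtensions = {".c", ".cc", ".cpp", ".h", ".hpp"};
    std::vector<std::string> result;
    for (const auto& path : stagedPaths) {
        for (const auto& ext : kExtensions) {
            if (endsWith(path, ext)) {
                result.push_back(path);
                break;  // avoid adding the same path twice
            }
        }
    }
    return result;
}

// Builds a check-only command line. Returns an empty string when there is
// nothing to check, so a hook can skip running clang-format entirely.
std::string checkCommand(const std::vector<std::string>& files) {
    if (files.empty()) {
        return "";
    }
    std::string cmd = "clang-format --dry-run --Werror";
    for (const auto& f : files) {
        cmd += " '" + f + "'";  // naive quoting; breaks on paths containing quotes
    }
    return cmd;
}
```

A hook built this way would run the command and reject the commit on a non-zero exit. Because hooks can be bypassed, running the same check in continuous integration is a natural second line of defense.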

Getting buy-in...after you've done your homework

With a potentially divisive topic that will affect every coder on your team, nothing could be more crucial than getting buy-in. If it seems to developers that you are bringing a tedious new manual process into their lives, they will resist it. On the other hand, an automated, seamless process that makes it easier to write and review code will sound much better; that’s why you shouldn't call any meetings or make any grand claims until you've done enough advance work. When you present the idea, you want to have solutions ready for the most obvious ways your project could turn out to be a worthless boondoggle instead of a path to the promised land, and you want to make it clear that there will be mechanisms in place to handle input and feedback.

For our project, a fellow developer and I wrote up a scope document outlining the project. We then presented it to the small team of development leads just as we would with any planned feature. We outlined goals and non-goals, identified assumptions, dependencies, and questions, and provided a proof-of-concept. Having demonstrated that we had done our homework, we were given the go-ahead to build out a functional specification.

Planning

What code formatting rules to use? More importantly, who chooses the rules? What about the existing code -- do you reformat it all? One fell swoop, or ad hoc, as the files are edited? What about uncommitted changes made before the big reformat? Finally, how do you hold the line against natural entropy in a code base?

You can’t answer these questions without first understanding the particulars of your chosen tool. While our roadmap is specific to ClangFormat, the principles apply equally to any code formatting tool, such as StyleCop, gofmt, or Jindent. Every tool has its limitations and assumptions, and it might not be able to perfectly accommodate your ideal format. If that turns out to be the case, compromise. Avoid workarounds, extra steps, or patches to the formatter's source, unless the issue is a core one, in which case you should consider a different tool if possible. Add customization only as a last resort: it is not core to your work, will be expensive to maintain, and will likely impede adopting new versions of the tool.

Up next

At this point, you’ve gotten a look at how we embarked on the journey of implementing a code formatting workflow. Actually getting it done, though, took us through a series of decisions on what, when and how. In part 2 of this series, we'll discuss specifics of the choices we made, leading up to the day we reformatted the entire codebase.