The Story of Reformatting 100k Files at Google in 2012

A Mysterious Meeting

Back in September 2012, I was a junior engineer at Google, working on Bazel (Google’s build tool, also known internally as Blaze). One day, a mysterious calendar invite landed in my inbox. It was sent by two engineers in the US, and I was invited along with my team lead.

I quickly recognized the names: Rob Pike and Russ Cox. Though I hadn't worked with them, I knew them by reputation: Russ Cox because I enjoyed reading his blog posts, and Rob Pike because, well… he’s famous. During the meeting, Rob and Russ shared their ambitious plan: to reformat every single Bazel BUILD file in Google’s codebase and enforce this formatting with a presubmit script.

Code Formatting

At that time, code formatters were not as common as they are today. Popular tools like Python’s Black, Clang Format, or Prettier didn't exist yet. The main example of a formatter enforced everywhere was gofmt. As Russ and Rob worked on the Go team, they wanted to replicate this success for Bazel’s BUILD files.

BUILD files had inconsistent formatting, mixing various indentation styles without a standard guide. It was often easy to spot copy-pasted blocks of code because their formatting stood out. Past discussions about style guides had shown a lot of disagreement among engineers. A previous attempt at a formatting tool had some issues and saw limited adoption.

Russ had already developed a new version of the tool, called Buildifier, using Go. Writing a formatter can be very difficult in some cases, but the complexity depends on a few things:

  • How to deal with comments? Buildifier attaches the comments to a nearby node in the syntax tree (the data structure is not super fine-grained, so in some cases, Buildifier will move the comment).
  • How to split lines? Buildifier splits lines in a few hard-coded cases, but it doesn’t care about line length. This is a significant simplification.
  • How to preserve existing line splits? Buildifier preserves some of them in the syntax tree, in a few specific places.
  • Do we need partial formatting? Buildifier always reformats the entire file, not just the lines that were modified.
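To make the "hard-coded line splits" idea concrete, here is a toy sketch in Python. It is not Buildifier's code (Buildifier is written in Go), and the splitting rules, the function names and the input shape (a list of name/value pairs) are assumptions made for illustration. The point is that the printer never measures line length: a list with at most one element stays on one line, and any longer list gets one element per line.

```python
def format_value(value, indent):
    """Render a string or a list of strings, never looking at line length."""
    if isinstance(value, str):
        return '"%s"' % value
    if len(value) <= 1:
        # Hard-coded case: a list with zero or one element stays on one line.
        return "[" + ", ".join('"%s"' % v for v in value) + "]"
    # Hard-coded case: a longer list always gets one element per line.
    pad = " " * (indent + 4)
    lines = ["["] + ['%s"%s",' % (pad, v) for v in value] + [" " * indent + "]"]
    return "\n".join(lines)


def format_rule(kind, attrs):
    """Render a rule call with one attribute per line, in the given order."""
    lines = [kind + "("]
    for name, value in attrs:
        lines.append("    %s = %s," % (name, format_value(value, 4)))
    lines.append(")")
    return "\n".join(lines) + "\n"


print(format_rule("cc_library", [
    ("name", "foo"),
    ("srcs", ["a.cc", "b.cc"]),
    ("deps", [":bar"]),
]))
```

Because the output depends only on the structure of the input, there is nothing to tune and nothing to argue about, which is what keeps this kind of formatter small.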

Because of these design choices, Buildifier was fairly simple. The implementation looked promising, but we had to consider how to roll it out. Russ tested it on the entire codebase and was able to reformat all files in a few minutes. He was asking us for permission to roll out the tool and enforce it as a presubmit check.

The idea of reformatting every BUILD file sounded crazy and very disruptive. With around 10,000 engineers changing BUILD files every day in the codebase, enforcing a strict formatting rule seemed scary. Could we reject every commit that doesn't match the output byte for byte? Could we relax the presubmit rules or make the tool opt-in? Russ insisted: yes, it had to be strictly enforced for everyone. No exceptions, no settings in the tool, no personal preferences.
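A byte-for-byte check is simple to state in code. The sketch below is only an illustration in Python: toy_formatter is a stand-in I made up (it just trims trailing whitespace), and presubmit_check is a hypothetical name, not Google's presubmit API. The rule is that a file passes only if running the formatter on it changes nothing.

```python
def toy_formatter(text):
    """Stand-in formatter: strips trailing spaces, ends the file with one newline."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines).rstrip("\n") + "\n"


def presubmit_check(path, content, formatter=toy_formatter):
    """Return an error message, or None when the file is already canonical."""
    if formatter(content) == content:
        return None
    return "%s: not formatted, run the formatter before submitting" % path


print(presubmit_check("a/BUILD", "x = 1\n"))
print(presubmit_check("a/BUILD", "x = 1  \n"))
```

A check like this only works if the formatter is deterministic and idempotent (formatting an already formatted file is a no-op), and if there are no per-user settings that could make two engineers disagree on the canonical output.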

After a few days, the plan was formally approved.

Rollout

I got to help with the rollout in a few places. For example, I made changes to the grammar to formalize the build language, and I wrote a style guide, so I decided, once and for all, how BUILD files should look. I also needed to document which lists could be safely reordered, since one notable feature of Buildifier is its ability to sort lists in certain cases (e.g., in a build, the list of source files is sometimes order-independent).
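The list-sorting feature can be pictured as an allow-list of attributes whose order is known not to matter. The Python sketch below is an illustration only: the SORTABLE_ATTRS set, the function name and the plain lexicographic order are my assumptions, not Buildifier's actual tables or sort order.

```python
# Hypothetical allow-list: attributes documented as order-independent.
SORTABLE_ATTRS = {"srcs", "deps"}


def sort_safe_lists(attrs):
    """Return a copy of attrs where only allow-listed list values are sorted."""
    result = {}
    for name, value in attrs.items():
        if name in SORTABLE_ATTRS and isinstance(value, list):
            result[name] = sorted(value)
        else:
            # Anything not on the allow-list keeps its original order.
            result[name] = value
    return result


rule = {"name": "foo", "srcs": ["b.cc", "a.cc"], "copts": ["-O2", "-DX"]}
print(sort_safe_lists(rule))
```

The allow-list is the conservative choice: an attribute such as a list of compiler flags may be order-sensitive, so it is left alone unless someone has documented that reordering is safe.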

Of course, we had to integrate Buildifier into every major code editor. If the code is formatted on save, no one gets a bad surprise when trying to submit it. In the codebase, some tools generated BUILD files. Tools shouldn’t try to generate formatted code by themselves; they should delegate it to the formatting tool, which acts as a source of truth and may change over time.

How do you test that the formatter doesn’t break code? The first idea is to compare the syntax trees, but this doesn’t work when we reorder list items. Bazel has a very convenient feature, called Bazel Query, which can output the information about a package. I taught Bazel which lists are order-independent, so we could use Bazel Query to check that the reformatting didn’t impact Bazel’s understanding of a file. On top of that, Google has a nice testing infrastructure. It’s of course possible (but slow, and computationally expensive) to run the tests after this kind of change.
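The idea behind that check can be sketched in a few lines of Python. This is not how Bazel Query is implemented; the dictionaries below stand in for whatever the query tool prints about a package, and ORDER_INDEPENDENT and the function names are hypothetical. Both versions of a package are brought to a canonical form in which order-independent lists are sorted, and then compared.

```python
# Hypothetical set of attributes that the build tool treats as unordered.
ORDER_INDEPENDENT = {"srcs", "deps"}


def canonical(package):
    """package maps a rule name to its attributes; sort the unordered lists."""
    out = {}
    for rule_name, attrs in package.items():
        out[rule_name] = {
            attr: sorted(value)
            if attr in ORDER_INDEPENDENT and isinstance(value, list)
            else value
            for attr, value in attrs.items()
        }
    return out


def same_meaning(before, after):
    """True when the two packages differ only in the order of unordered lists."""
    return canonical(before) == canonical(after)


before = {"foo": {"srcs": ["b.cc", "a.cc"], "copts": ["-O2", "-DX"]}}
after = {"foo": {"srcs": ["a.cc", "b.cc"], "copts": ["-O2", "-DX"]}}
print(before == after, same_meaning(before, after))
```

A direct comparison reports a difference here, while the canonical comparison does not. Swapping the two entries in copts would still be flagged, because that attribute is not in the unordered set.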

How do you submit changes to 100,000+ files? At that scale, lots of tools break. So Google has a tool to split a large change into smaller changes that can be submitted independently. In case of conflict, the tool was instructed to revert any conflicting file. To bypass code approvals, the changes were sent to a “global approver”, able to approve changes anywhere in the Google codebase.
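Here is a rough Python sketch of the splitting step. It does not describe Google's internal tool: the fixed shard size, the function names and the way conflicts are detected (a set of paths modified since the snapshot) are all assumptions for illustration.

```python
def split_change(paths, shard_size):
    """Split a big list of files into shards that can be submitted separately."""
    ordered = sorted(paths)  # sorting keeps files of the same directory together
    return [ordered[i:i + shard_size] for i in range(0, len(ordered), shard_size)]


def drop_conflicts(shard, modified_since_snapshot):
    """Leave out files that someone else changed in the meantime."""
    return [path for path in shard if path not in modified_since_snapshot]


paths = ["b/BUILD", "a/BUILD", "c/BUILD", "a/x/BUILD", "d/BUILD"]
shards = split_change(paths, 2)
print(shards)
print(drop_conflicts(shards[1], {"c/BUILD"}))
```

Files dropped from a shard are not lost. Since the formatter can be run again at any time, they can simply be picked up in a later pass.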

Aftermath

Contrary to my expectations, this was relatively uneventful. Almost no one complained about the new requirement. Almost no one complained about the new formatting style.

I remember the previous lengthy discussions about indentation, parentheses position, etc. without any consensus. Yet, when Buildifier was rolled out, people didn’t actually care about the style decisions. They just enjoyed the uniformity.

“The formatting issue is a problem that should not exist and that we care enough about to spend our own engineering time eliminating it. I understand that it does not seem a priori that automated formatting would make a significant difference. I didn't truly understand it myself until we eliminated it from the Go code review and code editing processes. Hopefully in six months or a year, when everything is converted and we look back at this, the benefits will be clearer in retrospect.” — Russ Cox

So, was it worth it? Yes, a thousand times. This experience taught me the value of uniformity and the power of automated tools to improve productivity.

The benefits of reformatting became apparent very quickly. Buildifier didn’t just reformat our BUILD files; it radically changed our approach to code maintenance and large-scale changes. Before Buildifier, large-scale changes in BUILD files were extremely hard to perform. Refactoring tools tried to detect and match the existing formatting style, but they couldn't do it well (in particular in the presence of code comments and multiline expressions), and there was a lot of pushback in code reviews. With Buildifier, such changes became routine. This enabled lots of enhancements in Bazel that were previously considered impossible, and it allowed us to fix legacy design issues.

In particular, this later made it possible for me to replace the Bazel build language, from Python to Starlark.

Updates

  • Credits also go to Nilton Volpato, who wrote the original version of Buildifier and helped with the transition to the new implementation and with its maintenance.
  • Why not enable the new formatter without updating all the files? This would lead to long diffs whenever someone modifies a file, making the change hard to review. The cost would be spread over all the engineers in the company.
  • The number of files was actually 193k at the beginning of the migration and 216k at the end. Migrations like this have to deal with the codebase growth happening at the same time.
  • It was in 2012. The title previously said 2011.

Discuss

Comments are closed, but there are discussions about the article in other places, e.g.

  • Discussion on HackerNews. In particular, Russ Cox (rsc) adds more background information.
  • Discussion on Reddit.
  • Discussion on lobste.rs.