Safe Software Deployments: The Goldilocks Gauge

Mark Porter

Once upon a time, software was written to magnetic tapes or burned onto CDs and sent to customers through the mail. It was an expensive, time-consuming distribution process — and one that didn’t lend itself to updates. You either got it right or wrong. In fact, these shipments were so high-stakes that the final CD or tape was called “the golden master.” As a result, software companies would typically ship new versions of their software only every two to three years.

It was a terrifying time for developers. These “Big Bang” deployments meant that one bug could cost a company millions. Imagine recutting 100,000 tapes. And a single developer could be responsible for the company not making its quarterly numbers.

These deployments were too big.

Today, we live in a world in which software can be continuously improved. Developers no longer have to wait years to see their work in the hands of users. In fact, some software goes into production as it’s being written. Think of Eclipse hooked directly up to unit tests, integration tests, and a CI/CD pipeline. But this comes with its own set of problems. For one, this amounts to integration testing in production and therefore requires incisive instrumentation — at least if you want to see problems as they arise, or if you want the ability to back out of the new code without damaging user data. Additional complexity comes in the form of feature flags to toggle between code paths. These require more work and should be removed once a new feature is rolled out and stable. Occasionally, removing the scaffolding that supports this style of continuous nano-deployment can surface latent bugs.
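To make the feature-flag point concrete, here’s a minimal sketch in plain Java. The class, the flag name, and the in-memory flag store are hypothetical, not from any particular library; real systems usually back the lookup with a config service or a dedicated toggle framework so a flag can be flipped without a redeploy.

```java
import java.util.Set;

/** Minimal sketch of a feature flag toggling between an old and a new code path. */
public class FeatureFlagExample {

    // Hypothetical in-memory flag store. In practice this would be a config
    // service or toggle library, so the new path can be enabled gradually
    // (or backed out) without redeploying the service.
    private static final Set<String> ENABLED_FLAGS = Set.of("new-pricing-engine");

    static boolean isEnabled(String flag) {
        return ENABLED_FLAGS.contains(flag);
    }

    static double priceWithLegacyEngine(double subtotal) {
        return subtotal * 1.10; // old, known-good code path
    }

    static double priceWithNewEngine(double subtotal) {
        return subtotal * 1.08; // new code path, still being validated in production
    }

    public static void main(String[] args) {
        double subtotal = 100.0;

        // Toggle between code paths. Once the new engine is stable, delete the
        // flag and the legacy branch; leaving this scaffolding around is exactly
        // the kind of complexity that hides bugs later.
        double total = isEnabled("new-pricing-engine")
                ? priceWithNewEngine(subtotal)
                : priceWithLegacyEngine(subtotal);

        System.out.println("Total: " + total);
    }
}
```

The cleanup step is where the “removing the scaffolding can surface latent bugs” risk lives, which is why flags like this should be short-lived and both paths kept under test until the old one is deleted.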

In my personal experience at big and small companies, this is just as bad as Big Bang releases. Below a certain unit size of deployment, the overhead of the system and the cognitive load on the teams actually increase.

These deployments are too small.

As you might have guessed by now, the Goldilocks Gauge is all about finding the pace and size of deployment that is just right: the perfect amount that keeps the engineering team in flow, that state where everybody is working at top productivity and the only cognitive load comes from the business value they are trying to produce and the complexity of the software and data needed to produce it. How do I define that amount? It’s a quantity of innovation that is small enough to hold in your head, but large enough to deliver measurable value.

Let me give an example. At one of my previous employers, we used to average about 90 deployments a week. It wasn’t enough. The tech team was more than 2,000 people, and each team often deployed only once a quarter (or worse). As a result, code wasn’t being tried out fast enough in production, slowing down the delivery of customer value. The deployments were often so complicated that debugging them required many people and many hours. That’s not what you want for a live-side app used by millions of people. You want deployments that are small enough to debug quickly, and shipped often enough that people still have all the context in their heads.

Years before this, it had been even worse: only about 10 services, each deploying once a quarter or less often. Getting to 90 deployments a week was a great achievement. So we can summarize that “small deployments, shipped often” is the goal. This won’t be a surprise to most of you.

But, sadly, even though we now had a lot more services and most deployed regularly, the main services were still monoliths and deployed way too infrequently. And that ‘monolith’ word leads me to another problem. In addition to having deployments be small and frequent, you want to limit the number of people who work on each one. It’s just another kind of complexity — an even more subtle one. A monolith has lots of lines of code, lots of dependencies, and lots of people working on it. When there is only one release per quarter, and 100 people are working on the service, every one of those people likely has multiple code changes in there. The complexity builds and builds — and becomes larger than anybody can hold in their head.

Complexity is the enemy. That complexity can be in the code itself or in the human relationships and knowledge needed to write and maintain it. Just as you want each piece of code to depend on only a small number of others, you want the same for the people in your organization. Some of you may be familiar with the Dunbar number, which refers to the maximum number of people with whom you can establish and maintain stable relationships. The Dunbar number also describes how many people are in each of your circles of friendship: there is a tight circle you relate to quite easily, an intermediate group you’re still quite comfortable with, and larger groups made up of acquaintances.

I’m going to take some liberties with Dunbar’s research and say that in some ways, this applies to teams of software developers as well. Teams need to be small enough to foster familiarity and maintain context, which leads to trust. Teams need to engage with units of work that are simple and easy to understand. These units need to be small enough to hold in one person’s brain, so that when they get an error, they can go right back in and fix it — or know exactly who to go to. Familiarity, trust, and small units of work create the conditions for rapid problem resolution.

Of course, you then build up these small teams into larger units, all producing software in harmony — with loose coupling, but tight alignment. This is a critical part of complexity management, which also includes clean architectures and coding best practices.

So what did we do? We broke the code and the databases down into smaller and smaller pieces. The teams grew the number of services by a factor of ten, from 40 to 400. And we made our teams smaller, with each team independent but also part of larger groups. Over the next year, we went from 90 deployments a week to more than 1,100, with each smaller team now deploying their software multiple times a week.

We increased the velocity of innovation and reduced downtime by 75% at the same time.

These deployments were just right. They were the right size, shipped at the right rate, with the right number of people involved in each one.

And, just as Goldilocks was happy when she found the porridge that was just right, our engineers, product managers, and executives were happier with deployments that were just right. Because the one thing that makes everybody at a tech company happy is getting new code and features into the hands of end users faster and with less drama. Of course, the Goldilocks Gauge is not possible without the 180 Rule and Z Deployments, both of which help eliminate the fear of deployment. Combined, they help create a system of safe software deployment. I’ll be sharing the final element of this system in my next post, where I’ll explain my “Through the Looking Glass” theory of aligning your development, staging, and production environments.

Of course, your systems may vary, and may even be better than what I’ve come up with. I’d love to hear about your experiences and tricks for safe deployments. Reach out to me on LinkedIn or on Twitter at @MarkLovesTech. You can also see a collection of these blogs, as well as a bunch of other things I’ve found useful, at marklovestech.com.