The Dichotomy of Delivery

My first real job in software was on a team that shipped boxed software. Every two years the team cut a “major release” that was burned onto physical CDs and DVDs and shipped to customers for installation on their own hardware. We were able to release updates in between, but the process was difficult and costly, so these were mostly handled as twice-yearly “update pack” releases containing all of the feature improvements and bug fixes that had landed and been verified in that period.

Delivering software in this way affected everything we did as a team on a day-to-day basis. The requirements for each feature needed to be ready early enough for engineers to estimate the work items, so we could make sure everything would fit into a release. Every feature being implemented across dozens of subteams had to be ready at precisely the same time for final qualifications and testing by the QA teams. All UIs needed to be ready by another deadline and finalized for translation into hundreds of different languages.

The cost of something not being ready in time was huge. It would either miss the release and wait years for the next one, or delay everything else that had been promised to customers for months. The final months before the release was cut were obviously a stressful time for the team. Bugs were marked “won’t fix” relentlessly in daily triage meetings. Cheap, dirty solutions were imagined and implemented with reckless abandon. The show must go on. The product had to ship.

Thankfully, the shift from boxed software to services has made more rapid delivery practical. Online services can be deployed to all users in seconds, rendering large batch releases a thing of the past. Taking this transformation to the limit results in continuous delivery, a set of practices designed around releasing each individual change to users as it is completed.

Is faster always better? Will the trend of releasing more often continue in perpetuity? I argue that, like most things, software delivery exists on a spectrum, and that the optimal delivery frequency lies somewhere in the middle. The exact optimum varies with several factors specific to each team, target platform and customer base. Every team needs to examine these factors and decide on its own optimal release frequency.

Reductio ad absurdum

To take frequent delivery to the limit, let’s imagine the fastest possible way to ship software. I can’t think of a way to release software more often than it is written. The smallest discrete shippable unit is probably a revision of a file at the filesystem level. Every time a developer hits save, we are left with a set of source code that could, in theory, be released to a production environment.

The industry’s general “shift-left” movement has even created a vast array of tooling designed to build, test and deploy software at this speed to aid development and debugging. So why not take this straight to production? The question is cringe-inducing in many ways. Zero validation has been done on the software at this point. It may not even compile, let alone work correctly or be safe to release to end users.

The best argument against these safety and correctness concerns is probably that a sufficiently correct test suite could be inserted into this pipeline to prevent bad code from being deployed. Anything that passes the suite is safe to roll out, and anything that fails is automatically blocked. Automation of this kind is almost always a win. Removing humans from the validation loop wherever possible is a great way to lower costs and ensure correctness. The key part of that last sentence, however, is “wherever possible”: it is not always possible or desirable to remove humans from the verification loop.
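
As a minimal sketch of what such a gate could look like (the make targets and step names are invented placeholders, not any particular CI system’s API), the core idea is just that nothing is promoted unless every validation step exits cleanly:

    import subprocess
    import sys

    def run_step(name: str, command: list[str]) -> bool:
        """Run one pipeline step; a non-zero exit code fails the gate."""
        result = subprocess.run(command)
        print(f"{name}: {'ok' if result.returncode == 0 else 'FAILED'}")
        return result.returncode == 0

    def gated_deploy() -> None:
        # Every stage must pass before the next one runs; a failure
        # stops the pipeline so nothing unvalidated reaches users.
        steps = [
            ("build", ["make", "build"]),
            ("unit tests", ["make", "test"]),
            ("integration tests", ["make", "integration-test"]),
        ]
        for name, command in steps:
            if not run_step(name, command):
                sys.exit(1)  # block the release; a human investigates
        run_step("deploy", ["make", "deploy"])  # green across the board

    if __name__ == "__main__":
        gated_deploy()

A real pipeline adds artifact promotion, environment configuration and rollback hooks, but the gating logic stays this simple.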

Code needs to be reviewed for style in ways we have not been able to fully automate yet. New tests for new behavior need to be reviewed for correctness. Just because the existing suite passes doesn’t mean changes to the suite are correct. Everyone makes mistakes. Reviewing code does not stop all mistakes, but it is an accepted best practice that helps stop the majority.

Other factors

Like all things, there is probably a limit to the frequency at which software should be deployed in order to balance safety and productivity. The optimal deployment rate lies somewhere on the spectrum between pushing on every “save” and annual releases. The sections below discuss the other factors that need to be considered when determining how often a team should push software.

Feedback time

One important metric for the operation of any complex system is how long it takes for feedback to circulate. Long feedback loops are the enemy of any process engineer and can lead to bewildering, undesirable behavior. Software operation is no different. The time between when a software release is “cut” and when it is determined to be stable has huge implications for the ideal release rate. Where possible this feedback time should be cut down, but some factors may be outside the control of an engineering team.

Consider a piece of client-side enterprise software that is installed and upgraded on the customer’s schedule. No matter how much you would like your customers to install the new patch today, their own business requirements and processes determine their uptake rate. The same is true for libraries delivered to other software teams, mobile applications, firmware and device drivers.

Some software, on the other hand, can be updated and rolled out on demand at the discretion of the supporting engineering team. Examples include web applications, APIs and other centralized applications that users interact with remotely. In these cases, users interact with whatever version of the software is running, which is controlled by the engineering and operations teams. Users cannot select their version or roll back independently.

Releasing software faster than its users will interact with it leads to unnecessary complexity. When a user finally discovers an issue in their version of the software, the fix must be applied to that version and to every release since. A proliferation of supported, running versions leads to a proliferation of headaches, complicating support for both the engineering team and customers.
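
As a rough back-of-envelope sketch of that proliferation (the numbers below are invented for illustration, not taken from any real team), Little’s law says the number of versions simultaneously in the wild is roughly the release rate multiplied by how long a version stays in use:

    # Estimate how many versions are live at once. Both numbers are
    # illustrative assumptions, not measurements.
    releases_per_week = 1.0        # ship one release every week
    weeks_until_upgraded = 13.0    # a typical customer upgrades quarterly

    # Little's law: items in flight = arrival rate * time in the system.
    live_versions = releases_per_week * weeks_until_upgraded
    print(f"~{live_versions:.0f} versions to support at any moment")
    # Shipping monthly instead would cut this to roughly 3 versions.

Thirteen concurrent versions means thirteen places a backported fix might need to land; slowing the release cadence toward the uptake rate shrinks that surface dramatically.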

The time it takes to reach confidence in a given release of software should be considered in deciding the optimal release frequency for a team. In many cases this confidence can only be achieved after the software is in use by real customers. This means customer uptake rate is a key factor in determining release frequency.

Rollout time

How long a rollout actually takes to complete is also an important factor in deciding how often a team should deploy their code. A codebase that can build and test in seconds or minutes is a much different beast from one that takes hours or even days to build and validate. Deployments to a small set of servers behind a fast network are also much different from global deployments across thousands or hundreds of thousands of servers or devices.

The relationship between change frequency and rollout time leads to a few different regimes. When your release time is much, much less than the average time between commits, you can simply neglect this factor and assume you will never have overlapping releases. Each commit to master can be built, tested and released independently as it is merged.

When you have a large number of developers working on a system, the rate at which changes land may exceed the rate at which they can be rolled out. In this case, it can be desirable to batch changes into logical, releasable units. One method is rolling releases, where the next release starts with whatever has been merged since the last one completed. Another is daily or weekly releases, where a schedule determines the next push.
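
To make the two regimes concrete, here is a small, hedged calculation; the commit interval and rollout time are invented assumptions, not measurements:

    # Which release regime are we in? Both time constants below are
    # illustrative assumptions.
    mean_commit_interval_s = 5 * 60   # a merge lands every 5 minutes
    rollout_time_s = 60 * 60          # build + test + deploy takes an hour

    if rollout_time_s < mean_commit_interval_s:
        print("release per commit: each rollout finishes before the next merge")
    else:
        batch_size = rollout_time_s / mean_commit_interval_s
        print(f"rolling releases: each one batches ~{batch_size:.0f} commits")

With these numbers, each rolling release carries about a dozen commits; shrinking the rollout time is what moves a team back toward the release-per-commit regime.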

Risk

The 2019 DORA Report shows that there is no tradeoff between release speed and safety. High performing teams release faster and more reliably than lower performing teams. The obvious goal here is to strive to be as high-performing a team as possible. There is unfortunately a temptation to copy the external practices of such a team, rather than doing the work to understand what makes it high performing. I suspect release frequency is one of these areas.

Releasing faster, by itself, does not make a team high performing or add stability to a service. Teams need to be high performing to release frequently and safely. Automated testing, monitoring and observability all need to be in place before on-demand deployments become safe. When a team has reached this level of discipline and performance, delivery speed can naturally increase along with reliability.

This is not to say teams must slow down their releases in order to improve reliability. Small releases are easier to roll out safely, make issues easier to debug and are generally simpler to roll back than larger releases. On the other hand, releasing frequently without sufficient automation, monitoring and observability will only multiply the complexity of your production environment and lead to more issues.

Finally, some fields of software may have hard reliability requirements that necessitate slow, progressive rollouts across geographical and business segments. It may take hours or days to collect the necessary production metrics to verify a release across a small percentage of users. These types of releases are carefully observed by teams of operators and engineers, and designed to optimize for the highest possible levels of reliability, potentially at the cost of velocity.
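
As a sketch of what such a rollout loop might look like (the stage fractions, bake time, error budget and the error_rate() probe are all hypothetical placeholders for a real monitoring stack):

    import time

    STAGES = [0.01, 0.05, 0.25, 1.00]   # fraction of users on the new build
    BAKE_TIME_S = 6 * 60 * 60           # observe each stage for six hours
    ERROR_BUDGET = 0.001                # abort above 0.1% request errors

    def error_rate() -> float:
        """Placeholder for a query against production metrics."""
        return 0.0002

    def progressive_rollout() -> bool:
        for fraction in STAGES:
            print(f"shifting {fraction:.0%} of traffic to the new release")
            time.sleep(BAKE_TIME_S)     # let production metrics accumulate
            if error_rate() > ERROR_BUDGET:
                print("error budget exceeded: rolling back")
                return False            # reliability wins over velocity
        return True                     # verified at every stage

    if __name__ == "__main__":
        progressive_rollout()

The deliberate cost here is time: each stage buys evidence from real traffic before the blast radius is allowed to grow.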

Automation

The easier it is to push software, the more rapidly it can be done. If a release takes a large amount of manual work by multiple engineers, each deployment will consume valuable time that the engineering team could be spending on something else.

Fully automated “deploy-on-green” systems that require no manual work are the ideal, but may not be possible for every team. Here are a few things that might require manual work, even when the delivery process itself is automated:

- Writing release notes and changelogs
- Updating customer-facing documentation
- Communicating breaking changes and deprecations to users
- Compliance, security or legal sign-off

While all or many of these things could be done on a rolling basis in a continuous delivery setting, it may be more efficient to batch them up. In addition, users might find larger sets of changes easier to digest and understand: a small breaking change every day or week can feel worse psychologically than a large set of changes every month or quarter.

All of these other business factors must be considered and balanced when determining how often releases should go out.

Conclusion

Teams should generally release as quickly as they safely can, given their unique constraints. If the time it takes to deploy and verify a change exceeds the average time between incoming changes, changes should be batched into logical, predictable units such as hourly or weekly releases. Teams should always strive to increase the frequency at which they can release safely through automation, observability and monitoring. Finally, business requirements driven by end users may dictate or influence a team’s release policies.