- When working with metrics, teams must remember Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.”
- Low-performing teams take a hit on stability when they try to increase their deployment frequency simply by working harder.
- Chasing shorter lead times may lead teams to cut corners on testing, shipping buggy code or quickly producing brittle software.
- A high change failure rate can undermine the other metrics as measures of progress toward continuous delivery of value to your customers.
Since 2014, the DevOps Research and Assessment (DORA) team, now part of Google, has been at the forefront of DevOps research. This group combines behavioural science, seven years of research, and data from over 32,000 professionals to describe the most effective and efficient ways to deliver software. They have identified technology practices and capabilities shown to drive organisational outcomes, and published four key metrics that teams can use to measure their progress. These metrics are:
- Deployment Frequency
- Lead Time for Changes
- Mean Time to Recover
- Change Failure Rate
In today’s world of digital transformation, companies need to pivot and iterate quickly to meet changing customer requirements while delivering a reliable service to their customers. The DORA reports identify a range of important factors which companies must address if they want to achieve this agility, including cultural (autonomy, empowerment, feedback, learning), product (lean engineering, fire drills, lightweight approvals), technical (continuous delivery, cloud infrastructure, version control) and monitoring (observability, WIP limits) factors.
While an extensive list of “capabilities” is useful, software teams that want to continually improve their processes to meet customer demands need a tangible, objective yardstick to measure their progress. The DORA metrics are now the de facto measure of DevOps success. Thanks to books like Accelerate: The Science of Lean Software and DevOps (Forsgren et al., 2018) and Software Architecture Metrics (Ciceri et al., 2022), there’s a consensus that they are a great way for most software teams to assess performance.
But when handling metrics, teams must always be careful to remember Goodhart’s law: “When a measure becomes a target, it ceases to be a good measure.” The danger is that metrics become an end in themselves rather than a means to an end.
Let’s explore what this might look like in terms of the DORA metrics — and how you can avoid pulling the wool over your own eyes.
Deployment Frequency
For the primary application or service you work on, how often does your organisation deploy code to production or release it to end users?
At the heart of DevOps is an ambition that teams never put off a release simply because they want to avoid the process. Once the pain points are addressed, deployments cease to be a big deal, and your team can release more often. As a result, value is delivered sooner and more incrementally. This allows for continuous feedback from end users, who then shape the direction of ongoing development work.
If your team can currently release only at the end of a biweekly sprint, or even less often, the deployment frequency metric should track your progress toward deploying once a week. From there, you progress to multiple times a week, then daily, and finally, for elite performers, multiple times a day. That progression is good, but it also matters how the improvements are achieved.
What does this metric really measure? Firstly, whether the deployment process is continuously improving, with obstacles being identified and removed. Secondly, whether your team is successfully breaking up projects into changes that can be delivered incrementally.
As you celebrate the latest increase in deployment frequency, ask yourself: are our users seeing the benefit of more frequent deployments? Studies have shown that low-performing teams take a big hit on stability when they try to increase their deployment frequency simply by working harder (Forsgren, Humble, and Kim, 2018). Have we only managed to shift the dial on this metric by cracking the whip to increase our tempo?
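You can ask that question of your own data by putting deployment frequency and change failure rate side by side for each week. That way, a jump in one can be read against the other. What follows is a minimal Python sketch under stated assumptions. It uses a hypothetical record format: a production date plus a flag for whether the deployment needed remediation. It is an illustration, not a DORA-prescribed calculation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date      # date the deployment reached production
    failed: bool   # hypothetical flag: True if it degraded service and needed remediation


def weekly_summary(deployments: list[Deployment]) -> dict[tuple[int, int], tuple[int, float]]:
    """Map each ISO (year, week) to (number of deployments, change failure rate)."""
    by_week: dict[tuple[int, int], list[Deployment]] = defaultdict(list)
    for d in deployments:
        year, week, _ = d.day.isocalendar()
        by_week[(year, week)].append(d)
    return {
        week: (len(deps), sum(d.failed for d in deps) / len(deps))
        for week, deps in sorted(by_week.items())
    }


# Hypothetical history. Week 1: two careful releases. Week 2: twice the tempo, half of them failing.
history = [
    Deployment(date(2024, 1, 2), False),
    Deployment(date(2024, 1, 4), False),
    Deployment(date(2024, 1, 8), False),
    Deployment(date(2024, 1, 9), True),
    Deployment(date(2024, 1, 10), False),
    Deployment(date(2024, 1, 11), True),
]
summary = weekly_summary(history)
```

In this sample, week 2 doubles the deployment count, but its change failure rate rises from 0.0 to 0.5. That trade-off is worth investigating before celebrating the higher tempo.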
Lead Time for Changes
For the primary application or service you work on, what is your lead time for changes (that is, how long does it take to go from code committed to code successfully running in production)?
There are a few ways of measuring lead times, which may be equivalent to or distinct from “cycle times,” depending on who you ask. The DORA definition is the time it takes for a change to go from code committed to code successfully running in production.
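Under the DORA definition in the survey question above, lead time is a simple subtraction per change. This minimal Python sketch assumes you can pair each change’s commit timestamp with the timestamp at which it started running in production; that pairing is hypothetical here. It summarises with the median, an illustrative choice rather than something DORA mandates, so that a handful of long-lived changes doesn’t dominate the figure.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time(committed: datetime, in_production: datetime) -> timedelta:
    """Lead time for one change: commit to successfully running in production."""
    return in_production - committed


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median lead time over hypothetical (committed, in_production) pairs."""
    return median(lead_time(c, p) for c, p in changes)


changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 11, 0)),   # 2 hours
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 5, 12, 0)),  # 26 hours
    (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 13, 0)),   # 5 hours
]
typical = median_lead_time(changes)  # timedelta(hours=5)
```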
By reducing lead times, your development team will improve business agility. End users don’t wait long to see the requested features being delivered. The wider business can be more responsive to challenges and opportunities. All this helps improve engagement and interplay between your development team, the business, and end users.
Of course, reduced lead times go hand in hand with deployment frequency. More frequent releases make it possible to accelerate project delivery. Importantly, they ensure completed work doesn’t sit around waiting to be released.
How can this metric drive the wrong behaviour? Your engineering team may work towards the metric rather than the value it is supposed to measure. If so, they may take shortcuts with testing and release buggy code, or code themselves into a corner with fast but brittle approaches to writing software.
These behaviours produce a short-term appearance of progress, but a long-term hit to productivity. Reductions in lead times should come from a better approach to product management and improved deployment frequency, not a more lax approach to release quality where existing checks are skipped and process improvements are avoided.
Mean Time to Recover
For the primary application or service you work on, how long does it generally take to restore service when a service incident or a defect that impacts users occurs (for example, unplanned outage, service impairment)?
Part of the beauty of DevOps is that it doesn’t pit velocity and resilience against each other but makes them mutually beneficial. For example, frequent small releases with incremental improvements can more easily be rolled back if there’s an error. Or, if a bug is easy to identify and fix, your team can roll forward and remediate it quickly.
Yet again, we can see that the DORA metrics are complementary; success in one area typically correlates with success across others. However, chasing this metric can become an anti-pattern, because it can conceal other problems. For example, if your strategy for recovering a service is always to roll back, you’ll be taking the value of your latest release away from all your users, even those who don’t encounter the new-found issue. Your mean time to recover will be low, but your lead time figure may now be skewed. It counts rolled-back changes as delivered, giving you a false sense of agility. Perhaps looking at what it would take to always be able to roll forward is the next step on your journey to refine your software delivery process.
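To make the rollback skew concrete, you could compare two lead times. A naive lead time runs from commit to first deployment. An effective lead time runs from commit to the deployment that actually stayed in production. The sketch below assumes hypothetical fields recording when a change first shipped and when it last shipped and stuck. DORA’s definition doesn’t specify how rollbacks should be treated, so this is one interpretation, not a standard calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Change:
    committed: datetime
    first_deployed: datetime           # first time the change reached production
    stable_since: Optional[datetime]   # hypothetical: last deploy that stuck; None if still rolled back


def naive_lead_time(changes: list[Change]) -> timedelta:
    """Median commit-to-first-deployment time, ignoring any rollbacks."""
    return median(c.first_deployed - c.committed for c in changes)


def effective_lead_time(changes: list[Change]) -> tuple[timedelta, int]:
    """Median commit-to-stable-deployment time, plus the count of changes not yet delivered.

    Raises statistics.StatisticsError if no change has been delivered yet.
    """
    delivered = [c.stable_since - c.committed for c in changes if c.stable_since is not None]
    return median(delivered), len(changes) - len(delivered)


day = datetime(2024, 5, 6, 9, 0)
changes = [
    Change(day, day + timedelta(hours=2), day + timedelta(hours=2)),   # shipped and stayed
    Change(day, day + timedelta(hours=1), day + timedelta(hours=25)),  # rolled back, re-shipped next day
    Change(day, day + timedelta(hours=3), None),                       # rolled back, not yet re-shipped
]
naive = naive_lead_time(changes)
effective, undelivered = effective_lead_time(changes)
```

Here the naive median is 2 hours. Once the rollbacks are accounted for, the median becomes 13.5 hours, and one change hasn’t reached users at all.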
It’s possible to see improvements in your mean time to recover (MTTR) that are driven entirely by increased deployment frequency and reduced lead times. Alternatively, your MTTR may look low because a lack of monitoring means issues are detected late, or not at all. Would improving your monitoring initially cause this figure to increase, while benefiting your fault-finding and resolution processes? Measuring MTTR can be a great proxy for how well your team monitors for issues and then prioritises solving them.
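You can make the monitoring blind spot visible by measuring recovery from two starting points. One is when monitoring detected the incident; the other is when users were first affected. This Python sketch assumes hypothetical incident records with all three timestamps. In practice, the impact start is often only known after a post-incident review. A large gap between the two averages hints that detection, not remediation, is where time is being lost.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    impact_start: datetime  # when users were first affected
    detected: datetime      # when monitoring (or a user report) surfaced the issue
    restored: datetime      # when service was restored


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(durations, timedelta()) / len(durations)


def mttr_from_detection(incidents: list[Incident]) -> timedelta:
    return mean_duration([i.restored - i.detected for i in incidents])


def mttr_from_impact(incidents: list[Incident]) -> timedelta:
    return mean_duration([i.restored - i.impact_start for i in incidents])


incidents = [
    Incident(datetime(2024, 6, 3, 10, 0), datetime(2024, 6, 3, 10, 5), datetime(2024, 6, 3, 10, 35)),
    Incident(datetime(2024, 6, 4, 14, 0), datetime(2024, 6, 4, 16, 0), datetime(2024, 6, 4, 16, 20)),
]
from_detection = mttr_from_detection(incidents)  # 25 minutes
from_impact = mttr_from_impact(incidents)        # 87.5 minutes
```

The second incident looks quick to fix (20 minutes) but went unnoticed for two hours. Only the impact-based figure reveals that.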
With continuous monitoring and increasingly relevant alerting, you should be able to discover problems sooner. In addition, there’s the question of culture and process: does your team keep up-to-date runbooks? Do they rehearse fire drills? Intentional practice and sufficient documentation are key to avoiding a false sense of security when the time to recover is improving due to other DevOps improvements.
Change Failure Rate
For the primary application or service you work on, what percentage of changes to production or releases to users result in degraded service (for example, lead to service impairment or service outage) and subsequently require remediation (for example, require a hotfix, rollback, fix forward, patch)?
Change failure rate measures the percentage of releases that cause a failure, bug, or error: this metric tracks release quality and highlights where testing processes are falling short. A sophisticated release process should afford plenty of opportunities for various tests, reducing the likelihood of releasing a bug or breaking change.
Change failure rate acts as a good control on the other DORA metrics, which tend to push teams to accelerate delivery without any built-in check on release quality. If your data for the other three metrics shows a positive trend but the change failure rate is soaring, you have the balance wrong. With a high change failure rate, those other metrics probably aren’t giving you an accurate assessment of progress toward your real goal: continuous delivery of value to your customers.
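That balance check can be expressed as a simple rule comparing two reporting periods. This is a minimal sketch under stated assumptions. The field names, the summary statistics noted in the comments and the optional tolerance are all hypothetical. A real team would likely want more than two data points before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class PeriodMetrics:
    deploys_per_week: float
    lead_time_hours: float        # e.g. median lead time for changes
    time_to_recover_hours: float  # e.g. mean time to recover
    change_failure_rate: float    # fraction of deployments needing remediation, 0.0 to 1.0


def balance_warning(prev: PeriodMetrics, curr: PeriodMetrics, tolerance: float = 0.0) -> bool:
    """True if the three speed/recovery metrics all improved while change failure rate rose."""
    others_improved = (
        curr.deploys_per_week > prev.deploys_per_week
        and curr.lead_time_hours < prev.lead_time_hours
        and curr.time_to_recover_hours < prev.time_to_recover_hours
    )
    failures_rising = curr.change_failure_rate - prev.change_failure_rate > tolerance
    return others_improved and failures_rising


# Hypothetical quarterly figures.
last_quarter = PeriodMetrics(5, 48, 4, 0.10)
this_quarter = PeriodMetrics(10, 24, 3, 0.30)
```

For these sample quarters the rule fires. Deployments doubled and lead time halved, but the change failure rate tripled.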
As with mean time to recover, change failure rate can, and indeed should, be positively impacted by deployment frequency. If you make the same number of errors but spread the work across more deployments, the percentage of deployments with errors falls. That’s good, but it can give a misleading sense of improvement from a partial picture: the number of errors hasn’t actually decreased. Some teams might even be tempted to lower their change failure rate artificially by these means!
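The arithmetic behind this effect is easy to demonstrate. In this minimal sketch (the ReleaseWindow type and the numbers are hypothetical), the same three failures are spread first over ten deployments and then over thirty. The rate falls, but the failure count, which is what your users actually experience, stays the same.

```python
from dataclasses import dataclass


@dataclass
class ReleaseWindow:
    deployments: int
    failed_deployments: int

    @property
    def change_failure_rate(self) -> float:
        """Fraction of deployments that degraded service and needed remediation."""
        return self.failed_deployments / self.deployments


before = ReleaseWindow(deployments=10, failed_deployments=3)
after = ReleaseWindow(deployments=30, failed_deployments=3)

for label, window in (("before", before), ("after", after)):
    print(f"{label}: {window.failed_deployments} failures, rate {window.change_failure_rate:.0%}")
# before: 3 failures, rate 30%
# after: 3 failures, rate 10%
```

Reporting the absolute failure count next to the rate, as above, is one simple guard against reading this as a quality improvement.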
Change failure rate should assess whether your team is continuously improving regarding testing. For example, are you managing to ‘shift left’ and find errors earlier in the release cycle? Are your testing environments close replicas of production to effectively weed out edge cases? It’s always important to ask why your change failure rate is reducing and consider what further improvements can be made.
The Big Picture Benefits of DevOps
The DORA metrics are rightly recognised as an industry standard for measuring DevOps maturity. However, Goodhart’s law warns against treating them as targets rather than measures. If you do, you may end up with a misleading sense of progress, an imbalance between goals and culture, and releases that fall short of your team’s true potential.
It’s difficult to talk about DORA metrics without having the notion of targets in your head. That bias can slowly creep in, and before long you’re unknowingly talking about them as absolute targets. To avoid this slippery slope, focus on the trends in your metrics. When you tweak your team’s process or practices, relative changes in your metrics over time give you much more useful feedback than a fixed, point-in-time target ever will. Let them be a measure of your progress.
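Tracking trends rather than targets can be as simple as reporting period-over-period relative change. The sketch below is illustrative only, and the monthly lead-time figures are hypothetical. The lower_is_better flag is there because for lead time, recovery time and change failure rate, a falling number is the improvement.

```python
def relative_changes(values: list[float], lower_is_better: bool = False) -> list[float]:
    """Period-over-period relative improvement; positive always means 'better'.

    0.25 means the metric moved 25% in the right direction versus the previous period.
    Raises ZeroDivisionError if a previous period's value is zero.
    """
    changes = []
    for prev, curr in zip(values, values[1:]):
        change = (curr - prev) / prev
        changes.append(-change if lower_is_better else change)
    return changes


# Hypothetical median lead time for changes, in hours, over four months.
monthly_lead_time = [40.0, 32.0, 30.0, 33.0]
trend = relative_changes(monthly_lead_time, lower_is_better=True)
# approximately [0.2, 0.0625, -0.1]: two improving months, then a regression worth discussing
```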
You may find yourself in a team where targets are stopping you from changing your process, driving unhelpful behaviours, or so unrealistic that they’re demoralising your team. If so, ask yourself what context is missing that makes them unhelpful. Go back and question what problem you’re trying to solve: are your targets driving behaviours that just treat symptoms, rather than identifying an underlying cause? Have you fallen foul of setting targets too soon? Remember to measure first, and try not to guess.
When used properly, the DORA metrics are a brilliant way to demonstrate your team’s progress, and they provide evidence you can use to explain the business value of DevOps. Together, these metrics point to the big-picture benefits of DevOps: continuous improvements in the velocity, agility, and resilience of a development and release process that brings together developers, business stakeholders, and end users. Observing and tracking trends in the DORA metrics will help your teams improve and drive more value back to your customers.