Julian Canlas

Founder of the SEO content marketing agency, Embarque.
julian@embarque.io @jic94

Change Failure Rate Best Practices for Agile Teams

Do you know how well your development team is performing? Do all deployments go off without a hitch? If not, it’s time to incorporate performance metrics such as the Change Failure Rate or CFR into your toolkit. CFR is a metric for measuring how often deployments or software updates fail.

If your team is deploying frequently, a high CFR can spell trouble. When failures do happen, it’s important to keep users in the loop with beautiful status pages by Instatus.

In this article, we define CFR, show how to calculate and track it, compare your team’s performance against a baseline, and explain how to lower CFR without impacting deployment schedules.

What is Change Failure Rate (CFR)?

Have you ever felt the need for speed? Sometimes success seems to hinge on how quickly you can push out new updates, as if speed were the only way to become a great development team. But that’s not always true. Quality matters as much as, if not more than, deployment speed.

CFR measures the percentage of changes that cause service interruptions, outages, patches, or rollbacks. These incidents require developers’ intervention and pull them away from the next code release. Tracking CFR is one of the most direct ways to learn about deployment quality.

Writing code isn’t easy, and creating code without bugs is even more challenging. If your CFR does not meet organizational targets, your code review and deployment process needs attention. Conversely, if the CFR is low, you can increase deployment frequency.

How to Measure Change Failure Rate?

Calculating CFR is easy once you have proper monitoring systems. Any deployment that requires remediation or results in degraded service for users counts as a change failure. To calculate CFR, divide the number of failures by the total number of deployments over a period of time.

Here’s the formula to calculate CFR:

(# of change failures / total # of deployments) × 100 = change failure rate %
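As a minimal sketch, the formula above can be written as a small Python function. The function name and the input checks are assumptions made for illustration; they are not part of any particular monitoring tool.

```python
def change_failure_rate(failures: int, deployments: int) -> float:
    """Return CFR as the percentage of deployments that needed remediation."""
    if deployments <= 0:
        raise ValueError("deployments must be a positive number")
    if not 0 <= failures <= deployments:
        raise ValueError("failures must be between 0 and deployments")
    # Multiply first so that whole-number percentages stay exact.
    return 100 * failures / deployments


# Hypothetical month: 12 of 80 deployments needed a rollback or hotfix.
print(f"{change_failure_rate(12, 80):.1f}% CFR")  # prints "15.0% CFR"
```

The checks reject a period with no deployments, where the rate is undefined, rather than reporting a misleading 0%.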

Rating your Team’s Performance

Once you’ve calculated CFR, compare it against the baseline to better understand your team’s performance:

  • Low Performance – 45% to 60% CFR
  • Medium Performance – 15% to 45% CFR
  • High Performance – 0% to 15% CFR
  • Elite Performance – 0% to 15% CFR

If your team’s CFR is between 15% and 60%, it’s time to improve change quality.
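The bands listed above can be turned into a simple lookup. This is only a sketch under two assumptions that the list does not settle: a value that sits exactly on a boundary (15% or 45%) is placed in the better band, and anything above 60% is reported as outside the listed bands. Because High and Elite share the same CFR range, the function returns them as one label.

```python
def performance_band(cfr_percent: float) -> str:
    """Map a CFR percentage to the performance bands listed in the article."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("cfr_percent must be between 0 and 100")
    if cfr_percent <= 15:
        return "High / Elite"  # both bands are listed as 0% to 15%
    if cfr_percent <= 45:
        return "Medium"
    if cfr_percent <= 60:
        return "Low"
    return "Outside the listed bands"  # assumption: the list stops at 60%


for value in (14, 33, 50):
    print(f"{value}% CFR -> {performance_band(value)}")
```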

Tracking CFR Over Time

CFR can be tracked over time by adjusting the time period. For example, to track CFR over six months and see if the team is improving or regressing, you can break the metric into two-month increments:

Jan - Mar: 33 failures / 100 deployments = 33% CFR

Mar - May: 22 failures / 100 deployments = 22% CFR

May - July: 14 failures / 100 deployments = 14% CFR

What does this result tell us? This team started in the medium-performance band with a CFR of 33%, meaning a third of deployments required remediation. Over six months, however, there was a dramatic improvement: the team reduced CFR to 14%, which falls in the elite performance category. Most deployments became high quality and no longer required rollbacks or hotfixes.

The overall six-month CFR for this team was:

Jan - July: 69 failures / 300 deployments = 23% CFR

If this team sustains its most recent rate of 14%, its overall CFR for the next six months will come in below 15%.
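The arithmetic in this example can be reproduced with a few lines of Python. The figures are the ones from the example above; the variable names are illustrative only.

```python
# (label, failures, deployments) for each two-month period in the example.
periods = [
    ("Jan - Mar", 33, 100),
    ("Mar - May", 22, 100),
    ("May - July", 14, 100),
]


def cfr(failures: int, deployments: int) -> float:
    """Change failure rate as a percentage."""
    return 100 * failures / deployments


per_period = {label: cfr(f, d) for label, f, d in periods}

# Compute the overall rate from the totals, not by averaging percentages.
total_failures = sum(f for _, f, _ in periods)
total_deployments = sum(d for _, _, d in periods)
overall = cfr(total_failures, total_deployments)

for label, value in per_period.items():
    print(f"{label}: {value:.0f}% CFR")
print(f"Jan - July: {overall:.0f}% CFR")
```

Averaging the three percentages happens to give the same 23% here only because every period has 100 deployments. When deployment counts differ between periods, dividing total failures by total deployments is the safer calculation.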

Understanding the Big Picture

CFR is one of four performance metrics, known as the DORA (DevOps Research and Assessment) metrics, that together quantify how well your team is performing:

  • Deployment Frequency
  • Mean Time to Recovery (MTTR)
  • Lead Time for Changes
  • Change Failure Rate

The four DORA metrics are interconnected, and improving performance on one metric can degrade results on another, especially if the relationship isn’t understood. MTTR and CFR reveal information about the stability of your software and the strength of your incident response. Deployment Frequency and Lead Time for Changes, meanwhile, indicate how efficient your team is.

CFR is closely linked to Deployment Frequency. A balance between these two metrics is vital: if your team over-emphasizes Deployment Frequency, developers may feel rushed to finish updates or push deployments. In such a scenario, code review and testing might also be rushed or skipped altogether. The team may then rank as high-performing on Deployment Frequency while its CFR climbs.

To get the full picture, you want to monitor all four DORA metrics together to understand your team’s performance over time. When your team decides to improve one DORA metric, take a calculated approach.
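One way to watch the four metrics side by side is to derive them from the same deployment log. The sketch below is illustrative only: the `Deployment` record, its field names, and the simplified definitions (mean lead time from commit to deploy, mean recovery time from deploy to restore) are assumptions, not a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:  # hypothetical record; adapt the fields to your own tooling
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it need remediation?
    restored_at: Optional[datetime] = None  # when service was restored


def mean_hours(deltas: list) -> float:
    """Average a list of timedeltas in hours; 0.0 if the list is empty."""
    if not deltas:
        return 0.0
    return sum(deltas, timedelta()).total_seconds() / 3600 / len(deltas)


def dora_summary(deployments: list, window_days: int) -> dict:
    """Summarize the four DORA metrics for one time window."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    failures = [d for d in deployments if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "lead_time_hours": mean_hours(lead_times),
        "change_failure_rate_pct": 100 * len(failures) / len(deployments),
        "mttr_hours": mean_hours(recoveries),
    }


# Made-up sample data: four deployments over four days, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=8), failed=True,
               restored_at=start + timedelta(hours=10)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=6)),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
]
summary = dora_summary(log, window_days=4)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Keeping all four numbers in one summary makes trade-offs visible: a jump in deployments per day that arrives together with a rising change failure rate is easier to spot than when each metric lives in a separate report.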

Improving Your Change Failure Rate

If you want to improve your CFR, you must determine what’s causing failures. Most deployments fail for one of three reasons:

  1. Deployment Errors
  2. Poor Testing
  3. Code Quality

Beyond CFR, it’s essential to find the root cause of a failure. Once you’ve collected enough data and categorized failures as either a deployment error, poor testing, or code quality, you can begin to address the problem. Remember to have a system, like Instatus, to update your user base on any service disruptions. Status pages maintain user confidence and trust while your team resolves these issues.
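Once failures are labeled with a root cause, a quick tally shows where to focus first. The sketch below uses Python's `collections.Counter`; the incident list is made-up sample data, and the labels simply mirror the three categories above.

```python
from collections import Counter

# Hypothetical incident log: one root-cause label per failed deployment.
failure_causes = [
    "deployment error", "poor testing", "deployment error",
    "code quality", "deployment error", "poor testing",
]

counts = Counter(failure_causes)
total = sum(counts.values())

# most_common() orders the causes from most to least frequent.
breakdown = [(cause, n, 100 * n / total) for cause, n in counts.most_common()]

for cause, n, share in breakdown:
    print(f"{cause}: {n} failures ({share:.0f}%)")
```

In this sample, deployment errors account for half of all failures, so deployment automation would be the first area to address.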

1. Deployment Errors

Most failed deployments are the result of human error. The best way to reduce such failures is through deployment automation. Incorporate deployment automation tools like Jenkins or ElectricFlow, as they help meet the demands of continuous integration and deployment by removing manual steps.

2. Poor Testing

You should always be automating testing. Automated testing provides enormous returns for your team as it removes the need for slow and expensive manual testing. This means developers have more time to write high-quality code and work on other creative efforts. Storybook, Jest, and Postman are some great options to get started with automated testing.

3. Code Quality

To improve code quality, ensure you have a code review process in place. Junior developers should be mentored and have all their code reviewed. This is not a punishment; your team is only as strong as its weakest link, so make it a positive experience. Senior developers should systematically review code and ask lots of questions.

Final Thoughts on Change Failure Rate

Your customers expect a high-quality product, and you don’t want to disappoint. A great way to stay on top of everything is to monitor your team’s deployments through CFR.

Ideally, you want zero rollbacks or service interruptions, but the unexpected can happen no matter how good your code review, testing, and deployment automation are. That’s why it’s important to keep users informed with status pages from Instatus, while aiming for a CFR of 15% or less. Find the balance between CFR and Deployment Frequency that works for you, and success will follow.
