Julian Canlas

Founder of Embarque. julian@embarque.io

Change Failure Rate Best Practices for Agile Teams

Do you know how well your development team is performing? Do all deployments go off without a hitch? If not, it’s time to incorporate performance metrics such as the Change Failure Rate or CFR into your toolkit. CFR is a metric for measuring how often deployments or software updates fail.

If your team is deploying frequently, a high CFR can spell trouble. Therefore, it’s important to keep users in the loop with beautiful status pages by Instatus.

In this article, we define CFR, show how to calculate and track it, explain how to compare your team’s performance against others, and cover how to lower CFR without impacting deployment schedules.

What is Change Failure Rate (CFR)?

Have you ever felt the need for speed? Sometimes success seems to hinge on how quickly you can push out new updates, as if speed were the only way to become a great development team. But that’s not always true. Quality matters as much as deployment speed, if not more.

CFR measures the percentage of changes that cause service interruptions, outages, patches, or rollbacks. These incidents require developers’ intervention and pull them away from the next code release. Tracking CFR is a direct way to learn about deployment quality.

Writing code isn’t easy, and creating code without bugs is even more challenging. If your CFR does not meet organizational targets, your code review and deployment process needs attention. Conversely, if the CFR is low, you can increase deployment frequency.

How to Measure Change Failure Rate?

Calculating CFR is easy once you have proper monitoring systems. Any deployment that requires remediation or results in degraded service for users counts as a change failure. To calculate CFR, divide the number of failures by the total number of deployments over a period of time.

Here’s the formula to calculate CFR:

(# of change failures / total # of deployments) × 100 = change failure rate %
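
As a minimal sketch, the formula can be written as a small Python function. The function name `change_failure_rate` and the sample numbers (5 failures out of 40 deployments) are illustrative assumptions, not taken from any particular tool or team.

```python
def change_failure_rate(failures: int, deployments: int) -> float:
    """Return CFR as the percentage of deployments that needed remediation."""
    if deployments <= 0:
        raise ValueError("deployments must be a positive number")
    if not 0 <= failures <= deployments:
        raise ValueError("failures must be between 0 and deployments")
    return 100 * failures / deployments


# Made-up example: 5 failed changes out of 40 deployments in the period
print(f"{change_failure_rate(5, 40):.1f}% CFR")
```

The function rejects a period with zero deployments instead of returning 0%, since a team that shipped nothing has no failure rate to report.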

Rating your Team’s Performance

Once you’ve calculated CFR, compare it against the baseline to better understand your team’s performance:

  • Low Performance – 45% to 60% CFR
  • Medium Performance – 15% to 45% CFR
  • High Performance – 0% to 15% CFR
  • Elite Performance – 0% to 15% CFR

If your team’s CFR is between 15% and 60%, it’s time to improve change quality.

Tracking CFR Over Time

CFR can be tracked over time by adjusting the time period. For example, to track CFR over six months and see if the team is improving or regressing, you can break the metric into two-month increments:

Jan - Feb: 33 failures / 100 deployments = 33% CFR

Mar - Apr: 22 failures / 100 deployments = 22% CFR

May - Jun: 14 failures / 100 deployments = 14% CFR

What does this result tell us? This team started in the medium-performance band with a CFR of 33%, meaning a third of deployments required remediation. Over six months, there was a dramatic improvement, with the team reducing CFR to 14%, inside the 0% to 15% band shared by high and elite performers. Most deployments became high quality and no longer required rollbacks or hotfixes.

The overall six-month CFR for this team was:

Jan - Jun: 69 failures / 300 deployments = 23% CFR

If this team sustains its most recent rate of 14%, its overall six-month CFR will fall below 15% within the next six months.
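
The same arithmetic can be sketched in Python using the three periods from the example above. The period labels and the helper name `cfr` are placeholders chosen for illustration.

```python
# (period label, failures, deployments), using the figures from the example
periods = [
    ("Period 1", 33, 100),
    ("Period 2", 22, 100),
    ("Period 3", 14, 100),
]


def cfr(failures: int, deployments: int) -> float:
    """CFR as a percentage."""
    return 100 * failures / deployments


per_period = {label: cfr(f, d) for label, f, d in periods}

# The overall rate is computed from the totals, not by averaging percentages
total_failures = sum(f for _, f, _ in periods)
total_deployments = sum(d for _, _, d in periods)
overall = cfr(total_failures, total_deployments)

for label, value in per_period.items():
    print(f"{label}: {value:.0f}% CFR")
print(f"Overall: {overall:.0f}% CFR")
```

Summing failures and deployments before dividing keeps the overall figure correct when periods have different deployment counts. Averaging the three percentages gives the same answer here only because every period has exactly 100 deployments.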

Understanding the Big Picture

It’s also possible to combine CFR with three other performance metrics to quantify how well your team is performing. Together, the four are known as the DORA (DevOps Research and Assessment) metrics:

  • Deployment Frequency
  • Mean Time to Recovery (MTTR)
  • Lead Time for Changes
  • Change Failure Rate

The four DORA metrics are interconnected and improving the performance for one metric can degrade results for another, especially if the relationship isn’t understood. MTTR and CFR reveal information about the stability of your software and the strength of your incident response. At the same time, Deployment Frequency and Lead Time for Changes indicate how efficient your team is.

CFR is closely linked to Deployment Frequency. A balance between these two metrics is vital, but if your team over-emphasizes Deployment Frequency, developers may feel rushed to finish updates or push deployments. In such a scenario, code review and testing might also be rushed or skipped altogether. This can make a team look high-performing on Deployment Frequency while its CFR climbs.

To get the full picture, you want to monitor all four DORA metrics together to understand your team’s performance over time. When your team decides to improve one DORA metric, take a calculated approach.

Improving Your Change Failure Rate

If you want to improve your CFR, you must determine what’s causing failures. Most deployments fail for one of three reasons:

  1. Deployment Errors
  2. Poor Testing
  3. Code Quality

Beyond CFR, it’s essential to find the root cause of a failure. Once you’ve collected enough data and categorized failures as either a deployment error, poor testing, or code quality, you can begin to address the problem. Remember to have a system, like Instatus, to update your user base on any service disruptions. Status pages maintain user confidence and trust while your team resolves these issues.
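
Once each failure has a root-cause label, tallying them shows where to focus first. The sketch below assumes a hypothetical list of labels, one per failed deployment; the data is invented for illustration.

```python
from collections import Counter

# Hypothetical failure log: one root-cause label per failed deployment
failure_causes = [
    "deployment error",
    "poor testing",
    "deployment error",
    "code quality",
    "deployment error",
]

counts = Counter(failure_causes)

# List causes from most to least frequent with their share of all failures
for cause, count in counts.most_common():
    share = 100 * count / len(failure_causes)
    print(f"{cause}: {count} failures ({share:.0f}%)")
```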

1. Deployment Errors

Most failed deployments are the result of human error. The best way to reduce such failures is through deployment automation. Tools like Jenkins or ElectricFlow help meet the demands of continuous integration and deployment by removing manual steps.

2. Poor Testing

You should always be automating testing. Automated testing provides enormous returns for your team as it removes the need for slow and expensive manual testing. This means developers have more time to write high-quality code and work on other creative efforts. Storybook, Jest, and Postman are some great options to get started with automated testing.

3. Code Quality

To improve code quality, ensure you have a code review process in place. Junior developers should be mentored and have all their written code reviewed. This is not a punishment. Your team is only as strong as its weakest link, so make review a positive experience. Senior developers should systematically review code and ask lots of questions.

Change Failure Rate Best Practices

There are existing guidelines and strategies that organizations can adopt to minimize the risk of failure during the implementation of changes. By following these best practices, organizations can ensure smoother transitions, reduced downtime, and improved overall efficiency in their operations.

1. Proper Data Collection

It's vital to collect and tag data thoroughly so that CFR tracking integrates effectively with your IT processes. Define the precise scope of the changes you wish to implement and outline the specific areas that need attention.

Additionally, establish the criteria for measuring failure and success for aspects of your CFR that may require further optimization.

2. Avoid “fix-only” deployments

"Fix-only deployments" in the context of Change Failure Rate (CFR) are changes implemented solely to address issues or failures that have already occurred, without proactively identifying and resolving potential underlying problems.

Excluding "fix-only" deployments from the calculation of the CFR will provide a clearer picture of the stability of your IT system, free from the influence of remediation efforts. If you cannot avoid shipping "fix-only" deployments, count them separately as remediation work and keep them out of the CFR calculation.
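
One way to keep remediation work out of the calculation is to tag each deployment and filter before dividing. This is a sketch under assumptions: the `Deployment` record and its `fix_only` and `caused_failure` fields are hypothetical, and how you set them depends on your own tooling.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    id: str
    fix_only: bool        # True if it only remediates an earlier failure
    caused_failure: bool  # True if it led to an incident, rollback, or hotfix


def cfr_excluding_fixes(deployments: list[Deployment]) -> float:
    """CFR as a percentage, ignoring fix-only deployments entirely."""
    changes = [d for d in deployments if not d.fix_only]
    if not changes:
        raise ValueError("no non-fix deployments in this period")
    failures = sum(d.caused_failure for d in changes)
    return 100 * failures / len(changes)


# Invented history: d2 is a hotfix for the failure caused by d1
history = [
    Deployment("d1", fix_only=False, caused_failure=True),
    Deployment("d2", fix_only=True, caused_failure=False),
    Deployment("d3", fix_only=False, caused_failure=False),
    Deployment("d4", fix_only=False, caused_failure=False),
    Deployment("d5", fix_only=False, caused_failure=False),
]
print(f"{cfr_excluding_fixes(history):.0f}% CFR")
```

Here the hotfix is dropped from both the numerator and the denominator, so the rate is one failure out of four changes rather than one out of five.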

3. Measure change failure, not deployment failure

Deployment failure happens when workflows, code, and updates fail to be successfully deployed into the production environment. The deployment failure rate indicates the quality of your Continuous Integration/Continuous Deployment (CI/CD) pipeline. It shows how well your code gets from development to production.

CFR, by contrast, is a broader concept that includes not only deployment failures but also any negative impact that changes have on the production environment. It takes into account both unsuccessful deployments and any incidents that arise from those changes.

To correctly calculate the change failure rate, you need to connect incident data with deployment data to better understand the impact your changes have on the stability of your software in the production environment.
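
Connecting the two data sets can be as simple as matching incidents to the deployment that caused them. The sketch below assumes each incident record carries a `deployment_id` field, which is a hypothetical schema; real incident trackers may need a time-window match instead.

```python
# Hypothetical records: deployment ids, plus incidents that name the
# deployment believed to have caused them
deployments = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
incidents = [
    {"id": "inc-1", "deployment_id": "d2"},
    {"id": "inc-2", "deployment_id": "d2"},  # second incident, same change
    {"id": "inc-3", "deployment_id": "d7"},
]

# A deployment counts as one failure no matter how many incidents it caused
failed = {i["deployment_id"] for i in incidents} & set(deployments)
cfr = 100 * len(failed) / len(deployments)

print(f"{len(failed)} of {len(deployments)} deployments failed: {cfr:.0f}% CFR")
```

Using a set of deployment ids avoids double counting when one bad change triggers several incidents.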

4. Exclude External Incidents

Excluding external incidents provides a more accurate and meaningful measure of the success and stability of code changes. The change failure rate is a metric used to track how often code changes result in failures or issues in the production environment.

Including external incidents, such as those caused by third-party services or APIs, in the calculation can skew the results and give an inaccurate representation of the actual code quality.

By excluding external incidents from the change failure rate calculation, the focus remains on the code changes directly developed and deployed by the team.
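
Filtering out incidents that did not come from your own changes might look like the sketch below. The `source` tag and its values are assumptions; in practice someone has to set that label during incident triage.

```python
# Hypothetical incident records with a "source" tag set during triage
all_incidents = [
    {"id": "inc-1", "deployment_id": "d2", "source": "internal"},
    {"id": "inc-2", "deployment_id": None, "source": "external"},  # provider outage
    {"id": "inc-3", "deployment_id": "d7", "source": "internal"},
]

# Keep only incidents attributed to the team's own changes
internal = [i for i in all_incidents if i["source"] != "external"]
failed_deployments = {i["deployment_id"] for i in internal}

print(f"{len(internal)} internal incidents across {len(failed_deployments)} deployments")
```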

5. Understanding the limitations of DORA metrics

While DORA metrics can provide valuable insights, they should not be the sole measure of a team's success. They only serve as a starting point for understanding team performance.

For a more comprehensive view, you should consider additional performance indicators to account for the specific complexities of the IT systems and development processes.

Not limiting yourself to the DORA metrics can help you gain a better understanding of the team’s capabilities and areas for improvement. This would lead to more effective strategies for enhancing software delivery and development practices.

Final Thoughts on Change Failure Rate

Your customers expect a high-quality product, and you don’t want to disappoint. A great way to stay on top of everything is to monitor your team’s deployments through CFR.

Ideally, you want zero rollbacks or service interruptions, but the unexpected can happen no matter how good your code review, testing, and deployment automation are.

That’s why it’s important to keep users informed with status pages from Instatus, while aiming for a CFR of 15% or less. Find the balance between CFR and Deployment Frequency that works for you, and success will follow.
