If you saw smoke coming from your oven or a fire on your stove, you wouldn’t just stand there and watch it get out of control. You’d jump in and stop it. The same applies to software development and DevOps: if there’s a problem, it needs to be fixed as quickly as possible to prevent user complaints.
The real trouble with software, however, is that it can take a while before anyone becomes aware a problem exists. This is why teams track metrics like mean time to detect (MTTD).
To help you learn more, Instatus has put together this article covering everything you need to know about MTTD. You’ll see how it can be added to your DevOps skills stack to improve efficiency and solve problems quicker.
Mean time to detect (MTTD), also known as mean time to discover, is a key metric used in software and IT incident management. It measures the average time between a problem occurring and the relevant team discovering it. It’s a useful KPI because it shows DevOps teams how quickly bugs and errors are discovered in their software.
Additionally, MTTD doesn’t concern itself with the size of the problem or the resolution that needs to happen. The problem could be as small as a 404 or as large as a complete server failure.
As well as mean time to detect, there are also a variety of other incident management metrics tracked within DevOps teams, including:
Metrics like mean time to resolution are part of the DORA framework - a set of key DevOps metrics used by high-performing teams. DORA is beyond the scope of this article, but it’s helpful to know about if you work in the DevOps industry.
The key goal of any DevOps team is to keep their MTTD as low as possible. If not, other important areas of the business and customer experience could be affected.
As a DevOps team, ensuring internal processes and strategies are optimized and efficient is a key goal. This also applies to your incident management systems and how quickly you can identify a problem.
By measuring MTTD and keeping it low, you and your team can identify and resolve problems faster. The more MTTD data you collect, the better you’ll understand how your team identifies problems, which allows you to streamline the process further and save valuable team resources.
MTTD isn’t a metric that focuses on specific problems, but teams can angle it toward problems that are detrimental to the business’s bottom line. This allows teams to focus on resolving issues that could result in profit loss.
Tracking MTTD not only saves money by surfacing revenue-impacting issues sooner, it also speeds up detection in your DevOps teams, conserving resources. Less time and money are spent finding issues, which means more resources can go toward resolving issues and improving the overall product.
Measuring mean time to detect doesn’t just help internal processes. It also helps the end user. If an issue is causing a negative experience in the live environment, customers will experience friction when using your product or service. By attending to these problems sooner, you shorten the time customers spend dealing with that friction.
This can also help ease the burden on your support team. When front-end issues are resolved fast, the team receives fewer support tickets and questions from users. Couple this with a status page, and your support team should have an easier time during downtime. You can use Instatus to create stunning status pages in seconds. These are superb for keeping users informed during downtime and help to relieve the burden on your support team. Build your first status page for free now.
If you’re looking to add MTTD as a measurement in your software development or IT team, you’re in luck, as it’s relatively easy to do. MTTD is the total time taken to detect all incidents divided by the number of incidents.
The table below demonstrates how this equation would work in a real-world situation. Let’s say that over the course of four days, you identify four issues. One issue was spotted in 20 minutes, one in 10, another in 50, and the last in 30. By adding them all together and dividing by the total number of issues recorded, you’ll get a time of 27 minutes and 30 seconds. This is our MTTD.
Your equation would look something like this:
(20 + 10 + 50 + 30) / 4 = 27.5 minutes
| Day | Issue Happened | Issue Spotted | Time to Detect |
| --- | --- | --- | --- |
| Monday | 7:00 am | 7:20 am | 20 minutes |
| Tuesday | 2:30 pm | 2:40 pm | 10 minutes |
| Wednesday | 11:20 am | 12:10 pm | 50 minutes |
| Thursday | 3:00 pm | 3:30 pm | 30 minutes |
| Mean Time to Detect | | | 27 minutes and 30 seconds |
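If you record when each issue happened and when it was spotted, the same calculation can be scripted. The snippet below is a minimal sketch in Python that mirrors the example table. The calendar dates and the `mean_time_to_detect` helper are assumptions made for illustration, not part of any particular tool.

```python
from datetime import datetime, timedelta

# (occurred, detected) pairs mirroring the example table.
# The calendar dates are hypothetical; only the times of day come from the table.
incidents = [
    (datetime(2024, 1, 1, 7, 0), datetime(2024, 1, 1, 7, 20)),     # Monday
    (datetime(2024, 1, 2, 14, 30), datetime(2024, 1, 2, 14, 40)),  # Tuesday
    (datetime(2024, 1, 3, 11, 20), datetime(2024, 1, 3, 12, 10)),  # Wednesday
    (datetime(2024, 1, 4, 15, 0), datetime(2024, 1, 4, 15, 30)),   # Thursday
]


def mean_time_to_detect(incidents):
    """Return the average gap between occurrence and detection."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual detection delays, starting from a zero timedelta.
    total = sum(
        (detected - occurred for occurred, detected in incidents),
        timedelta(),
    )
    return total / len(incidents)


mttd = mean_time_to_detect(incidents)
print(mttd)  # 0:27:30
```

Working with timestamps rather than pre-computed minutes means the detection time for each incident is derived, not typed in by hand, which removes one source of error.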
Of course, you can add complexities and additional ad-hoc tests following this simple analysis. Some teams might want to prune the data of any outliers where detection times were abnormally high.
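One way to prune outliers is sketched below. It assumes a simple rule, dropping any detection time more than three times the median, which is only one of many possible cutoffs. The 480-minute incident and the `trimmed_mttd` helper are hypothetical.

```python
from statistics import mean, median


def trimmed_mttd(minutes, factor=3.0):
    """Mean detection time after dropping values above factor x median.

    The factor-times-median cutoff is an illustrative assumption,
    not a standard; pick a rule that suits your own data.
    """
    cutoff = factor * median(minutes)
    kept = [m for m in minutes if m <= cutoff]
    return mean(kept)


# The four example incidents plus one hypothetical eight-hour outlier.
detection_minutes = [20, 10, 50, 30, 480]

raw = mean(detection_minutes)              # pulled far upward by the outlier
trimmed = trimmed_mttd(detection_minutes)  # outlier above the cutoff is dropped
print(raw, trimmed)
```

Reporting both figures side by side keeps the outlier visible, so it can still be investigated rather than silently discarded.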
Others might want to record the types of problems, so they can categorize times more specifically. This can help companies identify how quickly revenue-impacting issues are discovered and solved, and so prevent profit loss.
If you’re now convinced MTTD is a measurement you need to include in your DevOps or software team, it can be helpful to understand how to implement the best MTTD practices.
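Categorizing detection times can be as simple as grouping incidents by a label before averaging. The sketch below assumes each incident is stored as a (category, minutes) pair. The category names and the `mttd_by_category` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (category, minutes to detect); the category labels are hypothetical.
records = [
    ("checkout", 20),
    ("checkout", 10),
    ("search", 50),
    ("search", 30),
]


def mttd_by_category(records):
    """Return the mean detection time in minutes for each category."""
    grouped = defaultdict(list)
    for category, minutes in records:
        grouped[category].append(minutes)
    return {category: mean(values) for category, values in grouped.items()}


by_category = mttd_by_category(records)
for category, minutes in by_category.items():
    print(f"{category}: {minutes} minutes")
```

A per-category view like this makes it easier to see whether revenue-impacting areas are detected faster or slower than the overall average.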
If you’re going to be implementing MTTD, you need to have a clear understanding of how the relevant teams are going to detect issues. You can do this by creating a strategy, letting each team know what authority they have when approaching a problem.
This prevents teams getting stuck in limbo, wondering whether they’re the ones who should be identifying and remedying problems in the first place. This is especially important if your organization faces IT errors that vary widely in their nature.
Measuring MTTD is one thing. Improving on it is another. If you’re taking MTTD into account, you need to look for ways to reduce it on an ongoing basis. This is especially true for DevOps teams, where internal efficiency is a determining factor of success.
By looking at the data over time, you can identify incidents where detection time was unusually low and investigate why that was the case. You can then take the strategies discovered in these situations and apply them to the wider business.
Making sure your team has the right incident management tools is essential. There are several systems you should have in place to ensure relevant individuals are communicated with effectively throughout the entire process.
One cohort that should be taken into consideration is your users. If they’re experiencing a problem you haven’t identified, your support team could get bombarded with tickets. To prevent this, use Instatus to put up a status page that lets users know what’s happening.
Instatus status pages are simple, beautiful, and provide users with everything they need to know about ongoing problems and the status of your service. This reduces the load on support and bolsters the user experience.
MTTD is a useful metric that informs teams how quickly bugs, errors, and other issues are being discovered. It can help software teams, especially DevOps, optimize their incident management protocols and put out fires more efficiently.
As a result, customers experience less downtime, teams waste less time fumbling around looking for problems, and businesses can make more money by saving on resources.
Of course, some downtime is going to be experienced regardless of how quickly a problem is discovered. This is why it pays to have a clean and effective status page to keep your customers in the loop. Instatus provides beautiful status pages you can put up while your team is resolving a problem, and you can create your first one for free.