Mean Time Between Failures (MTBF): How to Calculate and Increase It

Mean time between failures (MTBF) is the average time between failures of a piece of repairable equipment. It can be used to estimate when equipment may fail unexpectedly in the future, or when it needs to be replaced.

MTBF is also used as a measure of performance, availability and reliability of systems, and to help with scheduling maintenance, inventory planning and system design.

Discover below what MTBF means, why it matters, and how to calculate, use and improve it.

The meaning of Mean Time Between Failures

Mean time between failures is the average – or mean – time that elapses from one unplanned breakdown to the next, under normal operating conditions.

It is an indication of how long an electrical or mechanical system typically operates before failing.

MTBF is generally calculated over a period of time that includes multiple failures – either multiple failures of a single asset or single failures of multiple assets of the same type – so that an arithmetic mean or average of the time between disruptions can be determined.

MTBF is most often expressed in hours and the larger the MTBF value for a system, the longer it is likely to keep working before it fails.

Mean time between failures is a metric that’s only used for repairable systems. In other words, MTBF is only relevant for machines or equipment that can be fixed and put back into operation after a failure occurs.

For non-repairable systems, the equivalent metric, Mean Time to Failure (MTTF), is used as a measure of reliability. Once a non-repairable asset fails, it is considered to have reached the end of its useful life.

Why MTBF is important

Many businesses depend on a large number of inter-connected systems to create their products and deliver their services.

Knowing how reliable these systems and their components are helps businesses run more efficiently and profitably, with minimal downtime and damage.

Because MTBF is a basic measure of a system’s reliability, it can be used in a variety of important business decisions.

Reliability is defined as the absence of unplanned downtime. Because MTBF measures how often a piece of equipment stops performing as expected, it is an important measure of reliability.

Availability is related to reliability and is a measure of how much of the time a system is performing correctly, when it needs to be. MTBF can be used with Mean Time to Repair (MTTR) to calculate availability for a system.

Preventive maintenance can be scheduled more appropriately using MTBF, by aiming to complete routine maintenance before the next failure in order to prevent unplanned downtime, or as part of reliability-centred maintenance, that aims to maximise overall system reliability.

Inventory levels can be more effectively managed when tracking MTBF, which can help predict which components and systems will fail and when, increasing the chances that technicians will have the right parts on hand when required.

Safety is one benefit of MTBF that is best understood in the context of critical systems, for example an aircraft. By measuring MTBF for components, we can reduce the chances of an unexpected failure of a critical system that could endanger the lives of everyone on board.

Mean Time Between Failures calculation

MTBF is calculated by dividing the total time a system was running correctly by the number of failures that happened in the same period of time.

The formula to calculate Mean Time Between Failures is as follows:

MTBF = Total uptime / Number of failures

To calculate Mean Time Between Failures, we need two specific pieces of information:

1. Total uptime – The total amount of time that the system or components were operating correctly under normal conditions. Usually measured in hours.

2. Number of failures – The total number of times that the equipment broke down unexpectedly.

Note that the calculation of MTBF does not include any repair time, inspections, or planned downtime. It only looks at correct operation time under typical conditions.
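As a minimal sketch, the formula can be written as two small Python functions. The function and parameter names here are illustrative assumptions, not part of any particular maintenance system, and the figures in the usage line are made up.

```python
def uptime_hours(scheduled_hours: float, unplanned_downtime_hours: float) -> float:
    """Time the asset actually ran: scheduled operating time minus repair time."""
    return scheduled_hours - unplanned_downtime_hours


def mtbf(total_uptime_hours: float, failures: int) -> float:
    """Mean time between failures, in hours."""
    if failures <= 0:
        # With no recorded failures there is nothing to average over.
        raise ValueError("MTBF is undefined without at least one failure")
    return total_uptime_hours / failures


# Hypothetical asset: 1,000 scheduled hours, 40 hours of repairs, 4 failures.
print(mtbf(uptime_hours(1000, 40), 4))  # 240.0
```

Raising an error when there are no failures is a design choice in this sketch: it avoids reporting an infinite or misleading MTBF for a period that was simply too short to observe a breakdown.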

The key points in time for calculating MTBF are illustrated on the below timeline:

Figure: Mean Time Between Failures timeline

The system is switched on, runs for a while, then fails unexpectedly. Time is taken to repair it, then the system is switched back on, runs for a while, and then fails unexpectedly again.

Uptime for the purposes of MTBF is calculated as the duration from the start of uptime to the start of the next unplanned downtime.
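If failures are logged with timestamps, the same idea can be sketched in Python by summing each run from start-up to the next unplanned failure. The log below is hypothetical, and the sketch assumes every recorded run ended in a failure.

```python
from datetime import datetime

# Hypothetical log: (time the asset started running, time it failed unexpectedly).
run_periods = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 3, 8, 0)),    # 48 hours
    (datetime(2024, 1, 3, 12, 0), datetime(2024, 1, 6, 12, 0)),  # 72 hours
    (datetime(2024, 1, 6, 18, 0), datetime(2024, 1, 8, 6, 0)),   # 36 hours
]


def mtbf_from_runs(periods):
    """Average run length in hours; repair gaps between runs are ignored."""
    if not periods:
        raise ValueError("at least one completed run is required")
    total_seconds = sum((failed - started).total_seconds() for started, failed in periods)
    return total_seconds / 3600 / len(periods)


timeline_mtbf = mtbf_from_runs(run_periods)
print(timeline_mtbf)  # 52.0
```

The gaps between one failure and the next start-up (4 and 6 hours here) are repair time, so they never enter the total.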

If you are looking at more than one asset, such as during component testing by manufacturers, then you need to look at the total operating time and failures across all components.

MTBF Examples

To help you understand more clearly how to calculate Mean Time Between Failures, here are some specific examples.

Example 1 – Medical Equipment

Let’s say you have a very expensive piece of medical equipment – such as an EKG machine – in a large hospital that’s in use 16 hours a day, 7 days a week, measuring patients’ heart signals.

Over the last 6 months (26 weeks), the EKG machine has failed five times during normal operating hours, requiring downtime of four hours on each occasion to diagnose the issue and fix it.

We have a total time of 26 weeks x 7 days x 16 hours = 2912 hours minus the downtime of 5 occasions x 4 hours = 20 hours. So our total uptime is 2892 hours with 5 failures.

MTBF = 2892 hours / 5 failures = 578.4 hours

This means that the average time between failures of this machine is around 578 operating hours, or just over 5 weeks at 16 hours a day, under typical operating conditions.

Example 2 – Industrial Machinery

In this example, we have multiple pieces of equipment across our manufacturing facility – 150 conveyor belts – that are critical to operations and run 24 hours a day, 7 days a week, moving parts around the factory.

Over the last four weeks, there have been 50 different issues with individual conveyor belts, requiring a total of 200 repair hours to get them up and running again.

We have a total time of 4 weeks x 7 days x 24 hours x 150 belts = 100,800 hours minus the 200 hours of repair time = 100,600 hours of uptime, with 50 failures in total.

MTBF = 100,600 hours / 50 failures = 2012 hours

From this, we understand that our conveyor belts have run for around 2012 hours on average before failing, or around 12 weeks.

In this instance, because our data was collected over 4 weeks and our MTBF is greater than this period, it may be worth collecting MTBF data over a longer period to increase the accuracy of the estimate.
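The pooled calculation for a group of identical assets can be sketched as follows. The helper name `fleet_mtbf` is an assumption made for illustration, and the sketch assumes every asset runs around the clock, as the conveyor belts do.

```python
HOURS_PER_WEEK = 7 * 24


def fleet_mtbf(assets: int, weeks: float, repair_hours: float, failures: int) -> float:
    """Pooled MTBF in hours for identical assets running 24 hours a day."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    total_hours = assets * weeks * HOURS_PER_WEEK  # operating time across all assets
    return (total_hours - repair_hours) / failures


belt_mtbf = fleet_mtbf(assets=150, weeks=4, repair_hours=200, failures=50)
print(belt_mtbf)                             # 2012.0
print(round(belt_mtbf / HOURS_PER_WEEK, 1))  # 12.0 (weeks)
```

Pooling is what makes a four-week observation window usable at all here: no single belt was watched for 12 weeks, but the 150 belts together accumulated more than 100,000 operating hours.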

MTBF vs MTTR

Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR) are closely related figures that track the performance and availability of an asset over time.

Because MTBF does not include the time taken to repair a piece of equipment after a failure, it is a measure of how long a machine typically runs correctly before failing but doesn’t reflect how long it is out of operation.

MTTR is used to measure the average time it takes to repair the system after it has failed, which measures how long the equipment is offline due to unplanned maintenance.

In simple terms, MTBF is how long things go without breaking down and MTTR is how long it takes to fix them. In practice, however, it’s not quite that simple.

By keeping MTBF high relative to MTTR, the availability of a system is maximised.
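One common way to express this relationship is the steady-state formula Availability = MTBF / (MTBF + MTTR). The sketch below applies it to the EKG example, taking the four hours of downtime per failure as the MTTR; the function name is an illustrative assumption.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of scheduled time the asset is expected to be running."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# EKG example: MTBF of 578.4 hours, 4 hours to repair each failure.
ekg_availability = availability(578.4, 4.0)
print(round(ekg_availability, 4))  # 0.9931
```

This matches the raw figures from the example, where 2892 hours of uptime out of 2912 scheduled hours is also about 99.3%.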

MTBF vs MTTF

Mean Time Between Failures (MTBF) and Mean Time To Failure (MTTF) are also very similar metrics, and are often confused and used interchangeably.

The key difference between MTBF and MTTF is that MTBF applies to repairable systems, while MTTF is for non-repairable equipment.

Machines (or software) that can be repaired will have multiple failures over their lifetime, and so will have periods of time between failures, whereas non-repairable items, such as light bulbs, or SSDs, will function correctly for a period of time before failing permanently, and so only have one failure in their lifetime.

This means that sometimes MTTF is also used as a measure of useful life, but it is not accurate to use MTBF as an estimate of useful life, as repairable systems will have multiple failures over their working lifetime.

MTTF is calculated in a very similar way to MTBF, except that it involves multiple assets that have failed once, in order to calculate an average estimate of how long items of that type of asset will function as expected before failing.
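As a rough sketch, MTTF for a batch of non-repairable items is simply the average of their lifetimes. The lifetimes below are invented for illustration, and the sketch assumes every item in the batch has already failed.

```python
# Hypothetical lifetimes, in hours, of five light bulbs that each failed once.
lifetimes_hours = [900, 1100, 1250, 950, 800]


def mttf(lifetimes):
    """Mean time to failure: total operating time divided by the number of failed items."""
    if not lifetimes:
        raise ValueError("at least one failed item is required")
    return sum(lifetimes) / len(lifetimes)


print(mttf(lifetimes_hours))  # 1000.0
```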

Using MTBF to calculate failure rate

Failure rate is defined as how often a system or piece of equipment fails unexpectedly during normal operation.

The failure rate is a frequency metric that tells us, for a given time period, how often an asset is likely to fail.

To calculate failure rate, we simply take the inverse of MTBF:

Failure rate = Number of failures / Total uptime

So for our EKG machine the failure rate would be 0.0017 per hour and for our conveyor belts 0.0005 per hour.

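In these cases, it might be more meaningful to express the failure rates per day or even per week, by multiplying the hourly rate by the number of operating hours in that period (112 hours a week for the EKG machine, 168 for each conveyor belt).

  • EKG machine = 0.19 per week
  • Conveyor belts = 0.08 per week per belt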
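The conversion can be checked with a short sketch. It assumes the weekly rate is the hourly rate multiplied by each asset's operating hours per week (16 x 7 = 112 for the EKG machine, 24 x 7 = 168 for a conveyor belt), which is consistent with the 5-week and 12-week MTBF figures in the examples.

```python
def failure_rate(failures: int, uptime_hours: float) -> float:
    """Failures per operating hour: the inverse of MTBF."""
    return failures / uptime_hours


ekg_per_hour = failure_rate(5, 2892)
belt_per_hour = failure_rate(50, 100_600)
print(round(ekg_per_hour, 4))   # 0.0017
print(round(belt_per_hour, 4))  # 0.0005

# Scale by operating hours per week to get a weekly rate per asset.
ekg_per_week = ekg_per_hour * 16 * 7
belt_per_week = belt_per_hour * 24 * 7
print(round(ekg_per_week, 2))   # 0.19
print(round(belt_per_week, 2))  # 0.08
```

The EKG figure is easy to sanity-check: five failures in 26 weeks is also about 0.19 per week.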

Types of Mean Time Between Failures

There are three main approaches to calculating Mean Time Between Failures.

For component or system manufacturers, testing of samples can be done to create an estimate of MTBF for the given asset.

A number of the items are put into “normal” operating conditions and run until they fail, giving values for total operating time and total number of failures that can be used to calculate an MTBF.

These figures are often provided in the instruction manuals for equipment, to give owners, operators and technicians a rough measure of the reliability of the machine.

The MTBF of a system or piece of equipment can also be predicted by analysing known factors.

However, these figures can only ever be rough estimates, because they can’t take into account the actual performance of a specific asset, under real-life operating conditions.

The best way to calculate an accurate MTBF for a specific piece of equipment or software is to measure its performance in day-to-day use in an organisation.

By tracking failures and operational time, a more accurate MTBF can be developed for a piece of equipment, based on actual experience and realistic operating conditions.

How to use MTBF

Once an MTBF has been determined for a system, it is generally used in one of two ways.

Firstly, it can be used retrospectively as a measure of reliability and availability, as discussed previously.

This information can be used to measure the decrease in reliability that can occur as an asset ages and to help decide when to replace a piece of equipment.

It can also be used in calculations of operational efficiency and performance and used to identify ways to decrease costs and increase output and profits.

Secondly, MTBF can be used as a predictor of future failures.

Business owners can develop estimates using MTBF figures for the optimal times for preventive maintenance to be carried out to avoid unplanned downtime.

They can also use MTBF to “look ahead” and have the necessary parts and skills available for when unexpected failures occur.

Because this is a forward-looking approach, it can only ever be approximate, and needs to take into account all factors affecting the situation and use appropriate predictive modelling methods.

Combining MTBF-based maintenance approaches with other strategies, such as condition-based monitoring and programmed maintenance, will help avoid costly breakdowns.

MTBF across industries

MTBF can be used in a few different ways across industries.

Much of the time, MTBF is used for tracking and quantifying the reliability of equipment, in industrial facilities and factories for both discrete manufacturing and process industries.

Anywhere that repairable equipment is used in key processes or operations can benefit from MTBF.

MTBF data may also be used as an important factor in the insurance, finance, engineering, safety and regulatory industries.

Reliability is also an important consideration during the product design process, where MTBF estimates can help improve reliability before a product is even made.

MTBF can also be used as a measure of the reliability of software systems.

Software reliability is important in many industries, including industrial, military, commercial and finance applications.

By tracking how often software fails to perform as expected under normal use, we can calculate an estimate for MTBF, and use this to improve performance.

What is a good MTBF?

The time between failures of a system or piece of equipment is dependent on a number of factors, including:

  • The specific nature or configuration of the assets
  • The environment or conditions they’re operating in
  • The age of the equipment
  • Previous maintenance or failures
  • External factors that are not predictable or controllable

This means that there is no such thing as a “good” MTBF value.

Instead, what we need to focus on is calculating MTBF for our specific equipment or systems, to begin to develop an estimate of reliability.

Some manufacturers may provide estimates for MTBF in the documentation or specifications for their products, and these provide a good but very rough starting place for estimating MTBF.

You might also be able to glean a starting point for an MTBF from industry standards and other similar machines and businesses.

Over time, as a piece of repairable equipment operates, a business can collect data on its normal operational time and the number of failures to build up a picture of its reliability.

This data can then be used to assess when maintenance or replacement is required and to improve the overall performance of the system, by focusing on improving MTBF.

How to Increase MTBF

As mentioned, MTBF is a measure of reliability, and the more reliable our systems are, the more efficiently a business can operate.

By looking at the elements that contribute to the definition of mean time between failures, we can see how to increase MTBF – either we can reduce the number of failures or we can increase the total time the asset spends operating correctly.

1. Analyze the underlying cause of failures

Each time a piece of equipment fails is a perfect opportunity to step back and look for any underlying causes of the failure that you can address.

Sure, it might have “just been” a worn-out part or a random occurrence, but take the time to look for systemic issues that might have contributed to the failure.

By digging deeper into the causes of failures, you can implement long-term solutions that may flow on to improve quality and performance across the entire business.

This increase in quality will help machines to keep operating for longer, increasing overall MTBF.

2. Utilize condition-based maintenance

A condition-based maintenance approach monitors the state of your machines and can provide early warning of impending failures.

By detecting changes in system performance or operation early, you can schedule maintenance at a convenient time and repair problems before they turn into unplanned downtime or cause collateral damage to the whole system.

This reduction in the number of failures will increase your overall MTBF.

3. Improve preventive maintenance

The time taken to repair a piece of equipment (the MTTR) might seem like a minor element in the calculation of MTBF, but repair time is subtracted from total time to give uptime, so over a given period the more you can reduce MTTR, the more your MTBF will improve.

By decreasing the amount of time that your systems are offline, you are increasing their overall availability and maximising your MTBF.
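To get a feel for the size of this effect, here is a small sketch that reuses the EKG figures from earlier (2912 scheduled hours, 5 failures) and compares a 4-hour repair with a hypothetical 2-hour repair. The function name is an assumption made for illustration.

```python
def mtbf_for_period(scheduled_hours: float, failures: int, hours_per_repair: float) -> float:
    """MTBF over a fixed period when every failure costs the same repair time."""
    uptime = scheduled_hours - failures * hours_per_repair
    return uptime / failures


print(mtbf_for_period(2912, 5, 4))  # 578.4
print(mtbf_for_period(2912, 5, 2))  # 580.4
```

Under these assumptions, halving the repair time adds only two hours to MTBF, so faster repairs mainly pay off through availability, while reducing the number of failures moves MTBF far more.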

Drawbacks of MTBF

Although MTBF is a valuable metric to track that can provide important information about the performance of a system, there are a few issues to be aware of.

1. It’s only a statistic

MTBF can only ever be a statistical measurement, representing an average value of events that occurred in the past.

It can only provide an estimate of the likelihood of future failures, and only when used with appropriate statistical models.

Using MTBF to make predictions for a specific device has limited accuracy, and so is better used to estimate how many spares are needed to support a given number of assets, rather than to predict when a specific asset will fail.
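A spares estimate of this kind can be sketched as below. It assumes failures arrive at a constant rate, so the number of failures in a period follows a Poisson distribution; that is a modelling assumption, not something MTBF guarantees, and the function names are hypothetical.

```python
import math


def expected_failures(assets: int, hours_each: float, mtbf_hours: float) -> float:
    """Average number of failures across all assets over the period."""
    return assets * hours_each / mtbf_hours


def spares_for_confidence(expected: float, confidence: float = 0.95) -> int:
    """Smallest stock s with P(failures <= s) >= confidence, under a Poisson model."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    cumulative = 0.0
    stock = 0
    while True:
        # Add the Poisson probability of exactly `stock` failures.
        cumulative += math.exp(-expected) * expected**stock / math.factorial(stock)
        if cumulative >= confidence:
            return stock
        stock += 1


# Conveyor example: 150 belts, one week (168 hours) each, MTBF of 2012 hours.
weekly_failures = expected_failures(150, 168, 2012)
print(round(weekly_failures, 1))  # 12.5
print(spares_for_confidence(weekly_failures))
```

The expected figure agrees with the raw data (50 failures in 4 weeks), while the stock level is deliberately higher than the average to cover weeks with more failures than usual.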

2. It’s not a failure estimate

Even though MTBF measures the average time between failures, it is not actually an accurate estimate of how long it will be before the machine fails again.

Some also believe that it’s a measure of the point in time where the chance of a machine failing is equal to the chance of it not failing, on average, but again this is not true.

In fact, if failures occur at a constant rate (the flat, middle section of the “bathtub reliability curve”), the probability of an asset that has just been returned to service lasting for a full period equal to its MTBF is e^-1, or just 37%.

In other words, the likelihood that a specific piece of equipment actually runs for the MTBF before failing is just 37%.
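The 37% figure can be reproduced with a short sketch, assuming a constant failure rate so that the chance of surviving to time t is exp(-t / MTBF). The function name is illustrative, and the EKG machine's MTBF of 578.4 hours is used as the input.

```python
import math


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Chance of running t hours without failure, given a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


ekg_mtbf = 578.4
print(round(survival_probability(ekg_mtbf, ekg_mtbf), 3))  # 0.368

# The 50/50 point comes earlier than the MTBF: at MTBF * ln(2).
median_hours = ekg_mtbf * math.log(2)
print(round(median_hours, 1))  # 400.9
```

Under this model the time at which failure and survival are equally likely is about 69% of the MTBF, which is why MTBF should not be read as the halfway point.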

3. It can be distorted

As a statistic, it’s also important to collect enough data to ensure the accuracy of the calculation, as short time periods or few failures may lead to distorted and inaccurate MTBF figures.

The calculation of MTBF can also be skewed by biased selection of time periods or assets. For example, you could increase MTBF by starting your measurement shortly after a failure and ending just before a recent failure, but would it be accurate?

To avoid this potential corruption of MTBF, it’s important to have agreed standards in place for the process for measuring and calculating MTBF in a consistent and meaningful way.

Get clear on your definitions of “failure” and “operating time” and which components are included in the system to ensure your MTBF value is meaningful.

4. It doesn’t measure useful life

Some people get confused and think that MTBF is actually a measure of useful life.

Useful life is the period of time that follows initial machine deployment until it begins wearing out, whereas MTBF is a measure of the average time between failures.

For a repairable machine, it may fail multiple times within its useful life, so the two metrics are not generally closely related.

5. It’s not enough

Although it may be tempting to make MTBF the core of your maintenance metrics, it’s not enough to be meaningful on its own.

To ensure an appropriate, effective approach to asset management, it’s best combined with other techniques, such as condition-based maintenance and predictive maintenance, along with other metrics, such as mean time to repair, planned maintenance percentage and overall equipment effectiveness.

Mean Time Between Failures In Your Organisation

A primary goal for all businesses is to maximise output and minimise downtime, and mean time between failures is a useful metric for assessing the reliability of the systems that support your operations.

By tracking MTBF, you can keep a handle on unplanned breakdowns in your facilities, and work towards improving overall reliability, leading to higher quality products and services and increased resilience in your business.

And although it’s not sufficient on its own, MTBF provides an effective way to help your team focus on increasing the operational time of your assets.

Talk to us today about how NextService can help your business refine your field service operations to improve the MTBF of assets.
