Collecting failure data to calculate mean time between failures (MTBF) in order to determine accurate maintenance task intervals is a mistake. MTBF is a measure of reliability: the average time between successive failure events.
Failures fall predominantly into two categories: age-related and random. Typically, age-related failures make up less than 20 percent of all failures, while random failures make up 80 percent or more.
For age-related failures, it is not MTBF but rather useful life that is significant when attempting to determine maintenance task intervals to avoid failures. There is a point in a piece of equipment’s lifetime at which there is a rapid increase in its conditional probability of failure. The time between the point when the equipment is installed and the point where the conditional probability of failure begins to increase sharply is the “useful life” of the equipment. It is different from MTBF, which is defined as the average life of the entire population of that item in that service.
If we want to prevent a failure from occurring using traditional preventive maintenance, we would intervene just prior to the end of the equipment’s “useful life,” not just prior to MTBF. Incorrectly using MTBF to determine the preventive maintenance interval will result in approximately 50 percent of all failures occurring before the maintenance intervention. In addition, the remaining components, approximately 50 percent, still have life left and will receive unnecessary maintenance attention. Neither outcome makes for an effective maintenance program. Therefore we need to use “useful life” and not MTBF when looking at age-related failures and determining the frequency of preventive maintenance tasks.
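The 50 percent figure can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that component life follows a Weibull distribution with a shape parameter of 3.5 (a wear-out pattern) and a characteristic life of 10 years; neither number comes from field data. The 5 percent threshold used for the earlier interval is likewise a hypothetical stand-in for an intervention near the end of useful life, not a definition of it.

```python
import math

# Illustrative assumptions, not field data.
SHAPE = 3.5    # Weibull shape > 1 models an age-related (wear-out) failure mode
SCALE = 10.0   # Weibull characteristic life, in years

def fraction_failed_by(age, shape=SHAPE, scale=SCALE):
    """Weibull cumulative probability of failure by a given age."""
    return 1.0 - math.exp(-((age / scale) ** shape))

def weibull_mean_life(shape=SHAPE, scale=SCALE):
    """Mean life of the Weibull population (plays the role of MTBF here)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def age_at_fraction_failed(fraction, shape=SHAPE, scale=SCALE):
    """Age by which the given fraction of the population has failed."""
    return scale * (-math.log(1.0 - fraction)) ** (1.0 / shape)

mtbf = weibull_mean_life()
failed_before_mtbf = fraction_failed_by(mtbf)

# Hypothetical stand-in for an interval set near the end of useful life:
# the age by which 5 percent of the population has failed.
early_interval = age_at_fraction_failed(0.05)

print(f"MTBF: {mtbf:.1f} years")
print(f"Fraction failed before MTBF: {failed_before_mtbf:.0%}")
print(f"Age at which 5% have failed: {early_interval:.1f} years")
```

Under these assumptions, roughly half the population has already failed by the time it reaches the MTBF, while an interval set well before the MTBF catches nearly all components before they fail.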
Research has shown that random failures make up the vast majority of failures on complex equipment. For example, consider the failure of a component. Assume that each time the component failed we tracked the length of time it was in service. The first time the component is put into service it fails after 4 years, the second time after 6 years, and the third time after only 2 years ((4 + 6 + 2) / 3 = 4). We know that the average lifespan of the component is 4 years (its MTBF is 4 years).
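The arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names are illustrative, and the three service lives are the ones from the example.

```python
from statistics import mean

# Years in service before each failure, taken from the example in the text.
service_lives_years = [4, 6, 2]

mtbf_years = mean(service_lives_years)
shortest = min(service_lives_years)
longest = max(service_lives_years)
spread_years = longest - shortest

print(f"MTBF: {mtbf_years} years")
print(f"Observed lives range from {shortest} to {longest} years")
```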
However, we do not know when the next component will fail. Therefore we cannot successfully manage this failure by traditional time-based maintenance (scheduled overhaul or replacement). It is important to know the condition of the component and the life remaining before failure; in other words, how fast the component can go from being OK to NOT OK. This is sometimes referred to as the failure development period or potential failure to functional failure (P-F) interval.
If the time from when the component initially develops signs of failure to the time when it fails is 4 months, then maintenance inspections must be performed at intervals of less than 4 months in order to catch the degradation of the component condition. The inspection also must be performed often enough to provide sufficient lead time to fix the equipment before it functionally fails. In this case, we might want to schedule the inspection every 2 months. This would ensure we catch the failure in the process of occurring and give us at least 2 months to schedule and plan the repair.
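The relationship between the P-F interval, the inspection interval, and the lead time can be sketched in Python. The function name and parameters are hypothetical, and the sketch assumes the degradation is detectable by the inspection from the very start of the P-F interval.

```python
def worst_case_lead_time(pf_interval_months, inspection_interval_months):
    """Shortest warning time left once an inspection detects the degradation.

    Worst case: the first signs of failure appear just after an inspection,
    so they go unnoticed for one full inspection interval.
    """
    if inspection_interval_months <= 0:
        raise ValueError("inspection interval must be positive")
    if inspection_interval_months >= pf_interval_months:
        raise ValueError("inspection interval must be shorter than the P-F interval")
    return pf_interval_months - inspection_interval_months

# Figures from the example: a 4-month P-F interval, inspected every 2 months.
lead_time = worst_case_lead_time(pf_interval_months=4, inspection_interval_months=2)
print(f"Worst-case lead time: {lead_time} months")
```

Shortening the inspection interval buys more lead time at the cost of more inspections; the right trade-off depends on how long the repair takes to plan and schedule.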
Failure prevention requires the use of some form of condition-based maintenance at appropriate inspection intervals (failure finding, visual inspections, and predictive technology inspections).
My experience has been that for every $1 million in asset value as many as 150 condition inspection points must be monitored. Gathering and analyzing condition monitoring data to identify impending failure for assets worth billions of dollars is practically impossible without the use of reliability software.
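To get a feel for the scale involved, the rule of thumb above can be turned into a rough Python estimate. The $2 billion asset value is a made-up figure for illustration, and since 150 points per $1 million is quoted as an upper figure ("as many as"), the result is a ceiling rather than a prediction.

```python
POINTS_PER_MILLION_USD = 150  # upper figure quoted in the text

def max_inspection_points(asset_value_usd):
    """Upper estimate of condition inspection points for a given asset value."""
    return asset_value_usd * POINTS_PER_MILLION_USD // 1_000_000

# Hypothetical asset base worth $2 billion.
points = max_inspection_points(2_000_000_000)
print(f"Up to {points:,} inspection points")
```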
The reliability software you choose should be able to:
• collect equipment condition data from controls, sensors, data historians, predictive maintenance technologies, and visual inspections
• use single or multiple data points to analyze the data, applying defined rules and calculations to get a true picture of equipment health
• perform the calculations and conduct the analysis automatically
• present results visually through flashing alarms and trending graphs, identifying potential failures and recommending corrective actions before the equipment fails.
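As a rough illustration of what "applying defined rules" to condition data might look like, here is a minimal Python sketch. It is not taken from any particular reliability software product; the Rule class, the evaluate function, the inspection point names, and the limits are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A hypothetical alarm rule: flag a reading that exceeds a limit."""
    point: str    # name of the condition inspection point
    limit: float  # alarm threshold, in the reading's own units
    action: str   # recommended corrective action

def evaluate(readings, rules):
    """Return (point, value, action) for every reading that breaches its rule."""
    alarms = []
    for rule in rules:
        value = readings.get(rule.point)
        if value is not None and value > rule.limit:
            alarms.append((rule.point, value, rule.action))
    return alarms

# Made-up rules and readings, for illustration only.
rules = [
    Rule("bearing_vibration_mm_s", 7.1, "inspect bearing and schedule replacement"),
    Rule("motor_temperature_c", 90.0, "check cooling and load"),
]
readings = {"bearing_vibration_mm_s": 8.4, "motor_temperature_c": 72.0}

alarms = evaluate(readings, rules)
for point, value, action in alarms:
    print(f"ALARM {point}={value}: {action}")
```

A real system would also trend readings over time and combine several data points per rule; this sketch covers only the simplest single-threshold case.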