New regulatory developments are fixing responsibility for equipment failures that result in loss of life. RCM analysis can help mitigate the risk, but streamlined methods fall short. Here is a look at the issues.
Reliability-centered maintenance (RCM) is a process used to determine what must be done to ensure that any physical asset or system continues to do whatever its users want it to do.
This process finds its roots in work done by the international commercial aviation industry. Driven by the need to improve reliability while containing the cost of maintenance, this industry developed a comprehensive process for deciding what maintenance work is needed to keep aircraft airborne. This process has evolved steadily since its beginnings in 1960. The early history is outlined in the section “Historical Overview.”
The SAE RCM Standard
Various derivatives of Nowlan and Heap’s original aviation-oriented RCM process have emerged since their report was published in 1978. Many of these derivatives retain the key elements of the original process. However, the widespread use of the term “RCM” led to the emergence of a number of processes that differ significantly from the original, but that their proponents also call RCM.
Many of these other processes either omit key steps of the process described by Nowlan and Heap, or change their sequence, or both. Consequently, despite claims to the contrary made by the proponents of these processes, the output differs markedly from what would be obtained by conducting a full, rigorous RCM analysis.
A growing awareness of these differences led to an increasing demand for a standard that set out the criteria any process must comply with in order to be called RCM. Such a standard was published by the Society of Automotive Engineers (SAE) in 1999 as Standard JA1011 Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (Ref. 3). The evolution of the standard is outlined by Dana Netherton, chairman of the SAE RCM Committee, in the section “The Need for an RCM Standard.”
The elements of a true RCM process listed in the standard are presented here in the section “Key Attributes of Any RCM Process.” Subsequent sections of the standard list the issues that any true RCM process must address in order to answer each of the seven attribute questions “satisfactorily.”
According to the standard, “Reliability-Centered Maintenance (RCM)–Any RCM process shall ensure that all of the following seven questions are answered satisfactorily and are answered in the sequence shown below [emphasis added].” This means that if a process does not answer all seven questions, does not answer them in the sequence shown, or does not answer them satisfactorily in compliance with the rest of the standard, then that process is not RCM.
None of the streamlined processes comply fully with the requirements of section 5 of the SAE Standard. The implications of this point are discussed in more detail later.
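The sequence requirement can be pictured as an ordered checklist. The Python below is a minimal sketch, not an SAE tool. Its step labels are hypothetical shorthand for the seven JA1011 questions: functions, functional failures, failure modes, failure effects, failure consequences, proactive tasks, and default actions. The check covers only whether all seven are present and in order. Whether each question is answered "satisfactorily" cannot be judged this way.

```python
"""Minimal sketch: check that a process addresses the seven RCM questions in order.

Step labels are hypothetical shorthand, not terms defined by SAE JA1011.
The check covers presence and sequence only, not whether answers are satisfactory.
"""

SEVEN_QUESTIONS = (
    "functions",             # 1. functions and desired standards of performance
    "functional_failures",   # 2. ways the asset can fail to fulfill its functions
    "failure_modes",         # 3. what causes each functional failure
    "failure_effects",       # 4. what happens when each failure occurs
    "failure_consequences",  # 5. in what way each failure matters
    "proactive_tasks",       # 6. what should be done to predict or prevent each failure
    "default_actions",       # 7. what to do if no suitable proactive task is found
)


def check_sequence(process_steps):
    """Return a list of problems; an empty list means all seven questions
    appear in the process and in the required order."""
    problems = [f"missing: {q}" for q in SEVEN_QUESTIONS if q not in process_steps]
    # Position at which each question that *is* addressed first appears.
    positions = [process_steps.index(q) for q in SEVEN_QUESTIONS if q in process_steps]
    if positions != sorted(positions):
        problems.append("out of sequence")
    return problems


# A task-first ("retroactive") process, modeled on the section on retroactive approaches.
retroactive = ["existing_tasks", "failure_modes", "failure_consequences",
               "proactive_tasks", "default_actions"]
print(check_sequence(retroactive))
# → ['missing: functions', 'missing: functional_failures', 'missing: failure_effects']
```

As modeled here, the retroactive process passes the ordering check. It still fails, because it never asks the first, second, or fourth questions.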
Society has reacted to equipment failures and accidents with serious consequences by enacting laws that seek to call individuals and corporations to account. An overview is presented in the section “Worldwide Regulatory Issues.”
Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step taken in executing his official duties is beyond reproach. RCM processes that meet the SAE Standard provide a basis for prudent, responsible custodianship of physical assets.
The author and his associates have helped companies to apply true RCM on more than 1200 sites spanning 41 countries and nearly every form of organized human endeavor. We have found that when true RCM has been correctly applied by well-trained individuals working on clearly defined and properly managed projects, the analyses have usually paid for themselves in between two weeks and two months. This is a very rapid payback indeed.
However, despite this rapid payback, some individuals and organizations have expended a great deal of energy on attempts to reduce the time and resources needed to apply the RCM process. The results of these attempts are generally known as “streamlined” RCM techniques.
The main features of some of the most widely touted streamlined approaches to RCM are outlined in the following sections. In all cases, the proponents of these techniques claim that their principal advantage is that they achieve similar results to something which they call “classical” RCM, but that they do so in much less time and at much lower cost. However, not only is this claim questionable, but all of the streamlined techniques have other drawbacks, some quite serious. These drawbacks are also highlighted in the following discussion of the various streamlining methods: retroactive approaches, use of generic analyses, use of generic lists of failure modes, skipping elements of the process, analysis of only certain functions or failures, and analysis of only certain equipment.
Retroactive approaches
The most popular method of streamlining the RCM process starts not by defining the functions of the asset (as specified in the SAE Standard) but with the existing maintenance tasks. Users of this approach try to identify the failure mode that each task is supposed to be preventing. They then work forward again through the last three steps of the RCM decision process to re-examine the consequences of each failure and (hopefully) to identify a more cost-effective failure management policy. This approach is what is most often meant when the term “streamlined RCM” (Ref. 10) is used. It is also known as “backfit RCM” (Ref. 11) or “RCM in reverse.”
Retroactive approaches are superficially very appealing, so much so that the author tried them himself on numerous occasions when he was new to RCM. However, in reality they are also among the most dangerous of the streamlined methodologies, for the following reasons:
- Retroactive approaches assume that existing maintenance programs cover just about all the failure modes that are reasonably likely to require some sort of preventive maintenance (PM). In the case of every maintenance program that I have encountered to date, this assumption is simply not valid. If RCM is applied correctly, it transpires that nowhere near all of the failure modes that actually require PM are covered by existing maintenance tasks. As a result, a considerable number of tasks have to be added. Most of the tasks that are added apply to protective devices, as discussed below. (Other tasks are eliminated because they are found to be unnecessary, or the type of task is changed, or the frequency is changed. The net effect is usually an overall reduction in perceived PM workloads, typically by between 40 and 70 percent.)
- When applying retroactive approaches, it is often very difficult to identify exactly what failure cause motivated the selection of a particular task, so much so that either inordinate amounts of time are wasted trying to establish the real connection, or sweeping assumptions are made that very often prove to be wrong. These two problems alone make this approach an extremely shaky foundation upon which to build a maintenance program.
- In reassessing the consequences of each failure mode, it is still necessary to ask whether “the loss of function caused by the failure mode will become evident to the operating crew under normal circumstances.” This question can only be answered by establishing what function is actually lost when the failure occurs. This in turn means that the people doing the analysis have to start identifying functions anyway, but they are now trying to do so on an ad hoc basis halfway through the analysis (and they are not usually trained in how to identify functions correctly in the first place because this approach usually considers the function identification step to be unnecessary). If they do not, they start making even more sweeping—and hence often incorrect—assumptions that add to the shakiness of the results.
- Retroactive approaches are particularly weak on specifying appropriate maintenance for protective devices. As stated by the author in his book Reliability-Centered Maintenance (Ref. 12), “at the time of writing, many existing maintenance programs provide for fewer than one third of protective devices to receive any attention at all (and then usually at inappropriate intervals). The people who operate and maintain the plant covered by these programs are aware that another third of these devices exist but pay them no attention, while it is not unusual to find that no one even knows that the final third exist. This lack of awareness and attention means that most of the protective devices in industry—our last line of protection when things go wrong—are maintained poorly or not at all.” So if one uses a retroactive approach to RCM, in most cases a great many protective devices will continue to receive no attention in the future because no tasks were specified for them in the past.
- Given the enormity of the risks associated with unmaintained protective devices, this weakness of retroactive RCM alone makes it completely indefensible. (Some variants of this approach try to get around this problem by specifying that protective systems should be analyzed separately, often outside the RCM framework. This gives rise to the absurd situation that two analytical processes have to be applied in order to compensate for the deficiencies created by attempts to streamline one of them.)
- More so than any of the other streamlined versions of RCM, retroactive approaches focus on maintenance workload reduction rather than plant performance improvement (which is the primary goal of function-oriented true RCM). The returns generated by using RCM purely as a tool to reduce maintenance costs are usually lower, sometimes one or two orders of magnitude lower, than the returns generated by using it to improve reliability. The use of the ostensibly cheaper retroactive approach therefore becomes self-defeating on economic grounds, in that it virtually guarantees much lower returns than true RCM.
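At bottom, the protective-device weakness is a scoping problem. A review that starts from existing tasks can only reach equipment that already has a task. The sketch below makes that explicit with a set difference. The device tags and task names are invented for illustration. The inventory of protective devices stands in for what a function-first analysis would have to uncover by identifying protective functions. In practice, as the quotation above notes, no one may even know that some of these devices exist.

```python
"""Minimal sketch: which protective devices a task-first review never reaches.

All device tags and task descriptions are hypothetical examples.
"""

# Protective devices a function-first analysis would identify by asking what
# each system must do, including its protective functions.
protective_devices = {
    "PSV-101", "PSV-102", "overspeed-trip-2",
    "low-oil-alarm-3", "standby-fire-pump", "high-level-switch-9",
}

# Existing PM program: task description -> device the task is performed on.
existing_tasks = {
    "function-test PSV-101": "PSV-101",
    "inspect overspeed-trip-2": "overspeed-trip-2",
}


def retroactive_scope(tasks):
    """A retroactive review starts from each existing task, so its scope is
    limited to the devices those tasks already touch."""
    return set(tasks.values())


# Devices that stay outside the analysis, and so keep receiving no attention.
never_reviewed = protective_devices - retroactive_scope(existing_tasks)
print(sorted(never_reviewed))
```

In this example, four of the six devices never enter the analysis, however well it is done, because no existing task points at them.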
Use of generic analyses
A fairly widely used shortcut in the application of RCM entails applying an analysis performed on one system to technically identical systems. In fact, one or two organizations even sell such generic analyses, on the grounds that it is cheaper to buy an analysis that has already been performed by someone else than it is to perform your own. The following paragraphs explain why generic analyses should be treated with great caution.
- Operating context. In reality, technically identical systems often require completely different maintenance programs if the operating context is different.
- For example, consider three pumps A, B, and C that are technically identical (same make, model, drives, pipework, valvegear, switchgear, and pumping the same liquid against the same head). The generic mindset suggests that a maintenance program developed for one pump should apply to the other two.
- However, pump A stands alone, so if it fails, operations will be affected sooner or later. As a result, the users and/or maintainers of pump A are likely to make some effort to anticipate or prevent its failure. (How hard they try will be governed both by the effect on operations and by the severity and frequency of the failures of the pump.)
- In contrast, if pump B fails, the operators simply switch to pump C, so the only consequence of the failure of pump B is that it must be repaired. As a result, the operators of B would likely at least consider letting it run to failure (especially if the failure of B does not cause significant secondary damage).
- On the other hand, if pump C fails while pump B is still working (for instance if someone cannibalizes a part from C), it is likely that the operators will not even know that C has failed unless or until B also fails. To guard against this possibility, a sensible maintenance strategy might be to run C from time to time to find out whether it has failed.
This example shows how three identical assets can have three totally different maintenance policies because the operating context is different in each case. In the case of the pumps, a generic program would have specified only one policy for all three pumps.
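The reasoning behind the three pump policies can be sketched as a small decision function. This is a deliberately simplified illustration, not the RCM decision diagram. The `Pump` fields and policy wording are hypothetical. Redundancy is the only operating-context factor modeled, whereas a real consequence evaluation would also weigh safety, environmental, operational, and cost effects.

```python
"""Simplified sketch: same pump, different operating context, different policy.

The fields and policy wording are illustrative assumptions; only redundancy is modeled.
"""
from dataclasses import dataclass


@dataclass
class Pump:
    name: str
    has_standby: bool  # another pump takes over if this one fails
    is_standby: bool   # this pump sits idle while a duty pump runs


def suggested_policy(pump: Pump) -> str:
    if pump.is_standby:
        # An idle standby pump can fail without anyone noticing (a hidden
        # failure) until the duty pump also fails, so check it periodically.
        return "failure-finding: run it from time to time to see whether it has failed"
    if pump.has_standby:
        # The failure is evident, and its main consequence is the repair,
        # so letting it run to failure is worth considering.
        return "consider run-to-failure"
    # A stand-alone pump: its failure affects operations sooner or later.
    return "proactive task to anticipate or prevent failure"


pumps = [
    Pump("A", has_standby=False, is_standby=False),
    Pump("B", has_standby=True, is_standby=False),
    Pump("C", has_standby=False, is_standby=True),
]
for p in pumps:
    print(p.name, "->", suggested_policy(p))
```

A generic analysis amounts to evaluating this once for one pump and copying the result to all three.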
Apart from redundancy, many other factors affect the operating context and hence affect the maintenance programs that could be applied to technically identical assets. These include whether the asset is part of a peak load or base load operation, cyclic fluctuations in market demand and/or raw material supplies, the availability of spares, quality and other performance standards that apply to the asset, the skills of the operators and maintainers, and so on.
- Maintenance tasks. Different organizations—or even different parts of the same organization—seldom employ people with identical skill sets. This means that people working on one asset may prefer to use one type of proactive technology (say high-tech condition monitoring), while another group working on an identical asset may be more comfortable using another (say a combination of performance monitoring and the human senses).
It is surprising how often this difference does not matter, as long as the techniques chosen are cost-effective. In fact, many maintenance organizations are starting to realize that there is often more to be gained from ensuring that the people doing the work are comfortable with what they are doing than from compelling everyone to do the same thing. (The validity of different tasks is also affected by the operating context of each asset. For instance, think how background noise levels affect checks for noise.)
Because generic analyses necessarily incorporate a “one size fits all” approach to maintenance tasks, they do not cater to these differences and hence have a significantly reduced chance of acceptance by the people who have to do the tasks.
These two points mean that special care must be taken to ensure that the operating context, functions and desired standards of performance, failure modes, failure consequences, and the skills of the operators and maintainers are all effectively identical before applying a maintenance policy designed for one asset to another. They also mean that an RCM analysis performed on one system should never be applied to another without any further thought just because the two systems happen to be technically identical.
Use of generic lists of failure modes
Generic lists of failure modes are lists of failure modes—or sometimes entire FMEAs—prepared by third parties. They may cover entire systems, but more often cover individual assets or even single components. These generic lists are touted as another method of speeding up or “streamlining” this part of the maintenance program development process. In fact, they should also be approached with great caution, for all the reasons discussed in the previous section of this article, and for the following additional reasons:
- The level of analysis may be inappropriate. It is possible to “drill down” almost any number of levels when seeking to identify failure modes (or causes of failure). The point at which this process should stop is the level at which it is possible to identify an appropriate failure management policy, and this can vary enormously depending once again on the operating context of the system. In other words, when establishing causes of failure for technically identical assets, it may be appropriate in one context to ask “why” it fails once, and in another it may be necessary to ask “why” seven or eight times.
However, if a generic list is used, this decision will already have been made in advance of the RCM analysis. For instance, all the failure modes in the generic list may have been identified as a result of asking “why” four or five times, when all that may be needed is level 1 (asking “why” only once). This means that far from streamlining the process, the generic list would condemn the user to analyzing far more failure modes than necessary.
Conversely, the generic list may focus on level 3 or 4 in a situation where some of the failure modes really ought to be analyzed at level 5 or 6. This would result in an analysis that is too superficial and possibly dangerous.
- The operating context may be different. The operating context of your asset may have features which make it susceptible to failure modes that do not appear in the generic list. Conversely, some of the modes in the generic list might be extremely improbable (if not impossible) in your context.
- Performance standards may differ. Your asset may operate to standards of performance which mean that your whole definition of failure may be completely different from that used to develop the generic FMEA.
These three points mean that if a generic list of failure modes is used at all, it should only ever be used to supplement a context-specific FMEA, and never used on its own as a definitive list.
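The point about the level of analysis can be illustrated with a small cause tree. In the hypothetical sketch below, `drill_down` keeps asking "why" only until a failure management policy can be identified. `generic_list` stops at a depth fixed in advance. The `Cause` class, the pump failure causes, and the `policy_identifiable` flags are all invented for illustration, and the depths are scaled down from those mentioned above. In a real analysis, whether a policy can be identified at a given level depends on the operating context.

```python
"""Hypothetical sketch: context-driven drill-down versus a fixed-depth generic list.

Level counts how many times "why" has been asked; the root is the first answer.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    policy_identifiable: bool  # can a suitable failure management policy be chosen here?
    sub_causes: list[Cause] = field(default_factory=list)


def drill_down(cause: Cause, level: int = 1) -> list[tuple[str, int]]:
    """Stop asking 'why' as soon as a policy can be identified (or nothing lies below)."""
    if cause.policy_identifiable or not cause.sub_causes:
        return [(cause.description, level)]
    return [m for sub in cause.sub_causes for m in drill_down(sub, level + 1)]


def generic_list(cause: Cause, fixed_level: int, level: int = 1) -> list[tuple[str, int]]:
    """Stop at a depth chosen in advance, regardless of operating context."""
    if level == fixed_level or not cause.sub_causes:
        return [(cause.description, level)]
    return [m for sub in cause.sub_causes for m in generic_list(sub, fixed_level, level + 1)]


mid_causes = [
    Cause("impeller worn", False, [Cause("abrasive solids in liquid", True),
                                   Cause("cavitation from low suction head", True)]),
    Cause("bearing seized", False, [Cause("lubricant contaminated", True),
                                    Cause("lubrication interval missed", True)]),
]
# Same causes, two contexts: with a standby pump, a policy can be chosen at level 1.
with_standby = Cause("pump stops delivering flow", True, mid_causes)
stand_alone = Cause("pump stops delivering flow", False, mid_causes)

print(drill_down(with_standby))            # one failure mode at level 1
print(len(generic_list(with_standby, 3)))  # four modes at level 3
print(drill_down(stand_alone))             # four modes at level 3
print(generic_list(stand_alone, 2))        # two modes at level 2
```

With a standby pump, the fixed-depth list yields four failure modes where one would do. For the stand-alone pump, stopping at level 2 misses the causes that a policy would actually target.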
Skipping elements of the process
Another common way in which the RCM process is “streamlined” is by skipping various elements of the process altogether. The step most often omitted is the definition of functions. Proponents of this methodology start immediately by listing the failure modes that might affect each asset, rather than by defining the functions of the asset under consideration.
They do so either because they claim that, especially in the case of a “non-safety-critical” plant, identifying functions does not contribute enough relative to the amount of time it takes (Ref. 13), or because they simply appear not to be aware that defining all the functions and the associated desired standards of performance of the assets under review is an integral part of the RCM process (Ref. 14).
In fact, it is generally accepted by all the proponents of true RCM that in terms of improved plant performance, by far the greatest benefits of true RCM flow from the extent to which the function definition step transforms general levels of understanding of how the equipment is supposed to work. So cutting out this step costs far more in terms of benefits forgone than it saves in reduced analysis time.
From a purely technical point of view, the identification of functions and associated desired levels of performance also makes it far easier to identify the surprisingly common situations (failure modes) where the asset is simply incapable of doing what the user wants it to do, and therefore fails too soon or too often. For this reason, eliminating the function definition step further reduces the power of the process.
The comments in the previous “retroactive approaches” section about having to identify functions anyway when reassessing failure consequences also apply here.
Analyze only “critical” functions or “critical” failures
The SAE Standard stipulates among other things that a true RCM analysis should define all functions, and that all reasonably likely failure modes should be subjected to the formal consequence evaluation and task selection steps.
The shortcuts embodied in some of the streamlined RCM processes try to analyze “critical” functions only, or to subject only “critical” failure modes to detailed analysis. These approaches have two main flaws:
- The process of dismissing functions and/or failure modes as being “non-critical” necessarily entails making assumptions about what a more detailed analysis might reveal. In the personal experience of the author, such assumptions are frequently wrong. It is surprising how often apparently innocuous functions or failure modes are found on closer examination to embody elements that are highly critical in terms of safety and/or environmental integrity. As a result, the practice of prematurely dismissing functions or failure modes results in much riskier analyses, but because the analysis is incomplete, no one knows where or what these risks are.
- Many of the streamlined processes that adopt this approach incorporate elaborate additional steps designed to “help” identify what functions and/or failure modes are critical or noncritical. In a great many cases, applying these additional steps takes longer and costs more than it would take to conduct a rigorous analysis of every function and every reasonably likely failure mode using true RCM, yet the output is considerably less robust.
Analyze only “critical” equipment
An approach to maintenance strategy formulation that is often presented as a streamlined form of RCM suggests that the RCM process should be applied to “critical” equipment only. This issue does not fall within the ambit of the SAE Standard, because the standard does not deal with the selection of equipment for analysis. It defines RCM as a process that can be applied to any asset, and it assumes that decisions about what equipment is to be analyzed and about system boundaries have already been made when the time comes to apply the RCM process defined in the standard. There were two reasons why the equipment selection process was omitted from the standard:
- Different industries use widely differing criteria to judge what is critical. For instance, the ability of assets to produce products within given quality limits is a major issue in manufacturing operations, and hence features prominently in assessments of criticality. However, this issue barely figures at all with respect to equipment used by military undertakings. This means that there is an equally wide range of techniques used to assess criticality—so wide that it is impossible to encompass this issue in one universal standard.
- There is a growing school of thought (with which the author has some sympathy) that there is no such thing as an item of plant, at least in an industrial context, that is so noncritical or nonsignificant that it does not justify analysis using RCM. Two of the main reasons for not dismissing systems or items of plant as noncritical prior to rigorous analysis are exactly the same as the reasons given in the previous section for not dismissing functions and failure modes in this way. (In fact, many organizations that choose to start with a formal, across-the-board equipment criticality assessment seem to spend as much time choosing and then applying an assessment methodology as they would have spent using true RCM to analyze all the equipment in their facility.)
There is a great deal more that could be said both in favor of and against the idea of using equipment criticality assessments as a means of deciding whether to perform rigorous analyses using techniques such as RCM. However, since criticality assessment techniques are not an integral part of the RCM process, they will not be discussed here. Suffice it to say that it is incorrect to present such techniques as streamlined forms of RCM because they do not form part of the RCM process as defined by the SAE Standard.
Is streamlined RCM worth the risk?
In nearly all cases, the proponents of the streamlined approaches to RCM outlined previously claim that these approaches can produce much the same results as true RCM in about a half to a third of the time. However, the above discussion indicates that not only do they not produce the same results as true RCM, but that they contain logical or procedural flaws which increase risk to an extent that overwhelms any small advantage they might offer in reduced application costs. See “True RCM is Faster.”
It also transpires that many of these streamlined techniques actually take longer and cost more to apply than true RCM, so even this small advantage is lost. As a result, the business case for applying streamlined RCM is suspect at best.
However, a rather more serious point needs to be borne in mind when considering these techniques. The very word “streamline” suggests that something is being omitted, and as has been indicated, this is indeed so for the streamlined techniques described. In other words, there is to a greater or lesser extent a degree of suboptimization embodied in all of these techniques.
Leaving things out inevitably increases risk. More specifically, it increases the probability that an unanticipated failure, possibly one with very serious consequences, could occur. If this does happen, as suggested in the section on regulatory issues, managers of the organization involved are increasingly likely to find themselves called personally to account. If worse comes to worst, they will have to explain, often in an emotionally charged courtroom confronted by bitterly hostile legal Rottweilers, what went wrong and why.
They will also have to explain why they deliberately chose a suboptimal decision-making process to establish their asset management strategies in the first place, rather than one that complies fully with a standard set by an internationally recognized standards-setting organization. They would have to convince not me, not their peers, and not their managers, but a judge and jury.
One rationale often advanced for using the streamlined methods is that it is better to do something than to do nothing. However, this rationale misses the point that all the analytical processes described above, streamlined or otherwise, require their users to document the analyses. This means that a clear audit trail exists showing all the key information and decisions underlying the asset management strategy, in most cases where none existed before. If a suboptimal approach is used to formulate these strategies, the existence of written records makes every shortcut much clearer to any investigators than it would otherwise have been. (This in turn may suggest that perhaps we should simply forget about all of these formal analytical processes. Unfortunately, the demand for documented analyses embodied in the second wave of safety legislation described in the section “Worldwide Regulatory Issues” does not allow us this option.)
A further rationale for streamlining says something like “we have been using this approach for a few years now and we haven’t had any accidents, so it must be all right.” This rationale betrays a complete misunderstanding of the basic principles of risk. Specifically, no analytical methodology can completely eliminate risk.
However, the difference between using a more rigorous methodology and a less rigorous methodology may be the difference between a probability of a catastrophic event of 1 in 1,000,000 versus 1 in 10,000. In both cases, the event may happen next year or it may not happen for thousands of years, but in the second case, it is a hundred times more likely. If such an event were to happen, the user of true RCM would be able to claim that he or she exercised prudent, responsible custodianship by applying a rigorous process that complies with an internationally recognized standard, and as such would be in a highly defensible position. Under the same circumstances, the user of streamlined RCM is on much, much shakier ground.
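The “hundred times” figure follows directly from the ratio of the two illustrative probabilities:

```latex
\frac{P_{\text{less rigorous}}}{P_{\text{more rigorous}}}
  = \frac{1/10\,000}{1/1\,000\,000}
  = \frac{1\,000\,000}{10\,000}
  = 100
```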
This version of the article includes bibliographic references. Otherwise, it is the same as that published in Maintenance Technology magazine.
1 Nowlan FS and Heap H: “Reliability-centered Maintenance”. Springfield, Virginia: National Technical Information Service, United States Department of Commerce, 1978
2 Maintenance Steering Group – 3 Task Force: “Maintenance Program Development Document MSG-3”. Washington DC: Air Transport Association (ATA) of America, 1993
3 International Society of Automotive Engineers: “JA1011 – Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes”. Warrendale, Pennsylvania, USA: SAE Publications
4 Netherton D: “SAE’s New Standard for RCM”. Maintenance (UK) 15 (1) 3 – 7, 2000
5 US Naval Air Systems Command: “NAVAIR 00-25-403: Guidelines for the Naval Aviation Reliability Centered Maintenance Process”. Philadelphia, Pennsylvania: US Department of Defense Publications
6 RCM Implementation Team, Royal Navy: “NES 45 Naval Engineering Standard 45, Requirements for the Application of Reliability-Centred Maintenance Techniques to HM Ships, Royal Fleet Auxiliaries and other Naval Auxiliary Vessels”. Foxhill, Bath, United Kingdom: UK Ministry of Defence Publications
7 UK Health & Safety Executive: “Train Accident at Ladbroke Grove Junction, 5 October 1999: Third HSE Interim Report”
8 Bartram P: “What Price a Life?”. Financial Director (UK), 2 August 2000
9 Various: “The Longford Royal Commission”. www.theage.com.au/special/gas/index.html
10 Bookless C & Sharkey M: “Streamlined RCM in the Nuclear Industry”. Maintenance (UK) 14 (1) 27 – 30, 2000
11 Jacobs KS: “Reducing Maintenance Workload Through Reliability-Centered Maintenance Processes”. ASNE Fleet Maintenance Symposium, San Diego, California, October 1997
12 Moubray JM: “Reliability-centered Maintenance”. New York, New York, USA: Industrial Press
13 Dixey M & Gallimore J: “Fast Track RCM – Getting Results from RCM”. Maintenance (UK) 15 (1) 8 – 11, 2000
14 Mundy SD: “Completing the Reliability Centered Maintenance Loop at a New Process Facility”. Reliability (USA) 7 (3) 30 – 33, 2000
Historical Overview
Reliability-centered maintenance (RCM), a process used to determine what must be done to ensure that any physical asset or system continues to do whatever its users want it to do, finds its roots in work done by the international commercial aviation industry. Driven by the need to improve reliability while containing the cost of maintenance, the aviation industry developed a comprehensive process for deciding what maintenance work is needed to keep aircraft airborne. This process has evolved steadily since its beginnings in 1960.
In 1978, the report “Reliability-Centered Maintenance” (Ref. 1) was prepared for the U.S. Department of Defense by F. Stanley Nowlan and Howard Heap of United Airlines. It described the then-current state of the process and formed the basis of the maintenance strategy formulation process called MSG-3 (Ref. 2) after the document produced by the Maintenance Steering Group of the Air Transport Association of America. MSG-3 was first promulgated in 1980, and in slightly modified form, it is used to this day by the international commercial aviation industry. In the early 1980s, RCM as described by Nowlan and Heap also began to be used in industries other than aviation.
It soon became apparent that no other comparable technique exists for identifying the true, safe minimum of what must be done to preserve the functions of physical assets. As a result, RCM has now been used by thousands of organizations spanning nearly every major field of organized human endeavor. It is becoming as fundamental to the practice of physical asset management as double-entry bookkeeping is to financial asset management.
The growing popularity of RCM has led to the development of numerous derivatives. Some of these derivatives are refinements and enhancements of Nowlan and Heap’s original RCM process. However, less rigorous derivatives have also emerged, most of which are attempts to “streamline” the maintenance strategy formulation process.
The Need for an RCM Standard
The evolution of the SAE Standard JA1011 Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes was described by Dana Netherton, chairman of the SAE RCM Committee, in the article “SAE’s New Standard for RCM” (Ref. 4) in the March 7, 2000 issue of Maintenance (U.K.), as follows.
Since the early 1990s, a great many organizations have developed variations of the RCM process. Some, such as the U.S. Naval Air Systems Command with its Guidelines for the Naval Aviation Reliability Centered Maintenance Process (NAVAIR 00-25-403) (Ref. 5) and the British Royal Navy with its RCM-oriented Naval Engineering Standard (NES45) (Ref. 6), have remained true to the process originally expounded by Nowlan and Heap. However, as the RCM bandwagon started rolling, a whole new collection of processes emerged that are called “RCM” by their proponents, but that often bear little or no resemblance to the original meticulously researched, highly structured, and thoroughly proven process developed by Nowlan and Heap. As a result, if an organization said that it wanted help in using or learning how to use RCM, it could not be sure what process would be offered.
Indeed, when the U.S. Navy recently asked equipment vendors to use RCM when building a new ship class, one U.S. company offered a process closely related to the 1970 MSG-2 process. It defended its offering by noting that its process used a decision-logic diagram. Since RCM also uses a decision-logic diagram, the company argued, its process was an RCM process.
The U.S. Navy had no answer to this argument, because in 1994 William Perry, the U.S. Secretary of Defense, had established a new policy on U.S. military standards and specifications. Under this policy, the U.S. military would no longer require industrial vendors to use military standards or specific processes. Instead, it would set performance requirements and allow vendors to use any process that produced equipment meeting those requirements.
The policy voided the U.S. military standards and specifications that defined “RCM.” The U.S. Air Force standard was cancelled in 1995. The U.S. Navy has been unable to invoke its standards and specifications with equipment vendors (though it continues to use them for its internal work), and it was unable to invoke them with the U.S. company that wished to use MSG-2.
This development happened to coincide with the interest in RCM in the industrial world. During the 1990s, magazines and conferences devoted to equipment maintenance multiplied, and magazine articles and conference papers about RCM became more and more numerous. These showed that very different processes were being given the same name, “RCM.” So both the U.S. military and commercial industry saw a need to define what an RCM process is.
In his 1994 memorandum, Perry said, “I encourage the Under Secretary of Defense (Acquisition and Technology) to form partnerships with industry associations to develop nongovernment standards for replacement of military standards where practicable.” The Technical Standards Board of the Society of Automotive Engineers (SAE) has had a long and close relationship with the standards community in the U.S. military, and has been working for several years to help develop commercial standards to replace military standards and specifications, when needed and when none existed.
So in 1996 the SAE began working on an RCM-related standard, when it invited a group of representatives from the U.S. Navy aviation and ship RCM communities to help it develop a standard for Scheduled Maintenance Programs. These U.S. Navy representatives had already been meeting for about a year in an effort to develop a U.S. Navy RCM process that might be common to the aviation and ship communities, so they had already done a considerable amount of work when they began to meet under SAE sponsorship. In late 1997, having gained members from commercial industry, the group realized that it was better to focus entirely on RCM. In 1998 the group settled on the best approach for its standard, and in 1999 it completed its draft, which the SAE then approved and published.
After a brief discussion about the practical difficulties associated with attempting to develop a universal standard of this nature, Netherton went on to say:
The standard now approved by the SAE does not present a standard process. Its title is, “Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes (SAE JA1011).” This standard presents criteria against which a process may be compared. If the process meets the criteria, it may confidently be called an “RCM process.” If it does not, it should not. (This does not necessarily mean that processes that do not comply with the SAE RCM standard are not valid processes for maintenance strategy formulation. It simply means that the term “RCM” should not be applied to them.)
Netherton then quoted Section 5 of the standard, which is reproduced here in the section “Key Attributes of Any RCM Process.”
Section 5 of SAE Standard JA1011 Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes summarizes the key attributes of any RCM process as follows:
Reliability-Centered Maintenance (RCM)–Any RCM process shall ensure that all of the following seven questions are answered satisfactorily and are answered in the sequence shown below:
a. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
b. In what ways can it fail to fulfill its functions (functional failures)?
c. What causes each functional failure (failure modes)?
d. What happens when each failure occurs (failure effects)?
e. In what way does each failure matter (failure consequences)?
f. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
g. What should be done if a suitable proactive task cannot be found (default actions)?
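The two structural conditions in the opening clause of Section 5 (all seven questions are answered, and they are answered in the order a to g) can be expressed as a simple check. The sketch below is illustrative only. It assumes that a candidate process can be described as an ordered list of steps, each tagged with the question it answers. The names `RcmQuestion` and `answers_all_in_sequence` are hypothetical and are not part of the standard. The sketch treats the questions as answered "in sequence" if the steps at which each question is first addressed follow the order a to g, which is an interpretive assumption. It cannot judge whether the answers are satisfactory, because that depends on the information and decisions the standard goes on to specify.

```python
from enum import IntEnum


class RcmQuestion(IntEnum):
    """Hypothetical labels for the seven SAE JA1011 questions, in required order (a-g)."""
    FUNCTIONS = 1
    FUNCTIONAL_FAILURES = 2
    FAILURE_MODES = 3
    FAILURE_EFFECTS = 4
    FAILURE_CONSEQUENCES = 5
    PROACTIVE_TASKS = 6
    DEFAULT_ACTIONS = 7


def answers_all_in_sequence(steps: list[RcmQuestion]) -> tuple[bool, str]:
    """Check only the structural conditions: every question appears, and the
    first occurrences of the questions follow the order a-g.

    Assumption: 'steps' lists, in order, which question each step of the
    candidate process answers. Whether answers are "satisfactory" is not assessed.
    """
    # Condition 1: all seven questions must be addressed somewhere.
    missing = [q.name for q in RcmQuestion if q not in steps]
    if missing:
        return False, "missing: " + ", ".join(missing)
    # Condition 2: the step index where each question is first answered
    # must increase from question a to question g.
    first_seen = [steps.index(q) for q in RcmQuestion]
    if first_seen != sorted(first_seen):
        return False, "questions answered out of sequence"
    return True, "all seven questions answered in sequence"


if __name__ == "__main__":
    full_process = list(RcmQuestion)
    print(answers_all_in_sequence(full_process))
    # (True, 'all seven questions answered in sequence')

    # A process that starts at failure modes never defines functions.
    starts_at_failure_modes = [
        RcmQuestion.FAILURE_MODES,
        RcmQuestion.FAILURE_EFFECTS,
        RcmQuestion.FAILURE_CONSEQUENCES,
        RcmQuestion.PROACTIVE_TASKS,
    ]
    print(answers_all_in_sequence(starts_at_failure_modes))
    # (False, 'missing: FUNCTIONS, FUNCTIONAL_FAILURES, DEFAULT_ACTIONS')
```

For example, a process that assesses consequences before identifying failure modes would pass the first check and fail the second. A process that omits the functions step would fail the first check even if everything else were done rigorously.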
To answer each of the above questions “satisfactorily,” the following information shall be gathered, and the following decisions shall be made. All information and decisions shall be documented in a way which makes the information and the decisions fully available to and acceptable to the owner or user of the asset.
The reaction of society as a whole to equipment failures is an aspect of physical asset management that is changing at warp speed.
The changes began with sweeping legislation governing industrial safety, mainly in the 1970s. Among the best-known examples of such legislation are the Occupational Safety and Health Act of 1970 in the United States and the Health and Safety at Work Act of 1974 in the United Kingdom. Similar laws have been passed in nearly all major industrialized countries. Their intent is to ensure that employers provide a generally safe working environment.
These Acts were followed by a series of more specific safety-oriented laws such as OSHA 1910.119: “Process Safety Management of Highly Hazardous Chemicals” in the United States and the “Control of Substances Hazardous to Health Regulations” in the United Kingdom. These regulations were first promulgated in the late 1980s and early 1990s. They are noteworthy examples of a then-new requirement for the users of hazardous materials to perform formal analyses or assessments of the associated systems, and to document the analyses for subsequent inspection, if necessary, by regulators.
These two sets of developments represent a steady increase in legal requirements to exercise, and to be able to demonstrate that we are exercising, responsible custodianship of the assets under our control. They reflect the steadily rising expectations of society in terms of industrial safety, and we have no choice but to comply as best we can.
The late 1990s saw even more changes, this time concerning the sanctions that society now wishes to impose if things go wrong. Until the mid-1990s, if a failure occurred whose consequences were serious enough to warrant criminal proceedings, these proceedings usually ended, at worst, with a substantial fine imposed on the organization found to be at fault, and the matter (at least from the criminal point of view) usually ended there. (Occasionally, the organization’s permit to operate was withdrawn, as in the case of the ValuJet airline after the crash in Florida on May 11, 1996. This effectively put the airline out of business in its then-current form.)
However, following recent disasters, a movement is now developing not only to punish the organizations concerned, but also to impose criminal sanctions on individual managers. In other words, under certain circumstances, individual managers can be sent to prison in connection with equipment failures that have sufficiently nasty consequences.
For instance, in the United Kingdom, John Prescott, the Deputy Prime Minister, whose department was responsible for transport, has stated that in light of the official inquiry into the Paddington rail crash (Ref. 7), which occurred in 1999, he will introduce legislation creating an offense to be called “corporate killing,” part of which will entail prison sentences for specific executives (Ref. 8). In the United States, following the outcry about accidents involving tire tread separation on SUVs, section 30170 of the “Motor Vehicle and Motor Vehicle Defect Notification Act” was revised in October 2000 to include prison sentences of up to 15 years for “directors, officers or agents” of vehicle manufacturers who commit specified offenses in connection with vehicles that fail in a way that causes death or bodily injury.
There is considerable controversy about the reasonableness of these initiatives, and even some doubt about their ultimate enforceability. However, from the point of view of people involved in the management of physical assets, the issue is not what is reasonable, but that we are increasingly being held personally accountable for actions that we take on behalf of our employers. Not only that, but if we are called to account in the event of a serious incident, it will be in circumstances that could culminate in jail sentences.
Perhaps the most startling legislative developments of all were triggered by an industrial accident that occurred in Australia. Following the Longford gas plant explosion (Ref. 9) in September 1998 in the state of Victoria, the Victorian State Parliament on November 13, 1998 added a new section to the State of Victoria Evidence Act of 1958 which reads as follows:
19D. Legal professional privilege
(1) Despite anything to the contrary in this Division, if a person is required by a commission to answer a question or produce a document or thing, the person is not excused from complying with the requirement on the ground that the answer to the question would disclose, or the document contains, or the thing discloses, matter in respect of which the person could claim legal professional privilege.
(2) The commissioner may require the person to comply with the requirement at a hearing of the commission from which the public, or specified persons, are excluded in accordance with section 19B.
In essence, this amendment suspended legal professional privilege (attorney-client confidentiality) for the purposes of the Longford inquiry and subsequent official inquiries.
In addition, the state governments of Victoria and Queensland are considering legislation on “Industrial Manslaughter” (Victoria) and “Corporate Culpability” (Queensland), because both governments believe that their current legislation does not deal adequately with industrial incidents causing death or serious injury. Victoria, prompted by the Longford incident, is leading the way. These proposed laws go further than those in the U.K. and the U.S. in that they introduce the concept of “aggregation of negligence,” which allows the actions and omissions of a group of employees and managers to be aggregated to establish that an organization is negligent. Both governments have made it clear that if managers and/or a management system fail to prevent a workplace death or serious injury, then the responsible manager and/or management team is likely to face criminal prosecution. If the legislation proceeds, penalties of over $500,000 and seven years’ imprisonment are proposed.
The message to us all is that society is getting so sick of industrial accidents with serious consequences that not only is it seeking to call individuals as well as corporations to account, but it is also prepared to alter well-established principles of jurisprudence to do so. Under these circumstances, everyone involved in the management of physical assets needs to take greater care than ever to ensure that every step they take in executing their official duties is beyond reproach. It is becoming professionally suicidal to do otherwise.
An interesting footnote to the debate about streamlined RCM concerns what exactly it is that is ostensibly being streamlined. Nearly all the advocates of streamlined processes compare their offerings to something they call “classical” RCM. However, closer study of what they mean by “classical” RCM reveals that it is often a monstrously complicated process or collection of processes that bears little or no resemblance to RCM as defined in the SAE Standard. In these cases, it is hardly surprising that streamlined RCM is cheaper and quicker than these so-called “classical” fantasies. In reality, if true RCM is applied by well-trained individuals to properly defined and managed projects, it is nearly always quicker and cheaper than the streamlined versions, in addition to being far more defensible and producing far greater returns.