Archive | January, 2003


2:44 am
January 2, 2003

Resources for Computerized Maintenance Management Systems

You already may have accessed your computerized maintenance management systems (CMMS) over the web or rented licenses for an entire CMMS through an Internet application service provider (ASP). This is a fairly new method for software application delivery that simply requires an Internet connection and a browser such as Microsoft Internet Explorer or Netscape Navigator. Experts seem to be in agreement that “renting” the use of software applications like CMMS over the Internet will grow substantially over the next few years.

The list of web-based CMMS companies is too extensive and the feature sets are too varied for this column; however, comprehensive directories of CMMS vendors are published online. Check with your CMMS vendor to learn more about its Internet options for hosting, accessing, and supporting your CMMS software.

Whether you are a single user, network user, or an Internet user of CMMS, there are a number of independent Internet resources that can help you increase software productivity.

One site, which is not connected with the publishers of Maximo software, was designed originally to assist Maximo users in connecting with each other to provide advice and exchange ideas for productivity. It has grown into a very impressive resource site for any CMMS user. Visit the download area for free failure code templates, RCM analysis codes, PM descriptions, report templates, MTBF tracker, and more.

Another site offers support specifically for users of SAP, the giant German enterprise software supplier for many of the world’s largest corporations. SAP is one of the most powerful enterprise software systems available, and it is also one of the most reviled because of the software’s complexity. Click the SAP CD tab to learn more about SAP training for the Plant Maintenance module. A list of helpful SAP resource links is also part of this specialized site.

A third site is running an online CMMS benchmarking survey with a goal of collecting data from 1000 CMMS users. It has logged over 250 responses and is growing daily. Survey participants get access to the benchmarking results in real time and in summary form to compare their CMMS productivity to others from around the world.

A fourth is an independent CMMS site with a wide variety of white papers, book excerpts, and presentations on various aspects of CMMS/EAM. Articles range from practical topics such as “ROI Calculation for CMMS Projects” to IT-related issues such as “.Net and the Future of Enterprise Asset Care.”

Perspective CMMS is a British CMMS consultant’s site that gives away many of his secrets online at no cost. It also offers a CMMS audit by e-mail.

We hope you find these CMMS resources useful and that you will share useful maintenance sites with us for future columns. Please send your comments, suggestions, and web sites. MT



7:50 pm
January 1, 2003

Understanding Hidden Failures in RCM Analyses

Addressing hidden failure modes is a key aspect for successfully achieving plant reliability.

Reliability Centered Maintenance (RCM) is not new. Airline Maintenance Steering Group (MSG) Logic, the predecessor to RCM, has existed since the early 1960s. F. Stanley Nowlan and Howard Heap of United Airlines introduced formal RCM to the commercial aviation industry in 1978. Airline reliability is primarily based on this work. The vision is as relevant today as it was when the first edition of Reliability Centered Maintenance was published in 1978.

Today, almost everyone in a manufacturing, power generating, or technological environment is familiar with the concept of RCM. However, the perceived degree of familiarity with RCM may be deceiving. RCM is simple in concept but also sophisticatedly subtle in its application.

As with many processes, a simplistic and limited understanding of RCM may prove more problematic than beneficial. It is unrealistic to believe that a superficial implementation of the process will be a panacea for plant equipment problems, and then to depend on that process to produce significant reliability results.

Analyzing a system
The simple understanding of RCM consists of identifying system functions, functional failures, consequences of those failures, etc. However, Nowlan and Heap gave great importance to understanding hidden failures, which are not widely understood and are often overlooked when performing an RCM analysis.

The true reliability benefits of RCM become evident only with a thorough understanding of how to functionally analyze a system. Understanding hidden failure modes, understanding when a single-failure analysis is not acceptable, and understanding when run-to-failure (RTF) is acceptable, are the real cornerstones of RCM. Additionally, the subtle but important distinction between true redundancy and redundant components fulfilling a backup function is also a key to reliability success.

Many utilities and other industries have implemented an RCM program only to find that they continued to have fundamental reliability issues that were not addressed by their analysis. The primary reason is the lack of a grass-roots philosophical understanding of the principles governing the analysis.

Identifying important equipment
Optimizing a preventive maintenance program consists of three phases: Phase 1, identifying equipment that is important to plant safety, operation, and asset protection; Phase 2, specifying the requisite PM tasks for the equipment identified in Phase 1; and Phase 3, properly executing the tasks specified in Phase 2.

At the very least, identifying equipment important to plant safety, operation, and asset protection consists of three programmatic principles that must be well understood before commencing an RCM analysis.

  1. Understand the cornerstones for developing an effective RCM program.
  2. Identify the defensive strategies for maintaining an effective RCM program.
  3. Identify when a component can be classified as RTF and understand the limitations governing RTF components.

A look at each of these principles in detail will illustrate the key areas for successfully achieving plant reliability and maximizing cost containment efforts.

Understand the cornerstones
There are three cornerstones that must be understood for developing an effective RCM program:

  • Know when a single-failure analysis is not acceptable.
  • Identify hidden failures.
  • Know when a multiple-failure analysis is required.

A single-failure analysis is not acceptable when the occurrence of the failure is hidden. When a component is required to perform its function and the occurrence of the failure is not evident to operating personnel, that is, the immediate overall operation of the system remains unaffected in either the normal or demand mode of operation, then the failure mode is defined as hidden.

[Fig. 1]

A multiple-failure analysis is required when the occurrence of a single failure is hidden. Addressing hidden failure modes is a key aspect for maintaining plant reliability.

Identify the defensive strategies
There are three distinct lines of defense for maintaining an effective RCM program. The first strategy for defending a plant against unplanned equipment failures is identifying critical components. These are components where a single failure will result in one or more consequences similar to the following:

  • A direct impact to personnel or plant safety.
  • A plant trip or shutdown of a manufacturing facility.
  • A power reduction, down power, or the loss of a facility’s operational capability.
  • An inadvertent actuation of a safety system.
  • An unplanned forced outage.
  • Other (depending on specific type of plant or industry)

The second line of defense for protecting a plant or facility is to identify what this author refers to as potentially critical components. These are components whose failure, when they are called upon to function, is hidden and will not have an immediate effect on the plant. However, the hidden failure in combination with one or more additional failures will result in consequences similar to the following:

  • A direct impact to personnel or plant safety.
  • A plant trip or shutdown of a manufacturing facility.
  • A power reduction, down power, or the loss of a facility’s operational capability.
  • An inadvertent actuation of a safety system.
  • An unplanned forced outage.
  • Other (depending on specific type of plant or industry)

Note the similarities between critical and potentially critical components. The only difference is that critical failures manifest themselves immediately while failures of potentially critical components are hidden and will not manifest themselves until a second, multiple failure occurs.

To better understand the concept of potentially critical components (which is totally different from the potential failure of a given component) consider the following example.

[Fig. 2]

When two or more components (valves, pumps, motors, etc.) operate in parallel flow paths to supply a function but only one component is required to fulfill the function, and there is no indication of failure for each component individually, then a failure of one of the components will be hidden (there will be no indication the component has failed) and the failure will not result in a plant effect. However, if the second component should fail, then a plant-affecting consequence would occur. Hence, the component is considered to be potentially critical.

Another example involves a pump discharge check valve. If there are two pumps operating at the same time, a failure of the check valve in the open position will be hidden. Only when one pump fails will the unwanted reverse flow path through the failed open check valve become evident.

How prevalent are hidden failures? Extremely. Just a few examples include main turbine overspeed components, many check valves, diesel generator fuel oil pumps, and emergency diesel generator shutdown components. Identifying potentially critical components affords perhaps the greatest degree of reliability protection for a plant or facility.

Hidden failures are typically failures of one or more components aligned in parallel with no indication of failure for each individual component. In Fig. 1 for example, one of the two components could fail but since each one by itself can satisfy the function, only when the second one fails will the functional failure become evident; therefore, the failure of the first component is potentially critical.
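The parallel arrangement described above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis: the `Component` class, the `has_failure_indication` flag, and the pump names are hypothetical, and the model assumes that a functional failure becomes evident as soon as too few components remain in service.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    failed: bool = False
    # Assumption: True only if the component has its own alarm or indicator.
    has_failure_indication: bool = False


def function_lost(components, required=1):
    """The function is lost when fewer than `required` components still work."""
    working = sum(1 for c in components if not c.failed)
    return working < required


def hidden_failures(components, required=1):
    """Failed components nobody can see: no individual indication, and the
    function is still being supplied by the remaining components."""
    if function_lost(components, required):
        return []  # the functional failure itself is now evident
    return [c.name for c in components
            if c.failed and not c.has_failure_indication]


pumps = [Component("pump_a"), Component("pump_b")]

pumps[0].failed = True   # first failure: no plant effect, nothing indicates it
print(function_lost(pumps), hidden_failures(pumps))

pumps[1].failed = True   # second failure: the function is lost
print(function_lost(pumps), hidden_failures(pumps))
```

In this sketch the first print shows the function still available while `pump_a` sits failed and unnoticed, which is the potentially critical condition; the second shows the function lost once the other pump fails.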

How important is this concept? Very. There are many examples in industry where a designer intentionally builds in multiple redundancy to ensure reliable system operation. Unfortunately, if the redundancy has no way of manifesting itself when it fails, a plant-affecting consequence can occur with the second failure.

There is a vast difference between a component operating in a backup function and one that is not (Fig. 2). In Example 2, the component is an RTF component while the component in Example 4 is critical.

The third line of defense to protect a plant is to identify economically significant components. These are components whose failure will not be critical or potentially critical, but will result in one or more of these economic concerns:

  • An unacceptable cost of replacement or restoration.
  • An unacceptable corrective maintenance history.
  • A long lead time for replacement parts.
  • An obsolescence issue.
  • Other (depending on specific type of plant or industry)

Failures of economic components have no effect on plant safety or operability. Economic failures will result only in labor and/or parts replacement costs. It is important to keep this economic categorization separate from critical and potentially critical components to enable a prioritization of work.

Note: If a failure occurs to a major piece of equipment (even if it is economically significant) but it results in an effect on plant safety, operation, or a plant outage, it would be more than merely an economic consideration. It would be captured as either a critical or potentially critical consequence of failure.

Identify RTF components, understand limitations
RTF in its most basic definition means PMs are not required prior to failure. This does not imply that the component is unimportant and never needs to be fixed. Corrective maintenance is required in a timely manner after failure to restore the component to an operable status. RTF components are understood to have no safety, operational, commitment, or economic consequences as the result of a single failure. Also, the occurrence of failure must be evident to operations personnel.

RTF components are designated as such because a failure is evident and there is no significant consequence from a single failure. If it does not matter whether a failed component is ever restored to an operable status, one would question why that component is even installed in the plant.
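The three lines of defense and the RTF rule can be read as a decision sequence. The following is a minimal sketch under that reading; the function name, the four yes/no inputs, and the `NEEDS_REVIEW` outcome are assumptions made for illustration and are not a procedure given in the article.

```python
from enum import Enum


class Category(Enum):
    CRITICAL = "critical"
    POTENTIALLY_CRITICAL = "potentially critical"
    ECONOMIC = "economically significant"
    RTF = "run-to-failure"
    # Assumption: a catch-all for cases the article does not classify.
    NEEDS_REVIEW = "needs review"


def classify(single_failure_effect, failure_evident,
             combined_failure_effect, economic_concern):
    """Apply the lines of defense in order, then the RTF rule."""
    if single_failure_effect:
        # A single failure affects safety or operation immediately.
        return Category.CRITICAL
    if not failure_evident and combined_failure_effect:
        # Hidden failure that matters only together with further failures.
        return Category.POTENTIALLY_CRITICAL
    if economic_concern:
        return Category.ECONOMIC
    if failure_evident:
        # Evident failure with no significant single-failure consequence.
        return Category.RTF
    return Category.NEEDS_REVIEW


# A redundant pump with no individual failure indication.
print(classify(single_failure_effect=False, failure_evident=False,
               combined_failure_effect=True, economic_concern=False))
```

Keeping the economic check after the two criticality checks mirrors the note above: a failure that affects safety or operation is captured as critical or potentially critical even when it is also costly.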

The heart of reliability is a sound preventive maintenance program and RCM provides the most prudent approach for establishing an effective PM program. MT

Neil Bloom is program manager, RCM and preventive maintenance programs, at Southern California Edison’s San Onofre Nuclear Generating Station. He previously worked in the commercial airline industry in both maintenance and engineering management positions. He can be reached at Mail Unit K-50, P.O. Box 128, San Clemente, CA 92672; (949) 368-6378




7:45 pm
January 1, 2003

Critical Component of the CMMS: The Repair Work Order

The better and more consistently repair activities are recorded, the greater the potential for yielding more specific information about an operation.

From the very start, the implementation of a computerized maintenance management system (CMMS) is a long and arduous process. One of the largest concerns is how to effectively get the correct data into the system in the first place, and then, how to get useful information out.

What follows can provide a method to get better data into the CMMS with every work repair request. The yield is more and better data for analysis, which answers the all-important question in the long-term evaluation of the implementation: is this information tool providing useful information?

There is no replacement for a good, integrated implementation plan that covers the setup of the database, training, data design and collection, etc. Consider this as an enhancement to be added to the existing plan.

Repair data
Basic repair data fields come in four categories:

  • Origination
  • Planning
  • Scheduling
  • Results

Origination data includes the emergency flag, the original observer of the problem and how the person can be reached, the equipment experiencing the problem, and a problem description. This data must be obtained to effectively get labor and materials assigned to the job.
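As a rough illustration, the origination data listed above could be held in a small record that reports which text fields were left blank. The class and field names here are hypothetical and do not come from any particular CMMS.

```python
from dataclasses import dataclass, fields


@dataclass
class Origination:
    emergency: bool            # emergency flag
    observer: str              # who first observed the problem
    observer_contact: str      # how that person can be reached
    equipment_number: str      # equipment experiencing the problem
    problem_description: str   # what was observed

    def missing_fields(self):
        """Return the names of text fields that were left blank."""
        return [f.name for f in fields(self)
                if isinstance(getattr(self, f.name), str)
                and not getattr(self, f.name).strip()]


request = Origination(
    emergency=False,
    observer="J. Doe",
    observer_contact="",
    equipment_number="PUMP-101",
    problem_description="Saw a leak at the discharge flange",
)
print(request.missing_fields())
```

A check like this could run when the request is entered, so that a work order is not released for labor and materials assignment with a blank equipment number or contact.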

Although it is most important to get all data consistently and correctly into each field, most problems occur at work order origination and multiply as the work order is processed. See accompanying section “Work Order Data Fields.”

The two most important fields at the origination of a repair are the equipment number and the problem description. The equipment number is needed to get the person to the correct equipment, as well as to ensure charges are posted back to that piece of equipment for historical detail as well as summary analysis of its department, process, unit, etc.

The importance of the problem description cannot be overstated. Whenever a CMMS is implemented, every person who may originate a work order should be trained to call in (type in, write in, etc.) the problem description. This should include what was observed that prompted the call. Sample bad problem descriptions:

  • “It’s not working” or “It’s down.”
  • “It’s broken.”
  • “It sounds like it is going to fail.”

Bad problem descriptions do not provide enough descriptive data and they lead to bad descriptive results such as:

  • “It’s working” or “It’s up.”
  • “It’s fixed.”
  • “Sounds OK to me, just a little noisier than normal.”

If the historical records within the system contain descriptions similar to these, plan to retrain everyone immediately and include a sample of these records to show how useful (or not) they are for historical analysis.

More effective descriptions would be based on what the observer/originator of the problem sensed:

  • Saw a leak
  • Heard excessive gear grinding or a pop in the disconnect panel
  • Smelled something unusual burning
  • Felt excessive vibration at normal run speed
  • Tasted like there was too much syrup, but the controls indicate the proper mix

These are oversimplified examples, but a trained mechanic can identify a starting point and promote a response that is more descriptive of the cause. For historical purposes, this can be invaluable in looking at repetitive problems and working toward engineering them out of existence.
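The contrast between the bad and the more effective descriptions above can be turned into a simple screening aid. The sketch below is an assumption-laden illustration: the phrase list and the sense-verb list are taken from the examples in this article, and a real site would need its own lists and would still rely on training rather than on a keyword check.

```python
import re

# Hypothetical lists built from the sample descriptions in the article.
VAGUE_PHRASES = ("not working", "it's down", "broken", "going to fail")
SENSE_VERBS = {"saw", "heard", "smelled", "felt", "tasted"}


def review_description(text):
    """Return a list of notes; an empty list means nothing was flagged."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    words = set(re.findall(r"[a-z']+", lowered))
    notes = []
    if any(phrase in lowered for phrase in VAGUE_PHRASES):
        notes.append("vague wording")
    if not words & SENSE_VERBS:
        notes.append("no observation stated (saw/heard/smelled/felt/tasted)")
    return notes


print(review_description("It's broken."))
print(review_description("Felt excessive vibration at normal run speed"))
```

The first description is flagged on both counts, while the second passes because it states what the originator sensed.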

Using a basic repair order
A better understanding of why proper problem descriptions should be used is probably the biggest and least expensive way to make a major leap in repair data capture.

A basic repair work order has room for free form text, but also specific codes that can be selected to help sanitize what is reported about the work, specifically to enhance analysis, expedite reporting, and, at the same time, not overburden the mechanic with paperwork.

Results data should at a minimum include the skill/trade that completed the work, work time, a work description (what was done), materials used and/or costs, a cause code, downtime, and an assessment of the repair.

The skill should have an associated wage (or wage plus burden) rate so that, when combined with the work time, hours may be converted to costs for charging back to each piece of equipment and its associated grouping codes (department, unit, etc.). The work description should explain the action taken on what part of the equipment.
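The conversion of hours to costs is simple arithmetic: labor cost equals the skill's rate multiplied by the work time, summed per piece of equipment and per grouping code. A minimal sketch follows; the rates, skill names, equipment numbers, and departments are all hypothetical.

```python
from collections import defaultdict

# Hypothetical wage-plus-burden rates, in currency units per hour.
WAGE_RATES = {"electrician": 40.0, "millwright": 36.0}


def labor_cost(skill, hours, rates=WAGE_RATES):
    """Convert work time to cost using the rate for the skill."""
    return rates[skill] * hours


def charge_back(entries, rates=WAGE_RATES):
    """Sum labor cost by equipment and by department.

    `entries` is an iterable of (equipment, department, skill, hours).
    """
    by_equipment = defaultdict(float)
    by_department = defaultdict(float)
    for equipment, department, skill, hours in entries:
        cost = labor_cost(skill, hours, rates)
        by_equipment[equipment] += cost
        by_department[department] += cost
    return dict(by_equipment), dict(by_department)


entries = [
    ("PUMP-101", "Utilities", "electrician", 1.5),
    ("PUMP-101", "Utilities", "millwright", 2.0),
    ("CONV-7", "Packaging", "millwright", 0.5),
]
equipment_costs, department_costs = charge_back(entries)
print(equipment_costs)   # {'PUMP-101': 132.0, 'CONV-7': 18.0}
print(department_costs)  # {'Utilities': 132.0, 'Packaging': 18.0}
```

Materials costs could be added to the same totals in the same way once part usage is recorded against the work order.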

If recording downtime, it must be defined and all personnel must be familiar with how it is charged and used. The most common discrepancy comes when a machine is out of service for a maintenance reason during a nonoperating shift for that piece of equipment. Is it down?

Get materials costs
Materials used and their costs are helpful for keeping inventory up to date and charging materials to each piece of equipment.

Having the material identified by its tracking number in the inventory control system (whether or not this is a module in the CMMS) is essential for documenting proper part usage and tracking and bill of material building. This is one of the first areas of potential interface when the nonproduct material is maintained by an organization/system outside of maintenance.

This type of interface would allow documentation of the part number, and then the cost could be brought over from the parts inventory control system if it is not in the CMMS.

This cost is especially important in light of the fact that many materials costs can exceed labor costs significantly, and both are necessary to properly assess the maintenance requirements and history of a given piece of equipment.

Assess the repair, use codes
Although additional comments about a job may not be entered, it is a good idea to get the mechanic’s assessment of the repair at least to the point where the repair is identified as “temporary” or not. A temporary repair is most often done to get the operation through the shift, and subsequently a relatively permanent repair is completed at a more convenient time.

For each repair, an assessment should be provided by the mechanic. This comment may indicate the repair was temporary, and if so, it should be followed by a recommendation indicating what needs to be done to make it more permanent.

Last, and not least, is some type of repair cause code. The reason a code is used instead of a description is to begin to categorize the repairs for easier analysis. Once statistical analysis is completed, the more significant individual items can be further analyzed by review and evaluation of their details.
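To show why a code is easier to analyze than free text, the sketch below tallies cause codes across completed work orders and then pulls the details behind one code. The codes ("LEAK", "ELEC", "ADJ"), the dictionary layout, and the sample records are hypothetical.

```python
from collections import Counter


def rank_causes(work_orders):
    """Count repairs per cause code, most frequent first."""
    counts = Counter(wo["cause_code"] for wo in work_orders)
    return counts.most_common()


def details_for(work_orders, cause_code):
    """Work descriptions behind one cause code, for closer review."""
    return [wo["work_description"] for wo in work_orders
            if wo["cause_code"] == cause_code]


history = [
    {"cause_code": "LEAK", "work_description": "Replaced shaft seal"},
    {"cause_code": "ELEC", "work_description": "Reset tripped breaker"},
    {"cause_code": "LEAK", "work_description": "Tightened flange bolts"},
    {"cause_code": "LEAK", "work_description": "Replaced gasket"},
    {"cause_code": "ADJ", "work_description": "Realigned belt guard"},
    {"cause_code": "ELEC", "work_description": "Replaced contactor"},
]
print(rank_causes(history))  # [('LEAK', 3), ('ELEC', 2), ('ADJ', 1)]
print(details_for(history, "LEAK"))
```

The ranking points to the most significant category first; the detail lookup then supports the review of individual items.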

Codes in the CMMS represent a great potential advantage for accelerating recording of repairs, as well as their analysis, but can be extremely dangerous if overused. There should not be so many code fields, and/or codes per field, that it requires a separate page to list the possibilities, and someone must read through them for each repair.

For example, a CMMS may contain fields for problem, failure, cause, root cause, solution, or action verb/noun combinations, etc. For each field, there may be 40 or 50 possibilities, and probably more. This just makes it take longer to complete the work order and often leads to more specific codes being added, thus making the recording process even more complicated.

An important aspect to documenting work is simplifying the process. Use codes that are broad in nature, and relevant to the process environment wherever possible. An invaluable source of these codes is a review of historical activities that probably exist on manual records. Causes can be derived from work descriptions entered even if they are only to categorize parts problems from electrical, leaks, adjustments, etc.

Multi-line work order
The basic work order example is considered the workhorse for capturing planned and unplanned work, and provides areas to document extensions when the work is carried over for virtually any reason (scheduling, availability of materials, etc.).

A multi-line work order that mechanics would have at the beginning of their shift is typically used to capture work that is often unplanned and would be completed during the shift. Items carried over are typically referenced from here and transferred to the basic work order form for future execution.

The better and more consistently repair activities are recorded, the greater the potential for yielding more specific information about the operation, in both qualitative and quantitative terms. The more quickly this can be done, the sooner actual activities will be reported into the CMMS, and a useful history will be built that can be more easily analyzed through statistical methods. MT

Christopher N. Winston is an independent professional in the Detroit, MI, area contracted to HSB Reliability Systems Group, 1701 N. Beauregard St., Alexandria, VA 22311. He has more than 18 years’ CMMS implementation and business system analysis experience and has a bachelor of science degree in mechanical engineering.

Work Order Data Fields

  • Machine
  • Problem description
  • Emergency flag (Yes/No)
  • Skill/trade
  • Work (done) description
  • Parts requested
  • Wrench or work time
  • Action codes (problem, cause, downtime, failure mode, solution, reason not done, etc.)
  • Downtime
  • Repair assessment (i.e., temporary?)
  • Originator/requester
  • Job number
  • Budget/actual cost
  • Multiple authorizations
  • Job status
  • Parts/material usage
  • Project number
  • Safety/special requirements (JSA, scaffold, formed pit, etc.)
  • Permits (hot work, confined space entry, etc.)
