Archive | 2003


3:20 am
February 2, 2003

Improve Reliability With Mobile, Wireless Workflow Technology

Progressive organizations are using mobile, wireless workflow technology to streamline their processes and maximize equipment reliability.

It is no secret that each year petroleum refiners are spending less to process a barrel of crude oil into marketable fuel. Corporations realize they have to go beyond working harder to stay competitive. Forward-thinking companies continuously find more cost-effective ways to streamline their processes by leveraging maintenance technologies to improve the bottom line. It is survival of the fittest for the refining industry: fail to perform optimally over a substantial period and chances are you will end up as the slowest in the herd, eventually falling out of the pack or, worse, being consumed by a more efficient organization.

Progressive organizations are using mobile, wireless workflow technology to streamline their processes and maximize equipment reliability to stay ahead of the competition. This article reviews the reliability goals (business drivers) and the results of implementing an operator driven reliability (ODR) process using mobile, wireless workflow technology at a Gulf Coast refinery.

The first step

Valero Energy Corp. purchased a Gulf Coast, lower-quartile performing refinery in 1997 from a crude oil trading company. After an infusion of refining management, a reliability process was initiated using consultants to facilitate teams that were formed to yield reliability and quartile ranking improvements. Although several teams were successful, two became paramount: the reliability measurements and the ODR teams. Reliability measurements made sense and became a way of life–you cannot manage what you cannot measure. The metrics for reliability measurements were completed and implemented in early 1999, while ODR blossomed in early 2000.

Both teams identified a need to streamline operator rounds and obtain technology to facilitate the review of all data collected. These needs became key requirements for the ODR process. It was essential to develop trending capability of field-collected data to identify equipment in early phases of failure. If problems were detected early, the equipment could be halted and corrective action taken to minimize the cost to repair.

Profits can be increased by producing more or spending less. Management hoped the process of increasing reliability without adding personnel or workflow would result in less spending. To gain data consistency through the implementation of a best practice, the team identified the need to eliminate paper and implement electronic devices where data would be entered and seamlessly transferred to a system that was available for analysis by interested parties. The goal was for operators to accept ownership of their equipment and to accept a new system, one that enforced a best practice workflow process.

After evaluating available technologies, IntelaTrac from SAT Corp., Houston, TX, was chosen. IntelaTrac is industry workflow automation software used on rugged mobile handheld devices. The software aligned with management’s vision for a tool that crosses all disciplines and collects data for analysis—from operations to maintenance, environmental, and turnarounds. As a bonus, the software leveraged existing legacy software systems including the Aspen process historian and SAP.

While in front of equipment, operators can take immediate, preventive actions on their routine rounds. Asset status can be documented with bar code technology or radio frequency identification (RFID) tags. An audit trail is preserved, and the data is uploaded to back-end systems for additional analysis in other departments.

During the ODR implementation, a review was performed of what field data should be collected. Input from several departments ensured that all collected field data would be of value and analyzed. The golden rule was, “if data is not going to be analyzed then it will not be collected.” The final result provided key visual, vibration, and temperature readings that provide early detection of equipment degradation, and include several preventive maintenance activities that previously were performed by maintenance personnel. This is beneficial to the maintenance craft as they have more time to perform craft skills activities.

Results in phases

The first phase was a pilot installation of the software and the mobile technology in three process units within the complex. Several specific assets in the early stages of failure were detected and failure was prevented. The return on investment (ROI) for the pilot from that failure prevention indicated a three-month payback. Satisfied with the results, the site rolled out ODR to the remainder of the complex. Similar results were observed and a two-month ROI was determined.

Over a 12-month period, the early detection of equipment failures was recorded and a dollar value assigned. To define the value, historical information from SAP provided the average cost of the specific equipment repair that was used as a baseline against any items identified using the software. The cost to repair equipment identified through early detection was then subtracted from the historical baseline average. Items found included worn bearings and seals, equipment out of alignment, unexpected process conditions (plugged strainers or seal pots), and coupling adjustments. Each item found to have a problem was taken out of service before having a catastrophic failure event. More than $558,000 in savings and maintenance avoidance was identified.
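The valuation method described above can be sketched in a few lines of Python. This is a minimal illustration only: the equipment names and dollar figures are invented, and the class and function names are hypothetical. Only the method itself (historical average repair cost minus the cost of the early repair) comes from the article.

```python
from dataclasses import dataclass


@dataclass
class EarlyCatch:
    """One problem found on operator rounds before catastrophic failure."""
    equipment: str
    baseline_avg_repair_cost: float  # hypothetical historical average repair cost
    early_repair_cost: float         # hypothetical cost of the early repair


def avoided_cost(catch: EarlyCatch) -> float:
    """Savings for one item: historical baseline minus actual early-repair cost."""
    return catch.baseline_avg_repair_cost - catch.early_repair_cost


def total_avoided_cost(catches: list[EarlyCatch]) -> float:
    """Sum the avoided cost over every early catch in the period."""
    return sum(avoided_cost(c) for c in catches)


# Invented example data, not the refinery's figures.
catches = [
    EarlyCatch("pump bearing", 12_000.0, 2_500.0),
    EarlyCatch("seal pot", 8_000.0, 1_000.0),
]
print(total_avoided_cost(catches))  # 16500.0
```

Keeping the baseline and the early-repair cost as separate fields makes it easy to audit each claimed save individually.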

Measuring the result

To align with the reliability measurements effort, total work order costs were reviewed. Fig. 1 illustrates total monthly work order cost for the complex implementing ODR.

An equal period pre- and post-IntelaTrac implementation is illustrated. Total work order costs during the period support the findings in specific equipment saves.

Total work order cost was analyzed for the months prior to the ODR process and software implementation. Comparing the 44 months prior to ODR implementation to the 12 months that followed pointed out the obvious: the complex monthly work order costs had been reduced by more than $87,000 per month, a 33 percent improvement. With this result, a complete ODR site rollout was justified and is currently in progress.

Fig. 2 is a Crow/AMSAA reliability growth diagram illustrating a positive step change in work order cost over time, and a prediction of the amount to be saved in avoided maintenance at a future referenced date.

Crow/AMSAA charts plot a cumulative sum against cumulative time on log-log axes. Initially used by J. T. Duane of General Electric to plot cumulative failures over time and later mathematically proven by Dr. L. Crow of the U.S. Army Materiel Systems Analysis Activity (AMSAA), the plot is a fairly accurate model and predictor of maintenance costs.

Each point in the graph represents maintenance costs for one month. After a few points are plotted, a best-fit line can be drawn through them. Beta, the slope of the line, shows whether improvements are taking place. A slope greater than 1.0 indicates that reliability efforts are failing, while a slope less than 1.0 indicates that they are succeeding. A slope of 1.0 is simply treading water. Lambda is the Y intercept of the work order cost line. R2 measures how well the work order cost line fits the monthly cost points (the closer to 1.000, the better the fit and the more trustworthy the slope).
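The fit described above can be sketched as an ordinary least-squares line through the logarithm of cumulative cost versus the logarithm of the month number. This is an illustrative sketch, not the tool used at the refinery. The function name and sample costs are hypothetical, and Lambda is taken here as the fitted cumulative cost at month 1 (the intercept on log axes).

```python
import math


def crow_amsaa_fit(monthly_costs):
    """Fit log(cumulative cost) = log(lambda) + beta * log(month).

    Returns (beta, lambda, r_squared). Needs at least two months of
    strictly positive costs so every logarithm is defined.
    """
    cumulative = []
    running_total = 0.0
    for cost in monthly_costs:
        running_total += cost
        cumulative.append(running_total)

    xs = [math.log(month) for month in range(1, len(cumulative) + 1)]
    ys = [math.log(total) for total in cumulative]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    beta = sxy / sxx                          # slope of the log-log line
    lam = math.exp(mean_y - beta * mean_x)    # intercept, back on the cost scale
    r_squared = (sxy * sxy) / (sxx * syy)     # goodness of fit
    return beta, lam, r_squared


# Invented data: flat spending gives a slope of 1.0 ("treading water").
flat_beta, flat_lam, flat_r2 = crow_amsaa_fit([100.0] * 12)
# Rising monthly spending gives a slope above 1.0; falling spending, below 1.0.
rising_beta, _, _ = crow_amsaa_fit([100.0 * m for m in range(1, 13)])
falling_beta, _, _ = crow_amsaa_fit([100.0 * m for m in range(12, 0, -1)])
print(round(flat_beta, 3), rising_beta > 1.0, falling_beta < 1.0)
```

Fitting separate lines to the months before and after a change, as the article does, amounts to calling the function once per period.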

The figure illustrates the total work order cost each month for the complex implementing ODR. An equal period pre- and post-software implementation is illustrated. A best fit line was drawn for months prior to ODR and the slope calculated. A 1.24 slope was determined indicating that a step change, almost a paradigm shift, was warranted. A second best fit line was drawn for months after ODR implementation and the slope calculated. A 0.75 slope was determined indicating that a paradigm shift in maintenance avoidance had taken place; hence the $87,000 per month cost savings in the complex.

Success factors

Could all of these savings be attributed to the implementation of the ODR process? Probably not. A refinery is a dynamic place with many influencing factors. During this period, operations management took a strong reliability stance, seasoned reliability engineers were assigned to the complexes, and a turnaround was conducted during the period. Maintenance activities likely took place during the outages that may not have been true turnaround items. Operations also took an equipment ownership position during this period that continues today. However, it is difficult to dispute the data. The step change only occurred in the complex that implemented the ODR process, including the installation of the software, and the use of wireless, handheld data collectors.

The trickle-down effect

Operators and other personnel have begun using the software and the wireless handhelds outside of the ODR realm.

Process safety management and occupational safety inspections, such as car seal, fire extinguisher, and hose station inspections, are being conducted. Volatile organic compounds information is being collected, stored, and transferred to the environmental department for regulatory requirements. Caustic concentrations are being reviewed and calculated in the field. Even a large expansion and turnaround conducted early in 2002 was followed and reported to corporate management using the handhelds and software. This was received with enthusiastic acceptance by the turnaround management team.

Staying ahead of the pack

Recognizing the need to go beyond just working harder to stay competitive, and to be the faster animal in the herd, the site is investigating how to meld ODR into reliability centered maintenance and risk-based inspection initiatives to develop a more seamless reliability process. The wireless handhelds are rapidly becoming the mobile laptop for many individuals who are required to complete various tasks within the site. It will be another year before the effort to implement ODR refinery-wide is fully analyzed, reviewed, and reported. MT

James R. Cesarini is manager of reliability and turnarounds, Valero Energy Corp., San Antonio, TX


Fig. 1. Total work order costs during the period support the findings in specific equipment saves.



Fig. 2. CROW/AMSAA reliability growth diagram of work order cost after implementation shows a positive step change over time, and a prediction of the amount to be saved in avoided maintenance at a future referenced date.



2:42 am
February 2, 2003

Web-Based Government Maintenance Information for the People

In a recent column, I wrote about a root cause analysis government document that was available online. I was so impressed by the information available that I decided to conduct a search for more information that may have been funded by our federal and state tax dollars.

As it turns out, the U.S. government has embraced the web and made it easy to find documents relating to maintenance and reliability or any other subject you care to research.

The best starting point for a search is the U.S. government portal. This site offers a powerful search engine that scans the federal government web site system as well as sites for all U.S. states and territories. The search results list the document title and a brief summary as well as the file’s size and type.

The Public Buildings Service of the General Services Administration (GSA) is responsible for inspecting and maintaining building equipment and systems. This web site offers a detailed facilities preventive maintenance guidebook. The goal is to maximize equipment life while providing a safe, comfortable, and healthy environment for all building tenants.

The Department of Energy (DOE) is rich with documents relating to maintenance activities at DOE nuclear facilities that can be adapted to maintenance operations. The list is extensive and the URLs are long, so we have listed only a few here:

Guideline to Good Practices for Maintenance Planning & Scheduling

Guideline to Good Practices for Maintenance Control

Life cycle asset management guide

The DOE’s Office of Industrial Technologies (OIT) BestPractices web site works with industry to identify plant-wide opportunities for energy savings and process efficiency.

NASA headquarters offers dozens of documents related to work planning, reliability centered maintenance, facility maintenance, and predictive maintenance.

A logistics management paper, “Reliability Based Logistics versus Reliability Centered Maintenance” by Mark Lewis, contrasts reliability based logistics (RBL) and reliability centered maintenance (RCM) and discusses appropriate situations.

“Maintenance Resources-Optimizing” is a research paper discussing what RCM is and how it works, and identifying lessons learned. MT



7:34 pm
February 1, 2003

Finding the Elephant in Maintenance


Robert M. Williamson, Strategic Work Systems, Inc.

When looking for an elephant many people think they need to know lots of things about the beast. How do we track it? What does it eat? Where does it sleep? Where is it likely to hide on hot, sunny days? Where might it be when it’s cold and dark? Who might have seen the elephant last and when? Where was it headed? And the list goes on and on.

In the world of manufacturing and facilities management we often get called upon to find ways to improve maintenance. How do we track it? What does it cost? Why is maintenance really good at times and other times it is in the pits? Who knows how good maintenance can, and should, be? What IS “maintenance” anyway? This list, too, goes on and on. So, how DO you improve maintenance? And, how do you find an elephant?

Maintenance improvement practices take on many different appearances. For example, there is preventive maintenance analysis and improvement. Then there is reliability-centered maintenance (RCM). How about operator-performed maintenance and total productive maintenance (TPM)? Others include outsourcing specialized maintenance, supplier consolidation, and knowledge transfer. Then there’s lean manufacturing (and even lean maintenance) and the Kaizen initiatives.

In a large number of plants and facilities, 30-40 percent of the maintenance hours worked are for reactive or emergency repair requests. Many of these reactive maintenance organizations were not always that way. They may have slipped into it because of cost improvement initiatives, reorganizations, or higher-than-normal technician turnover. Their equipment has become more problem-prone than in the past; leaks have not been stopped, vibrations have not been curtailed, and root causes have not been addressed “because we just don’t have the time or the resources.”

In these reactive maintenance plants and facilities it does not take a great deal of analysis to find out where and what the BIG problem is. It does not take lots and lots of data analysis to point to the biggest, most critical opportunity for improvement. It does not take a fancy maintenance improvement practice, initiative, or activity to address the problem. Consider these four basic steps:

Step 1. Focus on the biggest equipment-related interruption to production throughput. Look for the biggest equipment-related complaint in the facility. Look for the highest maintenance cost. Look for the area having the highest number of maintenance “emergencies.” Find the biggest elephant!

Step 2. Focus on the condition of the equipment identified in Step 1. Is the operational condition satisfactory? Or is it leaking, bouncing, missing parts, patched together?

Step 3. Focus on the past years’ maintenance history for the equipment identified in Step 1. Are there indications of “pencil whipping” PMs? Are the PM tasks accurate and complete? Are the work orders and the CMMS reports reviewed periodically?

Step 4. Focus on the skills and knowledge of the people responsible for operations and maintenance of the equipment identified in Step 1. Are proper maintenance and operations skills being applied? Are proper operations and maintenance decisions being made, and reinforced?

Sometimes in that great elephant hunt we stumble across the beast when we least expect to. All that research, digging through tons of evidence, just bogs us down. Rarely do we need a microscope to find the elephant. Just follow the tracks!

Improving maintenance in a reactive organization does not take a microscope either. Chances are these plants and facilities do not have the resources to undertake a widespread maintenance improvement effort in a sustainable manner. They often find themselves slipping deeper and deeper into reactive maintenance with no hope. They are surrounded by a herd of invisible elephants.

So, what’s the bottom line here? In a reactive maintenance plant or facility the goal may NOT be to improve maintenance but rather to improve the equipment—performance and reliability improvement of that most critical, high-maintenance-cost piece of equipment that contributes to the biggest interruption to production, or the biggest complaint in the facility.

Focus on proper operations and maintenance, make just those problems go away, and you have quickly reduced your costs and improved your business performance, one elephant at a time. MT


7:31 pm
February 1, 2003

How Core Is Your Competency?


Robert C. Baldwin, CMRP, Editor

Outsourcing is a business strategy that has been around for a long time, but it seems to be popping up more frequently in business management discussions.

In the industrial plant arena, equipment maintenance is a possible candidate for outsourcing—7 percent of plants do it frequently or all the time.

Writing on the Internet site for the Outsourcing Institute, Timothy P. Smith, vice president of the Amega Group, noted, “The driving force behind the decision to outsource is the ability to focus on core competencies, but what is the true value proposition outsourcing brings to an organization? Plain and simple, outsourcing allows organizations to be more efficient, more effective, and to reduce costs.”

He goes on to amplify outsourcing values: Efficient—producing results with little waste of effort; Effective—producing a decided, decisive, or desired effect; and Cost Reduction—spending less money to achieve better results.

An often-cited business mantra is “Focus on your core competencies and outsource everything else.”

That brings us to the question: Is maintenance and reliability a core competency?

One definition of a core competency is any activity that creates or protects a competitive advantage.

When I look around at the maintenance organizations with reputations for excellence, it seems they are in companies that are in business for the long haul and that have an edge in quality, cost, or responsiveness (availability and capacity), all of which must be protected by a high level of equipment reliability.

Maintenance is indeed a core competency in these companies.

On the other hand, if reliable equipment is needed to support a competitive advantage and the maintenance organization is unable to produce, it would seem to be a core incompetency, and likely could be outsourced to good advantage.

I’m reminded of the primary theme of The Peter Principle, the book by Laurence J. Peter: “Everyone rises to his level of incompetence.” I certainly hope we all rise to the core competency level before we hit the wall.

But, according to Peter, “Competence, like truth, beauty, and contact lenses, is in the eye of the beholder.” This must mean that one of the core competencies of a truly competent maintenance manager must be the ability to convince the beholders in the executive suite that maintenance is a core competency. MT




3:21 pm
February 1, 2003

Determining Accurate Alignment Targets

Part three of a four-part series that will cover alignment fundamentals and thermal growth, and highlight the importance of field measurements through two case studies.

The previous article in this series, “Understanding Shaft Alignment: Thermal Growth” (MT 1/03, pg. 19), explained thermal growth and its effect on proper equipment alignment. A practical example involves a recent project at a wastewater treatment plant in Cleveland that needed realistic cold alignment targets for a 3600 rpm compressor.

This machine had a long history of coupling and bearing failures. Over a two-year period several attempts were made to calculate the thermal growth on the motor and compressor supports. The original equipment manufacturer’s (OEM) technical manual gave a vertical thermal offset value of +0.04 in. (+40 mils). There were no recommendations for a target vertical angularity. Horizontal alignment changes were not mentioned.

Confusing data
There was some confusion with the OEM targets as provided. Maintenance personnel did not know if this value represented what the rim dial indicator should read when the cold alignment was completed (with a dial indicator mounted on the stationary shaft and indicating the motor coupling). Dial indicators indicate the total indicated runout (TIR) each time the shaft is rotated 180 deg. Half of the TIR represents the actual centerline offset; therefore the target should actually be +20 mils vertical offset.
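The halving rule above is simple enough to state as code. The sketch below assumes the OEM's +40 mils is a rim dial indicator reading (TIR), which is exactly the point the maintenance personnel were unsure about. The function name is hypothetical.

```python
def centerline_offset_from_tir(tir_mils: float) -> float:
    """A rim dial indicator swept 180 deg reads twice the centerline offset."""
    return tir_mils / 2.0


# If the OEM's +40 mils is a TIR value, the centerline offset target is +20 mils.
oem_value_mils = 40.0
print(centerline_offset_from_tir(oem_value_mils))  # 20.0
```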

The technician averaged temperature changes measured from the bottom of the support to the split line of the machine. This data was compared with hot alignment readings taken with a modern laser shaft alignment system. The result of all the data was a calculated vertical offset target of +19 mils and a target vertical angularity of +0.65 mil/1 in. No targets were calculated to compensate for horizontal alignment changes.

Laser-based system used
A laser-based monitoring system was installed on the machine and the shaft alignment was monitored as the machine was placed online and allowed to operate until it reached normal operating conditions. There were some interesting changes in the machine’s operating characteristics. A set of machine vibration data was collected at 30 min intervals during the machine’s warm-up period.

The graphs show data collected from the laser-based monitoring system.

The shaft alignment was set with a vertical offset value of +19 mils and a vertical angularity value of +0.65 mil/1 in. The vibration data collected on the machine bearings continued to improve, reaching a low of 0.13 in./sec (peak overall), until the change in the alignment reached the calculated targets. Unfortunately, the alignment continued to change past the calculated values; as the alignment moved farther away from zero, the vibration data trended back up to fairly high levels, 0.30 in./sec (peak overall). Spectral data indicated misalignment. The farther the alignment moved away from tolerance, the clearer the signs of shaft misalignment became.

The laser-based monitoring system’s data indicated changes in the horizontal alignment that would take the alignment out of tolerance in the horizontal plane as well. The total change in the shaft alignment was:

Vertical offset: -22.2 mils
Vertical angularity: -0.88 mil/1 in.
Horizontal offset: +4.42 mils
Horizontal angularity: +0.55 mil/1 in.

Based on the changes in the alignment as measured by the laser-based monitoring system, the cold alignment targets for this machine were:

Vertical offset: +22.2 mils
Vertical angularity: +0.88 mil/1 in.
Horizontal offset: -4.42 mils
Horizontal angularity: -0.55 mil/1 in.

Data was obtained from a startup; therefore, targets are opposite of the recorded change.
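The sign flip can be shown directly. This sketch uses the measured startup changes reported above; the dictionary keys and function name are hypothetical labels chosen for illustration.

```python
# Measured change in alignment from cold to normal operating conditions.
measured_change = {
    "vertical_offset_mils": -22.2,
    "vertical_angularity_mils_per_in": -0.88,
    "horizontal_offset_mils": 4.42,
    "horizontal_angularity_mils_per_in": 0.55,
}


def cold_targets(change: dict[str, float]) -> dict[str, float]:
    """Cold targets are the opposite of the change measured during startup,
    so the shafts grow into alignment as the machine warms up."""
    return {name: -value for name, value in change.items()}


targets = cold_targets(measured_change)
for name, value in targets.items():
    print(f"{name}: {value:+.2f}")
```

Had the data been recorded during a cool-down instead, the recorded change would already point in the target direction and no negation would be needed.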

Lessons learned

So, what was learned from this example of thermal growth documentation? The first lesson learned is that no matter how many statistical calculations go into a thermal growth estimate, the best way to get thermal growth information is to measure it directly.

Another lesson is that OEM-recommended cold alignment targets, while sometimes close, cannot accurately predict the actual operating conditions of a machine in its final installed state.

A third lesson can be learned from the changes in the horizontal alignment data. The dynamics of machines during operation force changes in the shaft alignment that cannot be measured during a hot alignment check. The machine examined in this example had a horizontal offset of +4.4 mils during operation. When the machine was shut down, the horizontal offset immediately changed by -3 mils, leaving a net horizontal change of +1.4 mils. The +1.4 mils is most likely due to temperature changes in the piping; however, 3 mils of the total change were most likely due to rotor torque and discharge pressure of the compressor.

Knowing the initial alignment condition of the machine and the measured changes in the alignment allows us to estimate the current operating misalignment of this machine:

Vertical offset: -3.2 mils
Vertical angularity: -0.23 mil/1 in.
Horizontal offset: +4.42 mils
Horizontal angularity: +0.55 mil/1 in.
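These estimates follow from adding the measured change to the initial cold setting. The sketch below assumes the machine was set to the calculated cold targets (+19 mils, +0.65 mil/1 in. vertical) and that the initial horizontal offset and angularity were zero. The zero horizontal values are an assumption, made because no horizontal targets were calculated. Names are hypothetical.

```python
# Initial cold alignment (horizontal values of zero are an assumption).
initial_cold = {
    "vertical_offset_mils": 19.0,
    "vertical_angularity_mils_per_in": 0.65,
    "horizontal_offset_mils": 0.0,
    "horizontal_angularity_mils_per_in": 0.0,
}

# Change measured by the laser-based monitoring system during startup.
measured_change = {
    "vertical_offset_mils": -22.2,
    "vertical_angularity_mils_per_in": -0.88,
    "horizontal_offset_mils": 4.42,
    "horizontal_angularity_mils_per_in": 0.55,
}


def operating_misalignment(initial, change):
    """Operating condition = cold setting + change to operating conditions."""
    return {name: round(initial[name] + change[name], 2) for name in initial}


operating = operating_misalignment(initial_cold, measured_change)
for name, value in operating.items():
    print(f"{name}: {value:+.2f}")
```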

For a 3600 rpm machine, the offset values would be considered outside the acceptable tolerance, and the angularity values are also higher than would normally be considered acceptable. This also relates to shaft alignment tolerances based on shaft rpm rather than on maximum coupling alignment values. Many coupling manufacturers would consider the alignment data acceptable; however, the vibration data shows that considerable force can be applied to the machine bearings due to small amounts of shaft misalignment.

Next month this series will conclude with another case study discussing how identical machines may have different alignment targets. MT

Contributors to this article include Rich Henry, Ron Sullivan, John Walden, and Dave Zdrojewski, all of VibrAlign, Inc., 530G Southlake Blvd., Richmond, VA 23236; (804) 379-2250; e-mail


Change in vertical offset when vibration was at its lowest recorded value: -18.61 mils


Change in vertical angularity when vibration was at its lowest recorded value: -0.55 mil/1 in.


Change in horizontal offset when vibration was at its lowest recorded value: +4.658 mils


Change in horizontal angularity when vibration was at its lowest recorded value: +0.252 mil/1 in.



2:44 am
January 2, 2003

Resources for Computerized Maintenance Management Systems

You already may have accessed your computerized maintenance management systems (CMMS) over the web or rented licenses for an entire CMMS through an Internet application service provider (ASP). This is a fairly new method for software application delivery that simply requires an Internet connection and a browser such as Microsoft Internet Explorer or Netscape Navigator. Experts seem to be in agreement that “renting” the use of software applications like CMMS over the Internet will grow substantially over the next few years.

The list of web-based CMMS companies is too extensive and the feature sets are too varied for this column; however, a comprehensive directory of CMMS vendors is published online. Check with your CMMS vendor to learn more about its Internet options for hosting, accessing, and supporting your CMMS software.

Whether you are a single user, network user, or an Internet user of CMMS, there are a number of independent Internet resources that can help you increase software productivity.

is not connected with the publishers of Maximo software. It was designed originally to assist Maximo users in connecting with each other to provide advice and exchange ideas for productivity. It has grown into a very impressive resource site for any CMMS user. Visit the download area for free failure code templates, RCM analysis codes, PM descriptions, report templates, MTBF tracker, and more.

offers support specifically for users of SAP, the giant German enterprise software supplier for many of the world’s largest corporations. SAP is one of the most powerful enterprise software systems available, and it is also one of the most reviled because of the software’s complexity. Click the SAP CD tab to learn more about SAP training for the Plant Maintenance module. A list of helpful SAP resource links is also part of this specialized site.

is running an online CMMS benchmarking survey with a goal of collecting data from 1000 CMMS users. It has logged over 250 responses and is growing daily. Survey participants get access to the benchmarking results in real time and in summary form to compare their CMMS productivity to others from around the world.

is an independent CMMS site with a wide variety of white papers, book excerpts, and presentations on various aspects of CMMS/EAM. Articles range from practical topics such as “ROI Calculation for CMMS Projects” to IT-related issues such as “.Net and the Future of Enterprise Asset Care.”

Perspective CMMS is a British CMMS consultant’s site that gives away many of his secrets online at no cost. It also offers a CMMS audit by e-mail.

We hope you find these CMMS resources useful and that you will share useful maintenance sites with us for future columns. Please send your comments, suggestions, and web sites. MT



7:50 pm
January 1, 2003

Understanding Hidden Failures in RCM Analyses

Addressing hidden failure modes is a key aspect for successfully achieving plant reliability.

Reliability Centered Maintenance (RCM) is not new. Airline Maintenance Steering Group (MSG) Logic, the predecessor to RCM, has existed since the early 1960s. F. Stanley Nowlan and Howard Heap of United Airlines introduced formal RCM to the commercial aviation industry in 1978. Airline reliability is primarily based on this work. The vision is as relevant today as it was when the first edition of Reliability Centered Maintenance was published in 1978.

Today, almost everyone in a manufacturing, power generating, or technological environment is familiar with the concept of RCM. However, the perceived degree of familiarity with RCM may be deceiving. RCM is simple in concept but subtle and sophisticated in its application.

As with many processes, a simplistic and limited understanding of RCM may prove more problematic than beneficial. It is unrealistic to believe that a superficial implementation of the process will be a panacea for plant equipment problems, and then to depend on that process to produce significant reliability results.

Analyzing a system
The simple understanding of RCM consists of identifying system functions, functional failures, consequences of those failures, etc. However, Nowlan and Heap gave great importance to understanding hidden failures, which are not widely understood and are often overlooked when performing an RCM analysis.

The true reliability benefits of RCM become evident only with a thorough understanding of how to functionally analyze a system. Understanding hidden failure modes, understanding when a single-failure analysis is not acceptable, and understanding when run-to-failure (RTF) is acceptable, are the real cornerstones of RCM. Additionally, the subtle but important distinction between true redundancy and redundant components fulfilling a backup function is also a key to reliability success.

Many utilities and other industries have implemented an RCM program only to find that they continued to have fundamental reliability issues that were not addressed by their analysis. The primary reason is the lack of a grass-roots philosophical understanding of the principles governing the analysis.

Identifying important equipment
Optimizing a preventive maintenance program consists of three phases: Phase 1, identifying equipment that is important to plant safety, operation, and asset protection; Phase 2, specifying the requisite PM tasks for the equipment identified in Phase 1; and Phase 3, properly executing the tasks specified in Phase 2.

At the very least, identifying equipment important to plant safety, operation, and asset protection consists of three programmatic principles that must be well understood before commencing an RCM analysis.

  1. Understand the cornerstones for developing an effective RCM program.
  2. Identify the defensive strategies for maintaining an effective RCM program.
  3. Identify when a component can be classified as RTF and understand the limitations governing RTF components.

A look at each of these principles in detail will illustrate the key areas for successfully achieving plant reliability and maximizing cost containment efforts.

Understand the cornerstones
There are three cornerstones that must be understood for developing an effective RCM program:

  • Know when a single-failure analysis is not acceptable.
  • Identify hidden failures.
  • Know when a multiple-failure analysis is required.

A single-failure analysis is not acceptable when the occurrence of the failure is hidden. When a component is required to perform its function and the occurrence of the failure is not evident to operating personnel, that is, the immediate overall operation of the system remains unaffected in either the normal or demand mode of operation, then the failure mode is defined as hidden.

[Fig. 1]

A multiple-failure analysis is required when the occurrence of a single failure is hidden. Addressing hidden failure modes is a key aspect for maintaining plant reliability.

Identify the defensive strategies
There are three distinct lines of defense for maintaining an effective RCM program. The first strategy for defending a plant against unplanned equipment failures is identifying critical components. These are components where a single failure will result in one or more consequences similar to the following:

  • A direct impact to personnel or plant safety.
  • A plant trip or shutdown of a manufacturing facility.
  • A power reduction, down power, or the loss of a facility’s operational capability.
  • An inadvertent actuation of a safety system.
  • An unplanned forced outage.
  • Other (depending on specific type of plant or industry)

The second line of defense for protecting a plant or facility is to identify what this author refers to as potentially critical components. These are components which, if they fail when called upon to function, the failure is hidden and will not have an immediate effect on the plant. However, the hidden failure in combination with one or more additional failures will result in consequences similar to the following:

  • A direct impact to personnel or plant safety.
  • A plant trip or shutdown of a manufacturing facility.
  • A power reduction, down power, or the loss of a facility’s operational capability.
  • An inadvertent actuation of a safety system.
  • An unplanned forced outage.
  • Other (depending on specific type of plant or industry)

Note the similarities between critical and potentially critical components. The only difference is that critical failures manifest themselves immediately while failures of potentially critical components are hidden and will not manifest themselves until a second, multiple failure occurs.

To better understand the concept of potentially critical components (which is totally different from the potential failure of a given component) consider the following example.

[Fig. 2]

When two or more components (valves, pumps, motors, etc.) operate in parallel flow paths to supply a function but only one component is required to fulfill the function, and there is no indication of failure for each component individually, then a failure of one of the components will be hidden (there will be no indication the component has failed) and the failure will not result in a plant effect. However, if the second component should fail, then a plant-affecting consequence would occur. Hence, the component is considered to be potentially critical.

Another example involves a pump discharge check valve. If there are two pumps operating at the same time, a failure of the check valve in the open position will be hidden. Only when one pump fails will the unwanted reverse flow path through the failed open check valve become evident.

How prevalent are hidden failures? Extremely. Just a few examples include main turbine overspeed components, many check valves, diesel generator fuel oil pumps, and emergency diesel generator shutdown components. Identifying potentially critical components affords perhaps the greatest degree of reliability protection for a plant or facility.

Hidden failures are typically failures of one or more components aligned in parallel with no indication of failure for each individual component. In Fig. 1 for example, one of the two components could fail but since each one by itself can satisfy the function, only when the second one fails will the functional failure become evident; therefore, the failure of the first component is potentially critical.
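The parallel arrangement described above can be illustrated with a minimal Python sketch. The class, function names, and the two pumps are hypothetical and chosen only for illustration; the sketch assumes two components where either one alone satisfies the function and neither has an individual failure indication.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    failed: bool = False
    # Assumption: no individual alarm or indication for this component.
    has_failure_indication: bool = False


def function_available(components):
    """The function is satisfied if at least one parallel component works."""
    return any(not c.failed for c in components)


def hidden_failures(components):
    """Names of failed components that operators cannot see.

    A failure is hidden while the function is still satisfied and the
    failed component has no individual indication of failure.
    """
    if not function_available(components):
        return []  # the functional failure is now evident
    return [c.name for c in components
            if c.failed and not c.has_failure_indication]


pump_a = Component("pump A")
pump_b = Component("pump B")
train = [pump_a, pump_b]

pump_a.failed = True   # first failure: function still available, failure hidden
print(function_available(train), hidden_failures(train))  # True ['pump A']

pump_b.failed = True   # second failure: the plant effect appears
print(function_available(train), hidden_failures(train))  # False []
```

After the first failure the function is still available and nothing alerts the operator, which is why the first component is treated as potentially critical rather than run-to-failure.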

How important is this concept? Very. There are many examples in industry where a designer intentionally builds in multiple redundancy to ensure reliable system operation. Unfortunately, if the failure of a redundant component has no way of manifesting itself, a plant-affecting consequence can occur with the second failure.

There is a vast difference between a component operating in a backup function and one that is not (Fig. 2). In Example 2, the component is an RTF component while the component in Example 4 is critical.

The third line of defense to protect a plant is to identify economically significant components. These are components whose failure will not be critical or potentially critical, but will result in one or more of these economic concerns:

  • An unacceptable cost of replacement or restoration.
  • An unacceptable corrective maintenance history.
  • A long lead time for replacement parts.
  • An obsolescence issue.
  • Other (depending on specific type of plant or industry)

Failures of economic components have no effect on plant safety or operability. Economic failures will result only in labor and/or parts replacement costs. It is important to keep this economic categorization separate from critical and potentially critical components to enable a prioritization of work.

Note: If a failure occurs to a major piece of equipment (even if it is economically significant) but it results in an effect on plant safety, operation, or a plant outage, it would be more than merely an economic consideration. It would be captured as either a critical or potentially critical consequence of failure.

Identify RTF components, understand limitations
RTF in its most basic definition means PMs are not required prior to failure. This does not imply that the component is unimportant and never needs to be fixed. Corrective maintenance is required in a timely manner after failure to restore the component to an operable status. RTF components are understood to have no safety, operational, commitment, or economic consequences as the result of a single failure. Also, the occurrence of failure must be evident to operations personnel.

RTF components are designated as such because a failure is evident and there is no significant consequence from a single failure. If it does not matter whether a failed component is ever restored to an operable status, one would question why that component is even installed in the plant.
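The three lines of defense and the RTF limitations can be summarized as a simple decision sequence. The following Python sketch is one possible reading of the text, not a definitive RCM procedure; the function name, parameter names, and the final "needs further review" category are assumptions introduced for illustration.

```python
def classify(single_failure_consequence, failure_evident,
             multiple_failure_consequence=False,
             economically_significant=False):
    """Categorize one component failure mode.

    single_failure_consequence: a single failure directly affects safety
        or operation (trip, power reduction, forced outage, etc.).
    failure_evident: the failure is evident to operations personnel.
    multiple_failure_consequence: the failure combined with one or more
        additional failures produces such a consequence.
    economically_significant: unacceptable repair cost, maintenance
        history, lead time, or obsolescence concern.
    """
    if single_failure_consequence:
        return "critical"
    if not failure_evident and multiple_failure_consequence:
        return "potentially critical"
    if economically_significant:
        return "economically significant"
    if failure_evident:
        return "run-to-failure"
    # Assumption: a hidden failure with no identified consequence is
    # flagged for review rather than silently treated as RTF.
    return "needs further review"


print(classify(True, True))                                      # critical
print(classify(False, False, multiple_failure_consequence=True)) # potentially critical
print(classify(False, True, economically_significant=True))      # economically significant
print(classify(False, True))                                     # run-to-failure
```

The order of the checks mirrors the order of the defensive strategies: safety and operational consequences are tested before economic ones, and RTF is assigned only when the failure is evident.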

The heart of reliability is a sound preventive maintenance program and RCM provides the most prudent approach for establishing an effective PM program. MT

Neil Bloom is program manager, RCM and preventive maintenance programs, at Southern California Edison’s San Onofre Nuclear Generating Station. He previously worked in the commercial airline industry in both maintenance and engineering management positions. He can be reached at Mail Unit K-50, P.O. Box 128, San Clemente, CA 92672; (949) 368-6378




7:45 pm
January 1, 2003

Critical Component of the CMMS: The Repair Work Order

The better and more consistently repair activities are recorded, the greater the potential for yielding more specific information about an operation.

From the very start, the implementation of a computerized maintenance management system (CMMS) is a long and arduous process. One of the largest concerns is how to effectively get the correct data into the system in the first place, and then, how to get useful information out.

What follows can provide a method to get better data into the CMMS with every work repair request. The yield is more and better data for analysis, which bears on the all-important question in the long-term evaluation of the implementation: is this information tool providing useful information?

There is no replacement for a good, integrated implementation plan that covers the setup of the database, training, data design and collection, etc. Consider this as an enhancement to be added to the existing plan.

Repair data
Basic repair data fields come in four categories:

  • Origination
  • Planning
  • Scheduling
  • Results

Origination data includes the emergency flag, the original observer of the problem and how the person can be reached, the equipment experiencing the problem, and a problem description. This data must be obtained to effectively get labor and materials assigned to the job.

Although it is most important to get all data consistently and correctly into each field, most problems occur at work order origination and multiply as the work order is processed. See accompanying section “Work Order Data Fields.”

The two most important fields at the origination of a repair are the equipment number and the problem description. The equipment number is needed to get the person to the correct equipment and to ensure charges are posted back to that piece of equipment, both for historical detail and for summary analysis of its department, process, unit, etc.

The importance of the problem description cannot be overstated. Whenever a CMMS is implemented, every person who may originate a work order should be trained to call in (type in, write in, etc.) the problem description. This should include what was observed that prompted the call. Sample bad problem descriptions:

  • “It’s not working” or “It’s down.”
  • “It’s broken.”
  • “It sounds like it is going to fail.”

Bad problem descriptions do not provide enough descriptive data, and they lead to equally poor descriptions of the results, such as:

  • “It’s working” or “It’s up.”
  • “It’s fixed.”
  • “Sounds OK to me, just a little noisier than normal.”

If the historical records within the system contain descriptions similar to these, plan to retrain everyone immediately and include a sample of these records to show how useful (or not) they are for historical analysis.

More effective descriptions would be based on what the observer/originator of the problem sensed:

  • Saw a leak
  • Heard excessive gear grinding or a pop in the disconnect panel
  • Smelled something unusual burning
  • Felt excessive vibration at normal run speed
  • Tasted like there was too much syrup, but the controls indicate the proper mix

These are oversimplified examples, but a trained mechanic can identify a starting point and promote a response that is more descriptive of the cause. For historical purposes, this can be invaluable in looking at repetitive problems and working toward engineering them out of existence.
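One way to find historical records that need attention is to screen problem descriptions for the vague phrases and sense-based wording listed above. The Python sketch below is a rough illustration only; the phrase lists, function name, and result labels are assumptions, and a real screen would need to be tuned to the plant's own vocabulary.

```python
import re

# Hypothetical phrase lists drawn from the sample descriptions above.
VAGUE_PHRASES = ("not working", "it's down", "it's broken", "going to fail")
SENSE_WORDS = {"saw", "heard", "smelled", "felt", "tasted"}


def review_description(text):
    """Label a problem description as descriptive, vague, or needs review."""
    lowered = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    words = set(re.findall(r"[a-z']+", lowered))
    if words & SENSE_WORDS:
        return "descriptive"
    if any(phrase in lowered for phrase in VAGUE_PHRASES):
        return "vague"
    return "needs review"


for sample in ("It's broken.",
               "It sounds like it is going to fail.",
               "Heard excessive gear grinding",
               "Felt excessive vibration at normal run speed"):
    print(sample, "->", review_description(sample))
```

Records labeled "vague" are candidates for the retraining sample mentioned earlier; anything the screen cannot place is left for a person to read.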

Using a basic repair order
A better understanding of why proper problem descriptions should be used is probably the biggest and least expensive way to make a major leap in repair data capture.

A basic repair work order has room for free form text, but also specific codes that can be selected to help sanitize what is reported about the work, specifically to enhance analysis, expedite reporting, and, at the same time, not overburden the mechanic with paperwork.

Results data should at a minimum include the skill/trade that completed the work, work time, a work description (what was done), materials used and/or costs, a cause code, downtime, and an assessment of the repair.

The skill should have an associated wage (or wage plus burden) rate so that, when combined with the work time, hours may be converted to costs for charging back to each piece of equipment and its associated grouping codes (department, unit, etc.). The work description should explain the action taken on what part of the equipment.
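The conversion of hours to costs can be sketched as follows. This is a minimal Python illustration; the wage rates, equipment numbers, departments, and field names are all hypothetical and do not come from any particular CMMS.

```python
from collections import defaultdict

# Hypothetical wage-plus-burden rates in dollars per hour.
WAGE_RATES = {"electrician": 42.0, "millwright": 38.5}

# Hypothetical completed work orders.
work_orders = [
    {"equipment": "P-101", "department": "Utilities",
     "skill": "millwright", "hours": 3.0},
    {"equipment": "P-101", "department": "Utilities",
     "skill": "electrician", "hours": 1.5},
    {"equipment": "C-200", "department": "Packaging",
     "skill": "electrician", "hours": 2.0},
]


def labor_cost(order):
    """Labor cost of one work order: skill rate times work time."""
    return WAGE_RATES[order["skill"]] * order["hours"]


def roll_up(orders, key):
    """Total labor cost grouped by a field such as equipment or department."""
    totals = defaultdict(float)
    for order in orders:
        totals[order[key]] += labor_cost(order)
    return dict(totals)


print(roll_up(work_orders, "equipment"))   # {'P-101': 178.5, 'C-200': 84.0}
print(roll_up(work_orders, "department"))  # {'Utilities': 178.5, 'Packaging': 84.0}
```

The same roll-up works for any grouping code carried on the work order, which is why consistent equipment numbers at origination matter so much.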

If recording downtime, it must be defined and all personnel must be familiar with how it is charged and used. The most common discrepancy comes when a machine is out of service for a maintenance reason during a nonoperating shift for that piece of equipment. Is it down?
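The question shows why the definition matters: the same outage can produce very different downtime figures. The Python sketch below compares two possible definitions, calendar hours out of service versus hours out of service during scheduled operating shifts. The function names, dates, and shift times are hypothetical, and neither definition is presented as the correct one.

```python
from datetime import datetime


def overlap_hours(start_a, end_a, start_b, end_b):
    """Hours during which two time intervals overlap (0 if they do not)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max((earliest_end - latest_start).total_seconds(), 0.0) / 3600.0


def downtime_hours(out_of_service, back_in_service, operating_shifts=None):
    """Downtime under one of two definitions.

    With no shifts given, count every calendar hour out of service.
    With shifts given, count only hours that fall inside a scheduled shift.
    """
    if operating_shifts is None:
        return (back_in_service - out_of_service).total_seconds() / 3600.0
    return sum(overlap_hours(out_of_service, back_in_service, start, end)
               for start, end in operating_shifts)


# Hypothetical outage: down at 2 p.m., restored at 8 a.m. the next day.
down = datetime(2003, 1, 6, 14, 0)
up = datetime(2003, 1, 7, 8, 0)
# Hypothetical single day shift, 6 a.m. to 4 p.m.
shifts = [(datetime(2003, 1, 6, 6, 0), datetime(2003, 1, 6, 16, 0)),
          (datetime(2003, 1, 7, 6, 0), datetime(2003, 1, 7, 16, 0))]

print(downtime_hours(down, up))          # 18.0 calendar hours
print(downtime_hours(down, up, shifts))  # 4.0 hours during operating shifts
```

Whichever definition is adopted, it should be applied the same way on every work order so the recorded figures remain comparable.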

Get materials costs
Materials used and their costs are helpful for keeping inventory up to date and charging materials to each piece of equipment.

Having the material identified by its tracking number in the inventory control system (whether or not this is a module in the CMMS) is essential for documenting proper part usage, tracking, and bill-of-material building. This is one of the first areas of potential interface when the nonproduct material is maintained by an organization/system outside of maintenance.

This type of interface would allow documentation of the part number, and then the cost could be brought over from the parts inventory control system if it is not in the CMMS.

This cost is especially important in light of the fact that many materials costs can exceed labor costs significantly, and both are necessary to properly assess the maintenance requirements and history of a given piece of equipment.
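The interface described above amounts to a lookup: the work order carries part numbers and quantities, and unit costs come from the inventory control system. The Python sketch below assumes the inventory prices are available as a simple mapping; the part numbers, prices, and function name are hypothetical.

```python
# Hypothetical unit costs as they might be supplied by the inventory
# control system, keyed by part tracking number.
INVENTORY_PRICES = {"BRG-6204": 18.75, "SEAL-112": 42.00}


def materials_cost(parts_used, prices=INVENTORY_PRICES):
    """Total materials cost for one work order.

    parts_used: list of (part_number, quantity) pairs from the work order.
    Returns the total cost and a list of part numbers with no price on
    file, so that gaps in the interface are reported rather than ignored.
    """
    total = 0.0
    missing = []
    for part_number, quantity in parts_used:
        if part_number in prices:
            total += prices[part_number] * quantity
        else:
            missing.append(part_number)
    return total, missing


total, missing = materials_cost([("BRG-6204", 2), ("SEAL-112", 1), ("BELT-A48", 1)])
print(total)    # 79.5
print(missing)  # ['BELT-A48']
```

Reporting unpriced parts separately keeps the equipment history honest: a missing price is visible as a gap instead of appearing as a zero-cost repair.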

Assess the repair, use codes
Although additional comments about a job may not be entered, it is a good idea to get the mechanic’s assessment of the repair at least to the point where the repair is identified as “temporary” or not. A temporary repair is most often done to get the operation through the shift, and subsequently a relatively permanent repair is completed at a more convenient time.

For each repair, an assessment should be provided by the mechanic. This comment may indicate the repair was temporary, and if so, it should be followed by a recommendation indicating what needs to be done to make it more permanent.

Last, but not least, is some type of repair cause code. The reason a code is used instead of a description is to begin to categorize the repairs for easier analysis. Once statistical analysis is completed, the more significant individual items can be further analyzed by review and evaluation of their details.
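A short Python sketch shows how cause codes make this kind of analysis straightforward: count repairs by code first, then pull the details for the most frequent code. The equipment numbers, cause codes, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical repair history: (equipment number, cause code).
repairs = [
    ("P-101", "LEAK"),
    ("P-101", "LEAK"),
    ("C-200", "ELEC"),
    ("P-101", "ADJUST"),
    ("C-200", "LEAK"),
]


def cause_counts(records):
    """Repair counts per cause code, most frequent first."""
    return Counter(code for _, code in records).most_common()


def equipment_for_cause(records, cause_code):
    """Repair counts per piece of equipment for one cause code."""
    return Counter(equip for equip, code in records if code == cause_code)


ranked = cause_counts(repairs)
top_code = ranked[0][0]
print(ranked[0])                                     # ('LEAK', 3)
print(dict(equipment_for_cause(repairs, top_code)))  # {'P-101': 2, 'C-200': 1}
```

With a free-text description in place of the code, the same count would require someone to read and interpret every record.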

Codes in the CMMS represent a great potential advantage for accelerating recording of repairs, as well as their analysis, but can be extremely dangerous if overused. There should not be so many code fields, and/or codes per field, that it requires a separate page to list the possibilities, and someone must read through them for each repair.

For example, a CMMS may contain fields for problem, failure, cause, root cause, solution, or action verb/noun combinations, etc. For each field, there may be 40 or 50 possibilities, and probably more. This just makes it take longer to complete the work order and often leads to more specific codes being added, thus making the recording process even more complicated.

An important aspect of documenting work is simplifying the process. Use codes that are broad in nature and relevant to the process environment wherever possible. An invaluable source of these codes is a review of historical activities that probably exist on manual records. Causes can be derived from the work descriptions entered, even if they serve only to distinguish parts problems from electrical problems, leaks, adjustments, etc.

Multi-line work order
The basic work order example is considered the workhorse for capturing planned and unplanned work, and provides areas to document extensions when the work is carried over for virtually any reason (scheduling, availability of materials, etc.).

A multi-line work order that mechanics would have at the beginning of their shift is typically used to capture work that is often unplanned and would be completed during the shift. Items carried over are typically referenced from here and transferred to the basic work order form for future execution.

The better and more consistently repair activities are recorded, the greater the potential for yielding more specific information about the operation, in both qualitative and quantitative terms. The more quickly this can be done, the sooner actual activities will be reported into the CMMS, and a useful history will be built that can be more easily analyzed through statistical methods. MT

Christopher N. Winston is an independent professional in the Detroit, MI, area contracted to HSB Reliability Systems Group, 1701 N. Beauregard St., Alexandria, VA 22311. He has more than 18 years’ CMMS implementation and business system analysis experience and has a bachelor of science degree in mechanical engineering.

Work Order Data Fields

  • Machine
  • Problem description
  • Emergency flag (Yes/No)
  • Skill/trade
  • Work (done) description
  • Parts requested
  • Wrench or work time
  • Action codes (problem, cause, downtime, failure mode, solution, reason not done, etc.)
  • Downtime
  • Repair assessment (i.e., temporary?)
  • Originator/requester
  • Job number
  • Budget/actual cost
  • Multiple authorizations
  • Job status
  • Parts/material usage
  • Project number
  • Safety/special requirements (JSA, scaffold, formed pit, etc.)
  • Permits (hot work, confined space entry, etc.)
