Archive | Reliability Engineering


7:49 pm
April 11, 2016

Choose The Right Emergency Stop

With so many customizable options available today, selecting the right emergency stop (e-stop) for process equipment can be a daunting task, but it’s critical for overall safety. According to human-machine-interface (HMI) experts at EAO Corp., Shelton, CT, the key is to fit equipment with a highly functional e-stop that is in line with the basic application design concept, rather than a lesser-certified safety switch. MT

Determine if your application requires Category 0 or Category 1 shutdown.
This is crucial in the placement, size, electrical specifications, mechanical characteristics, color, and number of required e-stops.

Research international and North American standards, performance ratings, and codes that govern your application (see table below).
Each industry has unique regulatory standards. These restrictions may govern factors such as size, color, and contact terminals.

Select the product.
Choose your e-stop based on design factors to meet industry demands and international compliance. Proper selection involves understanding market and application requirements, environmental conditions, and electrical demands.

Vendors often provide a variety of unique features to enhance your e-stop and complete virtually any application. It’s important to research these additions as some accessories may be mandated by industry standards.

Consult an expert.
Many suppliers offer consultative services to assist customers throughout the process of selecting and integrating their HMI needs, from individual e-stops to completely designed and produced ‘mixed technology’ solutions.

For more information on e-stops and other HMI components and systems, visit



8:26 pm
February 8, 2016

Calculate the True Cost of Unreliability


The economic impact on manufacturers that haven’t bought into the idea of failure-free operation is easy to determine and, more important, enormous.

By Al Poling, CMRP

Although experts have espoused the virtues of equipment reliability for decades, countless manufacturing operations still suffer significant and unnecessary downtime due to equipment failure. Apparently these manufacturers haven’t bought into the benefits of failure-free operation. What will it take to get them to accept the time-proven benefits of reliability? Perhaps they will never be convinced by examples of other manufacturing operations, believing that they are somehow unique. If the benefits derived through reliable operation won’t lead them to change, perhaps an examination of the true cost of unreliability will.

The big picture

Businesses operate under the basic equation of: profit = sales minus cost. Although equipment failures affect both sides of the equation, this article focuses on the impact of unreliability on maintenance costs—typically the largest fixed costs in a process-industry manufacturing facility. End users can apply the following calculations from a hypothetical plant to their own business and develop an order-of-magnitude estimate of the impact of unreliability on maintenance costs at their site(s).

For purposes of these calculations, let’s assume our hypothetical operation has a plant replacement value (PRV) of US$1 billion and a resident maintenance workforce of 150 craft-level employees.

Maintenance-labor cost

Maintenance costs in a plant include those for skilled craft labor to repair and restore equipment to good operating condition following a failure. The current average U.S. Gulf Coast, fully loaded, maintenance skilled-craft wage rate is approximately $45/hr. Using the U.S. standard of 2,080 hr/man-year, with an estimated overtime rate of 5%, the cost/year/skilled craft worker is approximately $100,000. Consequently, 150 skilled craft workers will cost approximately $15 million/year. In terms of man-hours, including overtime, the total is roughly 300,000 man-hours/year.
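The labor arithmetic above can be checked with a few lines of Python. This is only a sketch: the inputs are the article's figures, while the variable names and the assumption that overtime hours are costed at the same fully loaded rate are ours.

```python
# Inputs quoted in the article.
HOURLY_RATE = 45.0         # fully loaded skilled-craft wage, $/hr
BASE_HOURS = 2080          # U.S. standard hours per man-year
OVERTIME_FRACTION = 0.05   # estimated overtime rate
CRAFT_WORKERS = 150        # resident craft workforce

# Assumption: overtime hours are costed at the same fully loaded rate.
hours_per_worker = BASE_HOURS * (1 + OVERTIME_FRACTION)
cost_per_worker = HOURLY_RATE * hours_per_worker
total_labor_cost = CRAFT_WORKERS * cost_per_worker
total_man_hours = CRAFT_WORKERS * hours_per_worker

print(f"Hours per worker per year: {hours_per_worker:,.0f}")
print(f"Cost per worker per year:  ${cost_per_worker:,.0f}")
print(f"Total labor cost per year: ${total_labor_cost:,.0f}")
print(f"Total man-hours per year:  {total_man_hours:,.0f}")
```

Under these assumptions the unrounded results are about $98,000 per worker, $14.7 million for the workforce, and roughly 328,000 man-hours, which the article rounds to $100,000, $15 million, and about 300,000 man-hours.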

Benchmarking studies have confirmed that best-performing plants average 1% downtime due to unreliability/year, while average performers suffer 7% downtime due to unreliability. These numbers include annualized downtime for turnarounds. To calculate the annualized downtime for turnarounds, simply take the total downtime for your last turnaround and divide it by the number of years between turnarounds. A 30-day turnaround taken every three years equals 10 days of annualized downtime due to the turnaround alone.

Best performers average less than four days of downtime/year due to unreliability, including annualized downtime for turnarounds. Average performers endure more than 25 days of downtime/year due to unreliability.
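To apply the downtime figures to your own site, the two conversions described above can be sketched as follows. The function names are illustrative, and a 365-day year is assumed.

```python
DAYS_PER_YEAR = 365  # assumption: calendar-year basis


def annualized_turnaround_days(turnaround_days, years_between_turnarounds):
    """Spread one turnaround's downtime evenly over the turnaround interval."""
    return turnaround_days / years_between_turnarounds


def downtime_days(downtime_fraction, days_per_year=DAYS_PER_YEAR):
    """Convert an annual downtime fraction (0.07 for 7%) into days per year."""
    return downtime_fraction * days_per_year


print(f"30-day turnaround every 3 years: {annualized_turnaround_days(30, 3):.1f} days/year")
print(f"Best performer (1% downtime):    {downtime_days(0.01):.2f} days/year")
print(f"Average performer (7% downtime): {downtime_days(0.07):.2f} days/year")
```

The 1% and 7% cases work out to 3.65 and 25.55 days, consistent with the "less than four" and "more than 25" days quoted above.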

There is a direct correlation between the number of equipment failures and the number of craft workers required to effect repairs. In theory, the average-performing manufacturer would have seven times more maintenance craft workers than the best performer. That, however, is in theory only. Achieving and sustaining failure-free operation requires truly skilled craft workers, and even they have to focus their efforts on failure avoidance instead of repair.

Work sampling studies have revealed that the efficiency of maintenance-craft workers is extremely high in highly reliable operations, as their work is well defined and scheduled in advance. In contrast to reactive maintenance, schedule interruptions happen only on an exception basis in a failure-free environment. Instead of seven times as many skilled craft workers needed in an average-performing plant, we’ll estimate (conservatively) that the number is half that (or three and a half times).

With regard to maintenance labor, the cost of unreliability is the difference between the number and associated cost of skilled craft workers required to support a reliable operation versus an unreliable one. Assuming that the aforementioned 150 such workers, costing $15 million/year, are working in an operation suffering average unreliability, the additional maintenance labor costs are 70% of the total—$10.5 million/year. In this example the true cost of unreliability in skilled craft workers is an additional 105 such workers costing an additional $10.5 million/year, whereas a reliable operation would only need 45 skilled craft workers. This calculation does not factor in the elimination of overtime that would be found in a failure-free environment. While equipment still fails, the impending failure is discerned well in advance so repairs can be made during normal maintenance work hours.
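A short sketch shows where the 70% figure comes from and what it implies. The function and variable names are ours; the inputs are the article's. A staffing multiple of 3.5 implies that roughly 71% of the workforce exists because of unreliability, which the article rounds to 70%.

```python
def labor_cost_of_unreliability(workers, annual_labor_cost, avoidable_fraction):
    """Split a craft workforce into the part driven by unreliability and the rest."""
    extra_workers = workers * avoidable_fraction
    return {
        "extra_workers": extra_workers,
        "workers_needed_if_reliable": workers - extra_workers,
        "extra_cost": annual_labor_cost * avoidable_fraction,
    }


# An average plant is assumed to need 3.5 times the craft workers of a reliable one.
staffing_multiple = 3.5
exact_fraction = 1 - 1 / staffing_multiple  # about 0.714, rounded to 0.70 in the article

result = labor_cost_of_unreliability(150, 15_000_000, 0.70)
print(f"Exact avoidable fraction:   {exact_fraction:.1%}")
print(f"Extra workers:              {result['extra_workers']:.0f}")
print(f"Workers needed if reliable: {result['workers_needed_if_reliable']:.0f}")
print(f"Extra labor cost per year:  ${result['extra_cost']:,.0f}")
```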


Maintenance-material cost

Repair material is another major element of maintenance costs. Unfortunately, the ratio of maintenance-material cost to maintenance-labor cost varies by region due to differences in the prevailing wage and the availability (or lack) of repair materials. Equipment’s material of construction also factors into material-to-labor ratios.

A reasonable hypothesis is to use a one-to-one ratio of maintenance material to maintenance labor. Applying this ratio to our hypothetical plant, with 150 maintenance craft workers at a cost of $15 million/year, means the site spends another $15 million on maintenance-repair material annually. Using the same approximation as we used with maintenance labor, 70% of these material costs would be avoidable if the plant were operating in a failure-free mode. In monetary terms, this represents yet another $10.5 million attributable to unreliability.


Equipment-replacement cost

In consequential failures, equipment cannot be repaired and, thus, must be replaced. Benchmarking studies have shown that manufacturing operations running their equipment to failure spend exponentially more than best performers spend on maintenance capital, i.e., equipment replacement.

Manufacturers that take care of their equipment and embrace failure-free operation derive extraordinary service-life from that equipment. Conversely, those who operate in a run-to-failure mode wear out equipment quickly.

Run-to-failure is a particularly costly maintenance strategy. Best performers will spend 1% or less of their PRV each year to replace equipment that has reached the end of its useful life. In contrast, average performers will spend 3% to 5% annually on replacement equipment. Determining the true cost of unreliability, therefore, requires factoring in the price tag for equipment replacement.

A reasonable assumption is that best performers spend 0.5% of PRV and average performers spend 4% of PRV on annual equipment replacement. That means, based on our hypothetical plant with a PRV of US$1 billion, a best performer would be spending approximately $5 million annually on equipment replacement, and an average performer would be spending approximately $40 million annually. Thus, in our hypothetical example, the true cost of unreliability includes an additional $35 million/year for equipment replacement.


Additional costs

Another significant maintenance cost involves maintenance administration and staff. Granted, there is not a direct correlation between the number of maintenance salaried personnel and maintenance wage personnel. Still, there are common ratios of salaried to hourly wage personnel, and they differ dramatically between better and poorer performers. Merely reducing numbers of skilled craft workers, though, doesn’t translate to an equal percentage reduction in staff. For example, in average-performing operations, there may be more maintenance supervisors, but the ratio of craft to supervisor positions is higher. In best-performing operations, the ratio of craft personnel to maintenance supervisors is lower. This situation results from recognition of the value of maintenance supervisors as facilitators who can greatly enhance the efficiency of a maintenance workforce.

A similar condition exists with maintenance planners. Poor performers have larger numbers of skilled craft workers per maintenance planner, with some of the worst performers in the range of 60:1. An individual maintenance planner can’t effectively serve such a large number of skilled craft workers and is likely operating in a reactive mode, expediting materials or performing other duties required to support reactive maintenance.

In contrast, the ratio of skilled craft workers to planners at a best-performing site is more apt to be in the 20:1 range. With this type of ratio, a planner can prepare detailed job plans, procure materials, and efficiently perform other planning functions. The net result is that there will be no appreciable administration and staff cost savings in moving from a run-to-failure to failure-free environment. This is due to changes in ratios of craft to staff positions and the redeployment of some personnel from reactive work to proactive functions that are needed to support failure-free operations.

Additional maintenance costs affected by unreliability involve facilities, including offices, shops, break rooms, restrooms, and related infrastructure costs. Rolling-stock requirements can also be affected, as can various support staff outside of the maintenance function, such as human resources, training, and safety. Generally speaking, though, there is no substantive reduction in administration, staffing, and related cost categories as a result of reducing and/or eliminating unreliability.

The bottom line

As discussed here (and shown in the accompanying sidebar), the true cost of unreliability is enormous. By adding up the previously noted line-item maintenance costs for our hypothetical plant, we can see that unreliability amounted to a staggering $56 million (or 80%) of unnecessary spending for maintenance labor, materials, and equipment replacement costs.

Given this type of economic impact of unreliability, why don’t all manufacturing operations transition from failure-prone to failure-free environments? Unfortunately, there’s no single root cause. Many factors contribute to the situation. Among them:

The constant distraction of equipment failures is akin to putting out fires. Consequently, everyone is so focused on reacting that they believe they can’t take the time to implement measures to avoid the failure. A fairly simple solution here would be to devote a small number of employees to developing and implementing plans to avoid equipment failures. For this approach to be effective, however, those proactive resources can’t be dragged back into firefighting mode. Otherwise, nothing will improve.

Poorer-performing operations rarely have a strategic plan or, if they do, it’s typically mere window-dressing written to satisfy corporate management. Without a well-thought-out vision or mission, plant personnel will naturally accept the status quo as the normal mode of operation.

There is a lack of leadership in poorer-performing manufacturing operations. Either the current management lacks the requisite leadership skills or there are no incentives, positive or negative, to change the status quo. Humans respond to stimulus. If there are no consequences for being unreliable, nothing will change. Conversely, if there are no rewards for becoming reliable, or if the existing reward system somehow perversely rewards unreliable behavior, nothing will change. Better-performing manufacturing operations typically share the benefits of failure-free operation with all employees. As a result, everybody has a stake in improved reliability.

While this discussion used a hypothetical manufacturing site to illustrate the true cost of unreliability, the same ratios can be applied to obtain an order-of-magnitude estimate of the cost of unreliability for your operations. Remember, though, that someone needs to take the initiative before improvement can begin. MT

Al Poling has more than 35 years of reliability and maintenance experience in the process industries, many of them spent in engineering and corporate-leadership roles with several companies. A Certified Maintenance and Reliability Professional (CMRP) through the Society for Maintenance and Reliability Professionals (SMRP), he served as technical director of the organization from 2008 to 2010. Prior to starting his own consultancy, Poling served as the project manager for Dallas-based Solomon Associates’ International Study of Plant Reliability and Maintenance (RAM) Effectiveness, during which he worked with clients to identify performance improvement opportunities through benchmarking. For more information, contact

Unreliability: A Very Expensive Proposition

The three largest maintenance-cost categories affected by unreliability are maintenance labor, maintenance material, and maintenance capital, i.e., equipment replacement. In our hypothetical manufacturing operation with a plant replacement value (PRV) of US$1 billion and resident workforce of 150 skilled craft workers, we can calculate the cost of unreliability individually and collectively as follows:


$70,000,000 = Total current annual maintenance cost for labor, material, and maintenance capital, i.e., equipment replacement.

80% = Percentage of the total maintenance labor, maintenance material, and maintenance capital spent unnecessarily due to unreliability.

At first glance, these figures may appear unrealistic. They’re not. The harsh reality is that unreliable operation is very expensive for any manufacturer, regardless of size.
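The sidebar totals can be rebuilt from the assumptions stated in the article. The sketch below uses the article's inputs with our own variable names and layout; substitute your site's PRV, labor cost, and ratios to get an order-of-magnitude estimate.

```python
# Article inputs for the hypothetical plant.
PRV = 1_000_000_000               # plant replacement value, US$
LABOR_COST = 15_000_000           # annual skilled-craft labor cost, US$
MATERIAL_TO_LABOR_RATIO = 1.0     # assumed one-to-one material-to-labor ratio
AVOIDABLE_FRACTION = 0.70         # share of labor and material driven by unreliability
AVERAGE_REPLACEMENT_RATE = 0.04   # average performer, fraction of PRV per year
BEST_REPLACEMENT_RATE = 0.005     # best performer, fraction of PRV per year

material_cost = LABOR_COST * MATERIAL_TO_LABOR_RATIO

current = {
    "labor": LABOR_COST,
    "material": material_cost,
    "equipment replacement": PRV * AVERAGE_REPLACEMENT_RATE,
}
avoidable = {
    "labor": LABOR_COST * AVOIDABLE_FRACTION,
    "material": material_cost * AVOIDABLE_FRACTION,
    "equipment replacement": PRV * (AVERAGE_REPLACEMENT_RATE - BEST_REPLACEMENT_RATE),
}

total_current = sum(current.values())
total_avoidable = sum(avoidable.values())
avoidable_share = total_avoidable / total_current

for category in current:
    print(f"{category:<22} ${current[category]:>14,.0f} ${avoidable[category]:>14,.0f}")
print(f"{'total':<22} ${total_current:>14,.0f} ${total_avoidable:>14,.0f}")
print(f"Share spent unnecessarily: {avoidable_share:.0%}")
```

With these inputs the totals come to $70 million of current spending, of which $56 million (80%) is attributable to unreliability.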

Learn more:

“The Business Case for Asset Reliability”

“Choose Reliability or Cost Control”

“The Risk Is In The Management”

“Reliability Business Case: Conversion Costs”


5:27 am
April 1, 2015

Are You Ready? Preparing For Reliability Improvement


Improvement initiatives backed by effective practices and policies can enhance profitability. Careful preparation is key, says this industry veteran.

By Wayne Vaughn, CMRP, PCA Consulting

It’s a fact of life across industry: Many organizations need to upgrade their basic maintenance practices. Even when good ones are in place—such as preventive and predictive programs, effective planning, scheduling and MRO-purchasing/storeroom policies—some plants still need to strengthen elements like equipment availability and cost-reduction efforts. Improving availability and reducing costs (from both maintenance and operational standpoints) can best be approached by implementing an equipment-reliability-improvement program. While these efforts may require significant time and resources to implement, they can generate enormous returns.

Getting the biggest reliability-improvement bang for your buck calls for careful, upfront planning: Success involves more than simply hiring engineers and telling them to go out and “fix reliability.”

Start with your data
While the fact that data drives reliability improvement may seem obvious, it’s not uncommon for companies to either 1) not have data; or 2) have data that’s not easily mined from their CMMS/EAM systems. The following activities are crucial to undertake prior to embarking on a reliability-improvement initiative:

  • Ensure you have best-practice work processes that collect data and appropriate policies in place.
  • Ensure that you classify work in a way that it can be mined to understand problems.
  • Identify key performance indicators (KPIs) that can point to potential opportunities.
  • Establish KPIs to gauge how well your preventive (PM) and predictive (PdM) programs perform.

Best-practice work processes
The work-order work process is the most fundamental element of successful maintenance. All work must be captured. This must include labor, materials, contractors and other expenses that go into maintaining plant equipment. This information must be accurate, and the type of work being done must be coded carefully.

A second key area is ensuring that all work goes through a planning and scheduling process so that needed work is agreed to by operations and executed systematically. This means all PM and PdM work will go through this process. (KPIs and effective management-review procedures must be in place to make sure these important processes are accomplished effectively.) While an entire book could be written about these basics, this article focuses on PM and PdM work orders. Companies spend time to write PM instructions and create PM programs, but often don’t manage the process effectively.

It is important that PM efforts find and repair things that will prevent operational outages or other emergency situations. A good way to do this is to ensure that when something is found, a work order is created to do that corrective work. Too often, companies allow technicians to repair found defects and charge their time and materials to PM work orders. This is a big mistake, and indicative of an area where a policy must be in place. Although this can seem a pragmatic way to perform work that might involve only a few minutes of a technician’s time, it’s one that could potentially mask a problem.

A good policy is to establish a timeframe for such work, say 15 minutes. Work that can’t be performed within the specified time would require a follow-up work order. It’s also important that the follow-up work order be appropriately coded as work identified by a PM activity. An additional recommended policy element is that, regardless of time, if a part is required for the repair, a corrective-action work order must be created to capture the labor and materials used.
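The policy just described reduces to a simple rule that could be built into a work-order checklist. The sketch below is illustrative only: the `Finding` type, its field names, and the function name are hypothetical, and the 15-minute limit is the article's example value.

```python
from dataclasses import dataclass

TIME_LIMIT_MINUTES = 15  # example threshold from the article; set per site policy


@dataclass
class Finding:
    """A defect found during a PM (hypothetical structure for illustration)."""
    estimated_minutes: float
    needs_part: bool


def requires_follow_up_work_order(finding, limit=TIME_LIMIT_MINUTES):
    """Return True when the repair must be charged to its own corrective work order."""
    # Any repair that uses a part gets a work order, regardless of time.
    if finding.needs_part:
        return True
    # Otherwise only work that cannot be done within the time limit needs one.
    return finding.estimated_minutes > limit


print(requires_follow_up_work_order(Finding(estimated_minutes=5, needs_part=False)))   # False
print(requires_follow_up_work_order(Finding(estimated_minutes=5, needs_part=True)))    # True
print(requires_follow_up_work_order(Finding(estimated_minutes=40, needs_part=False)))  # True
```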

Classifying work
There are many ways to classify or “type” the forms or reasons behind why any given work is needed. Each company has to determine what makes sense for its particular situation. For starters, consider the following two methods:


Whatever designations or codes your operation chooses, they must be accompanied by clear definitions on their use, including by whom and under what circumstance. This leads to granularity and standardization of data that eventually provides useful diagnostic information.

KPIs for opportunities
There are several KPIs that can indicate opportunities. An excellent one is a report that shows the mean time between failures (MTBF) for equipment. Again, a policy may be established that if a piece of equipment falls below a selected MTBF value, that equipment must receive a formal review. It may be that the equipment will be placed in the capital replacement plan, put into a rebuild schedule, placed on a list of equipment that will undergo an improvement process or simply be lived with as is. This review must be done in coordination with operations.
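An MTBF report with a review threshold might be sketched as follows. This assumes the common definition of MTBF as operating hours divided by the number of failures in the period; the equipment tags, the history data, and the 2,000-hour threshold are hypothetical.

```python
def mtbf_hours(operating_hours, failure_count):
    """Mean time between failures; None when no failures occurred in the period."""
    if failure_count == 0:
        return None
    return operating_hours / failure_count


def equipment_needing_review(history, threshold_hours):
    """List (tag, MTBF) pairs below the threshold, worst performer first."""
    flagged = []
    for tag, (operating_hours, failure_count) in history.items():
        mtbf = mtbf_hours(operating_hours, failure_count)
        if mtbf is not None and mtbf < threshold_hours:
            flagged.append((tag, mtbf))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical history: tag -> (operating hours, failures in the period).
history = {
    "P-101": (8000, 8),
    "P-102": (8000, 1),
    "C-201": (6000, 0),
}

for tag, mtbf in equipment_needing_review(history, threshold_hours=2000):
    print(f"{tag}: MTBF {mtbf:,.0f} hr, schedule a formal review")
```

In this example only P-101, at 1,000 hours between failures, falls below the threshold and is flagged.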

Often, a piece of equipment that does not appear on the MTBF list may be identified by operations as needing improvement. This may be because of the scheduled production load, lengthy repair time when a failure occurs, the lack of a back-up process, support of an important customer or the high cost of replacement processes. Another good KPI to create is a report of high-maintenance-cost equipment. Such equipment may offer significant cost-reduction opportunities.

Once a list of opportunities has been identified, some are selected for equipment improvement. A joint, maintenance-and-operations, equipment-improvement process can be effective, and should be considered. Such groups should be facilitated by a reliability engineering expert. This will leverage engineering resources and create greater buy-in when solutions are presented for approval.

Data must then be gathered from the selected equipment to support the problem-solving process. This is where good coding and the creation of corrective-action work orders provide a substantial payback. If corrective labor and materials are charged to PMs when defects are found, data-mining becomes very difficult. That’s because with possibly thousands of PMs done in a facility over the course of a year, perhaps only hundreds capture faults. Since those hundreds will be hard to find, the data will need to be sorted and classified manually, a frustrating process for any site.

KPIs for PM effectiveness
An excellent KPI for monitoring PM effectiveness is to create a report and chart of how many faults are detected per 100 PM activities. An effective program will likely find between 5 and 20 faults for each 100 PMs performed. The report should show PMs that do not detect any faults, and also highlight those PMs with a large number of findings. Too few findings may indicate that the PM is performed too often or that the PM tasks are the wrong ones. Too many may indicate that the PM is not performed often enough or that there is a problem that needs a reliability-engineering review.
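The faults-per-100-PMs KPI is straightforward to compute once findings are coded on their own work orders. The sketch below uses the article's 5-to-20 range as the band; the function names and the sample counts are hypothetical.

```python
LOW_RATE = 5    # lower bound of the typical range quoted in the article
HIGH_RATE = 20  # upper bound of the typical range quoted in the article


def faults_per_100_pms(pm_count, fault_count):
    """Faults detected per 100 completed PM activities."""
    if pm_count <= 0:
        raise ValueError("pm_count must be positive")
    return 100 * fault_count / pm_count


def assess(rate):
    """Classify a faults-per-100 rate against the typical range."""
    if rate < LOW_RATE:
        return "too few findings: PM may be too frequent or use the wrong tasks"
    if rate > HIGH_RATE:
        return "too many findings: PM may be too infrequent or need engineering review"
    return "within the typical range"


# Hypothetical counts: (PMs performed, faults found).
for pms, faults in [(400, 12), (250, 30), (50, 15)]:
    rate = faults_per_100_pms(pms, faults)
    print(f"{faults} faults in {pms} PMs = {rate:.1f} per 100: {assess(rate)}")
```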

It is important to regularly review PMs to keep them current. Annually is a good practice. Not only do technicians become aware of things that need to be checked over time, they see things that do not need to be checked. Changing operational conditions can also induce different failure modes or necessitate a different PM.

Regular reviews can also help determine if a PM can be moved to a PdM, add more objectivity to the work process, and identify the most appropriately skilled person(s) to perform the work. A KPI here might be the average age of PMs since their last review, with special reports on PMs that exceed the company’s policy on review frequency.
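The review-age KPI could be produced from a list of last-review dates, as in the sketch below. The PM identifiers and dates are hypothetical, and the 365-day limit simply reflects the annual review practice suggested above.

```python
from datetime import date


def pm_review_report(last_reviews, today, max_age_days=365):
    """Return (average age in days, sorted list of PMs overdue for review)."""
    if not last_reviews:
        raise ValueError("no PMs supplied")
    ages = {pm: (today - reviewed).days for pm, reviewed in last_reviews.items()}
    average_age = sum(ages.values()) / len(ages)
    overdue = sorted(pm for pm, age in ages.items() if age > max_age_days)
    return average_age, overdue


# Hypothetical PMs and the dates they were last reviewed.
last_reviews = {
    "PM-001": date(2015, 1, 1),
    "PM-002": date(2014, 1, 1),
    "PM-003": date(2014, 10, 1),
}

average_age, overdue = pm_review_report(last_reviews, today=date(2015, 4, 1))
print(f"Average days since last review: {average_age:.0f}")
print(f"Overdue for review: {overdue}")
```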

Of course MTBF and mean time to repair (MTTR) will also indicate how effectively a PM effort is being planned and performed. However, since the absolute value of these measures may not be the best indicator, look carefully at the trend line to verify continuous improvement.

The bottom line
Implementing a successful equipment-reliability-improvement program may require a significant investment in terms of time and resources, but the payoff can be enormous: By incorporating best practices and well-defined policies, these initiatives help companies become and remain profitable. Don’t let inadequate preparation come back to haunt your organization’s efforts. MT

Wayne Vaughn recently retired as Director of Maintenance for Harley-Davidson. He currently is a Senior Reliability Consultant with Performance Consulting Associates, Inc. (PCA), of Duluth, GA. Contact him at


7:29 pm
December 1, 2014

A Contrarian View: What Being Proactive Really Means


By Heinz Bloch, P.E.

A company I choose to call NTBO (“Not-To-Be-Offended”) will long remember a string of expensive pump failures that jeopardized the continuity of boiler feed water supplied to its power generation turbines. When all components were carefully measured, it was determined that the oil-slinger concentricity exceeded maximum allowable by a factor of 30. Oil slingers (cone-shaped collars on revolving shafts designed to return passing oil outward to the point of origin) are critically important components, but I don’t know if NTBO implemented the specification and inspection routines needed to capitalize on this costly experience.

Capitalizing on an event means not doing the same dumb things all over again. In NTBO’s case, bringing 30- or 40-year-old rotating machinery back to original tolerances should be one of this company’s priority tasks. In fact, the next shutdown might include retrofitting fluid machinery with more efficient blades, impellers, vanes, improved lube delivery, superior filtration and the like.

The time to be thoroughly proactive is NOW. Today is the best time to communicate with competent upgrade shops; or to write an oil-slinger-ring specification requiring stress-relieving (annealing) before finish-machining; or to find out if better reciprocating-compressor valves are available and cost-justified; or to determine if excessive pipe stress on the discharge nozzle of P-207 warrants pre-fabricating a spool piece for insertion between points A and B during the plant shutdown scheduled for later in the year.

Repeat failures in a plant should be thought of as management failures, plain and simple. Someone in management needs to hire, nurture or groom people who know that the above activities are among the hundreds of proactive tasks that fit under various subheadings in role statements for responsible reliability professionals. One such task is to keep current a list of actions that could be carried out if an unanticipated downtime event were to occur.

Say, for instance, the reliability manager tells you that one of your plant’s process units has just shut down because of a pipe rupture. He tells his engineers and technicians that unit restart is scheduled in 30 hours. A proactive reliability employee might, for example, immediately look at his/her two-day-opportunity list that prioritizes coupling replacement in P-207 B, followed by the addition of 24 pre-fabricated hydraulic tubing lines to 12 electric motors on pre-defined process pumps already hooked up to the plant-wide oil mist system.

But it’s a two-way street. A competent manager creates role statements and training plans for his employees. For their part, the reliability professionals reporting to this manager make sure they arrive at work knowing what tasks they will perform (on a “normal” day) in harmony with their defined roles. As they then return home at the end of the day, these professionals should ask themselves if they have added value and if—should they ever leave their current jobs—the present manager or employer would notice they were no longer there.

Granted, if you work for a multi-billion-dollar enterprise, the corporation’s end-of-year profit statement probably wouldn’t be directly affected by your presence or departure. But your group or section or department should notice if or when you’re no longer around.

So make a difference. Whether you’re a manager or a junior contributor, strive to be above average. Be self-motivated and proactive. Offer facts that comport with common sense and the laws of science. As I’ve mentioned before, anecdotes add nothing but wasted time. Factual information translated into cost-justified actions adds value. MT


12:56 pm
November 4, 2014

The Reliability-Driven Maintenance Organization


Getting there requires taking a close look at weaknesses and taking measurable steps to correct them. Here, a respected industry expert shares tips on how to become a high-performing operation.

By Christer Idhammar

Any plant maintenance department wants to be known as a cost-effective organization. For our purposes, “cost effective” means maintenance without waste: where waste is the gap between how good the organization is and how good it can become. Waste includes poor safety, losses in quality and high costs.

In a poorly performing maintenance organization, the gap between the real and the ideal world tends to increase over time because it reacts to problems instead of preventing them. As a result, there isn’t time to take measures that will break this reactive work cycle. Even in periods when equipment is operating well and no panic-work comes up, the maintenance organization tends to slow down and wait for the next problem. This creates a culture where maintenance personnel think it is useless to start other work because they will be interrupted with real, or often perceived, urgent work. So even between reactive work, maintenance personnel accomplish very little.

From an operations standpoint, this situation can be comforting because it means maintenance can deal with equipment problems on short notice. It is far easier for operations to call maintenance to fix a problem when it occurs than to write a work request to correct an anticipated problem. This type of relationship typically occurs when operations does not feel responsible for the cost of maintenance. Even if most work is requested by operations, the maintenance manager is in the hot seat if budgets are overrun.

A high-performing maintenance organization is far different. It is founded on anticipating what will happen in the future and planning and scheduling corrective actions in advance. It is not only DO-oriented, it is THINK-oriented. It is an organization that continuously designs out problems and improves.

Correcting attitudes and cultures

To develop a high-performing maintenance organization, the first steps are:

  • To fully understand how good the organization is currently, and locate the gaps where improvement can occur
  • To develop and commit to an action plan to close the gaps, including clearly defined roles and responsibilities
  • To change work attitudes and culture.

In some plants, the typical first step toward improving maintenance performance is to purchase a new computerized maintenance management system (CMMS) or instruments for predictive maintenance. They may also implement fragmented improvement initiatives using Reliability Centered Maintenance (RCM), 5S or similar tools. And while these are good tools, they often fail because they are implemented before an organization does the basics well or changes the work culture to support their efficient use.

Bill Gates addressed succinctly the potential value of technology-based tools when he said, “The first rule of any technology used in a business is that automation applied to an efficient, well-defined operation will magnify the efficiency. The second is that automation applied to an inefficient or poorly defined operation will magnify the inefficiency.”

Measuring results

To measure the results of maintenance activities, plants traditionally view good maintenance in terms of low costs. With few exceptions, this cost is always considered too high. This view of maintenance stems from an old attitude, which is that maintenance only costs money and does not contribute to productivity.

Plants must change the way they measure maintenance results. Analysis of production advancements over the past 35 years reveals that many process industries have more than tripled their production output. During this time, the number of operators has decreased about 30%, while the number of maintenance crafts people has decreased about 6%. This growth in productivity can be traced to increased automation and more reliable equipment—and it’s not necessarily a result of efficient maintenance.

A common way plant maintenance departments measure their effectiveness is to compare maintenance costs with other plants. This is the wrong thing to do, because those who are not the top performer in the comparison will waste time explaining why the figures are wrong instead of focusing on how to improve. We also know that different accounting principles can make a difference of up to 100% in what is considered a maintenance cost, capital investment or operations expense.

The focus must instead be on learning about activities, technology and processes that drive reliability, safety and cost. Better planning and scheduling of maintenance work correlates directly to high manufacturing reliability, better safety and lower costs. It is also important to understand that predictive maintenance alone does not prevent anything. It only gives information on failures that are developing toward a breakdown. But with this information, plants can “anticipate” the future and plan and schedule corrective maintenance actions.

In the best case, plants can schedule the corrective action to be executed in a maintenance “window.” This is an opportunity that presents itself when equipment is down for reasons other than planned and scheduled maintenance, such as changing belts, unscheduled shutdowns, cleaning and other tasks. The link between predictive maintenance and planning and scheduling of work is an essential basic reliability and maintenance process. Executed with precision, it will increase quality product throughput, improve safety and reduce costs.

Performance indicators*

The right thing to do is benchmark the maintenance department and measure continuous improvement internally. If comparing with other organizations, plants should learn what processes best performers use to drive improved reliability and maintenance costs, and how they execute them well.

To continuously improve execution of essential processes, it’s necessary to implement performance indicators as close to the action as possible. This will motivate and trigger actions that will influence the overall performance.

In a reactive organization, break-in work must be reduced. During transition to an organization in control, planning and scheduling quality can be an indicator. Trends in backlog, overtime and contractor hours can also be meaningful indicators when the organization is starting to gain control. When an organization gains control over its maintenance strategies, it becomes important to measure Root Cause Implementations completed and problems eliminated. To do this properly, clear definitions on what’s measured are necessary.

In a study of 38 process lines, the only factor that strongly separated low performers from high performers was how well they planned and scheduled maintenance and operations work. All lines that planned and scheduled more than 50% of work had measured Reliability (as % Quality x % Time, with Time based on 8760 hours available per year) of over 85%. Top performers that planned and scheduled between 75% and 90% of all work achieved a Reliability of 92–96%.
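The Reliability measure used in that study can be sketched in a few lines. This is only an illustration of the stated definition (% Quality x % Time, with Time based on 8760 hours per year); the function name and the example figures are hypothetical and are not taken from the study.

```python
HOURS_PER_YEAR = 8760  # time basis stated in the text


def reliability(quality_pct: float, producing_hours: float) -> float:
    """Return Reliability in percent, computed as % Quality x % Time."""
    time_pct = producing_hours / HOURS_PER_YEAR * 100
    return quality_pct * time_pct / 100


# Hypothetical line: 98% quality, producing for 8,200 of 8,760 hours
example = reliability(98.0, 8200)
print(f"{example:.1f}%")  # prints 91.7%
```

Because the time basis is every hour in the year, any scheduled or unscheduled stop lowers the result, which is why the measure rewards well-planned work.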

Work measurements

Plants that use hands-on tools or other types of work measurements as a way to determine maintenance efficiency are doing the wrong thing. Here’s why:

  • They do not promote cooperation between management and crafts people.
  • They do not consider those who may be busy doing the right thing. For example, in the work-measurement system, thinking time and troubleshooting time are considered hands-off-tools time and, thus, non-productive.
  • Almost all time identified as non-productive by work measurement is typically attributed to a lack of work management and planning and/or scheduling. In fact, it is a result of poor management.
  • When equipment is operating, it is not always true that maintenance people who are busy with hands-on tools are productive. In fact, they can be busy doing the wrong things or only pretending to be busy.
  • In a scheduled shutdown, it is true that people are more productive if they can work on planned and scheduled work without interruptions. Again, only good planning and scheduling—good management—can accomplish this.

Partnering in reliability

To achieve results-oriented reliability and maintenance, plants must realize that production is a partnership between operations, maintenance, stores and engineering. The traditional view is that maintenance is a service organization; operations is the internal customer of maintenance; stores support maintenance; and engineering is an isolated “happy island.” The right thing to do is to view these sectors as partners in a joint venture to reliably produce quality products.

In this partnership, maintenance will deliver equipment reliability; operations will deliver production process reliability; stores will continue to support maintenance; and engineering will support both maintenance and operations, as well as practice life-cycle costs (LCC) or asset management in its design, specification and selection procedures for new equipment. This means that equipment selection will be based on the cost to buy and cost to own. The concept includes reliability and maintainability analyses.
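As a rough illustration of selecting equipment on cost to buy plus cost to own, the comparison below adds a purchase price to a flat yearly owning cost. It is a simplified sketch: it ignores discounting, and every figure and name in it is hypothetical.

```python
def life_cycle_cost(purchase_price: float, annual_owning_cost: float, years: int) -> float:
    """Undiscounted life-cycle cost: cost to buy plus cost to own."""
    return purchase_price + annual_owning_cost * years


# Hypothetical options: A is cheaper to buy, B is cheaper to own
option_a = life_cycle_cost(purchase_price=40_000, annual_owning_cost=12_000, years=10)
option_b = life_cycle_cost(purchase_price=55_000, annual_owning_cost=7_000, years=10)

print(option_a, option_b)  # prints 160000 125000
```

In this made-up case the option with the higher purchase price has the lower life-cycle cost, which is the kind of result a purchase-price-only comparison would miss.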

Recognition is important

Most maintenance organizations can verify that they receive recognition when they fix a major breakdown, but seldom hear anything when they prevent a breakdown. While there is nothing wrong with recognizing good work in a breakdown situation, if this is the only time maintenance people are recognized, it sends the wrong message. This type of recognition fosters a culture of maintenance heroes or “Maintenance Tarzans.” They become action-oriented, which can make it difficult for them to transition to more planned, scheduled and organized maintenance work.

Overtime compensation can motivate, especially considering that breakdowns are about 74% more likely to occur when the full crew is off site. However, this is changing as the Y-generation enters the job market—a group that values time off more than higher pay. Plants need to remember that poor maintenance is visible and good maintenance is invisible, because it is less action-oriented. It is always right for plants to recognize implemented improvements, failure avoidance, planning and scheduling performance and overall reliability.

Performance-improving tips

The following strategies can help develop a high-performing organization:

Work management and planning & scheduling: Most frontline supervisors schedule work to the people they have available. The right thing to do is schedule work that must be done, prioritize it based on risk and what is best for the business, then schedule people to execute this work.

Time estimates are almost always based on four- or eight-hour time segments. In many cases, no fewer than two people are assigned to each job. This provides the supervisor a buffer of resources he or she can use for jobs added to the schedule on short notice. In this setup, schedule compliance can wrongly appear to be high. Therefore, it’s better to schedule work with real time estimates and include problem solving, or thinking time, as part of all work done by crafts people.
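One way to picture "work first, people second" is to rank jobs by risk, then add up the real time estimates to see how many hours must be staffed. The sketch below assumes a simple likelihood-times-consequence risk score on 1 to 5 scales and a cutoff value; the scoring scheme, the cutoff, and the job data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    likelihood: int    # hypothetical 1-5 scale
    consequence: int   # hypothetical 1-5 scale
    est_hours: float   # real time estimate, not a 4- or 8-hour block

    @property
    def risk(self) -> int:
        return self.likelihood * self.consequence


def prioritize(jobs: list[Job]) -> list[Job]:
    """Rank jobs from highest to lowest risk."""
    return sorted(jobs, key=lambda job: job.risk, reverse=True)


jobs = [
    Job("Repaint guard rail", 1, 1, 3.0),
    Job("Replace worn conveyor belt", 4, 5, 6.5),
    Job("Realign pump coupling", 3, 4, 2.5),
]

ranked = prioritize(jobs)
must_do = [job for job in ranked if job.risk >= 10]  # hypothetical cutoff
hours_to_staff = sum(job.est_hours for job in must_do)

print([job.name for job in must_do], hours_to_staff)
```

The people are scheduled only after `hours_to_staff` is known, rather than the work being fitted to whoever happens to be available.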

In a high-performing maintenance organization, 20% of all effort hours should be used on problem elimination or continuous improvement that will “design out maintenance problems.”

Anticipation: Most plants have morning meetings to discuss what happened the previous day and night, and what is planned for the current day. High-performing maintenance organizations will spend most of this meeting on what will happen tomorrow and next week. Though it sounds unrealistic, this can be done because very few problems occur and little time needs to be spent on yesterday’s problems. The focus should be on future activities.

Following the same principle, the organization should work on a monthly or weekly forecast and finalize the next day’s schedule about four hours before the end of each day. The schedule should be communicated to crafts people before they leave for the day so they can prepare for the next day’s work.

Flexibility: The 12- to 14-person craft-line-oriented maintenance organization is, or must soon be, a thing of the past. Craft lines should not limit work flexibility—only work skills to do a job safely should be a constraint. This will often require changes in union agreements and a focused training program for crafts people. Experience indicates that if management presents a clear plan, it will be well received.

Lost-production analyses: These types of analyses often reflect lost production only by department. Such a procedure does not build a partnership between departments, nor does it solve problems.

The better approach is to define, solve and classify a problem by department, equipment and type of failure after analyses are complete, then follow up on how to solve the problem in the future.

Storeroom closure: Many maintenance organizations waste up to 30% of their time walking to the store(s) and searching for parts. Plants should plan and schedule maintenance activities so stores can prepare and deliver parts where and when they are needed. This will require a Bill Of Material (BOM) populated to 95%+ accuracy.

Technical documentation: All technical and economic information about equipment should be readily available. The equipment, loop or circuit number should be the key to this information. At a minimum, all parts kept in stores, or not kept in stores, should be tied to equipment identification in the BOMs. The lack of good and reliable documentation is one of the reasons why most maintenance planners do not have time to plan.

Maintenance shift coverage: Most three-shift plants have maintenance resources on the late shifts. Some have a maintenance supervisor on each shift. Ideally, a plant should operate without maintenance people on the night or evening shift. This is possible only if maintenance believes the plant can operate 16 hours without major maintenance problems. If this is not possible, the plant should do something about it.

The above issues are select examples of actions and cultures that will promote high-performing maintenance. It is important that a plant maintenance organization seriously examine how good it truly is, determine if it is promoting the right things and if improvements are needed. Only then can a maintenance organization proceed to make the changes needed to become as good as it can be. MT

Christer Idhammar is the Founder of IDCON, Inc.


3:55 pm
September 1, 2014

Viewpoint: Reliability Advancements Over a Generation

By Jim Cahill, Emerson Process Management

Rotating equipment has always been a source of unplanned downtime within operations. I know that from experience. I began my professional career in the 1980s working as a systems engineer on offshore oil & gas platforms in the Gulf of Mexico. The rotating equipment we dealt with included turbine and reciprocating generators, turbine and motor-driven compressors, various types of pumps and more.

Although we had control and safety systems at the time, we were just starting to look at machinery-protection systems. Vibration sensors had not yet been fully ruggedized for the wet, salty conditions of the Gulf. In the 30 years since, technologies—and their ruggedness—have advanced considerably.

Triaxial accelerometers can now measure vibration in the x, y and z axes from a single-mount location. Installation has become much easier with the wireless vibration transmitters now available. (That capability alone would have opened up many more pieces of rotating equipment to continuous monitoring had we been able to leverage it “back in the day.”)

The diagnostics that interpret the incoming vibration and temperature signals have also substantially improved. Some specialized diagnostics are available to spot pitting in bearings to provide early warning of failures. Advanced notification means maintenance activities can be planned instead of reacted to.

Other advanced diagnostics help spot resonant frequencies, misalignments, machinery impacts and lubrication issues, to name a few. And there are more ways to get these diagnostics into the right hands than ever before. Portable vibration analyzers make these analyses available to the technician performing route-based maintenance. They also connect with computerized maintenance management systems (CMMS) to schedule work orders as issues are identified.

Condition-monitoring systems also were once an island unto themselves. They now connect more closely with plant control systems. Similar to the dashboard warning lights in your automobile, these monitoring systems can provide early warning to operators in the control room. This provides a path of communications between the operators and maintenance teams about potential problems that require further investigation.

As technologies have advanced, the need for specialization has increased. Remote communications access is important to bridge the time and distance between plants and experts. These experts may work for the process manufacturer, the reliability technologies and services provider or be independent contractors. Condition-monitoring equipment and portable analyzers, though, have become so sophisticated that they can provide information from remote points to experts wherever they are located. Many companies are developing integrated strategies to connect experts to all of their plant sites, thus providing continuous expertise around the clock. Through early detection, analysis, recommendations and action, unscheduled downtime can be greatly reduced.

Remote access brings its own set of issues around cybersecurity. The same set of best practices around control and instrumentation system security needs to be applied to reliability-based systems. Much like a safety program, the process starts with having a security program in place. Also needed are a champion and high-level support to get the organization engaged in the importance of following secure practices and continuously finding ways to mitigate security risks.

Changes in technology over a generation have been far-reaching in the areas of reliability and safe operations. They’ve also highlighted the need for specialization and the ability to connect experts to plant personnel in the quest to reduce unplanned downtime and increase the overall efficiency of manufacturing processes. Looking forward, I see continued expertise being added into the technologies to provide a clearer, more actionable recommendation set for operations staff to improve the performance of their facilities.  MT


2:32 pm
August 1, 2014

Uptime: Mainstream 2014 — Shared Visions for Reliability

The first-ever Mainstream North America conference was held in Austin, TX, in June. It was organized by the Eventful Group, well known for Mainstream conferences in Australia and New Zealand. They’ll present a Mainstream conference in South Africa later this year.

The theme at the North American venue was “Asset Management.” Attracting nearly 300 speakers and delegates from mostly “asset intensive” businesses, the event focused on leadership, personnel and work cultures through four conference tracks: Organizational Leadership & People; Asset Reliability; Operations & Supply Chain; and Shutdowns & Turnaround Management.

As the Asset Reliability track leader, I picked up a great deal of compelling insight from the many presentations by respected practitioners in the world of plant operations, maintenance, reliability and more. Here, I share some of what I learned in Austin, along with my personal reflections.

The terminology we use

Maintenance-speak is rife with terms intended to communicate concepts quickly among various maintenance and reliability (M&R) practitioners and communities. We’ve also noticed that terminology we use in the M&R arena is rapidly invading the “Asset Management” field of practice. As with most businesses and industries, these terms can be confusing and misinterpreted by outsiders, those on the fringes and even by co-workers.

The basic terms “repair, maintenance, reliability and asset management” are among those used regularly in the expanding body of knowledge covered in this column over the years. The term “reliability” is often used to describe an activity or an equipment condition, and the term “quality” is used to describe an object or performance, as in “that was a quality piece of work.” Without a qualifier, however, neither reliability nor quality is something that can be touched. Technically, that statement of “quality” begs the question: “Was the quality good or bad?” Likewise, the term “reliability” has to be further defined on a similar scale from good or acceptable to bad or unacceptable.

Here are a few terms we discussed and clarified in the closing panel discussion at the Mainstream conference:

  • Repair: Fixing things that have failed
  • Maintenance: Preserving a desired condition
  • Reliability: A functional performance result or goal
  • Asset Management: A systematic business risk-management process for the life-cycle of something of value

While all of these terms relate to each other in the world of M&R, they have entirely different meanings. To use them interchangeably, as is frequently the case, leads to confusion.

‘Maintenance’ and ‘reliability’

I’ve discussed this issue previously, but it bears repeating: When it comes to actions in the workplace, M&R professionals must find ways to de-couple the words “maintenance” and “reliability.” Sure, we perform maintenance; we do maintenance tasks; and there are actions, procedures, plans and schedules that define maintenance work. But, “maintenance” and “reliability” are two different concepts. While maintenance is an action, reliability is a process result or a goal. We don’t really DO reliability, do we?

Several presenters pointed to the many different departments involved in their asset-reliability processes. As Joe Park, the Global Reliability Leader at Novelis, emphasized repeatedly, “Reliability is NOT just maintenance.” (That assertion will be the focus of an upcoming article by Park in MT.) Yet the longer we continue to weave the two terms together in our discussions, projects and in our plans, the more others in our businesses will believe that “maintenance is in charge of reliability.” From inside the world of maintenance, we know that most of the causes of un-reliability are outside the direct control of those who maintain the assets.

Talking about asset reliability

The speakers in the “Asset Reliability” track highlighted various recurring terms, concepts and concerns in describing what works in their companies regarding maintenance, reliability and asset management. Their presentations provided the following powerful takeaways for the rest of us about the way things happen—or should happen—on the road to improving the performance of physical assets (i.e., machines, equipment, facilities):

  • “Reliability is not just Maintenance.” (Thanks, Novelis)
  • Leadership and teamwork
  • People, the human resources for reliability
  • Business strategy
  • Focus/alignment
  • Operator involvement
  • “Playing nicely together” (Thanks, Agrium)
  • Culture change
  • Investing in reliability is not a cost.
  • Data, information, action, feedback
  • Best practices
  • Training and mentors
  • Succession planning
  • Rewards and recognition
  • Human variability: skills, knowledge, actions
  • Building stakeholder relationships
  • Integrated, interdependent, holistic actions
  • Four generations in the workplace
  • Millennials are running away from maintenance.

One term suspiciously absent from this list was “technology” for asset reliability. Although some of the presentations pointed to examples of technology, the critical success factors (if I can call them that) tended to focus on people, organizations and business processes.

The soft stuff counts

Discussions surrounding “Asset Management” and “Asset Reliability” are finally getting to the crux of the reliability-improvement challenge: the soft stuff. Not long ago, maintenance and reliability conference sessions were filled with “technology and tools” to use in improving the way maintenance is performed and reliability goals are achieved.

It now appears that the keystone—the solution to eliminating equipment-, machinery- and facilities-related problems—centers around “people,” including how we develop and deploy their skill sets, how we focus them on the right things and how we deal with human variation and errors. To be clear, reliability is more about people than machines and technologies. It is, in fact, a team sport.

Technology alone will not necessarily lead to improved performance and reliability. It takes people to properly select, deploy and use technologies and tools. Likewise, Asset Management Standards—whether local or global in scope—will not lead to asset performance and reliability improvement without proper engagement of the right people with the right skills and knowledge at every step of the journey.

More operators than maintainers

We’ve known for decades that there are generally more operators than maintainers in most physical-asset-intensive businesses. Equipment operators not only outnumber maintenance people, they are typically much closer to the equipment for longer periods of time than are the maintainers.

Miguel Valdez, Corporate Asset Management Lead for MillerCoors, shared what his company does worldwide to capitalize on the availability and positioning of operators. As he put it, “Engaging operators in an ‘Autonomous Maintenance’ approach (from the Total Productive Maintenance process) is a relatively simple concept, but extremely difficult to do. It is a culture change.”

Valdez noted that with Autonomous Maintenance procedures targeting known problem areas in MillerCoors’ bottle- and can-packaging lines, 5 to 10% machine efficiency (ME) was gained. But when Autonomous Maintenance is coupled with Asset Care performed weekly by maintenance technicians, the results have been an unprecedented ME of 95% and higher. “This partnership,” he said, “represents a huge step in the right direction.” According to Valdez, the company’s plants in Europe and Latin America out-performed those in other countries. Resistance to change (culture change) was clearly holding others back.

Traditional divisions of labor, where operators operate machines and maintainers maintain machines, can reflect a seemingly insurmountable barrier. The challenge in culture change is to LEAD it from the perspective of a solid business case for changing the way things get done. And, from a business perspective in an era of growing maintenance skills shortages, there is a powerful motivation to leverage the experiences of MillerCoors.

Secrets of success

We all look for “secrets of success” to make work easier. John Crowe, former CEO of Buckeye Technologies, offered some big ones in his Mainstream presentation. “The secrets of success in reliability are not new,” he said. “It’s a simple matter of commitment and constancy of purpose. See what’s essential and ignore the rest.”

What a powerful insight from the top of an organization—one that has experienced continuing and sustainable benefits of reliability improvements. Crowe recognized that “good maintenance costs money and downtime, but poor maintenance costs more of both money and downtime.”

Crowe also acknowledged that the journey to reliability improvement will rarely be the same for every organization. Still, the role of leadership is of key importance. As M&R professionals, it is our job to provide meaningful business-case data, proof and evidence of the value of reliability best practices to top levels of leadership. In discussing culture change, he emphasized these six points:

  1. You need a champion, a role model.
  2. Do the right things, and people will follow.
  3. Be willing to change.
  4. Seeking perfection is a must.
  5. The journey never ends.
  6. Learning never ends.

Leading the importance of asset-reliability improvement from the top of the organization is easy when the CEO is connected to the reality of what goes on in the plant. Through data-based decision-making, CEO Crowe came to realize that the “products of good maintenance are machine uptime, production at rate and at quality and safety.”

Mainstream 2015

There were so many excellent sessions and speakers with countless opportunities for professional networking at Mainstream 2014. I’ve just scratched the surface here. In its first time at bat in the North American market, Mainstream hit a home run this year. Its next event here, Mainstream 2015, is already positioned to be another winner. MT

Robert Williamson, CMRP, CPMM and member of the Institute of Asset Management, is in his fourth decade of focusing on the “people side” of world-class maintenance and reliability in plants and facilities across North America. Contact him at


1:17 pm
June 11, 2014

Marius Basson Acquires Aladon Network from Bentley Systems


Marius Basson


Bentley Systems, Inc., and the Aladon Network have announced that Marius Basson, a long-time Reliability-Centered Maintenance (RCM2) practitioner and Aladon Network member, has acquired the Aladon business from Bentley. Bentley Systems will remain a member of the network and continue to be its priority technology partner.

The Aladon Network, which became part of Bentley through the company’s acquisition of Ivara in 2012, is a global community of reliability professionals whose members are certified as “practitioners” and “facilitators” in the delivery of RCM2, Maintenance Task Analysis (MTA2) and Asset Prioritization (AP).

Basson was previously Vice President of Asset Management at CH2M HILL, and had also served as Director of Reliability at PricewaterhouseCoopers. A mechanical engineer who has worked in the field of reliability and maintenance management for more than 20 years, he has successfully implemented RCM2 in a wide range of industry sectors, including mining transport; water, gas and electric utilities; fabrication; manufacturing; and petrochemicals.