Archive | Reliability Engineering


7:06 pm
July 13, 2016

Calculate the Impact of Unreliability On Sales

While most acknowledge that unreliable operation is costly at the plant level, the impact, when projected to sales, is enormous.

By Al Poling, CMRP

Generally speaking, manufacturing personnel understand the effect unreliability has on maintenance. Unreliability requires more maintenance resources and materials to repair failed equipment as well as increased maintenance capital spending caused by the need to replace equipment that has reached the end of its useful life. Running equipment to failure causes equipment to reach the end of its useful life prematurely. What many manufacturing personnel do not understand is the effect unreliability has on sales.

Maintenance professionals find it difficult to garner support of corporate executives who do not understand maintenance. However, these same executives have a very clear understanding of profit and loss. If they understand the effect unreliability has on sales and, therefore, profit, they will be much more inclined to support a comprehensive reliability initiative. It might surprise many maintenance professionals to learn that there is a mutual benefit to be derived from reliability: reduced maintenance costs and increased sales and revenue.

To understand this relationship, we must examine the basic business model. All for-profit businesses operate under the same equation:  PROFIT = SALES – COST. Equipment failures affect both sides of this equation.

“Calculate the True Cost of Unreliability,” an article published in the February issue of Maintenance Technology, examined the impact unreliability has on maintenance costs. In this article, we will examine the effect unreliability has on sales.

A hypothetical plant will be used for purposes of calculations. You can apply these calculations to your own operations to develop an order-of-magnitude estimate of the impact unreliability has on sales and profitability.

For calculation purposes, we will use a hypothetical plant that has a plant-replacement value (PRV) of $1 billion US, with a targeted return on capital employed (ROCE) of 30%. In other words, business stakeholders expect to realize $300 million in earnings before interest and taxes on their $1 billion investment. We will also assume that this plant operates at 70% capacity due to lack of sales.

Raise sales price

Sales revenue is driven by two key levers, price and volume. The higher the sales price per unit the higher the margin, the higher the sales revenue, and the greater the profit. Additionally, the more product you sell (sales volume), the higher the sales revenue and the greater the profit. So both sales price and sales volume determine the revenue garnered by the business. Unreliability has a very profound effect on those two factors. To understand the relationship between asset reliability and sales revenue in this equation we need to examine each component in more detail.

The price of a product is largely set by whatever price the market will bear. However, the market places a premium on quality. The highest sustainable product quality can only be produced through uninterrupted manufacturing. As assets become more reliable, manufacturers are able to produce consistently higher quality product, something customers value. This isn’t new. W. Edwards Deming espoused the virtues of product consistency more than a half century ago.

If a 5% price premium can be garnered from customer willingness to pay more for higher quality product, then the subsequent increase in sales revenue is calculable. Assuming the hypothetical plant had $500 million in sales during the reporting period, the increased revenue from a higher price enabled by higher-quality product would be an additional $25 million in sales revenue.

This increase in sales revenue was made simply by reducing and/or eliminating unplanned equipment failures. No additional capital was required, resulting in a direct increase in the return on capital employed and, more importantly, on profitability.

LINE ITEM: $25 million = The increase in revenue due to higher sales price for higher quality product derived from reducing and/or eliminating unreliability.

Increase capacity

A second sales-revenue benefit derived from the elimination and/or reduction of unreliability is garnered through a lower cost per unit (CPU) of production. By operating in a failure-free mode, manufacturers are able to increase throughput. When there are fewer production interruptions caused by equipment failures, more product is made over the same period of time.

For example, if the average production rate was 80 tons per day, including time lost to equipment failures, then a natural benefit derived by reducing and/or eliminating equipment failures would be an automatic increase in capacity. If one additional hour per day of production was gained, the subsequent increase in capacity would be 4%.

A 4% increase on $525 million in annual sales revenue would be worth an additional $21 million in sales revenue. As was the case with improved product quality, this increase in capacity was derived without any additional capital investment. Companies are always striving for increased sales by whatever means, but they inevitably expect to have to invest significant capital in a new production unit or to expand an existing production unit.

LINE ITEM: $21 million = The incremental sales gained through the incremental increase in production capacity derived from reducing and/or eliminating unreliability.
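The arithmetic behind the first two line items can be sketched in a few lines of Python. The dollar figures and percentages are the article's hypothetical values; the function and variable names are illustrative, and the 24-hour operating day is an assumption implied by the one-hour example.

```python
# Hypothetical plant figures taken from the article.
BASE_SALES = 500_000_000   # annual sales revenue, USD
PRICE_PREMIUM = 0.05       # premium customers pay for higher quality
HOURS_PER_DAY = 24         # assumption: the plant runs around the clock


def price_premium_gain(sales: float, premium: float) -> float:
    """Extra revenue from selling the same volume at a higher price."""
    return sales * premium


def capacity_gain_fraction(extra_hours_per_day: float) -> float:
    """Fractional capacity increase from recovered production hours."""
    return extra_hours_per_day / HOURS_PER_DAY


premium_gain = price_premium_gain(BASE_SALES, PRICE_PREMIUM)
sales_with_premium = BASE_SALES + premium_gain

# One recovered hour per day is about 4.2%; the article rounds to 4%.
exact_fraction = capacity_gain_fraction(1)
capacity_gain = sales_with_premium * 0.04

print(f"Price premium gain: ${premium_gain:,.0f}")
print(f"Capacity increase:  {exact_fraction:.1%}")
print(f"Capacity gain:      ${capacity_gain:,.0f}")
```

Substituting your own sales figure and recovered hours gives an order-of-magnitude estimate for your operation.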

Increase sales margin

Additionally, a 5% reduction in the cost per unit derived by spreading costs, e.g., operational and energy costs, over a larger volume of product could be significant. This is effectively an increase in the sales margin of the product being sold. Using the aforementioned $500 million in annual sales, the benefit would be 5% of $500 million, or an additional $25 million in profit.

LINE ITEM: $25 million = The increase in profit caused by an increased sales margin gained by reducing the cost per unit derived from reducing and/or eliminating unreliability.

Admittedly, an argument against the aforementioned gain could be made. Just because you produce more product doesn’t mean that you can sell it. But let’s examine the primary means of competition in a capitalistic environment. Companies generally compete on price and/or on quality. By reducing and/or eliminating equipment failures, both of these factors are enhanced. If you have a higher quality product to offer, your competitive position is automatically strengthened. You can increase price to increase sales revenue and/or maintain the same price and increase sales volume by offering a higher quality product for the same price.

The gains illustrated above appear to be reasonable, so we’ll assume that we could potentially increase sales price and sales volume, thereby deriving a dual benefit from the reduction and/or elimination of unreliability.

Reduce maintenance

We must also consider that, with a reduction in unreliability, maintenance costs, typically the highest fixed cost in manufacturing, are substantially lowered. Maintenance costs are distributed across all production in the form of maintenance cost per unit of production. The net result of lower maintenance cost is therefore lower cost per unit of production. In a poorly performing operation, characterized by high unreliability and subsequent high maintenance cost, the benefit derived from reducing the maintenance cost per unit alone can be profound. Benchmark studies have shown that the difference between a best performer and a worst performer, relative to maintenance cost, can be exponential. In other words, a worst performer will spend exponentially more on maintenance per unit of production than a best performer.

In the process industry, the range of performance in maintenance cost as a percent of plant-replacement value (PRV) is from less than 1% for best performers to more than 15% for worst performers. For illustration purposes we will assume a 1% reduction in maintenance cost as a percent of PRV. We will assume maintenance costs were 3% of PRV, but have been reduced to 2% of PRV by implementing a robust condition-monitoring program that facilitates corrective action prior to catastrophic failure. The net increase in profit through reduced maintenance costs based on a PRV of $1 billion would be $10 million.

LINE ITEM: $10 million = The increase in profit gained by a reduction in maintenance cost derived from reducing and/or eliminating unreliability.

Extend turnaround frequency

Although it is not universally recognized, maintenance turnarounds are caused largely by unreliability. The primary driver for turnarounds is typically pressure-equipment inspection. But what if you used non-intrusive condition monitoring such that you eliminated the need to open equipment for visual inspection?

Far too many process plants still take annual turnarounds. In this era of advanced inspection technologies, that is inexcusable. Better-performing process plants have extended the interval between turnarounds to 5 to 7 yr. Let us assume that the hypothetical plant still takes annual turnarounds that cause 21 days of lost production. If the turnaround interval was extended to 3 yr., with only a 7-day increase in duration, a net annualized increase in production of approximately 12 days would be realized.

If we conservatively calculated the value of each day of production, based on current production rates and sales prices, twelve additional days of production would net an additional $18 million in sales revenue.

LINE ITEM: $18 million = The increased sales revenue gained from 12 additional days of production derived from reducing and/or eliminating unreliability caused by annual turnarounds.
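The turnaround figures can be checked with a short Python sketch. It assumes the extended turnaround lasts 28 days (21 days plus one week), which is the duration that reproduces the article's gain of approximately 12 days. The value of $1.5 million per production day is not stated in the article; it is back-calculated here from $18 million over 12 days and should be treated as an assumption.

```python
def annualized_downtime(turnaround_days: float, years_between: float) -> float:
    """Turnaround downtime spread evenly over the years between turnarounds."""
    return turnaround_days / years_between


current = annualized_downtime(21, 1)        # 21 days every year
extended = annualized_downtime(21 + 7, 3)   # assumed 28 days every 3 years
days_gained = current - extended            # roughly 11.7 days per year

VALUE_PER_DAY = 1_500_000                   # assumption: USD of sales per production day
revenue_gain = round(days_gained) * VALUE_PER_DAY

print(f"Annualized days gained: {days_gained:.1f}")
print(f"Revenue gain:           ${revenue_gain:,.0f}")
```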

Increase production

The final potential gain we will examine is the 30% of production capacity that is not currently utilized, ostensibly because of a lack of sales. Claiming that no sales were lost due to unreliability is a self-fulfilling prophecy. As long as the manufacturer is not a sole source producer, additional sales were lost to competitors. If we go back to the benefits of the highest sustainable product quality and lowest sustainable unit cost of production, there would be no valid reason for not selling every unit of production. That additional 30% of production and subsequent sales is a game changer for the business. Using the original assumption of $500 million in annual sales, adding in the additional sales revenue from continuous production, and ignoring the quality premium, the net gain in sales revenue is an astounding $215 million.

LINE ITEM: $215 million = The increased sales revenue gained by running continuously, derived directly and indirectly through the reduction and/or elimination of unreliability.
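The article does not show how the $215 million was derived. One reading that lands close to it, sketched below in Python, scales the $500 million in sales at 70% utilization up to full capacity. This is an assumption about the method, not the author's stated calculation.

```python
CURRENT_SALES = 500_000_000   # USD, from the article
UTILIZATION = 0.70            # share of capacity currently sold, from the article

# Sales if every unit of capacity were produced and sold at current prices.
full_capacity_sales = CURRENT_SALES / UTILIZATION
idle_capacity_gain = full_capacity_sales - CURRENT_SALES

print(f"Sales at full capacity: ${full_capacity_sales:,.0f}")
print(f"Gain from idle capacity: ${idle_capacity_gain:,.0f}")
```

The result is about $214 million, which the article's figure of $215 million approximates.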

There are arguably additional sales and revenue gains that can be derived through the reduction and/or elimination of unreliability. However, using the examples above we can see that a significant increase in sales and related revenue can be gained through reliable operation.
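Adding up the six line items gives a sense of scale. The article's text does not state a total, so the sum below is simply the arithmetic on its figures. Note that the items mix sales revenue and profit, so the total is an order-of-magnitude indicator rather than a single accounting measure. The dictionary labels are illustrative.

```python
# Line items from the article, in millions of USD.
line_items = {
    "Higher sales price": 25,
    "Increased capacity": 21,
    "Increased sales margin": 25,
    "Reduced maintenance cost": 10,
    "Extended turnaround interval": 18,
    "Use of idle capacity": 215,
}

total_millions = sum(line_items.values())

for label, value in line_items.items():
    print(f"{label:<30} ${value:>4} million")
print(f"{'Total':<30} ${total_millions:>4} million")
```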

This is not an insignificant amount of sales revenue for any size organization. The business case for reliability is compelling! Although a hypothetical manufacturing site was used to illustrate the effect of unreliability on sales, the same calculations can be used to obtain an order-of-magnitude estimate of the value of lost sales due to unreliability for any plant. Plant management and corporate leaders need to understand the high cost of unreliability. All it takes is for someone to take the initiative and calculate the value for your operation. Once the true cost of unreliability has been exposed, garnering support for improved reliability should be easy! MT

Al Poling has more than 35 years of reliability and maintenance experience and is a Certified Maintenance and Reliability Professional (CMRP). His consultancy, RAM Analytics, is located in Houston. For more information, contact him at

Click here to download an ebook pdf containing this article and Al Poling’s February 2016 article “Calculate the True Cost of Unreliability”.


7:49 pm
April 11, 2016

Choose The Right Emergency Stop

With the number of customizable options available today, selecting the right emergency stop (e-stop) for process equipment can be a daunting task, but it’s critical for overall safety. According to human-machine-interface (HMI) experts at EAO Corp., Shelton, CT, fitting equipment with a highly functional e-stop in line with the basic application design concept, versus a lesser-certified safety switch, is key. MT

Determine if your application requires Category 0 or Category 1 shutdown.
This is crucial in the placement, size, electrical specifications, mechanical characteristics, color, and number of required e-stops.

Research international and North American standards, performance ratings, and codes that govern your application (see table below).
Each industry has unique regulatory standards. These restrictions may govern factors such as size, color, and contact terminals.

Select the product.
Choose your e-stop based on design factors to meet industry demands and international compliance. Proper selection involves understanding market and application requirements, environmental conditions, and electrical demands.

Vendors often provide a variety of unique features to enhance your e-stop and complete virtually any application. It’s important to research these additions as some accessories may be mandated by industry standards.

Consult an expert.
Many suppliers offer consultative services to assist customers throughout the process of selecting and integrating their HMI needs, from individual e-stops to completely designed and produced ‘mixed technology’ solutions.

For more information on e-stops and other HMI components and systems, visit

[Table: international and North American standards, performance ratings, and codes for e-stops; image not available]


8:26 pm
February 8, 2016

Calculate the True Cost of Unreliability


The economic impact on manufacturers that haven’t bought into the idea of failure-free operation is easy to determine and, more important, enormous.

By Al Poling, CMRP

Although experts have espoused the virtues of equipment reliability for decades, countless manufacturing operations still suffer significant and unnecessary downtime due to equipment failure. Apparently these manufacturers haven’t bought into the benefits of failure-free operation. What will it take to get them to accept the time-proven benefits of reliability? Perhaps they will never be convinced by examples of other manufacturing operations, believing that they are somehow unique. If the benefits derived through reliable operation won’t lead them to change, perhaps an examination of the true cost of unreliability will.

The big picture

Businesses operate under the basic equation of: profit = sales minus cost. Although equipment failures affect both sides of the equation, this article focuses on the impact of unreliability on maintenance costs—typically the largest fixed costs in a process-industry manufacturing facility. End users can apply the following calculations from a hypothetical plant to their own business and develop an order-of-magnitude estimate of the impact of unreliability on maintenance costs at their site(s).

For purposes of these calculations, let’s assume our hypothetical operation has a plant replacement value (PRV) of US$1 billion and a resident maintenance workforce of 150 craft-level employees.

Maintenance-labor cost

Maintenance costs in a plant include those for skilled craft labor to repair and restore equipment to good operating condition following a failure. The current average U.S. Gulf Coast, fully loaded, maintenance skilled-craft wage rate is approximately $45/hr. Using the U.S. standard of 2,080 hr./man-year, with an estimated overtime rate of 5%, the cost/year/skilled craft worker is approximately $100,000. Consequently, 150 skilled craft workers will cost approximately $15 million/year. In terms of man-hours, including overtime, the number is about 300,000 man-hours/year.
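A quick Python check shows how the rounded labor figures arise. The wage, hours, overtime rate, and headcount come from the article. The sketch assumes overtime hours are paid at the same fully loaded rate, which the article does not specify.

```python
WAGE_PER_HOUR = 45        # USD, fully loaded
STANDARD_HOURS = 2080     # hours per man-year
OVERTIME_RATE = 0.05      # overtime as a fraction of standard hours
WORKERS = 150

# Assumption: overtime is costed at the same hourly rate as straight time.
hours_per_worker = STANDARD_HOURS * (1 + OVERTIME_RATE)
cost_per_worker = hours_per_worker * WAGE_PER_HOUR
total_hours = hours_per_worker * WORKERS
total_cost = cost_per_worker * WORKERS

print(f"Cost per worker: ${cost_per_worker:,.0f}")   # article rounds to $100,000
print(f"Total hours:     {total_hours:,.0f}")        # article rounds to 300,000
print(f"Total cost:      ${total_cost:,.0f}")        # article rounds to $15 million
```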

Benchmarking studies have confirmed that best-performing plants average 1% downtime due to unreliability/year, while average performers suffer 7% downtime due to unreliability. These numbers include annualized downtime for turnarounds. To calculate the annualized downtime for turnarounds, simply take the total downtime for your last turnaround and divide it by the number of years between turnarounds. A 30-day turnaround taken every three years equals 10 days of annualized downtime due to the turnaround alone.

Best performers average less than four days of downtime/year due to unreliability, including annualized downtime for turnarounds. Average performers endure more than 25 days of downtime/year due to unreliability.
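The conversion from percentage downtime to days, and the annualizing of turnaround downtime, can be sketched as follows. The function names are illustrative; the inputs are the article's benchmark figures and its 30-day example.

```python
DAYS_PER_YEAR = 365


def downtime_days(downtime_fraction: float) -> float:
    """Convert a yearly downtime fraction into days per year."""
    return downtime_fraction * DAYS_PER_YEAR


def annualized_turnaround_days(turnaround_days: float, years_between: float) -> float:
    """Spread the downtime of one turnaround over the years between turnarounds."""
    return turnaround_days / years_between


best_performer = downtime_days(0.01)       # under four days
average_performer = downtime_days(0.07)    # more than 25 days
turnaround_example = annualized_turnaround_days(30, 3)

print(f"Best performer:     {best_performer:.2f} days/year")
print(f"Average performer:  {average_performer:.2f} days/year")
print(f"30 days every 3 yr: {turnaround_example:.1f} days/year")
```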

There is a direct correlation between the number of equipment failures and the number of craft workers required to effect repairs. In theory, the average-performing manufacturer would have seven times more maintenance craft workers than the best performer. That, however, is in theory only. Achieving and sustaining failure-free operation requires truly skilled craft workers, and even they have to focus their efforts on failure avoidance instead of repair.

Work sampling studies have revealed that the efficiency of maintenance-craft workers is extremely high in highly reliable operations, as their work is well defined and scheduled in advance. In comparison to reactive maintenance, schedule interruptions happen on an exception basis in a failure-free environment. Instead of seven times as many skilled craft workers needed in an average-performing plant, we’ll estimate (conservatively) that the number is half that (or three and a half times).

With regard to maintenance labor, the cost of unreliability is the difference between the number and associated cost of skilled craft workers required to support a reliable operation versus an unreliable one. Assuming that the aforementioned 150 such workers, costing $15 million/year, are working in an operation suffering average unreliability, the additional maintenance labor costs are 70% of the total—$10.5 million/year. In this example the true cost of unreliability in skilled craft workers is an additional 105 such workers costing an additional $10.5 million/year, whereas a reliable operation would only need 45 skilled craft workers. This calculation does not factor in the elimination of overtime that would be found in a failure-free environment. While equipment still fails, the impending failure is discerned well in advance so repairs can be made during normal maintenance work hours.
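The link between the three-and-a-half-times staffing multiple and the 70% figure can be made explicit. In the Python sketch below, the multiple implies that about 71% of the workforce is attributable to unreliability; the article rounds this to 70%, and that rounded value is used for the headcount and cost. Variable names are illustrative.

```python
WORKERS = 150
LABOR_COST = 15_000_000      # USD per year
STAFFING_MULTIPLE = 3.5      # average performer vs. best performer

# Share of the workforce that exists only because of unreliability.
implied_fraction = 1 - 1 / STAFFING_MULTIPLE   # about 0.71
AVOIDABLE_FRACTION = 0.70                      # the article's rounded figure

extra_workers = round(WORKERS * AVOIDABLE_FRACTION)
needed_workers = WORKERS - extra_workers
extra_cost = LABOR_COST * AVOIDABLE_FRACTION

print(f"Implied avoidable share: {implied_fraction:.0%}")
print(f"Extra workers:           {extra_workers}")
print(f"Workers needed:          {needed_workers}")
print(f"Extra labor cost:        ${extra_cost:,.0f}")
```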


Maintenance-material cost

Repair material is another major element of maintenance costs. Unfortunately, the ratio of maintenance-material cost to maintenance-labor cost varies by region due to differences in the prevailing wage and the availability (or lack) of repair materials. Equipment’s material of construction also factors into material-to-labor ratios.

A reasonable hypothesis is to use a one-to-one ratio of maintenance material to maintenance labor. Applying this ratio to our hypothetical plant with 150 maintenance craft workers at a cost of $15 million/year means the site spends another $15 million on maintenance-repair material annually. Using the same approximation as we used with maintenance labor, 70% of these material costs would be avoidable if the plant were operating in a failure-free mode. In monetary terms, this represents yet another $10.5 million attributable to unreliability.


Equipment-replacement cost

In consequential failures, equipment cannot be repaired and, thus, must be replaced. Benchmarking studies have shown that manufacturing operations running their equipment to failure spend exponentially more than best performers spend on maintenance capital, i.e., equipment replacement.

Manufacturers that take care of their equipment and embrace failure-free operation derive extraordinary service-life from that equipment. Conversely, those who operate in a run-to-failure mode wear out equipment quickly.

Run-to-failure is a particularly costly maintenance strategy. Best performers will spend 1% or less of their PRV each year to replace equipment that has reached the end of its useful life. In contrast, average performers will spend 3% to 5% annually on replacement equipment. Determining the true cost of unreliability, therefore, requires factoring in the price tag for equipment replacement.

A reasonable assumption is that best performers spend 0.5% of PRV and average performers spend 4% of PRV on annual equipment replacement. That means, based on our hypothetical plant, with a PRV of US$1 billion, a best performer would be spending approximately $5 million annually on equipment replacement due to unreliability, and an average performer would be spending approximately $40 million annually. Thus, in our hypothetical example, the true cost of unreliability reflects an additional $35 million/year for equipment replacement.
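The equipment-replacement comparison reduces to a difference of two percentages of PRV. A minimal Python sketch using the article's assumed rates (names are illustrative):

```python
PRV = 1_000_000_000      # plant replacement value, USD
BEST_RATE = 0.005        # best performer: 0.5% of PRV per year
AVERAGE_RATE = 0.04      # average performer: 4% of PRV per year

best_spend = PRV * BEST_RATE
average_spend = PRV * AVERAGE_RATE
unreliability_cost = average_spend - best_spend

print(f"Best performer:        ${best_spend:,.0f}")
print(f"Average performer:     ${average_spend:,.0f}")
print(f"Cost of unreliability: ${unreliability_cost:,.0f}")
```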


Additional costs

Another significant maintenance cost involves maintenance administration and staff. Granted, there is not a direct correlation between the number of maintenance salaried personnel and maintenance wage personnel. Still, there are common ratios of salaried to hourly wage personnel, and they differ dramatically between better and poorer performers. Merely reducing numbers of skilled craft workers, though, doesn’t translate to an equal percentage reduction in staff. For example, in average-performing operations, there may be more maintenance supervisors, but the ratio of craft to supervisor positions is higher. In best-performing operations, the ratio of craft personnel to maintenance supervisors is lower. This situation results from recognition of the value of maintenance supervisors as facilitators who can greatly enhance the efficiency of a maintenance workforce.

A similar condition exists with maintenance planners. Poor performers have larger numbers of skilled craft workers per maintenance planner, with some of the worst performers in the range of 60:1. An individual maintenance planner can’t effectively serve such a large number of skilled craft workers and is likely operating in a reactive mode, expediting materials or performing other duties required to support reactive maintenance.

In contrast, the ratio of skilled craft workers to planners at a best-performing site is more apt to be in the 20:1 range. With this type of ratio, a planner can prepare detailed job plans, procure materials, and efficiently perform other planning functions. The net result is that there will be no appreciable administration and staff cost savings in moving from a run-to-failure to failure-free environment. This is due to changes in ratios of craft to staff positions and the redeployment of some personnel from reactive work to proactive functions that are needed to support failure-free operations.

Additional maintenance costs affected by unreliability involve facilities, including offices, shops, break rooms, restrooms, and related infrastructure costs. Rolling-stock requirements can also be affected, as can various support staff outside of the maintenance function, such as human resources, training, and safety. Generally speaking, though, there is no substantive reduction in administration, staffing, and related cost categories as a result of reducing and/or eliminating unreliability.

The bottom line

As discussed here (and shown in the accompanying sidebar), the true cost of unreliability is enormous. By adding up the previously noted line-item maintenance costs for our hypothetical plant, we can see that unreliability amounted to a staggering $56 million (or 80%) of unnecessary spending for maintenance labor, materials, and equipment replacement costs.

Given this type of economic impact of unreliability, why don’t all manufacturing operations transition from failure-prone to failure-free environments? Unfortunately, there’s no single root cause. Many factors contribute to the situation. Among them:

The constant distraction of equipment failures is akin to putting out fires. Consequently, everyone is so focused on reacting that they believe they can’t take the time to implement measures to avoid the failure. A fairly simple solution here would be to devote a small number of employees to developing and implementing plans to avoid equipment failures. For this approach to be effective, however, those proactive resources can’t be dragged back into firefighting mode. Otherwise, nothing will improve.

Poorer-performing operations rarely have a strategic plan or, if they do, it’s typically mere window-dressing written to satisfy corporate management. Without a well-thought-out vision or mission, plant personnel will naturally accept the status quo as the normal mode of operation.

There is a lack of leadership in poorer-performing manufacturing operations. Either the current management lacks the requisite leadership skills or there are no incentives, positive or negative, to change the status quo. Humans respond to stimulus. If there are no consequences for being unreliable, nothing will change. Conversely, if there are no rewards for becoming reliable, or if the existing reward system somehow perversely rewards unreliable behavior, nothing will change. Better-performing manufacturing operations typically share the benefits of failure-free operation with all employees. As a result, everybody has a stake in improved reliability.

While this discussion used a hypothetical manufacturing site to illustrate the true cost of unreliability, the same ratios can be applied to obtain an order-of-magnitude estimate of the cost of unreliability for your operations. Remember, though, that someone needs to take the initiative before improvement can begin. MT

Al Poling has more than 35 years of reliability and maintenance experience in the process industries, many of them spent in engineering and corporate-leadership roles with several companies. A Certified Maintenance and Reliability Professional (CMRP) through the Society for Maintenance and Reliability Professionals (SMRP), he served as technical director of the organization from 2008 to 2010. Prior to starting his own consultancy, Poling served as the project manager for Dallas-based Solomon Associates’ International Study of Plant Reliability and Maintenance (RAM) Effectiveness, during which he worked with clients to identify performance improvement opportunities through benchmarking. For more information, contact

Unreliability: A Very Expensive Proposition

The three largest maintenance-cost categories affected by unreliability are maintenance labor, maintenance material, and maintenance capital, i.e., equipment replacement. In our hypothetical manufacturing operation with a plant replacement value (PRV) of US$1 billion and resident workforce of 150 skilled craft workers, we can calculate the cost of unreliability individually and collectively as follows:


$70,000,000 = Total current annual maintenance cost for labor, material, and maintenance capital, i.e., equipment replacement.

80% = Percentage of the total maintenance labor, maintenance material, and maintenance capital spent unnecessarily due to unreliability.
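The two sidebar totals follow directly from the per-category figures worked out in the article. The Python sketch below rebuilds them; the category labels and data layout are illustrative, and the amounts are the article's hypothetical values.

```python
# (current annual spend, portion attributable to unreliability), in USD.
categories = {
    "Maintenance labor": (15_000_000, 10_500_000),
    "Maintenance material": (15_000_000, 10_500_000),
    "Equipment replacement": (40_000_000, 35_000_000),
}

total_spend = sum(spend for spend, _ in categories.values())
total_unnecessary = sum(waste for _, waste in categories.values())
unnecessary_share = total_unnecessary / total_spend

print(f"Total annual spend:      ${total_spend:,}")
print(f"Due to unreliability:    ${total_unnecessary:,}")
print(f"Share spent unnecessarily: {unnecessary_share:.0%}")
```

Replacing the three tuples with your own plant's figures gives the same order-of-magnitude estimate for your site.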

At first glance, these figures may appear unrealistic. They’re not. The harsh reality is that unreliable operation is very expensive for any manufacturer, regardless of size.

Learn more:

“The Business Case for Asset Reliability”

“Choose Reliability or Cost Control”

“The Risk Is In The Management”

“Reliability Business Case: Conversion Costs”


5:27 am
April 1, 2015

Are You Ready? Preparing For Reliability Improvement


Improvement initiatives backed by effective practices and policies can enhance profitability. Careful preparation is key, says this industry veteran.

By Wayne Vaughn, CMRP, PCA Consulting

It’s a fact of life across industry: Many organizations need to upgrade their basic maintenance practices. Even when good ones are in place—such as preventive and predictive programs, effective planning, scheduling and MRO-purchasing/storeroom policies—some plants still need to strengthen elements like equipment availability and cost-reduction efforts. Improving availability and reducing costs (from both maintenance and operational standpoints) can best be approached by implementing an equipment-reliability-improvement program. While these efforts may require significant time and resources to implement, they can generate enormous returns.

Getting the biggest reliability-improvement bang for your buck calls for careful, upfront planning: Success involves more than simply hiring engineers and telling them to go out and “fix reliability.”

Start with your data
While the fact that data drives reliability improvement may seem obvious, it’s not uncommon for companies to either 1) not have data; or 2) have data that’s not easily mined from their CMMS/EAM systems. The following activities are crucial to undertake prior to embarking on a reliability-improvement initiative:

  • Ensure you have best-practice work processes that collect data and appropriate policies in place.
  • Ensure that you classify work in a way that it can be mined to understand problems.
  • Identify key performance indicators (KPIs) that can point to potential opportunities.
  • Establish KPIs to gauge how well your preventive (PM) and predictive (PdM) programs perform.

Best-practice work processes
The work-order work process is the most fundamental element of successful maintenance. All work must be captured. This must include labor, materials, contractors and other expenses that go into maintaining plant equipment. This information must be accurate, and the type of work being done must be coded carefully.

A second key area is ensuring that all work goes through a planning and scheduling process so that needed work is agreed to by operations and executed systematically. This means all PM and PdM work will go through this process. (KPIs and effective management-review procedures must be in place to make sure these important processes are accomplished effectively.) While an entire book could be written about these basics, this article focuses on PM and PdM work orders. Companies spend time to write PM instructions and create PM programs, but often don’t manage the process effectively.

It is important that PM efforts find and repair things that will prevent operational outages or other emergency situations. A good way to do this is to ensure that when something is found, a work order is created to do that corrective work. Too often, companies allow technicians to repair found defects and charge their time and materials to PM work orders. This is a big mistake, and indicative of an area where a policy must be in place. Although this can seem a pragmatic way to perform work that might involve only a few minutes of a technician’s time, it’s one that could potentially mask a problem.

A good policy is to establish a timeframe for such work, say 15 minutes. Work that can’t be performed within the specified time would require a follow-up work order. It’s also important that the follow-up work order be appropriately coded as work identified by a PM activity. An additional recommended policy element is that, regardless of time, if a part is required for the repair, a corrective-action work order must be created to capture the labor and materials used.
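The policy described above amounts to a simple decision rule, sketched here in Python. The 15-minute limit is the example value from the text; the class, field, and function names are hypothetical and do not correspond to any particular CMMS/EAM system.

```python
from dataclasses import dataclass

TIME_LIMIT_MINUTES = 15   # example threshold from the text; set per site policy


@dataclass
class FoundDefect:
    """A defect discovered while performing a PM task (hypothetical record)."""
    estimated_minutes: float
    part_required: bool


def needs_corrective_work_order(defect: FoundDefect,
                                limit: float = TIME_LIMIT_MINUTES) -> bool:
    """Return True when the repair must be charged to its own work order.

    A follow-up order is required if a part is needed, regardless of time,
    or if the repair cannot be completed within the time limit.
    """
    return defect.part_required or defect.estimated_minutes > limit


quick_adjustment = FoundDefect(estimated_minutes=5, part_required=False)
quick_part_swap = FoundDefect(estimated_minutes=5, part_required=True)
long_repair = FoundDefect(estimated_minutes=45, part_required=False)

for name, defect in [("quick adjustment", quick_adjustment),
                     ("quick part swap", quick_part_swap),
                     ("long repair", long_repair)]:
    print(name, "->", needs_corrective_work_order(defect))
```

Any follow-up order raised this way would also carry a code marking it as work identified by a PM activity, so it can be found later.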

Classifying work
There are many ways to classify or “type” the forms or reasons behind why any given work is needed. Each company has to determine what makes sense for its particular situation. For starters, consider the following two methods:


Whatever designations or codes your operation chooses, they must be accompanied by clear definitions on their use, including by whom and under what circumstance. This leads to granularity and standardization of data that eventually provides useful diagnostic information.

KPIs for opportunities
There are several KPIs that can indicate opportunities. An excellent one is a report that shows the mean time between failure (MTBF) for equipment. Again, a policy may be established that if a piece of equipment falls below a selected value, that equipment must receive a formal review. It may be that the equipment will be placed in the capital replacement plan, put into a rebuild schedule, placed on a list of equipment that will undergo an improvement process or simply be lived with as is. This review must be done in coordination with operations.
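As a rough sketch of such a report, the snippet below computes MTBF as operating hours divided by the number of failures and lists equipment that falls below a review threshold. The equipment IDs, the input layout and the threshold are hypothetical; each site would choose its own.

```python
def mtbf_hours(operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating hours divided by failures."""
    if failure_count == 0:
        # No failures recorded in the period, so MTBF is unbounded.
        return float("inf")
    return operating_hours / failure_count


def equipment_needing_review(history, threshold_hours):
    """Return (equipment_id, mtbf) pairs below the threshold, worst first.

    history maps equipment id -> (operating_hours, failure_count).
    """
    flagged = []
    for equipment_id, (hours, failures) in history.items():
        value = mtbf_hours(hours, failures)
        if value < threshold_hours:
            flagged.append((equipment_id, value))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical one-year history for three assets.
history = {"PUMP-A": (8000, 10), "PUMP-B": (8000, 2), "FAN-C": (8000, 0)}
review_list = equipment_needing_review(history, threshold_hours=1000)
```

The resulting list is only the starting point for the formal review with operations described above.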

Often, a piece of equipment that does not appear on the MTBF list may be identified by operations as needing improvement. This may be because of the scheduled production load, lengthy repair time when a failure occurs, the lack of a back-up process, support of an important customer or the high cost of replacement processes. Another good KPI to create is a report of high-maintenance-cost equipment. These may offer significant cost-reduction opportunities.

Once a list of opportunities has been identified, some are selected for equipment improvement. A joint, maintenance-and-operations, equipment-improvement process can be effective, and should be considered. Such groups should be facilitated by a reliability engineering expert. This will leverage engineering resources and create greater buy-in when solutions are presented for approval.

Data must then be gathered from the selected equipment to support the problem-solving process. This is where good coding and the creation of corrective-action work orders provide a substantial payback. If corrective labor and materials are charged to PMs when defects are found, data-mining becomes very difficult. That’s because with possibly thousands of PMs done in a facility over the course of a year, perhaps only hundreds capture faults. Since those hundreds will be hard to find, the data will need to be sorted and classified manually, a frustrating process for any site.

KPIs for PM effectiveness
An excellent KPI for monitoring PM effectiveness is to create a report and chart of how many faults are detected per 100 PM activities. An effective program will likely find between 5 and 20 faults for each 100 PMs performed. The report should show PMs that do not detect any faults, and also highlight those PMs with a large number of findings. Too few findings may indicate that the PM is performed too frequently or that the PM tasks are the wrong ones. Too many may indicate that the PM is not performed frequently enough or that there is a problem that needs a reliability-engineering review.
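A minimal sketch of this KPI follows. The 5-to-20 band comes from the text; the function names and the wording of the labels are assumptions made for illustration.

```python
def faults_per_100_pms(pm_count: int, fault_count: int) -> float:
    """Faults detected per 100 completed PM activities."""
    if pm_count <= 0:
        raise ValueError("pm_count must be positive")
    return 100.0 * fault_count / pm_count


def classify_pm(pm_count: int, fault_count: int,
                low: float = 5.0, high: float = 20.0) -> str:
    """Label a PM against the expected band of 5 to 20 faults per 100 PMs."""
    rate = faults_per_100_pms(pm_count, fault_count)
    if rate < low:
        # Possibly performed too frequently, or the wrong tasks.
        return "too few findings"
    if rate > high:
        # Possibly not performed frequently enough, or a deeper problem.
        return "too many findings"
    return "within expected range"
```

For example, 24 faults found across 200 PM completions works out to 12 faults per 100 PMs, which sits inside the expected band.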

It is important to regularly review PMs to keep them current. Annually is a good practice. Not only do technicians become aware of things that need to be checked over time, they see things that do not need to be checked. Changing operational conditions can also induce different failure modes or necessitate a different PM.

Regular reviews can also help determine if a PM can be moved to a PdM, add more objectivity to the work process, and identify the most appropriately skilled person(s) to perform the work. A KPI here might be the average age of PMs since their last review, with special reports on PMs that exceed the company’s policy on review frequency.
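The review-age KPI can be sketched in a few lines. The PM identifiers and dates below are made up, and the 365-day limit simply reflects the annual review suggested above; a site with a different policy would change it.

```python
from datetime import date


def pm_review_ages(last_reviewed, today, max_age_days=365):
    """Return (average age in days, sorted list of overdue PM ids).

    last_reviewed maps a PM id to the date of its last review.
    """
    ages = {pm_id: (today - reviewed).days
            for pm_id, reviewed in last_reviewed.items()}
    average = sum(ages.values()) / len(ages) if ages else 0.0
    overdue = sorted(pm_id for pm_id, age in ages.items() if age > max_age_days)
    return average, overdue


# Hypothetical PM records.
last_reviewed = {"PM-001": date(2014, 1, 1), "PM-002": date(2013, 1, 1)}
average_age, overdue = pm_review_ages(last_reviewed, today=date(2014, 7, 1))
```

Here the average age is 363.5 days, and only PM-002 appears on the exception report.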

Of course MTBF and mean time to repair (MTTR) will also indicate how effectively a PM effort is being planned and performed. However, since the absolute value of these measures may not be the best indicator, look carefully at the trend line to verify continuous improvement.
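One simple way to look at the trend rather than the absolute value is to fit a straight line through the periodic readings and check the sign of its slope. The sketch below uses an ordinary least-squares fit over equally spaced periods; that choice of method and the sample figures are assumptions, not something prescribed by the text.

```python
def trend_slope(values):
    """Least-squares slope of values taken at equally spaced periods."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical quarterly readings, in hours.
mtbf_by_quarter = [100, 110, 120, 130]
mttr_by_quarter = [8, 6, 5, 3]

mtbf_improving = trend_slope(mtbf_by_quarter) > 0   # rising MTBF is good
mttr_improving = trend_slope(mttr_by_quarter) < 0   # falling MTTR is good
```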

The bottom line
Implementing a successful equipment-reliability-improvement program may require a significant investment in terms of time and resources, but payoff can be enormous: Incorporating best practices, good practices and well-defined policies, these initiatives help companies become and remain profitable. Don’t let inadequate preparation come back to haunt your organization’s efforts. MT

Wayne Vaughn recently retired as Director of Maintenance for Harley-Davidson. He currently is a Senior Reliability Consultant with Performance Consulting Associates, Inc. (PCA), of Duluth, GA. Contact him at


7:29 pm
December 1, 2014

A Contrarian View: What Being Proactive Really Means


By Heinz Bloch, P.E.

A company I choose to call NTBO (“Not-To-Be-Offended”) will long remember a string of expensive pump failures that jeopardized the continuity of boiler feed water supplied to its power generation turbines. When all components were carefully measured, it was determined that the oil-slinger concentricity exceeded maximum allowable by a factor of 30. Oil slingers (cone-shaped collars on revolving shafts designed to return passing oil outward to the point of origin) are critically important components, but I don’t know if NTBO implemented the specification and inspection routines needed to capitalize on this costly experience.

Capitalizing on an event means not doing the same dumb things all over again. In NTBO’s case, bringing 30- or 40-year-old rotating machinery back to original tolerances should be one of this company’s priority tasks. In fact, the next shutdown might include retrofitting fluid machinery with more efficient blades, impellers, vanes, improved lube delivery, superior filtration and the like.

The time to be thoroughly proactive is NOW. Today is the best time to communicate with competent upgrade shops; or to write an oil-slinger-ring specification requiring stress-relieving (annealing) before finish-machining; or to find out if better reciprocating-compressor valves are available and cost-justified; or to determine if excessive pipe stress on the discharge nozzle of P-207 warrants pre-fabricating a spool piece for insertion between points A and B during the plant shutdown scheduled for later in the year.

Repeat failures in a plant should be thought of as management failures, plain and simple. Someone in management needs to hire, nurture or groom people who know that the above activities are among the hundreds of proactive tasks that fit under various subheadings in role statements for responsible reliability professionals. One such task is to keep current a list of actions that could be carried out if an unanticipated downtime event were to occur.

Say, for instance, the reliability manager tells you that one of your plant’s process units has just shut down because of a pipe rupture. He tells his engineers and technicians that unit restart is scheduled in 30 hours. A proactive reliability employee might, for example, immediately look at his/her two-day-opportunity list that prioritizes coupling replacement in P-207 B, followed by the addition of 24 pre-fabricated hydraulic tubing lines to 12 electric motors on pre-defined process pumps already hooked up to the plant-wide oil mist system.

But it’s a two-way street. A competent manager creates role statements and training plans for his employees. For their part, the reliability professionals reporting to this manager make sure they arrive at work knowing what tasks they will perform (on a “normal” day) in harmony with their defined roles. As they then return home at the end of the day, these professionals should ask themselves if they have added value and if—should they ever leave their current jobs—the present manager or employer would notice they were no longer there.

Granted, if you work for a multi-billion-dollar enterprise, the corporation’s end-of-year profit statement probably wouldn’t be directly affected by your presence or departure. But your group or section or department should notice if or when you’re no longer around.

So make a difference. Whether you’re a manager or a junior contributor, strive to be above average. Be self-motivated and proactive. Offer facts that comport with common sense and the laws of science. As I’ve mentioned before, anecdotes add nothing but wasted time. Factual information translated into cost-justified actions adds value. MT


12:56 pm
November 4, 2014

The Reliability-Driven Maintenance Organization


Getting there requires taking a close look at weaknesses and taking measurable steps to correct them. Here, a respected industry expert shares tips on how to become a high-performing operation.

By Christer Idhammar

Any plant maintenance department wants to be known as a cost-effective organization. For our purposes, “cost effective” means maintenance without waste: where waste is the gap between how good the organization is and how good it can become. Waste includes poor safety, losses in quality and high costs.

In a poorly performing maintenance organization, the gap between the real and the ideal world tends to increase over time because it reacts to problems instead of preventing them. As a result, there isn’t time to take measures that will break this reactive work cycle. Even in periods when equipment is operating well and no panic-work comes up, the maintenance organization tends to slow down and wait for the next problem. This creates a culture where maintenance personnel think it is useless to start other work because they will be interrupted with real, or often perceived, urgent work. So even between reactive work, maintenance personnel accomplish very little.

From an operations standpoint, this situation can be comforting because it means maintenance can deal with equipment problems on short notice. It is far easier for operations to call maintenance to fix a problem when it occurs than to write a work request to correct an anticipated problem. This type of relationship typically occurs when operations does not feel responsible for the cost of maintenance. Even if most work is requested by operations, the maintenance manager is in the hot seat if budgets are overrun.

A high-performing maintenance organization is far different. It is founded on anticipating what will happen in the future and planning and scheduling corrective actions in advance. It is not only DO-oriented, it is THINK-oriented. It is an organization that continuously designs out problems and improves.

Correcting attitudes and cultures

To develop a high-performing maintenance organization, the first steps are:

  • To fully understand how good the organization is currently, and locate the gaps where improvement can occur
  • To develop and commit to an action plan to close the gaps, including clearly defined roles and responsibilities
  • To change work attitudes and culture.

In some plants, the typical first step toward improving maintenance performance is to purchase a new computerized maintenance management system (CMMS) or instruments for predictive maintenance. They may also implement fragmented improvement initiatives using Reliability Centered Maintenance (RCM), 5S or similar tools. And while these are good tools, they often fail because they are implemented before an organization does the basics well or changes the work culture to support their efficient use.

Bill Gates addressed succinctly the potential value of technology-based tools when he said, “The first rule of any technology used in a business is that automation applied to an efficient, well-defined operation will magnify the efficiency. The second is that automation applied to an inefficient or poorly defined operation will magnify the inefficiency.”

Measuring results

To measure the results of maintenance activities, plants traditionally view good maintenance in terms of low costs. With few exceptions, this cost is always considered too high. This view of maintenance stems from an old attitude, which is that maintenance only costs money and does not contribute to productivity.

Plants must change the way they measure maintenance results. Analysis of production advancements over the past 35 years reveals that many process industries have more than tripled their production output. During this time, the number of operators has decreased about 30%, while the number of maintenance crafts people has decreased about 6%. This growth in productivity can be traced to increased automation and more reliable equipment—and it’s not necessarily a result of efficient maintenance.

A common way plant maintenance departments measure their effectiveness is to compare maintenance costs with other plants. This is the wrong thing to do, because those who are not the top performer in the comparison will waste time explaining why the figures are wrong instead of focusing on how to improve. We also know that different accounting principles can make a difference of up to 100% in what is considered a maintenance cost, capital investment or operations expense.

The focus must instead be on learning about activities, technology and processes that drive reliability, safety and cost. Better planning and scheduling of maintenance work correlates directly to high manufacturing reliability, better safety and lower costs. It is also important to understand that predictive maintenance alone does not prevent anything. It only gives information on failures that are developing toward a breakdown. But with this information, plants can “anticipate” the future and plan and schedule corrective maintenance actions.

In the best case, plants can schedule the corrective action to be executed in a maintenance “window.” This is an opportunity that presents itself when equipment is down for reasons other than planned and scheduled maintenance, such as changing belts, unscheduled shutdowns, cleaning and other tasks. The link between predictive maintenance and planning and scheduling of work is an essential basic reliability and maintenance process. Executed with precision, it will increase quality product throughput, improve safety and reduce costs.

Performance indicators*

The right thing to do is benchmark the maintenance department and measure continuous improvement internally. If comparing with other organizations, plants should learn what processes best performers use to drive improved reliability and maintenance costs, and how they execute them well.

To continuously improve execution of essential processes, it’s necessary to implement performance indicators as close to the action as possible. This will motivate and trigger actions that will influence the overall performance.

In a reactive organization, break-in work must be reduced. During transition to an organization in control, planning and scheduling quality can be an indicator. Trends in backlog, overtime and contractor hours can also be meaningful indicators when the organization is starting to gain control. When an organization gains control over its maintenance strategies, it becomes important to measure Root Cause Implementations completed and problems eliminated. To do this properly, clear definitions on what’s measured are necessary.

In a study of 38 process lines, the only factor that strongly separated low performers from high performers was how well they planned and scheduled maintenance and operations work. All lines that planned and scheduled more than 50% of work had a measured Reliability (as % Quality x % Time, with Time based on 8760 hours available per year) of over 85%. Top performers that planned and scheduled between 75% and 90% of all work achieved a Reliability of 92–96%.
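The Reliability measure used in the study is a straightforward product, and a small worked example makes the definition concrete. The function name and the sample figures below are illustrative assumptions; only the formula (% Quality x % Time, with Time based on 8760 hours per year) comes from the text.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days, the time basis stated in the text


def reliability(quality_fraction: float, operating_hours: float,
                available_hours: float = HOURS_PER_YEAR) -> float:
    """Reliability = quality fraction x time fraction, returned as 0..1."""
    if not 0.0 <= quality_fraction <= 1.0:
        raise ValueError("quality_fraction must be between 0 and 1")
    if not 0.0 <= operating_hours <= available_hours:
        raise ValueError("operating_hours must be between 0 and available_hours")
    time_fraction = operating_hours / available_hours
    return quality_fraction * time_fraction


# Hypothetical line: 98% quality, running 8322 of 8760 hours (95% of the year).
example = reliability(0.98, 8322)
```

In this example the line's Reliability is about 93%, which would place it in the range reported for the top performers.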

Work measurements

Plants that use hands-on tools or other types of work measurements as a way to determine maintenance efficiency are doing the wrong thing. Here’s why:

  • They do not promote cooperation between management and crafts people.
  • They do not consider those who may be busy doing the right thing. For example, in the work-measurement system, thinking time and trouble-shooting time are considered hands-off-tools and, thus, non-productive.
  • Almost all time identified as non-productive by work measurement is typically attributed to a lack of work management and planning and/or scheduling. In fact, it is a result of poor management.
  • When equipment is operating, it is not always true that maintenance people who are busy with hands-on tools are productive. In fact, they can be busy doing the wrong things or only pretending to be busy.
  • In a scheduled shutdown, it is true that people are more productive if they can work on planned and scheduled work without interruptions. Again, only good planning and scheduling—good management—can accomplish this.

Partnering in reliability

To achieve results-oriented reliability and maintenance, plants must realize that production is a partnership between operations, maintenance, stores and engineering. The traditional view is that maintenance is a service organization; operations is the internal customer of maintenance; stores support maintenance; and engineering is an isolated “happy island.” The right thing to do is to view these sectors as partners in a joint venture to reliably produce quality products.

In this partnership, maintenance will deliver equipment reliability; operations will deliver production process reliability; stores will continue to support maintenance; and engineering will support both maintenance and operations, as well as practice life-cycle costing (LCC) or asset management in its design, specification and selection procedures for new equipment. This means that equipment selection will be based on the cost to buy and cost to own. The concept includes reliability and maintainability analyses.
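To illustrate selecting on cost to buy plus cost to own, the sketch below compares two hypothetical pump options over a fixed service life. It is deliberately simplified: the figures are invented, annual owning cost is assumed constant, and no discounting is applied, all of which a real LCC analysis would need to revisit.

```python
def life_cycle_cost(purchase_cost: float, annual_owning_cost: float,
                    service_years: int) -> float:
    """Undiscounted life-cycle cost: cost to buy plus cost to own."""
    return purchase_cost + annual_owning_cost * service_years


def lowest_lcc_option(options, service_years):
    """options maps a name to (purchase_cost, annual_owning_cost)."""
    return min(options,
               key=lambda name: life_cycle_cost(*options[name], service_years))


# Hypothetical options: B costs more to buy but less to own.
options = {"Pump A": (10_000, 3_000), "Pump B": (15_000, 2_000)}
best = lowest_lcc_option(options, service_years=10)
```

Over ten years Pump A totals 40,000 and Pump B totals 35,000, so the option with the higher purchase price is the cheaper one to own.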

Recognition is important

Most maintenance organizations can verify that they receive recognition when they fix a major breakdown, but seldom hear anything when they prevent a breakdown. While there is nothing wrong with recognizing good work in a breakdown situation, if this is the only time maintenance people are recognized, it sends the wrong message. This type of recognition fosters a culture of maintenance heroes or “Maintenance Tarzans.” They become action-oriented, which can make it difficult for them to transition to more planned, scheduled and organized maintenance work.

Overtime compensation can motivate, especially considering that breakdowns are about 74% more likely to occur when the full crew is off site. However, this is changing as the Y-generation enters the job market—a group that values time off more than higher pay. Plants need to remember that poor maintenance is visible and good maintenance is invisible, because it is less action-oriented. It is always right for plants to recognize implemented improvements, failure avoidance, planning and scheduling performance and overall reliability.

Performance-improving tips

The following strategies can help develop a high-performing organization:

Work management and planning & scheduling: Most frontline supervisors schedule work to the people they have available. The right thing to do is schedule work that must be done, prioritize it based on risk and what is best for the business, then schedule people to execute this work.

Time estimates are almost always based on four- or eight-hour time segments. In many cases, no fewer than two people are assigned to each job. This provides the supervisor a buffer of resources he or she can use for jobs added to the schedule on short notice. In this setup, scheduling compliance can wrongly appear to be high. Therefore, it’s better to schedule work with real time estimates and include problem solving, or thinking time, as part of all work done by crafts people.

In a high-performing maintenance organization, 20% of all effort hours should be used on problem elimination or continuous improvement that will “design out maintenance problems.”

Anticipation: Most plants have morning meetings to discuss what happened the previous day and night, and what is planned for the current day. High-performing maintenance organizations will spend most of this meeting on what will happen tomorrow and next week. Though it sounds unrealistic, this can be done because very few problems occur and little time needs to be spent on yesterday’s problems. The focus should be on future activities.

Following the same principle, the organization should work on a monthly or weekly forecast and finalize the next day’s schedule about four hours before the end of each day. The schedule should be communicated to crafts people before they leave for the day so they can prepare for the next day’s work.

Flexibility: The 12- to 14-person craft-line-oriented maintenance organization is, or must soon be, a thing of the past. Craft lines should not limit work flexibility—only work skills to do a job safely should be a constraint. This will often require changes in union agreements and a focused training program for crafts people. Experience indicates that if management presents a clear plan, it will be well received.

Lost-production analyses: These types of analyses often reflect lost production only by department. Such a procedure does not build a partnership between departments, nor does it solve problems.

The better approach is to define, solve and classify a problem by department, equipment and type of failure after analyses are complete, then follow up on how to solve the problem in the future.

Storeroom closure: Many maintenance organizations waste up to 30% of their time walking to the store(s) and searching for parts. Plants should plan and schedule maintenance activities so stores can prepare and deliver parts where and when they are needed. This will require a Bill Of Material (BOM) populated to 95%+ accuracy.

Technical documentation: All technical and economic information about equipment should be readily available. The equipment, loop or circuit number should be the key to this information. At a minimum, all parts kept in stores, or not kept in stores, should be tied to equipment identification in the BOMs. The lack of good and reliable documentation is one of the reasons why most maintenance planners do not have time to plan.

Maintenance shift coverage: Most three-shift plants have maintenance resources on the late shifts. Some have a maintenance supervisor on each shift. Ideally, a plant should operate without maintenance people on the night or evening shift. This is possible only if maintenance believes the plant can operate 16 hours without major maintenance problems. If this is not possible, the plant should do something about it.

The above issues are select examples of actions and cultures that will promote high-performing maintenance. It is important that a plant maintenance organization seriously examine how good it truly is, determine if it is promoting the right things and if improvements are needed. Only then can a maintenance organization proceed to make the changes needed to become as good as it can be. MT

Christer Idhammar is the Founder of IDCON, Inc. (