Archive | June, 2007


6:00 am
June 1, 2007

Maintenance Quarterly: Uptime Is Key to Wallboard Production

A value-added maintenance management system helps an industry leader stay competitive in a demanding market.

Located between Denver and Grand Junction, CO, American Gypsum – Eagle Plant operates around the clock to produce more than 113 miles of wallboard each day. One small hiccup in a continuous production line that includes well over 1500 pieces of equipment can spell disaster from both a financial and order fulfillment perspective.

In a plant extending a quarter mile in length, Maintenance crews for this 24/7 operation are faced with ensuring conveyor systems, crushers, hammer mills, feeders and mixers along with the primary forming line are running at their highest production levels without failure. Because American Gypsum recognized how important it was to keep this wallboard operation running reliably and smoothly, it set out to find a maintenance management software system that would improve how equipment history was tracked.

A better way
As Maintenance Superintendent Bill Baxter tells the story, it was the Maintenance Department that set out to find the right tool to replace a manual process that included log books full of hand-written equipment records.

“We knew there was a better way to manage the data related to equipment PMs, repairs and scheduled maintenance,” states Baxter. “Keeping the production equipment running as much as possible with minimal downtime was key for us. While searching for a maintenance management system, our team focused on that objective and sought a package that was easy to use, simple to learn and without all the ‘extras’ that were unimportant to us.”

The team’s search led to Benchmate, which was immediately implemented and made available for use by both Maintenance and Production personnel. The new system rapidly took shape and soon became the key tool for the plant’s maintenance management. As users quickly became comfortable with the system, mechanics and electricians within the Maintenance Department learned how to create a parts requisition within the system and send it directly to purchasing. This now enables all maintenance requisitions to be housed within one system before going to the purchasing agent for review, approval and issuing purchase orders.

Beyond maintenance
On the Production side, Operations personnel began to utilize the Benchmate system’s trouble-call function, which is accessible to all crew foremen on the plant floor. Trouble calls are issues that may arise at any time and require the attention of the Maintenance Department. Rather than having to track down a mechanic or electrician, Production personnel simply enter information pertaining to the problem area directly into the system. Maintenance Department workers can then review the trouble-call log and determine what action is required and when it needs to be addressed. This capability lets Maintenance know exactly what is happening with production equipment at all times. Furthermore, when the services log is created, a complete history of work performed against production equipment is maintained in the system for reference purposes.

Baxter notes that the software application has enhanced the way his team performs maintenance. Each week, a down day is scheduled just for planned maintenance. This provides the Maintenance team with an opportunity to review work to be performed and plan accordingly to accomplish the required tasks that keep the plant’s production equipment properly maintained. Complete equipment history data is maintained so that workers know exactly what has been done to every piece of equipment being tracked.

The heart of maintenance
“We use the system extensively to schedule our planned maintenance, the ‘heart’ of our maintenance management activities,” says Baxter.

American Gypsum normally sets preventive maintenance (PM) intervals based on expected equipment life from major parts down to component levels. Routine inspections are conducted on all critical components to detect any suspected failure points before failures actually occur. A mishap with even a drive belt could shut down production resulting in major schedule delays and losses.

To ensure that planned maintenance is optimized, step-by-step instructions outlining the work to be performed are loaded into the Benchmate system. Technical documentation and manuals are electronically attached within the system for quick and easy reference whenever required. Quick access to an equipment schematic or specific part diagram can save mechanics and electricians valuable time and improve their work performance. Reports showing weekly maintenance planning, backlog of outstanding jobs and number of man hours expended against equipment maintenance are routinely generated by the system. These reports assist Baxter and his team in scheduling downtime and prioritizing the outstanding work orders that need to be executed.

Rock-solid support
Proper technical and system support is critical. Thus, Baxter maintains an annual services agreement that provides access to the Benchmate help desk and other support areas. Knowing support is only a telephone call away helps assure the Maintenance team that any potential system problems—or even what may turn out just to be routine questions—are handled professionally and thoroughly. The same level of service applies to system upgrades that are performed seamlessly to minimize interruptions to the plant’s work processes and related activities.

According to Baxter, the future use of the maintenance system for the plant will be in the area of spare parts inventory. “We currently have a large portion of our spares in the system,” he continues, “but we still need to add several records in the near future. Our plan is to use bar code scanning for our storeroom which will certainly improve our spares tracking and management of parts inventory.”

In this small town nestled in the valley of the Colorado mountains, American Gypsum continues to produce sheet after sheet of quality wallboard. Stacked high in a short-term finished goods warehouse, product is promptly shipped to its ultimate destination in either a home or a building.

As the nation’s fifth-largest wallboard producer, American Gypsum knows it must maintain a continuous production process to remain competitive. With a reliable maintenance system keeping track of critical equipment, this plant will continue to supply the construction industry with top-grade wallboard products and enhance its position as a market leader. MQ

Bob Nichols is founder and president of Benchmate. Telephone: (360) 678-8358



6:00 am
June 1, 2007

Maintenance Quarterly: Organizational Competency

Staying competitive requires companies to strive for continuous improvement in all things, including the development of their personnel.

Traditionally, training has been limited to the individual level (and typically just for beginners or experienced people in new jobs). The need for constant improvement to stay competitive has changed that notion and brought words like kaizen into focus as a business strategy, as well as into the common vernacular. Organizations have discovered that kaizen (continuous improvement) must apply to all aspects of an organization. Continuous improvement must be the modus operandi (MO) for everything, including training.

What was once thought of as a necessary evil (or only something that an organization had to do) is now considered to be an enabling tool. Organizations have come to realize that ongoing training is the key that unlocks many secrets of poor productivity, poor safety records and poor attitudes, and that it creates an environment that is alive with creativity. That creativity comes from the fact that the minds of the employees are always being challenged through continuous learning. This continuous learning prevents people from forgetting how to learn. This, in turn, makes their minds more nimble and agile, thus increasing the probability that, when they are faced with challenges in the workplace, creative and logical solutions will follow shortly thereafter.

With enough success at problem solving, the brain naturally gravitates toward eliminating the possibility of defects altogether—this is when significant improvements come. The identification and elimination of the root cause of the problem is the only true method of positive step change in performance.

One major goal
The continuous learning objective should be organized with one major goal in mind—organizational competency through clear definition of roles, responsibilities and definitive learning objectives. While employees within a company all pull together toward a common goal, their interdependencies form a complex system. Such a system is not unlike the machines that operate to produce the facility’s output.

To study an individual machine without studying its interdependency with the machines upstream and downstream is to ignore the effect that machine’s performance has on the system as a whole. Analysis of failure modes and their effects must be performed with the entire system in mind. People working together as a team are no different.

To analyze the training and competency of one person without considering his/her effect on the team does not describe the effect of that person’s abilities or inabilities on the entire system or, in this case, the team.

Competency, then, has to be considered at the team level (also known as Organizational Competency). The ability of a team or an organization to successfully deliver results rests not only with the individuals knowing their roles and being capable of performing individually, but also on the ability of all those involved to work together as a cohesive unit. People can be technically competent without being socially integrated with their fellow teammates, which is why individual competency isn’t sufficient to carry the day. While organizational competency relies on individual performance, it relies even more on interdependency and collaboration.

In the past, communication was touted as the answer to team interdependency and success. The problem is that communication normally carries the connotation of sharing information for information’s sake, not necessarily reliance for the accomplishment of specific objectives. Organizational competency carries the connotation of a group effort required for success—that an effort on the part of all is necessary and that everyone on the team is interdependent on everyone else.

Mapping competencies
A “competency map” is a particularly effective tool for showing the interdependency of people and concepts. Competency maps indicate roles and responsibilities of different positions relative to specific tasks.

It is common for a particular person’s roles and responsibilities to be the combination of tasks that exist on more than one competency map. For example, the Operations Manager might be involved in tasks on the Maintenance competency map, the Asset and Operational Reliability competency map and the Stores Management competency map. This clearly defines the interconnection of the Operations Manager’s position with several functions within the plant. Competency maps go a long way in breaking down interdepartmental barriers and increasing the amount of collaboration within an organization. In essence, competency maps are crucial for the eventual development of Organizational Competency.

Competency maps also should include the depth of knowledge that a particular position should have in order to successfully perform that function. This accomplishes two things. First, it paints a clear picture of the breadth and depth to which a person’s education about that topic needs to go. Second, it easily defines how much other positions can expect out of that person with regard to that topic. Bloom’s Taxonomy of Learning Objectives (Fig. 1) provides an easy to understand table that defines the levels of understanding of a given topic.

An analysis of a reliability competency map, referred to in this case as a “Reliability Task Assignment” (as illustrated by the excerpt in Fig. 2), shows not only the interdependency of the three roles within the plant relative to a single task, but also the level of knowledge required for that role (refer to Fig. 1).
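
For readers who keep such a map in software, the idea can be sketched as a small data structure. The Python below is only an illustration: it assumes the six levels of the original Bloom's Taxonomy, and the task names, the level assigned to each role and the `training_gaps` helper are hypothetical, not taken from the Reliability Task Assignment chart itself.

```python
from enum import IntEnum


class Bloom(IntEnum):
    """Levels of the original Bloom's Taxonomy, lowest to highest."""
    KNOWLEDGE = 1
    COMPREHENSION = 2
    APPLICATION = 3
    ANALYSIS = 4
    SYNTHESIS = 5
    EVALUATION = 6


# Hypothetical excerpt of a competency map: task -> {role: required level}.
RELIABILITY_TASKS = {
    "Develop FMECA": {
        "Reliability Engineer": Bloom.EVALUATION,
        "Maintenance Engineer": Bloom.APPLICATION,
        "Planner": Bloom.KNOWLEDGE,
    },
    "Write PM procedure": {
        "Reliability Engineer": Bloom.COMPREHENSION,
        "Maintenance Engineer": Bloom.ANALYSIS,
        "Planner": Bloom.SYNTHESIS,
    },
}


def training_gaps(task_map, role, current_levels):
    """Return {task: (current, required)} where the role falls short."""
    gaps = {}
    for task, roles in task_map.items():
        required = roles.get(role)
        if required is None:
            continue  # this role has no responsibility for the task
        have = current_levels.get(task, 0)  # 0 means no training yet
        if have < required:
            gaps[task] = (have, required)
    return gaps


# Hypothetical assessment of one Planner.
planner_levels = {
    "Develop FMECA": Bloom.KNOWLEDGE,
    "Write PM procedure": Bloom.APPLICATION,
}
planner_gaps = training_gaps(RELIABILITY_TASKS, "Planner", planner_levels)
```

Comparing each person's assessed level against the map in this way gives a simple starting list for an individual training plan, and reading across the roles for one task shows who depends on whom.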

A typical example of the interdependency of these roles might play out as follows:

  • A new piece of equipment is purchased for the plant as a part of an expansion.
  • The Reliability Engineer develops a FMECA table for the failure modes of this new machine.
  • Failure modes that are more prone to random failures are assigned to the Condition Monitoring group as a part of the regular PdM inspection process. Failure modes that are more time-based and failure modes for which there exists no PdM or NDT technology are assigned an interval based on preventive maintenance (PM) tasks. Some of the tasks are inspections, some are lubrication tasks and a few are interval-based replacements of certain items.
  • A cost/benefit analysis is performed for each task to determine the most beneficial interval.
  • This constitutes the Engineered Maintenance Strategy (EMS). The EMS is the Equipment Maintenance Plan (EMP) that has been optimized for cost/benefit through the use of Weibull analysis and Monte Carlo simulation.
  • For those tasks requiring PM procedures, the technical content, including tolerances, specifications and specific procedures, is provided by the Maintenance Engineer.
  • The Planner then takes those technical steps and converts them to an effective PM Procedure making sure that the correct tools, parts and consumables are called out in the procedure. Additionally, the Planner is responsible for ensuring that this procedure matches the other PM Procedures in the PM program and that version control is maintained.
  • Both the Maintenance Engineer and the Planner are responsible for interacting with the Maintenance Crafts personnel to ensure that the PM Procedure steps are technically accurate and laid out in the most efficient manner possible.
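
The cost/benefit step for an interval-based replacement task can be illustrated with a short calculation. The sketch below is an assumption-laden example, not the method used by any particular plant: it uses the standard age-replacement cost-rate formula with a Weibull life model and simple numerical integration rather than a Monte Carlo simulation, and the shape, scale and cost figures are invented for illustration.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a component survives to age t under a Weibull model."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(interval, shape, scale, planned_cost, failure_cost, steps=2000):
    """Expected cost per operating hour when replacing at `interval` or at failure."""
    # Expected cycle length is the integral of R(t) from 0 to interval
    # (trapezoid rule).
    dt = interval / steps
    expected_life = 0.0
    for i in range(steps):
        r0 = weibull_reliability(i * dt, shape, scale)
        r1 = weibull_reliability((i + 1) * dt, shape, scale)
        expected_life += 0.5 * (r0 + r1) * dt
    survive = weibull_reliability(interval, shape, scale)
    expected_cost = planned_cost * survive + failure_cost * (1.0 - survive)
    return expected_cost / expected_life


def best_interval(candidates, shape, scale, planned_cost, failure_cost):
    """Pick the candidate interval with the lowest expected cost rate."""
    return min(
        candidates,
        key=lambda t: cost_rate(t, shape, scale, planned_cost, failure_cost),
    )


# Hypothetical inputs: a wear-out failure mode (shape > 1) where an unplanned
# failure costs ten times as much as a planned replacement.
SHAPE, SCALE = 2.5, 2000.0            # Weibull parameters, hours
PLANNED_COST, FAILURE_COST = 500.0, 5000.0
candidates = range(100, 3001, 100)    # intervals to compare, hours
chosen = best_interval(candidates, SHAPE, SCALE, PLANNED_COST, FAILURE_COST)
```

Replacing too early wastes component life and replacing too late invites expensive failures, so the cost rate has a minimum between the two extremes. For a random failure mode (shape of 1 or less) no such minimum exists, which is consistent with assigning those failure modes to condition monitoring instead.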

Clear definitions and focus
Given a clear definition of roles, responsibilities and learning objectives, it is quite easy to develop individual training plans, as well as interdependent training plans for the whole team. As a result, a dynamic is created within the organization whereby interdepartmental barriers are easily removed because of operational objectives that require a high level of communication and a high level of collaboration. Responsibilities are more easily shared through clarity of purpose and individual accountability to the entire team. Turf wars and power struggles become moot points as the competency maps remove all the mystique of “whose job is this?”

The importance of individual competency begins to define itself within the context of organizational competency. Organizational competency then becomes the focus. True effectiveness and efficiency are the results of increased amounts of interdependency and collaboration. All of this is enabled through an effort for continuous improvement manifested in a focus on training, not just for the new employee but for all employees—all the time. MQ

Andy Page is training director for Allied Reliability, Inc. For more information and/or to request a complete Reliability Task Assignment Chart, as shown in Fig. 2, e-mail: pagea@alliedreliability.com.



6:00 am
June 1, 2007

Elegant Maintenance Management: Manifesting The Big Picture

You, as a Maintenance Manager, can enjoy great success with the resources you already have, while leading your management into a new way of thinking. In the process, you’ll be changing into an “Elegant” Maintenance Manager. This type of transformation shifts the playing field—and calls for real strength on your part. As you “go for the gold,” you’ll need to challenge and overcome many misconceptions dotting the Maintenance arena. You’ll also need to embrace a number of proven management principles with which you may or may not be familiar.

Misconception #1:
That predictive maintenance (PdM) is a necessary part of a world-class Maintenance operation…
It is, but only when principally used in post-maintenance or equipment startup testing. If you need to use PdM in routine monitoring because of high equipment failure potential, your equipment is not being properly installed, maintained or operated. Your resources can be more effectively utilized in addressing those dysfunctional areas that cause unreasonable failures instead of collecting data that is not nearly as good as other readily available information, e.g., bearing temperature data gathered as part of a material condition inspection program.

Misconception #2:
That spending time to prepare a case for upper management in support of additional funds for the latest maintenance technologies is effective…You cannot simply apply organizational development (OD) methods to justify substantial increases in resources. How does a Maintenance Manager become an OD Specialist overnight? The reality is that if one cannot do a reasonable job with current resources, upper management, behind closed doors, will question granting more resources to a manager who they perceive to be ineffective.

Misconception #3:
That overhauls because of excessive wearout and/or obsolescence and modifications to existing systems and equipment are largely Operations and Maintenance (O&M) work (a misconception that is compounded when there is a complete absence of an in-house Engineering function)…This type of work actually has major engineering components in it (design, startup and testing) that go far beyond the nature of the O&M function. Failure to accurately understand the impact of modifications, obsolescence and extreme wearout on the Maintenance function leads to surprises that impact reliability and cost and result in puzzling failures of new equipment. Note, also, that modifications, obsolescence and extreme wearout are always present in industrial facilities and must be part of the ongoing maintenance plan to ensure reliability. When they are not, the net result is a pig-in-a-poke for the O&M teams.

Misconception #4:
That the use of performance indicators is always effective… This is related to the “content approach to data collection” fallacy. The latter is the concept that if you collect as much data as possible, you always will have the data you need, when you need it. This is not true. The same applies to the use of countless performance indicators for gauging maintenance management effectiveness. The name of the performance indicator game is to select a handful of indicators that deal with the core areas of maintenance, which can be taken in snapshot fashion to provide self-assessment when needed. These include preventive maintenance/corrective maintenance (PM/CM) program performance, CM backlog, supervisory performance, planning performance and materiel management, to name one set of very useful indicators, especially for self-assessment. The Elegant Maintenance Manager always does self-assessment to better understand the advice and criticism of others, especially the stakeholders in his organization.

Misconception #5:
That your management systems work and that managers know how to manage, i.e., they know the nature of managerial work…This is the biggest misconception of all. Furthermore, it is the root of virtually all dysfunctional, reactive Maintenance programs. There is, of course, any number of other misconceptions. Many of them will be examined in detail in future installments of this series. Here, though, let’s begin laying the groundwork for becoming an Elegant Maintenance Manager by discussing basic principles.

Developing your game
If your management style is based on command and control, you’re really a supervisor. Get a new way of thinking about things.

Managers chart the strategic course for their organizations and lead them to the vision at the end of that strategy. In order to do this, the management system must communicate the requirements to achieve the vision and provide the means of implementing the series of decisions embodied in the strategy to achieve the vision. Communication takes place through the organizational structure of the management system that defines how information flows in the organization. This provides management and supervision with an operating structure that will support the effective, efficient functioning of the Maintenance organization. The “means of implementing” is the removal of roadblocks and allocations of resources at the right place and time in the right amount to the right person. In concise terms, this is what the management system must support, and the manager must have the vision and leadership qualities (leave the command and control qualities to the supervisors) that provide the driving force for the management system.



Managerial functions and practice
In the early 1970s, Dr. Henry Mintzberg identified three functional areas comprised of 10 practical areas that defined the essence of managerial work. Up until that time, elaborate management theory, command and control and groupthink traps had been dominating the management mindset in the United States, becoming especially forceful after the Korean War. Mintzberg’s thorough research on the matter clarified the work of managers, and provided the basis for the guiding principles that a manager must understand and embrace in order to do his/her job. Mintzberg got it right. Thus, more than 30 years later, his principles still apply.

But, what’s the purpose of having managers? If we organize and structure operations in a logical manner and write administrative procedures to guide the conduct of business, and if we hire personnel who have the functional capabilities that we need to execute the action plan of the organization, why do we need a manager at all? There are four primary purposes:

  1. The turbulent, changing environment will necessitate change in the organization and the person that controls that change is the manager. This person develops the strategy for controlling change in the organization.
  2. The manager is the person who provides the unprogrammed interaction to address imperfections and random events that affect the organization’s operations.
  3. Organizational growth can be transient and transitional in nature, giving rise to big-picture thinking that leads to strategy development and program implementation. The attendant internal changes that occur in an organization as it goes from inception to maturity require the input of a manager to make sure things happen correctly.
  4. Human cognitive problems, the psychological types and the varying degrees of ability of individuals give rise to the internal human conflict. It is impossible to program the handling of the human element by procedure or by checklist. We can do it to a certain extent, but the essence of managing the human resource requires an entity that can handle the unprogrammed, complex problem situation. Again, this is the manager.

Now that we understand the purposes of having managers, what might we expect from them? Based on Dr. Mintzberg’s research, we can look forward to the following:

  1. Managers ensure that their organizations serve their prime purpose and efficiently deliver products and services. We would expect the manager to deliver the basic mission of the organization.
  2. Managers must maintain the stability of their organizations’ operations to control the effective and efficient execution of the basic mission and to relieve anxiety in the workforce.
  3. Managers focus on the big picture, taking charge of strategy-making systems and adapting their organizations, in a controlled way, to changing environments.
  4. Managers ensure that their organizations serve the needs of the principal stakeholders and handle public relations. In so doing, a manager also must shield his/her organization from the external environment so as to prevent distractions to it and misrepresentations of it.
  5. Managers serve as the key informational link between their organizations and their environments. A manager establishes information flow within the organization and provides information about it to the outside world and vice versa. A manager guarantees that the information flow network, including all feedback networks, is functional in the organization.
  6. Managers ensure that personnel within their organizations know their jobs and how they’re supposed to do them. A manager also ensures that personnel understand that their goals and objectives must be linked to the goals and objectives of upper management.

The preceding lists reflect somewhat conceptualized purposes and expectations regarding managers. These four purposes and six expectations lead to the identification, based on actual observation, of three functional areas made up of 10 activities that define the essence of managerial work shown in Table I.

In Table I, we see the coalescing of the purpose and expectation concepts into actual functions that are observable in the day-to-day activities of managers. This is the nature of managerial work.



Developing the foundation
Is there a key part of Maintenance management that can be addressed to make the overall task cleverly apt and simple? I found the answer to this question while preparing my doctoral dissertation in business administration and management and providing advice, consultation and program development and implementation services to a number of industrial staffs. I later validated this answer during several tenures as a hands-on O&M Manager. The essence of the answer is information. Isn’t that what we might expect? The manifestation of the answer is in an information flow network. Let’s define this information flow network so we can establish the basis for “Elegant Maintenance Management” straightaway, then address a variety of Maintenance management tasks—cleverly and simply.

In the O&M assignments I’ve had over the past 15 years or so, my staffs and I have achieved production reliability levels in excess of 98%, and for the last four years in my current assignment, our production reliability level has been 100%. Interestingly, these levels have been achieved while meeting all schedule and budget requirements. In my current assignment, that means having to effectively manage the significant obsolescence and extreme wearout challenges one would expect to find in a 900,000-sq. ft., 24/7 production facility that is 16 years old. Yet, our site still has not had to resort to detailed, formal RCM, RCA, PdM or staff training programs to heal a formerly poor-performing O&M function.

During my first few attempts to evaluate performance of the previously mentioned Maintenance management functions and the Operations interface, I found that organization charts and administrative procedures were of little value in understanding the functional performance of the Maintenance organization. Theories as to what makes for a well organized department based on admin practices, span of control for supervisors, command and control performance and other assumptions from conventional thinking were of little or no value in most instances involving programmatic assessments. As a performance evaluator, I needed to come up with a methodology, or model, to help determine what areas of the Maintenance function were performing satisfactorily and which were dysfunctional. As a result, I developed the following information-based approach in the context of the information flow network.

First on the to-do list was the facility material condition inspection. This was a walk through the facility, observing everything and logging all deficiencies; it typically would require one full day at a single-station nuclear facility. I augmented the deficiency data from the inspection with a printout of all of the PM and CM records for the previous year, along with all the open CM records for the past year. From these records I was able to determine the CM backlog as a function of time, the PM/CM ratio, and CM work that resulted from PM work (maintenance-induced failures). This required an additional 4-8 hours. As the patterns of equipment and system failure emerged, I applied the information flow network (operating structure) concept shown in Fig. 1 to determine dysfunctional management system areas. Now, I was ready to chat with the Maintenance Manager, and the topic was not going to be his organization chart.
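
The three measures drawn from the work-order records lend themselves to a short script when the records are available electronically. The Python sketch below is only an illustration under stated assumptions: the `WorkOrder` record layout, the `caused_by` link between a CM and the PM that preceded it, and the sample data are all hypothetical, and a real maintenance management system will store these fields differently.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkOrder:
    order_id: str
    kind: str                        # "PM" or "CM"
    opened: date
    closed: Optional[date]           # None while the order is still open
    caused_by: Optional[str] = None  # id of the PM that induced this CM, if any


def pm_cm_ratio(orders):
    """Number of PM orders divided by number of CM orders."""
    pm = sum(1 for o in orders if o.kind == "PM")
    cm = sum(1 for o in orders if o.kind == "CM")
    return pm / cm if cm else float("inf")


def cm_backlog_on(orders, day):
    """Count CM orders that were open (raised but not yet closed) on `day`."""
    return sum(
        1
        for o in orders
        if o.kind == "CM"
        and o.opened <= day
        and (o.closed is None or o.closed > day)
    )


def induced_failures(orders):
    """CM orders traced back to a PM order (maintenance-induced failures)."""
    pm_ids = {o.order_id for o in orders if o.kind == "PM"}
    return [o for o in orders if o.kind == "CM" and o.caused_by in pm_ids]


# Hypothetical one-quarter extract of work-order history.
orders = [
    WorkOrder("PM-1", "PM", date(2007, 1, 5), date(2007, 1, 5)),
    WorkOrder("PM-2", "PM", date(2007, 2, 5), date(2007, 2, 5)),
    WorkOrder("PM-3", "PM", date(2007, 3, 5), date(2007, 3, 5)),
    WorkOrder("CM-1", "CM", date(2007, 1, 6), date(2007, 1, 20), caused_by="PM-1"),
    WorkOrder("CM-2", "CM", date(2007, 2, 10), None),
]

ratio = pm_cm_ratio(orders)
backlog_mid_january = cm_backlog_on(orders, date(2007, 1, 10))
induced = induced_failures(orders)
```

Evaluating `cm_backlog_on` for each week of the year gives the backlog as a function of time, and a rising curve alongside a growing list from `induced_failures` points to the kind of pattern worth raising with the Maintenance Manager.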

The applied information flow network
The information flow network concept, detailed in Fig. 1, is based on a logical structure that begins with a clear statement of what decisions need to be made. This requires a great deal from the manager since he/she must make the first move. It is a difficult task to specify the future for an organization, but it is a necessary part of the job for every manager. This is a manager’s principal leadership role, and much depends on it, as can be seen by what follows.

If the decisions to be made are specified, the alternatives the organization needs to develop can be better understood. The responsible groups will have sufficient direction to begin their complex analysis task. When alternatives are specified for development, the analytical methods and processes are easily identified, and hence, the information needed to support the analysis becomes known. Once the information is known, the minimum data set is specified. Accordingly, it should now be possible to provide management and supervision with a decision-support strategy. The problem is this: “By the information flow network, we know what to do to create a decision-support capability, but what does its operating structure look like?” It looks much like the one detailed in Fig. 1.

Operating structure overview
The operating structure (OS) is based on the idea that problem identification and corrective action occur at the OS functional level. This is defined as the level at which the organization’s information circulates, where information is used in the broad sense to also include information in forms as needed and used by managers and supervisors. For example, alternatives are needed by managers in decision-making. Therefore, alternatives would be considered as information.



Since supervisors and managers need to measure departures from goals and objectives, a performance indicator report would be considered as information for a supervisor or manager.

It is notable that information per se is critical in diagnostics, and information flow is critical in decision-support applications. By comparison, other methods of problem identification and corrective action determination dwell on a functional level defined by the organizational task differentiation. For example, the functions referred to by Accounting, Marketing and Production are simply department-level functions.

Reshuffling departmental organization charts to match the environment will do little good if the OS functions (information circulation level) are ignored. Furthermore, in problem organizations, corrective action that results in reorganizing the department to create a new “seating chart” and redefine formal authority and responsibility succeeds only if the new players recognize the deficiencies in information circulation. The correct thing to do is to revamp the management systems in the department to ensure there is an adequate OS functional capability to implement the organizational directives and make the action decisions to keep operations on track to achieve objectives. As shown in Fig. 1, the OS includes the following six basic functions:

  1. Strategy–The ability of management to convert the goals and enabling objectives into detailed action plans (This helps visualize what management wants the operation to look like in the future.)
  2. Data Collection–The types of data that are acquired by the organization, their accuracy, their relevance and methods of acquisition (This is, largely, a key responsibility of supervision in the form of implementing the record-keeping aspect of control of maintenance activities.)
  3. Data Processing–The manner in which data is arranged, manipulated, summarized and formatted to develop relevant management reports and indicators of performance (This often is confused with analysis; it is not.)
  4. Analysis–How decision-makers use the information to compare actual performance to predicted/required performance and what analytical methods must be developed and applied to support decision making
  5. Feedback Networks–The ability to measure deviation from objectives and report this to management and supervision for resolution. (Coordination among departments is also part of the feedback network. Note that feedback consists of at least three separate loops. First, there is the day-to-day loop for supervisors to keep operations on track. Second, there is the longer-range loop for supervisors to adjust communication and implementation methods to ensure proficiency of the workforce. The third loop applies to the longer time frame that is required of managers to evaluate and revise strategic elements. Each of the feedback loops is unique with different analysis support requirements, different time horizons and different distribution requirements.)
  6. Control–Activities that are used by the decision maker to communicate goals and enabling objectives, and to determine if management directives are being effectively implemented. The effectiveness of the control function is related to the organization’s historical ability to meet its objectives. Often defined as part of managerial work, it is in fact the dominant activity of supervisory work. When managers inject themselves into the control function (sometimes referred to as “hands-on” management), micromanagement results, and supervisors may lose interest.

The OS is an approach to organizational diagnostics based on information flow networks that can support management problem solving. It is used to identify problems and to determine what decisions need to be made to solve those problems including what information is required to support decision making. This OS concept is based on the information flow networks in the organization’s management systems. The OS concept is the notion that problem identification and corrective action can most effectively be performed at the functional level.

The OS has two application modes. First, it can support organizational programmatic assessment and evaluation. Second, it can be used as a tool to support organizational and programmatic development or upgrade. Both applications are related and are generally applied in the sequence given.

In the following discussion, the OS emerges as a problem identification tool (diagnostic) in organizational and programmatic applications.

As in any diagnostic, success depends on the experience and expertise of the evaluator and the extent and depth of his/her analytical skills. The reason for this, especially with OS, is that OS is not a checklist, expert-system type diagnostic. The evaluator will be challenged in diagnosing organization and program problems in the true sense of analysis. Consequently, he/she will need to decompose the organization and programs into the elementary OS functional levels, then develop and choose problem solutions to synthesize into a new organizational and programmatic whole. To do this, the evaluator must categorize each organizational and programmatic element as one or more of the six OS functional elements. Where OS deficiencies exist or OS functions are non-existent, he/she can characterize such problems in terms of OS functions. Problem solutions in terms of OS functional development or upgrade follow directly.

Elegant Maintenance Management starts with the fundamental understanding of managerial work and the operating structure to manifest the big picture of organizational strategy, structure and process. With a basis in information and an information flow network in the organization, ignorance does not have to be an option or default for anyone associated with the organization. With information comes the power of effective delegation, so decision-making can occur at the operator level in the organization as desired (empowering your staff).

Elegant Maintenance Managers also are always mindful of appearance. They focus on two behaviors in all they and their subordinates do. These behaviors are:

  • Businesslike–Always act with a sense of concern and responsibility about your organization’s mission and your specific assignments.
  • Professional–Demonstrate experience, expertise and intellectual skills in all you do. Be respectful of the opinion of others. Never lie, tell stories, blame others or make excuses; simply tell the truth.

Or, to sum it up in a more colloquial sense, never argue with an idiot. Observers may not be able to tell the difference.

Dr. Huzdovich is the service contract manager for Raven Services Corporation at the Bureau of Engraving and Printing’s Western Currency Facility in Ft. Worth, TX. He directs the O&M and engineering work performed by the Raven staff of 58 employees, which is responsible for the 24/7 operation and maintenance of all stationary and production support equipment in these operations, including their 850-ton chilled water units, 800 hp low-pressure steam boilers, 3600 KW of diesel generator capacity, the environmental management system and currency mutilation destruction equipment. He also is the principal engineer and consultant providing maintenance and reliability services and expert witness services for Forensic Action Services, LLC, in Denton, TX. E-mail:; telephone: (817) 847-3674.



6:00 am
June 1, 2007

Comprehensive Failure Investigation

Looking for a basic, step-by-step approach to improved problem solving around your plant? Start here.

Failure analysis, incident investigation and root cause analysis are among the terms used by organizations to refer to their various problem-solving approaches. Regardless of the name, these types of investigations typically boil down to three basic questions:

  1. What’s the problem?
  2. Why did it happen?
  3. What should be done to prevent it from recurring?

These questions, or steps, are the framework for information collection that is then organized with the help of tools such as timelines, diagrams/photos and process maps. Together, these steps and tools lead to comprehensive failure investigation, which can be defined as the collection and organization of all necessary information to answer the three questions thoroughly and completely, supported by clear, concise documentation of the incident.

Two important points apply to every aspect of an investigation: focusing on principles and being specific.


Principles are constants. They do not change from problem to problem. The cause-and-effect principle is fundamental to all investigations. This principle applies to equipment failures, supply chain problems, production outages, customer service issues and people problems the same way. By focusing on the principle of cause-and-effect, an organization can develop a consistent approach to investigating and solving all problems. If you confront a problem that appears to contradict basic physics, check your assumptions, because some are not accurate.

Remember: There are no equipment failures or problems in a facility that defy the laws of physics and chemistry.

There always will be an explanation or truth to what has already happened. Think of this in terms of the terrain in your hometown. The map of the town represents the terrain. The creation of the town map should be an objective exercise because the roads already fit together in a particular way. The map should match the actual terrain, just as the investigation should match the incident that occurred.

Many people think of cause-and-effect as a linear relationship, where an effect has a cause. In fact, cause-and-effect is an example of a system. A system has parts just like an effect has causes. The equipment downtime came about because a part failed. We find that the part failed because of fatigue. The next question is: “Why did it fatigue?”… and the why questions just keep coming. Most organizations mistakenly believe that an investigation is about finding the one cause, or “root cause.”

Remember: An effect doesn’t have a single cause—it has “causes,” which reveal different ways to solve the problem.

Be specific…
The word “analysis” means to “break down into parts.” Failure analysis, problem analysis and root cause analysis all start with a problem, then break it down into parts—which are the causes. Identifying the causes reveals additional ways that the problem may be solved. As the causes become more specific (detailed), the solutions also become more specific.

Remember: Problems are solved when specific action is taken. Problems are not solved in general—the devil is in the details.

One common mistake that many organizations make is trying to group an entire investigation into one category. This makes the incident more general, not more specific. The five most common generalizations are: human error, procedure not followed, equipment failure, inadequate training and design. Many groups believe that the end of an investigation has been reached if they can get to one of these five categories.

Remember: Don’t generalize an investigation—ask more “why” questions and be specific.

Conducting the investigation
Step #1: What’s the problem? (the definition)
Everyone seems to know that defining the problem is the first step in an investigation. How this is done varies widely. Some groups write a lengthy problem statement and then debate the wording for 30 minutes or more. A facilitator should remember that people see problems differently. When someone states his/her view of the problem, be prepared for the fact that someone else is going to disagree and offer a different problem. The word “problem” itself is problematic, in that people use it for whatever they see as the “bad thing.” To accurately define a failure, there are four simple questions to answer:

  1. What’s the problem?
  2. When did it happen?
  3. Where did it happen?
  4. How were the overall goals impacted?

Instead of writing a long problem description, simply answer these four questions in an outline format—and don’t write responses as complete sentences, just short phrases.

The question, “How were the overall goals impacted?” captures the overall magnitude of any issue. While the first question, “What’s the problem?”, reflects individual views of the problem, the company is going to view the problem as any deviation from the ideal state. For a manufacturing company, the overall goals (or ideal states) typically include: no safety injuries, no environmental issues, no customer service issues, no production problems and no excess materials or labor spending.

The overall goals that were negatively impacted by a failure incident provide the starting point for the “why” questions. Step #2 does not start with what people see as “the problem,” rather, it begins with the impact to the overall goals (the 4th question). People see problems differently, but defining every failure by how it negatively impacts the goals provides a consistent starting point. Start with the impact to the overall goals to define your next problem.

Step #2: Why did it happen? (the analysis)
It’s important to remember in this step that every effect has “causes” (plural). While people may try to identify the single cause of an issue (commonly referred to as the “root cause”), the fact is there is not just one cause of an incident—there are causes.


As the fire triangle in Fig. 2 shows, there is no single cause for a fire; there are causes: heat, fuel and oxygen. Controlling any one of these causes will reduce the risk of the fire. Most people mistakenly believe oxygen is a “contributing factor” to a fire, meaning on its own it can’t produce a fire. In reality, there is no difference between a contributing factor and a cause. A cause, by definition, is required to produce an effect. Oxygen is required for fire; therefore, it is a cause of fire. On its own, oxygen will not produce a fire. Neither will heat nor fuel. Fire requires all three causes: heat, fuel and oxygen. Every effect requires all of its causes.

The most effective way to communicate all causes of an incident is through a visual format, similar to that in Fig. 2. The cause-and-effect analysis should start with a discussion of the goals that were impacted, followed by the asking of “why” questions moving to the right. The simple convention is effect on left, cause on right. “Why” questions take us backwards through the failure. Visually breaking down the cause-and-effect relationships is the simplest way to document an incident during the investigation.
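The effect-on-left, cause-on-right convention can also be captured in a small data structure. The following is only a minimal sketch in Python; the names `Node`, `render` and `occurs` are illustrative assumptions, not part of any published method. It uses the fire example from the text and treats an effect as occurring only when all of its causes are present.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One box in the map: an effect plus the causes it requires."""
    label: str
    causes: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Effect on the left, causes indented to the right (one level per 'why')."""
    lines = ["    " * depth + node.label]
    for cause in node.causes:
        lines.extend(render(cause, depth + 1))
    return lines


def occurs(node: Node, present: Set[str]) -> bool:
    """An effect occurs only when every one of its causes occurs."""
    if not node.causes:                      # a leaf: look it up directly
        return node.label in present
    return all(occurs(c, present) for c in node.causes)


# The fire example from the text: three causes, all required.
fire = Node("fire", [Node("heat"), Node("fuel"), Node("oxygen")])
print("\n".join(render(fire)))
print(occurs(fire, {"heat", "fuel", "oxygen"}))  # all three causes present
print(occurs(fire, {"heat", "fuel"}))            # oxygen controlled
```

Removing any one entry from the set of present causes is enough to make `occurs` return False, which mirrors the point that controlling any single cause reduces the risk.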

The focus of Step #2 is on generating an accurate cause-and-effect analysis with a sufficient level of detail. During this step, detail is added to the timeline, diagrams and photographs are utilized, and specific steps of the processes are identified to ensure that the analysis is accurate. The facilitator is typically moving back and forth between the different tools and the cause-and-effect analysis as information becomes available. A complete analysis identifies the causes and validates them with evidence.

Step #3: What should be done? (the solutions)
The solutions step is where specific actions are defined to prevent the issue from occurring. This step begins once Step #2, the analysis step, is complete. The solutions step breaks into two parts:

  • possible solutions are identified first;
  • then they are pared down to the best solutions.

The analysis step is objective and based on evidence, while the solutions step is subjective and creative.

Possible solutions are the different ideas that people think up by examining each of the causes. Ideas come from those who are involved with the problem. Managers, engineers and supervisors will have some ideas, as will designers, manufacturers and vendors. People who operate and maintain the system or equipment on a daily basis also will have ideas. To get their ideas, ask—most importantly, ask those who are closest to the work. It is crucial for people who are involved in the problem to be part of the problem-solving process. There is a significant amount of knowledge and brainpower within organizations that is underutilized because it is not asked for regularly.

The best solutions are selected based on how effective they are and the level of effort required for their implementation. The effectiveness of a solution is a function of how much it reduces the impact on the overall goals, while the level of effort is a function of the resources, cost and time to implement the solution. Possible solutions can be ranked based on effectiveness and effort so that the best ones are revealed. These best solutions become the action plan with specific owners and due dates.
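One way to picture the ranking is a short Python sketch. Everything specific here is an assumption made for illustration: the 1-to-10 scoring scales, the effectiveness-per-unit-effort ordering, and the three sample solutions. A group may well prefer a different scoring rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    description: str
    effectiveness: int  # assumed scale: 1 (little reduction in impact) to 10
    effort: int         # assumed scale: 1 (few resources, low cost, quick) to 10


def rank(solutions: List[Solution]) -> List[Solution]:
    """Best first: highest effectiveness per unit of effort, ties go to lower effort."""
    return sorted(solutions, key=lambda s: (-s.effectiveness / s.effort, s.effort))


# Hypothetical candidates, for illustration only.
candidates = [
    Solution("Redesign the part", effectiveness=9, effort=9),
    Solution("Add an inspection step to the work process", effectiveness=6, effort=2),
    Solution("Retrain the crew", effectiveness=3, effort=3),
]

for s in rank(candidates):
    print(f"{s.effectiveness / s.effort:4.1f}  {s.description}")
```

The top entries of the ranked list are the ones that would be given owners and due dates in the action plan.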

Organizing the investigation
Defining the failure and its impact on the overall goals in Step #1 is based on answering a very specific set of four questions, something that typically takes less than five minutes. In the analysis step (Step #2), when the cause-and-effect relationships are being identified, information is being discussed using timelines, diagrams and processes. People may offer some causes, explain the sequence of events, then review a process step, draw a picture and then go back to discussing causes. Regardless of what people offer, it should be captured with the appropriate tool.

Some information will appear in both the timeline and the cause-and-effect analysis. A diagram may contain a drawing of the part; the timeline may contain some history about the part and when it failed; the cause-and-effect analysis will contain the causes of why the part failed.

The facilitator’s role is to keep the group focused on those three basic questions common to every investigation (“What’s the problem?” … “Why did it happen?” … “What can be done to prevent it from recurring?”) and to appropriately organize all information. The following notes highlight the tools needed for organization of the collected information:

Capture the timeline…
A timeline, also known as a sequence of events, defines the chronological order of occurrences for a given issue. The simplest way to create a timeline is in a table format with date, time and description headers. Each entry, which should be a short phrase, not a complete sentence, corresponds to a specific date and time.

The timeline shows what happened at a specific date and time, but it does not explain why it happened. A timeline is dependent on time. A cause-and-effect analysis is dependent on causes (the “why” questions). The timeline entry may be “9:05AM, Valve opened,” but the causes of why the valve opened are located in the cause-and-effect analysis.

A timeline should always be constructed for larger issues. Background information also can be added to the timeline instead of being written in a separate paragraph. The time scale on a timeline can be based on years, days, hours, minutes or seconds, but it also can change throughout the timeline, as long as each entry is placed in the proper chronological order.
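The table format described above is easy to keep in a spreadsheet, but it can also be sketched in a few lines of Python. The dates and the entries other than “Valve opened” are made up for illustration, and the function names are assumptions. Note that a background entry from years earlier sits in the same table as entries that are minutes apart, as long as the order is chronological.

```python
from datetime import datetime

# Entries are (timestamp, short phrase). They may be captured in any order.
entries = [
    (datetime(2007, 6, 1, 9, 5), "Valve opened"),
    (datetime(2005, 3, 1, 0, 0), "Valve installed"),       # background entry
    (datetime(2007, 6, 1, 9, 7, 30), "Line shut down"),
]


def build_timeline(items):
    """Sort chronologically and split each timestamp into date and time columns."""
    rows = sorted(items, key=lambda e: e[0])
    return [(t.strftime("%Y-%m-%d"), t.strftime("%H:%M:%S"), text) for t, text in rows]


def format_table(rows):
    """Render the rows under Date / Time / Description headers."""
    lines = ["{:<10}  {:<8}  {}".format("Date", "Time", "Description")]
    lines += ["{:<10}  {:<8}  {}".format(*row) for row in rows]
    return "\n".join(lines)


print(format_table(build_timeline(entries)))
```

The table records only what happened and when; the reasons behind each entry still belong in the cause-and-effect analysis.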

Timelines are very helpful tools in investigations. They complement thorough cause-and-effect analyses, but they don’t replace them. Many organizations mistakenly consider a timeline the analysis of the failure. Make sure that your organization doesn’t.

Remember: Simply identifying the sequence of events does not explain the cause-and-effect relationships.

Use diagrams, drawings and photos…
Visual tools, such as diagrams, drawings, sketches and photographs, give people a common view of the issue. Without these, everyone has his/her own mental picture of the failure. A simple sketch on paper or a dry-erase board immediately provides a group with a picture that everyone can edit, improve, point to and comment on.

Don’t overlook the importance of a simple sketch. Mechanical drawings and diagrams from manuals or the original equipment manufacturers also can be used during the investigation to improve the accuracy of the analysis.

Photographs are especially helpful because they create such a simple and accurate record. Digital cameras allow people to take plenty of pictures so that the most relevant can be selected later. Digital photos easily can become part of the investigation record.

Remember: The more detail that’s included in a diagram, drawing, sketch or photo, the more specific the discussion can be.

Review the processes…
Identifying the processes that were in place before the failure occurred is extremely important in order to prevent the incident from happening again. Recurring problems are symptomatic of not managing by process.

A thorough investigation includes a review of the processes that produce the failure. (It’s much like a mechanic who must know how the transmission works in order to explain why the transmission failed.) During the investigation, a clear understanding of the current work process helps explain what specifically was being done to lead up to the failure. Secondly, the process needs to be well understood so that specific improvements can be made within the process to ensure that the failure doesn’t happen again.

Remember: The best solutions are all actions that will be implemented within the work processes.

A complete investigation
The ultimate output of an investigation is the implementation of the action items to prevent failures from occurring. The purpose of a comprehensive investigation is a thorough and accurate understanding of the incident so that the most effective solutions can be identified. An investigation has a very specific purpose. Everyone participating in the investigation should be focused on positively impacting the overall goals of the organization.

The steps and tools covered in this article are all parts of a complete investigation. Each step and each tool has a specific way of capturing and presenting information. They are intended to simplify and organize all of the different pieces of information that become part of an investigation.

Documenting the investigation as it is being conducted plays a significant role in how well people understand what happened and why. How clearly the incident is documented can affect how well the investigation goes. The rate that the information is collected also can affect how well the investigation goes. The point of all this is that effective investigations cause organizations to become more effective. Likewise ineffective investigations are a competitive disadvantage.

Experiment with and practice any one or all of the steps and tools in this article. Begin, now, to improve the way your group analyzes, documents, communicates and solves problems.

Mark Galley’s practical investigation experience spans many different types of industries and issues. He has been leading investigations and Cause Mapping workshops for ThinkReliability since 2000. Prior to that, he had worked with the Dow Chemical Company as a reliability engineer for almost nine years. Galley holds a B.S. in Mechanical Engineering from the University of Colorado, and is a Certified Reliability Engineer through the American Society for Quality. E-mail: mark.galley@



6:00 am
June 1, 2007

Analyzing The Relationship Of Preventive Maintenance To Corrective Maintenance

What constitutes adequate PM frequency in your plant? How do you know when enough is enough? Where do you want your PM program to take you? Think before answering. No guessing allowed.

When establishing an effective maintenance program, one must determine not only which preventive maintenance (PM) routines to accomplish, but how often they should be done. On the surface, the answer to this question would seem to be quite simple. Is it really?

One proven theory is that the PM to corrective maintenance (CM) work order ratio should be about 6 to 1. This theory assumes that a PM inspection reveals some type of corrective work that should be completed on an asset, on average, every sixth time the PM is performed. The assumption is that, if the ratio is greater than 6:1, you are performing the PM too often; if the ratio is less than 6:1, you are not performing it often enough. (The “6 to 1 Rule” was proven by John Day, Jr., manager of Engineering and Maintenance at Alumax of South Carolina, during the period when Alumax of South Carolina was certified as the first “World-Class” maintenance organization.) You might accept this theory, put it in place in your Maintenance program and forget about reading the remainder of this article. Or, you might choose to continue reading, as we attempt to prove or disprove this theory.
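For readers who keep work order counts in a CMMS or spreadsheet, the rule can be sketched as a short Python calculation. The function names and the tolerance band of plus or minus 1 around the 6:1 target are assumptions added for illustration; the rule as stated only distinguishes greater than from less than 6:1.

```python
def pm_to_cm_ratio(pm_completed: int, cm_generated: int) -> float:
    """PM work orders completed per CM work order they generated."""
    if cm_generated == 0:
        return float("inf")   # no corrective work found at all
    return pm_completed / cm_generated


def assess(ratio: float, target: float = 6.0, tolerance: float = 1.0) -> str:
    """Compare a ratio to the 6:1 target; the tolerance band is an assumption."""
    if ratio > target + tolerance:
        return "PM may be too frequent"
    if ratio < target - tolerance:
        return "PM may not be frequent enough"
    return "near target"


print(assess(pm_to_cm_ratio(12, 2)))  # 12 PMs, 2 CMs
print(assess(pm_to_cm_ratio(6, 0)))   # 6 PMs, no CMs
print(assess(pm_to_cm_ratio(8, 4)))   # 8 PMs, 4 CMs
```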

How effective?
Preventive maintenance is that activity performed in some routine or regularly scheduled fashion designed to keep equipment in an existing state, prevent deterioration or failure and identify work of a corrective nature to keep equipment from causing non-productive time in any capacity. This is the detection phase of the PM investment— the conditions that we identify and correct prior to failure are the return for this investment.

Each PM that we develop and implement in our Maintenance organization will require some definite period of time for a Maintenance or Operations person to accomplish. How many PMs, how often, can we accomplish with our workforce, reserving a certain percentage of each day for emergencies, unplanned work and planned corrective work? Should we attempt to implement some type of control over how often we conduct PMs? You may realize, as we continue on this path, that the frequency assigned to many PMs has as much to do with effective manpower utilization as it does with discovery of potential asset problems.

All too often, our organization sees Maintenance departments being overwhelmed by the number of PMs required on a daily and weekly basis. It’s not that these clients have so many PMs that all cannot be completed. It’s because so many PMs must be accomplished, that there is little time allowed for emergency work and NO time for either solid corrective work to prevent emergencies (which capitalizes on the payback for our investment) or to perform other unplanned work. We also must consider the potential to “PM the equipment to DEATH!” Such a scenario—performing more PMs than we should—creates more problems than it solves.

The obvious question is: “How effective is your PM program?” The short answer is: “If your PM program isn’t finding problems, it isn’t effective.”

How often?
How often should we perform any one PM procedure? It occurs to me that the frequency of performing a PM should be based on asset failure rate, or Mean Time Between Failure (MTBF) rather than on the NON-failure rate.

If I run a particular piece of equipment to failure, fix it, then run it to failure again, what is the MTBF? Knowing that number, I should be able to calculate a realistic time frame where, if I perform some routine checks and preventive measures on this asset, I have the opportunity to identify potential problems and fix them, significantly extending the MTBF. Is this not our primary goal in maintenance? More does not necessarily equal better in preventive maintenance.
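As a rough illustration of the MTBF question, here is a minimal Python sketch. It assumes failures are logged as cumulative run-hour meter readings and that repair time is ignored; both are simplifications, and the readings shown are hypothetical.

```python
def mtbf(failure_run_hours):
    """Mean interval between consecutive failures, from cumulative run-hour readings."""
    if len(failure_run_hours) < 2:
        raise ValueError("need at least two failures to measure an interval")
    ordered = sorted(failure_run_hours)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical meter readings (run-hours) at three successive failures.
print(mtbf([1000, 2200, 3100]))
```

With a figure like this in hand, a PM interval can be chosen that falls comfortably inside the typical time between failures, then adjusted as history accumulates.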

Let’s consider the air compressor PM that is scheduled once every month. It requires that you observe the compressor in normal operation and look for any obvious signs of potential problems, such as air leaks, knocking in cylinder areas and improper operation of the automatic drain on the accumulator. It also requires that you shut down the compressor and check the condition of the air filter and oil level.

This air compressor PM is estimated to take one hour to complete. If you accomplish it as scheduled for six months and do not note any problems, or do not have to change the air filter or replenish the oil, what can you say about this PM?

First, you have used at least five man-hours of valuable craft labor performing a PM that you did not need to do after the first time, and the compressor was “off-line” for five hours during this PM activity. What other work could you have completed with those five man-hours?

Second, you can make a logical assumption that once per month is simply too often to schedule this PM. What to do? Should you change the PM frequency to “Annual” right away? If you did not generate a CM work order in six months, perhaps three or four months would be a good place to start. In this case, extending the length of time between PMs by a factor of 12 (to Annual) might result in missing an indicator of a potential problem and prove more costly in the long run.


Work into the appropriate frequency carefully. If by extending the periodicity of the compressor PM to three months, you identify corrective work every fifth or sixth time you accomplish the PM, you are in the right neighborhood for an effective PM program for the compressor. If not, keep adjusting the frequency until it gives you the outcome you need for an effective, proactive PM program. Good PM work order history will point exactly to the right frequency as you continue to experiment with periodicity for the PM.
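The trial-and-adjust approach can be sketched as a simple rule in Python. The specifics are assumptions chosen to echo the compressor example: lengthen the interval threefold (not twelvefold) when PMs are finding little or nothing, shorten it by one month when they find corrective work more often than every fifth time, and otherwise leave it alone.

```python
def suggest_interval(current_months, pm_count, cm_count,
                     low=5.0, high=6.0, lengthen_factor=3):
    """Suggest a new PM interval in months from PM and CM work order counts.

    low/high bound the 'every fifth or sixth time' neighborhood;
    lengthen_factor and the one-month shortening step are assumptions.
    """
    if pm_count == 0:
        return current_months                      # no history yet
    if cm_count == 0 or pm_count / cm_count > high:
        return current_months * lengthen_factor    # finding too little: stretch carefully
    if pm_count / cm_count < low:
        return max(1, current_months - 1)          # finding too much: tighten
    return current_months                          # in the right neighborhood


print(suggest_interval(1, pm_count=6, cm_count=0))   # monthly PM, nothing found in six months
print(suggest_interval(3, pm_count=12, cm_count=2))  # corrective work every sixth PM
print(suggest_interval(3, pm_count=8, cm_count=4))   # corrective work every second PM
```

Each suggestion is only a starting point for the next round of history; the rule would be rerun as new PM work orders close.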

How proactive?
When first implementing a proactive preventive maintenance program, you should establish frequency of PMs with a conservative view. Many sites implement their programs using manufacturer-suggested PM frequencies which, although usually on the conservative side, are a good starting point. Adjusting the frequency of PMs as good history develops is a benchmarking exercise driven by the desire to identify deficiencies and correct them before they become emergencies—while at the same time striving for the best possible use of your available Maintenance resource hours.

Consider the types of maintenance actually being performed at your facility. There is preventive, predictive, condition monitoring, corrective and emergency work in your basket, with every type of facility and industry expending certain percentages of Maintenance resources on each type.

We’ve already discussed preventive maintenance to some degree.

  • Predictive maintenance (PdM) technology uses some proven testing method, such as thermography, tribology or ultrasonics, to trend equipment performance and “predict” when certain preventive maintenance activity should be performed, thereby heading off a potential failure.
  • Condition monitoring is the practice of closely monitoring equipment on a continuous basis to provide early detection of symptoms that could cause problems or failure, then performing some corrective actions to preclude the problem or failure.
  • Corrective maintenance (CM) is the act of performing some repair or adjustment for a condition that was identified during the accomplishment of a PM or PdM evolution (and cannot reasonably be corrected within the allowed labor time for accomplishing the PM or PdM).
  • Emergency work requires little definition; it is work performed in direct response to a failure that causes process downtime or imminent hazard to assets or personnel.

This is the appropriate time to note that, with the exception of emergency work, all of the other types of work are predicated on finding and fixing a problem before it becomes a “downtime” event. Perhaps we could categorize all maintenance except “emergency” under the umbrella of “preemptive” maintenance.

How close are you?
We have taken a “long, strange trip” around the Maintenance horn. Are we any closer to determining the proper ratio of PMs to CMs in a proactive Maintenance organization?

Two true statements that can be found in nearly every maintenance-related publication on the market are that:

  1. You must evaluate the ratio of preventive maintenance actions to corrective maintenance actions to determine the effectiveness of your PM program.
  2. In a proactive Maintenance environment, PM activities should account for approximately 30% of total Maintenance resource time.

The real answer could be that the ratio of PM to CM is dependent on several variables, including:

  1. Asset criticality—If the asset fails, what is the impact on production or safety?
  2. Asset age—Equipment histories will prove that most failures occur during infancy (newly installed or overhauled) and old-age (self-explanatory).
  3. Asset history—How many times has the asset failed in the past (MTBF)? Answer the what, how, why, when and how much.
  4. Asset technology—Do you need to PM a state-of-the-art digital measuring device as often as the analog one installed next to it?
  5. Trust—How much do you trust the asset to perform as designed when scheduled to run?
  6. What percentage of your total Maintenance resources is expended on PMs? On CMs? On emergencies?
  7. Can you change PM frequencies within the guidelines you have established for yourself (ref: ISO, CFR, MIL-STD, etc.)?
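The percentage question in item 6 is straightforward to answer from labor hours by work type. The Python sketch below uses hypothetical hours and an assumed function name; the 30% figure it compares against is the one cited earlier for PM activity in a proactive environment.

```python
def resource_shares(hours_by_type):
    """Percentage of total Maintenance labor hours spent on each type of work."""
    total = sum(hours_by_type.values())
    if total <= 0:
        raise ValueError("no labor hours recorded")
    return {work: 100.0 * hours / total for work, hours in hours_by_type.items()}


# Hypothetical monthly labor hours, for illustration only.
shares = resource_shares({"PM": 300, "CM": 500, "Emergency": 200})
for work, pct in shares.items():
    print(f"{work:<10} {pct:5.1f}%")
```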

Summing it all up
If you work through all of these exercises, you will find that the proper PM to CM ratio for the majority of your assets likely will be close to the 6:1 ratio mentioned at the beginning of this article. This is not to say that ALL PMs should be targeted for that ratio, however.

The “6 to 1 Rule” worked for John Day and Alumax, and it works for many other facilities that have chosen reliability excellence and the proven processes that come with dedication to a proactive approach to maintenance. Still, each asset in your facility must stand on its own merits. Furthermore, the proper ratio of PM to CM must be determined as a direct result of analyzing past performance and PM work order history, not a guess on the part of the Planner, Supervisor or Maintenance Manager.

Bob Call, based in Charleston, SC, with Life Cycle Engineering (LCE), has more than 20 years of experience in the Maintenance and Reliability field. He has developed numerous training programs and has a solid background in the research, development and implementation of mechanical integrity programs, preventive maintenance programs and computerized maintenance management systems (CMMS) for a wide range of commercial/industrial facilities.



6:00 am
June 1, 2007

Solution Spotlight: World-Class Vibration Programs

Up until now, one of the greatest challenges in implementing world-class vibration programs has been the cost to support in-house personnel and efforts. Today, there’s a cost-effective alternative. According to Vibration Analysts Inc. (VAI), with its powerful new vibration data analysis process, companies now have access to an entire team of professional vibration specialists who can do the job for less than what a quality in-house program might cost. In fact, this supplier notes that it can quickly, economically and accurately analyze mass quantities of vibration data for any customer, anywhere. The only requirement is Internet connectivity.

VAI’s patented process ensures that five Professional Vibration Analysts will review all vibration data, on three levels of increasing experience, with multiple internal peer checks to ensure quality results. This service also guarantees that a Certified Level-III/Category-IV Vibration Analyst (of which there are fewer than 100 worldwide) will analyze all vibration issues. By processing all data through five Vibration Analysts on three levels of increasing experience, each Analyst is able to maximize the efficiency of his or her data analysis.


Comparing his company’s services to in-house vibration programs, president Ray Rhoe points out that VAI’s process typically can reduce vibration program operating costs by $50,000 annually. For example, VAI’s typical monthly charge to analyze vibration data for a nuclear power plant is $2000. This allows for up to 80 machine-train analyses per month. “Larger operations, with multiple sites,” Rhoe says, “could save substantially more.” In fact, the hardest part about saving clients money is just getting their attention.

Less waiting, wondering, worrying
While the VAI process clearly can save money, it also can cut down on many of the nagging questions that go along with managing vibration programs, including how quickly data is being analyzed. For example, since VAI offers the process on a global basis, and since data can be reviewed as soon as it is collected and downloaded, immediate 24/7/365 analysis of critical component issues is now available. In addition, many of the issues associated with in-house vibration program efforts are totally eliminated, including those of hiring and training qualified personnel, ongoing technical support and expert assistance, employee turnover, sick leave and vacation, among others.

VAI process overview

Step 1…
VAI’s Certified Category-IV Vibration Analyst starts the process by reviewing the entire database for all new customers. This ensures that all component issues have been identified. Corrective recommendations and technical support are then provided as needed to help resolve those issues.


Step 2…
Every month thereafter, two Category-II Analysts independently analyze all new vibration data. They look for step changes, increasing trends and new frequencies that may indicate degraded equipment conditions, then peer check each other to ensure quality results. Any issues are forwarded to the next higher-level Analyst for a more detailed evaluation.

Step 3…
Two Category-III Analysts independently analyze all new issues in greater detail. They determine the cause of the issues and provide recommended corrective actions, then peer check each other to ensure quality results. At that point, they forward all issues to the Category-IV Analyst for the final and highest level of review possible. They then go back and review all new data to ensure that nothing has been overlooked.

Step 4…
The Senior Category-IV Analyst reviews all issues to ensure that the analyses and corrective recommendations are complete and accurate. A determination is made regarding the need for additional data to support the analysis, and the customer is notified of any immediate concerns. When finished, the Senior Analyst also goes back and reviews all new data to ensure that nothing has been overlooked.

Step 5…
A detailed monthly report is issued. These reports identify component ID, point locations that were affected, frequency(s) of concern, screen print of the vibration data, problem description, analysis of what the problem is, and corrective recommendations.

Every month thereafter, VAI repeats Steps 2-5 to ensure that customers are receiving the most accurate vibration data analysis possible. This leads to improved equipment reliability and reduced program costs for the customer.
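To make the screening in Step 2 a little more concrete, here is a minimal sketch of flagging a step change or an increasing trend in overall vibration readings. It is not VAI's patented process. The `flag_reading` function, the 1.5x step threshold, the three-point trend window and the sample readings are all assumptions chosen for illustration. Detecting new frequencies would need spectral data and is left out.

```python
def flag_reading(history, latest, step_factor=1.5, trend_points=3):
    """Flag a new overall vibration reading against earlier readings.

    history: earlier readings, oldest first (hypothetical units, e.g. in/s).
    latest: the newest reading.
    step_factor and trend_points are illustrative assumptions.
    """
    flags = []

    # Step change: newest reading well above the average of earlier readings.
    baseline = sum(history) / len(history)
    if latest > step_factor * baseline:
        flags.append("step change")

    # Increasing trend: the last `trend_points` readings rise strictly.
    recent = (history + [latest])[-trend_points:]
    rising = all(a < b for a, b in zip(recent, recent[1:]))
    if len(recent) == trend_points and rising:
        flags.append("increasing trend")

    return flags


print(flag_reading([0.10, 0.11, 0.11, 0.10], 0.25))
print(flag_reading([0.10, 0.11, 0.12, 0.13], 0.14))
print(flag_reading([0.10, 0.11, 0.10, 0.11], 0.10))
```

The first call flags a step change, the second an increasing trend, and the third nothing. In the process described above, anything flagged at this stage would go on to a higher-level analyst rather than being treated as a diagnosis.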

Can it work for you?
To help prove that its patented process works, VAI now offers customers a free one-month demonstration of its services, with no obligation.

Vibration Analysts Inc
Byron, IL



6:00 am
June 1, 2007

All Shook Up

Designing a best-in-class vibration analysis program starts at the beginning.

It’s no secret that when machinery operates effectively and efficiently, it will run longer, more safely and without unscheduled downtime. That surely can improve the bottom line.

Vibration analysis can be a very effective part of a good predictive maintenance (PdM) program and an integral part of an overall condition-based maintenance (CbM) approach. The theory itself is pretty straightforward. An increase in vibration almost always accompanies deterioration in running conditions. Therefore, monitoring vibration levels can indicate the general condition of a machine.

Unfortunately, not all vibration analysis programs are as clear-cut. Many programs today aren’t as effective as they could be, due largely to lack of resources, time and formal program processes.

Is your CbM vibration program living up to its full potential? There are several things to consider when evaluating it. Do you struggle with:

  • allocating time and/or resources to run the program effectively?
  • turning raw data into actionable information?
  • implementing a “closed-loop” program process that ensures recommended actions are handled quickly while correctly fixing the diagnosed problem?
  • guaranteeing consistency and quality of data?
  • sharing equipment problems/diagnostics effectively throughout your facility or corporation?
  • assigning annual cost benefit dollars to the effectiveness of your program?
  • achieving (or even tracking) the stated goals set when approval was given for the program’s inception?

If you are struggling with any (or most) of these issues, you’re not alone. Many programs today simply survive by keeping their basic lifelines functioning, collecting mountains of data and performing basic single-pass analyses, resulting in quick and dirty maintenance recommendations. Little, if any, time is left to perform needed program functions such as tracking predictive work actions to establish repair effectiveness, establishing new baselines, reviewing alarm effectiveness and fine-tuning alarm parameters, database management, failure analysis, bad actors review, analysis technique review and sharing valuable information across sites, business units and corporations. The problem worsens significantly when an attempt is made to truly integrate all predictive technologies into a comprehensive program.

The most effective vibration analysis efforts ensure that the program doesn’t just stop after the monthly vibration rounds are complete, points in alarm have been identified and machines with high levels are passed to Maintenance Planning for action. Communication of the diagnosed problems through clear, concise, easy-to-navigate reporting tools and the ability to track the repair actions through completion are crucial to a program’s success. Evaluating the benefit of those actions also is critical. This ensures that a program truly turns the raw vibration data that is meaningless to most Operations and Maintenance personnel into useable, actionable recommendations.

The right design decisions
For many organizations, these problems are real—and seemingly insurmountable. The problem often begins with the initial program design. Critical design decisions might be overlooked or minimized. More time is typically spent trying to choose the best hardware and software platform and not on ensuring that the program design is appropriate to achieve the stated goals. Whether the goal is to lessen unplanned failures and downtime, minimize emergency or overtime work load, increase or improve availability or just reduce maintenance costs, the program needs to be effectively designed from the beginning in order to achieve stated goals and provide a return on investment. There are several critical decisions to make during the program design phase (before moving forward in purchasing equipment and toward implementation):

Program goals or expectations…
What should the program achieve to be deemed successful? What is the payback required? What is the timeframe? Are the metrics required to fully evaluate the cost benefits of the program available? Programs that are developed without a good foundation for tracking benefits or ROI usually become less effective over time and harder to justify from the standpoint of resources, costs and ongoing training requirements.

Critical equipment assessment…
You probably cannot—or do not want to—cover all assets. Determine what equipment is critical to achieving your goals and to maintaining continued operations. Use an 80/20 rule approach to make sure that you spend your time wisely and realize the maximum return on your investment.
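One simple way to apply an 80/20 approach is a Pareto ranking: sort assets by their contribution to some loss measure and keep the ones that together account for roughly 80% of it. The sketch below uses annual downtime hours as that measure, which is an assumption; safety or production impact could serve equally well. The asset names, the hours and the `critical_assets` helper are hypothetical.

```python
def critical_assets(downtime_hours, share=0.80):
    """Return the assets that together account for `share` of total
    downtime, largest contributors first."""
    total = sum(downtime_hours.values())
    ranked = sorted(downtime_hours.items(), key=lambda kv: kv[1], reverse=True)

    selected, running = [], 0.0
    for asset, hours in ranked:
        if running >= share * total:
            break  # the assets already selected cover the target share
        selected.append(asset)
        running += hours
    return selected


# Hypothetical annual downtime hours per asset.
downtime = {
    "Mixer": 120,
    "Hammer mill": 60,
    "Conveyor A": 10,
    "Feeder 2": 6,
    "Crusher": 4,
}

print(critical_assets(downtime))
```

With these sample figures, two of the five assets cover 180 of the 200 downtime hours, so they would be the first candidates for monitoring.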

Technology assessment…
Which predictive technologies (vibration, infrared, oil analysis, ultrasound, MCE, etc.) should be applied to the critical equipment list to ensure effective detection of impending problems or problem types?


Required frequency for collection…
How often does trend data need to be collected and stored for proper analysis and prediction of typical machine failure modes? Machine types whose failure modes typically develop quickly will require equally short data collection intervals, or even continuous monitoring. Failure modes that usually develop over a longer period can be handled effectively with manual collection at longer intervals.
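The reasoning can be sketched as a simple rule of thumb: aim for several readings within the time a failure mode typically takes to develop, and switch to continuous monitoring when that interval becomes too short for a manual route. The `collection_plan` function, the four-readings target and the seven-day minimum manual interval are assumptions for illustration, not figures from this article.

```python
def collection_plan(days_to_failure, readings_wanted=4, min_manual_interval_days=7):
    """Suggest a collection method and interval for one failure mode.

    days_to_failure: typical time for the failure mode to develop.
    readings_wanted: readings desired within that time (assumed).
    min_manual_interval_days: shortest practical manual route interval (assumed).
    """
    interval = days_to_failure / readings_wanted
    if interval < min_manual_interval_days:
        return ("continuous", None)
    return ("manual", interval)


print(collection_plan(120))  # slow-developing failure mode
print(collection_plan(10))   # fast-developing failure mode
```

With these assumptions, a failure mode that develops over about 120 days gets a manual route every 30 days, while one that develops in 10 days is pushed to continuous monitoring.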

Required program equipment…
Information will need to be gathered on the necessary program equipment to achieve the desired data at the desired frequency. This process will probably require working with multiple vendors for hardware and software to support their technologies.

Program quality and effectiveness…
Most programs are designed to focus solely at a site level. Consistent program traits and quality are very difficult to manage without the proper tools. There is tremendous benefit in establishing program templates for data collection, data analysis, reporting and deriving cost benefits. Additional benefits come from the ability to analyze equipment type trends over similar operations across multiple sites. These trends can prove OEM quality or flaws that can cause significant problems throughout your operations.

In-house vs. contract services…
Possibly the single most important decision is whether to run the program with existing resources or to outsource it to certified experts who can manage the program for you. This decision needs to take into account the availability of resources, the training requirements of those resources, the initial program investments (including all hardware and software), the ongoing training and program costs and the other inevitable distractions that will come when using in-house employees with many other job duties to run your program.

Depending on your situation, it might make the most sense to contract with experts whose core expertise is designing integrated programs to achieve program goals and whose primary offerings include services such as vibration analysis, infrared and oil analysis. Such arrangements can allow your operation to focus on its core maintenance, operational and production goals.

The right partner(s) should be able to assist with all of the previously discussed design decisions. They should be focused on providing integrated program services, as well as tools to track the program at the data level and analyze that data with true sortable, Web-based integrated reporting, repair tracking and real-time, customizable cost-benefit tracking. In short, the right partner(s) will have the right approach and the right technology for running a successful, corporate-wide integrated PdM program.

John Pucillo is manager, Vibration Services, with Predictive Service (PSC). Telephone: (216) 378-3500.



6:00 am
June 1, 2007

Uptime: Leadership, Reliability & High Gas Prices


Bob Williamson, Contributing Editor

As evidenced by our current gas pump crisis, insufficient maintenance, training, staffing levels, work methods, procedures and communications, combined with out-of-date drawings, aging assets, risk mitigation apathy and leadership focused on short-term, bottom-line results are a formula for disaster.

Most of us have heard about recent refinery problems, not to mention the pipeline failures in the oil and gas production side of the business. Fires, accidents and unplanned shutdowns happen all the time, in many different businesses around the world. It’s only when catastrophic events happen close to home and impact our pocketbooks that they seem to stay in the news. Let’s examine the role “leadership” has played in getting us to our current situation.

Corporate leadership sets the stage for equipment, process and facility performance in the areas of safety, environmental, quality and profitability. Why, then, does corporate leadership often seem to ignore the importance of Maintenance and Reliability?

A constrained process hiccups
Today’s oil and gas business, from the wellheads to the refinery terminals, is part of our aging infrastructure. No new refinery has been built in the past 30 years or so. Old facilities have been expanded and improved, but domestic refinery capacity simply cannot keep up with demand for refined petroleum products.

Drivers in the U.S. use about 36 million more gallons of gasoline per day than our domestic refineries produced in 2005, compared to 17 million gallons per day in 1995. This “gap gasoline” we use comes to us as imports. Being a global commodity, gasoline is sold to the highest bidder in a free market. The domestic supply of petroleum products decreases when major events disrupt the supply—Hurricane Katrina and the Texas City, TX refinery explosion in 2005; the 2007 fire at the Sunray, TX, McKee-Valero refinery; the April 2007 explosion at the Williams Energy refinery in Oklahoma; the May 2006 mechanical failure and fire at Valero’s St. Charles, LA, refinery. When consumer demand exceeds supply, prices climb. If prices did not climb, slowing the demand, we would quickly run out of fuel. (This actually happened in some areas after Hurricane Katrina shut down Gulf Coast refineries.) The U.S. typically has about a 20-day supply of fuel on hand in the system.

Supply and demand goes a bit deeper, too: down-hole. Crude oil from wells around the world is sold on the world market to the highest bidder, as is gasoline. When major events hint at disrupting this supply, or when actual events really do disrupt it, prices climb. Recent events in Prudhoe Bay, AK, are a case in point. There, two pipeline leaks in 2006, followed by a water line leak in May 2007, significantly hurt crude oil production, with a partial shutdown extending from August through October 2006. These types of events cut both supply and oil company revenues. For example, the May 2007 water line leak at the BP Prudhoe Bay field was reported to have stopped nearly 100,000 barrels of oil production per day. As a result, the price for a barrel of crude oil increased, both short term and long term. The associated revenue loss is hard to imagine: 3 days x 100,000 bbl/day x $65/bbl = $19,500,000 in lost revenue in three days! Because that $19.5 million in revenue was included in the budget for the year, what happens after the financial loss?
