Archive | 2007


7:39 pm
May 5, 2009

Part I… Building Cultures Of Reliability-In-Action

Development of effective decision-making skills and behaviors is the foundation of human reliability. This human element is crucial to your equipment and process reliability.


Process-oriented organizations drive value by improving their business processes and equipment performance. At the same time, however, approaches such as asset management, work process improvement, defect elimination and preventive maintenance, among others, can be powerful yet incomplete when used to sustain a competitive edge.

To implement and sustain high-performing, reliable cultures, managers need to be as rigorous about diagnosing, designing and implementing changes to the human decision-making process as they are with their business and equipment processes. Equipment and process reliability ultimately rest with human reliability. Thus, cultural change at its deepest level requires examining human reasoning and its resulting decisions.

Establishing a culture-of-reliability requires going beyond the traditional stew of copycat approaches and learning how to (1) use actionable tools to implement and sustain reliability improvements and bottom-line impact by (2) collecting cultural action data and (3) using that data to uncover hidden bottlenecks to performance.

In the quest for high performance, well-intentioned managers often launch cultural change efforts using what they believe to be applied methods, like employee surveys, team building, empowerment, leadership style, systems thinking, formal performance appraisal, 360° feedback, you name it, only to be disillusioned in the end by the fact that more change efforts fail than succeed. Although they may be well-accepted, traditional change methods are not precise enough to create and sustain cultures-of-reliability and typically evolve into the next flavor of the month.

The learning exercise
For the past 16 years I have been conducting a specific learning exercise related to cultural change. The purpose is to help participants understand why implementation is so hard. There are five objectives for the session:

  1. To discover the root causes of implementation barriers;
  2. To illustrate the interdependent relationship between learning and error;
  3. To determine how participants personally feel when they make mistakes;
  4. Based on their experience of error, to understand how humans design a culture-in-action to avoid errors and mistakes; and
  5. To determine the costs of error avoidance to business and human dignity.

To start, participants construct a definition of competitive learning which, at its root, is defined as the detection and correction of mistakes, errors, variance, etc., at ever-increasing rates of speed and precision—the heart of reliability. Through poignant illustrations, they learn that their organizations tend to focus on making fast decisions (“time is money”), timelines, milestones etc., but at a cost to precision, the quality of the decision.

Based on that definition, the participants are asked to reflect on a recent performance mistake they have made on the job or in life. The responses from hundreds of them (male and female, Fortune 500 executives, managers, supervisors, engineers, technicians and craftsmen) are very consistent. When they make an error, they feel shame, anger and frustration; they feel stupid, embarrassed and inadequate, with an impulse to hide the error and, at the same time, a desire to fix it. The result is an emotionally charged picture: a wish to fix mistakes coupled with an overwhelming urge to hide them for fear of blame.

As the exercise unfolds, participants gain insight into how learning and mistakes (trial and error) shape performance and how ineffective learning patterns persist for years. For example, individuals from process industries have revealed that they’ve known for years about less-than-effective outages and turnarounds; that “lessons-learned” sessions don’t successfully address operations and maintenance infighting and squabbles over what quality work means and the validity of data; that stalled work management initiatives or reprisals for management decisions are a fact of life; etc. The list goes on and on. Discovering why his division had not been able to penetrate a market for over 20 years, one vice president-level participant summed up the dilemma this way: “The costs [of ineffective learning] are so high, they are un-estimateable.”

Through collective reflection in a larger group, participants come to realize that they all experience learning in very similar ways. They also come to learn that their reasoning is very similar. They typically espouse that continuous learning is important and mistakes are OK, but, in the final analysis, mistakes are categorized as critical incidents on performance appraisals or simply seen as ineffectiveness.

When performance appraisal is tied to pay, rewards and promotion, participants indicate that they would be foolish if they “didn’t put the best spin” on things and save face at any cost. “I have a mortgage to pay” is how many respondents put it. At the same time, they acknowledge that learning does occur, but at a rate that leaves much to be desired. “It’s not all bad,” is how many participants put it. Yet this is not really a case of being bad. Rather, it is a case of sincere, hard-working people unknowingly designing a culture with a set of unintended outcomes.

At this point, participants begin to gain insight: they say one thing and do another. Moreover, they come to understand that it is easy to see defensive patterns in others, but not so easy to see them in themselves. Not surprisingly, being defensive is espoused as not OK; good team players, after all, should be open to feedback. Admitting that one is not open would mean admitting a mistake, the very essence of pain.

In the final phase of the learning exercise, participants come to recognize that they have a strong desire to learn and they seek noble goals, but that fears of retribution for telling the truth, blame, fear of letting someone down or fear of failure, whether in substance or perception, contribute to a sense of loss of control. Unfortunately, this situation violates the first commandment of management: BE IN CONTROL.

The need for control translates into a hidden performance bottleneck, given the complexity of job interdependencies and systemic error. As one individual noted, “I can’t control what I can’t control, but I am held accountable. Accountability translates into who to blame.” Participants acknowledge that they subtly side-step difficult issues and focus on the more routine, administrative issues, thereby reducing emotional pain and conflict in the short term. They acknowledge that they bypass the potential for higher performance by not reflecting on gaps in decision-making.

Ironically, as these decision bottlenecks limit performance, expectations for better performance increase, often resulting in unrealistic timelines and more stress. Executives complain they just don’t get enough change fast enough, and middle managers and individual contributors complain of “micro-management.” Sound familiar?

The end result is that sincere attempts to improve the status quo are slowly and co-creatively undermined, and inadequate budgets and unrealistic timeframes are set. Good soldiers publicly salute the goals, but privately resist because their years of experience have taught them to think in terms of “what’s the use of telling the truth as I see it; this, too, will pass.” Ultimately, many see the “other guy(s)” or group as the problem and wonder why we can’t “get them” in line. This is the heart of an organizational fad, something that often is labeled as a lack of accountability.

Based on participants’ data generated from this learning exercise and action data recorded and collected from the field (see Part III of this series for the data collection method), a culture-in-action model, similar to that shown in Fig. 1, is created and verified with illustrations. Participants consistently agree this type of model is accurate and reflects their own current cultures-in-action.

Underlying assumptions…
The culture-in-action model is rooted in human reasoning. Given the assumptions of avoiding mistakes and being in control to win and look competent in problem resolution, the reasoning path is clear. The behaviors make perfectly good sense.

When seeking solutions, multiple perspectives will proliferate on which solution is best, some with more risk, some with less. Think of it as inference stacking. A complex web of cause and effect, solutions and reasons why something will or will not work are precariously stacked one upon the other, up to a dizzying height.

Determining whose perspective is right is problematic (“Your guess is as good as mine”). Hence, controlling the agenda to reduce frustration either by withholding information (“Don’t even go there”) or aggressively manipulating people to submit or comply with someone else’s views to get things done is a logical conclusion based on the underlying assumptions.

It is not surprising that executives seek to control their organizations and focus on objectives, or that, when they do, middle managers privately feel out of control because they think they are not trusted to implement initiatives or handle day-to-day routines. This leads to the following managerial dilemma: If I voice my real issues, I will not be seen as a good team player. If I stay silent, I will have to pretend to live up to unrealistic expectations. Either way, I can’t win (a real double bind).

To overcome this dilemma, people verify and vent their emotions one-on-one, i.e. in hallways, restrooms and offices. This way, they avoid confronting the real issue of how they are impacted by others, which is difficult to discuss in a public forum (“Don’t want to make a career-threatening statement”). Instead, they seek third-party validation that their beliefs are the right ones to hold (“Hey, John, can you believe what just happened in that meeting? I don’t think that strategy is going to work; didn’t we try it 10 years ago?”). Even the best-performing teams demonstrate some of these performance-reducing characteristics. The culture becomes laden with attributions about others’ motivation, intent and effectiveness and it is labeled “politics.”

Routine problems often are uncovered, organizations do learn, but the deeper performance bottlenecks, hidden costs, sources of conflict and high-performance opportunities are missed because the focus is on putting the “best spin” on “opportunities for improvement” with a twist of language to avoid the “mistake” word. That’s because mistakes are bad and people don’t like to discuss them. Interestingly enough, there are even objections to using the word “error” during the process of the exercise. It is not surprising that when trying to learn and continuously improve a turnaround, business process or project, for example, people privately will conclude “Oh, boy, here we go again. Another wasted meeting debating the same old issues.” Negative attributions proliferate (“They don’t want to learn”) and underlying tension grows.

At this stage of the process, the pattern begins to repeat itself. As the project effort falls behind, expectations build. Typically, someone will be expected to “step up” and be the hero. After averted eyes and an uncomfortable silence, someone “steps up” and often gets rewarded. Yet this heroic reward doesn’t address root cause (i.e. what accounted for the errors and frustration in the first place). Side-stepping or avoiding the more difficult-to-discuss issues doesn’t help uncover root cause; rather, it leads to fewer errors being discovered. As a result, the business goal is pushed a little further out and economic vulnerability is increased.

If the market is robust, errors and mistakes may mean little to a business. The demand can be high if you have the right product, at the right time. As competition increases, however, or the market begins to falter, the ability to remain competitive and achieve what the organization has targeted is crucial. Competitive learning is the only weapon an organization has to maintain its edge in the marketplace.

Major culture-in-action features
In summary, the major features of a true culture-in-action are:

  • Avoidance of mistakes and errors at all cost;
  • Little active inquiry to test negative attributions;
  • Little personal reflection (i.e. “How am I a part of the problem?”);
  • Little discussion of personal performance standards by which we judge others; and
  • Little agreement on what valid data would look like.

As the exercise winds down, it’s not long before someone asks, “So how do you get out of this status quo loop?” When this question comes, because it always does, I turn it back to the group and ask how they would alter this cultural system. The reaction is always the same: silence and stares. No wonder. The answer is not intuitively obvious, even to the most seasoned of practitioners and theorists.

The short answer is that, rather than trying to “get” anyone anywhere, change has to be based on individual reflection and actionable tools driven through collaborative design and invitation. These actionable tools level the playing field, at all levels, by helping create informed choice through daily reflection on decision-making. Traditional intervention methods focus on changing behavior, learning your style or type, building a vision, etc. There are any number of approaches, all very powerful but incomplete without addressing the underlying reasoning (root cause) that is informing the behavior in the first place.

Coming next month
In Part II, a culture of reliability will be defined, as well as the role of reflection in organizational performance and the actionable tools of collaborative design. MT

Brian Becker is a senior project manager with Reliability Management Group (RMG), a Minneapolis-based consulting firm. With 27 years of business experience, he has been both a consultant and a manager. Becker holds a Harvard doctorate with a management focus. For more information, e-mail:



8:09 pm
April 29, 2009

Going Wireless: Wireless Technology Is Ready For Industrial Use

Wireless works in a plant, but you’ll want to be careful regarding which “flavor” you choose

Wireless technology now provides secure, reliable communication for remote field sites and applications where wires cannot be run for practical or economic reasons. For maintenance purposes, wireless can be used to acquire condition monitoring data from pumps and machines, effluent data from remote monitoring stations, or process data from an I/O system.

For example, a wireless system monitors a weather station and the flow of effluent leaving a chemical plant. The plant’s weather station is 1.5 miles from the main control room. It has a data logger that reads inputs from an anemometer to measure wind speed and direction, a temperature gauge and a humidity gauge. The data logger connects to a wireless remote radio frequency (RF) transmitter module, which broadcasts a 900MHz frequency hopping spread spectrum (FHSS) signal via a Yagi directional antenna installed at the top of a tall boom beside the weather station building. This link posed no problem.

However, the effluent monitoring station was thought to be impossible to connect via wireless. Although the distance from this monitoring station to the control room is only one-quarter mile, the RF signal had to pass through a four-story boiler building. Nevertheless, the application was tested before installation, and it worked perfectly. The lesson here is that wireless works in places where you might think it can’t. All you have to do is test it.

There are many flavors of wireless, and an understanding of them is needed to determine the best solution for any particular application. Wireless can be licensed or unlicensed, Ethernet or serial interface, narrowband or spread spectrum, secure or open protocol, Wi-Fi…the list goes on. This article provides an introduction to this powerful technology.

The radio spectrum
The range of approximately 9 kilohertz (kHz) to gigahertz (GHz) can be used to broadcast wireless communications. Frequencies higher than these are part of the infrared spectrum, light spectrum, X-rays, etc. Since the RF spectrum is a limited resource used by television, radio, cellular telephones and other wireless devices, the spectrum is allocated by government agencies that regulate what portion of the spectrum may be used for specific types of communication or broadcast.

In the United States, the Federal Communications Commission (FCC) governs the allocation of frequencies to non-government users. The FCC has set aside the Industrial, Scientific, and Medical (ISM) bands at 902-928MHz, 2400-2483.5MHz and 5725-5875MHz, subject to limitations on signal strength, power, and other radio transmission parameters. These bands are known as unlicensed bands and can be used freely within FCC guidelines. Other bands in the spectrum can be used with the grant of a license from the FCC. (Editor’s Note: For a quick definition of the various bands in the RF spectrum, as well as their uses, log on to: http://encyclopedia.thefreedictionary.com/radio+frequency )

Licensed or unlicensed
A license granted by the FCC is needed to operate on a licensed frequency. Ideally, these frequencies are interference-free, and legal recourse is available if there is interference. The drawbacks are a complicated and lengthy procedure for obtaining a license; the inability to purchase off-the-shelf radios, since they must be manufactured for the licensed frequency; and, of course, the cost of obtaining and maintaining the license.


License-free operation means using one of the frequency bands the FCC has set aside for open use, without registration or authorization. Depending on where the system will be located, there are limits on the maximum transmission power. For example, in the U.S. 900MHz band, the maximum may be stated as 1 Watt of transmitter output or 4 Watts EIRP (Effective Isotropic Radiated Power).
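The two figures are related by simple decibel arithmetic. The sketch below assumes that EIRP is transmitter output plus antenna gain minus cable loss, and it assumes a 6 dBi antenna; under those assumptions a 1 Watt radio comes out at roughly 4 Watts EIRP. It is an illustration only. The FCC rules themselves govern the antenna-gain and power limits for a given installation.

```python
import math

def watts_to_dbm(p_watts: float) -> float:
    """Convert power in watts to dBm (decibels relative to 1 milliwatt)."""
    return 10.0 * math.log10(p_watts * 1000.0)

def dbm_to_watts(p_dbm: float) -> float:
    """Convert power in dBm back to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0

def eirp_dbm(tx_power_watts: float, antenna_gain_dbi: float,
             cable_loss_db: float = 0.0) -> float:
    """EIRP = transmitter output (dBm) + antenna gain (dBi) - feedline loss (dB)."""
    return watts_to_dbm(tx_power_watts) + antenna_gain_dbi - cable_loss_db

# A 1 W radio feeding an assumed 6 dBi antenna, with no cable loss.
eirp = eirp_dbm(1.0, 6.0)
print(f"{eirp:.1f} dBm = {dbm_to_watts(eirp):.2f} W EIRP")
```

Each 3 dB of antenna gain roughly doubles the EIRP, which is why the antenna matters as much as the radio when checking a link against the limits.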

The advantages of using unlicensed frequencies are clear: no cost, time or hassle in obtaining licenses; many manufacturers and suppliers serving this market; and lower startup costs, because no license is needed. The drawback is that, because these bands are unlicensed, they can be “crowded,” which may lead to interference and lost transmissions. That’s where spread spectrum comes in: spread spectrum radios deal with interference very effectively and perform well, even in the presence of RF noise.

Spread spectrum systems
Spread spectrum is a method of spreading the RF signal across a wide band of frequencies at low power, rather than concentrating the power in a single frequency as is done in narrowband transmission. Narrowband refers to a signal that occupies only a small section of the RF spectrum, whereas a wideband or broadband signal occupies a larger section. The two most common forms of spread spectrum radio are frequency hopping spread spectrum (FHSS) and direct sequence spread spectrum (DSSS). Most unlicensed radios on the market are spread spectrum.

As the name implies, frequency hopping changes the frequency of the transmission at regular intervals. The advantage is clear: because the transmitter changes its broadcast frequency so often, only a receiver programmed with the same algorithm can follow the message. The receiver must be set to the same pseudo-random hopping pattern and listen for the sender’s message at precisely the correct time on the correct frequency. Fig. 1 shows how the frequency of the signal changes with time. Each frequency hop is equal in power and dwell time (the length of time spent on one channel). Fig. 2 shows a two-dimensional representation of frequency hopping, in which the frequency of the radio changes in each time period according to a pseudo-random sequence.
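The shared-pattern idea can be sketched in a few lines. Here Python’s `random.Random` stands in for whatever pseudo-random generator a real radio uses (an assumption; vendor hopping algorithms are proprietary), and the channel count, dwell time and seeds are hypothetical. Radios that share a seed reproduce the same sequence; a listener with a different seed lands on different channels. A real receiver must also synchronize its clock to the transmitter, which this sketch skips.

```python
import random

def hop_pattern(seed: int, num_channels: int, length: int) -> list[int]:
    """Generate a pseudo-random hop pattern from a shared seed.

    Each pass through the band visits every channel once, in an order
    shuffled by a PRNG seeded with `seed`.
    """
    rng = random.Random(seed)
    pattern: list[int] = []
    while len(pattern) < length:
        channels = list(range(num_channels))
        rng.shuffle(channels)
        pattern.extend(channels)
    return pattern[:length]

NUM_CHANNELS = 50  # hypothetical channel count
DWELL_MS = 100     # hypothetical dwell time per hop

transmitter = hop_pattern(seed=1234, num_channels=NUM_CHANNELS, length=20)
receiver = hop_pattern(seed=1234, num_channels=NUM_CHANNELS, length=20)
listener = hop_pattern(seed=9999, num_channels=NUM_CHANNELS, length=20)

for slot, (tx, rx, ev) in enumerate(zip(transmitter, receiver, listener)):
    print(f"t={slot * DWELL_MS:5d} ms  tx={tx:2d}  rx={rx:2d}  listener={ev:2d}")
```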


DSSS combines the data signal with a higher-rate bit sequence, known as a ‘chipping code,’ thereby “spreading” the signal over a greater bandwidth. In other words, the signal is multiplied by a noise-like signal generated from a pseudo-random sequence of 1 and -1 values. The receiver then multiplies the received signal by the same sequence to recover the original message (since 1 x 1 = 1 and -1 x -1 = 1).
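The multiply-then-multiply-again idea can be sketched with bits represented as +1 and -1. The chipping code below is the 11-chip Barker sequence. It was chosen only because it is short and well known (802.11b uses it at its lowest rates); the article does not specify any particular code. A real radio does this in hardware on a modulated carrier, not on Python lists.

```python
# 11-chip Barker sequence, used here only as an example chipping code.
CHIP_CODE = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]

def spread(bits: list[int], code: list[int]) -> list[int]:
    """Multiply each data bit (+1 or -1) by every chip of the code."""
    return [b * c for b in bits for c in code]

def despread(chips: list[float], code: list[int]) -> list[int]:
    """Multiply by the same code, sum each bit's chips, and take the sign."""
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        correlation = sum(x * c for x, c in zip(chips[i:i + n], code))
        bits.append(1 if correlation > 0 else -1)
    return bits

data = [+1, -1, -1, +1]
tx = spread(data, CHIP_CODE)
print(len(tx))                  # 44 chips carry 4 bits
print(despread(tx, CHIP_CODE))  # [1, -1, -1, 1]
```

Because each chip squared is 1, correlating a clean bit against the code gives 11 times the bit, so the sign recovers it exactly.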

When the signal is “spread,” the transmission power of the original narrowband signal is distributed over the wider bandwidth, decreasing the power at any one frequency (also referred to as low power density). Fig. 3 shows the signal over a narrow part of the RF spectrum. In Fig. 4, that signal has been spread over a larger part of the spectrum, keeping the overall energy the same but decreasing the energy per frequency. Because spreading reduces the power in any one part of the spectrum, the signal can appear as noise. The receiver must recognize this signal and demodulate it, removing the chipping code to recover the original signal.

FHSS and DSSS both have their place in industry, and either can be the “better” technology depending on the application. Rather than debating which is better, it is more important to understand the differences and then select the best fit for the application. In general, a decision involves:

  • Throughput
  • Collocation
  • Interference
  • Distance
  • Security

Throughput is the average amount of data communicated in the system every second, and it is probably the first decision factor in most cases. DSSS has a much higher throughput than FHSS because it uses its bandwidth more efficiently and occupies a much larger section of the band for each transmission. In most industrial remote I/O applications, however, the throughput of FHSS is not a problem.

As the size of the network or the data rate increases, throughput may become a greater consideration. Most FHSS Ethernet radios offer a throughput of 50-115 kbps; most DSSS radios offer 1-10 Mbps. Although DSSS radios have a higher throughput than FHSS radios, one would be hard pressed to find any DSSS radios that serve the security and distance needs of the industrial process control and SCADA market. Unlike FHSS radios, which operate over 26MHz of the spectrum in the 900MHz band (902-928MHz), and DSSS radios, which operate over 22MHz of the 2.4GHz band, licensed narrowband radios are limited to 12.5kHz of the spectrum. Naturally, as the width of the spectrum is limited, the bandwidth and throughput are limited as well. Most licensed narrowband radios offer a throughput of 6400 to 19200 bps.
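To put these throughput figures in perspective, the arithmetic below estimates how long a block of data takes to move at rates taken from the ranges quoted above: the top of the narrowband and FHSS ranges and the bottom of the DSSS range. It ignores protocol overhead, retries and latency, so real transfers will take longer. The 10 kB payload is a hypothetical example.

```python
def transfer_time_s(payload_bytes: int, throughput_bps: float) -> float:
    """Seconds to move a payload at a given net throughput (no protocol overhead)."""
    return payload_bytes * 8 / throughput_bps

# Representative rates from the ranges quoted in the text.
RADIOS = {
    "licensed narrowband": 19_200,   # top of 6400-19200 bps
    "FHSS Ethernet": 115_000,        # top of 50-115 kbps
    "DSSS": 1_000_000,               # bottom of 1-10 Mbps
}

PAYLOAD = 10_000  # hypothetical 10 kB block of logged data
for name, bps in RADIOS.items():
    print(f"{name:20s} {transfer_time_s(PAYLOAD, bps):7.3f} s")
```

For a remote I/O poll of a few dozen bytes, all three finish in well under a second, which is why FHSS throughput rarely limits such applications.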

Collocation refers to having multiple independent RF systems located in the same vicinity. DSSS does not allow a large number of radio networks to operate in close proximity, because they spread their signals across the same range of frequencies. For example, within the 2.4GHz ISM band, DSSS allows only three collocated channels. Each DSSS transmission is spread over 22MHz of the spectrum, which allows only three sets of radios to operate without overlapping frequencies.
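The “only three” figure follows from slicing the band into 22MHz-wide pieces. The sketch below assumes the 802.11 channel plan used in the U.S. (channels 1 through 11, centered 5MHz apart starting at 2412MHz). It greedily picks channels whose 22MHz footprints do not overlap. It is a simplification that ignores the spectral skirts real radios leak beyond 22MHz.

```python
# Assumed U.S. 2.4GHz channel plan: channels 1-11, centers 5MHz apart.
CHANNEL_WIDTH_MHZ = 22

def center_mhz(channel: int) -> int:
    """Center frequency of an 802.11 2.4GHz channel (channels 1-13)."""
    return 2412 + 5 * (channel - 1)

def non_overlapping(channels: range, width_mhz: float) -> list[int]:
    """Greedily pick channels whose occupied bands do not overlap."""
    picked: list[int] = []
    upper_edge = float("-inf")
    for ch in channels:
        lower = center_mhz(ch) - width_mhz / 2
        if lower >= upper_edge:
            picked.append(ch)
            upper_edge = center_mhz(ch) + width_mhz / 2
    return picked

print(non_overlapping(range(1, 12), CHANNEL_WIDTH_MHZ))  # [1, 6, 11]
```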

FHSS, on the other hand, allows multiple networks to use the same band because they follow different hopping patterns. Hopping patterns that use different frequencies at different times over the same bandwidth are called orthogonal patterns. FHSS uses orthogonal hopping routines so that multiple radio networks can operate in the same vicinity without interfering with each other. That is a huge plus when designing large networks that must keep one communication network separate from another. Many lab studies show that up to 15 FHSS networks may be collocated, whereas only 3 DSSS networks may be. Narrowband radios sharing a channel cannot be collocated, since they all operate on the same 12.5kHz of spectrum.
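One simple way to build orthogonal patterns is to give every network the same pseudo-random permutation of channels, each offset by a different number of time slots. Because the base pattern never repeats a channel within one pass, two networks with different offsets can never occupy the same channel at the same time. This is an idealized sketch. It assumes all networks share a common clock, which collocated industrial radios generally do not, and it is not how any particular vendor assigns patterns. The channel count and seed are hypothetical.

```python
import random

NUM_CHANNELS = 50  # hypothetical number of hop channels

# One shared base pattern: a pseudo-random permutation of the channels.
rng = random.Random(42)
BASE = list(range(NUM_CHANNELS))
rng.shuffle(BASE)

def network_channel(network: int, slot: int) -> int:
    """Channel used by `network` in time slot `slot`: the base pattern,
    shifted by a different offset for each network."""
    return BASE[(slot + network) % NUM_CHANNELS]

def collisions(networks: list[int], slots: int) -> int:
    """Count time slots where two or more networks land on the same channel."""
    count = 0
    for t in range(slots):
        used = [network_channel(n, t) for n in networks]
        if len(set(used)) < len(used):
            count += 1
    return count

print(collisions(list(range(15)), slots=1000))  # 0
```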

Interference is RF noise in the vicinity and in the same part of the RF spectrum. Combining the two signals can generate a new RF wave or cause losses or cancellation in the intended signal. Spread spectrum in general is known to tolerate interference very well, although the different flavors handle it differently. When a DSSS receiver encounters narrowband interference, it multiplies the received signal by the chipping code to retrieve the original message. This causes the original signal to reappear as a strong narrowband signal, while the interference is spread into a low-power wideband signal that appears as noise and can be ignored.

In essence, the very thing that makes DSSS radios spread the signal to below the noise floor is the same thing that allows DSSS radios to ignore narrowband interference when demodulating a signal. Therefore, DSSS is known to tolerate interference very well, but it is prone to fail when the interference is at a higher total transmission power, and the demodulation effect does not drop the interfering signal below the power level of the original signal.
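Both halves of this behavior show up in a toy baseband model. The interferer is assumed to be an unmodulated carrier sitting exactly at the DSSS center frequency, which in baseband adds the same constant to every chip. The 11-chip Barker sequence is again only an example code. Its chips sum to +1, so despreading reduces the interferer to a contribution of its amplitude per bit, while the wanted signal contributes 11. A moderate carrier is rejected; a strong one flips bits.

```python
# Example 11-chip code (the Barker sequence); its chips sum to +1.
CHIP_CODE = [+1, -1, +1, +1, -1, +1, +1, +1, -1, -1, -1]

def spread(bits: list[int], code: list[int]) -> list[int]:
    """Multiply each +1/-1 data bit by every chip of the code."""
    return [b * c for b in bits for c in code]

def despread(chips: list[float], code: list[int]) -> list[int]:
    """Correlate each group of chips with the code and take the sign."""
    n = len(code)
    return [
        1 if sum(x * c for x, c in zip(chips[i:i + n], code)) > 0 else -1
        for i in range(0, len(chips), n)
    ]

def add_carrier(chips: list[int], amplitude: float) -> list[float]:
    """Model an unmodulated carrier at the center frequency: in baseband
    it adds the same constant to every chip."""
    return [x + amplitude for x in chips]

data = [+1, -1, +1, -1]
tx = spread(data, CHIP_CODE)

# Each bit's correlation is 11*bit + amplitude*sum(CHIP_CODE) = 11*bit + amplitude,
# so the -1 bits survive only while the interferer amplitude stays below 11.
for amplitude in (5.0, 15.0):
    rx = despread(add_carrier(tx, amplitude), CHIP_CODE)
    print(f"interferer amplitude {amplitude:4.1f}: recovered = {rx == data}")
```

In this model a carrier five times the chip amplitude (25 times the power, about 14 dB) is shrugged off. At fifteen times the amplitude, the despreading gain is no longer enough to drop it below the wanted signal, and the -1 bits flip.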

Because FHSS operates over 83.5MHz of the spectrum in the 2.4GHz band, producing high-power signals at particular frequencies (equivalent to many short, synchronized bursts of narrowband signal), it avoids interference as long as it is not on the same frequency as the narrowband interferer. Narrowband interference will, at most, block a few hops, which the system can compensate for by moving the message to a different frequency. Also, FCC rules require a minimum frequency separation between consecutive hops, so the chance of a narrowband signal interfering with consecutive hops is minimized.
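The benefit of a minimum hop separation can be checked with a small simulation. The 75-channel count echoes the 2.4GHz figure cited in this article. The 6-channel separation, the seed and the interferer position are hypothetical, not FCC values, and the interferer is assumed to be stationary and no wider than the separation. Under those assumptions it can block individual hops but never two in a row, because any two channels inside its window are closer together than the minimum separation.

```python
import random

def hop_sequence(seed: int, num_channels: int, min_sep: int, length: int) -> list[int]:
    """Pseudo-random hops where consecutive channels differ by >= min_sep."""
    rng = random.Random(seed)
    hops = [rng.randrange(num_channels)]
    while len(hops) < length:
        prev = hops[-1]
        allowed = [ch for ch in range(num_channels) if abs(ch - prev) >= min_sep]
        hops.append(rng.choice(allowed))
    return hops

def blocked(hops: list[int], first: int, width: int) -> list[bool]:
    """Mark hops that fall inside an interferer covering `width` adjacent channels."""
    return [first <= ch < first + width for ch in hops]

NUM_CHANNELS, MIN_SEP = 75, 6  # separation is a hypothetical value
hops = hop_sequence(seed=7, num_channels=NUM_CHANNELS, min_sep=MIN_SEP, length=10_000)
hit = blocked(hops, first=30, width=MIN_SEP)  # interferer no wider than MIN_SEP

print(f"{sum(hit) / len(hit):.1%} of hops blocked")
print("two consecutive hops blocked:", any(a and b for a, b in zip(hit, hit[1:])))
```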

When it comes to wideband interference, DSSS is not so robust. Since DSSS spreads its signal over 22MHz of the spectrum all at once at a much lower power, if that 22MHz of the spectrum is blocked by noise or a higher-power signal, it can block 100% of the DSSS transmission, whereas it will block only about 25% (22MHz of 83.5MHz) of the FHSS transmission. In this scenario, FHSS will lose some efficiency, but the transmission will not be a total loss.

In licensed radios the bandwidth is narrow, so a slight interference in the range can completely jam transmission. In this case, highly directional antennas and band pass filters may be used to allow for uninterrupted communication, or legal action may be pursued against the interferer.

802.11 radios are more prone to interference since there are so many readily available devices in this band. Ever notice how your microwave oven interferes with your cordless phone at home? They both operate in the 2.4GHz range, as do many 802.11 devices. Security becomes a greater concern with these radios.

If the intended receiver of a transmitter is located closer to other transmitters and farther from its own partner, it is known as a Near/Far problem. The nearby transmitters can potentially drown the receiver in foreign signals with high power levels. Most DSSS systems would fail completely in this scenario. The same scenario in a FHSS system would cause some hops to be blocked but would maintain the integrity of the system. In a licensed radio system, it would depend on the frequency of the foreign signals. If they were on the same or close frequency, it would drown the intended signal, but there would be recourse for action against the offender unless they have a license as well.

Distance is closely related to link connectivity: the strength of an RF link between a transmitter and a receiver, and the distance over which they can maintain a robust link. Given the same power level and the same modulation technique, a 900MHz radio will have higher link connectivity than a 2.4GHz radio. As frequency increases, transmission distance decreases if all other factors remain the same. The ability to penetrate walls and objects also decreases as frequency increases. Higher frequencies in the spectrum tend to display reflective properties; for example, a 2.4GHz RF wave can bounce off reflective walls of buildings and tunnels. Depending on the application, this can be used as an advantage to take the signal farther, or it may be a disadvantage, causing multipath, or no path, because the signal is bouncing back.
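One standard way to quantify the frequency penalty is the free-space path loss formula, FSPL(dB) = 20 log10(d_km) + 20 log10(f_MHz) + 32.44. The sketch compares two assumed center frequencies (915MHz and 2440MHz) over the same distance. Free space is an idealization that ignores the walls, reflections and multipath described above, so real plant links will differ. It does show the baseline gap between the two bands.

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (isotropic antennas, clear line of sight)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

d_km = 1.6  # roughly one mile
for f in (915.0, 2440.0):
    print(f"{f:6.0f} MHz: {fspl_db(d_km, f):.1f} dB")

# Extra free-space loss at 2.4GHz relative to 900MHz, independent of distance.
print(f"2.4GHz penalty: {fspl_db(d_km, 2440.0) - fspl_db(d_km, 915.0):.1f} dB")
```

Because the distance term cancels, the penalty is the same at any range, about 8.5 dB between these two frequencies.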

The FCC limits the output power of spread spectrum radios. DSSS consistently transmits at low power, as discussed above, and stays within FCC regulations by doing so. This limits the transmission distance of DSSS radios, which may be a limitation for many industrial applications. FHSS radios, on the other hand, transmit at high power on particular frequencies within the hopping sequence, but the average power across the spectrum is low, so they can still meet the regulations. Since the actual signal is transmitted at a much higher power than a DSSS signal, it can travel farther. Most FHSS radios are capable of transmitting over 15 miles, and longer distances with higher gain antennas.
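The effect of higher gain antennas can be sketched with a free-space link budget. Add up transmit power and antenna gains, subtract the receiver sensitivity and a fade margin, then solve the free-space path loss formula for distance. Every number in the example (30 dBm output, -95 dBm sensitivity, 20 dB fade margin, 915MHz, 6 dBi antennas) is a hypothetical placeholder, not a rating for any product. Terrain, obstructions and earth curvature will shorten real links.

```python
import math

def max_range_km(tx_dbm: float, tx_gain_dbi: float, rx_gain_dbi: float,
                 rx_sensitivity_dbm: float, freq_mhz: float,
                 fade_margin_db: float = 0.0) -> float:
    """Largest free-space distance at which received power still meets the
    receiver sensitivity after the requested fade margin."""
    allowed_loss_db = (tx_dbm + tx_gain_dbi + rx_gain_dbi
                       - rx_sensitivity_dbm - fade_margin_db)
    # Invert FSPL = 20*log10(d_km) + 20*log10(f_MHz) + 32.44 for d_km.
    return 10 ** ((allowed_loss_db - 20 * math.log10(freq_mhz) - 32.44) / 20)

# Hypothetical 900MHz link with isotropic antennas, then 6 dBi at each end.
base = max_range_km(30, 0, 0, -95, 915, fade_margin_db=20)
gain = max_range_km(30, 6, 6, -95, 915, fade_margin_db=20)
print(f"isotropic: {base:.1f} km, 6 dBi each end: {gain:.1f} km")
```

In free space, every 6 dB added to the link budget roughly doubles the range, so 6 dBi at each end roughly quadruples it.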

802.11 radios, although available in both DSSS and FHSS versions, have a high bandwidth and data rate, up to 54Mbps (at the time of this publication). It is important to note, however, that this throughput applies only over very short distances and degrades very quickly as the distance between the radio modems increases. For example, a distance of 300 feet would drop the 54Mbps rate down to 2Mbps. This makes these radios ideal for small office or home applications, but not for many industrial applications that need to transmit data over several miles.

Because narrowband radios tend to operate at lower frequencies, they are a good choice where FHSS radios cannot provide adequate distance. Licensed narrowband radios are appropriate when a lower frequency is needed either to travel a greater distance or to follow the curvature of the earth more closely, providing link connectivity in areas where line of sight is hard to achieve.

Since DSSS signals run at such low power, they are difficult for intruders to detect. One strong feature of DSSS is its ability to decrease the energy in the signal at any one frequency by spreading the energy of the original narrowband signal over a larger bandwidth, thereby decreasing the power spectral density. In essence, this can bring the signal level below the noise floor, making the signal “invisible” to would-be intruders. On the same note, however, if the chipping code is known or is very short, then it is much easier to detect the DSSS transmission and retrieve the signal, since it has a limited number of carrier frequencies. Many DSSS systems offer encryption as a security feature, although this increases the cost of the system and lowers performance because of the processing power and transmission overhead needed to encode the message.
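The drop in power spectral density is easy to quantify. Spreading the same total power over N times the bandwidth lowers the density at any one frequency by 10 log10(N) dB. The sketch assumes the power is spread evenly across the band and uses a hypothetical 100 mW signal. Whether the result actually falls below the noise floor depends on the receiver’s noise figure and the surrounding RF environment, neither of which is modeled here.

```python
import math

def psd_dbm_per_mhz(power_watts: float, bandwidth_mhz: float) -> float:
    """Average power spectral density, assuming power is spread evenly."""
    return 10 * math.log10(power_watts * 1000 / bandwidth_mhz)

power_w = 0.1  # hypothetical 100 mW transmitter
narrow = psd_dbm_per_mhz(power_w, 1.0)   # signal confined to 1 MHz
spread = psd_dbm_per_mhz(power_w, 22.0)  # same power spread over 22 MHz
print(f"narrow: {narrow:.1f} dBm/MHz, spread: {spread:.1f} dBm/MHz")
print(f"reduction: {narrow - spread:.1f} dB")  # 10*log10(22), about 13.4 dB
```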

For an intruder to successfully tune into a FHSS system, he needs to know the frequencies used, the hopping sequence, the dwell time and any included encryption. Given that for the 2.4GHz band the maximum dwell time is 400ms over 75 channels, it is almost impossible to detect and follow a FHSS signal if the receiver is not configured with the same hopping sequence, etc. In addition, most FHSS systems today come with high security features such as dynamic key encryption and CRC error bit checking.

Today, Wireless Local Area Networks (WLANs) are becoming increasingly popular. Many of these networks use the 802.11 standard, an open protocol developed by the IEEE. Wi-Fi is a standard logo used by the Wireless Ethernet Compatibility Alliance (WECA) to certify 802.11 products. Although industrial FHSS radios tend not to be Wi-Fi devices, and therefore are not compatible with these WLANs, there may still be a good chance of interference because they operate in the same band. Since most Wi-Fi products operate in the 2.4 or 5GHz bands, it may be a good idea to stick with a 900MHz radio in industrial applications, if the governing body allows this range (Europe allows only 2.4GHz, not 900MHz). This also provides an added security measure against RF sniffers (a tool used by hackers) in the more popular 2.4GHz band.

Security is one of the top issues discussed in the wireless technology sector. Recent articles about “drive-by hackers” have left present and potential consumers of wireless technology wary of possible infiltrations. Consumers must understand that 802.11 standards are open standards and can be easier to hack than many of the industrial proprietary radio systems.

The confusion about security stems from a lack of understanding of the different types of wireless technology. Today, Wi-Fi (802.11a, b, and g) seems to be the technology of choice for many applications in the IT world, homes and small offices. 802.11 is an open standard to which vendors, customers and hackers alike have access. While many of these systems can use encryption such as AES and WEP, many users forget or neglect to enable these safeguards, which would make their systems more secure. Moreover, features like MAC filtering can also be used to prevent unauthorized access by intruders on the network. Nonetheless, many industrial end users are very wary about sending industrial control information over standards that are totally “open.”

So, how do users of wireless technology protect themselves from infiltrators? One almost certain way is to use non-802.11 devices that employ proprietary protocols to protect networks from intruders. Frequency hopping spread spectrum radios have an inherent security feature built into them. First, only the radios on the network that are programmed with the “hop pattern” algorithm can see the data. Second, the proprietary, non-standard encryption method of the closed radio system will further prevent any intruder from being able to decipher that data.

The idea that a licensed frequency network is more secure may be misleading. As long as the frequency is known, anyone can dial into the frequency, and as long as they can hack into the password and encryption, they are in. The added security benefits that were available in spread spectrum are gone since licensed frequencies operate in narrowband. Frequency hopping spread spectrum is by far the safest, most secure form of wireless technology available today.

Mesh radio networks
Mesh radio is based on the concept of every radio in a network having peer-to-peer capability. Mesh networking is becoming popular since its communication path can be quite dynamic. Like the Internet, mesh nodes make and monitor multiple paths to the same destination to ensure that there is always a backup communication path for the data packets.
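The backup-path idea can be shown with a small graph search. This is a minimal sketch under stated assumptions and not any vendor’s mesh routing protocol. The plant layout and node names are hypothetical, and links are treated as symmetric. A breadth-first search finds a hop path from a field radio to the control room, and marking a radio as failed forces the search onto the remaining path. When no path remains, the data has nowhere to go.

```python
from collections import deque


def find_path(links, src, dst, down=frozenset()):
    """Breadth-first search for a hop path from src to dst that avoids failed nodes."""
    if src in down or dst in down:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back through the predecessors to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in previous and neighbor not in down:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical plant mesh: each radio lists the peers it can hear
mesh = {
    "tank": ["A", "B"],
    "A": ["tank", "C"],
    "B": ["tank", "C"],
    "C": ["A", "B", "control"],
    "control": ["C"],
}

print(find_path(mesh, "tank", "control"))              # primary path through A
print(find_path(mesh, "tank", "control", down={"A"}))  # reroutes through B
print(find_path(mesh, "tank", "control", down={"C"}))  # no path left
```

The last case shows why mesh layouts avoid single choke points: every route here passes through radio C.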

There are many concerns that developers of mesh technology are still trying to address, such as latency and throughput. The concept of mesh is not new. The internet and phone service are excellent mesh networks based in a wired world. Each node can initiate communication with another node and exchange information.

In conclusion, the choice of radio technology should be based on the needs of the application. For most industrial process control applications, proprietary-protocol, license-free frequency hopping spread spectrum radios (Fig. 5) are the best choice because of their lower cost and higher security capabilities in comparison to licensed radios. When distances are too great for a strong link between FHSS radios, even with repeaters, licensed narrowband radios should be considered for better link connectivity. The cost of licensing may be offset by the savings from not having to install extra repeaters in an FHSS system.

As more and more industrial applications require greater throughput, networks employing DSSS that allow TCP/IP and other open Ethernet packets to pass at higher data rates will be implemented. This is a very good solution where PLCs (Programmable Logic Controllers), DCS (Distributed Control Systems) and PCS (Process Control Systems) need to share large amounts of data with one another or with upper-level systems like MES (Manufacturing Execution Systems) and ERP (Enterprise Resource Planning) systems.

When considering a wireless installation, check with a company offering site surveys that allow you to install radios at remote locations to test connectivity and throughput capability. Often this is the only way to ensure that the proposed network architecture will satisfy your application requirements. These demo radios also let you look at the noise floor of the plant area, signal strength, packet success rate and the ability to identify if there are any segments of the license-free band that are currently too crowded for effective communication throughput. If this is the case, then hop patterns can be programmed that jump around that noisy area instead of through it. MT
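Programming a hop pattern around a noisy segment can be sketched as two steps: drop the channels whose surveyed noise floor is too high, then build the pattern from what remains. The sketch below is hypothetical throughout. It uses a 10-channel band for readability, invented survey readings, an assumed -90 dBm noise limit and illustrative helper names. A real radio would also have to keep whatever minimum channel count the governing regulations require.

```python
import random


def usable_channels(noise_floor_dbm: dict[int, float], limit_dbm: float) -> list[int]:
    """Keep only channels whose surveyed noise floor is below the chosen limit."""
    return [ch for ch, noise in sorted(noise_floor_dbm.items()) if noise < limit_dbm]


def hop_pattern(channels: list[int], shared_seed: int) -> list[int]:
    """Hypothetical hop pattern built only from the usable channels."""
    pattern = list(channels)
    random.Random(shared_seed).shuffle(pattern)
    return pattern


# Hypothetical site-survey result: a nearby WLAN raises the noise on channels 3-5
survey = {ch: -100.0 for ch in range(10)}
survey.update({3: -72.0, 4: -70.0, 5: -74.0})

clean = usable_channels(survey, limit_dbm=-90.0)
pattern = hop_pattern(clean, shared_seed=42)
print("usable channels:", clean)
print("hop pattern:", pattern)
```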

Gary Mathur is an applications engineer with Moore Industries-International, in North Hills, CA. He holds Bachelor’s and Master’s degrees in Electronics Engineering from Agra University, and worked for 12 years with Emerson Process Management before joining Moore. For more information on the products referenced in this article, telephone: (818) 894-7111; e-mail:



6:00 am
December 1, 2007

Maintenance Quarterly: Do You Really Know Where Your Machines Are?

Becoming a “Reliable Plant” and staying there requires keeping abreast of constantly changing and improving technologies and practices.

In today’s leaner maintenance departments, companies rely heavily on the reliability of their machinery. While the practice of reliability engineering has been around for many years, it has never received as much attention as it does now. In today’s maintenance world, reliability engineering positions (not to mention entire departments) have been created to devote 100% of their time and effort to preventing unscheduled machinery downtime and critical failures.

Even though the goal of a “Reliable Plant” remains much the same as it has for years, methods and practices for getting to that state are constantly changing and improving with the development of new technologies and practices. A case in point is proper shaft alignment of rotating machinery in the running condition, through the derivation and application of proper coupling target values.

With today’s laser alignment tools and proper training, alignment of machinery has become an easier task than in years past. However, in some cases, companies are finding that even while machines are within excellent alignment tolerances, they still have problems associated with misalignment. This often is a result of thermal growth issues with the machine, dynamic loads, downstream (or upstream) piping movement and other variables.

Many manufacturers supply their equipment with thermal expansion data and recommended alignment targets. The idea is to purposely misalign a machine when the alignment is done “cold,” or offline, so that when the machine reaches its normal running condition the machine is aligned. Compensating with target values is one step closer to proper alignment, but often these values are not as accurate as they were originally intended to be, due to flaws in the methods of their calculation.

Hypothetical applications
Two identical steam turbine-hot water pump machine trains are sold and supplied with factory-calculated target values. It is late October. One unit is installed in a Louisiana refinery at 90 F, the other in an identical plant in Washington State at 40 F. Both operate at the same temperature, but which machine will be in alignment when it reaches its normal running condition? Consider that the factory calculated the target values using an arbitrary cold temperature of 70 F. Because of the temperature differences, it is possible that both units may be out of alignment at running condition using the factory supplied alignment targets.


Using the “TLC” thermal growth calculation method, we can see how much the growth can differ depending on the ambient temperature when the alignment is performed. The TLC method calculates growth as the product of the change in Temperature, the Length of material from the base of the machine to the centerline of rotation and the Coefficient of expansion for the material involved. The growth must be calculated for each support foot of each machine. The calculations for one of the feet at each location are shown in Fig. 1.
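The TLC calculation can be sketched for the two hypothetical installations described above. Only the 70 F, 90 F and 40 F ambient temperatures come from the example. The 200 F running temperature of the support, the 20 in. length and the carbon-steel coefficient of about 6.3 × 10⁻⁶ in./in./°F are illustrative assumptions, not the values used in Fig. 1.

```python
def tlc_growth(delta_t_f: float, length_in: float, coeff_per_f: float) -> float:
    """Vertical thermal growth (inches) = change in Temperature x Length x Coefficient."""
    return delta_t_f * length_in * coeff_per_f


# Hypothetical inputs for one support foot (not the values used in Fig. 1)
RUNNING_TEMP_F = 200.0   # temperature of the support at normal running condition
LENGTH_IN = 20.0         # base of machine to shaft centerline
COEFF_STEEL = 6.3e-6     # approximate expansion coefficient of carbon steel, in/in/°F
FACTORY_COLD_F = 70.0    # cold temperature assumed when the factory targets were calculated

factory_growth = tlc_growth(RUNNING_TEMP_F - FACTORY_COLD_F, LENGTH_IN, COEFF_STEEL)

growth_by_site = {}
for site, ambient_f in {"Louisiana": 90.0, "Washington": 40.0}.items():
    growth = tlc_growth(RUNNING_TEMP_F - ambient_f, LENGTH_IN, COEFF_STEEL)
    growth_by_site[site] = growth
    error_mils = (growth - factory_growth) * 1000
    print(f"{site}: grows {growth * 1000:.2f} mils; factory target assumes "
          f"{factory_growth * 1000:.2f} mils ({error_mils:+.2f} mils)")
```

With these assumed inputs, the Louisiana foot grows about 2.5 mils less than the factory targets expect, and the Washington foot grows about 3.8 mils more. This happens even though both machines run at the same temperature.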


These variations at the feet could mean an even greater misalignment at the coupling center, or point of power transmission. The graph in Fig. 2 is based on the thermal growth values shown in Fig. 1. It illustrates how these growth values translate into misalignment at the coupling center.
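One way to see why the coupling center can move more than either foot is to treat each machine casing as rigid. Under that assumption, the shaft centerline rises linearly between the two feet, and the line extends beyond the inboard foot to the coupling. The sketch below makes that simplifying assumption, and its foot positions, growth values and sign convention (pump relative to motor, rise positive) are all hypothetical.

```python
def rise_and_slope(x: float, foot_a: tuple[float, float],
                   foot_b: tuple[float, float]) -> tuple[float, float]:
    """Shaft centerline rise at axial position x for a rigid machine.

    Each foot is (axial position in inches, vertical growth in inches); the rise is
    interpolated (or extrapolated) along the line through the two feet.
    Returns (rise at x in inches, slope in inches per inch).
    """
    (xa, ga), (xb, gb) = foot_a, foot_b
    slope = (gb - ga) / (xb - xa)
    return ga + slope * (x - xa), slope


COUPLING_X = 0.0  # coupling center; pump feet on the +x side, motor feet on the -x side

# Hypothetical growth values: the hot pump grows unevenly, the cooler motor grows evenly
pump_rise, pump_slope = rise_and_slope(COUPLING_X, (6.0, 0.016), (26.0, 0.010))
motor_rise, motor_slope = rise_and_slope(COUPLING_X, (-8.0, 0.002), (-28.0, 0.002))

vertical_offset = pump_rise - motor_rise        # inches, pump relative to motor
vertical_angularity = pump_slope - motor_slope  # inches per inch

print(f"pump centerline rise at coupling: {pump_rise * 1000:.1f} mils")
print(f"vertical offset: {vertical_offset * 1000:.1f} mils")
print(f"vertical angularity: {vertical_angularity * 1000:.2f} mils/in")
```

Neither pump foot grows more than 16 mils in this example. Because the inboard foot grows more than the outboard foot, however, the pump’s centerline at the coupling rises about 17.8 mils, which amplifies the misalignment at the point of power transmission.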

Dealing with “problem” machines
Many companies seem to have some “problem” machines that they too often accept as being uncorrectable. Extra spare parts become part of the yearly budget and it’s no surprise to anyone when those particular machines break a bearing or lose a seal every few months—while similar machines run without a problem for years.

This type of situation became clear for a Southern California refinery several years ago. As part of its growing reliability program, the refinery decided to do something about the site’s “problem” machines, as well as those machines without accurate target values. The company utilizes the best laser alignment tools and trains its employees to do correct alignment incorporating target values wherever necessary. Even with these good practices in place, however, some of the machines still have high failure rates.


Whenever refinery personnel identify a machine that is still having problems with failures associated with misalignment, they install a system called PERMALIGN® to accurately measure any relative movement between the machines from cold to hot or normal running condition. This laser-based system measures and records any movement, whether across a coupling or an absolute movement relative to Earth, and is accurate to 1 micron. (It is the only linearized laser monitoring system with a resolution of 1 micron throughout the entire 0.630″ detector range.) The system measures any offset and angular movement over separations of up to 30′, so it can also record data on the site’s large cooling tower fans. Even in the harsh environment that the refinery offers, temperature variations and vibration do not diminish accuracy.

The data collected by the PERMALIGN system can be trended, analyzed and archived using software called WINPERMA®. This software uses the data to translate the relative machine movement into movement at the coupling center in both axes; Vertical Offset, Vertical Angularity, Horizontal Offset and Horizontal Angularity are calculated. A baseline established at the ambient temperature becomes the zero point, then the machines are turned on and allowed to reach their normal running condition. The graph in Fig. 3 shows all four axes of movement so the new alignment targets are easily read. Flags can be marked on the graph to record system events such as when the system was brought on-line, to mark different running loads, a valve opening or any other system event. Let’s look at a recent example of a “problem” machine where the California refinery utilized the PERMALIGN system to measure the movement across the coupling.
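The baseline-to-target logic can be sketched in a few lines. This is a minimal illustration and does not reproduce the WINPERMA calculation. The readings are hypothetical, with offsets in mils and angularities in mils per inch, and the sign convention is an assumption because alignment tools differ in how they report direction. The cold (ambient) reading is treated as zero, the leveled-out running reading gives the movement on each axis, and the cold target is set to cancel that movement, so the machines grow into alignment.

```python
AXES = ("vertical_offset", "vertical_angularity",
        "horizontal_offset", "horizontal_angularity")


def movement_from_baseline(running: dict[str, float],
                           baseline: dict[str, float]) -> dict[str, float]:
    """Movement at the coupling center relative to the ambient baseline (the zero point)."""
    return {axis: running[axis] - baseline[axis] for axis in AXES}


def cold_targets(movement: dict[str, float]) -> dict[str, float]:
    """Cold alignment targets that cancel the measured movement.

    Assumed sign convention: the cold value is the negative of the hot movement,
    so the machine trains reach zero misalignment at running condition.
    """
    return {axis: -value for axis, value in movement.items()}


# Hypothetical readings: offsets in mils, angularities in mils per inch
baseline = {"vertical_offset": 1.0, "vertical_angularity": 0.1,
            "horizontal_offset": -0.5, "horizontal_angularity": 0.0}
running = {"vertical_offset": 9.0, "vertical_angularity": -0.4,
           "horizontal_offset": 2.5, "horizontal_angularity": 0.2}

movement = movement_from_baseline(running, baseline)
targets = cold_targets(movement)
for axis in AXES:
    print(f"{axis}: moved {movement[axis]:+.2f}, cold target {targets[axis]:+.2f}")
```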

In one of its distillation units, the refinery has a set of residuum pumps that are vital to the continuous operation of the unit. If the pumps were to shut down unexpectedly, the whole process would follow suit—leading to a major shutdown, resulting in significantly higher repair cost than just replacing a bearing on a pump. Since these pumps are redundant, if one fails the other picks up the load. On the other hand, when one “problem” pump is out of commission for repair, there is no backup. Of the two pumps, only one of them has a very high failure rate. They are identical pumps and the reason(s) why one of them has a high failure rate and the other does not remains a mystery. They both are aligned using the factory recommended targets, yet only one pump continues to have bearing failures. Vibration readings also are significantly higher on the one pump compared to the other, and vibration analysis points to misalignment. While there are myriad possible causes for this problem, correcting it is the priority. Thus, the PERMALIGN system was installed on the unit to measure the relative movement of the pump and motor.

Once the system was installed on the unit and started recording data, a baseline was established. Since these pumps operate at a very high temperature, they are slowly brought up to operating temperature, as marked on the graph with an event flag. A second flag was placed to note when the pump was brought on line. As the pump reaches its normal operating condition and the data levels out—in this case about eight hours—it can be shut down and allowed to cool.

The data shown in the box near the center of the graph in Fig. 3 are the new target values used for the alignment. These targets were input into the refinery’s ROTALIGN® ULTRA shaft alignment system and the alignment was performed once the unit cooled to ambient temperature. The unit was then put back on line.


A four-month trend of the overall velocity levels measured on the pump using the VIBXPERT® vibration data collector is shown in Fig. 4. The final reading on the trend was taken several days after the alignment was performed using the new target values.

After further investigation into the root cause of the problem pump, it was found that the concrete base had been cracked during a repair on an adjacent machine several years earlier. After the base was repaired, the “cold” position had apparently moved from its original setting, causing the targets to change. Luckily, this cause came to light when a senior millwright, after overhearing a conversation about the investigation, reported the earlier repair. There was no documentation of the accidental damage or of the repair, so this information might never have been known had the millwright not come forward.

Utilizing the latest technologies, the refinery was able to identify a piece of critical machinery that had uncommon characteristics and quickly apply an accurate solution. A complete maintenance history of the machines is now stored in the site’s alignment and condition monitoring software. Proper use of these tools has put this refinery one step closer to what it truly wants to be—a Reliable Plant!

Deron Jozokos is an engineer with LUDECA, INC. Telephone: (305) 591-8935; e-mail:



6:00 am
December 1, 2007

Viewpoint: Achieving Excellence


Richard L. Dunn, Executive Director, Foundation for Industrial Maintenance Excellence

Two U.S. plants have been selected to receive the 2007 North American Maintenance Excellence (NAME) Award presented by the Foundation for Industrial Maintenance Excellence. The Alcoa Mt. Holly plant, Goose Creek, SC, and the Baldor Dodge Reliance – Dodge Marion plant, Marion, NC, were selected as award winners after evaluation of their applications and onsite audits of their operations by the NAME Award Board of Directors.

Now in its seventeenth year, the NAME Award is widely regarded as the most prestigious recognition in the maintenance function. Awards are presented to individual plants on the basis of their maintenance departments’ ability to provide “capacity assurance for operational excellence” in the areas of organization, work processes and materials management.

In many ways, the two winners represent the breadth of the possible paths to maintenance excellence. One is a large plant, the other small; one a large maintenance organization, the other not. One plant is primarily a round-the-clock continuous process operation, the other a manufacturer of discrete products. One has a long tradition of striving for and exemplifying maintenance excellence, the other has come to this level only recently.

Alcoa Mt. Holly is a 1.5 million-square-foot aluminum smelter that produces about 500 million pounds of aluminum ingots annually. Its 160 maintenance employees support the 24/7 operation of the plant through a wide variety of preventive and predictive maintenance activities, major equipment overhauls and operation and maintenance of the plant’s substation. In recommending the plant for the NAME Award, evaluators noted its long history of outstanding work planning and scheduling, as well as its excellent communications and cooperation with all production areas.

Dodge Marion manufactures mounted tapered/spherical roller bearings in its 174,000-square-foot facility. Its nine-person maintenance department has developed a strong preventive and predictive maintenance program using various total productive maintenance (TPM) processes.

Both plants have demonstrated enviable records for reliability. Furthermore, both demonstrate that a foundation of sound preventive maintenance practices coupled with a plant-wide respect for the value of maintenance is essential to overall excellence.

Established in 1990 as a way to encourage best maintenance practices and a way to honor those who achieve them, the NAME Award program has presented 20 awards over the years with several awards in some years and none in others. In 2000, the volunteers who administer the award program incorporated as the not-for-profit Foundation for Industrial Maintenance Excellence (FIME) to ensure the program’s continuance and independence from commercial influence. The Board of Directors is made up of past award winners and others with a demonstrated devotion to the values the award represents.

To be eligible, a plant must submit a comprehensive application by June 30 in the year of entry. This application is reviewed by the Board of Directors to determine eligibility for an onsite audit. Following this audit, the Board of Directors again meets to decide if the applicant qualifies in all respects for the award.

The NAME Award recognizes that the Alcoa Mt. Holly and Dodge Marion plants have demonstrated their maintenance competence at a world-class level. The Foundation for Industrial Maintenance Excellence is proud to honor their achievements.

Rick Dunn participated in the establishment of the North American Maintenance Excellence Award and has been active in its activities since inception. He was appointed Executive Director when the NAME Award program was incorporated as the Foundation for Industrial Maintenance Excellence. Information on the NAME Award program is available online at



6:00 am
December 1, 2007

Who's Got Time To Train Anymore?


Bob Williamson, Contributing Editor

Maintenance & Reliability is, and has been, a woefully overlooked career. We need our nation’s best and brightest young minds in Maintenance & Reliability careers NOW! What are we doing to attract and retain them?

What are we doing to train them to maintain the highest levels of equipment performance and reliability? What are we doing to promote pride in workmanship? The situation in many plants is already dire…and getting worse. You can see, hear and sense it everywhere, especially out on the plant floor.

Who’s got time for training
“I learned this job years ago from one of the best. I was under his wing for nearly eight months learning all the aspects of the precision work on this one type of machinery. In the 35 years I have worked here, I have never seen such a lack of training of our new guys. They get a few days training at best. Why, we even have some of the new employees teaching the newer employees how to work on this equipment. Pretty scary if you ask me! Most of them have never even seen the manual that came with these machines, the one that I learned from years ago. The only copy we have now is locked up in the maintenance office. Doesn’t anyone in top management care anymore?”

The skilled mechanic quoted above was truly concerned. We had just discovered that another mechanic had at one point cranked down on one of the precision adjustments so far that it badly damaged the machine. The procedure in the equipment manual was not followed. Even though the machine was still running and making acceptable parts, the $10,000 precision cylinder had been scored beyond repair and there was no spare in stock. The replacement had an estimated 12-week delivery time, and installing it would take several more days.

We’ve always done it that way
In another plant, I noticed that four fine-threaded machine adjustment bolts had been beaten severely with a hammer. They were so mushroomed that a wrench would no longer fit. (“That’s why we have Channel Lock pliers.”) Logically, and mechanically, any adjustment had to be made by turning the threaded adjusters. No other movement was possible. When asked, the mechanics all responded:

“Why do we hit the adjusters with a hammer? That’s the way we were taught. I guess we’ve always done it that way.”

We couldn’t find the manual
A one-year-old machine’s programmable controller was operated with a touch screen panel. While working on a processing line that fed this final stage unit, we noticed a gaggle of people gathered around the panel poking at it. Then they just wandered away. As we attempted to start up the machine, we discovered that the program had been erased and the machine would not cycle properly. Searching for the machine’s O&M manual, we discovered it underneath a workbench…and half of it was missing! As one individual later explained:

“Somebody must have messed with the program, again. If you touch this icon, then this one, it erases the program. I figured that out the hard way since we’ve never really had training on the programming controls. The manual has some of the control panel information, but it’s still not easy to understand.”

Sure we do regular preventive maintenance
During a hands-on PM workshop on a large integrated manufacturing line, one person discovered a loose bolt (no, it was not a maintenance person). Upon further investigation, we discovered that only one of the four bolts holding this unit together and in alignment was actually in place. One was missing, another one was completely broken off and a third bolt had the head sheared off. The remaining bolt was doing the work of four and was the only link between full operation and catastrophic downtime. After two hours of disassembly and repair, the broken bolt problems were corrected. The situation, evidently, came as a surprise to at least one staffer:

“I don’t understand how we could have missed that one. Our monthly PM was just completed a few days ago.”

What’s changed
We are in the midst of the “de-skilling” of the American industrial workforce, not by design, but by default. It’s not a new phenomenon either. This frightening trend has been overlooked by far too many of our business, government and academic decision-makers for far too long. We are at a near-critical point-of-no-return as the critical mass of skilled and knowledgeable people leave today’s workplace. Too many of today’s maintenance, reliability and operations personnel have not been adequately trained and qualified to do the jobs they are asked to do day in and day out. Many, if not most, younger and newer employees may not have the same basic skills and knowledge as those whom they are replacing.

Unfortunately, today’s decision-makers often ASSUME the fundamental skills and knowledge that were “common” when they began working 30-plus years ago are the same today. While we hate to be the bearer of bad tidings, these decision-makers are sorely wrong! There has been a fundamental paradigm shift and it is hurting our capital-intensive industries’ performance and reliability.

Think about it. How many of today’s older teenagers and twenty-somethings ever have:

  • Built a birdhouse, a utility box or a shed?
  • Changed the oil and filter in a car or truck?
  • Disassembled a lawnmower, a motorcycle, a jet ski or a snowmobile engine, put it back together and had it run?
  • Assembled a radio, a computer or an electronic robot?
  • Glazed a wood frame window?
  • Rebuilt an automobile engine?
  • Made something useful on a lathe or milling machine?
  • Owned and used a set of mechanic’s or carpenter’s tools?
  • Used a volt-ohmmeter to check a circuit?
  • Welded an angle iron frame or built a metal stand?
  • Soldered copper tubing or brazed steel tubing?
  • Installed and wired a doorbell?

Not many parents spend time with their children and teenagers making things, building projects or doing repairs around the home these days. Many of the fundamental skills and knowledge we took for granted in the 1960s, 70s and early 80s are apparently no longer valued. Luckily, there still are some very good high school vocational programs out there and some very good post-secondary technical colleges too—despite thousands of schools and programs being closed over the years. But, there simply are not enough schools and programs to address the problem we have now—a problem that’s going to get worse before it gets worse.

An overlooked career
As shown in the findings of our 2007 Salary Survey beginning on page 38, Maintenance & Reliability technician jobs can pay quite well. Some industries pay in the $30 per hour range and higher. So, why do countless newly minted high school grads take jobs that pay less than $10 per hour and hop from job to job for years until they find their niche? Why do they go on to a four-year college to try to figure out what career they want to pursue in life? (If you ask me, that is a really expensive “career education” program!)

We should promote careers in Maintenance & Reliability (not just “maintenance jobs”)! Clean up the workplace and give career-day tours. Help teachers and students understand that good money can be made in a rewarding career with a one- or two-year technical degree. Begin attracting the best and the brightest. Offer high-school cooperative education experience in your plant.

Trainers and coaches
Recruit a few of your senior, highly skilled maintenance personnel to be trainers and on-job coaches. Have them dedicate time documenting proper maintenance and reliability procedures for your critical equipment. Set new expectations; insist that critical maintenance tasks follow “standard procedures” or “standard job plans.” Train everyone who needs to know—everyone who touches the critical equipment—to follow these new standards. Then, hold everyone accountable for following these procedures. Problems will begin disappearing!

Show everybody that you care about how your equipment and plant are maintained. Be proud of your workmanship. Share a positive vision for careers in this arena. Let’s make 2008 the year of “Transforming Careers in Maintenance & Reliability.”



6:00 am
December 1, 2007

Keeping things moving… Capture Problems Faster With High-Speed Video Technology


Jane Alexander, Editor-In-Chief

Industrial Video Solutions (IVS) supplies high-speed digital video technologies to packaging, manufacturing and paper industries around the world. These systems combine the latest in GigE technology, digital video developments, efficient lighting and an intuitive, feature-rich user interface. The goal: keep that product moving!

Quick-Eye digital video systems help manufacturing and packaging line operators improve production efficiency. Quick-Eye captures high-speed video and replays product and equipment issues in slow motion. It is portable and can be moved to problem areas with little setup time. Operators can eliminate bottlenecks and address the root causes of problems faster.

Quick-Eye offers high frame rates, high resolution, multihour video buffer, image analysis, etc. According to IVS, this affordable and simple-to-use technology provides an immediate return on investment (ROI).

WebScanPRO provides advanced monitoring and sheet break analysis for the paper industry and other web process manufacturers, such as non-woven fabrics and plastic sheet. Fast, precise and digitally simple, it, too, offers fast return on investment by continuously recording events that cause machine problems, poor quality and sheet breaks with some of the industry’s most advanced technology, including:

  • 100% noise-free digital video;
  • 90 or 200 frames per second at 659×493 resolution, assuring 100% monitoring on the fastest paper machine;
  • Up to 1/100,000 sec shutter speed;
  • Video synchronized to 1-frame and sheet break events saved without operator’s assistance.

WebScanPRO offers exhaustive image analysis, including:

  • Grayscale of each frame is displayed with buffered and event video;
  • Real-time regions of interest (ROI) alert operators to changes in video. ROI can be defined for any camera;
  • Digital live video broadcast over the mill network accessible on any computer;
  • WebScanPRO is always on. It never misses a frame; simultaneous video capture, live video, viewing video in the buffer, viewing sheet breaks, ROI image analysis and grayscale analysis;
  • Paper-machine proven lights and camera enclosures.

Industrial Video Solutions, Inc.
McLean, VA



6:00 am
December 1, 2007


We asked the questions. Here are our findings. How do you stack up?

After a three-year absence, our annual Salary Survey is back to help you determine how your income stacks up in relation to other maintenance and reliability professionals in today’s industrial arena.

Please note that our 2007 Salary Survey goes well beyond anecdotal information to reflect concrete data regarding the actual state of this industry’s employment marketplace. The data we used to compile this survey was obtained from a random sample of Maintenance Technology and Lubrication Management & Technology readers who completed an anonymous on-line survey. We believe the survey findings reported here to be both accurate and representative of what’s happening in the maintenance and reliability community.

A basic profile
When Maintenance Technology conducted its first salary survey in 1998, the average respondent income was $58,748 (including overtime and bonus, which all averages in our findings reflect). Nine years later, the average expected income for 2007 is $86,251, a 47% increase. This also reflects a 3% increase over the average salary of $83,678 that this year’s respondents report having received in 2006.
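Both growth figures can be recomputed from the survey’s own reported averages with a simple percent-change calculation. The averages are taken directly from the paragraph above, and `percent_change` is just an illustrative helper.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


AVG_1998 = 58_748
AVG_2006 = 83_678
AVG_2007_EXPECTED = 86_251

since_1998 = percent_change(AVG_1998, AVG_2007_EXPECTED)
since_2006 = percent_change(AVG_2006, AVG_2007_EXPECTED)
print(f"1998 to 2007: {since_1998:.1f}%")  # 46.8%
print(f"2006 to 2007: {since_2006:.1f}%")  # 3.1%
```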

Furthermore, expected income for 2007 ranges from $26,000 to $250,000, in comparison to a range of $12,000 to $160,000 in 1998 and $26,000 to $235,000 in 2006.


For those paid on an hourly basis—23.68% of our survey respondents—the average pay rate is $28.30 per hour, equating to an average expected 2007 income of $69,238.

As shown in Fig. 1, the highest percentage of our respondents report an expected 2007 income in the $70,000 to $79,999 range. This also is where the median income, $78,000, is found.

Changes with age
Age of our survey respondents ranged from 26 to 71 years old, with an average of 50.2 years. Half of them are between 45 and 56 years of age. In addition, a large number of respondents are seasoned veterans, having spent an average of 22.2 years working in their fields.

Based on age, the average income increased from $63,333 for respondents in their 20s to a high of $88,674 for those in their 50s. For those in their 60s and above, the average reported income dropped by slightly more than $4,000. More results are shown in Fig. 2.

The learning curve
Of the survey respondents, 30.2% indicate a trade school diploma as their highest level of educational achievement; 25.9% have a two-year community college degree; 34.5% have a four-year college or university degree; and 9.6% have a master’s or doctoral graduate degree. So how do these educational levels relate to salary compensation?

Typically, the higher the level of education respondents have achieved, the higher their average level of income is. Trade school graduates report an average 2007 income of $74,355; two-year community college graduates report $77,439; four-year college or university graduates report $97,375; those with advanced degrees report $107,301. Each level of education includes a wide range of salaries, as depicted in Fig. 3.

Beyond formal education, 19% of respondents also hold one or more professional licenses or certifications, including P.E., CMRP, CPMM and CPE. The average income for Professional Engineers (P.E.) is $113,316; for those designated solely as Certified Maintenance and Reliability Professionals (CMRP), $85,340; and for those designated solely as AFE Certified Plant Maintenance Managers (CPMM), $77,000. (Note: Too few AFE Certified Plant Engineers (CPE), or respondents holding a combination of certifications, provided their expected 2007 income for us to report a reliable average.)

Income by facility size
Survey respondents were asked to indicate the number of workers at their location of employment. The results were as follows: 12% are employed at facilities of one to 49 employees; 9% at facilities of 50 to 99 employees; 20% at facilities of 100 to 249 employees; 18% at facilities of 250 to 499 employees; 13% at facilities of 500 to 999 employees; and 28% at facilities with 1000 or more employees.

Related to salary, respondents working at facilities of 50 to 99 employees report the lowest average income at $68,998. Respondents working at facilities with 1000 or more employees record the highest average salary at $96,748. Fig. 4 displays the results from the remaining facility sizes.

Industry type
We also asked survey respondents to specify the industry sector of their company/facility. The results, combined into five general categories derived from the North American Industry Classification System (NAICS), are processing, manufacturing, utilities, services and non-industrial.

Based on responses, 40.1% of respondents work in processing industries; 22.1% in manufacturing; 14.6% in utilities; 6.8% in services; and 16.3% in non-industrial. Those in processing report the highest average salary at $94,346. The lowest average salary based on industry, $71,673, is reported by those working in the non-industrial sector. Fig. 5 displays full results.


Who’s doing what
Our survey asked respondents to indicate their level of work involvement. Results show that 13% chose corporate or multiplant; 15% plant or facility manager; 21% reliability or maintenance manager; 6% reliability engineer; 6% reliability technician; 7% maintenance engineer; 9% maintenance technician; 13% supervisor; 10% “other.”

As one might anticipate, average expected 2007 income is highest for those involved at the corporate or multiplant level, at $104,746, as seen in Fig. 6. This is the same result we have found in the seven previous years of our survey. Those working at the level of maintenance technician report the lowest average income, $62,100.



6:00 am
December 1, 2007

Why Some Root-Cause Investigations Don't Prevent Recurrence

Whatever industry you’re in, if failure isn’t an option at your plant, you’ll want to understand why these investigations sometimes fail their mission.

In the nuclear power industry, the primary mission of a root-cause investigation is to understand how and why a failure or a condition adverse to quality has occurred so that it can be prevented from recurring. This is good practice for many reasons, and it is also a legal requirement under 10 CFR 50, Appendix B, Criterion XVI.

To successfully carry out this mission, a root-cause investigation needs to be evidence-driven in accordance with a rigorous application of the bedrock of all root-cause methodologies: the Scientific Method. Consistent with the Scientific Method, underlying assumptions have to be questioned and conclusions have to be consistent with the available evidence, as well as with proven scientific facts and principles.

Sometimes root-cause investigations fail to fulfill their primary mission and the failure recurs. In that regard, diagnosing the root cause of root-cause investigation failures is, in itself, an interesting topic. Here are three common reasons why some root-cause investigations fail their mission.

Reason #1: The Tail Wagging the Dog
As a root-cause investigation proceeds and information about the failure event accumulates, some initial hypotheses can be readily falsified by the preliminary evidence and dismissed from consideration. The diminished pool of remaining hypotheses will likely have some attributes in common. More work is then usually needed to uncover additional evidence to discriminate which of the remaining hypotheses specifically apply.
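The winnowing described above can be sketched in a few lines of Python. This is only an illustration, not a methodology taken from the article: each hypothesis is reduced to the observable facts it predicts, and a hypothesis is dropped only when gathered evidence contradicts one of its predictions. The hypothesis names and facts are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    name: str
    # Observable facts this hypothesis predicts, e.g. {"surge_logged": True}
    predictions: dict = field(default_factory=dict)


def falsified(h: Hypothesis, evidence: dict) -> bool:
    """True if any observed fact contradicts one of h's predictions.
    Facts the hypothesis says nothing about neither support nor falsify it."""
    return any(
        fact in h.predictions and h.predictions[fact] != observed
        for fact, observed in evidence.items()
    )


def surviving(hypotheses: list, evidence: dict) -> list:
    """Names of the hypotheses not yet falsified by the evidence."""
    return [h.name for h in hypotheses if not falsified(h, evidence)]


# Hypothetical candidate causes for an intermittent electronic failure
pool = [
    Hypothesis("voltage surge", {"surge_logged": True}),
    Hypothesis("loose connector", {"surge_logged": False, "connector_tight": False}),
    Hypothesis("defective component", {"surge_logged": False, "connector_tight": True}),
]

preliminary = {"surge_logged": False}
print(surviving(pool, preliminary))   # two survivors; both predict no surge

# Additional, discriminating evidence is needed to separate them
more = {**preliminary, "connector_tight": True}
print(surviving(pool, more))
```

The two survivors of the preliminary evidence share an attribute (no recorded surge), mirroring the point that the remaining pool tends to look alike until discriminating evidence is gathered.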

At this point in the investigation, it may become apparent what the final root cause might be—especially if the remaining pool of hypotheses is small and they all share several important attributes. At the same time, it also becomes apparent what the corresponding corrective actions might be.

By anticipating which corrective actions are more palatable to the client or management, the investigator may begin to unconsciously—or perhaps even consciously—steer the remainder of the investigation to arrive at a root cause whose corresponding corrective actions are less troublesome.

Evidence that appears to support the root cause and lead to more palatable corrective actions is actively sought, while evidence that might falsify the favored root cause is not actively sought. Evidence that could falsify a favored root cause may be dismissed as being irrelevant or not needed. It may be tacitly assumed to not exist, to have disappeared or to be too hard or too expensive to find. It may even just be ignored because so much evidence already exists to support the favored root cause that the investigator presumes he already has the answer.

In logic, this is called an a priori methodology: an outcome or conclusion is decided beforehand, and the subsequent investigation is conducted to find support for the foregone conclusion. In this case, the investigator has decided which corrective actions he wants based on their convenience to his client or management. He then uses the remainder of the investigation to seek evidence pointing to a root cause that matches the corrective actions he desires.


What Really Happened: Failure Of A Zener Diode

This X-ray radiograph shows a 1N752A-type Zener diode that was manufactured without a die-attach at one end of the die, and with only a marginal die-attach at the other end. This die-attach deficiency caused the component to fail unexpectedly and intermittently. In turn, this led to a failure in the voltage regulator system of an emergency diesel generator, which had to be temporarily taken out of service.

The Zener diode failed on a circuit board that had seen fewer than 40 hours of actual service, although the board itself was over 27 years old; it had been kept in inventory as a spare.

Going to this level of detail to gather evidence might seem extreme. This particular evidence, however, was fundamental to validating the hypothesis that the root cause was a random failure due to a manufacturing defect, and to falsifying the hypothesis that it was an infant-mortality failure. In the nuclear power industry, this distinction is significant.

Here is an example of the tail wagging the dog: a close-call accident in which a large, heavy, lead-lined box mounted on a relatively tall, small-wheeled cart overturned. The root-cause investigation team found that the box-and-cart combination was intrinsically unstable. The top-heavy cart tipped easily when it was moved and its front wheels had to swivel, or when it was rolled over a carpet edge or floor expansion joint.

The investigation team also found that the personnel who moved the cart while cleaning the area had done so in violation of a prominently posted sign, which stated that a supervisor was to be contacted before the cart was moved. The personnel, however, inadvertently moved the cart, without contacting a supervisor, in order to clean under and around it.

The easy corrective actions in this case would be to chastise the personnel for not following the posted rules and to strengthen work rule adherence through training and administrative permissions. There is ample evidence to back-fit a root cause to support these actions. Also, such a root-cause finding—and its corresponding corrective actions—are consistent with what everyone else in the industry has done to address the problem, as noted in ample operational experience reports. In the nuclear power industry, the “bandwagon” effect of doing what other plants are doing is very strong.

In short, these corrective actions are attractive because they appeal to notions of personal accountability, are cheap and can quickly dispose of the problem. Consequently, the root cause of the close-call accident would be declared to be that the workers failed to follow the rules.

Unfortunately, whenever the cart-and-box combination is rolled to a new location, the same problem could recur, because the procedure change and additional training do nothing about the cart’s instability. While the new administrative permissions and additional training could reduce the probability of recurrence, they would not necessarily eliminate it. If the cart is rolled to new locations many times, the problem will probably recur eventually, perhaps causing a significant injury. This situation resembles the hockey notion of “shots on goal”: even the best goalkeeper can be scored upon if there are enough shots on goal.
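The “shots on goal” argument is the arithmetic of repeated independent trials. If each move of the cart carries some small probability p of a tip-over, the chance of at least one tip-over in n moves is 1 - (1 - p)^n. The sketch below assumes the moves are independent and uses made-up values of p purely for illustration: an administrative fix that halves p still lets the cumulative risk climb toward certainty as n grows, whereas removing the instability attacks p itself.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability of at least one occurrence in n independent trials,
    each with per-trial probability p: 1 - (1 - p) ** n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 - (1.0 - p) ** n


# Hypothetical per-move tip-over probabilities (illustrative only)
p_before = 0.01    # unstable cart, rules not followed
p_after = 0.005    # same cart, after retraining and new permissions

for moves in (10, 100, 500, 1000):
    print(
        f"{moves:>5} moves: "
        f"before {prob_at_least_one(p_before, moves):.3f}, "
        f"after {prob_at_least_one(p_after, moves):.3f}"
    )
```

Even with these hypothetical per-move odds halved, the chance of at least one tip-over over 1,000 moves still exceeds 99%.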

Reason #2: Putting Lipstick on a Corpse
In this instance, a failure event has already been successfully investigated. A root cause supported by ample evidence has been determined. Vigorous attempts to falsify the root-cause conclusion have failed. OK…so far, so good.

Suppose, however, that the root-cause conclusion relates to a deficiency involving a friend of the investigator, a manager known to be vindictive and sensitive to criticism, or some company entity that, because of previous problems, can’t bear criticism. The latter could include an individual who might be fired if found to have caused the problem, an organization that might be fined or sued for violating a regulation or law, or a department that might be reorganized or eliminated for repeatedly causing problems. In other words, the root-cause investigator is aware that the actual consequences of identifying and documenting the root cause may go well beyond the corrective actions themselves.

When faced with this dilemma, some investigators attempt to “word-smith” the root-cause report in an effort to minimize perceived negative findings and emphasize perceived positive ones: instead of plain, factually descriptive language about what occurred, they use less precise, more positive-sounding language.

“Word-smithed” reports are relatively easy to spot. Instead of using plain modifiers like “deficient” or “inadequate” to describe a process, euphemistic phrases like “less than sufficient” or “less than adequate” are used. Instead of reporting that a component has failed a surveillance test, the component is reported to have “met 95% of its expected goals.” Likewise, instead of reporting that a fire occurred, it is reported that there was a “minor oxidation-reduction reaction that was temporarily unsupervised.”

In such cases, the root-cause report becomes a quasi-public relations document that sometimes has conflicting purposes. Since it is a root-cause report, its primary purpose is supposed to be a no-nonsense, fact-based document that details what went wrong and how to fix it. However, a secondary, perhaps conflicting, purpose is introduced when the same document is used to convince the reader that the failure event and its root cause are not nearly as significant or serious as the reader might otherwise think.

With respect to recurrence, there are two problems with “word-smithing” a root-cause report. The first is that corrective actions work best when they are specific and targeted, yet a diluted or minimized root cause is often matched to a diluted or minimized corrective action. The practice of medicine offers a strong analogy: if the degree of an infection is underestimated, the medicine dose may be insufficient and the infection may come back.

The second problem is that, by putting a positive “spin” on the problem, the report may keep management from properly supporting what needs to be done to fix it. In effect, the report succeeds all too well in convincing its audience that the failure event is not a serious problem.

Reason #3: Elementary My Dear Watson
In some ways, root-cause investigations are a lot like “whodunit” novels. Some plant personnel simply can’t resist guessing what caused the failure, in the same way that mystery buffs often try to guess who will be revealed as the murderer at the end of the story. It certainly is fun for a person, and perhaps even a point of pride, if his/her initial guess turns out to be right. Unfortunately, there are circumstances in which such a guess can jeopardize the integrity of a root-cause investigation.

The circumstances are as follows:

  • The guess is made by a senior manager involved in the root-cause process.
  • The plant has an authoritarian, chain-of-command style organization.
  • The management culture puts a high premium on being “right” and has a zero-defects attitude about being “wrong.”

The scenario goes something like this:

  • A failure event occurs or a condition adverse to quality is discovered.
  • Some preliminary data is quickly gathered about conditions in the plant when the failure occurred.
  • From this preliminary data, a senior manager guesses that the root-cause will likely be x, because:
    • (1) he/she was once at a plant where the same thing occurred; or
    • (2) applying his/her own engineering acumen, he/she deduces the nature of the failure from the preliminary data, like a Sherlock Holmes or a Miss Marple.
  • Not being particularly eager to prove their senior manager wrong and deal with the consequences, the root-cause team looks for information that supports the manager’s hypothesis.
  • Not surprisingly, the team finds some of this supporting information; the cause is then presumed to have been found, and field work ceases.
  • A report is prepared, submitted and approved, possibly by the same senior manager that made the Sherlockian guess.
  • The senior manager takes a bow, once again proving why he is a senior manager.

The deficiency in this scenario that can lead to recurrence is the fact that falsification of the favored hypothesis was not pursued. Once a cause was presumed to have been found, significant evidence gathering ceased. (Why waste resources when we already have the answer?) As a result, evidence that may have falsified the hypothesis, or perhaps supported an alternate hypothesis, was left in the field. Again, this is another example of an a priori methodology: where the de facto purpose of the investigation is to gather information that supports the favored hypothesis.
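The difference between gathering supporting information and pursuing falsification can be sketched as two stopping rules applied to the same evidence. In this Python illustration (function names, facts and values are all hypothetical), the a priori check stops at the first item that agrees with the favored hypothesis, while the falsification check examines every item and fails the hypothesis on any contradiction. Evidence the favored hypothesis says nothing about is treated as neutral.

```python
def a_priori_check(favored: dict, evidence: list) -> bool:
    """Gather evidence only until something agrees with the favored cause."""
    for fact, observed in evidence:
        if favored.get(fact) == observed:
            return True   # "cause found"; field work ceases here
    return False


def falsification_check(favored: dict, evidence: list) -> bool:
    """Examine every item; one contradiction falsifies the favored cause."""
    for fact, observed in evidence:
        if fact in favored and favored[fact] != observed:
            return False
    return True


# Hypothetical favored cause: "the same failure I saw at my last plant"
favored = {"pump_tripped": True, "seal_worn": True}

# Evidence in the order it would be collected in the field
evidence = [
    ("pump_tripped", True),   # agrees with the favored cause
    ("seal_worn", False),     # contradicts it, but only if someone looks
]

print("a priori:", a_priori_check(favored, evidence))
print("falsification:", falsification_check(favored, evidence))
```

Both functions see the same list; only the stopping rule differs, so the a priori check declares the favored cause confirmed while the contradicting item is, in effect, left in the field.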

In this regard, a famous experiment on directed observation applies. Test subjects were told to watch a video of players passing basketballs and to count the passes carefully, because they would be questioned about the count afterward. This they did.

In fact, many of the test subjects concentrated on counting so well that they missed a person in a gorilla suit who sauntered through the group of players as they passed the ball. When asked what they had observed, they dutifully reported the number of passes but did not mention the gorilla. When told about it, they were incredulous and did not believe they could have missed a gorilla…until they were shown the tape a second time. At that point, they all observed the gorilla. MT

Randall Noon is currently a root-cause team leader at Cooper Nuclear Station. A licensed professional engineer in both the United States and Canada, he has been investigating failures for 30 years. Noon is the author of several articles and texts on failure analysis, including Engineering Analysis of Fires and Explosions and Forensic Engineering Investigations. He also has contributed two chapters to the popular college text, Forensic Science, edited by James and Nordby. E-mail:
