Repairing Unwanted Change
Despite our best efforts (or because our efforts were actually quite meager), unwanted and undesirable changes creep into our organizations. Sometimes they show up with a dramatic entrance, arriving quite literally with a bang. In other cases, these changes are detected by the systems put in place to control the quality of processes and systems, such as inspections, testing, and measurement. However, it is not uncommon for unwanted changes to remain undetected until they are brought to light through assessment systems, such as customer surveys, audits, and peer reviews.
Leaders need to champion the diagnosis of the causes of deviations, encouraging the organization to dig deeply enough to ascertain the root cause of each deviation. They must then identify and implement the actions required to correct the immediate cause of the deviation, as well as actions that address the systemic issues (root causes) that enabled the deviation to occur.
Start with Understanding the Immediate Cause
When there is a deviation in performance that leads to undesired results in a process or system, we call it a problem and jump in, wanting to fix it. Dr. Charles Kepner and Dr. Benjamin Tregoe pointed out that the use of the word “problem” is in itself problematic, since we use this word in our language to describe many different things, such as the need to make a decision, or the need to develop a plan, as well as the need to find the unknown cause of a deviation in a process or system.(1)
The first task is to define the “problem” as a situation where the performance of the process or system does not conform to expected, proven levels of performance. Something has changed and this change has adversely impacted the performance. These situations would fall into Shewhart and Deming’s definition of a special cause of deviation in that they are not built into the system.
The process of correcting a deviation starts with the identification of the immediate cause. The immediate cause will be "the action, or series of actions, that directly creates the difference between expected and real performance." Sometimes the immediate cause is readily apparent: perhaps a valve was left open, a sponge was left inside a patient, a procedure was misread, or someone forgot to send in a report.
But, in other cases, the immediate cause of a deviation may not be so obvious and may require extensive investigation, such as in the crash of an airplane, a series of infections in a hospital, or a rash of customer complaints. Any investigation into the immediate cause of a deviation focuses on defining the deviation and narrowing the scope of possible causes by describing what the problem is, where it exists, when it occurs, and whether it is increasing in scope, staying the same, or decreasing in scope. Dr. Charles Kepner and Dr. Benjamin Tregoe described one of the best questioning methods for identifying the immediate cause of a deviation through their research with the Rand Corporation on how scientists ask questions to solve problems. (2)
The Kepner-Tregoe questioning methodology is powerful because it requires the practitioner to ask what the scope of the problem is, and what it is not. One must ask where the problem exists and where it does not. The questioning process seeks to define when the problem occurs and when it does not, and what the extent of the problem is, while considering what the extent could possibly be, but is not.
Most people have encountered this type of questioning process at work when they visit a doctor. It is the standard approach to diagnosing a health issue by carefully asking IS and IS NOT questions to define the situation and using this information to quickly eliminate potential causes of an illness. Having a cough, but not having a fever, immediately rules out a variety of possible immediate causes of the patient’s symptoms.
For complicated issues, it is advisable to collect the IS and IS NOT information on a grid and place this information on a flip chart pad or project it on a screen where people can collectively examine the information simultaneously in order to generate possible causes that fit the description.
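The IS / IS NOT grid described above can be sketched as a simple data structure. This is a minimal illustration only; the coolant-leak deviation and its entries are hypothetical examples, and the four dimensions (what, where, when, extent) follow the questioning method described in the text.

```python
# A minimal sketch of a Kepner-Tregoe style IS / IS NOT grid.
# The deviation described (a hypothetical coolant leak) is illustrative;
# the dimensions are the ones named in the text: what, where, when, extent.

from dataclasses import dataclass

@dataclass
class Dimension:
    name: str     # what / where / when / extent
    is_: str      # what the problem IS
    is_not: str   # what the problem plausibly could be, but IS NOT

grid = [
    Dimension("What",   "Coolant leak at pump P-2",       "Leaks at pumps P-1 or P-3"),
    Dimension("Where",  "Tubing joint on discharge side", "Pump casing or suction side"),
    Dimension("When",   "During start-up, since Tuesday", "During steady-state operation"),
    Dimension("Extent", "Roughly one drip per second",    "A spray or a flood"),
]

def render(grid):
    """Return the grid as text rows suitable for a flip chart or screen."""
    rows = [f"{'DIMENSION':<10} | {'IS':<32} | IS NOT"]
    for d in grid:
        rows.append(f"{d.name:<10} | {d.is_:<32} | {d.is_not}")
    return "\n".join(rows)

print(render(grid))
```

The point of the structure is the paired columns: every possible cause generated later must explain both what the problem is and what it is not.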
It is important not to generate possible causes until after a deviation has been thoroughly described. Possible causes should be based on what the IS and IS NOT information reveal and should be tested destructively against the description. Your doctor does this routinely. You may think you have bubonic plague, but the doctor rules it out because the plague does not explain your specific set of symptoms. The true cause of a deviation will explain the description of the deviation.
We Want to Take Action
When there is a deviation, everyone wants to take action. Without a careful description of the deviation, along with a thorough generation of potential causes and the testing of potential causes against the description of the deviation, the actions that are taken may be a waste of time, resources, goodwill, and energy, and may actually make the situation even worse. Unfortunately, when a deviation occurs, some people will see it as an opportunity to take actions that are completely unrelated to the problem, but which they can tie to the problem in some arcane manner. A scapegoat may be identified to blame, and prejudices might displace rational thinking.
If the immediate cause of a deviation is truly identified, then people must make a difficult choice in terms of the type of action that should be taken. Some actions provide short-term, temporary relief that may be inexpensive and that require only that one knows the immediate cause of the problem. These are adaptive actions, and they offer both benefits and drawbacks. A permanent corrective action, on the other hand, may require further investigation to determine the systemic issues in the organization that enabled the deviation to occur in the first place, which would be the root cause. This could take longer and may have greater immediate costs, but costs in the long term will be lower if it prevents recurrence of the deviation.
Around the house, we might decide to use a short-term solution to a deviation. If our washing machine throws itself out of balance when it goes into the spin cycle, we might decide to simply wedge a piece of wood under one corner of the machine if that stops it from going out of balance. In doing this, you may congratulate yourself for avoiding an expensive service call.
In the same manner, people in the workplace may implement an adaptive action to solve a problem. Perhaps some tubing that supplies coolant to a pump needs repair. Proper procedure might require consulting an engineer to analyze the failure and ensure that maintenance procedures have been adequate. However, the supervisor might simply replace the tubing and never enter the fact that this was done in any log or record. It saved time and money, after all; it was just a quick shortcut.
Adaptive actions are seductive. These quick fixes to immediate causes of deviations may seem to save money, time and effort. And, to implement the fix, one only needs to understand the immediate cause of the deviation and does not need to spend time digging deeper into the conditions that enabled the deviation to exist – questions that might turn out to be embarrassing to someone.
However, settling for adaptive actions can lead to disaster. When, over time, adaptive actions are heaped on top of other adaptive actions, and no one records these fixes, the process or system can lose what is referred to as its configuration management. Someone designed the process to work in a certain manner and while an individual adaptive action may not compromise the design of the process, repeated adaptive actions may gradually result in what is, in essence, a redesign of the process or system, without anyone really being aware of it.
So, while adaptive actions are appealing, in order to truly address the causes of a deviation, and to ensure that adaptive actions do not introduce other undesirable changes, it is important to engage in root cause thinking and analysis.
Digging for Root Causes
The root cause of a deviation is “the most basic causal factor or factors that, if corrected or removed, will prevent the recurrence of the situation.” Root cause thinking encourages us to understand where root causes reside within organizations and how to ask appropriate questions to dig down to them.
Just as in nature, where roots are found in the soil, in organizations the soil consists of the cognitive factors pertaining to what people in the organization think and believe. These thoughts and beliefs (which are organized as cognitive systems, as mentioned in the Introduction) will determine how the organization assesses itself, how it controls undesirable changes, how it addresses deviations, how it improves itself, and how it re-invents itself over time.
Systematic root cause thinking has many of its origins in the nuclear Navy, being attributed to the rigorous approach to questioning required by Admiral Hyman Rickover. A robust set of root cause analysis methods was developed by the engineers who designed, fabricated, and operated nuclear power reactors for the submarine fleet. Early publications in this field originated with the Idaho National Engineering Laboratory (4) and the Savannah River National Laboratory (5), both of which were involved in the design of naval reactors and their fuel cycles. These methods were infused into the nuclear power generation industry through the Institute of Nuclear Power Operations, which was led largely by former naval nuclear officers, such as my own mentor, Admiral Hugo Marxer. All of the reliable publications on root cause analysis have been written by individuals who came up through the nuclear industry, such as Dean Gano (6), Max Ammerman (7), and Mark Paradies (8). This author was directed to study and teach root cause analysis by Admiral Paul Early in order to clear the restart of the High Flux Isotope Reactor at the Oak Ridge National Laboratory. Admiral Early later sent the author to conduct root cause analysis workshops for the Brookhaven National Laboratory in addressing its numerous environmental issues, and the author subsequently taught root cause analysis methods at the nuclear weapons complex and at the Department of Energy's uranium enrichment facilities.
Pulling the Thread
Admiral Rickover insisted that every officer and sailor on board a nuclear vessel must take personal responsibility for ensuring the conduct of operations. If anyone observed any condition, action, or potential problem that did not conform to the exacting requirements onboard a submarine powered with a nuclear reactor, they were to point this out immediately. The practice is referred to by some as "pulling the thread." If you see a loose thread, then you pull on it to find out what it connects to.
What "pulling the thread" means is that you actively question operating conditions. You ask questions, even though the answer to the question may prove to be embarrassing. Others in the quality field have referred to this as the Five Whys – not letting go of a deviation until you have asked why at least five times. The Five Whys methodology is attributed to Sakichi Toyoda, whose family founded the Toyota Motor Company. Kepner and Tregoe referred to this as "thinking beyond the fix."(3)
The search for the root cause is a questioning process that sometimes means asking difficult and potentially embarrassing questions about how an organization is managed. This questioning process may be avoided because of many reasons internal to an organization that will be discussed in the later chapters on autocratic leadership and endullment. Suffice it to say that asking embarrassing questions may not always be great for one’s career or situation in life, as Socrates may have noted.
A great deal of good can come about simply by pulling the thread and seeing where it leads you. Of course, you do not want to simply ask "Why?" over and over again, sounding like a broken record. The artful practice of pulling the thread will employ a variety of ways of asking why. "What do we know about this?" "How do you think it came to be that people would interpret things in this way?" "What makes us think this is so?"
How Far Do You Question?
Root causes reside in the beliefs that the people in an organization hold to be the truth about their organization and in the systems those beliefs support. You will know when you have reached the root cause when you get to a point where people hate to admit what the answers to the questioning process reveal about their organization.
You can see it in people’s eyes. I have found that in leading a questioning process, people will avoid eye contact. They start by looking at my feet, as if to question whether this southerner is even wearing shoes, given the crazy questions he is asking. Then, as they start to speak up, they start looking away, searching for some anonymity as they start to reveal the answers to the questions. Finally, they end up looking at their own feet, either in embarrassment or relief that what could not have been said – what should not have been said about the prevalent beliefs and practices in the organization – has finally been said. Then, healing can begin.
But, do not go too far in pulling the thread by roaming into theology. Why did the operator make the mistake? Because he was human. Why do humans make mistakes? Because man is born to trouble as surely as the sparks fly upward. You’ve gone too far when you wander into theology and you need to back up.
Tools for Digging
It is sometimes beneficial to use some tools to bring structure to the questioning process when the information you seek is difficult to sort out. Not every immediate cause to a deviation lends itself to simply pulling the thread. Sometimes there are some knots that must be worked out.
There are three tools that were developed in the nuclear industry that are taught and used to provide focus for root cause questioning: Event and Causal Factor Analysis, Barrier Analysis, and Change Analysis. It should be noted that none of these methods is merely the Cause and Effect Diagram advanced by Dr. Kaoru Ishikawa in his Guide to Quality Control. That tool is highly effective for considering all of the factors that may come into play while improving a process, but it does not necessarily offer a systematic approach to discovering root causes.
Event and Causal Factor Analysis
In some cases it is important to clarify the chronological sequence of events that led up to a deviation. This is particularly true if the deviation is an accident or incident that occurs within a temporal set of circumstances.
An Event and Causal Factor Analysis starts by defining the events (in chronological sequence) that happened leading up to the deviation (or accident, spill, explosion) and the events immediately following the deviation. These are organized as a flow chart using opaque boxes to describe each step before and after the deviation and showing the deviation in a transparent box, as illustrated in Figure 1, which is a diagram for an automobile accident investigation.(4)
Figure 1
Steps leading up to and after the event are placed in opaque boxes. The event itself is represented in a transparent box.
After identifying the stages leading up to the accident, you move into the questioning process. For each box, you ask, “What caused this?” or “What do we know about this?” or “How did this come about?” The diagram provides focus for pulling the thread. Place each answer that you develop on the chart. When you learn that the truck was moving too slowly for highway conditions, ask why again. When you learn that the truck was overloaded, continue to ask why. Place these revelations on the chart as ovals, with arrows connecting them to the appropriate action step.
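The chart-and-question cycle described above can be sketched as a simple data structure: event boxes in chronological order, with each "why" answer attached as an oval. The truck scenario below extends the text's example with hypothetical details for illustration.

```python
# A sketch of an Event and Causal Factor chart as a data structure.
# Event boxes are listed chronologically; each answer to "What caused
# this?" is attached to its event as an oval, mirroring the charting
# convention in the text. The specific entries are hypothetical.

events = [
    {"event": "Truck enters highway",
     "causes": []},
    {"event": "Truck moves too slowly for highway conditions",
     "causes": ["Truck was overloaded",          # first "why"
                "Load exceeded rated capacity"]}, # keep asking why
    {"event": "Car strikes truck from behind",   # the deviation itself
     "causes": []},
]

def unanswered(events):
    """Event boxes with no causal-factor ovals yet -- the next places
    to ask 'What caused this?' and keep pulling the thread."""
    return [e["event"] for e in events if not e["causes"]]

print(unanswered(events))
```

Walking the list of unanswered boxes gives the questioning process its focus: the analysis is not finished while any box in the chain lacks an explanation.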
Barrier Analysis (Safeguard Analysis)
Barrier Analysis offers another structured way to visualize events related to a deviation when some component of the system that was supposed to protect the work process failed or was not in place.(9) This methodology was developed for use in the nuclear industry but was embraced by the medical community after the Joint Commission on Accreditation of Healthcare Organizations launched a major campaign to reduce adverse and sentinel events. JCAHO's primary reference on root cause analysis was an article written by this author entitled "In Search of the Root Cause," published in Quality Progress in 1991.(10) Meri Curtis and I included Safeguard Analysis in Diagnosing and Preventing Adverse and Sentinel Events, published in 2001 for the healthcare profession.(11)
Barrier Analysis and Safeguard Analysis are used to identify barriers, safeguards, and controls that will remove or reduce hazards, enforce compliance with procedures, and make targets (including people) safe from hazards. The analysis is shown visually with a chart that identifies the source of a problem, safeguards that are in place, and the target or victim of an event, as shown in Figure 2.
Figure 2
A common example of a barrier analysis concerns the transportation of oil in tankers, most notably a well-known oil spill. The source of the problem is the crude oil being carried in a tanker. The potential target or victim is the marine life and shorelines in the area that will be affected by a spill.
There are several possible barriers (or safeguards) that will keep the oil from affecting the marine life and minimize the potential of an accident and the results of an accident. There are design barriers, training and qualification barriers, measurement equipment such as radar to keep from running aground, containment equipment, and clean-up procedures.
As a design barrier, an oil tanker might be designed with a double hull to prevent rupturing and the oil within might be stored in several internal holds. A tanker designed with a single hull and a small number of holds might be more efficient in passage, but ineffective as a barrier in the event that the tanker runs aground.
A specific number of staff, having special qualifications to handle certain valves or navigate certain waters, might be required to operate the tanker. However, if an inadequate crew is allowed to operate the ship and if an underqualified helmsman pilots the tanker, there will be no effective barrier against collision or grounding.
A calibration program will serve as a safeguard to ensure that the instrumentation being used (in this case, radar) provides the pilot with accurate information, if the calibration program is actually implemented.
Even if a tanker runs aground and begins to spill its cargo, strategically located containment and clean-up equipment can minimize the damage. However, if the equipment is not available or the operators are not well trained, the clean-up operation might be an inadequate barrier to protect the victim.
Barriers might be physical requirements, special procedures, or implemented assurance activities. An organization can impose upon itself administrative and verification controls that serve as barriers against potential problems. For instance, an organization might require that designs be independently reviewed by a qualified engineer, separate from the original designer, to ensure the quality of the design. The barrier analysis allows one to consider what barriers were in place and worked, what barriers were in place that failed, and to consider what barriers might have been effective but were not in place.
The identification of effective, ineffective, and missing barriers launches the process of asking questions and pulling the thread. Why was this barrier ineffective? Why did people work around the safeguards? Why was a vital potential barrier absent?
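The classification of barriers as effective, ineffective, or missing can be sketched as a small record that generates the thread-pulling questions. The barrier names and statuses below are illustrative assumptions for the tanker example, not findings from any actual investigation.

```python
# A sketch of a barrier (safeguard) analysis record for the tanker
# example in the text. Barrier names and statuses are hypothetical.

SOURCE = "Crude oil carried in a tanker"
TARGET = "Marine life and shorelines"

# status: "held"    -- in place and worked
#         "failed"  -- in place but did not work
#         "missing" -- would have helped but was not in place
barriers = {
    "Double hull design":              "missing",
    "Qualified helmsman on watch":     "failed",
    "Radar calibration program":       "failed",
    "Containment and clean-up gear":   "held",
}

def questions(barriers):
    """Generate the questions the analysis prompts for each weak barrier."""
    prompts = []
    for name, status in barriers.items():
        if status == "failed":
            prompts.append(f"Why was '{name}' ineffective?")
        elif status == "missing":
            prompts.append(f"Why was '{name}' not in place?")
    return prompts

for q in questions(barriers):
    print(q)
```

Barriers that held generate no questions; every failed or absent barrier becomes a thread to pull toward the root cause.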
It is well recognized that there is a hierarchy of effectiveness among safeguards and barriers. The most effective are physical safeguards that are built into the system, such as a double hull on a tanker or a lock on a door. Alarms can be an effective safeguard unless they are so common that people in the workplace ignore them, which has been the case in some control rooms and in some airplane cockpits. Written warnings have some potential value as safeguards, if the organization actually follows its procedures. Having procedures and following them are two different things and this is influenced by the culture of the organization. The weakest safeguards (and often the ones most preferred by management) are the administrative safeguards – the written policy – which may be completely ignored by employees and supervisors alike. Checklists, for example, may be effective safeguards, but only if they are rigorously used. Pull the thread in these areas and you will most likely get into the depths wherein the root causes exist.
Change Analysis
The third tool for facilitating the questioning process for root cause analysis is referred to as change analysis, which brings us back to Whitehead’s notion of what is going on all around us all of the time. In change analysis the questioner compares the present state of the system (the real non-functioning situation) with a prior point in time when the system was working effectively. This then prompts the question, “What has changed?” The object is to identify what has changed in the system between the time it worked and the time the deviation occurred. Investigating these changes will determine whether they had a significant effect.(7)
In many cases it is useful to ask questions about what has changed in the major components of a system that Dr. Kaoru Ishikawa used to organize his cause and effect analysis. Study what has changed among people, processes, equipment, material, and the environment. For each change, pull the thread to see what has happened and if it created the condition for the immediate cause of the deviation.
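The comparison at the heart of change analysis can be sketched as two snapshots of the system, organized by the Ishikawa categories just named, with a diff between them. All of the specific entries below are hypothetical assumptions for illustration.

```python
# A sketch of change analysis: compare a snapshot of the system when it
# worked against the present (non-functioning) state, organized by the
# categories named in the text. All entries are hypothetical.

baseline = {   # the system when it worked
    "people":      "Three qualified operators per shift",
    "process":     "Rev. 4 start-up procedure",
    "equipment":   "Original coolant pump",
    "material":    "Supplier A tubing",
    "environment": "Climate-controlled pump room",
}

current = {    # the system now, after the deviation
    "people":      "Two operators per shift",
    "process":     "Rev. 4 start-up procedure",
    "equipment":   "Original coolant pump",
    "material":    "Supplier B tubing",
    "environment": "Climate-controlled pump room",
}

def what_changed(baseline, current):
    """Answer 'What has changed?' -- each difference is a thread to pull."""
    return {category: (baseline[category], current[category])
            for category in baseline
            if baseline[category] != current[category]}

changes = what_changed(baseline, current)
print(changes)
```

In this sketch only the people and material categories differ between the two snapshots, so the questioning effort concentrates there rather than on the unchanged components.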
Taking Corrective Actions
The deviations that cause unwanted change may be corrected by taking actions that address the immediate cause of the deviation. Some actions directed at the immediate cause will be sufficient to resolve the problem and correct the situation, with no further analysis needed. Under the concept of a graded approach to quality, organizations are advised that the formality in identifying and correcting deviations should increase as the potential seriousness of a work process increases. Organizations should consider implementing a formal corrective action system to document when problems are reported, the immediate cause that is determined for a problem, the corrective action, and the verification that corrective actions were actually taken and were effective.
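A formal corrective action record of the kind described above can be sketched as a small data type capturing the four elements the text lists: the reported problem, the immediate cause determined, the action taken, and verification of effectiveness. The field names and the closure rule are assumptions for illustration, not a prescribed system.

```python
# A minimal sketch of a formal corrective-action record, capturing the
# elements the text lists: reported problem, immediate cause, corrective
# action, and verification. Field names are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CorrectiveAction:
    problem: str
    immediate_cause: Optional[str] = None
    action_taken: Optional[str] = None
    verified_effective: bool = False

    def is_closed(self):
        """A record closes only when action is taken AND verified effective."""
        return self.action_taken is not None and self.verified_effective

record = CorrectiveAction(problem="Coolant leak at pump P-2")
record.immediate_cause = "Cracked tubing joint"
record.action_taken = "Tubing replaced; work logged"
assert not record.is_closed()   # taking action is not enough...
record.verified_effective = True
assert record.is_closed()       # ...verification completes the record
```

The design choice worth noting is that the record cannot close on action alone: verification that the action was actually taken and was effective is a separate, required step, as the text advises.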
However, there is also the temptation to employ adaptive actions that allow you to temporarily live with a problem. As previously noted, the repeated reliance on adaptive actions may be appealing but it may undermine the configuration of a process or piece of equipment and can potentially lead to unforeseen and dire consequences.
When a significant deviation occurs, or when a pattern of deviations occurs in an organization, it is important not to stop the investigation once the immediate cause is understood. There is too much at stake to stop here.
Digging deeper to understand the root cause of a deviation enables leaders to more fully understand what is going on in a work system. Root causes often consist of belief systems that encourage people to ignore facts, break rules, take shortcuts, refuse to listen, and cover things up. The development of solutions to root causes often requires the use of methods for studying systems, such as Kurt Lewin’s approach to force field analysis. Leaders must be willing to state the undesired beliefs or practices and ask what forces inside and outside the organization are reinforcing these beliefs and what forces would be necessary to change these beliefs. This leads to developing strategies to weaken the forces that reinforce the undesired behaviors and beliefs and to strengthen the forces that will encourage a desirable change in beliefs and behaviors.
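The Lewin-style force field analysis described above can be sketched as two opposing sets of forces with estimated strengths. The belief, the forces, and the strength scale (1 = weak, 5 = strong) below are all hypothetical assumptions for illustration.

```python
# A sketch of Lewin-style force field analysis applied to an undesired
# belief of the kind the text describes. The belief, forces, and
# strengths (1 = weak, 5 = strong) are hypothetical.

undesired_belief = "Reporting problems gets the reporter blamed"

reinforcing = {   # forces holding the undesired belief in place
    "Past scapegoating of messengers": 5,
    "No anonymous reporting channel":  3,
}
driving = {       # forces that would encourage the desired change
    "Leaders publicly thank reporters": 2,
    "Blame-free incident reviews":      2,
}

def net_pressure(reinforcing, driving):
    """Positive value: the undesired belief currently wins the balance."""
    return sum(reinforcing.values()) - sum(driving.values())

print(net_pressure(reinforcing, driving))
```

The strategy the text describes works both sides of the ledger: weaken the reinforcing forces and strengthen the driving forces until the balance tips toward the desired beliefs and behaviors.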
(1) Charles Kepner and Benjamin Tregoe. The New Rational Manager. Princeton Research Press, 1982.
(2) Ibid.
(3) Ray Fielding. "Accident Investigation Methodology." Idaho National Engineering Laboratory, 1989.
(4) "Events and Causal Factors Charting." System Safety Development Center, EG&G, 1978.
(5) Dean Gano. "Root Cause and How to Find It." Nuclear News, August 1987.
(6) Max Ammerman. The Root Cause Analysis Handbook. Quality Resources, 1998.
(7) Mark Paradies. TapRooT Training Handouts.
(8) Kepner and Tregoe.
(9) Institute of Nuclear Power Operations. OE-904, Root Cause Evaluation of Human Performance Events, 1985.
(10) Dean Gano.
(11) John Robert Dew. “In Search of the Root Cause.” Quality Progress, March 1991.
(12) Linda Blankenship. “Accident Investigation Techniques.” Martin Marietta Energy Systems, 1987.