Root Cause Failure Analysis (RCFA) Overview

RCFA is a systematic and structured process to identify the most effective solutions to eliminate completely, or to manage to the satisfaction of the asset owner, the causes of failures, events or incidents that prevent the asset from achieving and sustaining the business objectives. RCFA is an important part of the continuous improvement process: the aim is not just to “put out fires” but to prevent them from happening or recurring.
The goal of an RCFA is to identify:
  1. What happened
  2. How it happened
  3. Why it happened
  4. Actions to prevent reoccurrence
There are many different root cause failure analysis methodologies. The “5 Whys” technique was first used during the development of Toyota’s manufacturing process in the 1950s. Other tools exist such as:
  1. Barrier analysis
  2. Change analysis
  3. Defect elimination
  4. Cause and Effect analysis
  5. Fishbone (Ishikawa)
  6. Kepner-Tregoe
ACTOR supports both the informal "5 Whys" process and the formal RCFA process using cause and effect diagrams. The “Informal” approach is used when the effect of the event or failure falls inside the threshold or trigger criteria. It is normally used for on-the-job problem solving or when more detailed feedback is required for work management and control. 
The Formal RCFA process should be used on critical equipment failures or failures with a complex interaction of causes. This approach is used when the effect of the event or failure falls outside the threshold or trigger criteria requiring a formal RCFA approach. 
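
The choice between the two approaches can be sketched as a simple check against trigger criteria. The sketch below is illustrative only: the article does not list the actual criteria, so the fields `downtime_hours`, `cost` and `critical_equipment`, the threshold values, and the function name `select_approach` are all assumptions, not ACTOR settings.

```python
from dataclasses import dataclass


@dataclass
class TriggerCriteria:
    """Hypothetical thresholds; real trigger criteria are set by the asset owner."""
    max_downtime_hours: float = 8.0
    max_cost: float = 50_000.0


@dataclass
class Event:
    description: str
    downtime_hours: float
    cost: float
    critical_equipment: bool = False


def select_approach(event: Event, criteria: TriggerCriteria) -> str:
    """Return "formal" when the event falls outside the trigger criteria, else "informal"."""
    outside = (
        event.critical_equipment
        or event.downtime_hours > criteria.max_downtime_hours
        or event.cost > criteria.max_cost
    )
    return "formal" if outside else "informal"
```

With these assumed thresholds, a two-hour stoppage on non-critical equipment would be handled with the informal "5 Whys" process, while any failure on critical equipment would go to the formal process.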

The Formal Approach

The formal RCFA process should be used on critical equipment failures or failures with a complex interaction of causes. RCFA is usually triggered by an undesirable event, which can manifest in different ways. The two main categories are sporadic events and chronic events. Sporadic events are normally once-off failures that lead to a high-consequence event. Chronic failures occur over a period of time and may accumulate until the situation becomes unacceptable to the organization.

Steps to be followed for the formal approach

  1. Define the Undesirable Event. It is important that the actual event is clearly defined using objective measures, because different people often look at the same problem but interpret it differently. When defining the actual problem, do not try to identify the solution - DEFINE THE PROBLEM ONLY! The Undesirable Event description should:
    1. Provide a detailed description of the event as it was observed or experienced (based on facts and not assumptions).
    2. Describe HOW the event took place, not WHY!
    3. State where and when the event occurred, the timing, the sequence of events and any pattern or trend of the undesirable event.
  2. Define equipment and boundaries for analysis. Once the Undesirable Event has been defined, it is important to define the equipment, system or process involved and the boundaries of the analysis. The boundaries of the system under examination may expand once the analysis is underway because the underlying root cause of the problem may be within an external system, e.g., “insufficient instrument air supply”. Regardless, it is still important to manage the boundaries of the analysis.
  3. Define the Problem statement and/or performance gap.
The problem statement is a functional description of the Undesirable Event/Failure. The statement includes:
  1. The definition of the function that was affected by the undesirable event - What is the equipment, process or system supposed to do or what is the equipment not doing because of the Undesirable Event?
  2. The impact on safety (if any).
  3. The impact on the environment (if any).
  4. The impact on production including loss of production.
A function statement always starts with the word “TO” and contains a “VERB”, an “OBJECT” and at least one “PERFORMANCE STANDARD.”
For example: “To safely transport 20 tons of broken coal from coalface to the tipping point at a maximum speed of 2.1 mph.” 
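
The structure of a function statement can be sketched as a small data type that refuses to build a statement without a verb, an object and at least one performance standard. This is a minimal illustration, not part of ACTOR; the class name `FunctionStatement` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionStatement:
    """Hypothetical helper: "To" + verb + object + performance standard(s)."""
    verb: str
    obj: str
    performance_standards: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.verb.strip() or not self.obj.strip():
            raise ValueError("a function statement needs a verb and an object")
        if not self.performance_standards:
            raise ValueError("at least one performance standard is required")

    def render(self) -> str:
        # Join the parts in the order: To, verb, object, standards.
        standards = " ".join(self.performance_standards)
        return f"To {self.verb} {self.obj} {standards}."


statement = FunctionStatement(
    verb="transport",
    obj="broken coal",
    performance_standards=[
        "from coalface to the tipping point",
        "at a maximum speed of 2.1 mph",
    ],
)
print(statement.render())
```

Forcing the performance standard at construction time mirrors the rule above: a statement such as “To transport coal” with no measurable standard gives the analysis nothing to compare the failure against.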
  4. Perform a Combined Cause and Effect analysis.
Failures can be caused by a combination (chain) of events or a system of causes and conditions; there is not necessarily one correct answer to the problem. This requires a thinking process that first views the problem holistically (defining and quantifying it) and then analyzes the relationships between the problem's causes and conditions.
For example: What is the cause of a car being stuck on the highway? Start by listing the “obvious” things that can cause the problem and then work down the chain. We need to understand the relationship (links) between the different causes and contributing conditions and then determine the most effective way of breaking the link(s) in the chain to prevent the Undesirable Event.
The main characteristics of the Combined Cause & Effect Analysis:
  1. Causes can become effects depending on the level of analysis.
  2. Causes and effects can become a “never ending” chain of causes and effects.
  3. We need to establish whether a specific condition(s) other than the cause, could have contributed to the previous level effect; these are Contributing Conditions.
  4. Sometimes the presence of a functional secondary protective system or device could have prevented the chain of events from developing. This is called a “Barrier Analysis.” For example, when a breaker could have prevented a fault or when a pressure relief valve could have prevented overpressure.
  5. An effect exists only if its causes and conditions exist at the same point in time and space.
  6. Cause and effect analysis should consider all categories of possible causes.

Barrier Analysis:
Sometimes the presence of a functional secondary protective system or device (barrier) can prevent the effect or could reduce its severity.
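
The relationships described above (causes that are themselves effects, contributing conditions, and barriers) can be sketched as a small tree structure. This is an illustrative model only, not how ACTOR stores its cause and effect diagrams; the class names, the `leaf_causes` helper and the details of the car example (empty tank, warning light) are assumptions added for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    name: str
    functional: bool  # True if the protective device worked as intended


@dataclass
class Cause:
    description: str
    conditions: list[str] = field(default_factory=list)   # contributing conditions
    barriers: list[Barrier] = field(default_factory=list)
    causes: list["Cause"] = field(default_factory=list)   # next level down the chain

    def effect_occurs(self) -> bool:
        """The effect propagates only if no barrier on this link was functional."""
        return not any(b.functional for b in self.barriers)


def leaf_causes(node: Cause) -> list[str]:
    """Collect the lowest-level causes, i.e. where the analysis stopped."""
    if not node.causes:
        return [node.description]
    found: list[str] = []
    for child in node.causes:
        found.extend(leaf_causes(child))
    return found


# Hypothetical chain for the "car stuck on the highway" example.
engine_stopped = Cause(
    "Engine stopped",
    conditions=["Long distance between fuel stations"],
    barriers=[Barrier("Low-fuel warning light", functional=False)],
    causes=[Cause("Fuel tank empty")],
)
top_event = Cause("Car stuck on the highway", causes=[engine_stopped])
print(leaf_causes(top_event))
```

Each node is both an effect (of its children) and a cause (of its parent), which is why the same class is used at every level.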



To complete the Cause-and-Effect Analysis, initial “assumptions” must be identified to ensure an auditable and credible result. 
  5. Identify Effective Solutions.
  1. An effective solution is a specific action that when applied to the Root Cause of an Undesirable Event results in the elimination of the Cause, or the management of the effect(s) of the Root Cause, to a level defined as tolerable.
  2. RCM proactively seeks to identify failure management strategies to manage the risks associated with likely Failure Causes or Failure Modes. Similarly, RCFA (reactively) seeks to manage risks associated with Failure Modes that have already occurred through considering the same strategies. 
  6. Implement Effective Solutions.
Implementing the recommended Effective Solutions is probably the most important step in the process. This will not only ensure the return on investment and realized opportunities, but also the total buy-in and commitment from:
  1. Management
  2. All people who participated in the process
Successfully implementing the suggested Effective Solutions will ensure the long-term endurance of the RCFA program. 
To ensure the solutions are sustainable, the team needs to:
  1. Demonstrate the value.
  2. Present verified facts, not assumptions.
  3. Demonstrate sustainability of results.  
  7. Monitor results as part of continuous improvement.
Once the solutions to prevent the recurrence or manage the consequences of the Undesirable Event have been identified and implemented, continuous monitoring of the results must take place to ensure that the solutions are effective and the benefits of implementing the solutions can endure.

The Outcome of RCFA Analyses 

The outcome of the RCFA analysis is a set of action items to prevent recurrence, including:
  1. Executive summary
  2. Findings
  3. Recommendations and responsibilities
    1. New or updated maintenance tasks
    2. One-time changes 
      1. Redesigns and modifications
      2. Operating and maintenance procedures
      3. Training 
  4. Cause and effects worksheets
  5. Supporting documentation 

The 5 Whys Approach

The 5 Whys process is an informal approach to Root Cause Analysis and is used for incidents or failures with less critical impacts. By repeatedly asking the question "Why?" (five is a good rule of thumb), you can peel away the layers of symptoms which can lead to the root cause of a problem. Very often the ostensible reason for a problem will lead you to another question. Although this technique is called "5 Whys," you may find that you will need to ask the question fewer or more times than five before you find the issue related to a problem.

The steps of the “5 Whys” technique

  1. Identify and define the equipment/process on which the event or failure has happened. The first step is to identify an event, incident or failure that matters to an organisation or has met a trigger for an RCFA “5 Whys” analysis to prevent a repeat episode. The equipment or process could involve physical equipment failure, system failure, procedural failure etc. It is critical to ensure initial evidence is preserved and data is collected for the RCFA investigation.
  2. Identify the people involved or who can help with the investigation (artisan, operator, foreman, CBM technician). These include:
    1. People who were directly involved or affected.
    2. People who are knowledgeable about the equipment or process involved in the undesirable event.
  3. Define the problem (event or failure). The most important step in an RCFA process is to define the problem and verify all participants understand and agree upon the problem.
    1. Write down and communicate the problem to be analysed.
      1. How did the failure become evident? (What was observed?)
      2. Where did the event occur?
      3. When did the event occur?
      4. What is the evidence of the event (Sequence of events)?
    2. Reach consensus on the definition of the problem. 
  4. Brainstorm and list the possible causes. Brainstorming combines an informal approach to problem solving with lateral thinking. It encourages participants to come up with thoughts and ideas that can, at first, seem a bit crazy. Some of these ideas can be crafted into original, creative solutions to a problem, while others can spark even more ideas. This helps to get people unstuck by "jolting" them out of their normal ways of thinking.
    1. Brainstorming is useful because it can help a group of people utilise its collective brainpower to generate many ideas in a short period of time. It stimulates creativity and promotes involvement and participation.
      1. Assemble a diverse group for input to the analysis.
      2. Use observable, verifiable data to describe problems and effects.
      3. Refrain from blame or judgement of ideas during divergent thinking steps.
      4. Identify assumptions and biases as they arise only in convergent thinking steps. 
  5. Select the most likely cause (and justify why it was selected). Based on knowledge, information and mutual consensus select the most likely cause out of the ones listed.
  6. Apply the “5 Whys” principle (on the selected “most likely” cause). The 5 Whys method is the simplest Root Cause Analysis process and involves repeatedly asking “Why?” at least five times or until the question yields no answers. Five is an arbitrary figure; success may sometimes require more than 5 “whys?” before the actual root cause is identified, but after asking “why?” five times, one is likely to arrive at the root cause. The root cause has been identified when asking “why?” no longer provides any useful information. This method produces a linear set of causal relationships and uses the experience of the problem owner to determine the root cause and corresponding solutions.
  7. Repeat asking why until the “Root Cause” is identified. Verify the 5 Whys logic in the following way:
    1. A logical link between the event/failure being analyzed and the identified causes must exist going “down” the cause chain.
    2. A logical link between the identified solution, causes and the event/failure being analyzed must exist going “up” the cause chain. 
  8. Suggest possible solutions. At the lowest root cause level, propose solutions to prevent reoccurrence of the problem. Limit solutions to those that can be implemented within the organization’s control. 
  9. Validate the solution. 
    1. Test the solution to ensure that it will prevent the identified problem from happening again or will satisfactorily manage risk (reduce risk to a tolerable level).
    2. Ensure that the solution will be “easy” to implement.
    3. Ensure that the solution will be cost effective. 
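
Steps 6 and 7 above can be sketched as a linear chain that is read in both directions to check its logic. This is a minimal illustration under stated assumptions: the class `FiveWhys`, its method names and the pump example are hypothetical and not part of ACTOR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FiveWhys:
    """Hypothetical linear cause chain built by repeatedly asking "Why?"."""
    problem: str
    answers: list[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        """Record the answer to "Why?" asked of the previous link."""
        if not answer.strip():
            raise ValueError("an empty answer means the chain has ended")
        self.answers.append(answer.strip())

    @property
    def root_cause(self) -> Optional[str]:
        # The last recorded answer is the candidate root cause.
        return self.answers[-1] if self.answers else None

    def read_down(self) -> list[str]:
        """Problem to root cause: each effect happened 'because' of its cause."""
        chain = [self.problem, *self.answers]
        return [f"{effect} because {cause}" for effect, cause in zip(chain, chain[1:])]

    def read_up(self) -> list[str]:
        """Root cause to problem: each cause, 'therefore' its effect."""
        chain = [self.problem, *self.answers][::-1]
        return [f"{cause}, therefore {effect}" for cause, effect in zip(chain, chain[1:])]


analysis = FiveWhys("The pump tripped on high vibration")
analysis.ask_why("the bearing failed")
analysis.ask_why("the bearing ran without lubrication")
analysis.ask_why("the greasing task was missed")
for sentence in analysis.read_down() + analysis.read_up():
    print(sentence)
```

If any sentence printed by `read_down` or `read_up` does not make sense when read aloud, the corresponding link in the chain is not logical and should be revisited before solutions are proposed.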

 