Root cause analysis (RCA) refers to the process of uncovering the primary sources of equipment, production, and system problems to eliminate future upsets. Instead of just focusing on “the symptom,” RCA focuses on:
- Identifying the problem
- Analyzing how it happened
- Developing steps for prevention
However, it’s important to recognize that this problem-solving process isn’t a one-size-fits-all methodology. There are several different processes, tools, and philosophies used to accomplish RCA in the maintenance world.
RCA Basics
World-class maintenance programs work smarter, not harder. This requires conducting deeper diagnoses to identify underlying causes to fix problems once and for all. Often, equipment failure isn’t caused by a single mishap. Instead, it’s the result of several process failures, technical issues, and oversights that could be avoided with proper planning. RCA takes a systematic approach to problem-solving, ensuring diagnostic conclusions are always backed by evidence.
Although mostly used in corrective maintenance situations, root cause analysis becomes proactive maintenance when performed correctly. As previously mentioned, RCA has the following components:
- Defining the Problem: Beyond describing the issue, defining the problem involves recognizing its impact on organizational goals.
- Performing the Analysis: Problems are broken down using Cause Maps, which seek to outline all possible causes within every step of the workflow.
- Implementing the Solution: Ideal solutions are implemented to change maintenance processes, protect company goals, and achieve stable production.
To get to the root cause of equipment failure and streamline production, organizations often use a combination of the Five Whys Technique (covered below) and the Lean Sigma Strategy.
When conducting RCA, cross-functional teams must work together to understand the problem from all viewpoints. This ensures an all-around solution is developed to deal with the problem. Industries that most commonly use RCA include telecommunications, manufacturing, nuclear energy, aviation, rail transport, and healthcare.
Examples of Root Cause Analysis
Here are a few examples of RCA application in real-life situations:
- Turbine Failure: Say a turbine fails because it’s out of balance. On the surface, the problem appears to be caused by misalignment. However, a root cause analysis may reveal the misalignment occurred because preventive maintenance (PM) inspections weren’t conducted in time. Recurring work order inspections might now be scheduled to reduce the likelihood of future turbine failures. In this case, the root cause of the problem was a procedural issue.
- Blown Fuse: Say a machine overloads and blows a fuse. Initial investigations reveal the incident was probably caused by an insufficiently lubricated bearing. At this point, the facility could simply replace the broken parts. However, it’s only a matter of time before the problem occurs again. A root cause analysis may reveal the pump responsible for automatically lubricating the bearing was damaged from a lodged metal scrap. In this instance, the root cause of the problem would be a design failure.
Notably, RCA is also used in industrial processes for quality control, in IT and communications to investigate security breaches, and in systemic risk and change management. In healthcare, RCA is useful in medical diagnosis and in identifying the source of infectious diseases in epidemiology.
The Six Phases of Root Cause Analysis
An organization should decide which problems will be subjected to RCA before starting the process. Failure can be a result of procedural, material, equipment, personnel, or environmental factors. Organizations should give priority to problems that impact safety, production, or costs. A complete RCA cycle usually has six phases:
Phase 1: Note All the Possible Causes
First, list all the potential causes of a problem or an occurrence. Organizations should embrace neutrality and focus on facts as they strive to put everything related to the problem in context. If facts aren’t available, then the organization should look for secondary sources or construct possible scenarios. Historical data can help to understand the problem better.
Phase 2: Gather Additional Data and Evidence
In this phase, the Five Whys come into play. The Five Whys is simply a method for drilling down to find the root cause of any given maintenance mishap. For example:
- Why did the maintenance tech slip?
- Because there was an oil leak near the equipment.
- Why was there a leak?
- Because a seal had deteriorated.
- Why had the seal deteriorated?
- Because it was too weak for its application.
- Why was the seal too weak for its use?
- Because we purchased a cheaper seal.
- Why did we purchase a cheaper seal?
- The service manual didn’t specify a seal type.
Essentially, the organization should collect as much relevant data as possible toward identifying the issue’s underlying cause. Sources of information may include data from computerized maintenance management systems (CMMS), other databases, digital files, and documents. Facility managers and personnel can also be asked to provide clarity on the problem.
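The Five Whys chain above can be recorded as a simple list of question-and-answer pairs. The following is a minimal Python sketch, not a prescribed tool: the `Why` class and the `root_cause` function are hypothetical names introduced here for illustration, and the sketch assumes the last answer in the chain is the candidate root cause.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Why:
    """One step in the chain: a "why" question and the answer the team found."""
    question: str
    answer: str


def root_cause(chain: list[Why]) -> str:
    """Return the answer to the last "why", treated here as the candidate root cause."""
    if not chain:
        raise ValueError("the chain needs at least one question")
    return chain[-1].answer


# The slip investigation from the example above, one entry per "why".
slip_chain = [
    Why("Why did the maintenance tech slip?", "There was an oil leak near the equipment."),
    Why("Why was there a leak?", "A seal had deteriorated."),
    Why("Why had the seal deteriorated?", "It was too weak for its application."),
    Why("Why was the seal too weak for its use?", "We purchased a cheaper seal."),
    Why("Why did we purchase a cheaper seal?", "The service manual didn't specify a seal type."),
]

for depth, step in enumerate(slip_chain, start=1):
    print(f"{depth}. {step.question} -> {step.answer}")

print("Candidate root cause:", root_cause(slip_chain))
```

Keeping each answer next to its question makes it easier to attach evidence to every step later, so the conclusion isn't resting on the final answer alone.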
Phase 3: Identify Contributing Factors
Everything that contributed to the problem should be identified. Every change, no matter how small, should be noted, and evidence of those changes should be gathered alongside evidence of the main problem. Evidence can be physical, paper, personnel, or recorded.
Phase 4: Analyze Data
At this stage, the organization should analyze the collected data and evidence. Events and changes should be categorized depending on the level of influence the organization has over them. Events can be unrelated, a correlating factor, a contributing factor, or the root cause.
- Unrelated events have no effect or impact on the problem.
- Correlating factors may be related to the problem but may not have a direct impact.
- Contributing factors have a direct relation to the problem. They can either be wholly or partly responsible for the problem.
- The root cause, as defined earlier, is the underlying problem the organization seeks to address.
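The four categories above can be captured as labels attached to each finding. The sketch below is only an illustration: the `Relation` enum, the `group_by_relation` function, and the findings themselves are hypothetical, loosely based on the blown-fuse example, and each label reflects an analyst's judgment rather than anything the code works out.

```python
from collections import defaultdict
from enum import Enum


class Relation(Enum):
    """How an event relates to the problem under analysis."""
    UNRELATED = "unrelated"
    CORRELATING = "correlating factor"
    CONTRIBUTING = "contributing factor"
    ROOT_CAUSE = "root cause"


# Hypothetical findings; the labels are assigned by the analyst, not inferred.
findings = [
    ("Machine was repainted last month", Relation.UNRELATED),
    ("Ambient temperature was higher than usual", Relation.CORRELATING),
    ("Bearing was insufficiently lubricated", Relation.CONTRIBUTING),
    ("Metal scrap lodged in the lubrication pump", Relation.ROOT_CAUSE),
]


def group_by_relation(items):
    """Group finding descriptions under the relation assigned to them."""
    grouped = defaultdict(list)
    for description, relation in items:
        grouped[relation].append(description)
    return dict(grouped)


grouped = group_by_relation(findings)
for relation in Relation:
    print(f"{relation.value}: {grouped.get(relation, [])}")
```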
Phase 5: Develop a Plan to Prevent Future Occurrences
At this stage, the goal is to develop a plan to curb future breakdowns. The organization aims to eliminate the root cause of the problem. Examples include a change in suppliers, a new approach to maintenance, and more robust operator training. The solution should apply to more than a single situation and be repeatable. Preventive actions developed in this phase shouldn’t lead to other problems.
Phase 6: Implement the Plan
Lastly, the plan developed in the previous phase needs to be implemented. What to consider in this phase depends on the type of problem, its severity and complexity, and the prevention plan itself. Factors include:
- Maintenance personnel who handle the assets
- Non-maintenance personnel who handle the assets
- Status and condition of the assets
- Maintenance processes related to the assets
- Processes that impact the root cause but have no relation to asset maintenance
After completing the RCA cycle, periodic reviews should be performed to determine the effectiveness of the RCA exercise. The organization should weigh the costs of the RCA process against the costs of equipment downtime and other impacts caused by the problem.
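The cost comparison described above comes down to simple arithmetic. The sketch below is one possible way to frame it, under stated assumptions: the `rca_net_benefit` function is a hypothetical name, all figures are made up for illustration, and the model assumes every avoided failure would have cost the same amount.

```python
def rca_net_benefit(rca_cost, cost_per_failure, failures_avoided):
    """Savings from avoided failures minus what the RCA exercise cost.

    A positive result suggests the RCA paid for itself over the review period.
    """
    return cost_per_failure * failures_avoided - rca_cost


# Hypothetical figures for one review period.
net = rca_net_benefit(rca_cost=8_000, cost_per_failure=5_000, failures_avoided=3)
print(net)  # 7000
```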
Popular Tools in Root Cause Analysis
The most common tools to conduct root cause analysis include:
- The Five Whys: As previously mentioned, “The Five Whys” is a problem-solving method designed to uncover the underlying source of any issue. The point is to ask “why” until a problem’s root cause is revealed. The method was popularized by the [Toyota Production System](https://global.toyota/en/company/vision-and-philosophy/production-system/) during the 1970s, and the number five is admittedly arbitrary. Some issues may be resolved by drilling down with three questions, while others may require several more. Ultimately, The Five Whys provides a simple path for organizations wishing to move from reactive maintenance to preventive maintenance strategies; it allows for systematic problem-solving without statistical analysis.
- Fault Tree Analysis: Fault Tree Analysis takes a top-down approach to identify the root cause of a problem. It starts from the failure itself (the top event) and works downward, adding each contributing event to a tree and linking the events with logical AND/OR gates before analyzing them.
- Pareto Analysis: This is a statistical technique that ranks causes by their contribution to the overall effect so that only the most significant ones are analyzed. It’s based on the Pareto Principle, which holds that roughly 80 percent of problems result from 20 percent of causes.
Other tools sometimes used in root cause analysis include Barrier Analysis, Change Analysis, Fishbone (Ishikawa) Diagrams, Failure Tree Analysis, and Failure Mode and Effects Analysis.
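As a rough illustration of the Pareto Analysis described above, the sketch below ranks causes by failure count and keeps the top ones until they account for 80 percent of all failures. The `vital_few` function, the 0.8 threshold default, and the failure counts are all hypothetical; real counts would come from a source such as CMMS work order history.

```python
from collections import Counter


def vital_few(cause_counts, threshold=0.8):
    """Return the causes, most frequent first, that together reach the threshold share."""
    total = sum(cause_counts.values())
    if total == 0:
        return []
    selected = []
    running = 0
    for cause, count in Counter(cause_counts).most_common():
        selected.append(cause)
        running += count
        if running / total >= threshold:
            break
    return selected


# Hypothetical failure counts per cause.
failure_log = {
    "lubrication": 45,
    "misalignment": 38,
    "operator error": 8,
    "electrical": 5,
    "contamination": 4,
}

print(vital_few(failure_log))  # ['lubrication', 'misalignment']
```

In this made-up data set, two of the five causes account for 83 percent of failures, so those two would be the first candidates for a full RCA.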
How Do You Perform a Root Cause Analysis?
No matter the specific tool or problem-solving method used, the effectiveness of RCA depends on deeply digging into the following:
- Preparing Data: It’s essential first to collect all the key information. This includes an account of events and the number of unscheduled downtimes, among other metrics.
- Mobilizing Personnel: Secondly, a cross-functional team should be put together to solve the problem. This ensures that the solution developed covers all bases and is effective.
- Addressing Process Issues: Even though 12 percent of failures result from operator error, the focus should be on procedural problems. Too often, organizations blame someone for a failure instead of addressing the process gap that allowed human error to occur.
- Asking Why: A comprehensive approach is important if the team is to uncover the root cause of the problem. The process should continue to ask “Why” until the team gets to the bottom of the matter.
Finally, some problems have multiple root causes. That’s why organizations should look at all areas and possibilities when addressing a problem.
Other FAQs
How Long Should a Root Cause Analysis Take?
A single RCA cycle can wrap up within weeks or drag on for a few months. It depends on several factors, such as the complexity of the problem, the complexity of the troubleshooting tests, the availability of data and witnesses, regulatory requirements, how deep the investigations will go, and the level of proof needed to support conclusions.
What Is the Relationship between Root Cause Analysis (RCA) and Reliability-Centered Maintenance (RCM)?
RCA looks to answer why a failure occurred. On the other hand, RCM seeks to identify different failure modes for an asset or process. RCA is reactive, while RCM is proactive. Even with a proactive approach to maintenance, failure is inevitable. By performing RCA, organizations can better understand how assets and processes fail and how to stop the failures from recurring. This helps them to increase proactive maintenance.
Caroline Eisner
Caroline Eisner is a writer and editor with experience across the profit and nonprofit sectors, government, education, and financial organizations. She has held leadership positions in K16 institutions and has led large-scale digital projects, interactive websites, and a business writing consultancy.