PT Notes

Part 1 - Understanding Dependent Failures in Process Safety

PT Notes is a series of topical technical notes on process safety provided periodically by Primatech for your benefit. Please feel free to provide feedback.

Introduction

Dependent failures in processes must be identified and managed as they can result in catastrophic process safety incidents. Dependent failures occur when the failure of one component or element of a system is not independent of the failure of another. The likelihood of one component failing is influenced by the failure of another component, or by a common cause or condition. This type of failure contrasts with independent failures, where the failure of one component does not affect the likelihood of another component failing. Dependencies can arise from a variety of factors, including design, operational processes, environmental conditions, and human interactions with a process.

The significance of dependent failures is that they can be as likely to occur as independent single failures yet are harder to identify and address.

Dependent failures are particularly important in complex systems where components in processes are interconnected or influenced by shared external factors. This situation is common in the process industries. Understanding and managing these dependencies is crucial for accurate risk assessments and effective system design, especially in safety-critical environments.

Types of Dependent Failures

The starting point for addressing dependent failures is understanding ways in which they can arise. They include:

Common Cause Failures (CCFs): CCFs occur when two or more components or systems fail due to a single shared cause to which each component or system is vulnerable. The cause can be a design flaw, manufacturing defect, human error, or external event (such as flooding or another environmental condition). CCF is a broad category that encompasses any scenario where multiple failures have a single root cause. An example of CCF is a power surge that damages multiple electronic components in a system simultaneously.

CCFs are of particular concern in redundant systems where multiple backup components are in place, under the assumption that they will fail independently. If a common cause leads to the failure of all redundant components, the system's reliability can be compromised.
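The effect of a common cause on redundancy can be illustrated with a short calculation. The sketch below uses the beta-factor model, a commonly used simplification that is not part of this note: an assumed fraction `beta` of each channel's failures is attributed to a shared cause that fails both channels together. The function name and the numerical values are illustrative assumptions only.

```python
def pfd_1oo2(p_single: float, beta: float = 0.0) -> float:
    """Approximate probability that both channels of a redundant pair fail on demand.

    p_single: probability that a single channel fails on demand (illustrative).
    beta: assumed fraction of channel failures that are common cause
          (0 means the channels are treated as fully independent).
    """
    if not (0.0 <= p_single <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("p_single and beta must lie in [0, 1]")
    independent_part = ((1.0 - beta) * p_single) ** 2  # both fail for unrelated reasons
    common_cause_part = beta * p_single                # one shared cause fails both
    return independent_part + common_cause_part


# Hypothetical numbers: each channel fails 1 time in 100 demands.
p = 0.01
independent = pfd_1oo2(p)            # redundancy credited in full
with_ccf = pfd_1oo2(p, beta=0.05)    # 5% of failures assumed to be common cause
print(f"independent: {independent:.2e}, with CCF: {with_ccf:.2e}")
print(f"ratio: {with_ccf / independent:.1f}")
```

Under these assumed values, even a small common cause fraction dominates the result, which is why the independence assumption in redundant designs deserves scrutiny.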

A notable example of a process safety accident that involved common cause failures is the Flixborough disaster, which occurred on June 1, 1974, at a chemical plant in Flixborough, England. A massive explosion occurred at the plant that killed 28 people, seriously injured 36 others, and caused substantial damage to the plant and nearby homes. The CCF was not a failure of a physical component but rather a systemic failure in safety management and organizational culture. This failure manifested in multiple unsafe decisions and actions, leading to the catastrophic explosion.

Common Mode Failures (CMFs): CMFs are failures of multiple components in the same way, or mode, but not necessarily from a single shared cause. The failures do not necessarily result from the same event or condition, but they manifest in the same or a similar manner. These failures are often due to a shared vulnerability or design flaw. For example, multiple control valves in a process plant might fail to operate for different reasons, such as corrosion or material degradation, control system error, power supply failure, or mechanical failure. The common mode of failure is the inability of the valves to regulate the flow as required. Regardless of the different causes, the outcome is the same, that is, the failure of the control valves to function. This is not a CCF because the failures are not triggered by a single shared cause. Instead, they are due to varied and independent causes.

A CMF can also be a CCF, when two or more components or systems fail in the same way due to a shared cause. For example, several pressure relief valves may fail to open under high pressure due to a shared design flaw. The shared cause (design flaw) leads to a similar failure mode (valves not opening). The distinction between CCFs and CMFs lies in the similarity of the failure mode in CMFs, as opposed to the broader range of failure manifestations that can occur in CCFs.

Recognizing whether a failure is likely to be a CCF or a CMF guides the development of appropriate prevention and mitigation strategies. Strategies for CCFs often focus on external protections and redundancy, whereas for CMFs, the focus might be on diversification and improving design or manufacturing quality of components.

Cascade Failures: These failures occur when the failure of one component triggers a chain of failures in interconnected components or systems, often due to direct physical or functional connections. The subsequent failures often spread across different components or systems and are not necessarily immediate or direct. For example, the failure of a cooling system that supports multiple processes can lead to overheating and subsequent failure of the processes that depend on it.

Cascade failures are characterized by the progressive and often escalating nature of the failures. They can have severe consequences, especially in complex and highly interconnected environments. The term “domino effect” is synonymous with a cascade failure. It emphasizes the interconnected and sequential nature of the failures that leads to a much larger and often more catastrophic event than the initial trigger event.
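The sequential spread of a cascade can be sketched by walking a dependency map outward from the initial failure. The example below is a minimal illustration, not a method described in this note. The plant items are hypothetical, and the code makes the pessimistic assumption that an item fails as soon as any one of the items it depends on fails, with no safeguards credited.

```python
from collections import deque


def cascade(depends_on: dict[str, list[str]], initial_failure: str) -> list[str]:
    """Return the items that fail, in propagation order, after initial_failure.

    depends_on maps each item to the items it needs in order to keep running.
    Assumes an item fails as soon as any one of its dependencies has failed.
    """
    # Invert the map so that each item points to the items that depend on it.
    dependents: dict[str, list[str]] = {}
    for item, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, []).append(item)

    failed = [initial_failure]
    seen = {initial_failure}
    queue = deque([initial_failure])
    while queue:  # breadth-first walk through the dependents
        current = queue.popleft()
        for item in dependents.get(current, []):
            if item not in seen:
                seen.add(item)
                failed.append(item)
                queue.append(item)
    return failed


# Hypothetical plant: names are illustrative only.
plant = {
    "cooling_system": ["cooling_pump"],
    "reactor_A": ["cooling_system"],
    "reactor_B": ["cooling_system"],
    "distillation": ["reactor_A"],
    "flare": [],
}
print(cascade(plant, "cooling_pump"))
# ['cooling_pump', 'cooling_system', 'reactor_A', 'reactor_B', 'distillation']
```

In this sketch the flare is unaffected because nothing links it to the failed pump. In a real assessment, safeguards and redundancy would interrupt some of these paths.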

A notable example of a process safety accident that involved cascade failures is the Formosa Plastics explosion of April 23, 2004. The explosion resulted in five fatalities and several injuries. It also led to significant damage to the plant and forced the evacuation of nearby residents. A runaway reaction in a reactor set off a chain of events, starting with pressure build-up and leading to the release of flammable vapors, ignition, explosion, and subsequent fires.

Synchronous Failures: These are failures that occur at the same time, or in a closely related time frame, in multiple different components or systems, typically due to simultaneous exposure to the same external event or environmental conditions. They might or might not have the same underlying cause.

If the simultaneous timing of failures is due to a shared cause, the synchronous failure is a CCF. An example is the simultaneous failure of several electronic devices due to an electromagnetic pulse. Such synchronous failures typically occur due to simultaneous exposure to the same external conditions.

Synchronous failures can also occur independently due to unrelated causes happening to coincide in time. For example, a reactor in a process plant might overheat due to a faulty temperature controller, which triggers the shutdown of the reactor operation. At the same time, a lightning strike causes a power surge that is unrelated to the reactor issue. The surge causes several critical pumps used in various parts of the plant to stop operating, which disrupts the flow of process chemicals. The synchronicity of the reactor overheating and the pump stoppages results in a compounded impact on the plant's operations that requires the plant to be shut down.

Synchronous failures are of concern in systems where multiple components are expected to operate independently, and their simultaneous failure can lead to significant system-level impacts.

Resource Sharing Failures: These failures occur when multiple components or systems that are reliant on a shared resource (such as power, cooling, or network connectivity) fail due to the depletion or failure of that resource. The shared resource represents a single point of failure. Resource sharing failures are a type of CCF. For example, several reaction vessels in a process plant might share a central cooling system. The cooling system could fail for various reasons, such as a pump failure or a leak in the cooling fluid circuit. All the reaction vessels would overheat at the same time, absent appropriate safeguards.
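One simple way to look for this kind of single point of failure is to list which resources are used by more than one unit. The sketch below is illustrative only. The function, unit names, and resource names are hypothetical and are not drawn from any particular plant or tool.

```python
def shared_resources(requires: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each resource used by two or more units to the sorted list of those units."""
    users: dict[str, list[str]] = {}
    for unit, resources in requires.items():
        for resource in resources:
            users.setdefault(resource, []).append(unit)
    # Keep only resources with multiple users; sort for a stable, readable result.
    return {r: sorted(u) for r, u in sorted(users.items()) if len(u) >= 2}


# Hypothetical plant: three reaction vessels and the utilities each one needs.
requires = {
    "reactor_1": {"central_cooling", "power_bus_A"},
    "reactor_2": {"central_cooling", "power_bus_B"},
    "reactor_3": {"central_cooling", "power_bus_A"},
}
for resource, units in shared_resources(requires).items():
    print(f"{resource}: shared by {', '.join(units)}")
```

Each resource reported is a candidate common cause for the units listed against it. Whether it is an actual single point of failure depends on the safeguards and backups in place, which this sketch does not model.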

A notable example of a process safety accident that involved resource sharing failures is the 2011 Fukushima Daiichi nuclear power plant disaster in Japan, which resulted in multiple nuclear reactor meltdowns, hydrogen-air explosions, and the release of radioactive materials to the environment. The power plant's reactors shared a common electric power supply system, including external power sources and backup diesel generators. The accident was triggered by an earthquake and the tsunami that followed. The earthquake knocked out external power sources, and the tsunami flooded the plant and disabled the backup diesel generators. The result was a loss of power to the reactor cooling systems, which were necessary to keep the reactors from overheating. With the loss of both primary and backup power, the cooling systems for multiple reactors failed simultaneously, and the reactor cores overheated, leading to core meltdowns in multiple reactors.

Simultaneous Demand Failures: These failures occur when multiple components or systems require the same critical resource at the same time due to an unexpected surge in demand, or usage that exceeds the capacity or capability of the resource. This type of failure is characterized by the inability of the shared resource to handle concurrent requests, leading to a breakdown or malfunction in one or more components of the system that relies on this resource. Simultaneous demand failures are a type of resource sharing failure.
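A simultaneous demand failure comes down to a capacity check: the shared resource is adequate for normal concurrent use but not for a surge in which every user draws its maximum at once. The sketch below illustrates that arithmetic with hypothetical numbers. The cooling water header, its rating, and the demand figures are assumptions made for the example.

```python
def demand_shortfall(capacity: float, demands: dict[str, float]) -> float:
    """Return how far total concurrent demand exceeds capacity (0.0 if it does not)."""
    return max(0.0, sum(demands.values()) - capacity)


# Hypothetical cooling water header rated at 100 units of flow.
capacity = 100.0
normal = {"reactor_1": 30.0, "reactor_2": 30.0, "reactor_3": 30.0}
upset = {name: 45.0 for name in normal}  # all reactors call for extra cooling together

print(demand_shortfall(capacity, normal))  # 0.0
print(demand_shortfall(capacity, upset))   # 35.0
```

The header copes with any single reactor in an upset condition, but not with all three at once. That is the situation this failure type describes.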

Coupling Failures: These failures occur when two or more components or systems are interlinked in such a way that the failure of one directly leads to the failure of the others. This type of failure is characterized by a dependency created through a coupling mechanism, which can be a physical, functional, or logical link between the components or systems. The key aspect of coupling failures is that the components or systems involved do not fail independently but rather their failures are interconnected. One component or system experiences a failure which propagates to another component due to the coupling mechanism. For example, a chemical plant with several process units that rely on a shared cooling system experiences a cooling system failure. Unit 1 begins to overheat, leading to its shutdown, which increases the demand on the remaining operational units, which now have to handle more stressful process conditions. The additional stress on the units leads to their failure. The coupling mechanism is the shared cooling system.

A coupling failure can be a part of a cascade failure if the initial coupled failure triggers further failures down the line. However, not all coupling failures will result in cascade failures, especially if the system is designed with sufficient redundancies and safeguards to contain the impact of the initial failure.

Human-Induced Dependent Failures: These are dependent failures that are caused by human errors affecting multiple parts of a system. For example, operator errors might lead to a chain of incorrect responses in a control system, and design flaws might be introduced by an engineering team. Human-induced dependent failures can be considered a type of CCF, especially in contexts where the design, operation, or maintenance of a system is reliant on human input or decision-making. The human element is the common cause that leads to failures in multiple components or systems, either simultaneously or in a cascading sequence.

Not all human-induced failures are CCFs. They only become CCFs when they impact multiple components or systems due to a shared root cause. For example, in a process where a valve controls the flow of a chemical into a tank, an operator might incorrectly open the valve, leading to an excess influx of chemicals into the tank and its failure from overpressurization. The human error (incorrect valve operation) leads to a single consequence, the tank failure. It is not a CCF.

Infrastructure Dependent Failures: These are failures that occur in the underlying infrastructure that supports processes, and they are likely to have a widespread impact. For example, a piping support fails, causing a section of piping to collapse onto several storage vessels and rupture them.

Infrastructure for process plants includes process and administrative buildings, storage facilities, structural supports for process equipment, such as foundations and pipe racks, and roadways and pathways within the plant that allow for the movement of materials, personnel, and vehicles.

Environmental Stress Failures: These are failures that occur when multiple components or systems fail simultaneously due to extreme environmental stress. They differ from common cause environmental failures in that they are attributable to the intensity of the stress rather than a specific event. For example, multiple outdoor electronic control devices may fail during a heatwave owing to temperatures that exceed their operational limits.

Conclusions

Understanding dependent failures is a prerequisite for identifying and managing them to ensure the overall reliability and safety of processes. Understanding the categories of dependent failures helps in conducting thorough risk assessments, designing systems for greater resilience, and implementing targeted mitigation strategies. It is often the interplay of these different types of dependent failures that contributes to the complexity of managing risk in large-scale and technologically advanced systems, such as modern process plants. The common thread in understanding and managing all these types of dependent failures is the need for comprehensive systems thinking in process design, risk assessment, and process safety management.
