Functional Safety

Functional safety is the part of overall system safety that depends on a system or equipment operating correctly in response to its inputs. Where inherently safe design relies on benign materials or passive barriers, functional safety relies on the active behavior of control and protection systems: a sensor detects a hazardous condition, logic decides on a response, and an actuator brings the equipment to a safe state. When such a safety function performs reliably on demand, the risk of harm is reduced to an acceptable level.

The discipline matters because electronic and programmable systems increasingly mediate between people and hazardous processes, from chemical reactors and railway signals to automobiles and industrial robots. A failure of the control system can directly cause harm, so the system itself must be engineered to a quantified standard of reliability. Functional safety provides the framework for doing this: it defines how much risk reduction a safety function must deliver, how confident one must be that it will deliver it, and what engineering rigor is required to earn that confidence. This article introduces the foundational IEC 61508 framework, the concept of safety integrity levels, the safety lifecycle, hazard and risk analysis, the role of redundancy and diagnostics, and the sector-specific standards derived from the foundation.

The IEC 61508 Framework

IEC 61508 is the foundational international standard for the functional safety of electrical, electronic, and programmable electronic safety-related systems. It is deliberately generic, intended to apply across industries and to serve as the basis for sector-specific standards. Its central idea is that safety functions can be assigned a target level of integrity, and that achieving each level demands a corresponding degree of engineering discipline across the entire life of the system.

Safety Functions and Safety-Related Systems

A safety function is a specific action that must be performed to achieve or maintain a safe state for a defined hazard. An example is shutting off fuel to a burner when a flame-failure sensor reports loss of flame. The safety-related system is the combination of sensors, logic, and actuators that carries out the safety function, together with any supporting infrastructure such as power supplies and communications. Functional safety is concerned with how dependably this combination performs its function when called upon.

The framework distinguishes the safety function from the equipment under control and its ordinary control system. The equipment under control is the machinery or process that presents the hazard; its control system manages normal operation. The safety-related system exists specifically to reduce the residual risk that remains after all other measures, and its reliability is therefore subject to quantitative requirements that ordinary control functions are not.

Random and Systematic Failures

IEC 61508 separates failures into two kinds, because they are controlled by different means. Random hardware failures arise from physical mechanisms such as component wear-out and are described probabilistically; they can be quantified with failure-rate data and reduced with redundancy and diagnostics. Systematic failures arise from errors in specification, design, or implementation, including software faults; they cannot be meaningfully assigned a failure rate and are instead controlled by the rigor of the development process.

This distinction shapes the whole standard. Quantitative targets address random hardware failures, setting a maximum tolerable probability of dangerous failure for each safety function. Process requirements address systematic failures, prescribing techniques and measures, design reviews, and verification activities whose thoroughness scales with the required integrity. A safety function meets its target only when both kinds of failure are adequately controlled, since the most reliable hardware cannot compensate for a flawed specification.

Low Demand and High Demand Modes

The standard recognizes that safety functions are called upon at very different rates, and it quantifies integrity accordingly. A low-demand function operates rarely, such as an emergency shutdown that may be needed only once in several years; IEC 61508 treats a function as low-demand when it is called upon no more than once per year. Its integrity is expressed as the average probability of failure on demand, the chance that it fails to act when finally required. A high-demand or continuous function operates frequently or continuously, such as a function that keeps a moving machine within safe limits; its integrity is expressed as a frequency of dangerous failure per hour. The mode of operation determines which metric applies and therefore how the target is stated and verified.

Safety Integrity Levels

The safety integrity level, or SIL, is the central measure of functional safety. It expresses how much one can rely on a safety function to perform when needed, and it ties together the quantitative reliability target and the engineering rigor required to achieve it.

Definition and Scale

IEC 61508 defines four safety integrity levels, from SIL 1 to SIL 4, with SIL 4 representing the highest integrity and the greatest risk reduction. Each level corresponds to a band of tolerable dangerous-failure probability. For a low-demand function, SIL 1 permits an average probability of failure on demand between one in ten and one in one hundred, and each higher level tightens the requirement by a factor of ten, so that SIL 4 demands a probability between one in ten thousand and one in one hundred thousand. The high-demand metric follows the same ten-to-one structure expressed as failures per hour.
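
The banding can be made concrete with a small lookup. The following is a minimal sketch, not a normative table: the low-demand bands are the ones stated above, the high-demand bands per hour are assumed here to be the usual IEC 61508 values from 1e-9 up to 1e-5, and the function name `sil_for` is purely illustrative.

```python
# Low-demand bands: average probability of failure on demand (from the text).
LOW_DEMAND_BANDS = {1: (1e-2, 1e-1), 2: (1e-3, 1e-2), 3: (1e-4, 1e-3), 4: (1e-5, 1e-4)}

# High-demand bands: dangerous failures per hour (assumed values, see lead-in).
HIGH_DEMAND_BANDS = {1: (1e-6, 1e-5), 2: (1e-7, 1e-6), 3: (1e-8, 1e-7), 4: (1e-9, 1e-8)}


def sil_for(value, bands=LOW_DEMAND_BANDS):
    """Return the SIL whose band [lower, upper) contains value, else None."""
    for sil, (lower, upper) in bands.items():
        if lower <= value < upper:
            return sil
    return None  # worse than SIL 1, or better than the SIL 4 band


print(sil_for(5e-3))                      # falls in the SIL 2 band
print(sil_for(5e-8, HIGH_DEMAND_BANDS))   # falls in the SIL 3 band
```

Returning `None` outside the four bands is a design choice in this sketch: a value worse than SIL 1 earns no SIL, and the standard defines nothing beyond SIL 4.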

A higher SIL is therefore not a vague indication of importance but a specific, order-of-magnitude tighter reliability requirement. Because each step demands roughly ten times better performance, the cost and difficulty of achieving a SIL rise steeply with the level. Designers aim to specify the SIL that genuinely matches the risk, since over-specifying wastes resources while under-specifying leaves unacceptable residual risk.

Hardware and Systematic Capability

Achieving a SIL requires meeting requirements on two fronts simultaneously. The hardware must satisfy quantitative targets for dangerous-failure probability and must respect architectural constraints that limit how much can be claimed from a given redundancy structure given its diagnostic coverage. In parallel, the development must demonstrate systematic capability, meaning that the process used to specify, design, implement, and verify the system was rigorous enough to control systematic faults to the level associated with that SIL.

The architectural constraints deserve emphasis because they prevent over-claiming. A safety function does not earn a high SIL merely by calculating a favorable failure probability; the standard also limits the maximum SIL claimable from an architecture according to its hardware fault tolerance and its safe failure fraction, the share of failures that are either safe or detected by diagnostics. This guards against designs that look reliable on paper but depend on optimistic assumptions, ensuring that high-integrity claims rest on genuinely fault-tolerant, well-diagnosed structures.
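
As an illustration of how such a limit works, the sketch below computes a safe failure fraction (the share of failures that are safe or dangerous but detected) and looks up the highest claimable SIL for a given hardware fault tolerance. The table values are those commonly reproduced from IEC 61508-2 for complex ("Type B") elements and should be checked against the standard itself; the function names and failure rates are hypothetical.

```python
def safe_failure_fraction(lam_safe, lam_dd, lam_du):
    """Fraction of failures that are safe or dangerous-but-detected."""
    total = lam_safe + lam_dd + lam_du
    return (lam_safe + lam_dd) / total


# Each row: (upper bound of the SFF band, limits for fault tolerance 0, 1, 2).
# Assumed Type B values; 0 means no SIL may be claimed.
TYPE_B_LIMITS = [
    (0.60, (0, 1, 2)),           # SFF < 60 %
    (0.90, (1, 2, 3)),           # 60 % <= SFF < 90 %
    (0.99, (2, 3, 4)),           # 90 % <= SFF < 99 %
    (float("inf"), (3, 4, 4)),   # SFF >= 99 %
]


def max_claimable_sil(sff, hardware_fault_tolerance):
    """Highest SIL the architecture may claim, regardless of calculated PFD."""
    column = min(hardware_fault_tolerance, 2)  # table stops at tolerance 2
    for upper, limits in TYPE_B_LIMITS:
        if sff < upper:
            return limits[column]


# Hypothetical failure rates per hour for one channel.
sff = safe_failure_fraction(lam_safe=3e-7, lam_dd=5e-7, lam_du=2e-7)
print(max_claimable_sil(sff, 0), max_claimable_sil(sff, 1))
```

With these assumed rates the safe failure fraction is about 80 percent, so a single channel is capped at SIL 1 and a pair tolerating one fault at SIL 2, however favorable the computed failure probability.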

The Safety Lifecycle

IEC 61508 organizes functional safety around a safety lifecycle, a structured sequence of phases spanning concept through decommissioning. The lifecycle ensures that safety is addressed systematically rather than bolted on, and it provides the verification and documentation trail that demonstrates each requirement has been met.

Hazard Analysis and Requirements

The lifecycle begins by understanding the system, its boundaries, and the hazards it can present. Hazard and risk analysis identifies the hazardous events, estimates their consequence and likelihood, and determines the risk reduction that safety functions must provide. From this analysis the safety requirements are derived: each safety function is specified together with its required safety integrity level. This specification phase is critical, because an error here propagates through every later phase, and specification errors are a leading cause of systematic failure.

Design, Realization, and Verification

With requirements established, the safety functions are designed and realized in hardware and software. The architecture is chosen to meet the hardware integrity targets, components are selected with appropriate failure data, and software is developed under the discipline appropriate to its SIL. Verification accompanies each step, confirming through review, analysis, and testing that the realization satisfies its requirements. Software in safety systems receives particular attention, with techniques and coding constraints that grow stricter as the SIL increases, because software faults are purely systematic and cannot be reduced by redundancy of identical copies.

Validation then confirms, against the original safety requirements, that the integrated system actually performs its safety functions correctly under realistic conditions. Where the verification activities check that each phase was carried out correctly, validation checks that the finished system is the right system, delivering the intended risk reduction. The distinction matters because a system can pass every verification step against a flawed design specification, and validation against the original safety requirements is what catches that mismatch.

Operation, Maintenance, and Modification

Functional safety does not end when a system is commissioned. Operation and maintenance procedures preserve the integrity that was designed in: proof tests periodically reveal dangerous failures that diagnostics did not detect, and maintenance restores the system after faults. The interval between proof tests directly affects the achieved probability of failure on demand, so it is part of the safety design, not merely an operational detail. Any modification reopens the lifecycle, requiring impact analysis and re-verification so that a change intended to improve the system does not silently degrade its safety integrity.
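
The effect of the proof-test interval can be shown with the widely used single-channel approximation, average PFD ≈ λ_DU × T / 2. This is a sketch under stated assumptions: a constant dangerous undetected failure rate, a perfect proof test, negligible repair time, and λ_DU × T much smaller than one. The failure rate below is hypothetical.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour, proof_test_interval_hours):
    """Approximate average PFD of a single channel: lambda_DU * T / 2."""
    return lambda_du_per_hour * proof_test_interval_hours / 2


lambda_du = 2e-6  # hypothetical dangerous undetected failures per hour

yearly = pfd_avg_1oo1(lambda_du, HOURS_PER_YEAR)           # about 8.8e-3
five_yearly = pfd_avg_1oo1(lambda_du, 5 * HOURS_PER_YEAR)  # about 4.4e-2

print(f"tested yearly:        {yearly:.2e}")
print(f"tested every 5 years: {five_yearly:.2e}")
```

Under these assumptions, stretching the interval from one year to five moves the same hardware from the SIL 2 band into the SIL 1 band, which is why the interval belongs to the safety design.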

Hazard and Risk Analysis

Every functional-safety effort rests on a sound understanding of risk, because the safety requirements are derived from it. Hazard and risk analysis identifies what can go wrong, how badly, and how often, and translates that understanding into integrity targets for the safety functions.

Risk as Consequence and Likelihood

Risk combines the severity of a possible harm with the likelihood of its occurrence. A hazard with grave consequences but a remote likelihood may carry the same risk as a less severe hazard that occurs frequently. Functional safety uses this combination to decide how much risk reduction is needed: the gap between the inherent risk of the unprotected equipment and the tolerable risk defines the required performance of the safety functions. The larger that gap, the higher the integrity demanded.
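
For a hazard of fixed severity, the gap can be expressed as a ratio of frequencies. The sketch below assumes the unprotected and tolerable frequencies are both given per year and that a single low-demand safety function closes the whole gap; the function name and figures are illustrative only.

```python
def required_risk_reduction(unprotected_per_year, tolerable_per_year):
    """Return (risk reduction factor, target average PFD)."""
    if unprotected_per_year <= tolerable_per_year:
        return 1.0, 1.0  # already tolerable: no reduction needed
    factor = unprotected_per_year / tolerable_per_year
    return factor, 1.0 / factor


# Hypothetical hazard: once in ten years unprotected, tolerable once in 5000 years.
factor, target_pfd = required_risk_reduction(0.1, 2e-4)
print(f"risk reduction factor ~ {factor:.0f}, target PFD ~ {target_pfd:.0e}")
```

A required reduction of roughly 500 corresponds to a target probability of failure on demand near 2e-3, which lies in the SIL 2 band described earlier.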

Several techniques support the analysis. Hazard and operability studies systematically examine deviations from intended operation. Failure mode and effects analysis works upward from component failures to their system consequences. Fault tree analysis works downward from an undesired event to the combinations of causes that could produce it. These methods complement one another, and rigorous practice often applies more than one to cross-check the identified hazards and their causes.
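
The downward logic of a fault tree reduces to combining probabilities through AND and OR gates. The following minimal sketch assumes all basic events are independent, which real analyses must justify; the tree and its probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every independent input event occurs."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one independent input event occurs."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical top event: overpressure requires loss of control
# (valve sticks OR controller fails) AND the relief valve failing to open.
loss_of_control = or_gate([0.02, 0.01])
overpressure = and_gate([loss_of_control, 0.005])

print(f"loss of control: {loss_of_control:.4f}")
print(f"overpressure:    {overpressure:.6f}")
```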

Risk Graphs and Target Setting

To assign integrity targets consistently, standards offer structured methods such as risk graphs and layer-of-protection analysis. A risk graph guides the analyst through a few parameters (the severity of harm, the frequency and duration of exposure, the possibility of avoiding the hazard, and the probability of the unwanted occurrence) to a recommended safety integrity level. Layer-of-protection analysis instead accounts for the independent protective layers already present and calculates the additional risk reduction the safety function must supply.
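
The layer-of-protection calculation can be sketched in a few lines. This is a simplified illustration that assumes the layers are fully independent and ignores conditional modifiers such as occupancy or ignition probability; the scenario figures and the function name are hypothetical.

```python
from math import prod


def required_sif_pfd(initiating_per_year, layer_pfds, tolerable_per_year):
    """PFD the safety function must achieve after crediting existing layers."""
    intermediate = initiating_per_year * prod(layer_pfds)
    return min(1.0, tolerable_per_year / intermediate)


# Hypothetical scenario: initiating event 0.5 per year, credited layers are
# operator response to an alarm (PFD 0.1) and a relief valve (PFD 0.01),
# tolerable event frequency 1e-5 per year.
pfd = required_sif_pfd(0.5, [0.1, 0.01], 1e-5)
print(f"required PFD of the safety function ~ {pfd:.2f}")
```

Here the existing layers already reduce the event frequency to 5e-4 per year, so the safety function needs a PFD of about 0.02, a SIL 1 requirement. Without those layers the same hazard would demand far more of the instrumented function.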

These methods discipline what could otherwise be a subjective judgment. By making the parameters explicit, they ensure that similar hazards receive similar treatment and that the basis for each SIL assignment is documented and reviewable. The goal is a defensible, traceable link from the identified hazard to the integrity level of the safety function that protects against it, so that the chosen target can be justified to operators, auditors, and regulators alike.

Redundancy and Diagnostics

Meeting demanding integrity targets, especially the quantitative targets for random hardware failures, depends on architecture. Redundancy and diagnostics are the principal architectural tools, and the standard rewards their combination because each addresses a limitation of the other.

Redundant Architectures

Redundancy provides fault tolerance by duplicating elements so that the failure of one does not defeat the safety function. Architectures are commonly described by the number of channels that must function out of the number provided. A one-out-of-two arrangement performs the safety action if either channel demands it, favoring safety over availability; a two-out-of-three voting arrangement tolerates the failure of any single channel while resisting spurious trips, balancing safety with availability. The choice reflects the relative cost of a dangerous failure and of an unnecessary shutdown.
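
The voting behavior described above can be written as a small function. This is an illustrative sketch, not code for a real safety controller: each channel is reduced to a boolean meaning "this channel demands the safety action", and the names are hypothetical.

```python
def votes_to_trip(channel_demands, m):
    """M-out-of-N voting: trip when at least m channels demand the action."""
    return sum(bool(d) for d in channel_demands) >= m


def one_out_of_two(a, b):
    return votes_to_trip([a, b], 1)


def two_out_of_three(a, b, c):
    return votes_to_trip([a, b, c], 2)


# Real demand with one channel stuck at "no demand" (a dangerous failure):
# both architectures still trip.
print(one_out_of_two(False, True), two_out_of_three(False, True, True))

# No real demand, but one channel falsely demands a trip (a safe failure):
# 1oo2 shuts down spuriously, 2oo3 rides through.
print(one_out_of_two(True, False), two_out_of_three(True, False, False))
```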

Redundancy guards effectively against random hardware failures but not against systematic ones, because identical channels share identical design faults and would fail together under the same conditions. Where systematic failures, including software faults, are a concern, diversity is used: channels are implemented differently, with different designs or technologies, so that a common flaw is unlikely to disable them simultaneously. Diversity is costly and is reserved for the highest integrity requirements, where common-cause systematic failure would otherwise dominate the risk.

Diagnostic Coverage and the Safe State

Diagnostics detect failures before they can defeat the safety function, converting otherwise dangerous undetected failures into detected ones that can be acted upon. Diagnostic coverage is the fraction of dangerous failures that the diagnostics reveal; a high coverage allows more credit to be taken for a given architecture, which is why the standard ties the maximum claimable SIL to coverage as well as to redundancy. Self-tests, plausibility checks, watchdogs, and comparison between redundant channels all contribute to coverage.
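
The arithmetic behind diagnostic coverage is simple enough to sketch. The example assumes a constant dangerous failure rate split into a detected part and an undetected part; the rates and function names are hypothetical.

```python
def diagnostic_coverage(lam_dd, lam_du):
    """Fraction of dangerous failures that diagnostics reveal."""
    return lam_dd / (lam_dd + lam_du)


def split_dangerous_rate(lam_d, coverage):
    """Split a dangerous failure rate into (detected, undetected) parts."""
    return lam_d * coverage, lam_d * (1 - coverage)


lam_d = 1e-6  # hypothetical dangerous failures per hour
for coverage in (0.60, 0.90, 0.99):
    _, lam_du = split_dangerous_rate(lam_d, coverage)
    print(f"coverage {coverage:.0%}: undetected rate {lam_du:.1e} per hour")
```

Only the undetected part accumulates silently between proof tests, so raising coverage from 60 to 99 percent in this example shrinks that rate by a factor of forty.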

What happens after detection is equally important. A safety function is designed so that detected failures, and ideally the loss of the function itself, lead to a defined safe state, the condition in which the equipment presents no unacceptable hazard. For many processes the safe state is de-energized or shut down, achieved by a fail-safe design in which loss of power or signal causes the protective action. Designing toward a clear safe state ensures that when the system cannot guarantee correct operation, it errs on the side of safety.
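
A de-energize-to-trip design can be sketched as logic in which the output stays energized only while the input is positively known to be healthy and safe. The example assumes a 4 to 20 mA transmitter whose readings outside a plausible band indicate a fault; the band limits, trip point, and function name are hypothetical.

```python
def output_energized(signal_ma, trip_point_ma=16.0):
    """Return True only for a valid reading below the trip point.

    Energized means the process may keep running; any doubt about the
    input (missing, implausible, or high reading) de-energizes the output.
    """
    if signal_ma is None:              # signal lost entirely
        return False
    if not 3.6 < signal_ma < 21.0:     # assumed fault band; also rejects NaN
        return False
    return signal_ma < trip_point_ma


print(output_energized(12.0))   # healthy reading below trip point: keep running
print(output_energized(17.0))   # real demand: trip
print(output_energized(0.0))    # broken wire reads 0 mA: trip
print(output_energized(None))   # no signal at all: trip
```

The design choice is that the permissive condition is the only path to `True`; every other path, including ones nobody anticipated, falls through to the safe state.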

Sector-Specific Standards

Because IEC 61508 is generic, individual industries adapt it into sector-specific standards that reflect their particular hazards, operating modes, and engineering cultures. These derivatives inherit the core concepts while tailoring terminology and methods to their domain.

ISO 26262 for Automotive Systems

ISO 26262 adapts functional safety to road vehicles, addressing the electrical and electronic systems whose failure could endanger occupants or other road users. In place of the SIL it defines the automotive safety integrity level, or ASIL, with four grades from ASIL A to ASIL D, the last being the most stringent. The ASIL of a given hazard is determined not by a single parameter but by combining three: the severity of potential harm, the probability of exposure to the operating situation, and the controllability of the situation by a typical driver.

This three-factor approach reflects the automotive context, where the same fault can be far more dangerous in one driving situation than another, and where a driver may sometimes intervene to avoid harm. ISO 26262 carries the lifecycle, hazard analysis, and verification principles of IEC 61508 into the specific activities of vehicle development, from the concept phase through production and field operation, including the management of safety across the supply chain that characterizes the automotive industry.
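
The combination of the three factors is defined by a table in ISO 26262, and a commonly cited shortcut reproduces it by summing the class indices. The sketch below assumes the usual classes S0 to S3, E0 to E4, and C0 to C3, and returns "QM" (quality management, meaning no ASIL is assigned) for the lowest combinations; it should be checked against the standard's table before any real use.

```python
def asil(severity, exposure, controllability):
    """Classify a hazardous event from integer classes S0-S3, E0-E4, C0-C3."""
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("class index out of range")
    if 0 in (severity, exposure, controllability):
        return "QM"  # any class-0 factor means no ASIL is assigned
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


print(asil(3, 4, 3))  # severe, frequent, hard to control
print(asil(3, 1, 3))  # same harm, but a rare operating situation
print(asil(3, 4, 1))  # same harm, but easily controlled by the driver
```

The second and third calls show the point of the three-factor scheme: the same severity yields a much lower grade when the situation is rare or the driver can readily intervene.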

IEC 62061 for Machinery

IEC 62061 applies functional safety to the safety-related control systems of machinery, providing a route for machine builders to specify and realize electrical, electronic, and programmable safety functions in compliance with machinery safety regulations. It retains the safety integrity level as its measure of integrity and derives its methods from IEC 61508, while focusing on the demand modes and architectures typical of industrial machines, where safety functions often operate frequently to keep machinery within safe limits.

In the machinery sector IEC 62061 coexists with ISO 13849-1, which addresses the same problem through the related concept of the performance level, graded from PL a to PL e. The two scales are intentionally aligned, so that PL e corresponds to SIL 3, PL d to SIL 2, and PL c and PL b to SIL 1, while PL a has no SIL equivalent; machinery safety functions are therefore typically claimed only up to SIL 3 rather than the full SIL 4 of the parent standard. The two standards offer complementary paths to the same goal of safe machine control, and machine builders select the one better suited to the complexity and technology of their safety functions. ISO 13849-1 applies regardless of the control technology, covering electrical, hydraulic, pneumatic, and mechanical elements, whereas IEC 62061 is specific to electrical, electronic, and programmable systems and addresses software techniques that the technology-neutral standard does not. Both descend from the same functional-safety principles and require hazard analysis, integrity targets matched to risk, and verification that the realized control system meets those targets.
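
The alignment between the two scales can be sketched as a pair of lookups. The mapping for PL c, PL d, and PL e is the one given above; the entries for PL a and PL b and the bands of dangerous failure frequency per hour are assumed from the commonly reproduced ISO 13849-1 table and should be verified against the standard. All names are illustrative.

```python
# Performance level to corresponding SIL; None means no SIL equivalent.
PL_TO_SIL = {"a": None, "b": 1, "c": 1, "d": 2, "e": 3}

# Assumed bands [lower, upper) of average dangerous failure frequency per hour.
PL_BANDS = {
    "a": (1e-5, 1e-4),
    "b": (3e-6, 1e-5),
    "c": (1e-6, 3e-6),
    "d": (1e-7, 1e-6),
    "e": (1e-8, 1e-7),
}


def performance_level(pfh_d):
    """Return the PL whose band contains pfh_d, else None."""
    for level, (lower, upper) in PL_BANDS.items():
        if lower <= pfh_d < upper:
            return level
    return None


level = performance_level(5e-7)
print(level, PL_TO_SIL[level])
```

Note that the quantitative band is only one of several conditions for claiming a performance level; the sketch ignores the category, diagnostic coverage, and common-cause requirements that the standard also imposes.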

Other Sector Adaptations

Many other industries maintain their own derivatives. The process industries apply IEC 61511, which interprets IEC 61508 for safety instrumented systems in chemical plants, refineries, and similar facilities; it retains the safety integrity level and, in practice, expects safety instrumented functions above SIL 3 to be avoided by reducing the underlying risk rather than relying on an ever more reliable instrumented layer. Railways follow the CENELEC standards EN 50126, EN 50128, and EN 50129 for the safety of signaling and control systems, while nuclear, medical, and aviation domains each have functional-safety requirements suited to their consequences and regulatory regimes. Despite differing terminology, these standards share the common DNA of IEC 61508: integrity levels matched to risk, a disciplined lifecycle, and the joint control of random and systematic failures.

Summary

Functional safety is the assurance that systems reduce risk by behaving correctly in response to their inputs, bringing equipment to a safe state when a hazard arises. IEC 61508 provides the foundational framework, distinguishing the safety function from the equipment it protects and separating random hardware failures, controlled quantitatively through redundancy and diagnostics, from systematic failures, controlled through the rigor of the development process.

The safety integrity level expresses, in order-of-magnitude bands, how dependably a safety function must perform, and it is achieved only by meeting both quantitative hardware targets, subject to architectural constraints, and systematic-capability requirements on the development. The safety lifecycle organizes the work from hazard analysis and requirements through design, verification, validation, operation, and modification, with hazard and risk analysis supplying the justified link from each hazard to the integrity of the function that guards it.

Redundancy provides fault tolerance against random failures, diversity extends protection to systematic ones, and diagnostics convert dangerous undetected failures into detected failures that drive the system to a defined safe state. Sector-specific standards, including ISO 26262 for automotive systems with its ASIL scale and IEC 62061 for machinery, carry these principles into particular industries, adapting terminology and method while preserving the common goal of matching engineering rigor to the risk that must be controlled.