
Lessons from a disaster
At 1:20 p.m. on 23 March 2005, an explosion at the BP Texas City Refinery, the third-largest oil refinery in the United States, left 15 people dead and 180 injured. A distillation tower was unknowingly overfilled, and the resulting extreme pressure released flammable hydrocarbons, which then caused the massive explosion.
Several factors contributed to this disaster, which resulted in a financial loss of US$1.5 billion.
However, the Final Investigation Report of the incident, released in March 2007 by the US Chemical Safety and Hazard Investigation Board, highlighted lapses in alarm management that were critical:
Companies the world over are looking at these findings to understand not only how best to prevent such disasters from happening on their watch, but also to reassess their entire safety and risk management approach and specifically revisit their alarm management approach and practices.
What is Alarm Management?
Alarm management is a set of procedures, practices, tools and systems that jointly ensure that the alarm system in a plant is effective throughout the life of the plant.
When operating effectively an alarm system performs these important tasks:
Alarm systems have been an intrinsic part of plant safety management for a long time.
They play a critical role: they alert operators to a change in operations at a process plant, inform them about the nature of the change and guide them in implementing corrective action.
Why Do I Need Alarm Management?
Poor alarm management results in:
Over time process alarms become less and less functional due to:
When control systems became mainstream, they also brought down the cost of alarms, which led to their proliferation. Engineers no longer had a strong cost disincentive against configuring excessive numbers of alarms. This excess reduced the visibility of urgent and underlying problems, increased the clutter that operators had to deal with and lengthened the time taken to carry out appropriate corrective action.
Adopting a systematic approach
To help organizations move away from the ad hoc approach of the past and adopt a more systematic and rational approach to alarm management, in 1999 the Engineering Equipment and Materials Users Association (EEMUA) released Publication 191, ‘Alarm Systems: A Guide to Design, Management and Procurement’.
This guide has rightly become the global reference point for alarm management. Its second edition – available from June 2007 – significantly updates and builds on the first edition.
Designers and operators have much to gain from using EEMUA 191 when undertaking improvement of their existing alarm systems or launching into a new alarm management program.
To understand how best to improve an existing alarm system or introduce a new alarm management program, it is useful to approach the task using the steps of the well-known Six Sigma sequence: Define, Measure, Analyse, Improve and Control.
Define
Successful alarm management is based on a comprehensive and consistent alarm philosophy document that defines:
Measure
This is where certain plant Historians (central data repositories that gather, historise, archive and distribute plant data) can simplify the task. For example, CitectHistorian, the plant-wide reporting solution from Citect, is capable of accurately recording all alarm data and tag values at high speed. Such a tool can help engineers and operators gather and organize alarm data from across the entire site.
The historian enables pre-defined reports to be used to analyse the alarm activity based on actual events.

Figure 1 – Alarm activity report based on EEMUA guidelines. Comparing recorded activity against EEMUA benchmarks (for example, the number of alarms following a major plant upset) quickly establishes whether the operator can manage the alarm load.
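As a rough illustration of the kind of measurement such a report relies on, the sketch below counts the alarms raised in a fixed window after a plant upset. It is not the Historian's own report logic: the function name, the sample timestamps and the 10-minute default window are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def alarms_after_upset(alarm_times, upset_time, window=timedelta(minutes=10)):
    """Count alarms raised in the half-open interval [upset_time, upset_time + window).

    The 10-minute default window is an illustrative assumption.
    """
    end = upset_time + window
    return sum(1 for t in alarm_times if upset_time <= t < end)


# Hypothetical alarm timestamps around an upset at 14:00.
upset = datetime(2007, 6, 1, 14, 0)
alarm_times = [
    datetime(2007, 6, 1, 13, 58),     # before the upset: not counted
    datetime(2007, 6, 1, 14, 0),      # at the upset: counted
    datetime(2007, 6, 1, 14, 3),      # counted
    datetime(2007, 6, 1, 14, 9, 59),  # counted
    datetime(2007, 6, 1, 14, 10),     # window end is exclusive: not counted
]

count = alarms_after_upset(alarm_times, upset)
print(count)  # 3
```

In practice the timestamps would come from the recorded alarm data rather than a hand-written list.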
Analyse
If gathering data from thousands of alarms appears daunting, then analysing such data to derive useful insight can be even more formidable an undertaking. Some plant Historians provide assistance with this by helping engineers and operators with the following:
Cutting through the clutter
EEMUA 191 suggests that 150 alarms per day (one every 10 minutes) presented to an operator is "very likely to be acceptable" and 300 alarms per day (an alarm every 5 minutes) is considered "manageable". In reality it is not unusual to record tens of thousands of alarms per operator per day, which makes such a system self-defeating. Identifying nuisance alarms helps to eliminate unnecessary or ineffective alarms, thus bringing the number of alarms per operator to a more manageable ratio.
To do this, clear justification for each alarm is required. An alarm's reason for being should be related to a specific problem or abnormal situation and also to a specific and defined operator response.
If there is no problem or if the alarm is not intended to elicit specific operator action, then its legitimacy should be questioned. A process indicator or alert does not automatically equate to an alarm.
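The justification test described above can be sketched as a simple check over alarm definitions. This is a minimal sketch, assuming each alarm is documented with the abnormal situation it detects and the operator response it calls for; the class, field names and tags are hypothetical and do not come from EEMUA 191 or any particular product.

```python
from dataclasses import dataclass


@dataclass
class AlarmDefinition:
    tag: str                      # hypothetical alarm tag name
    abnormal_condition: str = ""  # the specific problem the alarm detects
    operator_response: str = ""   # the defined action the operator should take


def is_justified(alarm):
    """An alarm is justified only if it names both a problem and a response."""
    return bool(alarm.abnormal_condition.strip()) and bool(alarm.operator_response.strip())


def review_candidates(alarms):
    """Return the tags of alarms whose legitimacy should be questioned."""
    return [a.tag for a in alarms if not is_justified(a)]


alarms = [
    AlarmDefinition("LIC200_HIGH", "Tower level above safe limit",
                    "Reduce feed and open bottoms valve"),
    AlarmDefinition("PUMP7_RUNNING"),                        # status indication only
    AlarmDefinition("TI310_HIGH", "Reactor temperature high", ""),  # no defined response
]

print(review_candidates(alarms))  # ['PUMP7_RUNNING', 'TI310_HIGH']
```

Alarms flagged this way are candidates for review, not automatic removal: some may simply be under-documented.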
The Engineering Equipment & Materials Users’ Association (EEMUA) established the de facto industry standard for alarm management in its Publication 191 – Alarm Management Guidelines.
The guidelines cover alarm system design, management and procurement, including benchmarks for the average number of alarms operators can comfortably handle.
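The daily-rate figures quoted earlier from EEMUA 191 (150 alarms per day as "very likely to be acceptable", 300 per day as "manageable") can be applied to a measured rate as in the sketch below. The function names, the treatment of the boundary values as inclusive and the label used for rates above 300 are assumptions made for illustration, not wording from the guide.

```python
MINUTES_PER_DAY = 24 * 60


def mean_interval_minutes(alarms_per_day):
    """Average time between alarms, in minutes, for a given daily count."""
    return MINUTES_PER_DAY / alarms_per_day


def classify_daily_rate(alarms_per_day):
    """Compare alarms per operator per day with the two quoted benchmarks.

    Treating 150 and 300 as inclusive upper bounds is an assumption.
    """
    if alarms_per_day <= 150:
        return "very likely to be acceptable"
    if alarms_per_day <= 300:
        return "manageable"
    return "above the manageable benchmark"  # label chosen for this sketch


print(mean_interval_minutes(150))   # 9.6, roughly one alarm every 10 minutes
print(mean_interval_minutes(300))   # 4.8, roughly one alarm every 5 minutes
print(classify_daily_rate(120))     # very likely to be acceptable
print(classify_daily_rate(20000))   # above the manageable benchmark
```

The arithmetic also shows that the "one every 10 minutes" and "one every 5 minutes" figures are rounded equivalents of the daily counts.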
Identifying root alarms or consequential alarms helps ensure that in an alarm flood, prioritization models have been configured such that the consequential alarm does not get lost or remain unnoticed. For consequential alarm and event analysis, a Historian would compare one set of alarm data with another set of alarm data (depending on the query placed). However, what is even more useful is to be able to compare alarm data with plant/process trend data.
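One simple way to compare one set of alarm data with another is to look for alarms that repeatedly follow a suspected root alarm within a short time window. The sketch below illustrates the idea only; it is not how any particular Historian performs the analysis, and the tag names, the sample events and the 2-minute window are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def consequential_candidates(events, root_tag, window=timedelta(minutes=2)):
    """Count, per tag, how many occurrences of root_tag were followed by that tag.

    events is a list of (timestamp, tag) pairs. A tag is counted at most once
    per root occurrence. The 2-minute default window is an assumption.
    """
    root_times = [t for t, tag in events if tag == root_tag]
    followers = Counter()
    for root_time in root_times:
        seen = {
            tag
            for t, tag in events
            if tag != root_tag and root_time < t <= root_time + window
        }
        followers.update(seen)
    return followers


# Hypothetical alarm log.
events = [
    (datetime(2007, 6, 1, 14, 0, 0), "PUMP_TRIP"),
    (datetime(2007, 6, 1, 14, 0, 20), "LOW_FLOW"),
    (datetime(2007, 6, 1, 14, 1, 0), "LOW_PRESSURE"),
    (datetime(2007, 6, 1, 14, 30, 0), "HIGH_TEMP"),
    (datetime(2007, 6, 1, 15, 0, 0), "PUMP_TRIP"),
    (datetime(2007, 6, 1, 15, 0, 30), "LOW_FLOW"),
]

followers = consequential_candidates(events, "PUMP_TRIP")
print(followers["LOW_FLOW"])      # 2
print(followers["LOW_PRESSURE"])  # 1
```

A tag that follows the root alarm on most occasions is a candidate consequential alarm; confirming the relationship still requires process knowledge and, as noted above, comparison with trend data.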
Improve
The analysis stage seeks to assess each alarm from the standpoint of the alarm philosophy of the organization and typically leads to certain specific improvements:
This process of alarm rationalization and system improvement is clearly a laborious, expensive and disruptive effort, but the support of robust alarm analysis can help simplify this step.
Prioritise for Action Not Effect
EEMUA Recommendation on split of total alarms by priority assignment levels
– Source EEMUA 191
Control
Successful alarm management depends on having tools that verify the KPIs set out are being achieved, so that gains are sustained. This also involves creating appropriate training material for new personnel, procedures and manuals for management of change (MOC), and ongoing review of analysis findings from the Historian.
Alarm Benchmark Levels
| Level | Description | Characteristics |
|---|---|---|
| 1 | Overloaded | A continuously high rate of alarms, with rapid performance deterioration during process upsets. The alarm system is difficult to use during normal operation and is essentially ignored during plant upsets as it becomes unusable. |
| 2 | Reactive | Some improvements compared to Overloaded, but the peak alarm rate during upset is still unmanageable. The alarm system remains an unhelpful distraction to the operator for much of the time. |
| 3 | Stable | A system well defined for normal operations, but less useful during plant upsets. Compared to Reactive, there are improvements in both the average alarm and peak alarm rates. Nuisance alarms are resolved and under systematic control. Problems remain with the burst alarm rate. |
| 4 | Robust | Average and peak alarm rates are under control for foreseeable plant operating scenarios. Dynamic alarming techniques are used to improve the real-time performance. Operators have a high degree of confidence in the alarm system, and have time to read, understand and respond to all the alarms. |
| 5 | Predictive | The alarm system fully encapsulates the aspirations of the EEMUA guidelines. The alarm system is stable at all times and provides the operator with the right information at the right time. Alarms are predictive and anticipate problems before they actually occur in order to avoid process upsets or minimise their impact on production. |
Source: EEMUA Publication 191 - Alarm Management Guidelines
Sharing Data
Another limitation, in most current practice, is the lack of shared insight between the plant floor and other stakeholders in the organisation.
In a “learning organization”, the fruit of analysis of the alarm system is shared with other stakeholders who are not necessarily on the plant floor.
This is also helpful when plant engineers need to keep senior management informed of progress in alarm system improvements and to justify future investments in the alarm system. With an open Historian that is accessible using industry-standard tools such as Microsoft SQL Server 2005, operators, engineers and management all work with an industry-standard data storage and exchange tool.
Reports can be delivered in a variety of formats, such as PDFs for regulatory reports, Excel spreadsheets that allow any user to immediately extract data for further analysis, or web pages that can be integrated with other business systems in the organization.
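To illustrate why a SQL-accessible alarm store makes this kind of sharing straightforward, the sketch below runs a plain SQL query and writes the result as CSV text that a spreadsheet can open. Python's built-in sqlite3 module stands in for the historian's database purely so the example is self-contained; the table name, column names and sample rows are hypothetical and do not reflect the CitectHistorian schema.

```python
import csv
import io
import sqlite3

# In-memory database standing in for the historian's SQL store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alarm_events (tag TEXT, raised_at TEXT)")
conn.executemany(
    "INSERT INTO alarm_events VALUES (?, ?)",
    [
        ("FIC101_LOW", "2007-06-01 14:00:05"),
        ("FIC101_LOW", "2007-06-01 14:02:10"),
        ("TI310_HIGH", "2007-06-01 14:03:00"),
        ("FIC101_LOW", "2007-06-01 14:05:45"),
        ("LIC200_HIGH", "2007-06-01 14:06:30"),
        ("TI310_HIGH", "2007-06-01 14:09:15"),
    ],
)

# Most frequent alarms first; ties are broken alphabetically by tag.
rows = conn.execute(
    "SELECT tag, COUNT(*) AS n FROM alarm_events "
    "GROUP BY tag ORDER BY n DESC, tag"
).fetchall()
conn.close()

# Write the summary as CSV text, a format spreadsheets can import directly.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["tag", "alarm_count"])
writer.writerows(rows)
report = buffer.getvalue()

print(report)
```

The same query could feed a web page or a scheduled report; the point is only that standard SQL and standard file formats keep the data accessible to people outside the control room.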
To establish clearly how much improvement an alarm system requires (gap analysis), or to measure improvements after a new alarm management program has been initiated, it is useful to compare the system with industry best practice.
Some alarm KPIs that could probably form the basis for such benchmarking include:
In the final analysis, successful alarm management is not about the equipment or the alarm, but about people who impact and are impacted by the alarm system:
Implementing a successful alarm management program requires factoring in the different expectations and priorities, as well as the differing levels of awareness and understanding, among these diverse groups of stakeholders.
Tools that share alarm analytics and the resulting insight across these stakeholders in a simple, relevant and easy-to-understand format help ensure that alarm management receives the multi-level, multi-disciplinary input it requires to validate it and keep it relevant to the business objectives and the alarm philosophy of the organization. Tools that benchmark alarm KPIs against industry best practice could take alarm management to the next level, providing the organization with alarm report cards that can directly result in improved productivity, profitability and safety.
Summary
Alarm management uses a continuous improvement program, together with its tools and systems, to ensure that the alarm system in a plant is as effective as possible throughout the life of the plant. Typically this will include: