Alarm Rationalisation Key Contributor to Plant and Operator Safety

Lessons from a disaster

At 1:20 p.m. on 23 March 2005, an explosion at the BP Texas City Refinery, the third-largest oil refinery in the United States, left 15 people dead and 180 injured. A distillation tower was overfilled without operators realising it, and the resulting extreme pressure released flammable hydrocarbons, which then caused the massive explosion.

Several factors contributed to this disaster, which resulted in a financial loss of US$1.5 billion.

Among them, the Final Investigation Report of the incident, released in March 2007 by the US Chemical Safety and Hazard Investigation Board, highlighted critical lapses in alarm management:

  • “The tower’s high level alarm setpoint was exceeded 65 times during the last 19 startups, with more than 50 hours of operating time with the high level alarm activated”
  • “The redundant high level alarm (for the distillation tower) did not activate. When the tower was filled beyond the set points of both alarms in the early morning on March 23, 2005, only one alarm was activated. The high level alarm was triggered at 3:09 a.m. The redundant hardwired high level alarm never sounded”
  • “The (redundant high level) alarm’s set-point was not known to operations personnel or provided in the procedure, control data, or training materials”
  • “A functionality check of all alarms and instruments was also required prior to startup, but these checks were not completed”
  • “Tower pressure alarm set-points were frequently exceeded, yet the procedure did not address all the reasons this might happen and the steps operators should take in response”

Companies the world over are looking at these findings to understand not only how best to prevent such disasters from happening on their watch, but also to reassess their entire safety and risk management approach and specifically revisit their alarm management approach and practices.

What is Alarm Management?

Alarm management is a set of procedures, practices, tools and systems that jointly ensure that the alarm system in a plant is effective throughout the life of the plant.

When operating effectively an alarm system performs these important tasks:

  • Alerts the operator that an important change has occurred
  • Informs the operator of the nature of the change
  • Guides the operator to take appropriate corrective action
  • Provides fault data to maintenance systems

Alarm systems have been an intrinsic part of plant safety management for a long time.

They play a critical role in alerting operators to a change in operations at a process plant, informing them about the nature of the change and guiding them to implement corrective action.

Why Do I Need Alarm Management?

Poor alarm management results in:

  • Increased downtime. When source alarms cannot cut through the clutter, real problems are ignored for too long, resulting in process breakdowns. This translates into lost production, increased operator costs through overtime, and a higher equipment lifecycle cost through increased maintenance
  • Reduced plant productivity. When operators do not read the early signs of a developing problem, their response to alarm floods (large numbers of alarms annunciated at the time of a process upset) typically takes the form of stabilizing the process by reducing throughput
  • Reduced quality. When alarm systems fail to alert operators to corrective action at the right time, off-spec product has to be contended with
  • Reduced operator effectiveness, higher operator stress levels and increased operator staffing costs
  • In the worst-case scenario, alarm-related confusion can result in or aggravate serious industrial accidents
  • Increased insurance premiums on plant equipment or fines incurred by not meeting regulatory requirements

Over time process alarms become less and less functional due to:

  • Alarm proliferation
    • The introduction of systems with less costly software alarms removed the incentive to limit their number and encouraged process engineers to create and configure an excessive number of alarms
  • Absence of strict alarm configuration and prioritisation guidelines
    • An individual operator is confronted with thousands of relatively meaningless configured alarms
    • Too many alarms distract the operator and conceal the actual nature of the problem rather than alerting the operator to the real problem

When control systems became mainstream, they also brought down the cost of alarms, which increased their proliferation. After all, engineers no longer had a strong cost disincentive against configuring excessive numbers of alarms. With this excess came reduced visibility of urgent and underlying problems, more clutter for operators to deal with and longer response times before appropriate corrective action was taken.


Adopting a systematic approach

To help organizations move away from the ad hoc approach of the past and adopt a more systematic and rational approach to alarm management, in 1999 the Engineering Equipment and Materials Users Association (EEMUA) released Publication 191, ‘Alarm Systems: A Guide to Design, Management and Procurement’.

This guide has rightly become the global reference point for alarm management. Its second edition – available from June 2007 – significantly updates and builds on the first edition.

Designers and operators have much to gain from using EEMUA 191 when undertaking improvement of their existing alarm systems or launching into a new alarm management program.

To understand how best to improve an existing alarm system or introduce a new alarm management program, it is useful to approach the task using the steps of the well-known Six Sigma sequence: Define, Measure, Analyse, Improve and Control.

Define

Successful alarm management is based on a comprehensive and consistent alarm philosophy document that defines:

  • Business objectives to be met
  • Needs and requirements of the users of the alarm system
  • Alarm system design principles
  • Compliance parameters
  • Roles and responsibilities
  • Criteria for alarm generation, setting, prioritization and presentation
  • Management of Change (MOC) (for example, tracking authorized and unauthorized changes to alarm settings or alarm suppression or shelving)
  • Training / maintenance parameters
  • Escalation guidelines (moving from normal status mode, where operators are trying to keep the process within the ‘safe envelope’, to emergency/disaster management)
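
As a minimal sketch of the Management of Change item above, the following Python compares an approved alarm baseline with the settings currently live in the control system and lists any differences not covered by an approved change request. The `AlarmSetting` fields, the tag names and the `authorized_tags` parameter are hypothetical, chosen for illustration; a real system would read both sets from its own configuration database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmSetting:
    """One configured alarm (hypothetical fields, for illustration only)."""
    tag: str
    setpoint: float
    priority: str
    suppressed: bool = False


def find_unauthorized_changes(approved, live, authorized_tags=frozenset()):
    """Return (tag, field, approved_value, live_value) for each unapproved difference.

    approved and live map tag -> AlarmSetting. Tags listed in authorized_tags
    are assumed to be covered by an approved change request and are skipped.
    """
    changes = []
    for tag, base in approved.items():
        if tag in authorized_tags:
            continue
        current = live.get(tag)
        if current is None:
            # The alarm has disappeared from the live configuration.
            changes.append((tag, "removed", base, None))
            continue
        for field in ("setpoint", "priority", "suppressed"):
            old, new = getattr(base, field), getattr(current, field)
            if old != new:
                changes.append((tag, field, old, new))
    return changes


approved = {
    "LT-101.HI": AlarmSetting("LT-101.HI", 80.0, "High"),
    "PT-202.HI": AlarmSetting("PT-202.HI", 12.0, "Medium"),
}
live = {
    "LT-101.HI": AlarmSetting("LT-101.HI", 95.0, "High"),
    "PT-202.HI": AlarmSetting("PT-202.HI", 12.0, "Medium", suppressed=True),
}
for change in find_unauthorized_changes(approved, live, authorized_tags={"PT-202.HI"}):
    print(change)
```

Here the shelving of `PT-202.HI` is treated as authorized, so only the altered setpoint on `LT-101.HI` is reported.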

Measure

Measuring alarm performance starts with collecting alarm data, and this is where certain plant Historians (central data repositories that gather, historise, archive and distribute plant data) can simplify the task. For example, CitectHistorian, the plant-wide reporting solution from Citect, is capable of accurately recording all alarm data and tag values at high speed. Such a tool can help engineers and operators gather and organize alarm data from across the entire site.

The historian enables pre-defined reports to be used to analyse the alarm activity based on actual events.

Figure 1 – Alarm activity report based on EEMUA guidelines. Using EEMUA benchmarks, such as the number of alarms following a major plant upset, we can quickly establish whether the alarm activity can be managed by the operator.
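
One of the measures mentioned in the caption, the number of alarms following a major plant upset, can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, the alarm timestamps are assumed to have been exported from the historian as `datetime` values, and the 10-minute window is an assumed default rather than a figure taken from this article.

```python
from datetime import datetime, timedelta


def alarms_after_upset(alarm_times, upset_time, window=timedelta(minutes=10)):
    """Count alarms raised in the interval [upset_time, upset_time + window)."""
    end = upset_time + window
    return sum(1 for t in alarm_times if upset_time <= t < end)


upset = datetime(2010, 2, 1, 3, 0)
alarm_times = [
    upset - timedelta(minutes=1),            # before the upset, not counted
    upset,
    upset + timedelta(minutes=5),
    upset + timedelta(minutes=9, seconds=59),
    upset + timedelta(minutes=10),           # falls just outside the window
]
print(alarms_after_upset(alarm_times, upset))
```

The resulting count can then be compared with whichever post-upset benchmark the alarm philosophy document adopts.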

Analyse

If gathering data from thousands of alarms appears daunting, then analysing such data to derive useful insight can be even more formidable an undertaking. Some plant Historians provide assistance with this by helping engineers and operators with the following:

  • Event analysis: Pulling up all alarms that occurred at a given point in time, be they basic process alarms or aggregated alarms or even critical safety-related alarms
  • Alarm and event archiving:
    • Historising all alarms and events for long term analysis
  • Alarm analysis, which includes:
    • Identifying consequential/source alarms around which other alarms are triggered
    • Identifying nuisance alarms such as stale alarms (that remain present for extended periods of time), chattering alarms (that go in and out of alarm mode in a short span of time), or duplicate alarms (that persistently occur within a short period of time of another alarm). Pareto analysis can rank nuisance alarms by frequency to help detect the so-called “bad actors”
    • Identifying shelved alarms (temporarily suppressed) or permanently suppressed alarms (that are prevented from appearing on the operator’s screen)
    • Alarm setting analysis by the state/mode of operation of the plant
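
Two of the analyses listed above, Pareto ranking of "bad actors" and detection of chattering alarms, can be sketched as follows. The event format (a list of `(tag, timestamp)` activation records), the function names and the chattering rule (three or more activations within one minute) are assumptions made for illustration, not definitions from EEMUA 191 or from any particular historian.

```python
from collections import Counter
from datetime import datetime, timedelta


def pareto_bad_actors(events, top_n=10):
    """Rank tags by activation count, with the cumulative share of all alarms."""
    counts = Counter(tag for tag, _ in events)
    total = sum(counts.values())
    ranked, running = [], 0
    for tag, n in counts.most_common(top_n):
        running += n
        ranked.append((tag, n, running / total))
    return ranked


def chattering_tags(events, min_repeats=3, window=timedelta(minutes=1)):
    """Return tags that activated at least min_repeats times within any window."""
    by_tag = {}
    for tag, ts in events:
        by_tag.setdefault(tag, []).append(ts)
    flagged = set()
    for tag, stamps in by_tag.items():
        stamps.sort()
        # Slide over each run of min_repeats consecutive activations.
        for i in range(len(stamps) - min_repeats + 1):
            if stamps[i + min_repeats - 1] - stamps[i] <= window:
                flagged.add(tag)
                break
    return flagged


t0 = datetime(2010, 2, 1, 8, 0)
events = [
    ("FT-1", t0),
    ("FT-1", t0 + timedelta(seconds=10)),
    ("FT-1", t0 + timedelta(seconds=20)),
    ("LT-2", t0),
    ("LT-2", t0 + timedelta(hours=2)),
    ("FT-1", t0 + timedelta(hours=3)),
]
print(pareto_bad_actors(events, top_n=2))
print(chattering_tags(events))
```

In this sample, `FT-1` accounts for four of the six activations and is also flagged as chattering, which makes it the first candidate for review.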

Cutting through the clutter

EEMUA 191 suggests that 150 alarms per day (about one every 10 minutes) presented to an operator is "very likely to be acceptable" and 300 alarms per day (about one every 5 minutes) is considered "manageable". In reality, it is not unusual to record tens of thousands of alarms per operator per day, which makes such a system self-defeating. Identifying nuisance alarms helps to eliminate unnecessary or ineffective alarms, thus bringing the number of alarms per operator down to a more manageable level.
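
The arithmetic behind these benchmarks is simple enough to check directly. The sketch below uses only the two figures quoted above (150 and 300 alarms per day); the function names and the label returned for anything above 300 are assumptions made for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_between_alarms(alarms_per_day):
    """Average spacing between alarms, in minutes, at a given daily rate."""
    return MINUTES_PER_DAY / alarms_per_day


def rate_alarm_load(alarms_per_day):
    """Classify a daily alarm count against the two benchmarks quoted in the text."""
    if alarms_per_day <= 150:
        return "very likely to be acceptable"
    if alarms_per_day <= 300:
        return "manageable"
    return "above the manageable benchmark"  # label assumed for illustration


for rate in (150, 300, 20000):
    print(rate, round(minutes_between_alarms(rate), 2), rate_alarm_load(rate))
```

At 150 alarms per day the average spacing is 9.6 minutes and at 300 it is 4.8 minutes, which the guideline figures round to 10 and 5. At 20,000 alarms per day an operator would see an alarm roughly every four seconds.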

To do this, clear justification for each alarm is required. An alarm's reason for being should be related to a specific problem or abnormal situation and also to a specific and defined operator response.

If there is no problem or if the alarm is not intended to elicit specific operator action, then its legitimacy should be questioned. A process indicator or alert does not automatically equate to an alarm.

The Engineering Equipment & Materials Users’ Association (EEMUA) established the de facto industry standard for alarm management in its Publication 191, Alarm Management Guidelines.

The standards cover Alarm System Design, Management & Procurement, including benchmarks for the average number of alarms operators can comfortably handle.


Identifying root alarms or consequential alarms helps ensure that, in an alarm flood, prioritization models have been configured such that the consequential alarm does not get lost or remain unnoticed. For consequential alarm and event analysis, a Historian would compare one set of alarm data with another set of alarm data (depending on the query placed). However, what is even more useful is to be able to compare alarm data with plant/process trend data.
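
Comparing one set of alarm data with another can be sketched as a simple co-occurrence query: for a chosen alarm, count how often every other alarm activates shortly afterwards. The event format, the function name and the 60-second window below are assumptions for illustration; a strong co-occurrence suggests a relationship worth investigating but does not prove cause and effect.

```python
from collections import Counter
from datetime import datetime, timedelta


def followers_of(events, source_tag, window=timedelta(seconds=60)):
    """For each other tag, count the source activations it followed within window.

    events is a list of (tag, timestamp) activation records.
    """
    source_times = [ts for tag, ts in events if tag == source_tag]
    followers = Counter()
    for start in source_times:
        # Use a set so each tag counts at most once per source activation.
        seen = {
            tag
            for tag, ts in events
            if tag != source_tag and start < ts <= start + window
        }
        followers.update(seen)
    return followers


t0 = datetime(2010, 2, 1, 8, 0)
events = [
    ("PUMP-1.TRIP", t0),
    ("FLOW-1.LO", t0 + timedelta(seconds=5)),
    ("PRESS-1.LO", t0 + timedelta(seconds=30)),
    ("TEMP-9.HI", t0 + timedelta(minutes=10)),
    ("PUMP-1.TRIP", t0 + timedelta(hours=1)),
    ("FLOW-1.LO", t0 + timedelta(hours=1, seconds=10)),
]
print(followers_of(events, "PUMP-1.TRIP").most_common())
```

In this sample, the low-flow alarm follows both pump trips, so it is a candidate for being grouped under, or suppressed by, the trip alarm during rationalization.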

Improve

The analysis stage seeks to assess each alarm from the standpoint of the alarm philosophy of the organization and typically leads to certain specific improvements:

  • Reduction in needless alarms
  • Recalibration of alarm parameters where necessary (such as action, set point, detection time etc.)
  • Making alarm settings consistent where desirable
  • Prioritization of alarms where required
  • Reorganization of the presentation of alarms if needed (to ensure relevance to the operator, visibility etc.)

This process of alarm rationalization and system improvement is clearly a laborious, expensive and disruptive effort, but the support of robust alarm analysis can help simplify this step.

Prioritise for Action Not Effect

  • Set alarm priority based on risk. The more important it is for the operator to act on an alarm, the more prominent it should be
  • Priority helps the operator decide which alarms to deal with first when several occur at the same time during a disturbance
  • EEMUA recommends:
    • <1% Critical
    • 5% High
    • 15% Medium
    • 80% Low

EEMUA Recommendation on split of total alarms by priority assignment levels

– Source EEMUA 191
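
A configured alarm list can be checked against this split with a short script. The sketch below is illustrative only: the function names are hypothetical, the priority labels are assumed to match the four levels quoted above, and "Critical" is represented by its upper bound of 1%.

```python
from collections import Counter

# Target shares from the split quoted above; "<1%" is stored as its 1% upper bound.
TARGET_SHARE = {"Critical": 0.01, "High": 0.05, "Medium": 0.15, "Low": 0.80}


def priority_shares(priorities):
    """Fraction of configured alarms at each priority level."""
    counts = Counter(priorities)
    total = sum(counts.values())
    return {
        level: (counts.get(level, 0) / total if total else 0.0)
        for level in TARGET_SHARE
    }


def priority_deviation(priorities):
    """Actual share minus target share; positive means the level is over-used."""
    shares = priority_shares(priorities)
    return {level: shares[level] - TARGET_SHARE[level] for level in TARGET_SHARE}


configured = ["Critical"] * 2 + ["High"] * 18 + ["Medium"] * 30 + ["Low"] * 50
for level, delta in priority_deviation(configured).items():
    print(f"{level:<8} {delta:+.2f}")
```

A large positive deviation at High or Medium, as in this sample, usually means priorities were assigned for effect rather than by the risk-based criteria in the alarm philosophy.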


Control

Sustaining the gains depends on the tools used to verify that the KPIs that have been set continue to be met. This also involves creating appropriate training material for new personnel, procedures and manuals for management of change (MOC) and an ongoing review of analysis findings from the Historian.

Alarm Benchmark Levels

  • Level 1, Overloaded: A continuously high rate of alarms, with rapid performance deterioration during process upsets. The alarm system is difficult to use during normal operation & is essentially ignored during plant upsets as it becomes unusable.
  • Level 2, Reactive: Some improvements compared to Overloaded, but the peak alarm rate during upset is still unmanageable. The alarm system remains an unhelpful distraction to the operator for much of the time.
  • Level 3, Stable: A system well defined for normal operations, but less useful during plant upsets. Compared to Reactive, there are improvements in both the average alarm and peak alarm rates. Nuisance alarms are resolved and under systematic control. Problems remain with the burst alarm rate.
  • Level 4, Robust: Average & peak alarm rates are under control for foreseeable plant operating scenarios. Dynamic alarming techniques are used to improve the real-time performance. Operators have a high degree of confidence in the alarm system, and have time to read, understand and respond to all the alarms.
  • Level 5, Predictive: The alarm system fully encapsulates the aspirations of the EEMUA guidelines. The alarm system is stable at all times and provides the operator with the right information at the right time. Alarms are predictive & anticipate problems before they actually occur in order to avoid process upset or minimise their impact on production.

Source: EEMUA Publication 191 - Alarm Management Guidelines


Sharing Data

Another limitation, in most current practice, is the lack of shared insight between the plant floor and other stakeholders in the organisation.

In a “learning organization”, the fruit of analysis of the alarm system is shared with other stakeholders who are not necessarily on the plant floor.

This is also helpful when plant engineers need to keep senior management informed of progress in alarm system improvements and to justify future investments in the alarm system. With an open Historian that is accessible through industry-standard tools such as Microsoft SQL Server 2005, operators, engineers and management all work with the same widely supported data storage and exchange tool.

Reports can be delivered in a variety of formats (such as PDFs for regulatory reports, Excel spreadsheets that allow any user to immediately extract data for further analysis or web pages that can be integrated with other business systems in the organization).

To establish how much improvement an alarm system requires (gap analysis), or to measure improvements after a new alarm management program has been initiated, it is useful to compare the system with industry best practice.

Some alarm KPIs that could form the basis for such benchmarking include:

  • Average number of alarms per hour
  • Maximum number of alarms per hour
  • Percentage of hours where there were >30 alarms per hour
  • Operator response time

In the final analysis, successful alarm management is not about the equipment or the alarm, but about people who impact and are impacted by the alarm system:

  • Operators
  • Process and control engineers
  • Maintenance personnel
  • Shift supervisors
  • Instrument and control system technicians
  • Designers
  • Safety officers
  • Training staff
  • Senior management

Implementing a successful alarm management program requires factoring in the different expectations and priorities, as well as the differing levels of awareness and understanding, among these diverse groups of stakeholders.

Tools that share alarm analytics and the resulting insight across these stakeholders in a simple, relevant and easy-to-understand format help ensure that alarm management receives the multi-level, multi-disciplinary input it requires to validate it and keep it relevant to the business objectives and the alarm philosophy of the organization. Tools that can take alarm KPIs and benchmark them against industry best practice could take alarm management to the next level and provide the organization with alarm report cards that can directly result in improved productivity, profitability and safety.

Summary

Alarm management is a continuous improvement program whose tools and systems jointly ensure that the alarm system in a plant is as effective as possible throughout the life of the plant. Typically this will include:

  • Creation and adoption of an ‘Alarm Philosophy’ document
  • Creation and adoption of an ‘Alarm Standards’ document
  • Alarm analysis and benchmarking
  • Alarm rationalisation and prioritisation
  • Alarm philosophy enforcement
  • Correlating alarms to plant equipment and state information
  • Intelligent alarming based on alarm relationships to plant, equipment, state and process data
  • Operator training on how to effectively respond to alarms
