
Journal of Loss Prevention in the Process Industries 30 (2014) 207-218

Contents lists available at ScienceDirect

Journal of Loss Prevention in the Process Industries

journal homepage: www.elsevier.com/locate/jlp

A dynamic alarm management strategy for chemical process transitions

Jianfeng Zhu a, Yidan Shu a, Jinsong Zhao a,*, Fan Yang b

a State Key Laboratory of Chemical Engineering, Department of Chemical Engineering, Tsinghua University, Beijing 100084, China

b Tsinghua National Laboratory for Information Science and Technology (TNList), Department of Automation, Tsinghua University, Beijing 100084, China

ARTICLE INFO

Article history:
Received 21 March 2013
Received in revised form 15 June 2013
Accepted 2 July 2013

Keywords:
Dynamic alarm management
Fault diagnosis
Transition
Alarm flood
Bayesian estimation

* Corresponding authors. Tel.: +86 10 62783109; fax: +86 10 62770304. E-mail address: [email protected] (J. Zhao).

0950-4230/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jlp.2013.07.008

ABSTRACT

Chemical processes frequently operate at a multitude of steady states, and transitions between these states are inevitable. Unfortunately, transitions are exactly where alarm floods often occur. Alarm floods cause critical alarms to be overwhelmed and thus increase the probability of larger safety issues. Existing techniques for the design of alarm systems mostly focus on one steady state of operation and cannot deal effectively with alarm floods during transitions. In this paper, a dynamic alarm management strategy is proposed for controlling alarm floods during transitions of chemical processes. In this strategy, the artificial immune system-based fault diagnosis (AISFD) method and a Bayesian estimation based dynamic alarm management (BEDAM) method are integrated. During transitions, dynamic alarm limits obtained by the BEDAM method can control alarm floods. However, if a process fault occurs during a transition, a flood of alarms could still be generated. To generate useful alarms in fault situations, an artificial immune system based on dynamic time warping (DTW) is used for fault detection and diagnosis. Finally, in case studies, the dynamic alarm management strategy is applied to the startup stage and a throughput change transition in a pilot-scale distillation column.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Modern chemical processes are usually equipped with distributed control systems (DCSs) to ensure safe operation and high product quality. Typically, an alarm system is installed and maintained within a DCS. In the alarm system, high/low and/or high-high/low-low alarms are often configured for important process variables so that operators can keep the variables within their defined operating limits, i.e., alarm thresholds, and achieve the best operating performance. When a variable moves beyond its defined operating limit, an alarm is triggered and the operators are notified that an abnormal event might be happening. Alarm thresholds are generally determined carefully during plant commissioning, because poorly assigned thresholds frequently result in false alarms and missed alarms. This is the strategy of traditional alarm management systems, in which alarm thresholds are configured for a single steady operating state.

Chemical processes often operate among a multitude of steady states, and transitions between these states are inevitable. Startup and shutdown are the common forms of process transitions. Feedstock, throughput, or product grade changes and maintenance operations, such as furnace decoking and absorber regeneration, can also lead to process transitions (Viswanathan & Srinivasan, 2000). When the plant undergoes a transition, most variables change from one steady state value to another. However, the traditional alarm system is not aware of the transition and continues to monitor the plant with pre-fixed configuration settings. Therefore, during the transition, a flood of false alarms may occur even if the transition proceeds normally. When the transition is over and a new steady state starts, alarm floods may also occur because of inadequate threshold settings, a situation typically handled by state-based alarming. In this paper, alarm floods during transitions are discussed. An alarm flood has been defined by ISA-18.2 (ISA, 2009) as 10 or more alarms raised in any 10 min period per operator. In an alarm flood, alarms are either turned off or ignored. Worse still, critical alarms are overwhelmed by false alarms, and hence the probability of larger safety issues increases. Consequently, a dynamic alarm management strategy is necessary to deal with alarm floods during transitions of chemical processes.
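The ISA-18.2 flood criterion quoted above (10 or more alarms in any 10 min period per operator) can be checked against an alarm log with a trailing time window. The following is a minimal sketch, not part of the original study; the function name `flood_times` and the sample timestamps are hypothetical, and alarm times are assumed to be given in seconds for a single operator.

```python
from collections import deque


def flood_times(alarm_times_s, window_s=600, threshold=10):
    """Return the alarm timestamps at which the trailing window
    (t - window_s, t] contains `threshold` or more alarms."""
    window = deque()
    flagged = []
    for t in sorted(alarm_times_s):
        window.append(t)
        # Discard alarms that have dropped out of the trailing window.
        while window[0] <= t - window_s:
            window.popleft()
        if len(window) >= threshold:
            flagged.append(t)
    return flagged


# Hypothetical log: a burst of 12 alarms 20 s apart, then one isolated alarm.
burst = [1000 + 20 * k for k in range(12)]
print(flood_times(burst + [5000]))
```

The flood condition is first met at the tenth alarm of the burst and persists until the burst ends; the isolated alarm does not trigger it.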

Alarm management has recently attracted a lot of attention among researchers. The Engineering Equipment and Materials Users' Association (EEMUA, 2007) published a standard on alarm management in 1999, entitled "Alarm Systems: A Guide to Design, Management, and Procurement", and released its second edition in 2007. Another recommended standard is ISA-18.2 (ISA, 2009), "Management of Alarm Systems for the Process Industries". Both standards provide guidance that helps users design, implement, and maintain a well-performing alarm system. In Izadi, Shah, Shook, Kondaveeti, and Chen (2009), a framework based on the receiver operating characteristic (ROC) curve was proposed to optimally design alarm limits, filters, deadbands, and delay timers. A few solutions to reduce false and nuisance alarms were studied by Izadi, Shah, Shook, and Chen (2009). A technique was introduced for the optimal design of alarm limits by analyzing the correlation between process variables and alarm variables (Yang, Shah, & Xiao, 2010). Two novel alarm data visualization tools, known as the high density alarm plot (HDAP) and the alarm similarity color map (ASCM), were presented to assess the performance of alarm systems (Kondaveeti, Izadi, Shah, Black, & Chen, 2012). Event correlation analysis and a two-layer cause-effect model were used to reduce the number of alarms (Higuchi, Yamamoto, Takai, Noda, & Nishitani, 2009; Kato, Takeda, Noda, Kikuchi, & Hirao, 2012; Takeda, Hamaguchi, Noda, Kimura, & Itoh, 2010). Evaluation of plant alarm systems by behavior simulation using a virtual subject was proposed in Liu, Noda, and Nishitani (2010). A dynamic risk analysis methodology that uses alarm databases to improve process safety and product quality was presented (Pariyani, Seider, Oktem, & Soroush, 2010, 2012a,b). In all the above literature, however, little attention has been paid to alarm management during transitions of chemical processes.

Transition monitoring of chemical processes has been reported in the literature by many researchers. In Bhagwat, Srinivasan, and Krishnaswamy (2003), a model-based fault detection scheme was proposed that involves decomposition of nonlinear transient systems into multiple linear modeling regimes. However, it is difficult to build a suitable model for transitions of most chemical processes due to their nonlinearities, wide operating condition changes, etc. In Sundarraman and Srinivasan (2003), a trend analysis-based technique (i.e., enhanced trend analysis) for monitoring transitions in continuous chemical plants was used to abstract trends into semi-quantitative ones. These real-time trends were then compared with dictionary trends representing normal transition operations, and faults were detected according to the comparison results. A framework for managing transitions in chemical plants was proposed in Srinivasan, Viswanathan, Vedam, and Nochur (2005). A fault diagnosis methodology was proposed for batch chemical processes, based on an artificial immune system (AIS) and the dynamic time warping (DTW) algorithm (Dai & Zhao, 2011). In Liu and Chen (2010), a Gaussian mixture model was employed to extract a series of operating modes from historical process data, and the local statistic $T^2$ and its normalized contribution chart were then derived for detecting abnormalities early and isolating faulty variables. The Bayesian method has been introduced for multimode process monitoring (Ge & Song, 2009; Yu & Qin, 2008). Several techniques for fault detection and diagnosis in multimode processes have also been proposed (Deshpande & Patwardhan, 2008; Ge & Song, 2008; Ge, Yang, Song, & Wang, 2008; Srinivasan & Qian, 2006; Yu & Qin, 2009). However, these fault detection and diagnosis techniques cannot cure fundamental faults in the basic alarm system and should be seen as add-ons that complement a basic alarm system (EEMUA, 2007). In Beebe, Ferrer, and Logerot (2013), a method of dynamic rationalization was presented to control alarm floods between process state changes. However, this method depends heavily on operating experience and process knowledge to determine the detectable operating states of each section of the plant. Moreover, the method does not consider real-time transitional data, and thus real-time transitional trend information is ignored. Therefore, in this paper, a dynamic alarm management strategy is proposed to deal with alarm floods that occur during transitions of chemical processes. When the strategy is applied to an alarm management system, a dynamic alarm management system (DAMS) can be built.

In the proposed dynamic alarm management strategy, the artificial immune system-based fault diagnosis (AISFD) method and the Bayesian estimation based dynamic alarm management (BEDAM) method are integrated. The BEDAM method is the core of the strategy because, by using the dynamic alarm limits obtained by the BEDAM method, alarm floods during transitions can be controlled. The AISFD method assists the BEDAM method in root cause identification of alarm flooding caused by a fault. As mentioned in EEMUA (2007), intelligent fault detection and diagnosis is for logically processing signals to generate useful alarms in fault situations and should be added to alarm management systems. Herein, the AISFD method is used for fault detection and diagnosis during transitional stages.

The rest of the paper is organized as follows. In Section 2, the BEDAM method is presented and the Bayesian method (Bolstad, 2007, Chapter 14) is used to determine the dynamic alarm limits of variables. The AISFD method is introduced in Section 3. The dynamic alarm management strategy is then described in Section 4. In Section 5, the strategy is applied to the startup stage and a throughput change transition in a pilot-scale distillation process to illustrate its effectiveness, followed by concluding remarks in Section 6.

2. Bayesian estimation based dynamic alarm management (BEDAM) method

The best way to mitigate unnecessary alarms during transitions is to reset the alarm limits dynamically. In order to dynamically reset the alarm limits of the process variables during transitions, Bayesian estimation can be used. According to the general design principle of alarm limits (EEMUA, 2007), the method of piecewise linear segmentation is suitable to simulate dynamic alarm limits. The procedure of estimating dynamic alarm limits is illustrated in the flowchart (Fig. 1).

The piecewise linear representation is widely used in mining time series data and is the key to effective solutions of many problems. In piecewise linear segmentation, a time series of length $n$ is transformed into $K$ linear segments, which are straight lines. Because $K$ is typically much smaller than $n$, this representation makes the storage and computation of time series data more efficient. The method is briefly as follows. First, anchor the left point, namely the start point, of a potential segment, and then use the Bayesian method to fit the data with increasingly longer segments. When to stop extending the segment is decided by the fitting error of the time series data involved, namely the squared residuals of the data. Then, based on the fitted linear segments, dynamic alarm limits for the variable are estimated. Details of the method are presented below.
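The anchor-and-grow idea described above can be sketched in a few lines. This is an illustrative sketch only: it swaps in an ordinary least squares fit for the Bayesian fit developed later in this section, the function names `squared_residuals` and `grow_segments` are hypothetical, time stamps are assumed to be distinct, and a single trailing point that cannot form a segment is ignored.

```python
def squared_residuals(ts, ys):
    """Sum of squared residuals of the least squares line through the points."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    ss_x = sum((t - t_bar) ** 2 for t in ts)
    ss_xy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = ss_xy / ss_x
    return sum((y - (y_bar + slope * (t - t_bar))) ** 2 for t, y in zip(ts, ys))


def grow_segments(ts, ys, max_error):
    """Return (first_index, last_index) of each segment, both inclusive."""
    segments = []
    anchor, n = 0, len(ts)
    while anchor < n - 1:
        end = anchor + 1  # a segment starts with two points
        # Extend while the fit that includes the next point stays within tolerance.
        while (end + 1 < n and
               squared_residuals(ts[anchor:end + 2], ys[anchor:end + 2]) <= max_error):
            end += 1
        segments.append((anchor, end))
        anchor = end + 1  # the point that broke the fit anchors the next segment
    return segments


# Hypothetical series: a ramp followed by a plateau.
ts = list(range(10))
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
print(grow_segments(ts, ys, max_error=1e-9))
```

A larger `max_error` yields fewer, longer segments (more compression), while a smaller value follows the data more closely.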

For one linear segment of a variable, the data set consists of $n$ ordered pairs of points $(t_i, y_i)$ for $i = 1, \ldots, n$, where $y_i$ is the observation at time $t_i$ and contains an error or noise. To construct a regression model in $[t_1, t_n]$, a linear relationship that appears to fit the data is chosen. Typically, the equation of a linear function is determined by two factors: the slope $\beta$ and the y-intercept $\alpha_0$, i.e., $y = \alpha_0 + \beta t$. Alternatively, the slope and any other point on the line determine the line, for instance $\alpha_{\bar{t}}$, the intercept of the vertical line at $\bar{t}$ (the mean of the $t_i$), i.e., $y = \alpha_{\bar{t}} + \beta (t - \bar{t})$. No matter which form is chosen, two unknown parameters need to be determined: $\beta$ and either $\alpha_0$ or $\alpha_{\bar{t}}$.

Fig. 1. Procedure of estimating dynamic alarm limits.


To determine the best estimates of the parameters, least squares regression is usually adopted. However, no inferences about the slope and intercept are possible because there is no probability model for the data. Moreover, a priori knowledge about the parameters obtained from historical data is not used. Hence, the Bayesian method for fitting a linear relationship is introduced herein. Given the data and $t_{n+1}$, the distribution of the next observation $y_{n+1}$ can be predicted. A $(1 - \alpha) \times 100\%$ credible interval for this distribution then determines the dynamic alarm limits.

2.1. Some assumptions

In order to apply the Bayesian method, some assumptions about the probability model underlying the data should be made as follows:

1. Mean assumption. The conditional mean of $y_i$ given $t_i$ is an unknown linear function of $t_i$, i.e.

$$\mu_{y_i \mid t_i} = \alpha_{\bar{t}} + \beta (t_i - \bar{t}), \tag{1}$$

where $\alpha_{\bar{t}}$ is the unknown intercept of the vertical line $t = \bar{t}$. In this parameterization, the least squares estimates $\alpha_{\bar{t}} = \bar{y}$ and $\beta$ will be independent under the assumptions, so the likelihood will factor into a part depending on $\alpha_{\bar{t}}$ and a part depending on $\beta$. This greatly reduces the computational complexity.

2. Error assumption. The error of each variable value at the observation stage is normally distributed with mean 0 and variance $\sigma^2$. Assume that the errors for all observations $y_i$ are independent and identically distributed (i.i.d.). Hence we have

$$y_i = \alpha_{\bar{t}} + \beta (t_i - \bar{t}) + \varepsilon_i, \tag{2}$$

where $\alpha_{\bar{t}}$ is the mean value of $y$ given $t = \bar{t}$, and $\beta$ is the slope. Each noise term $\varepsilon_i$ follows a normal distribution with mean 0 and variance $\sigma^2$. Because the $\varepsilon_i$ are independent and identically distributed, each $y_i \mid t_i$ is normally distributed with mean $\alpha_{\bar{t}} + \beta (t_i - \bar{t})$ and variance $\sigma^2$, and the $y_i \mid t_i$ are independent of each other.

2.2. Bayesian method

Now assume that data of a variable in $[t_{anchor}, t_n]$ have entered and the next task is to fit a linear segment with the Bayesian method. According to the discussion above, the linear segment is determined by two parameters, the slope $\beta$ and the intercept $\alpha_{\bar{t}}$ of the vertical line at $\bar{t}$, with $y = \alpha_{\bar{t}} + \beta (t - \bar{t})$. How to estimate the two parameters with the Bayesian method is described as follows.

The joint posterior distribution for $\beta$ and $\alpha_{\bar{t}}$ is computed by the Bayesian formula, and is proportional to the joint prior probability times the joint likelihood, i.e.

$$g(\alpha_{\bar{t}}, \beta \mid \text{data}) \propto f(\text{data} \mid \alpha_{\bar{t}}, \beta) \times g(\alpha_{\bar{t}}, \beta) \tag{3}$$

where $f(\text{data} \mid \alpha_{\bar{t}}, \beta)$ and $g(\alpha_{\bar{t}}, \beta)$ are the joint likelihood distribution and the joint prior distribution, respectively, and data is the set of ordered pairs $(t_{anchor}, y_{anchor}), \ldots, (t_n, y_n)$. Herein, $anchor = 1$ is assumed.

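2.2.1. The joint likelihood distribution for $\beta$ and $\alpha_{\bar{t}}$

As discussed above, $y_i \mid t_i$ follows a normal distribution, i.e., $N(\alpha_{\bar{t}} + \beta (t_i - \bar{t}), \sigma^2)$. For the $i$th observation $(t_i, y_i)$, the likelihood distribution is

$$f(t_i, y_i \mid \alpha_{\bar{t}}, \beta) \propto \exp\left\{-\frac{1}{2\sigma^2}\left[y_i - \left(\alpha_{\bar{t}} + \beta (t_i - \bar{t})\right)\right]^2\right\} \tag{4}$$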

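Since every observation in the data is assumed to be independent, the likelihood distribution of the data is

$$f(\text{data} \mid \alpha_{\bar{t}}, \beta) \propto \prod_{i=1}^{n} \exp\left\{-\frac{1}{2\sigma^2}\left[y_i - \left(\alpha_{\bar{t}} + \beta (t_i - \bar{t})\right)\right]^2\right\} \propto \exp\left\{-\frac{1}{2\sigma^2/SS_x}\left[\beta - \frac{SS_{xy}}{SS_x}\right]^2\right\} \times \exp\left\{-\frac{1}{2\sigma^2/n}\left[\alpha_{\bar{t}} - \bar{y}\right]^2\right\} \tag{5}$$

where $SS_y = \sum_{i=1}^{n} (y_i - \bar{y})^2$, $SS_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(t_i - \bar{t})$, and $SS_x = \sum_{i=1}^{n} (t_i - \bar{t})^2$. Note that $SS_{xy}/SS_x = B$, the least squares slope, and $\bar{y} = A_{\bar{t}}$, the least squares estimate of the intercept of the vertical line $t = \bar{t}$. The joint likelihood distribution can be factored into the product of two individual likelihoods

$$f(\text{data} \mid \alpha_{\bar{t}}, \beta) \propto f(\text{data} \mid \beta) \times f(\text{data} \mid \alpha_{\bar{t}}) \propto \exp\left\{-\frac{1}{2\sigma^2/SS_x}\left[\beta - B\right]^2\right\} \times \exp\left\{-\frac{1}{2\sigma^2/n}\left[\alpha_{\bar{t}} - A_{\bar{t}}\right]^2\right\} \tag{6}$$

where we can recognize that the likelihood of the slope $\beta$ is normally distributed with mean $B$, the least squares slope, and similarly, the likelihood of $\alpha_{\bar{t}}$ is normally distributed with mean $A_{\bar{t}}$, since the two individual likelihoods, $f(\text{data} \mid \beta)$ and $f(\text{data} \mid \alpha_{\bar{t}})$, are independent as discussed above. If no a priori knowledge can be provided, the slope $\beta$ and the intercept $\alpha_{\bar{t}}$ computed by Bayesian estimation take the same values as those from least squares regression.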
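The step from the product of per-observation likelihoods to the factored form in Eqs. (5) and (6) rests on completing the square in the exponent. As a worked check under the definitions of $SS_x$, $SS_{xy}$ and $SS_y$ given above (and using $\sum_i (t_i - \bar{t}) = \sum_i (y_i - \bar{y}) = 0$), the sum of squares expands as:

```latex
\begin{aligned}
\sum_{i=1}^{n}\left[y_i - \left(\alpha_{\bar{t}} + \beta (t_i - \bar{t})\right)\right]^2
  &= \sum_{i=1}^{n}\left[(y_i - \bar{y}) - \beta (t_i - \bar{t}) - (\alpha_{\bar{t}} - \bar{y})\right]^2 \\
  &= SS_y - 2\beta\, SS_{xy} + \beta^2 SS_x + n\,(\alpha_{\bar{t}} - \bar{y})^2 \\
  &= SS_x\left(\beta - \frac{SS_{xy}}{SS_x}\right)^2 + n\,(\alpha_{\bar{t}} - \bar{y})^2
     + \left(SS_y - \frac{SS_{xy}^2}{SS_x}\right).
\end{aligned}
```

The last bracket does not involve $\alpha_{\bar{t}}$ or $\beta$, so after multiplying by $-1/(2\sigma^2)$ and exponentiating it is absorbed into the proportionality constant, leaving one factor in $\beta$ alone and one in $\alpha_{\bar{t}}$ alone.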

2.2.2. The joint prior distribution for $\beta$ and $\alpha_{\bar{t}}$: the sliding window algorithm

Next we discuss the joint prior distribution for $\beta$ and $\alpha_{\bar{t}}$. Assume that independent priors are used for each parameter, that is

$$g(\alpha_{\bar{t}}, \beta) \propto g(\beta) \times g(\alpha_{\bar{t}}) \tag{7}$$

where $g(\alpha_{\bar{t}}, \beta)$ denotes the joint prior distribution, and $g(\beta)$ and $g(\alpha_{\bar{t}})$ are the prior distributions for the slope $\beta$ and the intercept $\alpha_{\bar{t}}$, respectively. Prior information about $\beta$ and $\alpha_{\bar{t}}$ is obtained from historical data. A sliding window algorithm can be introduced to extract the piecewise linear segment characteristics of transitions (Keogh, Chu, Hart, & Pazzani, 2001). The trend information is then applied as a priori knowledge. The reader can refer to Keogh et al. (2001) for more details of the sliding window algorithm.

Since, for one transition of each variable, a set of time series is stored in the historical database, the sliding window algorithm is used and yields one piecewise linear sequence for each time series. Then, considering all piecewise linear sequences, the prior distributions of the parameters $\beta$ and $\alpha_{\bar{t}}$ of every linear segment are calculated. The prior distribution is assumed to follow a normal distribution, which is estimated by maximum likelihood estimation (MLE), that is

$$\beta \sim N\left(\mu_{\beta}, (\sigma_{\beta})^2\right)$$

and

$$\alpha_{\bar{t}} \sim N\left(\mu_{\alpha_{\bar{t}}}, (\sigma_{\alpha_{\bar{t}}})^2\right)$$

where $\mu_{\beta}$ and $(\sigma_{\beta})^2$ are the prior mean and variance for $\beta$, respectively; and similarly, $\mu_{\alpha_{\bar{t}}}$ and $(\sigma_{\alpha_{\bar{t}}})^2$ are the prior mean and variance for $\alpha_{\bar{t}}$, respectively. Herein, the offline stage result for one transition of one variable consists of $K$ linear segments and is represented by a 7-tuple of length $K$

$$\left\{V_{tag},\; T_{tag},\; LS_i,\; N\left(\mu_{\beta}, (\sigma_{\beta})^2\right),\; N\left(\mu_{\alpha_{\bar{t}}}, (\sigma_{\alpha_{\bar{t}}})^2\right),\; t_{start},\; t_{end}\right\}, \quad 0 < i \le K$$

where $V_{tag}$ and $T_{tag}$ are the representations of the variable and the transition, respectively; $LS_i$, $t_{start}$ and $t_{end}$ denote the $i$th segment of the sequence, the start time and the end time of the segment, respectively; and $N(\mu_{\beta}, (\sigma_{\beta})^2)$ and $N(\mu_{\alpha_{\bar{t}}}, (\sigma_{\alpha_{\bar{t}}})^2)$ are as discussed above. The results are stored and used as the prior information when predicting the dynamic alarm limits for transitions through the Bayesian method.
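One way to picture the offline result is as a small record per segment whose normal priors are fitted by MLE across historical runs. The sketch below is illustrative only: the class `SegmentPrior`, the helper `build_prior` and the sample numbers are hypothetical, and it assumes that every historical run yields the same number of segments so that segments can be matched by index. Start and end times are averaged across runs.

```python
from dataclasses import dataclass


def normal_mle(values):
    """MLE of a normal distribution: sample mean and 1/N (biased) variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


@dataclass
class SegmentPrior:
    """One entry of the 7-tuple described above (field names are illustrative)."""
    variable_tag: str
    transition_tag: str
    segment_index: int
    slope_prior: tuple      # (mean, variance) of the slope
    intercept_prior: tuple  # (mean, variance) of the intercept at t_bar
    t_start: float
    t_end: float


def build_prior(variable_tag, transition_tag, segment_index, fits):
    """`fits` holds one (slope, intercept, t_start, t_end) per historical run."""
    slopes, intercepts, starts, ends = zip(*fits)
    return SegmentPrior(variable_tag, transition_tag, segment_index,
                        normal_mle(slopes), normal_mle(intercepts),
                        sum(starts) / len(starts), sum(ends) / len(ends))


# Hypothetical fits of the same segment in two historical runs.
prior = build_prior("LIC_101", "T2", 1,
                    [(1.0, 10.0, 0.0, 100.0), (3.0, 14.0, 10.0, 120.0)])
print(prior)
```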

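2.2.3. The joint posterior distribution for $\beta$ and $\alpha_{\bar{t}}$

Now the joint posterior distribution for $\beta$ and $\alpha_{\bar{t}}$, $g(\alpha_{\bar{t}}, \beta \mid \text{data})$, will be calculated. According to Eq. (3) discussed earlier, the joint posterior distribution for $\beta$ and $\alpha_{\bar{t}}$ is

$$\begin{aligned}
g(\alpha_{\bar{t}}, \beta \mid \text{data}) &\propto f(\text{data} \mid \alpha_{\bar{t}}, \beta) \times g(\alpha_{\bar{t}}, \beta) \\
&\propto \left[f(\text{data} \mid \beta) \times f(\text{data} \mid \alpha_{\bar{t}}) \times g(\beta) \times g(\alpha_{\bar{t}})\right] \\
&\propto \left[f(\text{data} \mid \beta) \times g(\beta)\right] \times \left[f(\text{data} \mid \alpha_{\bar{t}}) \times g(\alpha_{\bar{t}})\right] \\
&\propto g(\beta \mid \text{data}) \times g(\alpha_{\bar{t}} \mid \text{data})
\end{aligned} \tag{8}$$

where $g(\beta \mid \text{data})$ and $g(\alpha_{\bar{t}} \mid \text{data})$ are the marginal posterior distributions for $\beta$ and $\alpha_{\bar{t}}$, respectively. Combining the joint prior distribution and the joint likelihood distribution mentioned above, the normal posterior distributions for $\beta$ and $\alpha_{\bar{t}}$ are obtained, which are

$$\beta \sim N\left(\mu'_{\beta}, (\sigma'_{\beta})^2\right)$$

and

$$\alpha_{\bar{t}} \sim N\left(\mu'_{\alpha_{\bar{t}}}, (\sigma'_{\alpha_{\bar{t}}})^2\right)$$

where

$$\frac{1}{(\sigma'_{\beta})^2} = \frac{1}{\sigma_{\beta}^2} + \frac{SS_x}{\sigma^2} \tag{9}$$

and

$$\mu'_{\beta} = \frac{(\sigma'_{\beta})^2}{\sigma_{\beta}^2}\,\mu_{\beta} + \frac{SS_x}{\sigma^2}\,(\sigma'_{\beta})^2\, B \tag{10}$$

and similarly

$$\frac{1}{(\sigma'_{\alpha_{\bar{t}}})^2} = \frac{1}{\sigma_{\alpha_{\bar{t}}}^2} + \frac{n}{\sigma^2} \tag{11}$$

and

$$\mu'_{\alpha_{\bar{t}}} = \frac{(\sigma'_{\alpha_{\bar{t}}})^2}{\sigma_{\alpha_{\bar{t}}}^2}\,\mu_{\alpha_{\bar{t}}} + \frac{n}{\sigma^2}\,(\sigma'_{\alpha_{\bar{t}}})^2\, A_{\bar{t}} \tag{12}$$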

Next the fitting error, that is, the sum of squared residuals, of the data in $[t_{anchor}, t_n]$ for the potential segment is calculated as follows:

$$SS_{error} = \sum_{i=1}^{n} \left\{ y_i - \left[\mu'_{\alpha_{\bar{t}}} + \mu'_{\beta}\,(t_i - \bar{t})\right] \right\}^2 \tag{13}$$

where $\mu'_{\alpha_{\bar{t}}}$ and $\mu'_{\beta}$ are the mean values of the posterior distributions for $\alpha_{\bar{t}}$ and $\beta$, respectively. If $SS_{error}$ is greater than the user-specified threshold max_error, the data in $[t_{anchor}, t_n]$ do not fit a good linear relationship as defined here, and a linear segment is finished in $[t_{anchor}, t_{n-1}]$. Then, as new data enter, the anchor is moved to point $n$ (i.e., $t_{anchor} = t_n$) and the next linear segment is started. On the contrary, if $SS_{error}$ is less than or equal to max_error, it is time to determine the dynamic alarm limits for $t_{n+1}$. A reasonable value of max_error should strike a good tradeoff between compression and fidelity. Although the value of the threshold is subjective, historical process data can help to determine a better value.
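The posterior update of Eqs. (9)-(12) and the fitting error of Eq. (13) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, priors are passed as `(mean, variance)` pairs, and the noise variance is supplied by the caller as `noise_var`, which is an assumption made here for simplicity.

```python
def posterior(prior_mean, prior_var, estimate, data_precision):
    """Combine a normal prior with a normal likelihood centred on `estimate`."""
    post_var = 1.0 / (1.0 / prior_var + data_precision)                          # Eqs. (9), (11)
    post_mean = post_var * (prior_mean / prior_var + data_precision * estimate)  # Eqs. (10), (12)
    return post_mean, post_var


def fit_segment(ts, ys, prior_slope, prior_intercept, noise_var):
    """Posterior (mean, variance) of slope and intercept, plus SS_error of Eq. (13)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n  # least squares intercept A at t_bar
    ss_x = sum((t - t_bar) ** 2 for t in ts)
    ss_xy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope_ls = ss_xy / ss_x  # least squares slope B
    post_slope = posterior(*prior_slope, slope_ls, ss_x / noise_var)
    post_intercept = posterior(*prior_intercept, y_bar, n / noise_var)
    ss_error = sum((y - (post_intercept[0] + post_slope[0] * (t - t_bar))) ** 2
                   for t, y in zip(ts, ys))
    return post_slope, post_intercept, ss_error


# With a very vague prior the posterior means approach the least squares values.
slope, intercept, err = fit_segment([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0],
                                    prior_slope=(0.0, 1e12),
                                    prior_intercept=(0.0, 1e12),
                                    noise_var=1.0)
print(slope, intercept, err)
```

The segment would be closed when the returned `ss_error` exceeds max_error; otherwise the posteriors feed the prediction step.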


2.3. The dynamic alarm limits for variables

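Now the procedure of estimating the dynamic alarm limits for $t_{n+1}$ will be described. First, the predictive distribution used to estimate the value of the variable at the next time instant, $t_{n+1}$, conditional on the observed data, is obtained by integrating the product $p(y_{n+1}, \alpha_{\bar{t}}, \beta \mid \text{data})$ over all the values of $\alpha_{\bar{t}}$ and $\beta$

$$\begin{aligned}
p(y_{n+1} \mid \text{data}) &= \iint p(y_{n+1}, \alpha_{\bar{t}}, \beta \mid \text{data})\, d\alpha_{\bar{t}}\, d\beta \\
&= \iint p(y_{n+1} \mid \alpha_{\bar{t}}, \beta, \text{data})\, g(\alpha_{\bar{t}}, \beta \mid \text{data})\, d\alpha_{\bar{t}}\, d\beta \\
&\propto \int \exp\left\{-\frac{1}{2\sigma^2}(y_{n+1} - \mu_{n+1})^2\right\} \exp\left\{-\frac{1}{2(\sigma'_{\mu})^2}(\mu_{n+1} - \mu'_{\mu})^2\right\} d\mu_{n+1} \\
&\propto \int \exp\left\{-\frac{1}{2\sigma^2(\sigma'_{\mu})^2/\left(\sigma^2 + (\sigma'_{\mu})^2\right)}\left[\mu_{n+1} - \frac{y_{n+1}(\sigma'_{\mu})^2 + \mu'_{\mu}\sigma^2}{\sigma^2 + (\sigma'_{\mu})^2}\right]^2\right\} \times \exp\left\{-\frac{1}{2\left(\sigma^2 + (\sigma'_{\mu})^2\right)}(y_{n+1} - \mu'_{\mu})^2\right\} d\mu_{n+1} \\
&\propto \exp\left\{-\frac{1}{2\left(\sigma^2 + (\sigma'_{\mu})^2\right)}(y_{n+1} - \mu'_{\mu})^2\right\}
\end{aligned} \tag{14}$$

where

$$\mu_{n+1} = \alpha_{\bar{t}} + \beta\,(t_{n+1} - \bar{t}) \tag{15}$$

$$\mu'_{\mu} = \mu'_{\alpha_{\bar{t}}} + (t_{n+1} - \bar{t})\,\mu'_{\beta} \tag{16}$$

and

$$(\sigma'_{\mu})^2 = (\sigma'_{\alpha_{\bar{t}}})^2 + (t_{n+1} - \bar{t})^2\,(\sigma'_{\beta})^2 \tag{17}$$

As we see, this is a normal distribution with mean $\mu'_y = \mu'_{\mu}$ and variance $(\sigma'_y)^2 = (\sigma'_{\mu})^2 + \sigma^2$.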

Once the predictive distribution is calculated, a $(1 - \alpha) \times 100\%$ credible interval for the prediction is found, which gives the dynamic alarm limits. Because the noise variance $\sigma^2$ is unknown, the estimated variance, obtained from the residuals, is used. The credible interval is given by

$$\mu'_y \pm T_{\alpha/2} \times \sigma'_y = \mu'_{\mu} \pm T_{\alpha/2} \times \sqrt{(\sigma'_{\mu})^2 + \hat{\sigma}^2} \tag{18}$$

where $T_{\alpha/2}$ is the critical value of Student's $t$ distribution with $n - 2$ degrees of freedom and $\hat{\sigma}^2$ is the estimate of the variance, which is

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}\left[y_i - \left(A_{\bar{t}} + B\,(t_i - \bar{t})\right)\right]^2}{n - 2} \tag{19}$$

Thus the dynamic alarm limits are determined. For a typical process variable, 95% bounds on the prediction are reasonable, and the interval between the low and high alarm limits is configured as $\left(\mu'_y - T_{0.025} \times \sigma'_y,\; \mu'_y + T_{0.025} \times \sigma'_y\right)$.

If variables move beyond the dynamic alarm limits, alarms occur and the operator is notified that there might be an abnormality.
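Equations (16)-(18) translate directly into a short routine that returns the low and high limits for the next sample. The sketch below is illustrative only: the function names and numbers are hypothetical, posteriors are passed as `(mean, variance)` pairs, and the Student's $t$ critical value is supplied by the caller (for example, from a statistical table for $n - 2$ degrees of freedom) rather than computed.

```python
import math


def dynamic_alarm_limits(t_next, t_bar, post_intercept, post_slope,
                         noise_var_hat, t_crit):
    """Low and high alarm limits for the observation at t_next."""
    mu_a, var_a = post_intercept
    mu_b, var_b = post_slope
    mean = mu_a + (t_next - t_bar) * mu_b                     # Eq. (16)
    var_mu = var_a + (t_next - t_bar) ** 2 * var_b            # Eq. (17)
    half_width = t_crit * math.sqrt(var_mu + noise_var_hat)   # Eq. (18)
    return mean - half_width, mean + half_width


def is_alarm(y, limits):
    """True when the measurement falls outside the dynamic limits."""
    low, high = limits
    return not (low <= y <= high)


# Hypothetical posterior values and critical value.
limits = dynamic_alarm_limits(t_next=7.0, t_bar=5.0,
                              post_intercept=(10.0, 0.05),
                              post_slope=(0.5, 0.01),
                              noise_var_hat=0.16, t_crit=2.0)
print(limits, is_alarm(11.2, limits), is_alarm(12.5, limits))
```

Note that the limits widen as $t_{n+1}$ moves away from $\bar{t}$, because the slope uncertainty enters through the $(t_{n+1} - \bar{t})^2$ term.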

3. Artificial immune system-based fault diagnosis (AISFD) method

Now we briefly describe the AISFD method, namely, the DTW-based artificial immune system proposed for fault detection and diagnosis (Dai & Zhao, 2011).

The artificial immune system (AIS) is developed from immunology and applied to engineering fields, such as fault diagnosis and computer security (Timmis, Andrews, Owens, & Clark, 2008). Considering the characteristics of transitions of chemical processes, the fault diagnosis approach proposed for batch chemical processes by Dai and Zhao is suitable for fault detection at the online stage. In this approach, antibodies are generated from historical data and antigens are generated from online real-time data, and the system detects faults by calculating the affinity of the antigen-antibody binding. Both antigens and antibodies are represented by matrices of time-sampled data, instead of vectors of data. Unlike the traditional AIS, the new approach uses dynamic time warping (DTW) to calculate the difference between the antibody and the antigen. In the system initialization, the antibody libraries are constructed. Then, as online real-time data are introduced to the system, fault detection can start. The last $l$ samples before the current time compose an antigen, as

$$Ag_{l \times n} = [Ag(1), Ag(2), \ldots, Ag(k), \ldots, Ag(n)]$$

where $Ag(k)$ indicates the data of variable $k$, and $l$ is usually an integer taken from 10 to 15. The differences between the antigen and all the antibodies in the normal antibody library are calculated by using DTW and are denoted by $h_0 = [h_0(1), h_0(2), \ldots, h_0(n)]$, where $h_0(i)$ represents the difference between the antigen and antibody $i$. If $\min(h_0) > d_{normal}$, where $d_{normal}$ indicates the threshold of the normal antibody library, then the fault detection and diagnosis system reports a fault and the fault diagnosis phase follows. Intelligent alarms then occur and help the BEDAM method deal with fault situations in the DAMS. On the other hand, if any of the differences is less than the threshold of the normal antibody library, no fault is detected by the AISFD method.
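The detection rule above (report a fault when the smallest DTW difference to the normal antibody library exceeds the threshold) can be illustrated with a textbook DTW implementation. This sketch is not the algorithm of Dai and Zhao (2011): it assumes each antigen or antibody is a list of samples, each sample being a tuple of variable values, uses the Euclidean distance between samples as the local cost, and the names `dtw_distance` and `is_fault` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences of samples."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between samples
            cost[i][j] = local + min(cost[i - 1][j],      # repeat a sample of b
                                     cost[i][j - 1],      # repeat a sample of a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def is_fault(antigen, normal_antibodies, d_normal):
    """Fault when even the closest normal antibody differs by more than d_normal."""
    differences = [dtw_distance(antigen, ab) for ab in normal_antibodies]
    return min(differences) > d_normal


# Hypothetical single-variable sequences.
normal_a = [(0.0,), (1.0,), (2.0,)]
normal_b = [(0.0,), (1.0,), (1.0,), (2.0,)]   # same shape, slower in the middle
shifted = [(5.0,), (6.0,), (7.0,)]
print(dtw_distance(normal_a, normal_b))
print(is_fault(normal_a, [normal_b], d_normal=1.0))
print(is_fault(shifted, [normal_a, normal_b], d_normal=1.0))
```

Because DTW tolerates differences in timing, two normal transitions that proceed at slightly different speeds still match closely, which is why it suits transition data.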

In the dynamic alarm management strategy, if no alarms occur, a normal operation transition is inferred. Through fault diagnosis and redesign of alarm limits during transitions, alarm floods can be controlled and the plant is operated in a safer environment.

4. The dynamic alarm management strategy

The proposed dynamic alarm management strategy is shown in Fig. 2. In this strategy, the artificial immune system-based fault diagnosis (AISFD) method and the Bayesian estimation based dynamic alarm management (BEDAM) method are integrated.

Before the dynamic alarm management strategy is deployed online, system initialization has to be completed offline. A priori information for the BEDAM method has to be extracted. To obtain a priori information, a sliding window algorithm is used, which has been described in Section 2.2. The AISFD system initialization is also completed at the offline stage, and the antibody library for AISFD is constructed.

Once the system initialization is completed, the strategy can be implemented online. During a transition, real-time process data are collected into the alarm management system with a constant sampling interval. Dynamic alarm limits are continuously determined by the Bayesian estimation method described in Section 2. In this paper, the method of piecewise linear segmentation is chosen to estimate dynamic alarm limits. With the dynamic alarm limits, unnecessary alarms can be mitigated. If the value of a process variable exceeds its dynamic alarm limits, an alarm is raised and displayed to the operators. In parallel with BEDAM, AISFD runs simultaneously to detect and diagnose any process fault. When a process fault occurs, many process variables may exceed their dynamic alarm limits and alarm flooding may appear. The key role of AISFD is the root cause identification of the flooding alarms. Once the fault is diagnosed by AISFD, intelligent alarms are sent to operators. Case studies of the dynamic alarm management strategy are presented in the following section.

Fig. 2. Framework of the proposed dynamic alarm management strategy.


5. Case studies

5.1. Case 1: startup stage of a pilot-scale distillation column

The pilot-scale distillation process used in this work is designed especially for fault diagnosis and alarm management. The distillation column is used to distill an ethanol-water mixture; it is 2.2 m high and 75 mm wide and has 15 trays, with the feed entering at tray 12 counted from the top of the column. A DCS is used to ensure stable operation of the distillation process and to gather the online data. In the process, two proportional-integral-derivative (PID) controllers are configured to control the reflux drum level and the column sump level. The flows of feed, reflux and head product are driven by metering pumps, so all these rates can be fixed or controlled and their values can be transmitted to the DCS. Twenty-one variables are measured at 1-s intervals, including 13 temperatures, 2 pressures, 4 flow rates, and 2 liquid levels. The process flow diagram (PFD) of the pilot-scale distillation process is shown in Fig. 3 with all the measured variables labeled.

The startup transition for this distillation process is discussed below. Cold startup of the distillation process should be performed following the standard operating procedure (SOP) shown in Table 1. The startup normally takes 60-70 min.

According to the operating procedure, the startup can be divided into three transitions, which are identified by key variables. In order to improve computational performance, it is important to avoid redundancy in the choice of key variables, that is, to select only variables that are independent of one another. The transitions during the startup stage are listed in Table 2 and shown in Fig. 4 along with profiles of four key variables: the column sump level (LIC_101), the temperature of column stage 3 from the top (TI_102), the reflux drum level (LIC_102), and the reboiler heat power. The labels T1-T3 represent the three transitions during the startup.

The dynamic alarm management strategy is applied to the alarm system and a dynamic alarm management system (DAMS) is built. A comparison between the DAMS and the traditional alarm management system for the startup stage is given below.

First, system initialization is completed offline. In this study, historical data of ten normal startup samples were gathered. At the offline stage, the prior information for the Bayesian method, namely the trend information of each transition of each variable, is obtained, and system initialization for the AISFD method, namely construction of the antibody libraries for fault detection, is completed. The trend information is identified by piecewise linear segments, which are represented by a 7-tuple of vectors whose parameters, including the slope and the intercept of the vertical line of each segment, are calculated and stored in the knowledge database. Take the column sump level LIC_101 as an example. The sliding window algorithm is used to extract the piecewise linear segments of LIC_101, with the tuning parameter specified as max_error = 0.01. The characteristics of the segments are listed in Table 3 and the comparison of historical and fitted data is shown in Fig. 5. As seen from Table 3, all the information about the transitions can be found. For instance, the second transition T2 contains one linear segment LS1. The slope $\beta$ and the intercept $\alpha_{\bar{t}}$, which are the parameters of the linear segment LS1, follow the normal distributions $N(-0.0003, (0.0001)^2)$ and $N(27.8, (0.6)^2)$, respectively. The average start and end times of the segment LS1 are $t = 755$ s and $t = 1730$ s, respectively. This information is used as the prior distributions of the BEDAM method to estimate dynamic alarm limits in the online stage. As for the preparation of the AISFD method, six fault samples were taken to generate the antibody libraries and initialize the system. The six fault samples were of three fault types, with different introduction times and fault magnitudes: reboiler heater off, cooling water rate decreasing, and head valve V-20 off. Therefore, three first safe level alarms are defined, corresponding to the three fault types.

Now test the performance of the DAMS and the traditionalalarm management system at the online stage. Two situations,normal and abnormal startup processes, are introduced. In thefollowing part, we discuss alarm floods control during the normalstartup process. In the traditional alarm management system,alarm limits are configured for only one normal state so that alarmsstill occur in the normal startup process. Then the traditional alarmmanagement system arises too many false and nuisance alarms,which lead to alarm floods, while the DAMS is thus built to controlalarm floods during transitions including the startup process.Configured with the DAMS, plots of dynamic alarm limits ofreboiler temperature and reflux temperature during the startup are

Fig. 3. Process flow diagram (PFD) of the pilot-scale distillation process.

Table 1
Standard operating procedure for startup of the distillation process.

Step  Operation
1     Fill the sump with ethanol-water 30% v/v mixture at normal pressure and temperature using full-speed feed pump P201.
2     When the column sump level reaches 27.5 cm, stop the full-speed feed pump and start the reboiler heater with 100% power.
3     When TI-102 reaches 70 °C, change the reboiler heater power to 60% and activate the reflux controller to control the reflux drum level at 4 cm.
4     When the reflux drum level and reflux rate are constant, start the metering pump of feed, set the feed rate to 10 L/h, start the metering pump of head product, and set the reflux ratio to 2.
5     When the column sump level gets higher than 25 cm, open the reboiler controller to pump bottom residue, and set the control level to 25 cm.
6     Wait for all the variables to stabilize.


shown in Figs. 6 and 7, respectively. In the traditional alarm management system, the high and low alarm limits of reboiler temperature are configured as 90 °C and 84 °C, respectively, taking into account only the steady state after the startup. However, as shown in Fig. 8, during the startup the value of reboiler temperature is outside this normal range and thus low alarms occur. Since the startup process is proceeding normally, these low alarms are false alarms. Considering all the variables configured with alarm limits, alarm floods are inevitable. When the dynamic alarm management strategy is

Table 2
Description of transitions during the startup stage.

Tag  Description
T1   Start full-speed feed pump until the column sump level reaches 27.5 cm
T2   Start reboiler heater with 100% power until TI-102 reaches 70 °C
T3   Change the reboiler heater power to 60% until the column sump level stabilizes

Fig. 4. Four key variables during the startup stage.

Fig. 5. Historical and fitted data of variable LIC_101 using the sliding window algorithm.


introduced to the startup process, the BEDAM and AISFD methods start working. Dynamic alarm limits are determined as piecewise linear segments, as shown in Fig. 6. As Fig. 6 shows, the real-time data stay within the dynamic alarm limits throughout the startup and no alarms occur. Therefore, false alarms are eliminated and the truly critical alarms can be annunciated to operators without being buried under unnecessary alarms; that is, alarm floods are controlled. The same conclusion holds for reflux temperature, as shown in Fig. 7.
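The contrast between static and dynamic alarm limits during a normal startup can be illustrated with a short sketch. Only the static limits of 84 °C and 90 °C are taken from the text; the temperature trajectory, the piecewise linear limit segments in `DYNAMIC_SEGMENTS`, and all function names are hypothetical values chosen for illustration, not the limits plotted in Fig. 6.

```python
STATIC_LOW, STATIC_HIGH = 84.0, 90.0  # steady-state limits quoted in the text (deg C)

# Hypothetical dynamic limits:
# (t_start, t_end, low_start, low_end, high_start, high_end), times in seconds
DYNAMIC_SEGMENTS = [
    (0.0, 1800.0, 20.0, 84.0, 30.0, 90.0),     # heating ramp
    (1800.0, 3000.0, 84.0, 84.0, 90.0, 90.0),  # steady state
]


def static_limits(t):
    """Static limits do not depend on time."""
    return STATIC_LOW, STATIC_HIGH


def dynamic_limits(t):
    """Linearly interpolate the (low, high) limits of the segment covering t."""
    for t0, t1, low0, low1, high0, high1 in DYNAMIC_SEGMENTS:
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)
            return low0 + frac * (low1 - low0), high0 + frac * (high1 - high0)
    raise ValueError(f"no limit segment covers t={t}")


def count_violations(times, values, limits):
    """Count samples that fall outside the limits returned by limits(t)."""
    count = 0
    for t, v in zip(times, values):
        low, high = limits(t)
        if v < low or v > high:
            count += 1
    return count


# Synthetic normal startup: ramp from 25 to 87 deg C over 1800 s, then hold.
times = [float(t) for t in range(0, 3001, 60)]
temps = [25.0 + 62.0 * min(t, 1800.0) / 1800.0 for t in times]

static_count = count_violations(times, temps, static_limits)
dynamic_count = count_violations(times, temps, dynamic_limits)
print(static_count, dynamic_count)
```

In this synthetic run every sample of the heating ramp below 84 °C violates the static low limit, while the same data never leave the time-varying band. The counts are out-of-limit samples, not annunciated alarms, since no deadband or delay logic is modelled.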

Next, the average alarm rate, a key performance indicator (KPI) of alarm system performance, is calculated for the startup stage. As defined by EEMUA (2007), the average alarm rate is the average number of alarms per 10 min period. The comparison of the average alarm rate between the traditional and dynamic alarm management systems is shown in Fig. 8. The red and green colors represent the traditional and dynamic alarm management systems, respectively. The bar graph indicates the number of alarms per 10 min period along the startup stage, and the line chart indicates the cumulative number of alarms. From Fig. 8, we can see that with the traditional alarm management system, a flood of 33 alarms would have occurred in the first 10 min of the startup. Within the same period, the green bar shows the actual alarm rate. The same situation happens in the third 10 min period, which is the start of transition T3. Over the whole normal startup process, the alarm rate with the dynamic alarm management system is about 2.1, much less than 10 alarms per 10 min period. According to the definition of ISA (2009), an alarm flood is a condition during which the alarm rate is greater than the operator can effectively manage, typically

more than 10 alarms per 10 min. Therefore, as Fig. 8 shows, alarm floods occur with the traditional alarm management system, while the DAMS controls alarm floods during the normal startup stage.

Next, alarm flood control during a startup stage with abnormalities is discussed. Two scenarios in which alarms occur are considered. In scenario 1, a fault that can be detected by the AISFD method is introduced and an intelligent alarm occurs. In scenario 2, no intelligent alarm occurs, but alarms arise because of inappropriate controller parameters.

Table 3
Piecewise linear segments characteristics for transitions of LIC_101.

Variable  Transition  Linear segment  Distribution of the slope, b  Distribution of the intercept, a_t  Start of linear segment/s  End of linear segment/s
LIC_101   T1          LS1             N(0.2985, (0.0012)²)          N(13.7, (1.7)²)                     510                        610
LIC_101   T1          LS2             N(0.0038, (0.0005)²)          N(27.6, (1.0)²)                     610                        755
LIC_101   T2          LS1             N(-0.0003, (0.0001)²)         N(27.8, (0.6)²)                     755                        1730
LIC_101   T3          LS1             N(-0.0047, (0.0008)²)         N(25.7, (1.5)²)                     1730                       2360
LIC_101   T3          LS2             N(0.0006, (0.0001)²)          N(24.9, (0.6)²)                     2360                       3300
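Table 3 supplies the prior distributions that the BEDAM method combines with real-time data. The exact BEDAM update is defined elsewhere in the paper and is not reproduced here. As a hedged illustration of the general idea only, the sketch below applies the textbook normal-normal conjugate update (see Bolstad, 2007) to the slope prior of T2/LS1. The function name `update_normal_mean`, the online slope estimates, the observation standard deviation, and the 3-sigma band are all assumptions.

```python
import math


def update_normal_mean(prior_mean, prior_sd, observations, obs_sd):
    """Normal-normal conjugate update with a known observation standard deviation.

    Returns the posterior mean and standard deviation of the unknown mean.
    """
    n = len(observations)
    if n == 0:
        return prior_mean, prior_sd  # no data yet: the posterior is the prior
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / obs_sd ** 2
    post_precision = prior_precision + data_precision
    sample_mean = sum(observations) / n
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Prior for the slope of T2/LS1 of LIC_101, taken from Table 3.
prior_mean, prior_sd = -0.0003, 0.0001

# Hypothetical online slope estimates and measurement spread (assumptions).
online_slopes = [-0.0004, -0.0005, -0.0004]
obs_sd = 0.0002

post_mean, post_sd = update_normal_mean(prior_mean, prior_sd, online_slopes, obs_sd)
low, high = post_mean - 3 * post_sd, post_mean + 3 * post_sd  # illustrative 3-sigma band
print(f"posterior slope: {post_mean:.6f} +/- {post_sd:.6f}")
```

The posterior mean lies between the prior mean and the average of the online estimates, and the posterior spread is narrower than the prior spread, which is the behaviour one would expect when real-time data refine limits derived from historical transitions.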

Fig. 6. Plots of dynamic alarm limits of reboiler temperature during the startup stage.

Fig. 7. Plots of dynamic alarm limits of reflux temperature during the startup stage.


5.1.1. Scenario 1: reboiler heat off

According to the startup procedure, when the temperature of column stage 3 from the top, TI_102, reaches 70 °C, the reboiler heater power is changed to 60% until the column sump level stabilizes. However, the reboiler heater may switch off during transition T3, which constitutes a fault. In this study, a fault of reboiler heat off is

introduced into the startup at the 18th minute, and the performance of the dynamic alarm management system is then verified. Because the reboiler heat is off, the reboiler temperature drops and the temperatures of the column stages gradually drop as well. Plots of the historical normal and real-time data of reboiler and reflux temperature are shown in Fig. 9. A fault is detected at t = 18.2 min by the AISFD method. An intelligent alarm occurs and informs the operator that a fault might be happening. Of course, the temperatures of the column stages also move below the dynamic low alarm limit and a flood of alarms occurs. However, with intelligent alarms offered to operators, alarm floods can be resolved when process faults that are defined in the fault detection and diagnosis system are introduced.

5.1.2. Scenario 2: inappropriate controller parameters

In scenario 2, consider the PID controller of the reflux drum level. In the third step of the operation procedure, when the temperature of column stage 3 from the top, TI_102, reaches 70 °C, the reflux controller should be activated to control the reflux drum level at 4 cm. If the controller parameters are configured inappropriately by the operator, deviations occur in the reflux drum level and influence the startup stage. Plots of the historical normal and real-time values of the reflux drum level are shown in Fig. 10a, and the dynamic alarm limits for the real-time data are illustrated in Fig. 10b. The reflux controller is activated at around 1900 s. According to the fault detection and diagnosis result, no intelligent alarm occurs; that is, no fault is detected by the AISFD method. However, the real-time value of the reflux drum level goes beyond the dynamic alarm limit determined by the BEDAM method, which combines real-time data with prior information obtained in the offline stage. Alarms then occur at t = 1950 s and t = 2160 s, and the operator’s attention is directed to locating and resolving the root cause, i.e., inappropriate controller parameters.

Both scenarios indicate that the BEDAM and AISFD methods need to be integrated, and that together they make the alarm management system more effective; that is, alarms arise only for truly abnormal situations.

Fig. 8. The comparison of the average alarm rate between the traditional and dynamic alarm management system for normal startup stage.
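The fault detection in Scenario 1 relies on the DTW-based AISFD method, whose details are given elsewhere in the paper. The sketch below shows only the underlying dynamic time warping distance and a nearest-match lookup. The `library` of labelled trends stands in for the antibody libraries and, like the online window, consists of made-up values; it is not the authors' detection logic.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical library of labelled temperature trends (deg C), not plant data.
library = {
    "normal": [88.0, 88.0, 88.0, 88.0, 88.0],
    "reboiler heater off": [88.0, 86.0, 83.0, 79.0, 74.0],
}

# Hypothetical online window: the drop starts one sample later than in the library.
online = [88.0, 88.0, 86.0, 83.0, 80.0, 75.0]

distances = {label: dtw_distance(online, trend) for label, trend in library.items()}
best_label = min(distances, key=distances.get)
print(distances, best_label)
```

Because DTW tolerates shifts and stretches along the time axis, a fault introduced at a different time or with a different speed can still match a stored trend closely, which is one plausible reason for choosing it when faults are sampled with different introduction times.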

Fig. 9. Plots of the historical normal and real-time data of reboiler and reflux temperature.


5.2. Case 2: a throughput change transition in the pilot-scale distillation column

This case study considers a throughput change transition in the pilot-scale distillation column discussed above. Assume that the column has been started up and is in a steady state with a feed volume flow rate of 0.01 m³/h. After a short period of steady state operation, a feed with a volume flow rate of 0.03 m³/h is introduced. Operation continues until a new steady state is achieved. Then the feed volume flow rate is changed back to 0.01 m³/h until the steady state is reached.

According to the throughput change process, two transitions need to be monitored. The first transition T1 is the feed volume flow rate change from 0.01 m³/h to 0.03 m³/h, and the second one, T2, is the feed volume flow rate change from 0.03 m³/h to 0.01 m³/h. In order to

acquire historical data, the throughput change transitions are run five times. Then the offline stage analysis is carried out and the prior information of the BEDAM method is stored. For the system initialization of the AISFD method, the same faults as in the startup stage are defined.

Then a new throughput change transition starts and is monitored by the DAMS. The comparison of the average alarm rate between the traditional and dynamic alarm management systems for the throughput change transition is shown in Fig. 11. The red and green colors have the same meaning as in the graph of the startup stage above. Note that in the traditional alarm management system, alarm limits are configured for the steady state with a feed volume flow rate of 0.01 m³/h. Therefore, when transition T1 starts, many variables exceed their alarm limits and a flood of alarms occurs, but nearly all of them are unnecessary and false. However, with dynamic alarm limits in the

Fig. 10. (a) Plots of the historical normal and real-time value of reflux drum level. (b) The dynamic alarm limits for real-time data using the BEDAM method.

Fig. 11. The comparison of the average alarm rate between the traditional and dynamic alarm management system for the throughput change transition.


DAMS, unnecessary and false alarms are eliminated. Thus, similar to the analysis of the startup stage, the dynamic alarm management strategy successfully deals with the problem of alarm floods during the throughput change transition.
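The average alarm rate comparison used in both case studies amounts to counting alarms per 10 min period and checking each count against the flood criterion of more than 10 alarms per 10 min cited from ISA (2009). A minimal sketch follows; the function name `alarms_per_period` and the alarm timestamps are hypothetical and do not reproduce the data behind Figs. 8 and 11.

```python
def alarms_per_period(alarm_times_s, duration_s, period_s=600):
    """Count alarms in consecutive periods (default 10 min) over [0, duration_s)."""
    n_periods = -(-duration_s // period_s)  # ceiling division for integer inputs
    counts = [0] * n_periods
    for t in alarm_times_s:
        if 0 <= t < duration_s:
            counts[int(t // period_s)] += 1
    return counts


FLOOD_THRESHOLD = 10  # alarms per 10 min, the typical figure cited from ISA (2009)

# Hypothetical alarm timestamps in seconds for a 30 min transition.
alarm_times = [10 * i for i in range(12)] + [700, 1300, 1310]

counts = alarms_per_period(alarm_times, duration_s=1800)
average_rate = sum(counts) / len(counts)          # average alarms per 10 min
flood_flags = [c > FLOOD_THRESHOLD for c in counts]
print(counts, average_rate, flood_flags)
```

Note that the average over the whole transition can stay below the threshold even when a single period is flooded, which is why the per-period bars and the overall rate are both worth inspecting.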

6. Conclusions

Alarm floods remain an unsolved problem and severely affect safe plant operation. Unfortunately, transitions of chemical processes, such as startup or throughput change transitions, are exactly where most alarm floods occur. However, the traditional alarm management system is configured for a single steady state and cannot handle alarm flooding during transitions. Therefore, a new alarm management strategy for transitions is necessary. In this paper, the proposed dynamic alarm management strategy uses dynamic alarm limits instead of static upper and lower bound values. The case studies show that dynamic alarm limits obtained by the Bayesian estimation method can effectively mitigate alarm floods during transitions. However, when faults occur, BEDAM may still experience alarm floods. The AISFD is integrated with BEDAM in this paper to complement it by identifying the root cause of the alarm floods and providing operators with intelligent decision-making support information. Comparison of the average alarm rate between the traditional alarm management system and the DAMS built in this paper indicates that the dynamic alarm management strategy can successfully deal with alarm floods occurring in chemical process transitions.

Acknowledgment

The authors gratefully acknowledge financial support from the National High Technology Research and Development Program of China (863 Program, Grant No. 2013AA040702) and Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-discipline Foundation.

References

Beebe, D., Ferrer, S., & Logerot, D. (2013). The connection of peak alarm rates to plant incidents and what you can do to minimize. Process Safety Progress, 32, 72-77.

Bhagwat, A., Srinivasan, R., & Krishnaswamy, P. R. (2003). Multi-linear model-based fault detection during process transitions. Chemical Engineering Science, 58, 1649-1670.

Bolstad, W. M. (2007). Introduction to Bayesian statistics (2nd ed.). Hoboken, US: Wiley.

Dai, Y., & Zhao, J. (2011). Fault diagnosis of batch chemical processes using a dynamic time warping (DTW)-based artificial immune system. Industrial & Engineering Chemistry Research, 50, 4534-4544.

Deshpande, A. P., & Patwardhan, S. C. (2008). Online fault diagnosis in nonlinear systems using the multiple operating regime approach. Industrial & Engineering Chemistry Research, 47, 6711-6726.

EEMUA. (2007). Alarm systems: A guide to design, management and procurement. EEMUA Publication No. 191 (2nd ed.). London: Engineering Equipment and Materials Users’ Association.

Ge, Z., & Song, Z. (2008). Online monitoring of nonlinear multiple mode processes based on adaptive local model approach. Control Engineering Practice, 16, 1427-1437.

Ge, Z., & Song, Z. (2009). Multimode process monitoring based on Bayesian method. Journal of Chemometrics, 23, 636-650.

Ge, Z., Yang, C., Song, Z., & Wang, H. (2008). Robust online monitoring for multimode processes based on nonlinear external analysis. Industrial & Engineering Chemistry Research, 47, 4775-4783.

Higuchi, F., Yamamoto, I., Takai, T., Noda, M., & Nishitani, H. (2009). Use of event correlation analysis to reduce number of alarms. Computer Aided Chemical Engineering, 27, 1521-1526.

ISA. (2009). Management of alarm systems for the process industries. Technical Report ANSI/ISA-18.2-2009. 67 Alexander Drive, P.O. Box 12277, Research Triangle Park, North Carolina 27709: International Society of Automation, ISA.

Izadi, I., Shah, S. L., Shook, D. S., & Chen, T. (2009). An introduction to alarm analysis and design. In Proceedings of the 7th IFAC symposium on fault detection, supervision and safety of technical processes, Barcelona, Spain, June 30-July 3 (pp. 645-650).

Izadi, I., Shah, S. L., Shook, D. S., Kondaveeti, S. R., & Chen, T. (2009). A framework for optimal design of alarm systems. In Proceedings of the 7th IFAC symposium on fault detection, supervision and safety of technical processes, Barcelona, Spain, June 30-July 3 (pp. 651-656).

Kato, M., Takeda, K., Noda, M., Kikuchi, Y., & Hirao, M. (2012). Design method of alarm system for identifying possible malfunctions in a plant based on cause-effect model. Computer Aided Chemical Engineering, 31, 285-289.

Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time series. In Proceedings of the 2001 IEEE international conference on data mining, San Jose, CA, USA, Nov 29-Dec 02 (pp. 289-296).

Kondaveeti, S. R., Izadi, I., Shah, S. L., Black, T., & Chen, T. (2012). Graphical tools for routine assessment of industrial alarm systems. Computers & Chemical Engineering, 46, 39-47.

Liu, J., & Chen, D. (2010). Nonstationary fault detection and diagnosis for multimode processes. AIChE Journal, 56, 207-219.

Liu, X., Noda, M., & Nishitani, H. (2010). Evaluation of plant alarm systems by behavior simulation using a virtual subject. Computers & Chemical Engineering, 34, 374-386.

Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2010). Incidents investigation and dynamic analysis of large alarm databases in chemical plants: a fluidized-catalytic-cracking unit case study. Industrial & Engineering Chemistry Research, 49, 8062-8079.

Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2012a). Dynamic risk analysis using alarm databases to improve process safety and product quality: Part I - Data compaction. AIChE Journal, 58, 812-825.

Pariyani, A., Seider, W. D., Oktem, U. G., & Soroush, M. (2012b). Dynamic risk analysis using alarm databases to improve process safety and product quality: Part II - Bayesian analysis. AIChE Journal, 58, 826-841.

Srinivasan, R., & Qian, M. (2006). Online fault diagnosis and state identification during process transitions using dynamic locus analysis. Chemical Engineering Science, 61, 6109-6132.

Srinivasan, R., Viswanathan, P., Vedam, H., & Nochur, A. (2005). A framework for managing transitions in chemical plants. Computers & Chemical Engineering, 29, 305-322.

Sundarraman, A., & Srinivasan, R. (2003). Monitoring transitions in chemical plants using enhanced trend analysis. Computers & Chemical Engineering, 27, 1455-1472.

Takeda, K., Hamaguchi, T., Noda, M., Kimura, N., & Itoh, T. (2010). Use of two-layer cause-effect model to select source of signal in plant alarm system. In Proceedings of the 14th international conference on knowledge-based and intelligent information and engineering systems: Part II, Cardiff, Wales, UK, September 8-10 (pp. 381-388).

Timmis, J., Andrews, P., Owens, N., & Clark, E. (2008). An interdisciplinary perspective on artificial immune systems. Evolutionary Intelligence, 1, 5-26.

Viswanathan, P. K., & Srinivasan, R. (2000). A supervisory algorithm for online identification of operating modes and transitions in process plants. In Presented at the AIChE annual meeting, Los Angeles, USA, November 2000.

Yang, F., Shah, S. L., & Xiao, D. (2010). Correlation analysis of alarm data and alarm limit design for industrial processes. In Proceedings of the 2010 American control conference, Baltimore, MD, USA, June 30-July 02 (pp. 5850-5855).

Yu, J., & Qin, S. J. (2008). Multimode process monitoring with Bayesian inference-based finite Gaussian mixture models. AIChE Journal, 54, 1811-1829.

Yu, J., & Qin, S. J. (2009). Multiway Gaussian mixture model based multiphase batch process monitoring. Industrial & Engineering Chemistry Research, 48, 8585-8594.














































