Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault...



Computers and Chemical Engineering 35 (2011) 342–355


Evaluation of decision fusion strategies for effective collaboration among heterogeneous fault diagnostic methods

Kaushik Ghosh a, Yew Seng Ng a, Rajagopalan Srinivasan a,b,∗

a Department of Chemical and Biomolecular Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Singapore
b Process Sciences and Modeling, Institute of Chemical and Engineering Sciences, 1 Pesek Road, Jurong Island, Singapore 627833, Singapore

Article info

Article history: Received 10 August 2009; Received in revised form 22 March 2010; Accepted 5 May 2010; Available online 13 May 2010

Keywords: Process monitoring; Supervision; Bayesian probability; Classifier

Abstract

Numerous methodologies for fault detection and identification (FDI) in chemical processes have been proposed in the literature. However, it is extremely difficult to design a perfect FDI method to efficiently monitor an industrial-scale process. In this work, we seek to overcome this difficulty by using multiple heterogeneous FDI methods and fusing their results so that the strengths of the individual FDI methods are combined and their shortcomings overcome. Several decision fusion strategies can be used for this purpose. In this paper, we study the relative benefits of utility-based and evidence-based decision fusion strategies. Our results from a lab-scale distillation column and the popular Tennessee Eastman challenge problem show that in situations where no single FDI method offers adequate performance, evidence-based fusion strategies such as weighted voting, Bayesian, and Dempster–Shafer based fusion can provide (i) complete fault coverage, (ii) more than 40% increase in overall fault recognition rate, (iii) significant improvement in monitoring performance, and (iv) reduction in fault detection and diagnosis delays.

© 2010 Elsevier Ltd. All rights reserved.

∗ Corresponding author at: Department of Chemical and Biomolecular Engineering, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Singapore. Tel.: +65 65168041; fax: +65 67791936.
E-mail address: [email protected] (R. Srinivasan).
doi:10.1016/j.compchemeng.2010.05.004

1. Introduction

Quick and correct detection and identification of process faults are extremely important as far as efficient, economic and safe operation of chemical processes is concerned. Undetected process faults may lead to off-spec products, resulting in poor plant economy and sometimes even catastrophic consequences like accidents and injury to plant personnel. Successful detection and identification of process faults at an early stage can increase the success rate of fault recovery during operations and prevent accidents and unnecessary shutdowns. Nimmo (1995) reported that the petrochemical industries in the U.S. lose an estimated 20 billion dollars every year due to poor abnormal event management; Laser (2000) reported that its impact on the British economy was estimated at 27 billion dollars. Therefore, process abnormalities need to be detected and identified as soon as they occur.

Detection and diagnosis of process faults in chemical processes has been an active area of research. In the literature, several methodologies have been proposed for fault detection and identification (FDI) in chemical processes (Chiang, Russell, & Braatz, 2001; Dash & Venkatasubramanian, 2000; Uraikul, Chan,

& Tontiwachwuthikul, 2007; Venkatasubramanian, Rengaswamy, Yin, & Kavuri, 2003; Venkatasubramanian, Rengaswamy, & Kavuri, 2003; Venkatasubramanian, Rengaswamy, Kavuri, & Yin, 2003). Most of the literature on fault detection and identification (FDI) for chemical processes depends on a single method such as principal components analysis (PCA), artificial neural networks (ANN), self-organizing maps (SOM), qualitative trend analysis (QTA), signal processing methods or first principles models. Although many process fault detection and diagnosis approaches have been proposed, each method has its own advantages and weaknesses (Dash & Venkatasubramanian, 2000; Venkatasubramanian, Rengaswamy, Kavuri, et al., 2003). Table 1 gives a comparison of various FDI methods in terms of desirable characteristics. A check mark indicates that the particular method (column) satisfies the corresponding desirable property (row), while a cross indicates that the property is not satisfied. Consider PCA as an example. PCA performs multivariate analysis by projecting high dimensional data onto a lower dimensional subspace that explains the most pertinent features, as measured by the variance in the data. PCA-based FDI methods can detect faults based on limits derived from violation of Hotelling's T²

and/or Q statistics. These also provide means for fault identification and novel fault detection. In PCA-based approaches, the monitoring result can be visualized in terms of Hotelling's T² and SPE plots. It is easy to develop a PCA model from historic data. PCA-based FDI systems can also be endowed with adaptation and robustness properties. On the other hand, PCA-based methods suffer from an inability to explain their results, i.e., they cannot identify the root

Nomenclature

Indices
i, j    fault ID
k, l    FDI method

Parameters
M    total number of classes known to a FDI method/classifier
K    total number of FDI methods/classifiers used
S    total number of samples in the training data
α    threshold used in voting-based fusion (0 < α ≤ 1)

Variables
∅    empty set
κ    BPA that the combination assigns to the null subset
Kp    Kappa statistic between two FDI methods
Ω = {C1, C2, …, CM}    frame of discernment (FOD) having M exhaustive and mutually exclusive classes
σKp    standard error for Cohen's Kappa statistic for large samples
A, B, C    any subset of 2^Ω
BelBayes(Ci)    Bayesian posteriori probability of class Ci
Bel(A)    total belief value committed to A
Pl(A)    total plausibility value committed to A
Cj    jth class of fault
CMk    confusion matrix for kth classifier/FDI method
Ek(x)    classification result of kth FDI method for sample x
Ecom(x)    combined classification result for sample x
Ecom,voting(x)    combined classification result for sample x obtained through voting-based fusion
Ecom,wv(x)    combined classification result for sample x obtained through weighted voting-based fusion
Ecom,Bayesian(x)    combined classification result for sample x obtained through Bayesian-based fusion
Ecom,DS(x)    combined classification result for sample x obtained through Dempster–Shafer fusion
mk(Ci)    basic probability assignment of class Ci for kth FDI method
mk(A)    BPA assigned to hypothesis A for kth FDI method
mk,l(Ci)    combined BPA of class Ci for combination of kth and lth FDI methods
N^k_ij    element of the confusion matrix CMk, denoting the number of samples belonging to class i classified as class j by the kth FDI method
OP(Ci)    overall probability of class Ci
Pk(Ci)    individual probability of class Ci for kth FDI method
pc    proportion of agreement by chance
po    proportion of agreement observed
TV(Ci)    total vote of class Ci
TWV(Ci)    total weighted vote of class Ci
u    length of time window used for output filtering
Vk(Ci)    individual vote of class Ci for kth classifier/FDI method
v    minimum number of detectable faulty/abnormal samples in a time window of length u
WVk(Ci)    individual weighted vote of class Ci for kth classifier/FDI method
x    multivariate online process data (sample)

cause or describe the fault propagation pathways. Nor can they sug-gest recovery actions required to bring the process back to normaloperating conditions. Thus, a FDI method that works well underone circumstance might not work well under another when dif-ferent features of the underlying process come to the fore. It isclearly difficult to design a perfect FDI method that efficiently moni-tors a large-scale, complex industrial process in all likely scenarios.Hence, there is a strong motivation for developing systems thatrely on collaboration between multiple FDI methods so as to bringtogether their strengths and overcome their individual shortcom-ings.

A similar philosophy is now widely practiced in the pattern recognition and classification literature. A judicious and meaningful combination of multiple classifiers generally outperforms a single one (Ho, Hull, & Srihari, 1994; Kittler, Hatef, Duin, & Matas, 1998; Polikar, 2006; Xu, Krzyzak, & Suen, 1992). The strategy in such multiple classifier systems is therefore to create many classifiers and combine their decisions such that the combination improves upon the performance of a single classifier. The objective of this paper is to evaluate the benefits of the multiple classifier approach to chemical process FDI. Particularly, we are interested in situations where the individual FDI methods are highly diverse, with strong disagreement among them, and the overall performance of each FDI method is inadequate. The rest of the paper is organized as follows: Section 2 provides a review of multiple classifier systems. In Section 3, various schemes for fusing the results from different classifiers are discussed. In Section 4, online fault detection and identification of chemical processes based on multiple FDI methods is presented and illustrated using a lab-scale continuous distillation column case study. We evaluate the performance of the proposed scheme using the Tennessee Eastman challenge problem in Section 5.

2. Multiple classifier systems

Multiple classifier based systems, also known as committees of classifiers, mixtures of experts, or ensemble based systems, have been shown to outperform single-classifier systems in a broad range of applications and under a variety of scenarios (Polikar, 2006). The main rationale for combining classifiers is that different types of classifiers can often complement one another, and hence classification performance can be improved by the combination. The intuition is that if each classifier makes different errors, then a suitable combination of these classifiers can reduce the total error. This implies that we need classifiers whose decision boundaries are adequately different from one another so that there is low correlation between them. Such a set of classifiers is said to be diverse.

2.1. Generating diverse classifiers

Classifier diversity can be achieved in several ways. One way is to use different training datasets to train multiple classifiers of the same type. This approach is best suited for unstable classifiers such as neural networks and decision trees, in which a small change in the training dataset can lead to significant changes in the classifier performance. Multiple training datasets are often obtained through resampling techniques, such as Boosting or Bagging, where training data subsets are generated by drawing randomly, usually with replacement (Polikar, 2006). Another approach to achieve diversity is to use different training parameters for different classifiers of the same type. For example, a series of multilayer perceptron (MLP) neural networks can be trained by using different weight initializations, numbers of layers/nodes, error goals, etc. Adjusting

such parameters allows one to control the instability of the individual classifiers, and hence contributes to their diversity. In the random subspace method, diversity is achieved by training the individual classifiers using different features or different subsets of the available features (Ho, 1998). Alternatively, heterogeneous classifiers such as multilayer perceptron (MLP) neural networks, linear discriminant analysis, k-nearest neighbor classifiers, and support vector machines can be used for providing diversity. This is the approach adopted here since the heterogeneous methods are expected to have decision boundaries significantly different from one another.

Table 1
Strengths and shortcomings of different FDI methods.

Method                       QTA   Expert systems   PCA/PLS   Kalman filters   DTW/DLA   Neural networks
Multivariate analysis         ×          ×             √            √             √            √
Speed of detection            √          √             √            √             √            √
Fault isolation               √          √             √            √             ×            √
Novel fault detection         √          ×             √            √             √            √
Explanation facility          √          √             ×            ×             ×            ×
Visualization of results      ×          ×             √            ×             ×            ×
Recovery automation           ×          √             ×            ×             ×            ×
Ease of development           √          ×             √            ×             √            √
Adaptation and robustness     √          ×             √            ×             √            √

2.2. Measures of diversity

Several measures have been defined for quantitative assessment of inter-classifier diversity, such as the degree of inter-classifier correlation, the Q statistic, compound diversity, and generalized diversity (Kuncheva & Whitaker, 2001; Roli, Giacinto, & Vernazza, 2001). The Kappa statistic, Kp, is widely used to quantify inter-classifier agreement (Cohen, 1960). The Kappa statistic is a pair-wise measure defined between two classifiers and is based on the difference between the observed agreement and the agreement expected by chance. The mathematical representation of Kappa is given as:

Kp = (po − pc) / (1 − pc)    (1)

where po is the observed proportion of agreement between the two classifiers, and pc is the proportion of agreement expected by chance. The standard error for an observed Kp for large samples was observed to follow (Cohen, 1960):

σKp ≅ sqrt[ po(1 − po) / (S(1 − pc)²) ]    (2)

where S is the total number of samples. The value of Kp is bounded in [−1, 1] (Ben-David, 2008; Haley & Osberg, 1989). Different values of Kappa correspond to different strengths of agreement. A commonly used benchmark for interpretation of the Kappa statistic is shown in Table 2 (Landis & Koch, 1977), where Kp = 1 represents the case of perfect agreement for all cases, while Kp = 0 signifies agreement by chance. Kp < 0 indicates agreement less than chance, possibly a systematic disagreement between the two classifiers (Viera & Garrett, 2005). In the present work, we use the Kappa statistic since its quantitative value can also be readily interpreted in qualitative terms.

Table 2
Interpretation of Kappa value.

Kappa value    Interpretation
<0.00          Less than chance agreement
0.00–0.20      Slight agreement
0.21–0.40      Fair agreement
0.41–0.60      Moderate agreement
0.61–0.80      Substantial agreement
0.81–1.00      Almost perfect agreement
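As an illustration of Eqs. (1) and (2), the following sketch (ours, not from the paper; the fault-label sequences are invented) computes the pair-wise Kappa statistic and its standard error for the outputs of two classifiers:

```python
# Illustrative sketch of Eqs. (1)-(2): pair-wise Kappa between two classifiers.
import math
from collections import Counter

def kappa(preds_a, preds_b):
    """Return (Kp, standard error) for two label sequences of length S."""
    S = len(preds_a)
    # po: observed proportion of agreement
    po = sum(a == b for a, b in zip(preds_a, preds_b)) / S
    # pc: proportion of agreement expected by chance, from marginal label frequencies
    fa, fb = Counter(preds_a), Counter(preds_b)
    pc = sum(fa[c] * fb.get(c, 0) for c in fa) / S ** 2
    kp = (po - pc) / (1 - pc)                            # Eq. (1)
    se = math.sqrt(po * (1 - po) / (S * (1 - pc) ** 2))  # Eq. (2)
    return kp, se

a = ["F1", "F1", "F2", "F2", "F3", "F3"]
b = ["F1", "F1", "F2", "F3", "F3", "F2"]
kp, se = kappa(a, b)  # kp = 0.5: "moderate agreement" per Table 2
```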

3. Decision fusion strategies

The second key component in multiple classifier systems is tocombine the decisions of individual classifiers in such a way thatthe correct decisions are amplified, and incorrect ones cancelledout. Several decision fusion strategies are available for this purposeas discussed next.

Approaches for decision fusion can be broadly classified asutility-based and evidence-based methods. Utility-based methodsprovide the simplest way to fuse decisions. These methods do notutilize any prior knowledge or evidence from previous predictions,but are based on some aggregating techniques which evaluate thecombined utility functions generated from each classifier. Methodsbased on utility techniques include simple average, voting tech-niques, and their variants. In contrast to utility-based techniques,evidence-based approaches use a priori information from previ-ous performance of each classifier to combine the decisions. Twomain approaches that form the backbone of many evidence-basedapproaches in the pattern recognition literature are the Bayesianand the Dempster–Shafer methods.

3.1. Voting-based fusion

Voting has been a popular form of utility-based decision fusion. In voting-based fusion, the class assigned by a classifier is considered as a vote for that class. There are three major versions of voting, where the winner is the class (i) on which all classifiers agree (unanimous voting); (ii) predicted by at least one more than half the number of classifiers (simple majority); or (iii) that receives the highest number of votes, whether or not the sum of those votes exceeds 50% (plurality voting, or just majority voting). The most popular one is the majority vote, in which the class voted for by most of the classifiers is regarded as the winner and the input is assigned to that class.

Suppose there are K independent classifiers and each of these classifiers produces a unique decision regarding the identity of the unknown sample. The sample is then assigned to the class on which at least a fraction (α) of the total number of classifiers (K) agrees; otherwise, the sample is rejected. In this work, the voting rule has been implemented as follows:

Step 1: Compute individual vote of each class

The first step is to compute the individual vote of each class from the output of a classifier. If the output of the kth classifier is class Ci, i.e., Ek(x) = Ci, then the individual vote V of classifier k for class Ci is given by:

Vk(Ci) = 1 when Ek(x) = Ci, and 0 otherwise,   i = 1, 2, …, M; k = 1, 2, …, K    (3)


Step 2: Compute total votes for each class

The next step is to compute the total votes TV for each class by adding the respective individual votes:

TV(Ci) = Σ_{k=1}^{K} Vk(Ci),   i = 1, 2, …, M    (4)

Step 3: Decision rule

Finally, the class with the maximum total vote is considered as the winner if its total vote is at least a fraction (α) of the total number of classifiers (K):

Ecom,voting = arg max_{i ∈ [1,2,…,M]} {TV(Ci)}  if  max_{i ∈ [1,2,…,M]} {TV(Ci)} ≥ α·K, and CM+1 otherwise    (5)

where 0 < α ≤ 1. Ecom,voting(x) is the combined classification result for sample x obtained through voting-based fusion. Ecom,voting = CM+1 denotes that the sample cannot be assigned to any of the known M classes and is rejected.
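The three steps above can be sketched in a few lines; this is our illustration (the class labels and α value are invented), not code from the paper:

```python
# Hedged sketch of the voting rule, Eqs. (3)-(5).
def voting_fusion(outputs, classes, alpha):
    """outputs: list of K labels E_k(x); returns the winning class, or None
    (the rejection class C_{M+1}) if no class reaches the alpha*K threshold."""
    # Steps 1-2: individual votes (Eq. (3)) accumulated into TV(Ci) (Eq. (4))
    tv = {c: sum(1 for e in outputs if e == c) for c in classes}
    winner = max(classes, key=lambda c: tv[c])
    # Step 3: decision rule, Eq. (5)
    return winner if tv[winner] >= alpha * len(outputs) else None

print(voting_fusion(["F1", "F1", "F2"], ["F1", "F2", "F3"], alpha=0.5))  # F1
print(voting_fusion(["F1", "F2", "F3"], ["F1", "F2", "F3"], alpha=0.5))  # None (rejected)
```

Note that ties here go to the first class in `classes`; a production implementation would need an explicit tie-breaking rule.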

Applications of majority voting techniques to combine classifiers can be found in Lam and Suen (1997), Rahman and Fairhurst (2000), and Lin, Yacoub, Burns, and Simske (2003). A comprehensive review of majority voting and its variants for combining multiple classifiers in character recognition has been presented in Rahman, Alam, and Fairhurst (2002) and Kuncheva (2005). Although quite simple, there are certain drawbacks associated with the voting-based method. Voting methods treat all the classifiers equally, without any consideration of the classifiers' characteristics or performance. But under certain circumstances one classifier may outperform others and should be given more weight. In the present work, we employed weighted voting, Bayesian and Dempster–Shafer based fusion strategies in order to overcome this difficulty. All these fusion methods use the previous performance of each classifier to combine their outputs.

3.2. Weighted voting-based fusion

In weighted voting, a weight is usually assigned to each classifier, or sometimes to each classifier-predicted class combination, based on the performance on the training dataset or even a separate validation dataset; for instance, the weight may be proportional to the classifier's classification accuracy. The class-specific performance of a classifier can be captured in a confusion matrix that is usually constructed by testing the classifier performance on separate validation datasets or on training datasets (Xu et al., 1992). The confusion matrix CM for classifier k is typically represented as shown below.

CMk = [ N^k_11  N^k_12  ···  N^k_1M  N^k_1(M+1)
        N^k_21  N^k_22  ···  N^k_2M  N^k_2(M+1)
          ⋮       ⋮     ⋱      ⋮        ⋮
        N^k_M1  N^k_M2  ···  N^k_MM  N^k_M(M+1) ],   k = 1, 2, …, K    (6)

The rows in this confusion matrix stand for the actual classes C1, C2, …, CM, while the columns indicate the classes assigned by the kth classifier. An element N^k_ij in the confusion matrix represents the percentage of input samples from class Ci that are assigned to class Cj by classifier k.

In this work, the weighted voting algorithm has been implemented based on the confusion matrix as follows:

Step 1: Compute individual weight for each class

The first step is to compute the individual weight for each class from the output of a classifier. Let the output of the kth classifier be class Cj, i.e., Ek(x) = Cj. Then the individual weight WV for each class Ci for classifier k is given by:

WVk(Ci) = N^k_ij / Σ_{i=1}^{M} N^k_ij,   k = 1, 2, …, K    (7)

Thus, each classifier assigns a weight to each class based on the confusion matrix.

Step 2: Compute total weight for each class

The next step is to compute the total weight TWV for each class by adding the respective individual weights:

TWV(Ci) = Σ_{k=1}^{K} WVk(Ci),   i = 1, 2, …, M    (8)

Step 3: Decision rule

Finally, the class with the maximum total weight is considered as the winner and the input is assigned to that class:

Ecom,wv = arg max_{i ∈ [1,2,…,M]} {TWV(Ci)}    (9)

The above decision rule relies on the closed-world assumption (Smets, 2007), as is common in evidence-based decision fusion strategies (Niu et al., 2008; Parikh, Pont, & Jones, 2001; Parikh, Pont, Jones, & Schlindwein, 2003), wherein the known M classes (C1, C2, …, CM) are regarded as a complete description of the system states.

A detailed discussion of weighted voting can be found in Littlestone and Warmuth (1994). Benediktsson and Kanellopoulos (1999) used a weighting-based approach to combine the classification results from multiple neural network and statistical models. The weights of the individual classifiers reflected the reliability of the sources and were optimized in order to improve the combined classification accuracy during training. Tsoumakas, Angelis, and Vlahavas (2005) combined weighted voting with a classifier selection step so that only results from a subset of the classifiers were used for fusion. Such selective fusion was shown to be a generalization of weighted voting.
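As a sketch of Eqs. (7)–(9) (ours; the two confusion matrices below are invented numbers, not results from the case studies):

```python
# Hedged sketch of confusion-matrix-based weighted voting, Eqs. (7)-(9).
def weighted_voting(outputs, conf_mats, M):
    """outputs[k]: class index j predicted by classifier k;
    conf_mats[k][i][j]: N^k_ij from training/validation data."""
    twv = [0.0] * M
    for k, j in enumerate(outputs):
        col_sum = sum(conf_mats[k][i][j] for i in range(M))  # sum_i N^k_ij
        for i in range(M):
            twv[i] += conf_mats[k][i][j] / col_sum  # Eq. (7), accumulated per Eq. (8)
    return max(range(M), key=lambda i: twv[i])      # Eq. (9), closed-world decision

cm1 = [[80, 20], [10, 90]]  # invented: a fairly reliable classifier
cm2 = [[55, 45], [40, 60]]  # invented: a weaker classifier
winner = weighted_voting([0, 1], [cm1, cm2], M=2)  # the reliable classifier prevails
```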

3.3. Bayesian-based fusion

The Bayesian technique is a popular evidence-based method for decision fusion and conflict resolution among multiple classifiers. It estimates the posteriori probability of a class from a priori knowledge of the class-specific performance of each individual classifier. The Bayes rule is used to calculate the posteriori probability. The final predictions are then made based on the estimated values of the posteriori probabilities.

The Bayesian fusion algorithm used in this work has the following steps:

Step 1: Compute individual probability of each class

The first step is to compute the individual a priori probability of each class from the output of the classifier. If the output of the kth classifier is class Cj, i.e., Ek(x) = Cj, then the individual probability P of class Ci assigned by classifier k is computed as:

Pk(Ci) = N^k_ij / Σ_{i=1}^{M} N^k_ij,   k = 1, 2, …, K    (10)


Step 2: Compute overall probability of each class

The overall probability OP of each class is computed by multiplying the individual probabilities from the various classifiers:

OP(Ci) = Π_{k=1}^{K} Pk(Ci),   i = 1, 2, …, M    (11)

Step 3: Compute Bayesian belief value of each class

The next step is to compute the Bayesian posteriori probability Bel of each class from the overall probabilities:

BelBayes(Ci) = OP(Ci) / Σ_{i=1}^{M} OP(Ci)    (12)

Step 4: Decision rule

Finally, the class with the maximum posteriori probability is considered the winner:

Ecom,Bayesian = arg max_{i ∈ [1,2,…,M]} {BelBayes(Ci)}    (13)
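Steps 1–4 can be sketched as follows (our illustration; the confusion matrices are invented):

```python
# Hedged sketch of Bayesian fusion, Eqs. (10)-(13).
def bayesian_fusion(outputs, conf_mats, M):
    """Combine K classifier outputs into a Bayesian belief per class."""
    op = [1.0] * M
    for k, j in enumerate(outputs):
        col_sum = sum(conf_mats[k][i][j] for i in range(M))
        for i in range(M):
            # P_k(Ci), Eq. (10), multiplied into OP(Ci), Eq. (11)
            op[i] *= conf_mats[k][i][j] / col_sum
    total = sum(op)
    bel = [p / total for p in op]                    # Bel_Bayes(Ci), Eq. (12)
    return max(range(M), key=lambda i: bel[i]), bel  # decision rule, Eq. (13)

cm1 = [[80, 20], [10, 90]]  # invented confusion matrices
cm2 = [[55, 45], [40, 60]]
winner, bel = bayesian_fusion([0, 1], [cm1, cm2], M=2)  # winner = 0
```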

The Bayesian decision fusion strategy has been successfully applied in diverse fields ranging from pattern recognition (handwritten digit/character recognition, image recognition) to medical diagnosis and machine fault diagnosis (fault diagnosis in transformers, induction motors). Zheng, Krishnan, and Tjoa (2005) used Bayesian-based fusion to integrate results from different image processing approaches for diagnosing diseases. Foggia, Sansone, Tortorella, and Vento (1999) studied the best tradeoff between the error-rate of Bayesian combination and rejection of samples (i.e., not assigning them a class) and proposed a threshold-based rejection criterion. McArthur, Strachan, and Jahn (2004) combined k-means clustering, back-propagation neural networks, and user-written rules based on a Bayesian approach to diagnose faults in a power transformer. For fault diagnosis of an induction motor, Niu et al. (2008) used a Bayesian decision fusion scheme to combine the results of classifiers based on support vector machines, linear discriminant analysis, k-nearest neighbors, and adaptive resonance theory–Kohonen neural networks.

3.4. Dempster–Shafer fusion

The Dempster–Shafer theory, also referred to as the theory of belief functions, is a generalization of the Bayesian theory of subjective probabilities. The theory of Dempster–Shafer is based on belief functions originally developed by Dempster (1968), and later refined by Shafer (1976) to robustly deal with incomplete data. It allows for a representation of both imprecision and uncertainty through the definition of two functions: plausibility (Pl) and belief (Bel), both derived from a mass function m, or basic probability assignment (BPA).

Consider a classification problem where the result (class) can be C1, C2, or C3. The set of all the classes of interest (in this example, Ω = {C1, C2, C3}) is called the frame of discernment. The possible assignments by a classifier for a sample could be the power set 2^Ω, i.e., the set containing all the subsets of Ω including Ω itself and the null set ∅. In the example, 2^Ω = {∅, {C1}, {C2}, {C3}, {C1, C2}, {C1, C3}, {C2, C3}, Ω}. If a classifier is able to always identify one class for any sample, its results can be considered to be atomic and will be a member of the set {{C1}, {C2}, {C3}}. In this situation, there is no imprecision in the class assignment. On the contrary, if there are samples for which the classifier is unable to assign a specific class but can only rule out some classes to which the sample does not belong, then the response from the classifier can be considered to be a set with multiple classes as members, for example {C1, C2}. Such compound results could be considered as imprecise. The Bayesian technique allows probability to be assigned only to atomic hypotheses; Dempster–Shafer theory on the other hand allows a BPA to be assigned to compound hypotheses as well.

The BPA is a critical element of Dempster–Shafer theory and does not refer to probability in the classical sense. For any subset A of 2^Ω, the BPA, represented as m(A), defines a mapping from 2^Ω to the interval [0, 1]. Formally:

m : 2^Ω → [0, 1],   m(∅) = 0,   Σ_{A ∈ 2^Ω} m(A) = 1    (14)

Belief (Bel) and plausibility (Pl) functions are derived from the BPA and are defined as follows. The total belief committed to A, Bel : 2^Ω → [0, 1], is defined as the sum of the BPAs of all subsets (B) of A, i.e., B ⊆ A:

Bel(A) = Σ_{B⊆A} m(B)    (15)

So in the simple 3-class assignment example above,

Bel({C1, C2}) = m({C1}) + m({C2}) + m({C1, C2})    (16)

The difference between m(A) and Bel(A) is that m(A) measures the assignment of belief only to A, while Bel(A) measures the total assignment of belief to A and all its subsets. The plausibility function is defined as the sum of the BPAs of all sets B that intersect A, i.e., B ∩ A ≠ ∅:

Pl(A) = Σ_{B: B∩A≠∅} m(B)    (17)

Therefore, in the above example,

Pl({C1, C2}) = m({C1}) + m({C2}) + m({C1, C2}) + m({C1, C3}) + m({C2, C3}) + m({C1, C2, C3})    (18)
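The following toy computation (ours; the BPA values are invented) reproduces Eqs. (15)–(18) on the 3-class example:

```python
# Toy illustration of Bel and Pl, Eqs. (15)-(18), with invented BPA values.
classes = frozenset({"C1", "C2", "C3"})
m = {
    frozenset({"C1"}): 0.3,
    frozenset({"C2"}): 0.2,
    frozenset({"C1", "C2"}): 0.4,  # an imprecise (compound) assignment
    classes: 0.1,                  # residual mass on the whole frame
}

def bel(A):
    # Eq. (15): sum of the masses of all subsets of A
    return sum(v for B, v in m.items() if B <= A)

def pl(A):
    # Eq. (17): sum of the masses of all focal sets intersecting A
    return sum(v for B, v in m.items() if B & A)

A = frozenset({"C1", "C2"})
print(bel(A))  # 0.3 + 0.2 + 0.4 = 0.9, as in Eq. (16)
print(pl(A))   # 1.0 here: every focal set intersects {C1, C2}
```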

The relationship between Bel and Pl is:

Pl(A) = 1 − Bel(¬A),   Pl(A) ≥ Bel(A)    (19)

where ¬A is the negation of hypothesis A.

Both imprecision and uncertainty can thus be represented by Bel and Pl. In Bayesian theory, the uncertainty of an event is expressed through a single value, the probability that the event will occur, and it is assumed that there is no imprecision in the measurement. In Dempster–Shafer evidence theory, the belief value of hypothesis A is interpreted as the minimum uncertainty value of A, and its plausibility value is interpreted as the maximum uncertainty value of A. DS theory thus provides an explicit measure of the extent of uncertainty as the length of the interval [Bel(A), Pl(A)]. In the above example, when further information about the sample becomes available, its assignment can be changed from {C1, C2} to {C2}. At that point, for that sample,

Bel({C2}) = Pl({C2}) = m({C2})    (20)

and both imprecision and uncertainty are eliminated. If BPAs are assigned only to simple hypotheses (m(A) = 0 for |A| > 1), then the three functions BPA, Bel and Pl become equal.

In Dempster–Shafer theory, evidence from multiple sources is fused using Dempster's rule of combination. Let mk and ml be the BPAs assigned by the kth and lth classifiers respectively. The combined BPA mk,l(A) for hypothesis A can be calculated as:


Table 3
Process faults analyzed for distillation column case study.

Fault ID   Description           Magnitude
F1         Low reflux ratio      Step change from 9:1 to 1:1
F2         Low reboiler power    Step change from 0.8 to 0.4 kW
F3         Low feed pump speed   Step change from 50 to 20 rpm

mk,l(A) = [ Σ_{B,C ∈ 2^Ω: B∩C=A} mk(B) ml(C) ] / (1 − κ)    (21)

mk,l(∅) = 0,   κ = Σ_{B,C: B∩C=∅} mk(B) ml(C)    (22)

where ⊕ is the combination operator, and κ is the combined BPA for the null subset (∅) before normalization. The denominator in (21) is a normalization factor that ensures that the combined BPAs add up to unity, i.e., Σ_{A ∈ 2^Ω} mk,l(A) = 1.

The combination rule can be easily extended to several classifiers by repeated application of Eq. (21), i.e., by combining the BPAs of the first two classifiers (m1 and m2) using Eq. (21) to obtain the combined BPA (m1,2), and then combining the result (m1,2) with the BPA of the third classifier (m3), and so forth until the Kth classifier:

m1,2,…,K = m1 ⊕ m2 ⊕ ··· ⊕ mK = (((m1 ⊕ m2) ⊕ m3) ⊕ ··· ⊕ mK) = ((m1,2 ⊕ m3) ⊕ ··· ⊕ mK), and so on.    (23)
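Dempster's rule, Eqs. (21)–(23), can be sketched as follows (our illustration; the two BPAs are invented and include a compound hypothesis):

```python
# Hedged sketch of Dempster's rule of combination, Eqs. (21)-(22).
def dempster_combine(m1, m2):
    """Combine two BPAs (dicts keyed by frozensets of class labels)."""
    combined, conflict = {}, 0.0
    for B, v1 in m1.items():
        for C, v2 in m2.items():
            inter = B & C
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2  # kappa: mass falling on the null set, Eq. (22)
    # Normalizing by (1 - kappa) makes the combined BPAs sum to unity
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

m1 = {frozenset({"C1"}): 0.6, frozenset({"C1", "C2"}): 0.4}  # invented BPAs
m2 = {frozenset({"C1"}): 0.5, frozenset({"C2"}): 0.5}
m12 = dempster_combine(m1, m2)
# Repeated application chains in further classifiers, per Eq. (23):
# m123 = dempster_combine(m12, m3)
```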

Several approaches to compute BPAs for classifiers have been proposed by Xu et al. (1992). Here, we have used the approach where the performance of the classifier on training data, stored in the confusion matrix, is used to estimate the belief functions of the classifier.

Step 1: Calculate individual Basic probability assignment (BPA)values for each classifier.

Suppose � = {C1, C2, . . ., CM} has M exhaustive and mutuallyexclusive elementary classes. If the output of the kth classifier isclass Cj, i.e., Ek(x) = Cj., then the individual BPA values of a class Cifor classifier k can be expressed as:

mk(Ci) =Nk

ij∑Mi=1Nk

ij

k = 1, 2, . . . , K (24)

Therefore,∑A ∈ �

mk.l(A) = 1 (25)

In this work, following Li, Bao, and Ou (2008), and Parikh et al.(2001, 2003) we consider all hypothesis to be atomic classes andBPAs for compound classes are set to 0.Step 2: Compute combined BPA values using Dempster’s Rule

Individual BPA values of all the classifiers are combined byDempster combination rule in Eqs. (21) and (22) to obtain thecombined BPA values m1,2,...,K (Ci).Step 3: Decision rule

Finally the class Ci with maximum combined BPA value is con-sidered as the winner.

\[
E_{com,DS} = \arg\max_{i \in [1,M]} \{ m_{1,2,\ldots,K}(C_i) \} \tag{26}
\]
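The three steps above can be sketched as follows for BPAs restricted to atomic classes, as assumed in this work. This is an illustrative sketch (function and variable names are ours, not from the paper), and it assumes the classifiers are not in total conflict ($\kappa < 1$):

```python
import numpy as np

def bpa_from_confusion(conf, j):
    """Eq. (24): BPA over the M atomic classes for a classifier whose current
    output is class j, estimated from its confusion matrix
    (rows = true class i, columns = assigned class j)."""
    col = conf[:, j].astype(float)
    return col / col.sum()

def dempster_combine(m1, m2):
    """Dempster's rule (Eqs. (21)-(22)) specialized to atomic classes: the
    only non-empty intersections are identical singletons, so the
    unnormalized combined mass is the elementwise product; everything else
    is conflict. Assumes the classifiers are not in total conflict."""
    joint = m1 * m2                # mass on A = {C_i} from B = C = {C_i}
    kappa = 1.0 - joint.sum()     # mass assigned to empty intersections
    return joint / (1.0 - kappa)  # normalize: combined BPAs sum to 1

def ds_fusion(confusions, outputs):
    """Steps 1-3: compute each classifier's BPA, chain them per Eq. (23),
    and return the winning class index per Eq. (26) plus the combined BPA."""
    m = bpa_from_confusion(confusions[0], outputs[0])
    for conf, j in zip(confusions[1:], outputs[1:]):
        m = dempster_combine(m, bpa_from_confusion(conf, j))
    return int(np.argmax(m)), m
```

Because only singletons carry mass, Dempster's rule here reduces to an elementwise product followed by renormalization, which is why the combined BPA sharpens toward classes on which the classifiers agree.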

Dempster–Shafer based fusion has also been widely used in various fields. Parikh et al. (2001, 2003) used the Dempster–Shafer evidence theory to combine the outputs of multiple primary classifiers to improve overall classification performance; the effectiveness of this approach was demonstrated for detecting failures in a diesel engine cooling system. Basir and Yuan (2007) used the Dempster–Shafer approach to locate faults in induction motors by combining time-domain, frequency-domain and statistical signal processing methods. Similar approaches were used by Yang and Kim (2006) to predict the type of motor failures, where time-based and frequency-based features were first extracted from a motor using current and vibration classifiers, and by Tilie, Bloch, and Laboreli (2007) to detect blotches in digitized archive film sequences.

Table 3 (excerpt)
Faults in the distillation column case study.

Fault ID  Description          Details
F2        Low reboiler power   Step change from 0.8 to 0.4 kW
F3        Low feed pump speed  Step change from 50 to 20 rpm

There are several other decision fusion strategies, including the Borda count, which takes the rankings of the class supports into consideration; behavior knowledge space (Huang & Suen, 1993, 1995), which uses a lookup table listing the most common correct class for every possible combination of class labels given by the classifiers; and decision templates (Kuncheva, Bezdek, & Duin, 2001), which compute a similarity measure between the current decision profile of the unknown instance and the average decision profiles of instances from each class. A detailed overview of various decision fusion strategies is available in Kuncheva (2005).
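As an illustration of the first of these strategies, a minimal Borda count sketch (names are illustrative; the class supports are assumed to be available as per-classifier rankings):

```python
def borda_fusion(rankings, n_classes):
    """Borda count fusion: each classifier ranks the classes (index 0 = most
    preferred); a class receives (n_classes - 1 - position) points from each
    ranking, and the class with the most points wins.
    `rankings` is a list of lists of class indices, one per classifier."""
    scores = [0] * n_classes
    for ranking in rankings:
        for position, cls in enumerate(ranking):
            scores[cls] += n_classes - 1 - position
    return max(range(n_classes), key=lambda c: scores[c]), scores
```

Unlike plain voting, the runner-up preferences of each classifier still influence the consolidated decision.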

There exists little work in the literature on the application of decision fusion strategies to combine multiple FDI methods for fault detection and identification in chemical processes. Despite the obvious promise of multi-classifier systems for process monitoring and fault diagnosis, their potential remains largely unexplored, especially in situations where the FDI methods are diverse with a significant amount of conflict. We address this important issue here.

4. Decision fusion for chemical process FDI

The decision fusion based fault detection and identification scheme deployed in this work is shown schematically in Fig. 1. The input to each FDI method is online process data. The output from each FDI method is an assigned class: normal or one of the fault classes. The outputs from the FDI methods are combined through the decision fusion strategies reviewed above to obtain a consolidated result in which the agreements among individual methods are combined and conflicts are resolved. The proposed scheme is illustrated using a lab-scale distillation unit case study.
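The scheme in Fig. 1 can be sketched as a simple harness in which each FDI method maps a sample to a class label and a pluggable fusion rule consolidates the labels. This is an illustrative sketch (all names are ours), with majority voting shown as the simplest utility-based rule:

```python
from collections import Counter

def fuse_online(sample, fdi_methods, fusion_rule):
    """Run every FDI method on the current sample and consolidate their
    class labels (e.g., 'normal', 'F1', ...) with the given fusion rule."""
    labels = [method(sample) for method in fdi_methods]
    return fusion_rule(labels)

def majority_vote(labels):
    """Simplest utility-based rule: the most frequent label wins."""
    return Counter(labels).most_common(1)[0][0]
```

Evidence-based rules (weighted voting, Bayesian, Dempster–Shafer) plug into the same harness; they differ only in how `fusion_rule` weighs the individual labels.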

4.1. Case study I: Lab-scale continuous distillation column

The schematic of the distillation unit is shown in Fig. 2. The distillation column is 2 m in height and 20 cm in width and has 10 trays, including a total condenser and a reboiler; the feed enters at the 5th tray from the top. The column is used to separate a binary liquid mixture of ethanol and water (20% v/v). The system is well integrated with a control console and data acquisition system. Seventeen variables – all tray temperatures (top tray – tray 1, to reboiler – tray 9), column top temperature, cooling water inlet and outlet temperatures, reflux temperature, feed temperature, feed pump rpm, reboiler heat duty and reflux ratio – are measured at 10-second intervals. First, cold startup of the distillation column is performed following the standard operating procedure (SOP). Once the distillation column reaches steady state, three different faults – a reflux valve fault (F1), a reboiler power fault (F2) and a feed pump fault (F3) – are introduced in different runs. The details of the faults are shown in Table 3. Various FDI methods are used to detect and diagnose the faults.


Fig. 1. Online fault detection and identification scheme based on decision fusion of multiple FDI methods.

4.2. FDI methods

We have implemented both model-based and data-driven FDI methods for online process monitoring and fault diagnosis of the continuous distillation column. We briefly describe these methods next.

4.2.1. Extended Kalman filter method

Gunter (2003) presented a dynamic, nonlinear first-principles model of a binary methanol–water distillation column. An extended Kalman filter (EKF) has been developed based on this model to estimate the process outputs from the measured process variables (inputs, outputs and states). In the EKF implementation, all nine tray temperatures are used as process outputs; reboiler heat duty, feed flow rate, feed temperature, feed composition, reflux ratio and reflux temperature are used as process inputs; and the liquid- and vapor-phase compositions of all 10 trays (including the condenser and reboiler) are considered as process states. Values of the process parameters related to the physico-chemical properties of ethanol and water and the thermodynamic equilibrium properties of the ethanol–water mixture are obtained from the literature, while the parameters pertaining to the column's design, such as the molar hold-ups of the trays, condenser, and reboiler, are obtained from the distillation column manual. Heat loss from the column is estimated by fitting the experimental results with model simulation data.

The difference between the actual measurement and that estimated by the filter is defined as the innovation or residual of the filter. When a fault occurs, the behavior of the process is altered, which results in larger than normal residuals. Threshold limits are defined on the residuals of normal behavior to account for noise and un-modeled nonlinearity. In addition, a minimum time above threshold criterion, called dwell-time, is also used (Bhagwat, Srinivasan, & Krishnaswamy, 2003). The dwell-time criterion accounts for momentary threshold crossings due to noise, discontinuities, phase changes or bad readings. Faults are flagged when both these conditions are satisfied. Once a fault is detected, its location and cause are identified by analyzing the residuals (Bhagwat et al., 2003). Fault maps have been developed to identify the possible location(s) of the fault(s). The fault map is a set of logical statements that correlate faults to their causes. It can be gen-

Fig. 2. Schematic diagram of the lab-scale distillation column.
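The threshold-plus-dwell-time detection logic described for the EKF, and the closely related v-out-of-u output filter applied to every FDI method's output in Section 4.2.2, are both sliding-window checks. A minimal sketch of each (function names and numeric values are illustrative, not from the implementations in this paper):

```python
from collections import deque

def detect_fault(residuals, threshold, dwell):
    """Dwell-time criterion: flag a fault at the first sample where the
    absolute residual has stayed above its threshold for `dwell` consecutive
    samples, filtering momentary crossings due to noise or bad readings.
    Returns the flagging sample index, or None."""
    run = 0
    for t, r in enumerate(residuals):
        run = run + 1 if abs(r) > threshold else 0
        if run >= dwell:
            return t
    return None

def filtered_alarm(flags, v, u):
    """v-out-of-u output filter (Section 4.2.2, with u = 10 in this work):
    raise an alarm at the first sample where at least v of the last u
    per-sample fault flags (0/1) are set. Returns the index, or None."""
    window = deque(maxlen=u)
    for t, f in enumerate(flags):
        window.append(f)
        if sum(window) >= v:
            return t
    return None
```

Both filters trade a small additional detection delay for robustness against isolated noisy samples.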


rated by analyzing observable faults and detectable causes for the process under consideration. For example, if the residuals of some top tray temperatures (tray 1 to tray 3 in the rectification section) exceed the threshold limit, then the fault is most likely due to a reflux valve fault. Similarly, if the residuals of the feed tray temperature and a few trays above and/or below it cross the threshold limit, then the abnormality is most likely due to a feed pump fault. Higher residuals on the reboiler and/or a few bottom plate temperatures in the stripping section are most likely due to a reboiler power fault.

4.2.2. Data-based FDI methods

This first-principles model-based EKF method is used along with three data-driven methods – Principal Component Analysis (PCA), self-organizing maps (SOM) and an artificial neural network (ANN) – for monitoring and fault diagnosis. Since the EKF method uses all nine tray temperatures (top tray – tray 1, to reboiler – tray 9) as process outputs, the same nine tray temperatures are also used by the data-driven FDI methods for process monitoring. This is solely to provide a common ground for comparing the various FDI schemes and is not a requirement of the decision fusion strategies.

A PCA-based model of normal operations was developed for fault detection. A fault is flagged when the 99% confidence limit of the T2 statistic and/or the SPE value is violated. For fault diagnosis, a fault reconstruction scheme is used wherein separate models are developed for each fault class. The PCA model that shows an in-control status during abnormal operations is considered to flag the class of fault.

The ANN-based FDI method considered here used a feed-forward back-propagation neural network that mapped the online sample to the four process states (normal and three fault classes). The ANN has two layers with [10, 4] nodes, using the tan-sigmoid transfer function for the hidden layer and a linear transfer function for the final output layer. The ANN was trained using the Levenberg–Marquardt algorithm. Among the four output nodes, the one with the largest value was considered to indicate the process state if its value was close to 1 (1 ± 0.3).

A self-organizing map (SOM) based on the approach proposed by Ng and Srinivasan (2008a, 2008b) was also developed for this case study. This involves two phases: (i) offline training, and (ii) online process monitoring and fault diagnosis. During the offline training phase, a 2-dimensional SOM is first trained using all available training data; after training, the neurons in the 2-dimensional SOM space are clustered. The trained SOM for this case study consists of 48 × 11 map units, and the neurons are grouped into 30 clusters using the k-means clustering algorithm. The cluster hits corresponding to normal training data as well as various known fault states are annotated and stored in a database. During the online process monitoring and fault diagnosis phase, the online process data is projected onto the trained SOM. The SOM method performs monitoring by tracking any deviations from the nominal operating cluster on the SOM space. Fault identification is through fault signature analysis, comparing the cluster hits generated from online measurements to those of the training data of each fault class. The interested reader is referred to Ng (2008) for details of the SOM implementation.

In order to reduce the effect of noise on the performance of the FDI methods, their output is filtered as follows: a process abnormality alarm is flagged whenever the fault/abnormality is detected in at least v samples out of u consecutive samples, where 1 ≤ v ≤ u. We use u = 10 in all the case studies in this work. The decision fusion of these heterogeneous FDI methods is illustrated next.

Table 4
Performance of various FDI methods in distillation column case study – Scenario 1. Detection and diagnosis delays are in samples.

Fault ID       Introduction (sample no.)  EKF (det/diag)  PCA (det/diag)  SOM (det/diag)  ANN (det/diag)
F1             50                         7 / 12          1 / 345         3 / 46          1 / 2
F2             50                         4 / 12          1 / 47          3 / 5           2 / 3
F3             50                         3 / 3           3 / 3           19 / 309        3 / 25
Average delay  –                          4.667 / 9.000   1.667 / 131.7   8.333 / 120     2.0 / 10.0

4.3. Results

We have studied two different scenarios for this case study. In Scenario 1, all the FDI methods used are well designed and well trained; each is therefore best-in-class and can quickly detect and identify all three process faults. In industrial processes, it is often difficult to design a single FDI method that can detect and identify all process faults. To mimic this situation, in Scenario 2 we have redesigned and retrained all four FDI methods in such a way that each becomes a specialist that can detect and identify only certain faults. Since none of the FDI methods can detect/identify all the faults, the overall recognition rate of each method is quite low, and there is a significant amount of conflict among the FDI methods. Next, we present the results for both scenarios.

Table 5
Inter-classifier agreement among FDI methods in distillation column case study – Scenario 1.

FDI method I  FDI method II  Kappa value  Agreement level
EKF           PCA            0.8911       Almost perfect
EKF           SOM            0.8901       Almost perfect
EKF           ANN            0.9335       Almost perfect
PCA           SOM            0.8474       Almost perfect
PCA           ANN            0.8830       Almost perfect
SOM           ANN            0.8906       Almost perfect

4.3.1. Scenario 1

Table 4 shows the performance of the four FDI methods for the three faults. It can be seen from this table that all the FDI methods quickly detect and identify all three process faults. The inter-classifier agreement between each pair of methods is shown in Table 5. The high Kappa values (>0.8) suggest that almost perfect agreement exists between all the method pairs; there is no significant conflict among the different methods. The performance of the various decision fusion strategies is summarized in Table 6. The results show that all three fusion methods respond quickly in detecting and diagnosing all three process faults, with voting being the slowest among them. Both the Bayesian and Dempster–Shafer strategies yield around 80% reduction in average fault detection delay and 90% reduction in diagnosis delay w.r.t. the slowest FDI method. The percentage of samples correctly classified


by a method is termed as its overall recognition rate:

\[
\text{Overall recognition rate (\%)} = 100 \times \frac{\text{No. of samples correctly classified}}{\text{Total no. of samples}} \tag{27}
\]

Tables 7 and 8 show that a 3–5% improvement in overall recognition rate can be achieved through Bayesian and Dempster–Shafer based fusion, whereas the improvement achieved through voting-based fusion is 1–3%.

Although all three fusion strategies provide some improvement over any single FDI method, the improvement achieved through decision fusion is marginal in this case since the individual FDI methods perform reasonably well (with overall recognition rates >94%) and there is no significant conflict among them (Kappa values >0.8).

Table 6
Performance of various decision fusion strategies in distillation column case study – Scenario 1. Detection and diagnosis delays are in samples.

Fault ID       Voting (det/diag)  Bayesian (det/diag)  Dempster–Shafer (det/diag)
F1             3 / 46             1 / 3                1 / 3
F2             3 / 12             1 / 5                1 / 5
F3             3 / 25             3 / 3                3 / 3
Average delay  3.000 / 27.667     1.667 / 3.667        1.667 / 3.667

Table 7
Overall recognition rate of each FDI method in distillation column case study – Scenario 1.

FDI method  Faults identified  Overall recognition rate (%)
EKF         F1, F2, F3         96.49
PCA         F1, F2, F3         94.52
SOM         F1, F2, F3         94.06
ANN         F1, F2, F3         96.26

Table 8
Overall recognition rate of decision fusion strategies for distillation column case study – Scenario 1.

Fusion scheme    Faults identified  Overall recognition rate (%)
Voting           F1, F2, F3         97.4
Bayesian         F1, F2, F3         99.88
Dempster–Shafer  F1, F2, F3         99.88

4.3.2. Scenario 2

In this scenario, we have retrained the FDI methods to mimic the situation likely in large-scale processes, i.e., not all FDI methods can detect and identify all three process faults. Here, the PCA model can only detect and diagnose fault F3; it classifies samples from the other two faults as normal. Similarly, the SOM and ANN FDI methods can detect and diagnose only faults F1 and F2. The extended Kalman filter is tuned to be insensitive to fault F1 and can detect and diagnose only faults F2 and F3. Thus, each FDI method now becomes a specialist that can detect and identify only a few process faults.

The performance of each FDI method in this scenario is shown in Table 9. It is evident from this table that none of the FDI methods can detect and identify all three faults: the EKF method can detect and identify faults F2 and F3 only; PCA detects and identifies only F3; both the SOM and ANN methods can detect and identify only F1 and F2. The inter-classifier agreement between the methods is presented in Table 10. The Kappa values show that almost perfect agreement exists only between SOM and ANN; agreements among the other methods vary from moderate to less than chance. In this scenario, there exists significant conflict (disagreement) among the results from the different FDI methods, and decision fusion becomes more challenging.

Table 9
Performance of various FDI methods in distillation column case study – Scenario 2. Detection and diagnosis delays are in samples; '–' indicates the fault is not detected/diagnosed.

Fault ID       Introduction (sample no.)  EKF (det/diag)  PCA (det/diag)  SOM (det/diag)  ANN (det/diag)
F1             50                         – / –           – / –           3 / 46          1 / 2
F2             50                         4 / 12          – / –           3 / 5           2 / 3
F3             50                         3 / 3           3 / 3           – / –           – / –
Average delay  –                          3.5 / 7.5       3.0 / 3.0       3.0 / 25.5      1.5 / 2.5

Table 10
Inter-classifier agreement among FDI methods in distillation column case study – Scenario 2.

FDI method I  FDI method II  Kappa value  Agreement level
EKF           PCA            0.422        Moderate
EKF           SOM            0.053        Slight
EKF           ANN            0.053        Slight
PCA           SOM            −0.274       Less than chance
PCA           ANN            −0.275       Less than chance
SOM           ANN            0.990        Almost perfect

The performance of the various decision fusion strategies is presented in Table 11. The results indicate that all the faults can be detected/identified very quickly through weighted voting, Bayesian, and Dempster–Shafer based fusion, i.e., complete (100%) fault coverage can be achieved. All three evidence-based methods provide around a 43% reduction in average fault detection delay and around a 90% reduction in average fault diagnosis delay w.r.t. the slowest FDI method. Tables 12 and 13 show that a significant improvement (40–50%) in overall recognition rate can be achieved through fusion. The improvement in monitoring performance can be visualized through the receiver operating characteristic (ROC) curves shown in Fig. 3. In these ROC curves, the true positive rate (TPR) and false positive rate (FPR) are calculated using Eqs. (28) and (29), respectively. The area under the ROC curve provides a measure of the monitoring performance of a classifier; the higher the area, the better the monitoring performance. It can be easily seen that the area under the ROC curve for Bayesian fusion is significantly higher than that of any single FDI method.
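The inter-classifier agreement values reported in the Kappa tables can be computed from two classifiers' label sequences; a sketch, assuming the pairwise Kappa used here is Cohen's kappa (the standard pairwise agreement statistic, whose formula is not spelled out in this excerpt):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two classifiers' label sequences:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected by chance from the marginal label frequencies."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Kappa is 1 for perfect agreement, near 0 for chance-level agreement, and negative for less-than-chance agreement, matching the qualitative levels in the tables.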


Table 11
Performance of various decision fusion strategies in distillation column case study – Scenario 2. Detection and diagnosis delays are in samples; '–' indicates the fault is not detected/diagnosed.

Fault ID       Introduction (sample no.)  Voting (det/diag)  Weighted voting (det/diag)  Bayesian (det/diag)  Dempster–Shafer (det/diag)
F1             50                         – / –              1 / 1                       1 / 1                1 / 1
F2             50                         4 / 7              2 / 3                       2 / 3                2 / 3
F3             50                         – / –              3 / 3                       3 / 3                3 / 3
Average delay  –                          4.0 / 7.0          2.0 / 2.33                  2.0 / 2.33           2.0 / 2.33

Fig. 3. ROC curves for individual monitoring methods and Bayesian fusion in distillation column case study – Scenario 2.

Table 12
Overall recognition rate of each FDI method in distillation column case study – Scenario 2.

FDI method  Faults identified  Overall recognition rate (%)
EKF         F2, F3             52.339
PCA         F3                 59.737
SOM         F1, F2             49.124
ANN         F1, F2             49.666

Table 13
Overall recognition rate of decision fusion strategies for distillation column case study – Scenario 2.

Fusion scheme    Faults identified  Overall recognition rate (%)
Voting           F2                 51.832
Weighted voting  F1, F2, F3         99.79
Bayesian         F1, F2, F3         99.88
Dempster–Shafer  F1, F2, F3         99.88

The performance of the voting-based fusion scheme is rather poor in this scenario. Only fault F2 can be successfully detected and identified, and the overall recognition rate is only about 52%; its performance is no better than that of any single FDI method. This is because voting treats each method equally without considering their class-specific performance. Since in this case each method is a specialist that can detect and identify only certain faults, the success of decision fusion depends primarily on proper utilization of a priori information about class-specific performance. Weighted voting, Bayesian, and Dempster–Shafer methods, which use the class-specific performance from the confusion matrix, hence outperform voting.

5. Case study II: Tennessee Eastman challenge problem

In this section, the various decision fusion methods are tested for online disturbance identification on the Tennessee Eastman (TE) industrial challenge problem (Downs & Vogel, 1993). The Tennessee Eastman process provides a realistic industrial process for evaluating process control and monitoring methods. It has been widely used by the process monitoring community to evaluate and compare various approaches. The TE process produces two products (G and H) and a byproduct (F) from reactants A, C, D, and E. The decentralized multi-loop control strategy proposed by Ricker (1996) is used here. The process has five units, namely a two-phase reactor, a product condenser, a flash separator, a recycle compressor, and a product stripper. The process contains 41 measured, 12 manipulated and 50 state variables. Of the 41 measured variables, 22 are sampled frequently (usually every 3 min) and the remaining 19 are composition measurements that are sampled less frequently. We use the 22 continuous process measurements for online identification of unknown process disturbances. The Tennessee Eastman process simulation contains 20 programmed process disturbances (Table 14), as proposed by Downs and Vogel (1993). Of these, eighteen process faults are tested here. Since the decentralized multi-loop control strategy is able to provide very good recovery actions to disturbances IDV(15) and IDV(16), these two IDVs are excluded from the analysis.

To generate a training dataset, 19 runs (one normal run and 18 fault runs) were performed. Each training run simulates 25 h (1500 min) of operations with a sampling interval of 3 min. All faults were introduced at 1 h of operating time for the training data. In contrast, the testing dataset consists of runs simulating 50 operating hours (3000 min) with the fault introduced at 8 h (480 min). The signals for each test run are thus different from the training data in terms of run-length and time of fault introduction.

Table 14
Disturbances in Tennessee Eastman process case study (from Downs & Vogel, 1993).

Disturbance ID  Description                                              Type
IDV(1)          A/C feed ratio, B composition constant (stream 4)        Step
IDV(2)          B composition, A/C ratio constant (stream 4)             Step
IDV(3)          D feed temperature (stream 2)                            Step
IDV(4)          Reactor cooling water inlet temperature                  Step
IDV(5)          Condenser cooling water inlet temperature                Step
IDV(6)          A feed loss (stream 1)                                   Step
IDV(7)          C header pressure loss – reduced availability (stream 4) Step
IDV(8)          A, B, C feed composition (stream 4)                      Random variation
IDV(9)          D feed temperature (stream 2)                            Random variation
IDV(10)         C feed temperature (stream 4)                            Random variation
IDV(11)         Reactor cooling water inlet temperature                  Random variation
IDV(12)         Condenser cooling water inlet temperature                Random variation
IDV(13)         Reaction kinetics                                        Slow drift
IDV(14)         Reactor cooling water valve                              Sticking
IDV(15)         Condenser cooling water valve                            Sticking
IDV(16)         Unknown                                                  Unknown
IDV(17)         Unknown                                                  Unknown
IDV(18)         Unknown                                                  Unknown
IDV(19)         Unknown                                                  Unknown
IDV(20)         Unknown                                                  Unknown


Table 15
Detailed performance of various FDI methods in Tennessee Eastman process case study. Detection and diagnosis delays are in samples; '–' indicates the fault is not detected/diagnosed.

IDV no.  Introduction (sample no.)  PCA (det/diag)  SOM (det/diag)  DPCA (det/diag)  Expert System (det/diag)  ANN (det/diag)
1        160                        8 / –           12 / 86         3 / 20           – / –                     3 / 12
2        160                        11 / 363        27 / 28         8 / 69           – / –                     5 / 12
3        160                        397 / –         – / –           27 / –           25 / 27                   11 / –
4        160                        8 / –           – / –           2 / –            31 / 31                   16 / –
5        160                        – / –           – / –           2 / –            19 / 21                   9 / –
6        160                        7 / 7           7 / 27          1 / 6            – / –                     1 / 3
7        160                        7 / –           2 / –           1 / –            51 / 51                   21 / –
8        160                        18 / 43         23 / 72         10 / 26          – / –                     6 / 12
9        160                        149 / –         – / –           34 / –           57 / 57                   20 / –
10       160                        11 / 13         – / –           5 / 10           – / –                     1 / 5
11       160                        22 / 29         18 / 30         15 / 26          – / –                     31 / –
12       160                        35 / –          – / –           7 / 226          29 / 29                   21 / –
13       160                        18 / 20         27 / 53         12 / 25          – / –                     33 / 97
14       160                        10 / 22         117 / –         4 / 14           – / –                     29 / –
17       160                        23 / 34         41 / 41         16 / 36          – / –                     5 / 17
18       160                        46 / 61         74 / 74         1 / 59           – / –                     2 / 25
19       160                        57 / –          – / –           2 / 39           27 / 29                   6 / –
20       160                        19 / 35         27 / 30         2 / 33           – / –                     3 / 10
Average delay  –                    49.76 / 62.70   34.09 / 49.33   12.33 / 45.31    34.14 / 35                12.39 / 21.44

5.1. FDI methods

We deployed five FDI methods in this case study. Apart from the PCA, SOM, and ANN methods developed using the same strategy as described in Section 4, we also use two additional FDI methods, namely Dynamic Principal Component Analysis (DPCA) and an Expert System (ES), for monitoring and fault diagnosis of the Tennessee Eastman process. Next, we briefly discuss the DPCA and Expert System based FDI methods.

The DPCA method is broadly similar to the PCA method for FDI. The key difference is that each raw sample is augmented with a number of previous observations to provide temporal information to the PCA model. In this case study, a time lag of five samples was used to construct the augmented sample. The DPCA-based FDI method is developed by constructing a DPCA model for normal operations as well as one for each of the fault classes. Fault detection is based on the violation of the 99% confidence limit of the T2 and/or SPE statistic for the normal model. When a fault is detected, the DPCA model that shows an in-control status is considered to flag the right class of fault.

Table 15 shows the monitoring and diagnosis performance of the various FDI methods. As seen there, some of the process disturbances are extremely difficult to detect and identify using data-driven FDI methods. The multi-loop decentralized control strategy employed in this case study is efficient in bringing the process back to its normal operating range quickly after a disturbance occurs, thus camouflaging the fault and preventing effective diagnosis. This occurs widely in real-life processes as well. However, plant operators and engineers with wide experience in operating the process use subtle clues to uncover such disturbances/faults in the underlying process. To mimic this knowledge of experienced operators, we use a simple rule-based expert system to identify these disturbances that are otherwise hard to detect/diagnose. Some examples of the if-then rules used in our expert system are:

(a) If the reactor cooling water flow increases from the normal operating region of ∼36% by 2%, then the fault is likely due to an increase in the reactor cooling water temperature (IDV04).

(b) If the condenser cooling water flow increases from the normal operating region of ∼20% by 2%, then the fault is likely due to an increase in the condenser cooling water temperature (IDV05).

(c) If there is a simultaneous fall in reactor pressure, stripper pressure, stream 4 flow rate, and purge rate by 1%, even for a very brief period of time, then the fault is likely due to the C header pressure loss and the consequent reduced availability of stream 4 (IDV07).

(d) If there is a significant simultaneous increase in the magnitude of oscillation of the stripper underflow, product separator underflow, reactor pressure, and product separator pressure, then fault IDV19 is most likely to have occurred.

This ES and the DPCA-based FDI methods are used along with the ANN, SOM, and PCA-based methods for supervising the TE process. The ANN in this case has three layers with [10, 10, 19] nodes. The trained SOM consists of 37 × 22 map units that are further clustered into 50 clusters. The predictions from all five FDI methods are combined through the various fusion strategies and their performance compared.

Table 16
Overall recognition rate of FDI methods in Tennessee Eastman process case study.

FDI method     No. of IDVs identified  Overall recognition rate (%)
PCA            10                      39.352
SOM            9                       47.799
DPCA           13                      49.200
Expert System  7                       52.000
ANN            9                       32.254

Table 17
Inter-classifier agreement among FDI methods in Tennessee Eastman process case study.

FDI method I   FDI method II  Kappa value  Agreement level
PCA            SOM            0.385        Fair
PCA            DPCA           0.407        Fair
PCA            Expert System  0.003        Slight
PCA            ANN            −0.027       Less than chance
SOM            DPCA           0.404        Fair
SOM            Expert System  −0.203       Less than chance
SOM            ANN            0.144        Slight
DPCA           Expert System  −0.019       Less than chance
DPCA           ANN            0.157        Slight
Expert System  ANN            0.000        Slight
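Rules such as (a) and (b) of the expert system can be encoded as simple predicates over the current measurements; in the sketch below, the dictionary keys and nominal values are illustrative placeholders, not the paper's actual tag names or full rule base:

```python
def expert_system(meas, nominal):
    """Toy rule-based diagnosis in the spirit of rules (a) and (b): compare
    current measurements against nominal operating values and return the
    matching disturbance ID, or None. `meas` and `nominal` hold percentage
    valve-opening readings; the keys are illustrative."""
    if meas["reactor_cw_flow"] >= nominal["reactor_cw_flow"] + 2.0:
        return "IDV04"  # rule (a): reactor cooling water flow up by 2%
    if meas["condenser_cw_flow"] >= nominal["condenser_cw_flow"] + 2.0:
        return "IDV05"  # rule (b): condenser cooling water flow up by 2%
    return None
```

Because such rules fire on manipulated-variable movements rather than controlled variables, they can expose disturbances that feedback control otherwise camouflages.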


Table 18Detailed performance of various decision fusion strategies in Tennessee Eastman process case study.

IDV no. Fault introductiontime (sample no.)

Voting Weighted Voting Bayesian Dempster–Shafer

Detectiondelay(samples)

Diagnosisdelay(samples)

Detectiondelay(samples)

Diagnosisdelay(samples)

Detectiondelay(samples)

Diagnosisdelay(samples)

Detectiondelay(samples)

Diagnosisdelay(samples)

1 160 12 14 2 12 6 14 6 142 160 27 27 11 12 11 12 11 123 160 399 399 25 25 25 25 25 254 160 – – 2 31 8 31 8 315 160 – – 19 19 19 19 19 196 160 7 7 2 3 2 3 2 37 160 7 54 2 19 1 9 1 98 160 23 23 12 12 12 12 12 129 160 169 169 57 57 57 57 57 57

10 160 11 11 5 5 5 5 5 511 160 23 23 18 18 18 18 18 1812 160 35 35 29 29 29 29 29 2913 160 27 27 18 25 18 20 18 2014 160 117 117 5 5 9 10 9 1017 160 36 36 17 17 17 17 17 17

25 31 31 31 3127 27 27 27 2710 10 10 10 1019.5 16.94 19.39 16.94 19.39

5

mddwftsmTh

am1ras(ffp

T

F

aan

TOp

18 160 59 59 2519 160 63 63 2720 160 27 27 8

Average delay 65.13 68.19 15.78

.2. Results

Table 16 provides a summary of FDI performance of eachethod. It is evident that none of the FDI method can individually

etect / identify all the eighteen process faults. In terms of averageetection delay, the DPCA method performs the best (12 samples)hich is about 75% faster compared to the PCA method that per-

orms the worst (50 samples). In terms of average diagnosis delay,he ANN method is the fastest (21 samples) and PCA the slowest (63amples). Further, the overall recognition rates of the individual FDIethods are extremely poor, varying from 32% to 52%. As shown in

able 17, the low Kappa values indicate that the FDI methods areighly diverse and there is significant disagreement among them.

The detailed FDI results for the various decision fusion strategiesre shown in Table 18. It can be seen that all the evidence-basedethods are successful in quickly detecting and identifying all the

8 process faults resulting in complete (100%) fault coverage. Thisesults in a 66% reduction in average fault detection delay andround 69% reduction in average fault diagnosis delay w.r.t. thelowest FDI method. As seen from Table 19, they perform well∼95%) in terms of overall recognition rate as well. The ROC curvesor the individual process monitoring methods as well as Bayesianusion are shown in Fig. 4. The true positive rate (TPR) and falseositive rates (FPR) in these ROC curves are calculated as follows:

rue positive rate (TPR) = TPTP + FN

(28)

alse positive rate (FPR) = FPFP + TN

(29)

If an abnormal sample is classified as abnormal, it is counteds a true positive (TP); if it is classified as normal, it is counteds a false negative (FN). If the sample is normal but classified asormal, it is counted as a true negative (TN); if it is classified as

able 19verall recognition rate of various decision fusion strategies in Tennessee Eastmanrocess case study.

Fusion schemes No. of IDVs identified Overall recognition rate (%)

Voting 16 57.924Weighted voting 18 94.67Bayesian 18 97.16Dempster–Shafer 18 97.155

Fig. 4. ROC curves for individual monitoring methods and Bayesian fusion in TEprocess case study.

abnormal, it is counted as a false positive (FP). To generate theROC curve, the TPR and FPR were calculated for different val-ues of v in output filtering. The higher area under the curve forBayesian-based fusion compared to any single FDI method indi-cates that the monitoring performance is improved significantlythrough evidence-based decision fusion. Voting-based fusion candetect and identify 16 faults, suffers from poorer overall recogni-tion rate and higher detection/diagnosis delay and is thus not asgood in enabling effective collaboration among the heterogeneousFDI methods.

The above case studies clearly demonstrate the potential benefits obtainable by combining multiple heterogeneous FDI methods through decision fusion in situations where the FDI methods are diverse, with strong disagreement (conflict) among them, and where the overall performance of each FDI method is extremely poor.
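The gap between plain voting and weighted voting in Table 19 stems from how each tallies the methods' conclusions, which can be sketched minimally as follows. The fault labels and accuracy weights are illustrative, not the trained values used in the case studies, and per-class weights are collapsed to a single scalar per method for brevity.

```python
from collections import Counter

def plain_vote(diagnoses):
    """Utility-based fusion: every FDI method gets one equal vote."""
    return Counter(diagnoses).most_common(1)[0][0]

def weighted_vote(diagnoses, weights):
    """Evidence-based fusion: each vote is scaled by the method's
    historical accuracy, so reliable methods dominate conflicts."""
    tally = Counter()
    for fault, weight in zip(diagnoses, weights):
        tally[fault] += weight
    return tally.most_common(1)[0][0]

# Three hypothetical FDI methods disagree on the fault label:
diagnoses = ["IDV4", "IDV11", "IDV11"]
weights = [0.95, 0.40, 0.35]              # assumed training accuracies
print(plain_vote(diagnoses))              # majority wins: IDV11
print(weighted_vote(diagnoses, weights))  # reliable method wins: IDV4
```

When the two unreliable methods happen to agree on a wrong label, plain voting follows them, while weighted voting sides with the historically accurate method, mirroring the recognition-rate gap observed in Table 19.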

6. Discussion

Fault detection and identification in chemical processes has received significant attention in the literature. Traditionally, a single FDI method is used for process monitoring. In this work, we have studied the benefits that accrue from deploying multiple heterogeneous FDI methods simultaneously. In such situations, a key step is having an effective means to combine the results from the various FDI methods. These decision fusion strategies can be broadly classified into utility-based methods and evidence-based methods – the former being the simplest strategies for deployment, while
the latter exploit some a priori information of the relative merits and demerits of the various FDI methods and hence require “training”. The performance of some popular utility- and evidence-based decision fusion strategies has been evaluated under different scenarios.

When all FDI methods are equally good performers, our results show that decision fusion provides only marginal improvement in FDI performance, since the individual FDI methods detect and identify all the faults quickly and there is no significant conflict among them. The fusion performance of utility-based methods is comparable to that of evidence-based strategies in such cases. Since all the FDI methods have high individual overall recognition rates, treating all the methods equally is a reasonable strategy with the benefit of simplicity. However, it can be argued that multiple FDI methods are probably not warranted in such cases, given the additional complexity and limited benefits.

When the individual FDI methods have varying performance for different faults, disagreements among them come to the fore. Our results indicate that the maximum benefit in terms of FDI performance improvement can be achieved in such situations, where the overall performance of any one method is inadequate. Here, evidence-based methods perform significantly better than utility-based ones. The performance of the Bayesian and Dempster–Shafer based fusion schemes is essentially the same in the two case studies evaluated here. Both can effectively resolve conflicts among multiple heterogeneous FDI methods and yield excellent results with (i) high prediction accuracy (overall recognition rate >95%), (ii) complete (100%) fault coverage, (iii) short detection and diagnosis delays, and (iv) remarkable monitoring performance. This supports the claim of Hoffman and Murphy (1993), Luo and Caselton (1997), and Cobb and Shenoy (2003) that the Bayesian and Dempster–Shafer approaches have roughly the same expressive power. Utility-based methods perform poorly in such situations because they treat all the FDI methods equally without considering their class-specific performance.
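The conflict-resolving behavior of Dempster–Shafer fusion rests on Dempster's rule of combination, sketched below for two FDI methods expressing belief over singleton fault hypotheses. The masses and fault labels are illustrative, not values from the case studies; a complete implementation would also carry mass on compound hypotheses and the full frame of discernment.

```python
def dempster_combine(m1, m2):
    """Combine two basic probability assignments (focal elements are
    frozensets of fault labels) with Dempster's rule: multiply masses,
    keep intersecting hypotheses, and renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass falling on disjoint hypotheses
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

# Two hypothetical FDI methods that partially disagree on the fault:
m1 = {frozenset({"F1"}): 0.8, frozenset({"F2"}): 0.2}
m2 = {frozenset({"F1"}): 0.6, frozenset({"F2"}): 0.4}
fused = dempster_combine(m1, m2)  # agreement on F1 is reinforced
```

Here the conflicting cross-terms (0.8 × 0.4 and 0.2 × 0.6) are discarded and the surviving mass renormalized, so the fused belief in F1 rises to 6/7 ≈ 0.86 – higher than either method's individual belief.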

The benefits of decision fusion arise largely from the diversity among the individual FDI methods. In this work, diversity is achieved by using entirely different FDI methods. As mentioned in Section 2, in the general pattern classification literature, other ways to achieve classifier diversity through resampling of the training data (such as bagging, boosting, and stacking) have been widely studied. Although such techniques are theoretically more tractable, in the process plant context there is generally a paucity of data from fault states. Hence, the adequacy of training data for deploying an ensemble based on a single data-driven FDI method (such as PCA) would be an issue. We intend to explore the trade-offs involved in our future work. Alternate decision fusion strategies such as the decision templates of Kuncheva et al. (2001) and the learning scheme proposed by Huang and Suen (1993) will also be studied.

References

Basir, O., & Yuan, X. (2007). Engine fault diagnosis based on multi-sensor information fusion using Dempster–Shafer evidence theory. Information Fusion, 8(4), 379–386.
Ben-David, A. (2008). About the relationship between ROC curves and Cohen's kappa. Engineering Applications of Artificial Intelligence, 21, 874–882.
Benediktsson, J. A., & Kanellopoulos, I. (1999). Classification of multisource and hyperspectral data based on decision fusion. IEEE Transactions on Geoscience and Remote Sensing, 37(3), 1367–1377.
Bhagwat, A., Srinivasan, R., & Krishnaswamy, P. R. (2003). Fault detection during process transitions: A model-based approach. Chemical Engineering Science, 58, 309–325.
Chiang, L. H., Russell, E. L., & Braatz, R. D. (2001). Fault detection and diagnosis in industrial systems. London: Springer-Verlag.
Cobb, B. R., & Shenoy, P. P. (2003). A comparison of Bayesian and belief function reasoning. Information Systems Frontiers, 5(4), 345–358.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Dash, S., & Venkatasubramanian, V. (2000). Challenges in the industrial applications of fault diagnostic systems. Computers and Chemical Engineering, 24, 785–791.

Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30, 205–247.
Downs, J. J., & Vogel, E. F. (1993). A plant-wide industrial process control problem. Computers & Chemical Engineering, 17(3), 245–255.
Foggia, P., Sansone, C., Tortorella, F., & Vento, M. (1999). Multiclassification: Reject criteria for the Bayesian combiner. Pattern Recognition, 32, 1435–1447.
Gunter, A. M. (2003). Dynamic mathematical model of a distillation column. Departmental Honors Thesis, College of Engineering and Computer Science, The University of Tennessee at Chattanooga.
Haley, S. M., & Osberg, J. S. (1989). Kappa coefficient calculation using multiple ratings per subject. Physical Therapy, 69(11), 970–974.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832–844.
Ho, T. K., Hull, J. J., & Srihari, S. N. (1994). Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(1), 66–75.
Hoffman, J. C., & Murphy, R. R. (1993). Comparison of Bayesian and Dempster–Shafer theory for sensing: A practitioner's approach. In SPIE proceedings on neural and stochastic methods in image and signal processing II.
Huang, Y. S., & Suen, C. Y. (1993). Behavior-knowledge space method for combination of multiple classifiers. In Proceedings of IEEE computer vision and pattern recognition (pp. 347–352).
Huang, Y. S., & Suen, C. Y. (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1), 90–94.
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 226–239.
Kuncheva, L. I. (2005). Combining pattern classifiers: Methods and algorithms. New York, NY: Wiley Interscience.
Kuncheva, L. I., Bezdek, J. C., & Duin, R. (2001). Decision templates for multiple classifier fusion: An experimental comparison. Pattern Recognition, 34(2), 299–314.
Kuncheva, L. I., & Whitaker, C. J. (2001). Ten measures of diversity in classifier ensembles: Limits for two classifiers. In IEEE workshop on intelligent sensor processing, February 2001, Birmingham.
Lam, L., & Suen, C. Y. (1997). Application of majority voting to pattern recognition: An analysis of the behavior and performance. IEEE Transactions on Systems, Man and Cybernetics, 27(5), 553–567.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Laser, M. (2000). Recent safety and environmental legislation. Transactions of the IChemE Part B, 78, 419–422.
Li, H., Bao, Y., & Ou, J. (2008). Structural damage identification based on integration of information fusion and Shannon entropy. Mechanical Systems and Signal Processing, 22, 1427–1440.
Lin, X., Yacoub, S., Burns, J., & Simske, S. (2003). Performance analysis of pattern classifier combination by plurality voting. Pattern Recognition Letters, 24, 1959–1969.
Littlestone, N., & Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108, 212–261.
Luo, W. B., & Caselton, B. (1997). Using Dempster–Shafer theory to represent climate change uncertainties. Journal of Environmental Management, 49, 73–93.
McArthur, S. D. J., Strachan, S. M., & Jahn, G. (2004). The design of a multi-agent transformer condition monitoring system. IEEE Transactions on Power Systems, 19(4), 1845–1852.
Ng, Y. S. (2008). A collaborative, multi-agent based methodology for abnormal events management. PhD Dissertation, Department of Chemical & Biomolecular Engineering, National University of Singapore, Singapore.
Ng, Y. S., & Srinivasan, R. (2008a). Multivariate temporal data analysis using self-organizing maps. 1. Visual exploration of multi-state operations. Industrial & Engineering Chemistry Research, 47(20), 7744–7757.
Ng, Y. S., & Srinivasan, R. (2008b). Multivariate temporal data analysis using self-organizing maps. 2. Monitoring and diagnosis of multi-state operations. Industrial & Engineering Chemistry Research, 47(20), 7758–7771.
Nimmo, I. (1995, September). Adequately address abnormal operations. Chemical Engineering Progress.
Niu, G., Widodo, A., Son, J. D., Yang, B. S., Hwang, D. H., & Kang, D. S. (2008). Decision-level fusion based on wavelet decomposition for induction motor fault diagnosis using transient current signal. Expert Systems with Applications, 35(3), 918–928.
Parikh, C. R., Pont, M. J., & Jones, N. B. (2001). Application of Dempster–Shafer theory in condition monitoring applications: A case study. Pattern Recognition Letters, 22, 777–785.
Parikh, C. R., Pont, M. J., Jones, N. B., & Schlindwein, F. S. (2003). Improving the performance of CMFD applications using multiple classifiers and a fusion framework. Transactions of the Institute of Measurement and Control, 25(2), 123–144.
Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21–45.
Rahman, A. F. R., Alam, H., & Fairhurst, M. C. (2002). Multiple classifier combination for character recognition: Revisiting the majority voting system and its variations. In IAPR workshop on document analysis systems (Vol. 2423), August 2002, Princeton, NJ (pp. 167–178).
Rahman, A. F. R., & Fairhurst, M. (2000). Decision combination of multiple classifiers for pattern classification: Hybridisation of majority voting and divide and conquer techniques. In Fifth IEEE workshop on applications of computer vision (pp. 58–63).
Ricker, N. L. (1996). Decentralized control of the Tennessee Eastman Challenge Process. Journal of Process Control, 6(4), 205–211.

Roli, F., Giacinto, G., & Vernazza, G. (2001). Methods for designing multiple classifier systems. In Proceedings of multiple classifier systems 2001 (pp. 78–87).
Shafer, G. (1976). A mathematical theory of evidence. Princeton, USA: Princeton University Press.
Smets, P. (2007). Analyzing the combination of conflicting belief functions. Information Fusion, 8, 387–412.
Tilie, S., Bloch, I., & Laboreli, L. (2007). Fusion of complementary detectors for improving blotch detection in digitized films. Pattern Recognition Letters, 28, 1735–1746.
Tsoumakas, G., Angelis, L., & Vlahavas, I. (2005). Selective fusion of heterogeneous classifiers. Intelligent Data Analysis, 9(6), 511–525.
Uraikul, V., Chan, C. W., & Tontiwachwuthikul, P. (2007). Artificial intelligence for monitoring and supervisory control of process systems. Engineering Applications of Artificial Intelligence, 20, 115–131.
Venkatasubramanian, V., Rengaswamy, R., & Kavuri, S. N. (2003). A review of process fault detection and diagnosis. Part II: Qualitative models and search strategies. Computers and Chemical Engineering, 27, 313–326.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N., & Yin, K. (2003). A review of process fault detection and diagnosis. Part III: Process history based methods. Computers and Chemical Engineering, 27, 327–346.
Venkatasubramanian, V., Rengaswamy, R., Yin, K., & Kavuri, S. N. (2003). A review of process fault detection and diagnosis. Part I: Quantitative model-based methods. Computers and Chemical Engineering, 27, 293–311.
Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360–363.
Xu, L., Krzyzak, A., & Suen, C. Y. (1992). Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems, Man, and Cybernetics, 22(3), 418–435.
Yang, B. S., & Kim, K. J. (2006). Application of Dempster–Shafer theory in fault diagnosis of induction motors using vibration and current signals. Mechanical Systems and Signal Processing, 20, 403–420.
Zheng, M. M., Krishnan, S. M., & Tjoa, M. P. (2005). A fusion-based clinical decision support for disease diagnosis from endoscopic images. Computers in Biology and Medicine, 35, 259–274.