
Asset Life Prediction and Maintenance Decision-Making Using a Non-Linear Non-Gaussian State Space Model

By

Yifan Zhou

Supervised by

Prof. Lin Ma Prof. Joseph Mathew Prof. Rodney Wolff

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy

School of Engineering Systems Faculty of Built Environment and Engineering

Queensland University of Technology 2010

Statement of Original Authorship

The work contained in this thesis has not been previously

submitted to meet requirements for an award at this or any

other higher education institution. To the best of my knowledge

and belief, the thesis contains no material previously published

or written by another person except where due reference is

made.


Abstract

Estimating and predicting degradation processes of engineering assets is crucial for

reducing the cost and ensuring the productivity of enterprises. Assisted by modern

condition monitoring (CM) technologies, most asset degradation processes can be

revealed by various degradation indicators extracted from CM data. Maintenance

strategies developed using these degradation indicators (i.e. condition-based

maintenance) are more cost-effective, because unnecessary maintenance activities

are avoided when an asset is still in a decent health state. A practical difficulty in

condition-based maintenance (CBM) is that degradation indicators extracted from

CM data can only partially reveal asset health states in most situations.

Underestimating this uncertainty in relationships between degradation indicators and

health states can cause excessive false alarms or failures without pre-alarms. The

state space model provides an efficient approach to describe a degradation process

using these indicators that can only partially reveal health states. However, existing

state space models that describe asset degradation processes largely depend on

assumptions such as discrete time, discrete state, linearity, and Gaussianity. The

discrete time assumption requires that failures and inspections only happen at fixed

intervals. The discrete state assumption entails discretising continuous degradation

indicators, which requires expert knowledge and often introduces additional errors.

The linear and Gaussian assumptions are not consistent with nonlinear and

irreversible degradation processes in most engineering assets. This research proposes

a Gamma-based state space model that does not have discrete time, discrete state,

linear and Gaussian assumptions to model partially observable degradation

processes. Monte Carlo-based algorithms are developed to estimate model

parameters and asset remaining useful lives. In addition, this research also develops

a continuous state partially observable semi-Markov decision process (POSMDP) to

model a degradation process that follows the Gamma-based state space model and is

under various maintenance strategies. Optimal maintenance strategies are obtained

by solving the POSMDP. Simulation studies are performed in MATLAB;

case studies using data from an accelerated life test of a gearbox and from the liquefied

natural gas industry are also conducted. The results show that the proposed Monte

Carlo-based EM algorithm can estimate model parameters accurately. The results

also show that the proposed Gamma-based state space model fits better

than linear and Gaussian state space models when used to process

monotonically increasing degradation data in the accelerated life test of a gearbox.

Furthermore, both simulation studies and case studies show that the prediction

algorithm based on the Gamma-based state space model can identify the mean value

and confidence interval of asset remaining useful lives accurately. In addition, the

simulation study shows that the proposed maintenance strategy optimisation method

based on the POSMDP is more flexible than methods that assume a predetermined strategy

structure and use renewal theory. Moreover, the simulation study also shows

that the proposed maintenance optimisation method can obtain more cost-effective

strategies than a recently published maintenance strategy optimisation method by

optimising the next maintenance activity and the waiting time till the next

maintenance activity simultaneously.

Keywords: Degradation model, EM algorithm, Particle filter, Particle smoother,

State space model, Partially observable Markov decision process


Acknowledgements

I wish to express my sincere thanks to Prof. Lin Ma, who not only led me into the

area of engineering asset management but also taught me principles of academic

research. Without the help from Prof. Lin Ma, I could not have overcome the

obstacles and finished my research. Moreover, Prof. Lin Ma also helped me

understand western culture and enjoy my life in Australia.

I would like to thank Prof. Joseph Mathew and Prof. Rodney Wolff for their

valuable advice on my research and assistance in proofreading my papers.

I appreciate the financial support from Queensland University of Technology, China

Scholarship Council, and the Cooperative Research Centre for Integrated

Engineering Asset Management. With their generous support, I could concentrate on

my PhD study without taking any part-time job.

I really want to thank my parents, Lihong Zhou and Meijun Fan. They always

encouraged me when I faced difficulties during the PhD study.

I am also grateful to Dr. Sheng Zhang and Dr. Yong Sun for their support, help, and

advice.

Last but not least, I would like to thank Dr. Liqun Zhang, Dr. Eric Kim, Dekui Mu,

Yi Yu, Nima Gorjian, Ruizi Wang, Dr. Avin Mathew, Vladimir Frolov, Fengfeng

Li, and Tony Kim who helped me improve my English, inspired me through fruitful

discussions, and made my life in Australia more memorable.


Table of Contents

1 Introduction
1.1 Introduction of the Research
1.2 Research Objectives and Methodologies
1.3 Relationships of the Developed Models and Algorithms
1.4 Originality and Significance
1.5 Related Publications of the Candidate
1.6 Structure of the Thesis

2 Literature Review
2.1 Degradation Modelling
2.1.1 Threshold Crossing Models
2.1.2 Degradation Models Based on the Hazard Rate Process
2.1.3 State Space Degradation Models
2.1.4 Comments
2.2 Condition-based Maintenance Decision-Making
2.2.1 Inspection Scheduling
2.2.2 CBM Optimisation Objectives
2.2.3 CBM Optimisation Methods
2.2.4 Imperfect Inspections
2.2.5 Comments
2.3 Solving Algorithms for Nonlinear Non-Gaussian State Space Models
2.3.1 Basic Inference Algorithms
2.3.2 Parameter Estimation Algorithm
2.3.3 Control Algorithms for the State Space Model
2.3.4 Comments

3 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
3.1 Introduction
3.2 Model Formulations and Solving Algorithms
3.2.1 Model Formulations
3.2.2 Parameter Estimation
3.2.3 Variance-Covariance Matrix of the Parameter Estimates
3.2.4 Model Selection
3.2.5 Monte Carlo-Based Lifetime Prediction
3.3 Simulation Study
3.3.2 Performance Investigation
3.3.3 Life Prediction
3.4 Case Study: Crack Size Propagation Modelling
3.5 Chapter Summary

4 Joint Modelling of Failure Events and Multiple Indirect Indicators
4.1 Introduction
4.2 Model Formulations and Solving Algorithms
4.2.1 Model Formulations and Notations
4.2.3 Indicator Effectiveness Evaluation
4.3 Simulation Study
4.3.2 Lifetime Prediction
4.3.3 Effectiveness Evaluation of Indicators
4.4 Case Study: Lifetime Prediction for the Bearing on a Liquefied Natural Gas (LNG) Pump
4.4.1 Data Introduction
4.4.2 Model Application
4.4.3 Discussion
4.5 Chapter Summary

5 Maintenance Strategy Optimisation Using the POSMDP
5.1 Problem Formulation
5.2 Regular Maintenance Intervals
5.2.1 Solving the POSMDP
5.2.2 Simulation Study
5.3 State-Dependent Maintenance Intervals
5.3.1 The Formulations and Solution Method of the POSMDP
5.4 Maintenance Strategy Considering Imperfect Maintenance
5.4.1 The Formulations and the Solution Method of the POSMDP
5.5 Chapter Summary

6 Conclusions and Future Research Directions
6.1 Conclusions
6.1.1 Modelling Correlated Degradation Processes of Direct and Indirect Indicators
6.1.2 Joint Modelling of Failure Events and Multiple Indirect Indicators
6.1.3 Maintenance Strategy Optimisation Using the Continuous State POSMDP
6.2 Future Research

7 References

8 Appendix


List of Figures

Figure 1-1: Relationships of developed models and algorithms
Figure 3-1: The simulated indirect indicators and direct indicators
Figure 3-2: The development of the parameter estimates
Figure 3-3: MSEs of the direct indicator estimates when the observation noise is 0.5
Figure 3-4: MSEs of the direct indicator estimates when the observation noise is 0.05
Figure 3-5: Life prediction results when the failure is observable
Figure 3-6: The lifetime distribution predicted at different time points
Figure 3-7: The lifetime distribution prediction at 251 when the failure is not observable
Figure 4-1: Three simulated degradation indicators
Figure 4-2: The convergence process of the EM algorithm
Figure 4-3: Estimation of underlying health states
Figure 4-4: RUL prediction results
Figure 4-5: Pump schematic
Figure 4-6: Outer raceway spall of P301C
Figure 4-7: Inner raceway flaking of P301D
Figure 4-8: RUL prediction results of the bearing on P301C
Figure 5-1: Parameters spreading of the censored Gaussian distribution
Figure 5-2: Parameters spreading of the Beta distribution
Figure 5-3: Minimum long-run average cost according to different inspection intervals when actual health states are observable
Figure 5-4: The results of the policy iteration when maintenance intervals are regular and the standard deviation of the observation noise is σ = 0.3
Figure 5-5: Some results of the policy iteration for POMDP with irregular maintenance intervals (the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)
Figure 5-6: Some results of the policy iteration for POMDP considering imperfect maintenance (the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)


List of Tables

Table 3-1: The mean likelihood function values and the elapsed times of the three strategies
Table 3-1: The measurements of the crack size during the accelerated life test
Table 3-2: The AICc of different models
Table 4-1: The results of effectiveness evaluation for indicators
Table 4-2: The specifications of the pump
Table 4-3: Vibration data features
Table 4-4: Effectiveness evaluation for the three features extracted from the vibration signals
Table 4-5: RUL prediction results of the bearing on P301C
Table 5-1: Mean likelihood values of the censored Gaussian distribution and the Beta distribution under different observation noise
Table 5-2: The Monte Carlo-based method that calculates the transition matrix
Table 5-3: The process of policy iteration for the POSMDP
Table 5-4: The long-run average costs derived by three methods (i.e., the method simply ignoring the observation noise, the heuristic method, and POSMDP) when the observation noise level is different
Table 5-5: The long-run average costs per unit time derived by the POSMDP with irregular inspection interval and the method proposed by Wang (Wang and Christer 2000; Wang 2003b)
Table 5-6: The process to calculate the transition matrix using the Monte Carlo-based method


List of Notations

Notations used in different chapters are summarised as follows:

Notations used in Chapter 3:

The direct indicator at time

The indirect indicator at time

·,· The PDF of the Gamma distribution

The shape function of the Gamma process

The scale parameter of the Gamma process

The observation noise of the Gamma-based state space model

The standard deviation of the observation noise

The failure threshold on the direct indicator

·,· The PDF of the Gaussian distribution

The time to perform the th inspection

The direct indicator value at the th inspection

The indirect indicator value at the th inspection

The number of inspections

· The indicator function of inspections when the direct indicator is

observable

The inspection index of the th observable direct indicator

The number of observable direct indicators


The parameter set of the Gamma-based state space model

The parameter set in the system equation of the Gamma-based state space

model

The parameter set in the observation equation of the Gamma-based state

space model

The increment of the direct indicator before the th inspection

The increment of the shape function before the th inspection

·,· The PDF of the Beta distribution

· | · The PDF of the importance density used in particle filtering

The th sample generated at the th inspection time according to the

importance density

The weight of the sample

The th particle at the th inspection time after particle filtering

|, The weight of the th filtering particle at the th inspection corresponding to

the th smoothing particle at the 1th inspection

The th particle at the th inspection time after particle smoothing

The model parameters used to generate simulation data

The parameter estimates derived at the th EM iteration

· The PDF of the observation noise

· The Dirac delta measure

Additional notations used in Chapter 4


The underlying health state at time

The indirect indicator vector at time

The size of the indirect indicator vector

The failure time

The censoring time

The relative contribution ratio

Additional notations used in Chapter 5

The cost incurred by an inspection

The cost incurred by a preventive replacement activity

The cost incurred by an imperfect maintenance activity

The cost incurred by an unexpected breakdown

The duration of an inspection

The duration of a preventive replacement activity

The duration of an imperfect maintenance activity

The duration of an unexpected breakdown

The original belief of the Gamma-based state space model obtained by

particle filtering

The space of the original belief

·; The projected parametric distribution

Θ The parameter space of the projected parametric distribution ·;


Ω The projected parametric density space

Ω b The density projection function

Ω The discretised projected parametric density space

The th elements in the discretised projected parametric density space

Ω

The projected belief corresponding to the brand new health state

The relative cost starting in the projected belief state

The relative costs starting in if the “do nothing” strategy is adopted

The relative costs starting in if the “preventive replacement”

strategy is adopted

∆ The inspection interval

∆ | The expected reliability at the next inspection epoch given that the

current belief state is projected as

∆ | The expected survival time during the next inspection interval when

the current projected belief is

The long-run minimum expected cost (downtime) per unit time

The transition matrix in the discretised projected belief space Ω over

one inspection interval

·,· A distance measure defined in the projected belief space

· The policy function

· The optimal maintenance strategy obtained by the policy iteration

∆ The interval of the sampling points of the waiting time for the next

preventive replacement


∆ The waiting duration till the next preventive replacement

∆ The maximum waiting time for the next preventive replacement


inspection

∆ The waiting duration till the next inspection

∆ The maximum waiting time for the next inspection


imperfect maintenance

∆ The waiting duration till the next imperfect maintenance

∆ The maximum waiting time for the next imperfect maintenance

, ∆ The relative cost (downtime) when the initial projected belief state is

and inspection is performed after ∆


and preventive replacement is performed after ∆


and imperfect maintenance is performed after ∆

The transition matrix of the discretised projected beliefs given that the

transition epoch is ∆ and a health inspection is conducted

The interval of the expected reliability when is required to be

calculated

| The expected waiting time till the next maintenance activity when the

current belief can be projected to and preventive replacement is

performed



current belief can be projected to and an inspection is performed


current belief can be projected to and imperfect maintenance is

performed

Δ | The expected duration of the imperfect maintenance performed after

Δ given the current projected belief state

The transition matrix of the projected belief states after imperfect

maintenance


List of Abbreviations

AHM Additive hazard model

AIC Akaike's information criterion

AICc Akaike's information criterion with a second order correction

BIC Bayesian information criterion

CBM Condition-based Maintenance

CDF Cumulative distribution function

CM Condition monitoring

DPCA Dynamic principal component analysis

E step Expectation step

EKF Extended Kalman filter

EM algorithm Expectation-maximisation algorithm

FFT Fast Fourier transform

HMM Hidden Markov model

HPF High pass filter

KL divergence Kullback–Leibler divergence

LNG Liquefied natural gas

M step Maximisation step

MCMC Markov chain Monte Carlo

MDP Markov decision process

MLE Maximum likelihood estimation

MSE Mean square error

PCA Principal component analysis

PDF Probability density function

PHM Proportional hazard model

POMDP Partially observable Markov decision process

POSMDP Partially observable semi-Markov decision process

PWLC Piecewise-linear and convex


RMS Root mean square

RUL Remaining useful life


SAME State-augmentation for marginal estimation

SIR filter Sampling importance resampling filter

SMDP Semi-Markov decision process

SPC Statistical process control

UKF Unscented Kalman filter


1 Introduction

1.1 Introduction of the Research

The availability and capability of engineering assets are important business

objectives in modern engineering asset management. An unexpected failure of a

critical engineering asset can cause the breakdown of the whole plant. For high-risk

assets (e.g. helicopters, aircraft, bridges, and dams), reliability and safety are even

more crucial. Therefore, effective maintenance strategies should be executed to

enhance the reliability and availability of essential assets. The optimisation of

maintenance strategies, in turn, largely depends on the prediction of asset health

condition and failure time.

Conventional research on asset life prediction up to the early nineties has been based

on lifetime distribution. However, assets employed in modern industry are becoming

more and more reliable due to the development of material science and

manufacturing technology. As a result, reliability analysis relying on lifetime

distribution cannot be performed effectively due to the scarcity of failure events.

On the other hand, advanced sensors and computer systems have made more

condition monitoring (CM) data available. Effective indicators extracted from these

CM data can be used to model asset degradation processes. Based on these

degradation indicators, asset lives can be predicted. When durations and costs of

breakdowns and maintenance activities are known, optimal maintenance strategies

can be further developed.

In reality, degradation indicators extracted from CM data have different

relationships with failure mechanisms. Wang classified the information from

degradation processes as “direct information” and “indirect information” (Wang et

al. 2000). Motivated by the research of Wang, this research divides degradation

indicators into two categories: (1) direct indicators (e.g. the thickness of a brake pad,

and the crack depth on a gear) which directly relate to a failure mechanism; and (2)


indirect indicators (e.g. indicators extracted from vibration signals and oil analysis

data) which can only partially reveal a failure mechanism.

Direct and indirect indicators both have advantages and disadvantages. In

degradation modelling, direct indicators are often used to represent the underlying

degradation process of an asset. An asset is regarded as failed when one of its direct

indicators crosses a predetermined failure threshold. In contrast, setting a

predetermined failure threshold on indirect indicators can cause excessive false

alarms or failures without pre-alarms. Therefore, direct indicators are preferable for

asset degradation process modelling given their deterministic failure thresholds

(Grall et al. 2002; Liao et al. 2006b; Crowder and Lawless 2007). However, direct

indicators are often technically or economically impossible to sample frequently. For

example, the crack on the tooth of a gear cannot be measured online. Similarly, the

wear of the impeller in a pump cannot be measured during its operating period.

Directly applying degradation models to these direct indicators with limited sample

size is often not practically possible. Moreover, for some engineering assets, the

failure mechanisms are complex and no direct indicator is available to represent the

underlying degradation processes. For example, Wang used a generic wear condition

as a direct indicator of aircraft engines (Wang 2007). This generic wear condition

was an abstract concept and was not extracted directly from the CM data. Whitmore

proposed a similar model in which a failure was assumed to happen when a latent

process crossed a predetermined failure threshold. This latent process did not have

a particular physical meaning and was only known when a failure happened (Whitmore

et al. 1998). Unlike direct indicators, indirect indicators can often be

obtained easily through various CM techniques. However, indirect indicators can

only partially reveal the degradation process of an asset. Consequently, an

appropriate mathematical model should be developed to describe the relationship

between indirect indicators and the related underlying degradation process.

The state space model is an effective approach to reveal the underlying degradation

process of an engineering asset using indirect indicators. The state space degradation

model consists of a state equation and an observation equation. The underlying


degradation process of an engineering asset is modelled by the state equation, and

the relationship between the underlying degradation process and indirect indicators

is described by the observation equation. Consequently, the state space model

combines both the information from the stochastic underlying degradation processes

and the uncertain relationships between the underlying degradation process and

indirect indicators. Moreover, the state space model is an effective tool for indicator

fusion. Compared with commonly used multivariate statistical approaches and

multivariate time series analysis methods, the state space model can analyse

degradation indicators with uneven sampling intervals.
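The two-equation structure described above can be illustrated with a minimal Python sketch. It assumes a stationary Gamma process (a linear shape function) for the state equation and a linear link with additive Gaussian noise for the observation equation. The function name, parameter names, and numerical values are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import random


def simulate_degradation(times, shape_rate, scale, obs_gain, obs_sd, seed=0):
    """Simulate a hidden degradation state and noisy indirect observations.

    times      : increasing inspection times (may be unevenly spaced)
    shape_rate : Gamma shape accumulated per unit time (assumed linear)
    scale      : scale parameter of the Gamma increments
    obs_gain   : assumed linear link from hidden state to indirect indicator
    obs_sd     : standard deviation of the Gaussian observation noise
    """
    rng = random.Random(seed)
    state, prev_t = 0.0, 0.0
    states, observations = [], []
    for t in times:
        dt = t - prev_t
        # State equation: add an independent, non-negative Gamma increment
        # whose shape is proportional to the elapsed time dt.
        state += rng.gammavariate(shape_rate * dt, scale)
        # Observation equation (assumed form): gain * state + Gaussian noise.
        observations.append(obs_gain * state + rng.gauss(0.0, obs_sd))
        states.append(state)
        prev_t = t
    return states, observations


# Hypothetical, unevenly spaced inspection times.
inspection_times = [1.0, 1.5, 4.0, 4.2, 9.0]
hidden, indirect = simulate_degradation(inspection_times, 2.0, 0.5, 1.0, 0.3)
for t, x, y in zip(inspection_times, hidden, indirect):
    print(f"t={t:4.1f}  hidden state={x:6.3f}  indirect indicator={y:6.3f}")
```

Because each increment depends only on the elapsed time since the previous inspection, no resampling to a regular grid is needed, which is the property the paragraph above attributes to the state space formulation.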

Difficulties still exist when the state space model is applied to describe practical

asset degradation processes. First of all, existing state space degradation models are

largely discrete in time or state (Jie et al. 2000; Wang 2002; Wang 2006). In

contrast, most asset degradation processes are continuous both in time and state. The

discrete state assumption requires discretising continuous degradation indicators,

which needs expert knowledge and may introduce additional errors. The discrete

time assumption, on the other hand, assumes that maintenance activities and failures

can only happen at discrete time points with regular intervals, which is not consistent

with reality. To overcome the shortcomings of discrete degradation models, some

continuous state space degradation models have been proposed. Nevertheless, most

of these continuous models adopt linear and Gaussian assumptions (Whitmore et al.

1998; Hashemi et al. 2003). When a degradation process follows the linear and

Gaussian assumptions, the degradation process is not monotonically increasing. In

contrast, most degradation processes of engineering assets (e.g. wear,

corrosion, crack growth) are not reversible between two maintenance activities. In

addition, the Gaussian process possesses a diffusion property. Therefore, conditional

probability density functions (PDF) are involved when the likelihood function is

constructed to ensure that a Gaussian process does not drift beyond its failure

threshold between two normal health states (Yuan 2007). Integrals are often needed

to calculate these conditional PDFs, which increases difficulties in establishing and

evaluating the likelihood function. Therefore, by removing the discrete, linear, and


Gaussian assumptions, the state space model can describe asset degradation

processes that are partially revealed by indirect indicators more effectively.
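The monotonicity argument can be checked numerically. The sketch below, under assumed and purely illustrative parameter values, draws increments from a Gaussian process with drift and from a Gamma process with the same mean degradation rate, then counts how many increments are negative. The function name and all parameters are hypothetical.

```python
import random


def count_negative_increments(n_steps, dt, seed=0):
    """Count negative increments of a Gaussian and a Gamma degradation model."""
    rng = random.Random(seed)
    drift, diffusion = 1.0, 1.0   # hypothetical Gaussian-process parameters
    shape_rate, scale = 2.0, 0.5  # hypothetical Gamma parameters (mean rate 1.0)
    gaussian_neg = 0
    gamma_neg = 0
    for _ in range(n_steps):
        # Gaussian increment: mean drift*dt, standard deviation diffusion*sqrt(dt).
        gaussian_inc = rng.gauss(drift * dt, diffusion * dt ** 0.5)
        # Gamma increment: shape shape_rate*dt, never negative by construction.
        gamma_inc = rng.gammavariate(shape_rate * dt, scale)
        gaussian_neg += gaussian_inc < 0
        gamma_neg += gamma_inc < 0
    return gaussian_neg, gamma_neg


gaussian_neg, gamma_neg = count_negative_increments(n_steps=10_000, dt=0.1)
print("negative Gaussian increments:", gaussian_neg)
print("negative Gamma increments:   ", gamma_neg)
```

With these settings a substantial share of the Gaussian increments is negative, meaning the modelled degradation would appear to reverse, whereas the Gamma increments are never negative.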

This research adopts a Gamma-based state space degradation model that does not

have discrete state, discrete time, linear, or Gaussian assumptions. Two types of

underlying degradation processes are considered in this research. The first type is

represented by a direct indicator that has particular physical meaning and can be

measured. The second type indicates the overall health condition of an asset and is

only known at failure times. Both types are partially revealed by some

indirect indicators. Monte Carlo-based parameter estimation algorithms are

developed to address the non-linear and non-Gaussian property of the Gamma-based

state space model. Lifetime prediction and maintenance strategy optimisation

methods for the Gamma-based state space model are also investigated. Research

objectives and methodologies are introduced in detail as follows.

1.2 Research Objectives and Methodologies

1. Modelling correlated degradation processes of direct and indirect indicators

The first objective of this research is to model the correlated degradation processes

of direct and indirect indicators.

In some applications, direct indicators can be revealed by indirect indicators. For

example, the wear status of the impeller in a slurry pump can be assessed through

the cumulative amplitude measure evaluated from its vane pass frequency (Mani et

al. 2008). Therefore, direct indicators can be estimated through related indirect

indicators. This research develops a Gamma-based state space model to model the

correlated degradation processes of direct and indirect indicators. The Gamma-based

state space model consists of a state equation and an observation equation. The state

equation describes the degradation process of a direct indicator. In this research, the

Gamma process is adopted to model the degradation process of a direct indicator.

The Gamma process has been widely used to model a range of direct indicator

degradation processes, e.g. fatigue crack growth (Lawless and Crowder 2004),


corrosion of pressure vessels (Kallen and Van Noortwijk 2005), and brake-pad wear

for automobiles (Crowder and Lawless 2007). The observation equation models the

relationship between direct and indirect indicators. In this research, the indirect

indicator is assumed to be a function of a direct indicator with additive Gaussian

noise. The parameter estimation algorithm for the Gamma-based state space model

is also developed. The parameter estimation algorithm should be able to process

incomplete observations of direct indicators caused by measurement difficulties. After

model parameters are estimated, a life prediction method based on the Gamma-based

state space model is developed.
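The structure of such a model can be illustrated with a minimal simulation sketch. The linear shape function, the square-root observation function, and all parameter values below are illustrative assumptions rather than the formulation estimated in this research; the function name `simulate_gamma_state_space` is hypothetical.

```python
import random

def simulate_gamma_state_space(times, shape_rate, scale, obs_fn, noise_sd, rng):
    """Simulate a hidden Gamma-process state and noisy indirect observations.

    Illustrative assumptions: linear shape function alpha(t) = shape_rate * t,
    and observation y = obs_fn(x) + zero-mean Gaussian noise.
    """
    states, observations = [], []
    x, t_prev = 0.0, 0.0
    for t in times:
        dt = t - t_prev
        if dt > 0:
            # State equation: independent Gamma-distributed increment.
            x += rng.gammavariate(shape_rate * dt, scale)
        # Observation equation: function of the state plus Gaussian noise.
        y = obs_fn(x) + rng.gauss(0.0, noise_sd)
        states.append(x)
        observations.append(y)
        t_prev = t
    return states, observations

rng = random.Random(0)
times = [1.0, 2.0, 3.5, 5.0, 8.0]  # irregular inspection times are allowed
states, observations = simulate_gamma_state_space(
    times, shape_rate=2.0, scale=0.5,
    obs_fn=lambda x: x ** 0.5, noise_sd=0.1, rng=rng,
)
```

The simulated direct indicator never decreases, whereas the indirect indicator may fluctuate because of the observation noise.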

2. Joint modelling of failure events and multiple indirect indicators

The second objective of this research is to jointly model failure events and multiple

indirect indicators.

The Gamma-based state space model with multivariate observations is adopted to

combine failure events and multiple indirect indicators. The multiple indirect

indicators are modelled by the multivariate observations and the failure times are

modelled by the first crossing time of the underlying system state process. A

parameter estimation algorithm is developed for the Gamma-based state space model

with multivariate observations. The parameter estimation algorithm can consider

failure times and multiple indirect indicators. Moreover, the parameter estimation

algorithm should also consider the degradation sequences without failure times, i.e.,

censored data. The censored data is caused by preventive replacement or missing

observation of failure events. This research also provides a parametric bootstrap

method to evaluate the effectiveness of different degradation indicators in parameter

estimation and lifetime prediction. After the effectiveness of different indicators is

identified, a more economical CM system can be obtained by ignoring unnecessary

sensors.
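The way a first crossing time links the underlying state process to failure times can be sketched as follows. Because a Gamma process never decreases, the failure time T satisfies T <= t exactly when the state at time t has reached the threshold, so only the state at time t needs to be sampled. The linear shape function, the fixed threshold, and the function name `failure_cdf_mc` are assumptions made for this illustration only.

```python
import random

def failure_cdf_mc(t, x0, threshold, shape_rate, scale, n_samples, rng):
    """Monte Carlo estimate of P(T <= t), where T is the first crossing time.

    Illustrative assumptions: linear shape function alpha(t) = shape_rate * t,
    a fixed failure threshold, and a current state x0 below that threshold.
    """
    if t <= 0:
        return 0.0
    hits = 0
    for _ in range(n_samples):
        # State at time t: current state plus one Gamma-distributed increment.
        x_t = x0 + rng.gammavariate(shape_rate * t, scale)
        if x_t >= threshold:
            hits += 1
    return hits / n_samples

rng = random.Random(1)
horizons = [1.0, 2.0, 4.0, 8.0]
cdf = [failure_cdf_mc(t, 0.0, 5.0, 1.0, 1.0, 5000, rng) for t in horizons]
```

Under the same assumptions, a sequence censored at time c would contribute the complementary probability, i.e. one minus the estimate at t = c.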

3. Maintenance strategy optimisation


The third objective of this research is to develop a maintenance strategy optimisation

algorithm for an asset whose degradation process follows the Gamma-based state

space model.

This research develops a Monte Carlo-based continuous state partially observable

Semi-Markov decision process (POSMDP) to model the maintenance decision-

making process of an asset that follows the Gamma-based state space model.

Optimal maintenance strategies can be obtained by solving the POSMDP.

Inspections, preventive replacement, corrective replacement, and imperfect

maintenance are all considered. Strategies that minimise the expected cost per unit

time and maximise the availability are both investigated. The next maintenance

activity and the waiting time till the next maintenance activity are optimised

simultaneously to achieve an optimal maintenance strategy.

4. Model and algorithm validations

The last objective of this research is to validate the developed models and algorithms

using simulation and field data.

Firstly, this research performs simulation studies to investigate the performance of the

developed parameter estimation algorithms, life prediction algorithms, and

maintenance strategy optimisation algorithms. Secondly, a case study using the data

collected from the accelerated life test of a gear box is conducted to validate the

advantages of the state space degradation model without linear and Gaussian

assumptions when processing monotonically increasing direct indicators. Thirdly, a

case study that uses a field dataset from a liquefied natural gas industry is conducted

to validate the asset RUL prediction ability of the Gamma-based state space model.

1.3 Relationships of the Developed Models and Algorithms

As shown in Figure 1-1, this research is divided into three parts, i.e., modelling

correlated degradation processes of direct and indirect indicators, joint modelling of

failure events and multiple indirect indicators, and maintenance strategy


optimisation. The first part models the degradation processes of direct and indirect

indicators by the Gamma-based state space model. The second part jointly models

failure events and multiple indirect indicators by the Gamma-based state space

model with multivariate observations. The outputs of the first two parts are model

parameters, model formulations, and RULs. The RUL is only a reference for asset

management decision support. After costs and durations of breakdowns and

maintenance activities are known, an optimal maintenance strategy can be further

developed to reduce costs and enhance availability. Consequently, the last part of

this research (i.e., maintenance strategy optimisation) is based on the model

parameters and formulations that are derived by the first two parts and additional

information about costs and durations of breakdowns and maintenance activities.

[Figure 1-1 (flowchart). Input data: direct indicators; indirect indicators; failure times; costs and durations of breakdowns and maintenance activities. Three parts of this research: (1) modelling correlated degradation processes of direct and indirect indicators; (2) joint modelling of failure events and multiple indirect indicators; (3) maintenance strategy optimisation. Output results: model parameters and formulations; remaining useful life; optimal maintenance strategies.]

Figure 1-1: Relationships of developed models and algorithms


1.4 Originality and Significance

The state space model is an effective tool to model partially observable asset

degradation processes. However, only a small number of state space models that are

applied to degradation modelling do not have discrete time, discrete state, linear, or

Gaussian assumptions. These applications largely adopt the physics-based approach

and assume that model formulations and parameters are known (Cadini et al. 2009;

Orchard et al. 2009).

This research for the first time adopts a Gamma-based state space model to describe

asset degradation processes. The parameter estimation, lifetime prediction, and

maintenance strategy optimisation algorithms are systematically investigated. The

algorithms developed in this research can also be used to process other nonlinear

non-Gaussian state space degradation models. The originality and significance of this research

are summarised in the following three parts:

1. This research develops a Gamma-based state space model to describe the

correlated degradation processes of direct and indirect indicators

This research uses, for the first time, a Gamma-based state space model to model the

correlated degradation processes of direct and indirect indicators. A parameter

estimation method that considers indirect indicators and incomplete direct indicator

observations is developed. Issues encountered when a nonlinear non-Gaussian state

space model is applied to degradation modelling are addressed systematically.

Advantages of the developed models and algorithms are as follows:

1) The monotonically increasing property of the Gamma process is consistent

with irreversible degradation processes of most direct indicators.

Consequently, the Gamma process has been widely applied in modelling

direct indicators (Lawless and Crowder 2004; Kallen and Van Noortwijk

2005; Park and Padgett 2005b; Liao et al. 2006a; Yuan 2007; van Noortwijk

2009). Therefore, the Gamma-based state space model is expected to achieve


a better fit when used to model the correlated degradation

processes of direct and indirect indicators.

2) The observation equation of the Gamma-based state space model does not

rely on a linear assumption. Therefore, more complex relationships between

direct and indirect indicators in practice can be described.

3) The developed EM algorithm can process incomplete direct indicator

observations. Direct indicators are often more difficult to obtain than indirect

indicators, and samples of direct indicators are often incomplete. Therefore,

the developed EM algorithm can make use of practical degradation datasets

more efficiently.

4) Existing research on the Gamma process also provides approaches to

consider operation conditions and unit-specific random effects during

degradation modelling (Lawless and Crowder 2004). Based on these

approaches, the proposed Gamma-based state space model can be extended

to deal with more complicated case studies.

2. The developed Gamma-based state space model is also extended to jointly

model failure times and multiple degradation indicators

In this research, the Gamma-based state space model also jointly models failure

events and multiple indirect indicators. Some original work has been done:

1) This research develops a Monte Carlo-based EM algorithm to estimate the

parameters of the Gamma-based state space model. The multiple degradation

indicators and event data are all considered during parameter estimation. The

proposed parameter estimation method can also use censored data whose

failure time is unknown.

2) This research develops an initial parameter identification method for the

Gamma-based state space model using the method of moments and

properties of the Gamma process. The proposed method can process

degradation data with irregular inspection intervals.


3) This research develops a parametric bootstrap method to evaluate the

effectiveness of different degradation indicators that are adopted in the

Gamma-based state space model.

Advantages of the developed models and algorithms are as follows:

1) The situation where event data are insufficient can be overcome by

considering both degradation indicators and event data. In addition, the

Gamma-based state space model can also consider the censored data whose

failure time is unknown. These additional censored data can improve the

accuracy of parameter estimation and RUL prediction (Heng et al. 2009).

2) The monotonically increasing property of the Gamma process makes the

establishment of a likelihood function easier when a failure is considered.

For example, during the calculation of the likelihood function for a state

space model based on the Gaussian process, conditional PDFs are required to

ensure that the underlying degradation process does not drift across a failure

threshold between two normal states (Whitmore et al. 1998). Integrals are

needed in these conditional PDFs, which increases difficulties in establishing

and evaluating likelihood functions.

3) This research provides a method to evaluate the effectiveness of different

indicators. Consequently, a more cost-effective condition monitoring system

can be established by only installing necessary sensors, and the size of the

database that stores degradation indicators can also be reduced, after the

effectiveness of different degradation indicators is identified.

4) The developed algorithms in this research can also be used to process state

space models with other non-Gaussian underlying system processes. As a

result, the failure time following different distributions can be modelled.

3. This research proposes a continuous state POSMDP to optimise the

maintenance strategy for an asset whose degradation process follows the

Gamma-based state space model

Existing partially observable Markov decision processes (POMDP) that are applied

in maintenance strategy optimisation are discrete in time and state, while the


POSMDP developed in this research can process the state space degradation model

continuous in time and state. Moreover, the proposed POSMDP can deal with the

non-Gaussian state space degradation model which has not been discussed in

literature.

The proposed continuous state POSMDP has the following advantages when used to

optimise maintenance strategies:

1) The continuous time property allows failures to occur at any time and

accommodates irregular inspection intervals. The continuous state, on the other

hand, avoids discretising continuous asset health states, which may introduce

errors and affect the maintenance strategy optimisation results.

2) Monte Carlo-based methods are used to solve the POSMDP. Consequently,

the proposed POSMDP can be adopted to process various state space models

without Gaussian assumptions.

3) As an extension of the Markov decision process (MDP), the POSMDP can

optimise maintenance strategies without specifying a predetermined strategy

structure (e.g. the control limit theory). Therefore, the POSMDP can derive

more flexible maintenance strategies when multiple maintenance activities

are available.

4) The POSMDP decomposes a long-run decision process into single steps.

Subsequently, some practical issues (e.g., state-dependent maintenance costs

and durations, and uncertain maintenance effects) can be formulated

concisely.

1.5 Related Publications of the Candidate

1. Refereed International Journals

Zhou, Y., L. Ma, et al. (2009). "Asset Life Prediction Using Multiple Degradation

Indicators and Failure Events: A Continuous State Space Model Approach."

Maintenance and Reliability 44: 72-81.


Zhou, Y., Y. Sun, et al. "Latent Degradation Indicators Estimation and Prediction: a

Monte Carlo Approach." Mechanical Systems and Signal Processing. In

Press.

Zhou, Y., L. Ma, et al. "Maintenance Decision-Making with Multiple Maintenance

Options Using a Continuous-State Partially Observable Semi-Markov

Decision Process." Microelectronics Reliability. In Press.

2. Refereed International Conferences

Zhou, Y., L. Ma, et al. (2009). Asset Life Prediction Using Multiple Degradation

Indicators and Lifetime Data: a Gamma-Based State Space Model Approach.

ICRMS' 2009. Chengdu, China, IEEE.

Zhou, Y. (2010). Maintenance Decision-Making Using a Continuous-State Partially

Observable Semi-Markov Decision Process. IEEE – Prognostics & System

Health Management Conference. Macau, China, IEEE.

Zhou, Y., L. Ma, et al. (2008). A Gamma-based Continuous State Space Model for

Asset Degradation. WCEAM-IMS. Beijing, China, Springer-Verlag London

Ltd: 1981-1991.

Zhou, Y., L. Ma, et al. (2008). Latent Degradation Indicator Estimation Using

Condition Monitoring Information. WCEAM-IMS 2008. Beijing, China,

Springer-Verlag London Ltd: 1967-1980.

Yu, Y., L. Ma, et al. (2008). Confidence Interval of Lifetime Distribution Using

Bootstrap Method. WCEAM-IMS. Beijing, China, Springer-Verlag London

Ltd: 1883-1890.

1.6 Structure of the Thesis

Chapter 1 Introduction


At the beginning of this chapter, the background and topic of this research are

presented. Then, the objectives and methodologies of this research are identified.

After that, the relationships of the models and algorithms developed in this research

are presented. Finally, the originality and significance of this research are

summarised.

Chapter 2 Literature Review

The literature review is divided into three parts, i.e., degradation modelling, CBM

decision-making, and solving algorithms for nonlinear non-Gaussian state space

models. The first part surveys different aspects of degradation modelling. The

second part reviews literature that solves different issues involved in CBM decision-

making. The last part of the literature review investigates existing algorithms that

process nonlinear non-Gaussian state space models.

Chapter 3 Modelling Correlated Degradation Processes of Direct and Indirect

Indicators

In this chapter, the underlying health state of an asset is represented by a direct

indicator that relates to a failure mechanism directly. The direct indicator cannot be

sampled frequently due to difficulties of measurement. However, the direct indicator

is assumed to be partially revealed by an indirect indicator. The Gamma-based state

space model with a single degradation indicator is used to model the correlated

degradation processes of direct and indirect indicators. The parameter estimation and

lifetime prediction algorithms for the Gamma-based state space model are

developed. The performance of the developed algorithms is evaluated by simulation

studies. Finally, a case study using the data collected from the accelerated life test of

a gear box is conducted to demonstrate the disadvantage of the linear and Gaussian

assumption in modelling the particular asset degradation process.

Chapter 4 Joint Modelling of Failure Events and Multiple Indirect Indicators

This chapter considers the situation in which direct indicators are not available. In this

situation, the underlying degradation process becomes an abstract process which is

assumed to be only known at failure time. A Gamma-based state space model with


multiple indicators is used as a joint model of multiple degradation indicators and

failure events. A Monte Carlo-based parameter estimation method that considers

multiple degradation indicators is developed. Censored data whose failure times are

unknown can be used during the parameter estimation. A parametric bootstrap

method is developed to evaluate the effectiveness of different indicators. Finally, a

case study that uses the vibration data collected from a liquefied natural gas (LNG)

industry is conducted to validate the RUL prediction ability of the Gamma-based

state space model.

Chapter 5 Maintenance Strategy Optimisation Using the POSMDP

This chapter develops a continuous state POSMDP to optimise maintenance

strategies of an asset whose degradation process follows the Gamma-based state

space model. The continuous state POSMDP is converted to a semi-Markov decision

process (SMDP) through a Monte Carlo-based density projection. Optimal

maintenance strategies are then obtained by solving the converted SMDP through

policy iteration. The maintenance strategies with regular inspection interval,

irregular inspection interval, and imperfect maintenance are investigated using the

POSMDP. Simulation studies are carried out to validate the effectiveness of the

POSMDP.

Chapter 6 Conclusions and Future Research Directions

The last chapter summarises the whole thesis and identifies some possible future

research directions.


2 Literature Review

Before the Gamma-based state space model is introduced and the corresponding

RUL prediction and maintenance strategy optimisation algorithms are investigated,

the related literature is reviewed and discussed first. This literature review is divided

into three parts. Because asset life prediction and maintenance strategy optimisation

both depend on asset degradation process modelling, commonly used degradation

modelling methods are reviewed in Section 2.1. After that, Section 2.2 discusses

different issues involved in maintenance decision-making. In this research, RUL

prediction and maintenance strategy optimisation are both based on the solving

algorithms for nonlinear non-Gaussian state space models. Therefore, these solving

algorithms are summarised in Section 2.3.

2.1 Degradation Modelling

From the perspective of statisticians, degradation or aging “pertains to a unit’s

position in a state space wherein the probabilities of failure are greater than in a

former position” (Singpurwalla 2006). According to that definition, degradation is

an abstract concept that cannot be observed or measured directly. However,

some physical indicators that reveal a degradation process may exist. When these

degradation indicators are obtained, it is possible to describe a degradation process

with mathematical models. In the reliability community, these mathematical models

are called degradation models. Once a degradation model is established, asset health

prediction and maintenance strategy optimisation can be conducted. Hence,

degradation models play a vital part in condition-based maintenance (CBM).

A degradation model usually contains two components, i.e., degradation processes

of indicators, and the relationship between degradation indicators and failure events.

According to different ways to describe the relationship between degradation

indicators and failure events, commonly used degradation models can be classified

into three types. The first type is threshold crossing models in which a failure time is

modelled by the time when the degradation process of an indicator crosses a failure


threshold. The second type of degradation model depends on a hazard rate process. In

these degradation models, degradation indicators are related to a failure time

distribution through a hazard rate process. The last type of degradation models,

namely the state space model, consists of a state equation and an observation

equation. The state equation models an underlying degradation process, and the

observation equation describes the relationship between the underlying degradation

process and related degradation indicators. The three types of degradation models

are reviewed in detail in the following sections.

2.1.1 Threshold Crossing Models

Threshold crossing models are the most commonly used degradation models in

engineering asset management. In a threshold crossing model, a failure is assumed to

happen when a degradation indicator crosses a failure threshold. A threshold

crossing model contains two critical components, i.e., degradation processes of

indicators, and failure thresholds on these indicators. In practice, the value of a

degradation indicator at a certain time point follows a conditional distribution given

the operation time, historical degradation indicator observations, and historical

working environment. A flexible mathematical model with a reasonable number of

parameters should be established to derive this conditional distribution. Section

2.1.1.1 reviews commonly used approaches to modelling the stochastic development

process of degradation indicators. The second issue of the threshold crossing model

is the identification of a failure threshold. For a direct indicator that directly relates

to a failure mechanism (e.g., the thickness of a brake pad and the crack depth on a

gear), a fixed failure threshold can be identified. On the other hand, a random failure

threshold must be set for an indirect indicator that does not directly relate to a

failure mechanism (e.g., the indicators extracted from vibration signals and oil

analysis data). Methods to identify a failure threshold are reviewed in Section

2.1.1.2.


2.1.1.1 Degradation Process Modelling

Indicators obtained during asset deterioration are essential information to disclose

asset degradation processes. An appropriate model for degradation indicators is

indispensable for accurate asset life prediction and effective maintenance strategy

optimisation. According to the difference in mathematical assumptions, degradation

indicator modelling methods can be largely classified into four categories, i.e. the

general path model, the random process model, the Markovian stochastic process

model, and the time series model. The four types of modelling methods are reviewed

in the following paragraphs. In addition, this section also surveys methods to model

multivariate degradation processes.

General Path Models

The general path model or the degradation path approach assumes that the

degradation curves of a group of assets follow an identical function form. To model

the difference among individual assets, some of the function parameters are assumed

to be random variables. For example, the degradation curves of $N$ similar assets

follow the product of a common function $f(t)$ and different random variables

$\{a_i;\ i = 1, 2, \ldots, N\}$, i.e., the degradation curve of the $i$th asset is $a_i \cdot f(t)$ (Zuo et

al. 1999). The simplest general path model is the random deterioration rate model

(Frangopol et al. 2004). In the random deterioration rate model, the degradation process

is described as $X(t) = A \cdot t$, where $A$ is a random deterioration rate.

The assumption of the general path model is that the function forms of degradation

curves are known except several individual-independent random parameters.

Therefore, the general path model needs relatively few training samples and is still

effective when only sparse samples of degradation indicators are available.

However, the general path model is not flexible enough. The temporal uncertainty is

not considered due to the predetermined function form of degradation curves.
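A minimal sketch of the random deterioration rate model illustrates why temporal uncertainty is absent: once the rate of a unit is drawn, its whole path and its threshold crossing time are fixed. The truncated normal rate distribution, the fixed threshold, and the function name `random_rate_lifetimes` are assumptions chosen for illustration; the cited literature does not prescribe them here.

```python
import random

def random_rate_lifetimes(threshold, n_units, rate_mean, rate_sd, rng):
    """Sample failure times when each unit degrades linearly at a random rate.

    Illustrative assumption: rates follow a normal distribution truncated to
    positive values. Given its rate, a unit's path is deterministic.
    """
    lifetimes = []
    while len(lifetimes) < n_units:
        rate = rng.gauss(rate_mean, rate_sd)
        if rate <= 0:
            continue  # reject non-positive rates (such a unit would never fail)
        # The linear path rate * t reaches the threshold at threshold / rate.
        lifetimes.append(threshold / rate)
    return lifetimes

rng = random.Random(0)
lifetimes = random_rate_lifetimes(threshold=10.0, n_units=5,
                                  rate_mean=2.0, rate_sd=0.3, rng=rng)
```

All variability in the sampled lifetimes comes from differences between units; no randomness is added along a path over time.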

Random Process Model


The random process model presumes that the degradation indicators of different

individual assets at the $j$th inspection time point $t_j$ ($j = 1, 2, \ldots, m$) follow a certain

distribution, where $m$ is the number of inspections. The parameters of that distribution

form a time-dependent parameter vector $\theta(t)$. The function of this time-dependent

parameter vector $\theta(t)$ can be estimated by regression using the distribution

parameter vectors estimated at different times, i.e. $\{\hat{\theta}(t_j);\ j = 1, 2, \ldots, m\}$.

Consequently, stochastic degradation processes of indicators are represented by

distribution parameters that change across inspections (Zuo et al. 1999).

Compared with the general path model, the random process model does not require

the degradation curve of each unit to follow the same function form. Therefore, it is

more flexible. However, the random process model requires distribution parameters

of a degradation indicator at every inspection. As a result, there should be enough

degradation indicator observations at each inspection. This assumption is often

impractical. Moreover, inspection schedules of different units may not be identical.

In this case, interpolation is required (Jiang and Jardine 2008). The interpolation

may introduce additional uncertainties into the process of parameter estimation.

Another disadvantage of the random process model is that the distributions of

degradation indicators of different individuals only depend on time. Consequently,

the heterogeneity of different subjects is not considered.
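The two-stage procedure of the random process model can be sketched as follows: a distribution is fitted to the readings of all units at each inspection, and the fitted parameters are then regressed on time. The normal distribution, the linear trend for the mean, the toy readings, and the function name `fit_random_process_model` are illustrative assumptions.

```python
from statistics import mean, stdev

def fit_random_process_model(times, readings_per_time):
    """Fit a normal distribution per inspection, then regress its mean on time.

    Illustrative assumptions: normally distributed readings at each inspection
    and a linear trend for the mean. Returns (intercept, slope, sds).
    """
    # Stage 1: distribution parameters at each inspection time.
    means = [mean(r) for r in readings_per_time]
    sds = [stdev(r) for r in readings_per_time]
    # Stage 2: ordinary least squares of the fitted means against time.
    t_bar, m_bar = mean(times), mean(means)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (m - m_bar) for t, m in zip(times, means))
    slope = sxy / sxx
    intercept = m_bar - slope * t_bar
    return intercept, slope, sds

times = [1.0, 2.0, 3.0]
# Readings of three units at each of the three inspections (toy data).
readings = [[0.8, 1.0, 1.2], [1.7, 2.0, 2.3], [2.6, 3.0, 3.4]]
intercept, slope, sds = fit_random_process_model(times, readings)
```

The sketch needs several readings at every inspection time, which reflects the data requirement discussed above.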

Markovian Stochastic Process Model

Both the general path model and the random process model are largely based on

regression. These regression models have been widely applied in practice. However,

limitations of these conventional regression based models were reported recently

(Yuan 2007). One of the limitations is that the regression model does not consider

the temporal uncertainty during the progress of degradation. In practice, most

degradation processes experience temporal uncertainty because of individual

randomness and dynamic environment.


Distinct from the general path model and the random process model, the stochastic

process model introduces time-varying uncertainties into degradation modelling.

Therefore, the stochastic process model is more flexible when fitted to practical

degradation indicators. Various stochastic processes have found their applications in

degradation process modelling (Singpurwalla 1995). Most of these stochastic

processes have the assumption of independent increments. The occurrence times and

the sizes of these increments follow different distributions for different stochastic

processes (Yuan 2007). The following paragraphs introduce some stochastic processes

commonly used in degradation modelling.

The Markov chain is a widely used stochastic degradation model discrete in time

and state. The Markov chain can be represented as $\{X_n;\ n = 0, 1, 2, \ldots\}$, where $X_n$

denotes the degradation indicator at time $n$. All the possible values of $X_n$ are

contained in a state space $S$. The states in the space $S$ can shift to each other

according to a transition matrix $P$. In the state space $S$, there is a special state

called the absorbing state, which represents the failure state of an asset. The number of

remaining steps required to move from the current state to the absorbing state is

the RUL of the asset. The distribution of the RUL can be obtained according to the

transition matrix. The Markov chain degradation model relies on two assumptions.

One is that the inspection interval is regular, and the other is that the future health

state depends only on the current one. Morcous investigated the impact of the two

assumptions on the effectiveness of a degradation model using the Markov chain

(Morcous 2006).
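How the RUL distribution follows from the transition matrix can be shown with a short sketch. The three-state matrix below and the function name `rul_distribution` are hypothetical; the calculation assumes that the failure state is absorbing, i.e. its row in the matrix returns to itself with probability one.

```python
def rul_distribution(transition, start, absorbing, max_steps):
    """Return [P(RUL = 1), ..., P(RUL = max_steps)] for a discrete Markov chain.

    Assumes the absorbing (failure) state cannot be left once entered.
    """
    n = len(transition)
    dist = [0.0] * n
    dist[start] = 1.0
    pmf = []
    absorbed_before = dist[absorbing]
    for _ in range(max_steps):
        # Propagate the state distribution by one step of the chain.
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
        # Newly absorbed probability mass equals P(failure at exactly this step).
        pmf.append(dist[absorbing] - absorbed_before)
        absorbed_before = dist[absorbing]
    return pmf

# Hypothetical chain: state 0 = good, 1 = degraded, 2 = failed (absorbing).
P = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
pmf = rul_distribution(P, start=0, absorbing=2, max_steps=50)
```

Because the chain advances in whole steps, the RUL is expressed as a number of inspection intervals, which is where the regular-interval assumption enters.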

A drawback of the Markov chain is that the transition probability does not relate to

the residence time of the Markov chain at the current state. The semi-Markov model

overcomes this shortcoming by assuming the residence time in a state follows a

specified distribution. Thus, the transition probability depends on the time spent at

the current state. Black et al. described the process of fitting a semi-Markov model

to a degradation dataset (Black et al. 2005). A case study using a degradation dataset

of switchgear oil was conducted in that paper.


The semi-Markov model removes the discrete time assumption of the Markov chain.

However, it still has the discrete state assumption. The discrete state assumption

requires discretisation of continuous asset health states. To avoid errors brought in

by this discretisation, stochastic degradation processes continuous in state have been

also adopted in degradation modelling. Two continuous state stochastic processes

commonly used in degradation modelling are the Wiener process and the Gamma

process. The stochastic process $\{W(t);\ t \ge 0\}$ is said to be the Wiener process when

the independent increment $W(t) - W(s)$ follows a Gaussian distribution with

mean $\mu (t - s)$ and variance $\sigma^2 (t - s)$, for all $t > s \ge 0$. The $\mu$ and $\sigma$ are called the drift

parameter and the diffusion parameter, respectively. Similarly, a continuous

stochastic process $\{X(t);\ t \ge 0\}$ is the Gamma process when the independent

increment $X(t) - X(s)$ follows a Gamma distribution $\mathrm{Ga}(\alpha(t) - \alpha(s), \beta)$.

The increasing function $\alpha(t)$ is the shape function, while $\beta > 0$ is the scale

parameter. The normal distribution and the Gamma distribution both belong to the

class of infinitely divisible distributions. Therefore, they are adopted as the

distribution of independent increments of continuous stochastic processes (Castanier et

al. 2005).
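The practical difference between the two processes can be seen by simulating both on the same time grid. In the sketch below, the linear shape function of the Gamma process, the parameter values (chosen so that both processes grow by one unit per unit time on average), and the helper names are illustrative assumptions.

```python
import random

def wiener_path(n_steps, dt, drift, diffusion, rng):
    """Wiener process with drift: Gaussian increments, which can be negative."""
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += rng.gauss(drift * dt, diffusion * dt ** 0.5)
        path.append(x)
    return path

def gamma_path(n_steps, dt, shape_rate, scale, rng):
    """Gamma process with linear shape function: non-negative increments."""
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += rng.gammavariate(shape_rate * dt, scale)
        path.append(x)
    return path

def count_decreases(path):
    """Number of steps at which the path moves downwards."""
    return sum(1 for a, b in zip(path, path[1:]) if b < a)

rng = random.Random(0)
w = wiener_path(1000, 0.1, drift=1.0, diffusion=1.0, rng=rng)
g = gamma_path(1000, 0.1, shape_rate=2.0, scale=0.5, rng=rng)
wiener_drops = count_decreases(w)
gamma_drops = count_decreases(g)
```

With these settings the Wiener path moves downwards at a substantial share of the steps, while the Gamma path never does.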

The Wiener process has been widely used in degradation modelling. Whitmore and

Schenkelberg developed a degradation model using the Wiener process with a time

scale transformation (Whitmore and Schenkelberg 1997). A case study using

accelerated life testing data was conducted in that paper. Whitmore et al. developed

a bivariate Wiener process to deal with the situation when degradation indicators did

not relate to a failure deterministically (Whitmore et al. 1998). By defining a failure

as the first crossing time of the Wiener process to a certain threshold, Lee et al.

assessed the mortality risk of railway workers in their work environment (Lee et al.

2004). The most attractive property of the Wiener process is its

mathematical tractability. For the Wiener process with white noise, an

analytical likelihood function can be obtained (Hashemi et al. 2003). Explicit

expressions are available when stress and strength are assumed to be two


independent Wiener processes (van Noortwijk 2009). However, because its sample paths

are not monotonic, the Wiener process is poorly suited

to describing irreversible engineering asset deterioration processes.

Different from the Wiener process, the Gamma process has a monotonically

increasing property. This monotonically increasing property makes the Gamma

process more appropriate to model an irreversible degradation process. In addition,

efficient algorithms have also been developed for the simulation and parameter

estimation of the Gamma process. Due to the monotonically increasing property and

mathematical tractability, the Gamma process has been extensively adopted in

engineering asset degradation modelling. Two case studies about carbon-film

resistors and fatigue crack sizes were carried out using the Gamma process (Park

and Padgett 2005a). Kallen and van Noortwijk used the Gamma process to model

a corrosion damage mechanism while accounting for imperfect inspections (Kallen

and Van Noortwijk 2005). The random effects among individuals as well as

environment covariates were considered while modelling degradation processes

using the Gamma process (Lawless and Crowder 2004). A comprehensive review

of applications of the Gamma process in degradation modelling was conducted

by van Noortwijk (van Noortwijk 2009). The parameter estimation and simulation

methods of the Gamma process were summarised in that paper.
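
The contrast between the two processes can be illustrated with a short simulation. The following is a minimal sketch, assuming a linear shape function for the Gamma process and illustrative parameter values that are not taken from any of the cited studies; the function names are hypothetical.

```python
import random


def simulate_wiener(mu, sigma, dt, n_steps, rng):
    """Path with independent Gaussian increments N(mu*dt, sigma^2*dt)."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(mu * dt, sigma * dt ** 0.5))
    return path


def simulate_gamma(shape_rate, scale, dt, n_steps, rng):
    """Path with independent Gamma increments.

    Assumes a linear shape function v(t) = shape_rate * t, so each
    increment has shape shape_rate * dt and the given scale.
    """
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gammavariate(shape_rate * dt, scale))
    return path


def is_non_decreasing(path):
    return all(later >= earlier for earlier, later in zip(path, path[1:]))


rng = random.Random(0)
# Illustrative (assumed) parameters.
wiener_path = simulate_wiener(mu=1.0, sigma=2.0, dt=0.1, n_steps=200, rng=rng)
gamma_path = simulate_gamma(shape_rate=2.0, scale=0.5, dt=0.1, n_steps=200, rng=rng)

print("Gamma path monotone: ", is_non_decreasing(gamma_path))
print("Wiener path monotone:", is_non_decreasing(wiener_path))
```

In this sketch the Gamma path cannot decrease because every increment is non-negative, whereas a Wiener path with a non-zero diffusion parameter moves both up and down.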

Time series models

Time series refers to a sequence of data points obtained at uniform time intervals.

Time series analysis aims to understand the underlying mechanism of an observed

data sequence. Based on the underlying mechanism, useful information can be

obtained and the accurate prediction of the observed sequence can be conducted.

Time series is a special case of stochastic processes. Compared with Markovian

stochastic processes, time series models do not necessarily follow the Markovian

assumption. Consequently, time series analysis can handle more general

degradation sequences. Some applications of time series models to describe asset


deterioration processes have been conducted (Stavropoulos and Fassois 2000;

Huitian et al. 2001; Lu et al. 2001).

Compared with Markovian stochastic processes, applications of the time series

analysis in degradation modelling are relatively few. A main reason is that, in time

series analysis, a relatively long sequence of data is required for model identification

and prediction. In reality, this kind of long degradation data sequence is not

common. However, sufficiently long degradation data sequences will become available

for some critical assets with the development of CM technology. Furthermore,

extensions of time series analysis theories (e.g., multivariate time series, intervention

analysis, and nonlinear time series) will make the modelling of practical

degradation data more feasible.

Multivariate Degradation Processes

In reality, an asset may have multiple failure modes; a failure mode may be revealed

by several degradation indicators. The issue of multiple degradation indicators has

been widely discussed in the literature (Whitmore et al. 1998; Lu et al. 2001; Wang and

Coit 2004; Xu and Zhao 2005). The multiple degradation indicators may be

governed by a single degradation process. Therefore correlations may exist among

these degradation indicators. A degradation model should consider these correlations

among indicators. Alternatively, the inter-dependent degradation indicators should

be transformed into uncorrelated indicators by dimension reduction algorithms.

Principal component analysis (PCA) is a commonly used dimension reduction

technique. The idea of PCA is to generate a new set of variables, namely principal

components. Each principal component is a linear combination of the original

variables. All the principal components are orthogonal to each other. Consequently

there is no redundant information. Lin et al. applied PCA to reduce the number of

covariates in the proportional hazard model (PHM) (Lin et al. 2006). Wang and

Zhang adopted the PCA to reduce the dimension of the original data set while doing

oil analysis (Wang and Zhang 2005). The original PCA was extended to dynamic


principal component analysis (DPCA) to deal with autoregressive multivariate

degradation data (Makis et al. 2006).
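
The PCA idea described above can be sketched in a few lines. This is a minimal illustration that extracts only the leading principal component by power iteration on the sample covariance matrix; the two indicator columns are made-up values, and the function name is hypothetical rather than taken from any cited work.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) of the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred indicators.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: each observation projected onto the component direction.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Two hypothetical, strongly correlated degradation indicators.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Because the second indicator is roughly twice the first, a single component captures almost all of the variation, which is the situation in which replacing several indicators by one composite score is reasonable.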

When the number of observations is tractable, asset health prediction can also be

performed without dimension reduction. Liao et al. fitted a nonlinear model to each of the

multiple degradation indicators separately (Liao et al. 2006b).

However, the correlations among indicators should be taken into consideration when

they are significant. Wang and Coit showed that an incorrect independence

assumption may underestimate system reliability (Wang and Coit 2004). Whitmore

et al. treated the multiple observations as a multivariate Wiener process. The

relationships among observations were modelled by the covariance matrix

(Whitmore et al. 1998). A multivariate time series model was employed to describe

interdependent degradation data by Lu (Lu et al. 2001).

2.1.1.2 Threshold Identification

Once the development process of a degradation indicator is described by a

mathematical model, a well defined threshold is required to indicate the occurrence

of a failure. Various threshold identification approaches have been developed

according to various failure mechanisms, diverse degradation indicator properties,

and different failure data accessibility.

Failures of assets are broadly divided into two categories, i.e. soft (degradation)

failure, and hard (catastrophic) failure (Zuo et al. 1999). A soft failure happens when

the performance of a device deteriorates to an unacceptable level. A soft failure does

not cause an immediate breakdown. For a soft failure, the threshold largely depends

on industry standards, expert knowledge, and the result of optimisation. A hard

failure refers to the complete breakdown of an asset. It can happen at any time

during asset operation. However, the probability of a hard failure often relates to the

health state of an asset (Singpurwalla 2006).


The threshold of a hard failure can be also determined by industry standards or

expert knowledge. However, when enough failure records are available, statistical

approaches to identifying the threshold are preferable. Three commonly used

approaches are adopted to identify the relationship between survival probability and

degradation indicators, i.e. PHM, logistic regression, and multiple time-scale

modelling. The PHM was originally used to model the effects of multiple

environmental covariates. However, some researchers also employed the PHM to

identify the distribution of a random failure threshold on multiple indicators (Jiang

and Jardine 2008). The PHM is introduced in Section 2.1.2.1 in detail. The logistic

regression is a standard statistical technique for binomially distributed

response/dependent variables. When used to identify the failure threshold, the

logistic regression can be written as

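$\Pr(\text{failure} \mid \mathbf{z}) = \dfrac{\exp\left(\sum_i \beta_i z_i\right)}{1 + \exp\left(\sum_i \beta_i z_i\right)}$, (2-1)

where $\boldsymbol{\beta}$ is the regression coefficient vector and $\mathbf{z}$ is the degradation indicator vector.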

Xu and Zhao adopted the logistic regression model to identify the relationship

between degradation indicators and the probability of a catastrophic failure (Xu and

Zhao 2005). The multiple time-scale modelling method identifies a new time scale

that consists of various usages and calendar time. In the new time scale, the lifetime

distribution has the minimum coefficient of variation. In other words, the failure

time is the most predictable in the obtained composite time scale. A failure threshold

can then be set on that composite time scale. Besides calendar time and various

usages, the time scale can also include degradation indicators (Jiang and Jardine

2006).
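
The logistic relationship in Equation (2-1) can be evaluated directly once the coefficients are known. The sketch below assumes that a constant 1.0 is included as the first indicator so that the first coefficient acts as an intercept; the coefficient and indicator values are hypothetical and serve only to show the calculation.

```python
import math


def failure_probability(coefficients, indicators):
    """Logistic failure probability exp(eta) / (1 + exp(eta)).

    eta is the inner product of the coefficient and indicator vectors.
    """
    eta = sum(b * z for b, z in zip(coefficients, indicators))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept, vibration level, oil debris level.
coefficients = [-4.0, 0.8, 1.5]
mild = [1.0, 2.0, 1.0]      # leading 1.0 multiplies the intercept
severe = [1.0, 4.0, 2.0]

print(failure_probability(coefficients, mild))
print(failure_probability(coefficients, severe))
```

With positive coefficients on the indicators, the failure probability rises smoothly as the indicators grow, so the model describes a random rather than a fixed failure threshold.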

In most situations, an asset degradation process is revealed by multiple

degradation indicators. Two methods can be adopted to identify a failure threshold

using multiple degradation indicators. The first approach combines multiple

degradation indicators into one composite degradation indicator through the PHM or

the multiple time scale model (Jiang and Jardine 2006; Makis et al. 2006). A

threshold can be set up on the composite degradation indicator. An alternative

approach treats the development of multiple degradation indicators as a


degradation curve in a multi-dimensional space. Several threshold boundaries can be

set in the multi-dimensional space. Each boundary corresponds to a particular failure

mode (Lu et al. 2001; Lee and Whitmore 2006).

Most threshold regression methods require enough failure event data. However,

failure history is not always sufficient in reality. Several approaches have been

proposed to overcome the shortage of failure data. Besides the incorporation of

industry standards and expert knowledge, the statistical process control (SPC) is

another approach to set up a criterion for failure occurrences. SPC is an effective

abnormality detection tool, which has the ability to disclose abnormal behaviours

from the processes. Different from other threshold identification methods, SPC is

not based on the abnormal data. Instead, it infers thresholds and control principles

from the normal data. SPC has been used to detect abnormality from the CM signals.

Goode et al. developed an SPC method to divide the whole operational cycle of the

machine into a stable stage and a failure stage (Goode et al. 2000).
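
The way SPC infers a threshold from normal data alone can be sketched as follows. This example assumes a simple Shewhart-style rule with limits at the mean plus or minus three standard deviations of a healthy baseline; it is one common choice and not necessarily the rule used in the cited work. The data values are invented for illustration.

```python
import statistics


def control_limits(normal_data, k=3.0):
    """Lower and upper control limits estimated from healthy data only."""
    centre = statistics.mean(normal_data)
    spread = statistics.stdev(normal_data)
    return centre - k * spread, centre + k * spread


def first_alarm(signal, lower, upper):
    """Index of the first reading outside the limits, or None."""
    for index, value in enumerate(signal):
        if value < lower or value > upper:
            return index
    return None


# Hypothetical RMS vibration readings from the stable stage.
baseline = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98]
# Hypothetical later readings from the same machine.
monitored = [1.0, 1.03, 1.1, 1.4, 1.9]

lower, upper = control_limits(baseline)
print(lower, upper)
print(first_alarm(monitored, lower, upper))
```

No failure records are needed here: the limits come entirely from the baseline, which is why this kind of rule is attractive when failure history is scarce.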

2.1.2 Degradation Models Based on the Hazard Rate Process

Hazard rate, a measure of development of risk, plays a fundamental role in reliability

analysis. Hazard rate is defined as the rate of failure for the survivors during the next

instant of time. In the discrete time case, the hazard rate is given by

$h(t) = \dfrac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)}$, (2-2)

where $R(t)$ is the survival function. In the continuous time case, the hazard rate is

written as

$h(t) = \lim_{\Delta t \to 0} \dfrac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)} = f(t) / R(t)$, (2-3)

where $f(t)$ is the PDF of the lifetime. In this review, the hazard rate is continuous in

time unless otherwise stated. When the hazard rate function $h(t)$ is identified, the

CDF of the failure time $F(t)$ can be calculated as

$F(t) = 1 - \exp\left(-\int_0^t h(s)\,ds\right)$. (2-4)

Therefore, the life time distribution can be calculated if the hazard rate function is

obtained. Some degradation models assume that the hazard rate is a function of


degradation indicators or environment covariates. Consequently, the lifetime

distribution is related to values of future degradation indicators or environment

covariates. These degradation models that are based on the hazard rate process

consist of two components: (1) the relationships between the covariates and the

hazard rates, (2) the degradation processes of degradation indicators and the change

of environment covariates.
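
The step from a hazard rate function to a lifetime distribution in Equation (2-4) can be carried out numerically. The sketch below integrates the hazard with the trapezoidal rule; the two hazard functions are illustrative assumptions, chosen because their cumulative hazards are known in closed form.

```python
import math


def failure_cdf(hazard, t, n_steps=1000):
    """F(t) = 1 - exp(-integral of the hazard from 0 to t)."""
    dt = t / n_steps
    cumulative_hazard = 0.0
    for i in range(n_steps):
        left = hazard(i * dt)
        right = hazard((i + 1) * dt)
        cumulative_hazard += 0.5 * (left + right) * dt  # trapezoidal rule
    return 1.0 - math.exp(-cumulative_hazard)


def constant_hazard(s):
    return 0.1            # assumed constant hazard (exponential lifetime)


def increasing_hazard(s):
    return 0.02 * s       # assumed linearly increasing hazard (wear-out)


print(failure_cdf(constant_hazard, 10.0))
print(failure_cdf(increasing_hazard, 10.0))
```

When the hazard depends on degradation indicators, the same integration applies once the future indicator path, or its distribution, has been specified.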

2.1.2.1 Covariate-Hazard Relationship Modelling

The PHM proposed by Cox (Cox 1972) is a commonly used approach to model the

relationship between covariates and hazard rates. The formulation of PHM is given

by

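$h(t \mid \mathbf{z}(t)) = h_0(t) \exp\left(\boldsymbol{\gamma}^{T} \mathbf{z}(t)\right)$, (2-5)

where $h_0(t)$ is the baseline hazard rate at time $t$, $\mathbf{z}(t)$ is the covariate vector at

time $t$, and $\boldsymbol{\gamma}$ is the corresponding regression coefficient vector. Parameter

estimation and goodness-of-fit test approaches for the PHM with time independent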

covariates can be obtained from (Prasad and Rao 2002). Liao et al. introduced a

parameter estimation method for the PHM with time dependent covariates (Liao et

al. 2006b). However, sufficient failure event data required by existing parameter

estimation methods are sometimes not available in reality. To overcome the shortage

of data, an approach to incorporate expert knowledge into parameter estimation was

proposed (Zuashkiani et al. 2006). An important assumption of the PHM is that the

effects of covariates are time independent. This assumption

does not always hold. Kumar and Westberg provided a method to convert a time

dependent covariate to several time independent covariates which can be processed

by the PHM (Kumar and Westberg 1996).
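
The proportionality that gives the PHM its name can be checked numerically. The following sketch assumes a Weibull baseline hazard, which is a common but here purely illustrative choice, together with hypothetical coefficients and covariate values.

```python
import math


def weibull_baseline_hazard(t, shape, scale):
    """Baseline hazard of a Weibull lifetime (assumed baseline, t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def phm_hazard(t, covariates, coefficients, shape, scale):
    """Baseline hazard multiplied by exp(coefficients . covariates)."""
    linear_term = sum(c * z for c, z in zip(coefficients, covariates))
    return weibull_baseline_hazard(t, shape, scale) * math.exp(linear_term)


coefficients = [0.5, 1.2]   # hypothetical regression coefficients
healthy = [0.1, 0.0]        # hypothetical covariate vectors
degraded = [0.9, 0.4]


def hazard_ratio(t):
    """Hazard of the degraded unit relative to the healthy unit at time t."""
    return (phm_hazard(t, degraded, coefficients, 2.0, 100.0)
            / phm_hazard(t, healthy, coefficients, 2.0, 100.0))


print(hazard_ratio(10.0))
print(hazard_ratio(80.0))
```

For fixed covariate values the ratio does not depend on time, because the baseline hazard cancels. This is the time independent covariate effect discussed above.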

An alternative way to model the relationship between covariates and a hazard rate

process is the additive hazard model (AHM). In contrast to the multiplicative form

in (2-5), the formulation of the AHM given by

$h(t \mid \mathbf{z}(t)) = h_0(t) + \boldsymbol{\beta}^{T} \mathbf{z}(t)$ (2-6)


follows an additive form. The AHM does not have the proportional hazard rate

assumption. Therefore, the AHM is more flexible than the PHM. For some

applications, the AHM has more plausible performance than the PHM (Lin and Ying

1994). In the original AHM, the regression vector is time dependent. This time

dependent property makes the AHM more flexible. However, the number of

covariates is limited due to the complexity of the parameter estimation. Lin and Ying

treated the regression vector as time independent to simplify the AHM.

Subsequently, they could estimate the model parameters using a partial likelihood

function similar to the PHM (Lin and Ying 1994). To strike a balance between the

flexibility and mathematical tractability, McKeague and Sasieni proposed a partly

parametric additive risk model (McKeague and Sasieni 1994). The partly parametric

additive risk model assumed that only some elements of the regression vector were

time dependent.

Some researchers developed a hybrid model by combining the PHM and the AHM:

$h(t \mid \mathbf{z}_1(t), \mathbf{z}_2(t)) = \boldsymbol{\beta}^{T} \mathbf{z}_1(t) + h_0(t) \exp\left(\boldsymbol{\gamma}^{T} \mathbf{z}_2(t)\right)$. (2-7)

Lin and Ying investigated the additive-multiplicative hazard model and proposed a

class of efficient parameter estimation methods (Lin and Ying 1995). Based on the

hybrid model proposed by Lin and Ying, Torben and Thomas treated the regression

coefficients $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ as time dependent variables (Torben and Thomas 2002). The

additive-multiplicative hazard model was employed to investigate the mortality from

cancer (Kravdal 1997). A transformed hazard model was proposed as a unified

formulation of the additive, multiplicative and hybrid hazard model (Zeng et al.

2005).
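
The difference between the additive and the hybrid formulations can be made concrete with a small numerical sketch. It assumes time independent regression coefficients and a hybrid form in which one set of covariates acts additively and another acts multiplicatively on the baseline; all numerical values and function names are hypothetical.

```python
import math


def additive_hazard(baseline, covariates, beta):
    """AHM: baseline hazard plus a linear covariate term.

    Nothing in this form prevents a negative value if the linear term
    is strongly negative; that is a known caveat of additive models.
    """
    return baseline + sum(b * z for b, z in zip(beta, covariates))


def hybrid_hazard(baseline, additive_cov, beta, multiplicative_cov, gamma):
    """Additive-multiplicative form: linear term plus a PHM-style term."""
    additive_part = sum(b * z for b, z in zip(beta, additive_cov))
    log_multiplier = sum(g * z for g, z in zip(gamma, multiplicative_cov))
    return additive_part + baseline * math.exp(log_multiplier)


beta = [0.02]                 # hypothetical additive coefficient
low_load, high_load = [0.5], [1.5]

# Under the AHM the hazard difference between two covariate levels is the
# same whatever the baseline hazard is at that moment.
for baseline in (0.01, 0.05):
    difference = (additive_hazard(baseline, high_load, beta)
                  - additive_hazard(baseline, low_load, beta))
    print(baseline, difference)
```

Setting the multiplicative coefficients to zero reduces the hybrid form to the additive one, which is one way of seeing it as a generalisation of both models.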

2.1.2.2 Degradation Indicator and Environmental Covariate

Modelling

After the relationship between covariates and hazard rate is established, the hazard

rate process can be induced from the processes of degradation indicators and

environmental covariates. In some applications, properties of an asset that can affect


the failure time (e.g., the material of a component) and some environmental covariates

are time independent (Prasad and Rao 2002). For these time independent covariates,

the hazard rate process is calculated as a deterministic function of time, and the

lifetime distribution can be obtained straightforwardly according to Equation (2-4).

For degradation indicators or dynamic environmental covariates, deterministic

functions of time were used as approximations in some applications. Liao et al.

approximated indicator degradation processes as polynomial functions of time (Liao

et al. 2006b). Then, a deterministic hazard rate process was calculated according to

these indicator functions and the PHM. In reality, a deterministic function of time is

often not flexible enough for stochastic development processes of degradation

indicators and environmental covariates. Banjevic and Jardine employed a

continuous time discrete state Markov process to model degradation indicators that

are extracted from the results of oil analysis (Banjevic and Jardine 2006). Makis et

al. adopted the same approach as Banjevic and Jardine (Makis et al. 2006). However,

dynamic principal component analysis (DPCA) was performed before the PHM

was applied, in order to reduce the size of the degradation indicator vector. Some

stochastic processes continuous in time and state are also used to model degradation

indicators and dynamic environmental covariates. However, to facilitate the

calculation of the survival function, some assumptions about degradation indicator

(environmental covariate) processes and the relationship between the hazard rate and

degradation indicators (environmental covariates) are often made. Yashin et al.

assumed that degradation indicators (environmental covariates) followed a Wiener

process and the hazard rate was a function of the covariates (Yashin and Manton

1997; Yashin et al. 2007). The survival function was then calculated according to

the Cameron-Martin approach.

2.1.3 State Space Degradation Models

In a state space model, the dynamic characteristics of a system are modelled by a

system state process. A general formulation of a discrete time state space model is

given by

$x_{t+1} = A x_t + \Gamma u_t + B w_t$ (2-8)

and

$y_t = C x_t + D u_t + E v_t$ (2-9)

(Garcia Marquez et al. 2007), where $x_t$ is the system state at time $t$, $u_t$ is the input of

the system at time $t$, and $y_t$ is the observation of the system at time $t$. System

disturbance and measurement noise are denoted by $w_t$ and $v_t$ respectively. The

corresponding coefficients, i.e., $A$, $\Gamma$, $B$, $C$, $D$, and $E$, that can be time-dependent

are determined by the system characteristics. Equation (2-8) is called the state

equation which describes the evolution of system states. Equation (2-9) is the

observation equation that addresses the relationships between observations and

system states. In the original state space model, the observations $y_t$ are

conditionally independent from each other given the underlying system states $x_t$.

However, some extended state space models bring in direct relationships between

observations, e.g., autoregressive hidden Markov model (HMM) (Logan and

Robinson 1997).
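
A scalar version of the state and observation equations can be simulated to show how the hidden state and the observations relate. This is a minimal sketch, assuming one coefficient each for the previous state, the input and the disturbance in the state equation, and likewise for the state, the input and the measurement noise in the observation equation. The argument names and numerical values are illustrative assumptions, and both noise terms are taken as standard normal.

```python
import random


def simulate_state_space(a, gamma_coef, b, c, d, e, inputs, x0, rng):
    """Simulate a scalar linear state space model step by step."""
    states, observations = [], []
    x = x0
    for u in inputs:
        # Observation equation: a noisy view of the current hidden state.
        observations.append(c * x + d * u + e * rng.gauss(0.0, 1.0))
        states.append(x)
        # State equation: evolve the hidden state to the next time step.
        x = a * x + gamma_coef * u + b * rng.gauss(0.0, 1.0)
    return states, observations


rng = random.Random(1)
states, observations = simulate_state_space(
    a=1.0, gamma_coef=0.5, b=0.1, c=2.0, d=0.0, e=0.3,
    inputs=[1.0] * 50, x0=0.0, rng=rng)

print(states[:3])
print(observations[:3])
```

Each observation depends only on the state at the same time step and on fresh noise, which is the conditional independence property described above.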

When the state space model is used to describe the degradation of an asset, the

underlying asset degradation process is modelled by the state equation, and the

underlying degradation process is partially revealed by observations (i.e.,

degradation indicators). Compared with other degradation models discussed above,

the state space model considers both stochastic asset degradation processes and

uncertain relationships between degradation indicators and health states. Therefore

the state space model can handle partially observable degradation processes more

efficiently and no additional mathematical model for time-dependent degradation

indicators is needed. Moreover, the state space model is an effective tool for

indicator fusion. Compared with commonly used multivariate statistical approaches

and multivariate time series analysis methods, the state space model can analyse

degradation indicators with uneven sampling intervals.

When system processes are discrete in time and state, a typical example of the state

space model is the HMM. The system state process of the HMM is a Markov chain

which is not observable. This hidden Markov chain is revealed by observations


probabilistically. The HMM, as a powerful pattern recognition tool, has been widely

used in engineering asset diagnosis. Miao adopted wavelet modulus maxima as a defect

feature and, in order to provide decision information for CBM, presented a two-stage

HMM-based classification system using the features extracted from the

wavelet modulus maxima (Miao 2005). Li et al. obtained defect feature vectors by

the FFT, wavelet transform and bispectrum from the speed-up and speed-down

processes in rotating machinery. HMMs were then employed as the

classifiers to recognise faults (Li and Pham 2005). Ge et al. used a number of

autoregressive models to describe monitoring signals in different time periods of a

stamping operation and used the residuals as the features. Then, an HMM was

introduced for classification (Ge et al. 2004).

Some research also employed the HMM to model degradation processes of assets.

When the HMM is used to model a degradation process, asset health states are

described by an unobservable Markov chain. An asset suffers a failure when the

underlying Markov chain reaches an absorbing state that represents a failure state.

The underlying Markov chain is revealed by observations probabilistically. The

HMM is an appropriate tool to combine information from inspections with event

data (Jardine et al. 2006). Bunks et al. constructed an HMM in which

the state probability densities and state transition probabilities were modelled.

Sixty-eight states, arising from different torque levels and defect types, were used in the

model (Bunks et al. 2000). Wang modelled partially observable asset health states as

a three-state (i.e. good, defective, and failed) Markov chain. The delay time concept,

HMM and filtering theory were combined to form a prognosis model (Wang 2006).

The HMM was extended by adopting a continuous time discrete state Markov

process as the latent system process (Makis and Jiang 2003).
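
How a hidden health state is inferred from observations can be sketched with the normalised forward recursion of an HMM. The example assumes three health states (good, defective, failed), with the failed state absorbing, and a binary observation (0 for a normal reading, 1 for an abnormal reading). All probabilities are invented for illustration and are not taken from the cited studies.

```python
def forward_filter(initial, transition, emission, observations):
    """Return P(state at t | observations up to t) for every t."""
    n_states = len(initial)
    belief = list(initial)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Prediction step: push the belief through the Markov chain.
            belief = [sum(belief[i] * transition[i][j] for i in range(n_states))
                      for j in range(n_states)]
        # Update step: weight each state by the observation likelihood.
        weighted = [belief[j] * emission[j][obs] for j in range(n_states)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


initial = [1.0, 0.0, 0.0]              # asset starts in the good state
transition = [[0.90, 0.08, 0.02],      # good -> good / defective / failed
              [0.00, 0.85, 0.15],      # defective cannot return to good
              [0.00, 0.00, 1.00]]      # failed is absorbing
emission = [[0.90, 0.10],              # P(normal), P(abnormal) when good
            [0.30, 0.70],              # ... when defective
            [0.05, 0.95]]              # ... when failed
readings = [0, 0, 1, 1, 1]

beliefs = forward_filter(initial, transition, emission, readings)
for belief in beliefs:
    print([round(p, 3) for p in belief])
```

The probability mass moves away from the good state as abnormal readings accumulate, although no single reading identifies the state with certainty.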

The HMM assumes asset health states to be discrete. However, most engineering assets

degrade continuously. Therefore state space models continuous in state are often

more appropriate for engineering assets. Christer et al. developed a continuous state

discrete time state space model to estimate and predict the erosion status of a furnace

through its conductance ratios (Christer et al. 1997). Recently, Wang proposed a


new state space model by assuming increments of underlying health states follow a

Beta distribution (Wang 2007). Consequently, Wang’s new model had a monotonically

increasing underlying degradation process that was more similar to irreversible

engineering asset wear processes. However, both of the two models developed by

Christer and Wang were discrete in time, which assumed that inspections and

failures can only happen at discrete time points with regular intervals. In practice,

however, irregular inspection intervals are often more cost-effective, and failures usually

occur between inspections.

Some state space degradation models continuous in time and state have also been

proposed. However, these state space degradation models largely follow linear and

Gaussian assumptions. In a linear and Gaussian state space model, both the state

equation and the observation equation have a linear formulation and a Gaussian

random component. Wang et al. developed a state space model to predict the RUL of

bearings using root mean square (RMS) values of vibration signals (Wang 2002).

Wang’s model used values of RUL as underlying health states. This deterministic

underlying degradation process did not consider stochastic heterogeneous

degradation processes of different individuals. Whitmore et al. proposed a bivariate

Wiener process (Whitmore et al. 1998) to model a partially revealed degradation

process. The bivariate Wiener process is also a special type of state space model.

However, the bivariate Wiener process only considered the covariates collected at

failure or censoring times, while degradation indicators at other occasions were

ignored. Hashemi et al. formulated a joint model of a counting process and a

sequence of longitudinal measurements, using the state space model based on the

Wiener process (Hashemi et al. 2003). Proust extended the model of Hashemi to a

nonlinear situation (Proust et al. 2006). The linear and Gaussian property can

provide convenience in mathematical operations. However, the degradation process

of a direct indicator and the relationships between direct and indirect indicators are

not necessarily linear. Moreover, the Gaussian assumption makes the modelled degradation

process non-monotonic. In contrast, most degradation processes

of direct indicators (e.g. wear, corrosion, and crack depth growth) are not reversible.


Nonlinear non-Gaussian state space models have also found applications in

degradation modelling. Cadini et al. used a state space model to describe a fatigue crack degradation process

that followed the Paris–Erdogan model and was subject to non-destructive ultrasonic

inspection. A particle-based method was used to estimate the

failure probability. An optimal preventive replacement strategy was also developed

(Cadini et al. 2009). Orchard et al. predicted degradation indicators using the particle

filter with a feedback correction loop that could improve solution accuracy and

reduce uncertainty bounds (Orchard et al. 2009). However, these applications of

nonlinear non-Gaussian state space models largely adopt physics-based approaches

and assume that model parameters are known.
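
The particle-based estimation mentioned above can be sketched with a bootstrap particle filter. The model used here is an illustrative assumption and not the model of any cited paper: the hidden state grows by Gamma-distributed increments, the observation is the square root of the state plus Gaussian noise, and the model parameters are treated as known.

```python
import math
import random


def gaussian_density(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bootstrap_particle_filter(observations, n_particles, shape, scale, obs_sd, rng):
    """Filtered means of the hidden state for the assumed model."""
    particles = [0.0] * n_particles
    estimates = []
    for y in observations:
        # Propagate: non-negative Gamma increments (non-Gaussian state equation).
        particles = [x + rng.gammavariate(shape, scale) for x in particles]
        # Weight: likelihood under the nonlinear observation equation.
        weights = [gaussian_density(y, math.sqrt(x), obs_sd) for x in particles]
        total = sum(weights)
        if total > 0.0:
            weights = [w / total for w in weights]
        else:
            # All likelihoods underflowed: fall back to equal weights.
            weights = [1.0 / n_particles] * n_particles
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample in proportion to the weights to limit degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


rng = random.Random(42)
true_state, truth, observations = 0.0, [], []
for _ in range(20):
    true_state += rng.gammavariate(2.0, 0.25)
    truth.append(true_state)
    observations.append(math.sqrt(true_state) + rng.gauss(0.0, 0.05))

estimates = bootstrap_particle_filter(observations, 500, 2.0, 0.25, 0.05, rng)
for actual, estimate in zip(truth[-3:], estimates[-3:]):
    print(round(actual, 3), round(estimate, 3))
```

The filter needs only the ability to simulate the state equation and evaluate the observation likelihood, which is why it accommodates monotonic, non-Gaussian degradation without linearisation.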

2.1.4 Comments

Degradation modelling has been investigated intensively. However, two practical

issues are only partially addressed by existing research. The first issue is identifying

uncertain failure thresholds of degradation indicators. In reality, an identical

indicator of different individuals may have diverse values when a failure happens.

Therefore, setting a fixed failure threshold on this kind of indicators is not

appropriate. The second issue is fusing multiple degradation indicators extracted

from condition monitoring data. A failure mechanism is often revealed by more than

one degradation indicator, and information from these degradation indicators

should be fused properly. The state space model can solve these two issues

effectively. However, existing state space degradation models largely depend on

assumptions such as discrete time, discrete state, linearity, and Gaussianity. The

discrete time assumption requires that inspections and failures happen only at

discretised time points, which is not realistic. The discrete state assumption entails

discretising continuous degradation states, which often introduces additional errors.

The linear and Gaussian assumptions are not consistent with nonlinear and

irreversible degradation processes in most engineering assets. Therefore, a

nonlinear non-Gaussian state space model is expected to describe asset

degradation processes more effectively.


2.2 Condition-based Maintenance Decision-Making

Most engineering assets experience deterioration with age and usage. During a

degradation process, the running cost of an asset increases while asset capability

decreases. When the degradation process crosses a failure threshold, a failure will

take place. As a result, additional expenditure for the unexpected breakdown is

incurred. To enhance asset capability and reduce cost, maintenance activities are

required during asset operation. Maintenance is defined as actions to

“control the deterioration process leading to failure of a system” and “restore the

system to its operational state through corrective actions after a failure” (Blischke

and Murthy 2000). In light of this definition, maintenance can be

categorised into preventive maintenance and corrective maintenance. Preventive

maintenance is adopted to control a deterioration process while corrective

maintenance is carried out to bring a failed asset back to a working state.

The preventive maintenance can be divided into three categories, i.e. design-out

maintenance, time-based maintenance, and condition-based maintenance (CBM)

(Blischke and Murthy 2000). The design-out maintenance refers to carrying out

optimisation during the design stage of a component. The time-based maintenance

can be further classified into three subcategories, i.e. clock-based maintenance, age-

based maintenance, and usage-based maintenance. The clock-based maintenance is

carried out at specified times. The age-based maintenance is performed at a certain

age of a component. The usage-based maintenance, on the other hand, is scheduled

based on the usage of a component. When the condition monitoring information of a

component is available, the CBM is preferable. A major advantage of the CBM is

that unnecessary maintenance when an asset is in a good health state can be avoided

(Zhou 2007). This literature review focuses on the research of CBM strategies.

To obtain an optimal CBM strategy, several issues are to be addressed. The first is

the underlying degradation models that have been discussed in Section 2.1. Under

the CBM scheme, a degradation process is assumed to be revealed by the

information collected during inspections. Inspection scheduling is discussed in


Section 2.2.1. In Section 2.2.2, different objectives of maintenance strategy

optimisation are introduced. To achieve these maintenance optimisation objectives,

various optimisation algorithms have been developed. These optimisation algorithms

are reviewed in Section 2.2.3. Additional problems brought in by imperfect

inspections are discussed in Section 2.2.4.

2.2.1 Inspection Scheduling

Asset health inspection is a fundamental approach to acquiring information for CBM

decision-making. Similar to other maintenance activities, an inspection can incur

additional costs. Some inspection methods even require the shutdown of an asset.

Therefore, inspections should be well scheduled to reduce cost and enhance asset

availability.

Asset health inspections can be carried out continuously or only at discrete time

points. In practice, continuous condition monitoring is often technically or

economically impossible. Therefore, most current CBM methods adopt discrete

inspections. Wang et al. classified the inspection intervals of discrete inspections into three

categories: regular inspection intervals, inspection intervals chosen from a limited number of

different lengths, and inspection intervals of arbitrary length (Wang et al. 2000).

For example, Amari and McLaughlin employed a maintenance strategy with regular

inspection intervals (Amari and McLaughlin 2004). Grall et al. assumed that

inspection intervals of different lengths could be chosen according to the

current health state (Grall et al. 2002). Identification of the next inspection epoch

was regarded as an optimisation problem given the past inspection information in the

research of Christer and Wang, and arbitrary length of inspection intervals could be

used (Christer and Wang 1995).

Some research about CBM strategies with continuous inspections is also available.

Marseguerra et al. considered a continuously monitored multi-component system

and used a Genetic Algorithm (GA) to determine the optimal CBM policy

(Marseguerra et al. 2002). A condition-based maintenance model was developed for


degradation processes that follow the Gamma process and are under continuous

monitoring (Liao et al. 2006a). Barata et al. modelled a continuously monitored

system through a Monte Carlo simulation method (Barata et al. 2002).

2.2.2 CBM Optimisation Objectives

A maintenance strategy is optimised according to a single objective or multiple objectives.

These objectives relate to the property of an engineering asset and its functions in an

enterprise. Most research in maintenance optimisation focuses on two aspects, i.e.

cost, and availability.

Cost is the most commonly used criterion to optimise maintenance strategies

(Hontelez et al. 1996; Barata et al. 2002). Most maintenance activities, e.g., health

inspections, preventive maintenance, and corrective maintenance, incur a certain

amount of costs. In addition, a failure usually causes an additional cost. Maintenance

costs can be state-independent (Wang et al. 2000), predetermined functions of health

states (Moustafa et al. 2004), or random variables that depend on health states (Chen

and Trivedi 2005). Health states can also affect operating costs (Moustafa et al.

2004) and production profits (Wang 2009). The cost objectives can be evaluated as

the expected cost per unit time over an infinite or finite horizon.

Cost is an effective criterion for maintenance decision-making. However, costs of

maintenance activities or an unexpected breakdown of an asset are often difficult to

evaluate. On the other hand, the down time of an asset can often be measured

accurately. In these situations, the availability becomes a more reasonable criterion

to optimise maintenance strategies. The availability is the proportion of the time for

which a machine is available for use. The formulation of the availability is given by

Availability = Up Time / (Up Time + Down Time), (2-10)

where the up time is the time for which equipment is operable and the down time

refers to inoperable time. Amari and McLaughlin illustrated algorithms to find the

optimal model parameters that maximise the system availability (Amari and


McLaughlin 2004). A condition-based availability limit policy was developed for a

continuously monitored degrading system (Liao et al. 2006a).

Some maintenance strategy optimisation methods consider multiple objectives.

Marseguerra et al. performed maintenance decision-making as a multi-objective

search according to both the profit and availability of a multi-component system

(Marseguerra et al. 2002). Bris et al. optimised maintenance strategies according to

the cost under the constraint of availability (Bris et al. 2003). Muñoz et al.

considered two objective functions (i.e. cost and risk) when optimising maintenance

strategies (Muñoz et al. 1997). A constraint was set on one of the objective functions

while the other objective function was adopted to optimise the maintenance strategy.

Constraints were also set on the values of variables in the objective functions.

From the perspective of the entire business process, a maintenance strategy should

be integrated with other related strategies of enterprise management (e.g. the spare

parts inventory strategy and the configuration of the manufacturing system). Zhou performed

a joint optimisation of maintenance scheduling and the production dispatching in a

complex multi-product manufacturing system (Zhou 2007). The optimisation of

spare part inventory and maintenance policy was carried out simultaneously by Ilgin

and Tunali (Ilgin and Tunali 2007).

2.2.3 CBM Optimisation Methods

The objective function of maintenance decision-making can be established

according to a property of the renewal reward process (Ross 1996). The renewal

reward process can be defined as pairs $\{(X_n, R_n); n = 1, 2, \ldots\}$, where $X_n$ with

distribution $F$ are the interval lengths of a renewal process $\{N(t); t \ge 0\}$ and $R_n$ are

the rewards earned during the renewals. The rewards $R_n$ are independent and

identically distributed. The total reward up to time $t$ is given by $R(t) = \sum_{n=1}^{N(t)} R_n$. If

the expected length of a renewal interval and the expected reward per interval satisfy

$E[X_n] < \infty$ and $E[R_n] < \infty$, the long-run expected reward per unit time can be

obtained as

$\lim_{t \to \infty} E[R(t)] / t = E[R_n] / E[X_n]$. (2-11)

The asset life between two maintenance activities (e.g., preventive or corrective

replacement) that bring the asset to an as good as new state can be modelled as a

renewal interval. During the interval, a certain cost can be incurred. Therefore, a

degradation process under maintenance can be modelled as a renewal reward

process, and the long-run expected cost per unit time can be calculated as Equation

(2-11).
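
The property in Equation (2-11) can be checked numerically. The following is a minimal sketch, not a method from the cited studies: it simulates a renewal reward process and compares the simulated cost per unit time with the ratio of the expected cycle cost to the expected cycle length. The exponential lifetimes, the uniform cycle costs, and all numerical values are assumptions chosen for illustration only.

```python
import random

def simulated_cost_rate(horizon, mean_life, mean_cost, seed=0):
    """Total cost of the completed renewal cycles divided by the horizon."""
    rng = random.Random(seed)
    elapsed, total_cost = 0.0, 0.0
    while True:
        life = rng.expovariate(1.0 / mean_life)   # renewal interval X_n (assumed exponential)
        if elapsed + life > horizon:
            break
        elapsed += life
        # Cycle cost R_n, assumed uniform on [0.5, 1.5] * mean_cost
        total_cost += mean_cost * rng.uniform(0.5, 1.5)
    return total_cost / horizon

MEAN_LIFE, MEAN_COST = 10.0, 5.0
rate = simulated_cost_rate(1_000_000.0, MEAN_LIFE, MEAN_COST)
theoretical_rate = MEAN_COST / MEAN_LIFE          # E[R] / E[X] from Equation (2-11)
print(f"simulated: {rate:.4f}, theoretical: {theoretical_rate:.4f}")
```

With a long horizon the two values should agree closely, which is why the ratio can stand in for a long simulation when both expectations are available.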

When the average cost per unit time can be calculated efficiently by Equation (2-11),

the optimal maintenance strategy can be obtained through directly searching the

strategy space. Park optimised the preventive replacement threshold and the

inspection interval for a Gamma degradation process using the renewal reward

process (Park 1988). Crowder and Lawless developed a maintenance strategy for an

asset whose degradation process follows a Gamma process with a

random effect that controlled heterogeneity across units (Crowder and Lawless

2007). The expected cost per unit time was calculated according to Equation (2-11).

Grall et al. proposed a multi-level control limit strategy for a continuous degradation

process (Grall et al. 2002). The analytical formulation of the long-run expected cost

per unit time was established using the property of the renewal reward process.
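
A direct search of the strategy space can be sketched as follows. This is an illustrative example under stated assumptions and does not reproduce the models of Park, Crowder and Lawless, or Grall et al.: the degradation level grows by Gamma-distributed increments between unit-length inspections, the asset is replaced preventively once an inspection finds the level above a threshold, and all costs and parameters are hypothetical. The expectations in Equation (2-11) are estimated by Monte Carlo simulation instead of analytically.

```python
import random

FAILURE_LEVEL = 10.0                                        # hypothetical failure level
C_INSPECT, C_PREVENTIVE, C_CORRECTIVE = 1.0, 20.0, 100.0    # hypothetical costs

def simulate_cycle(threshold, rng, shape=1.0, scale=1.0):
    """Simulate one renewal cycle and return (cycle cost, cycle length)."""
    level, length, cost = 0.0, 0, 0.0
    while True:
        level += rng.gammavariate(shape, scale)   # Gamma increment per inspection interval
        length += 1
        cost += C_INSPECT
        if level >= FAILURE_LEVEL:                # failure found: corrective replacement
            return cost + C_CORRECTIVE, length
        if level >= threshold:                    # preventive replacement
            return cost + C_PREVENTIVE, length

def cost_rate(threshold, n_cycles=5000, seed=1):
    """Estimate E[cycle cost] / E[cycle length] for one threshold."""
    rng = random.Random(seed)
    total_cost, total_length = 0.0, 0
    for _ in range(n_cycles):
        cost, length = simulate_cycle(threshold, rng)
        total_cost += cost
        total_length += length
    return total_cost / total_length

# Direct search over a small grid of candidate preventive thresholds
candidates = [2.0, 4.0, 6.0, 8.0, 10.0]
rates = {m: cost_rate(m) for m in candidates}
best_threshold = min(rates, key=rates.get)
print(best_threshold, round(rates[best_threshold], 3))
```

A threshold equal to the failure level corresponds to run-to-failure, so comparing its cost rate with the best candidate indicates how much the preventive policy saves under these assumed costs.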

The renewal reward process is a useful tool to optimise maintenance strategies.

However, in some applications, the number of variables to optimise is large and

some constraints may be applied to maintenance strategies. In these applications, it

is difficult to obtain the optimal maintenance strategy by directly searching the

strategy space. Artificial intelligence algorithms are often used to optimise

maintenance strategies in these situations. Marseguerra et al. used the genetic

algorithm (GA) to optimise the maintenance strategy regarding both availability and

profits (Marseguerra et al. 2002). Ilgin and Tunali used the GA to optimise the

preventive maintenance policy and spare provision simultaneously (Ilgin and Tunali


2007). The GA and hybrid GA/simulated annealing (SA) techniques were compared

when maintenance scheduling was optimised (Mohanta et al. 2007).

To calculate the expected cost per unit time using Equation (2-11), structure

characteristics of the optimal maintenance strategy (e.g., control limit theory) should

be identified first. In some situations, the structure of the optimal maintenance

strategy may be different from that predetermined subjectively. For example,

Moustafa et al. showed that the optimal maintenance strategy for the degradation

system that was discussed in their paper did not necessarily follow the control limit

theory (Moustafa et al. 2004). Maillart also demonstrated counterintuitive structural

properties of the optimal maintenance strategy for a Markovian deterioration system

with obvious failures (Maillart 2006). When multiple maintenance activities are

available, it becomes even more difficult to identify the maintenance structure.

The Markov decision process (MDP) and its extensions are often used to investigate

the optimal structure of a maintenance strategy. Makis and Jardine used the MDP to

model the maintenance decision process for an asset that follows PHM, and the

condition for the effectiveness of the control limit theory on a hazard rate process

was derived (Makis and Jardine 1992). The structure characteristics of the optimal

maintenance strategy for a classic two-state production process were investigated by

several papers using the partially observable Markov decision process (POMDP)

that is an extension of MDP (Ross 1971; Wang 1976; White 1978; White 1979;

Grosfeld-Nir 2007). Hopp and Kuo investigated the structure of the optimal

maintenance strategy of partially observable aircraft engine components using the

POMDP (Hopp and Kuo 1998).

Besides the structure property of an optimal maintenance strategy, the parameters of

an optimal maintenance strategy can also be derived by the MDP and its extensions.

Chen and Trivedi performed joint optimisation of inspection rate and maintenance

activities using the semi-Markov decision process (SMDP) (Chen and Trivedi 2005).

Amari et al. developed a maintenance strategy optimisation method that could be

applied to a wide range of stochastic deterioration processes (Amari et al. 2006).

Chan and Asgarpoor developed a maintenance optimisation method that considered


both random failures and failures due to deterioration using the MDP (Chan and

Asgarpoor 2006). Moustafa et al. investigated maintenance strategies for a multi-

state semi-Markov deterioration process that involved multiple maintenance actions

(Moustafa et al. 2004). Both the control limit theory and the policy iteration for the

SMDP were applied to derive the optimal maintenance strategies. The results

showed that more cost-effective maintenance strategies could be developed by the

SMDP.

2.2.4 Imperfect Inspections

In some situations, asset health states cannot be acquired deterministically by

inspections. Simply ignoring the uncertainty of asset health state estimates can cause

excessive false alarms or breakdowns without pre-alarms.

A commonly used approach to model the uncertain relationship between asset health

states and degradation observations is the PHM. Some research has been conducted

to optimise the maintenance strategy of an asset whose degradation process is

described by the PHM. Makis and Jardine used a backward recursion algorithm to

obtain the optimal maintenance strategy for a degradation process that followed the

PHM (Makis and Jardine 1992). Some applications and extensions of Makis’

method have also been published (Vlok et al. 2002; Lin et al. 2006; Ghasemi et al.

2008). Kumar and Westberg used a total time on test (TTT) plot based on the PHM to

estimate the optimal maintenance time interval and threshold values for monitored

variables (Kumar and Westberg 1997). Kobbacy et al. proposed a full history PHM

for preventive maintenance scheduling, which considered multiple maintenance

cycles. However, Kobbacy’s method assumed that covariates were time independent

between two maintenance activities (Kobbacy et al. 1997).

An alternative approach to describing a partially observable degradation process is

the state space model. Compared with the PHM that estimates a health state only

based on the information acquired from the latest inspection, the state space model

uses the degradation observations up to the current time to identify a health state.


After the effects and costs of maintenance activities are considered, the state space

degradation model becomes a POMDP. Existing research on maintenance strategy

optimisation using the POMDP can be largely divided into two types. The first type

investigates the structure property of the optimal maintenance strategy. Monahan

(Monahan 1982) reviewed some early papers (Ross 1971; Wang 1976; White 1978;

White 1979) investigating maintenance strategies for a classic two-state production

process. Optimal maintenance strategy structures under different assumptions were

identified using the POMDP. More recent research by Grosfeld-Nir investigated the

two-state production process again using the POMDP and obtained a weaker

condition for the optimal policy to be of a control limit type (Grosfeld-Nir 2007).

The second type of research further identifies the optimal maintenance strategies by

solving POMDPs. Ghasemi et al. derived the optimal condition-based maintenance

policy with regular maintenance intervals for a discrete state degradation process

using the POMDP (Ghasemi et al. 2008). Maillart optimised the inspection intervals

and maintenance actions at different health states using the POMDP (Maillart 2006).

Both perfect and imperfect inspections were considered. However, Maillart assumed

that the degradation process was discrete in time and state.

2.2.5 Comments

CBM has been investigated extensively. However, most research assumes

that inspections reveal health states completely. Approaches to

optimising maintenance strategies for partially observable degradation processes

remain limited, and those that exist largely assume discrete

time and states. In contrast, most practical asset degradation processes are

continuous in state. Moreover, failures and maintenance activities do not

occur only at discrete time epochs with regular intervals. Therefore, maintenance

strategy optimisation methods for partially observable degradation processes

that are continuous in time and state require further investigation.


2.3 Solving Algorithms for Nonlinear Non-Gaussian State

Space Models

This research investigates algorithms for asset life prediction and maintenance

strategy optimisation using the Gamma-based state space model. The Gamma-based

state space model does not satisfy the linear and Gaussian assumptions. Therefore,

existing exact solving algorithms (e.g., the Kalman filter) are not applicable to the

Gamma-based state space model. This section reviews commonly used approximate

solving algorithms for nonlinear non-Gaussian state space models. Three types of

solving algorithms are encountered during asset life prediction and maintenance

strategy optimisation. The first type comprises basic inference algorithms that estimate

distributions of underlying system states using observations. These inference

algorithms can be conducted recursively using Bayesian theory. The second type

comprises parameter estimation algorithms for the state space model. The

last type addresses the control of the state space model. In

practice, a change of state can incur costs or rewards. For example, in

degradation modelling, when a change of the underlying health state indicates a

failure, the costs of a breakdown and corrective maintenance are incurred. Control

algorithms optimise the actions that can affect the state transition to minimise the

costs or maximise the rewards. The three types of solving algorithms for the

nonlinear non-Gaussian state space model are reviewed in the following sections.

2.3.1 Basic Inference Algorithms

Two basic inference algorithms for the state space model are used in this research,

i.e., filtering and smoothing.

2.3.1.1 Filtering

The filtering algorithm estimates the present system state using the observations up

to the current time. For a state space model continuous in state, the filtering can be

performed recursively as

$p(x_k | y_{1:k}) \propto p(y_k | x_k) \int p(x_k | x_{k-1}) \, p(x_{k-1} | y_{1:k-1}) \, dx_{k-1}$, (2-12)

where $x_k$ and $y_k$ denote the system state and the observation at the $k$th inspection. In

this research, the filtering algorithm is used to estimate the present health state given

degradation indicators up to the current time. In addition, the filtering algorithm is

the basis of the other solving algorithms for the state space model.
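
The recursion in Equation (2-12) can be illustrated by evaluating the integral numerically on a grid of state values. The sketch below is only an illustration: the model (a random walk with unit drift and Gaussian noise), the grid, and the observations are hypothetical, and a grid-based evaluation is practical only for low-dimensional states.

```python
import math

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def grid_filter_step(grid, prior, y, transition_pdf, observation_pdf):
    """One recursion of Equation (2-12) on a uniform grid of state values."""
    dx = grid[1] - grid[0]
    posterior = []
    for x_k in grid:
        # Prediction: integrate p(x_k | x_{k-1}) p(x_{k-1} | y_{1:k-1}) over x_{k-1}
        predicted = dx * sum(transition_pdf(x_k, x_prev) * p_prev
                             for x_prev, p_prev in zip(grid, prior))
        # Update: multiply by the likelihood p(y_k | x_k)
        posterior.append(observation_pdf(y, x_k) * predicted)
    norm = dx * sum(posterior)                    # normalising constant
    return [p / norm for p in posterior]

# Hypothetical model: x_k = x_{k-1} + 1 + N(0, 0.5^2), y_k = x_k + N(0, 1)
def transition(x_k, x_prev):
    return normal_pdf(x_k, x_prev + 1.0, 0.5)

def observation(y, x_k):
    return normal_pdf(y, x_k, 1.0)

grid = [-5.0 + 0.1 * i for i in range(201)]
density = [normal_pdf(x, 0.0, 0.5) for x in grid]     # assumed initial density p(x_0)
for y in [1.2, 1.9, 3.1]:                             # hypothetical observations
    density = grid_filter_step(grid, density, y, transition, observation)

dx = grid[1] - grid[0]
posterior_mean = dx * sum(x * p for x, p in zip(grid, density))
print(round(posterior_mean, 3))
```

Because this example model is linear and Gaussian, the grid result can be compared with the exact Kalman filter solution for the same inputs.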

For state space models with the linear and Gaussian assumptions, the Kalman filter

is used to estimate system states analytically. In a linear and Gaussian state space

model, if the filtering result at the $k$th inspection, i.e., $p(x_k | y_{1:k})$, follows the

Gaussian distribution, the next filtering result $p(x_{k+1} | y_{1:k+1})$ will also follow the

Gaussian distribution. Therefore, if the initial state is known or follows the Gaussian

distribution, the following state estimates will all follow the Gaussian distribution. A

Gaussian distribution can be represented by two variables, i.e., the mean value $\mu_k$

and the variance $\sigma_k^2$. Consequently, the filtering algorithm essentially calculates

the mean values and variance values of system states at different time steps. The

Kalman filter provides an approach to calculate the mean value $\mu_k$ and the variance

value $\sigma_k^2$ recursively using $\mu_{k-1}$, $\sigma_{k-1}^2$, and the current observation $y_k$. For the

derivation of the Kalman filter, readers can refer to (Yu et al. 2004).
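
For a scalar state, the recursive calculation of the mean and the variance can be sketched as follows. The model (a random walk with a known drift), the function name `kalman_step`, and all numerical values are assumptions made for illustration; the general matrix form is given in the reference above.

```python
def kalman_step(mean, var, y, drift, process_var, obs_var):
    """One predict/update cycle for x_k = x_{k-1} + drift + noise, y_k = x_k + noise."""
    # Prediction of the state at inspection k from the result at inspection k-1
    pred_mean = mean + drift
    pred_var = var + process_var
    # Update with the current observation y_k
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (y - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

mean, var = 0.0, 0.25                      # assumed Gaussian initial state
for y in [1.2, 1.9, 3.1]:                  # hypothetical observations
    mean, var = kalman_step(mean, var, y, drift=1.0, process_var=0.25, obs_var=1.0)
print(round(mean, 4), round(var, 4))
```

Only two numbers are carried from one inspection to the next, which is what makes the filter exact and inexpensive under the linear and Gaussian assumptions.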

For nonlinear non-Gaussian state space models, the filtering result $p(x_k | y_{1:k})$ does

not necessarily follow a particular parametric distribution. Therefore, the

distribution $p(x_k | y_{1:k})$ cannot be represented exactly by a fixed number of

parameters, and approximate filtering algorithms are required to process nonlinear

non-Gaussian state space models. A commonly used approximate filtering

algorithm is the extended Kalman filter (EKF). The EKF performs local linearization

of the state equation and the observation equation by derivatives. After the

linearization, the original state space model is approximated as a linear and Gaussian

state space model and the Kalman filter can be adopted. However, when the state

equation and the observation equation are highly nonlinear, the linearization does

not yield a satisfactory approximation. To improve on the performance of the EKF,

another approximate filtering algorithm named the unscented Kalman filter (UKF)


was developed (Julier and Uhlmann 1997). The UKF uses a deterministic sampling

technique, i.e., the unscented transform, to pick a minimal set of sample points (i.e.,

sigma points). These sigma points are processed by the state equation and the

observation equation. After that, the mean value and the variance value are

recovered according to these sigma points.
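
The unscented transform can be sketched for a scalar state as follows. This is a simplified illustration, not the full UKF: only the propagation of a Gaussian through one nonlinear function is shown, and the scaling parameter `kappa = 2` and the test function are assumed values.

```python
import math

def unscented_transform(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian N(mean, var) through f using three sigma points."""
    n = 1                                          # state dimension
    spread = math.sqrt((n + kappa) * var)
    sigma_points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    transformed = [f(x) for x in sigma_points]     # pass sigma points through f
    # Recover the mean and the variance from the transformed sigma points
    new_mean = sum(w * z for w, z in zip(weights, transformed))
    new_var = sum(w * (z - new_mean) ** 2 for w, z in zip(weights, transformed))
    return new_mean, new_var

ut_mean, ut_var = unscented_transform(1.0, 0.25, lambda x: x * x)
print(ut_mean, ut_var)
```

For the quadratic function used here, the recovered mean and variance coincide with the exact moments of the squared Gaussian variable, whereas a first-order linearization around the mean would give a mean of 1.0 and miss the contribution of the variance.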

Another filtering algorithm for the non-linear non-Gaussian state space model is the

particle filter. Different from the UKF, the particle filter represents the distribution

of a system state by a large number of random samples instead of several

deterministic sampling points. In addition, the result of the particle filter is not fitted

by a Gaussian distribution. Therefore, the particle filter can obtain more accurate

estimation results than the UKF at the expense of lower efficiency. Due to the

enhancement of computational power, the particle filter is becoming prevalent in

processing a nonlinear non-Gaussian state space model.

The process of the particle filter follows the principle of importance sampling: a

function $p(\cdot)$ is assumed to be the PDF of a distribution difficult to draw samples

from directly. The values of function $\pi(\cdot)$ are proportional to those of $p(\cdot)$. An

“importance density” $q(\cdot)$ that can generate random numbers easily is selected to

generate a certain number of particles $x^i \sim q(\cdot)$, $i = 1, 2, \ldots, N_s$, where $i$ is the index

of a particle, and $N_s$ is the number of particles. The distribution $p(\cdot)$ is then

represented approximately as

$p(x) \approx \sum_{i=1}^{N_s} w^i \delta(x - x^i)$, (2-13)

where $\delta(\cdot)$ is the Dirac delta measure given by

$\delta(x) = \begin{cases} 0, & x \neq 0 \\ 1, & x = 0 \end{cases}$, (2-14)

and the weight of the $i$th particle is calculated according to

$w^i \propto \pi(x^i) / q(x^i)$. (2-15)

For the particle filter, Equation (2-15) can be written as

$w_k^i \propto p(x_{0:k}^i | y_{1:k}) / q(x_{0:k}^i | y_{1:k})$, (2-16)

and the weights $\{w_k^i; i = 1, 2, \ldots, N_s\}$ can be calculated recursively

according to

$w_k^i \propto w_{k-1}^i \cdot p(y_k | x_k^i) \, p(x_k^i | x_{k-1}^i) / q(x_k^i | x_{k-1}^i, y_k)$. (2-17)

For the derivation of Equation (2-17) from (2-16), readers can refer to

(Arulampalam et al. 2002b). After the weights $w_k^{1:N_s}$ are worked out, the posterior distribution of

the system state at the $k$th inspection is approximated as

$p(x_k | y_{1:k}) \approx \sum_{i=1}^{N_s} w_k^i \delta(x_k - x_k^i)$, $k = 1, 2, \ldots, n$. (2-18)
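
The principle of importance sampling can be illustrated with a small static example. In this sketch the target is a half-normal distribution known only up to a constant, and the importance density is an exponential distribution; both choices and the function name are assumptions made for illustration, and the example does not involve a state space model.

```python
import math
import random

def importance_sampling_mean(n_particles, seed=0):
    """Self-normalised importance sampling estimate of the mean of the target."""
    rng = random.Random(seed)
    # Target known up to a constant: pi(x) = exp(-x^2 / 2) for x > 0
    def pi(x):
        return math.exp(-0.5 * x * x)
    # Importance density q: exponential with rate 1, easy to sample from
    def q(x):
        return math.exp(-x)
    particles = [rng.expovariate(1.0) for _ in range(n_particles)]
    raw = [pi(x) / q(x) for x in particles]        # unnormalised weights, Equation (2-15)
    total = sum(raw)
    weights = [w / total for w in raw]             # weights normalised to sum to one
    # Expectation under the weighted point-mass approximation of Equation (2-13)
    return sum(w * x for w, x in zip(weights, particles))

estimate = importance_sampling_mean(50_000)
exact = math.sqrt(2.0 / math.pi)                   # mean of the half-normal distribution
print(round(estimate, 3), round(exact, 3))
```

The normalising constant of the target is never needed, because it cancels when the weights are normalised.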

A problem of the particle filter is degeneracy. After several time steps, the weight of

one particle may become dominant, while the weights of the other particles tend

to zero. In this situation, most computational effort is spent on particles whose

effects on the filtering results are negligible. The degeneracy is caused by the difference

between the importance density $q(x_k | x_{k-1}, y_k)$ and the posterior density $p(x_k | y_{1:k})$.

Doucet et al. proved that the variance of the importance weights $w_k^{1:N_s}$ increases with the

time index (Doucet et al. 2000). The degeneracy can be alleviated by adopting an

importance density close to the posterior density and can be overcome by resampling

the particles.
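
The severity of degeneracy can be monitored with the effective sample size, a diagnostic that is widely used in the particle filtering literature but is not part of the discussion above. The following minimal sketch assumes the common estimate based on the sum of squared normalised weights; the two weight vectors are hypothetical.

```python
def effective_sample_size(weights):
    """Return 1 / sum(w_i^2) after normalising the weights to sum to one."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    return 1.0 / sum(w * w for w in normalised)

uniform = [1.0] * 100                   # all particles contribute equally
degenerate = [1.0] + [1e-12] * 99       # one particle dominates
ess_uniform = effective_sample_size(uniform)
ess_degenerate = effective_sample_size(degenerate)
print(ess_uniform, ess_degenerate)
```

A value close to the number of particles indicates well-balanced weights, while a value close to one indicates severe degeneracy.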

Arulampalam et al. summarised the approaches to approximating the posterior density

during particle filtering (Arulampalam et al. 2002b). Doucet et al. proposed a local

linearization method to obtain an importance density that is close to the posterior

density. This local linearization method is similar to the EKF (Doucet et al. 2000).

Van Der Merwe et al. proposed the unscented particle filter, which obtains the

importance density using the UKF (Van Der Merwe et al. 2000). These methods can improve

the filtering results when the posterior density $p(x_k | y_{1:k})$ is significantly different

from the prior density $p(x_k | y_{1:k-1})$. However, the additional approximation step makes

the filtering algorithm more computationally expensive. Therefore, when the posterior

density and the prior density do not differ significantly, increasing the

number of particles is a more efficient way to improve the filtering result.


Adopting an importance density close to the posterior density can only alleviate the

degeneracy. The variance of the importance weights $w_k^{1:N_s}$ still increases over time.

Therefore, resampling particles according to their weights is often indispensable in

particle filtering methods. However, the resampling brings in another problem. After

resampling, a particle with a large weight can be duplicated many times and the

diversity among the particles is lost. This phenomenon is named sample

impoverishment. A filtering result that suffers from a severe sample impoverishment

is a poor representation of the posterior density. Some methods to solve the sample

impoverishment were discussed in (Arulampalam et al. 2002a). Similar to the

algorithms for approximating posterior densities, these algorithms that reduce the

sample impoverishment also require additional computational efforts.

Both the significant difference between the importance density and the posterior

density and sample impoverishment are caused by small observation noise.

However, in degradation modelling, the observation noise is often considerable;

otherwise, the threshold crossing model is more appropriate than the state space

degradation model. Therefore, this research does not consider the approximation of the

posterior density during importance sampling. The Sampling Importance

Resampling (SIR) filter, one of the most commonly used particle filters, is adopted

in this research. The SIR filter chooses the prior density as the importance density,

i.e. $q(x_k | x_{k-1}, y_k) = p(x_k | x_{k-1})$, and particles are resampled at every time step.

During the resampling, random numbers $\{\tilde{x}_k^i; i = 1, 2, \ldots, N_s\}$ are

sampled from $\{x_k^i; i = 1, 2, \ldots, N_s\}$ according to the weights $w_k^{1:N_s}$. When the

observation noise is not much smaller than the process noise, the SIR filter is an

efficient and effective approach to estimate underlying system states of a state space

model.
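
The SIR filter can be sketched in a few lines. The model below is a hypothetical stand-in and not the Gamma-based state space model developed in this research: the state grows by Gamma-distributed increments, the observation equals the state plus Gaussian noise, and the parameter values, the particle number, and the simulated data are all assumptions.

```python
import math
import random

SHAPE, SCALE, OBS_STD = 1.0, 1.0, 1.0    # hypothetical model parameters

def likelihood(y, x):
    """p(y_k | x_k) for y_k = x_k + N(0, OBS_STD^2), up to a constant factor."""
    return math.exp(-0.5 * ((y - x) / OBS_STD) ** 2)

def sir_filter(observations, n_particles=2000, seed=0):
    """Return the filtered mean of the state at each inspection."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles      # assumed known initial state x_0 = 0
    means = []
    for y in observations:
        # 1. Sample from the prior density p(x_k | x_{k-1}) (Gamma increments)
        particles = [x + rng.gammavariate(SHAPE, SCALE) for x in particles]
        # 2. With the prior as importance density and resampling at every step,
        #    the weight update reduces to w_k proportional to p(y_k | x_k)
        weights = [likelihood(y, x) for x in particles]
        # 3. Resample the particles according to their weights
        particles = rng.choices(particles, weights=weights, k=n_particles)
        means.append(sum(particles) / n_particles)
    return means

# Simulate a hypothetical degradation path and its noisy observations
sim = random.Random(42)
states, observations, level = [], [], 0.0
for _ in range(30):
    level += sim.gammavariate(SHAPE, SCALE)
    states.append(level)
    observations.append(level + sim.gauss(0.0, OBS_STD))

estimates = sir_filter(observations)
mean_abs_error = sum(abs(e - s) for e, s in zip(estimates, states)) / len(states)
print(round(mean_abs_error, 3))
```

Multinomial resampling through `random.choices` is used here for brevity; other resampling schemes could be substituted without changing the structure of the filter.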


2.3.1.2 Smoothing

Filtering algorithms only use the observations up to the current time when estimating

the posterior distribution of a system state. Smoothing algorithms, on the other hand,

use the entire sequence of observations to estimate the distribution of a system state,

i.e. $p(x_k | y_{1:n})$. Therefore, smoothing algorithms can obtain more accurate and

robust estimates of underlying system states.

Two types of particle smoothing algorithms are commonly used, i.e. the

forward-backward smoother, and the two-filter smoother. The two methods are both based on

the result of particle filtering, i.e., $\{x_k^i; i = 1, 2, \ldots, N_s, k = 1, 2, \ldots, n\}$. During

the forward-backward smoothing, the filtering result at the final time step $n$ is adopted as the

smoothing result at that time, i.e. $\tilde{x}_n^{1:N_s} = x_n^{1:N_s}$. Then the weights are calculated

recursively from the end to the beginning according to

$w_{k|n}^i \propto \sum_{j=1}^{N_s} \frac{p(\tilde{x}_{k+1}^j | x_k^i)}{\sum_{l=1}^{N_s} p(\tilde{x}_{k+1}^j | x_k^l)}$, $i = 1, 2, \ldots, N_s$, $k = 1, 2, \ldots, n-1$, (2-19)

where $\{\tilde{x}_k^i; i = 1, 2, \ldots, N_s, k = 1, 2, \ldots, n\}$ are the smoothing results. After

that, the smoothing results at the $k$th step are resampled from the filtering result $x_k^{1:N_s}$

according to the weights $w_{k|n}^{1:N_s}$. The idea of the two-filter smoother can be

demonstrated as

$p(x_k | y_{1:n}) = p(x_k | y_{1:k-1}, y_{k:n}) = \frac{p(x_k | y_{1:k-1}) \, p(y_{k:n} | y_{1:k-1}, x_k)}{p(y_{k:n} | y_{1:k-1})} \propto p(x_k | y_{1:k-1}) \, p(y_{k:n} | x_k)$. (2-20)

For details of the two-filter smoother, readers can refer to (Klaas et al. 2006). A

drawback of the two smoothing algorithms is that the system states at different time

steps are estimated independently. In many applications, the joint distribution of

system states at different time steps is required. For example, in this research, joint

distributions of two adjacent system states are required during parameter

estimation.


To address this drawback of particle smoothing, Godsill et al. (Godsill et al. 2004)

proposed the particle smoother using the backward simulation algorithm. This algorithm

can obtain the joint distribution of the whole sequence of system states given the

entire sequence of observations. Therefore, this smoothing algorithm has been widely

adopted in the parameter estimation of nonlinear non-Gaussian state space models

(Gibson and Ninness 2005; Kim 2005; Schön et al. 2006). The particle smoother

using the backward simulation algorithm is based on the results of the particle filter,

i.e. $\{x_k^i; i = 1, 2, \ldots, N_s, k = 1, 2, \ldots, n\}$. The recursive algorithm to calculate the weights of the smoothing

particles is given by

$w_{k|n}^{i,j} = \frac{p(\tilde{x}_{k+1}^j | x_k^i)}{\sum_{l=1}^{N_s} p(\tilde{x}_{k+1}^j | x_k^l)}$, $i = 1, 2, \ldots, N_s$, $k = 1, 2, \ldots, n-1$, (2-21)

where $w_{k|n}^{i,j}$ denotes the weight of the $i$th filtering particle at the $k$th time step

corresponding to the $j$th smoothing particle at the $(k+1)$th time step. A random

number $\tilde{x}_k^j$ is then resampled from $x_k^{1:N_s}$ according to the weights

$\{w_{k|n}^{i,j}; i = 1, 2, \ldots, N_s\}$.
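
A minimal sketch of backward simulation is given below. It assumes a hypothetical model with Gamma-distributed state increments and Gaussian observation noise, and a SIR filter that resamples at every step so that the stored filtering particles are equally weighted. The function names, parameter values, and observations are illustrative assumptions rather than the implementation of Godsill et al.

```python
import math
import random

SHAPE, SCALE, OBS_STD = 2.0, 0.5, 1.0    # hypothetical model parameters

def transition_pdf(x_next, x):
    """p(x_{k+1} | x_k): Gamma density of the increment, zero if not positive."""
    d = x_next - x
    if d <= 0.0:
        return 0.0
    return d ** (SHAPE - 1.0) * math.exp(-d / SCALE) / (math.gamma(SHAPE) * SCALE ** SHAPE)

def sir_filter(observations, n_particles, rng):
    """Return the resampled (equally weighted) particles at every time step."""
    particles, history = [0.0] * n_particles, []
    for y in observations:
        particles = [x + rng.gammavariate(SHAPE, SCALE) for x in particles]
        weights = [math.exp(-0.5 * ((y - x) / OBS_STD) ** 2) for x in particles]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        history.append(particles)
    return history

def backward_simulation(history, rng):
    """Draw one state trajectory from the joint smoothing distribution."""
    trajectory = [rng.choice(history[-1])]           # start from the last time step
    for particles in reversed(history[:-1]):
        x_next = trajectory[0]
        # Weight of each filtering particle given the smoothing particle at the
        # next step; random.choices normalises the weights internally
        weights = [transition_pdf(x_next, x) for x in particles]
        trajectory.insert(0, rng.choices(particles, weights=weights, k=1)[0])
    return trajectory

rng = random.Random(7)
observations = [1.3, 1.8, 3.4, 3.9, 5.2, 6.1]        # hypothetical indicators
history = sir_filter(observations, n_particles=500, rng=rng)
trajectory = backward_simulation(history, rng)
print([round(x, 2) for x in trajectory])
```

Because a whole trajectory is drawn jointly, each sampled path respects the transition density; in this example every path is strictly increasing, which independent marginal smoothing would not guarantee. Repeating the backward pass yields further trajectories from the same filtering results.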

2.3.2 Parameter Estimation Algorithms

The main difficulty in parameter estimation for a state space model is that the

underlying system states are not observable. Therefore, the complete likelihood

function cannot be directly used to estimate the parameters. Instead, the parameters

are obtained by maximising the marginal likelihood function of the observations.

This marginal likelihood function entails integration. For a linear Gaussian state

space model, the marginal likelihood function can be calculated analytically

(Christer et al. 1997). In contrast, for a nonlinear non-Gaussian state space

model, a closed form of this marginal likelihood function is not available.

Consequently, Monte Carlo-based methods are often used in parameter estimation.

Three types of parameter estimation algorithms are often used for nonlinear

non-Gaussian state space models, i.e., gradient-based methods, EM algorithms, and

Markov chain Monte Carlo (MCMC) algorithms.


2.3.2.1 Gradient-based Methods

The gradient of the marginal likelihood function of a nonlinear non-Gaussian state

space model can be evaluated by sequential Monte Carlo methods. These sequential

Monte Carlo methods were reviewed in (Andrieu et al. 2004). Given these

calculation methods, the marginal likelihood function can be maximised through

gradient-based methods. Doucet and Tadić developed gradient-based parameter

estimation methods for nonlinear non-Gaussian state space models (Doucet and

Tadić 2003). The parameter estimation method developed by Doucet and Tadić can be

performed recursively either in batch or online.

For a nonlinear non-Gaussian state space model, the marginal likelihood function

and its gradients are approximated by particle-based methods. Consequently, a large

number of local maxima exist in the approximated marginal likelihood function.

Schön et al. showed that gradient-based methods can easily converge to these local

maxima (Schön et al. 2006).

2.3.2.2 Expectation-maximization (EM) Algorithms

The EM algorithm is an extension of the maximum likelihood estimation (MLE)

method to deal with a model with unobservable variables. The EM algorithm was

first proposed by Dempster et al. (Dempster et al. 1977). After that, Wu investigated

the convergence property of the EM algorithm (Wu 1983). The EM algorithm

consists of two steps, i.e., the Expectation (E) step and the Maximization (M) step.

The E step estimates the expected complete likelihood function given a set of

parameters. In the M step, a new set of parameters is obtained by maximising the

expected complete likelihood function that is established in the E step. These new

parameters are again used in the E step. The E-M iteration continues until a

convergence condition is satisfied. The EM algorithm uses the expectation of the

complete likelihood function instead of the marginal likelihood function. In most

situations, the complete likelihood function can be estimated and evaluated more


easily. Therefore, the EM algorithm is often used to estimate the parameters of a

model with unobservable variables. When the EM algorithm is used to estimate the

parameters of a state space model, the gradient of the marginal likelihood function is

not required. Furthermore, the EM algorithm is more robust against attraction to

local maxima than gradient-based methods (Gibson and Ninness 2005).
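
The alternation between the E step and the M step can be illustrated on a model that is much simpler than a state space model. The sketch below assumes a mixture of two Gaussian components with equal weights and unit variances, where the unobservable variable is the component that generated each data point; the data, initial values, and function name are hypothetical.

```python
import math
import random

def em_two_gaussians(data, mu1, mu2, tol=1e-8, max_iter=500):
    """Estimate the two component means of an equal-weight, unit-variance mixture."""
    for _ in range(max_iter):
        # E step: probability that each point belongs to component 1,
        # given the current parameters
        resp = []
        for x in data:
            p1 = math.exp(-0.5 * (x - mu1) ** 2)
            p2 = math.exp(-0.5 * (x - mu2) ** 2)
            resp.append(p1 / (p1 + p2))
        # M step: maximise the expected complete log-likelihood,
        # which gives responsibility-weighted means
        w1 = sum(resp)
        w2 = len(data) - w1
        new_mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        new_mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w2
        converged = abs(new_mu1 - mu1) + abs(new_mu2 - mu2) < tol
        mu1, mu2 = new_mu1, new_mu2
        if converged:                     # convergence condition satisfied
            break
    return mu1, mu2

rng = random.Random(3)
data = ([rng.gauss(0.0, 1.0) for _ in range(500)]
        + [rng.gauss(5.0, 1.0) for _ in range(500)])
mu1, mu2 = em_two_gaussians(data, mu1=1.0, mu2=4.0)
print(round(mu1, 2), round(mu2, 2))
```

For a nonlinear non-Gaussian state space model, the expectation in the E step has no closed form and would be approximated from smoothing particles, while the overall iteration keeps the same structure.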

Some research has been performed to estimate the parameters of nonlinear non-

Gaussian state space models through the EM algorithm. Schön et al. developed an

EM algorithm based on the particle smoother for a state space model with Gaussian

noise (Schön et al. 2006). In their research, a numerical experiment was also carried

out to demonstrate the robustness of the EM algorithm against local maxima. Wills et al.

developed an EM algorithm based on the particle smoother for a general stochastic

nonlinear state space model (Wills et al. 2008). Kim used the EM algorithm based

on the particle smoother to estimate the parameters of the stochastic volatility

models (Kim 2005). The missing observation issue was also discussed by Kim. In

addition, Kim developed a method of moments to identify the initial parameters

for the EM algorithm. Olsson et al. applied the fixed-lag particle smoother to process

a long sequence of observations when performing the EM algorithm (Olsson et al.

2008).

2.3.2.3 Markov Chain Monte Carlo (MCMC) Algorithms

The MCMC parameter estimation algorithm is similar to the EM algorithm. The

difference is that the EM algorithm updates parameters deterministically while the

MCMC method generates new parameters from a distribution. This property of

the MCMC algorithm can effectively prevent parameter estimates from converging to a

local mode.

Chopin proposed an MCMC algorithm to perform particle filtering and identify

parameters simultaneously (Chopin 2002). Doucet et al. combined the idea of

simulated annealing with the process of MCMC parameter estimation and developed

an algorithm named State-Augmentation for Marginal Estimation (SAME) (Doucet


et al. 2002). Jacquier et al. used an algorithm similar to SAME to estimate the

parameters of two latent state models central to financial econometrics (Jacquier et

al. 2007). Jacquier also proposed methods that provide standard errors and

convergence diagnostics.

Compared with the EM algorithm, the MCMC algorithm is more robust against

attraction to local maxima. However, in some situations, especially when the size of

a parameter vector is large, generating a sample from the conditional distribution of

the parameter vector can be troublesome and inefficient. In addition, the MCMC

algorithm may suffer from an accumulation of error over time and can even diverge

over time (Andrieu et al. 2004).

2.3.3 Control Algorithms for the State Space Model

In some applications of the state space model, the transition of system states can

incur a reward or a cost, and some actions can be adopted to change the transition

probability of the system states. Control algorithms for the state space model are

used to select an optimal action according to the current system state so as to

minimise the cost or maximise the reward. A commonly used model to describe this

control process is the POMDP. By solving the POMDP, the optimal strategy can be

obtained to minimise the cost or maximise the reward.

To further discuss the POMDP, the completely observable MDP needs to be introduced

first. As an extension of the Markov chain, the MDP considers additional actions

that can change state transition probabilities and the rewards (costs) that are caused

by state transitions. A typical MDP can be represented by a tuple $\langle S, A, P_\cdot(\cdot,\cdot), R_\cdot(\cdot,\cdot) \rangle$,

where $S$ and $A$ denote the finite sets of states and actions, respectively. The

transition probability $P_a(s, s') = \Pr(s_{t+1} = s' | s_t = s, a_t = a)$ denotes the

probability that the state at time $t+1$ is $s'$ given that the state at time $t$ was $s$ and an

action $a$ was adopted at time $t$. The reward function $R_a(s, s')$ represents the reward

that can be obtained when the state changes from $s$ to $s'$ and an action $a$ is selected.

In some applications, the reward function $R_a(s, s')$ is replaced by the cost function

$C_a(s, s')$. An optimal policy can be obtained as a policy function $\pi: S \to A$ by

solving the MDP. When the parameters of an MDP are known, two commonly used

solving algorithms are the value iteration and the policy iteration. For details of the

two solving algorithms, readers can refer to (Puterman 1994). During the

optimisation, the objective function is the expected long-term average reward (cost)

per unit time or the expected long-term discounted reward (cost).
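
Value iteration for a small maintenance MDP can be sketched as follows. The three health states, the two actions, the transition probabilities, the costs, and the discount factor are all hypothetical, and the cost is simplified to depend on the current state and action only, rather than on the transition.

```python
STATES = [0, 1, 2]                       # 0 = good, 1 = degraded, 2 = failed
ACTIONS = ["continue", "replace"]
# P[a][s][s2]: probability of moving from s to s2 under action a (assumed values)
P = {
    "continue": [[0.8, 0.2, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
    "replace":  [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
}
# C[a][s]: expected one-step cost of action a in state s (assumed values)
C = {"continue": [0.0, 0.0, 100.0], "replace": [20.0, 20.0, 60.0]}
DISCOUNT = 0.9

def q_value(s, a, V):
    """Expected discounted cost of taking action a in state s, then following V."""
    return C[a][s] + DISCOUNT * sum(p * V[s2] for s2, p in zip(STATES, P[a][s]))

def value_iteration(tol=1e-9):
    """Iterate the Bellman optimality update until the values stop changing."""
    V = [0.0] * len(STATES)
    while True:
        new_V = [min(q_value(s, a, V) for a in ACTIONS) for s in STATES]
        if max(abs(n - o) for n, o in zip(new_V, V)) < tol:
            return new_V
        V = new_V

V = value_iteration()
# Policy function: the cost-minimising action in each state
policy = [min(ACTIONS, key=lambda a, s=s: q_value(s, a, V)) for s in STATES]
print(policy, [round(v, 2) for v in V])
```

With these assumed costs, the resulting policy is of the control limit type: the asset is left running in the good state and replaced once it is degraded or failed.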

In a POMDP, the underlying states of a system are partially revealed by observations, and

a probability distribution over the current state can be obtained by a filtering

algorithm. This probability distribution, namely the belief, summarises the entire

history of observations and actions. A POMDP can be converted to an MDP by

maintaining a consistent belief set. However, since the belief space of a POMDP is

continuous, conventional solution methods for a discrete state MDP cannot be

directly applied to solve a POMDP. The main difficulty in solving the POMDP is

the representation of the value function that is a crucial component of both the value

iteration and the policy iteration. For a discrete state MDP, the value function can be

maintained easily as a table with one entry per state. However, for a POMDP, the

belief space is continuous and the value function can be an arbitrary function over

this continuous belief space. This arbitrary function cannot be represented by a table

with a finite number of entries.
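For a discrete state POMDP, the belief can be maintained with a simple Bayes filter: the current belief is propagated through the transition probabilities and then reweighted by the likelihood of the new observation. The sketch below assumes a fixed action, a hypothetical two-state system, and made-up observation likelihoods; the function name `belief_update` is illustrative only.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step for a discrete-state POMDP under a fixed action.

    transition[s][s2] = Pr(next state s2 | state s),
    obs_likelihood[s2] = Pr(received observation | next state s2).
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight by the observation likelihood and renormalise.
    unnormalised = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical example: state 0 = healthy, 1 = degraded; a "normal" reading
# is observed with probability 0.8 when healthy and 0.3 when degraded.
b_new = belief_update([0.5, 0.5], [[0.9, 0.1], [0.0, 1.0]], [0.8, 0.3])
```

Because the belief takes values on a continuous simplex even when the states are discrete, a table indexed by beliefs is not available, which is the difficulty described above.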

Fortunately, for a discrete state POMDP, the value function is piecewise-linear and

convex (PWLC). This PWLC function can be represented by the supremum of a

finite number of hyperplanes, each denoted by an $\alpha$-vector (Monahan 1982).

Based on these hyperplanes, some effective solution algorithms have been

developed (Sondik 1978; Cassandra et al. 1997; Kaelbling et al. 1998). However,

these methods for a discrete state POMDP cannot be generalised to solve a

continuous state POMDP, because infinite-dimensional $\alpha$-vectors are required to

represent the value function of a continuous state POMDP.
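The PWLC representation can be sketched in a few lines: the value of a belief is the maximum inner product between the belief and a finite set of $\alpha$-vectors. The three $\alpha$-vectors below are hypothetical numbers chosen only to show the mechanics.

```python
def pwlc_value(belief, alpha_vectors):
    """Value of a belief as the upper envelope of a finite set of hyperplanes."""
    return max(sum(a * b for a, b in zip(alpha, belief))
               for alpha in alpha_vectors)


# Hypothetical alpha-vectors for a two-state problem (one per candidate plan).
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
v_corner = pwlc_value([1.0, 0.0], alphas)   # first hyperplane is active
v_middle = pwlc_value([0.5, 0.5], alphas)   # third hyperplane is active
```

With a continuous state, each $\alpha$-vector would need one entry per state value, which is why this finite representation no longer applies.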


To address this difficulty, some approximate solution methods for a continuous state

POMDP have been proposed. Porta generalised $\alpha$-vectors to $\alpha$-functions and modelled beliefs, actions, observations, and rewards by Gaussian mixtures (Porta et al. 2005). A disadvantage of Porta's method is that the number of Gaussian mixture components

used to represent the functions of interest increases exponentially with the number of

value iterations. Bertsekas proposed a closed-form solution method for a linear

system with quadratic cost (Bertsekas 2005). The linear and quadratic assumption

limits the application of Bertsekas’ method. Thrun presented a Monte Carlo

algorithm for learning to act in a continuous state POMDP (Thrun 2000). In Thrun’s

algorithm, each belief state was represented by a group of samples, and each sample

was represented by particles. The number of the samples determined the dimension of

the belief space. A large number of samples were required to get a close

approximation result. In addition, Kullback–Leibler divergences between a new

belief and different samples were required in the process of reinforcement learning.

Therefore, the efficiency of Thrun’s algorithm still needed to be enhanced. An

effective method to improve the efficiency of solution methods for a continuous

POMDP is reducing the dimension of the belief space. Brooks et al. proposed a

parametric method to solve continuous state POMDPs (Brooks et al. 2006). Brooks’

method reduced the dimension of a POMDP by representing the belief of a POMDP

using a Gaussian distribution. Thus Brooks compressed the dimension of a POMDP

from infinite to two. The EKF was used to estimate the belief of a POMDP at

different time points. To make the solution method more effective for nonlinear non-Gaussian state space models, a more recent paper by Brooks adopted the particle filter

to estimate the belief of a POMDP (Brooks and Williams 2007). The Gaussian

distribution was still used to approximate beliefs. Zhou proposed a similar Monte Carlo-based solution method for POMDPs (Zhou et al. to appear). However, Zhou considered the whole exponential family of distributions when approximating the belief

of a POMDP. This extension made Zhou’s method more appropriate when the belief

state did not follow the Gaussian distribution. Moreover, Zhou developed rigorous

theoretical error bounds for her algorithm.


2.3.4 Comments

Compared with state space models with discrete states and those with linear and

Gaussian assumptions, most nonlinear non-Gaussian state space models cannot be

solved analytically. Some approximate algorithms are used to calculate system state

estimates, model parameters, and the optimal control policy of nonlinear non-

Gaussian state space models. With the enhancement of the calculation ability of

computers, Monte Carlo-based algorithms are becoming more and more popular in

processing nonlinear non-Gaussian state space models. Some effective Monte Carlo-

based algorithms have been proposed. However, the applications of these algorithms

in asset degradation process modelling and maintenance strategy optimisation are

still limited. This research systematically addresses the issues that are encountered

when these Monte Carlo-based algorithms are used to predict asset lives and

optimise maintenance strategies based on the Gamma-based state space degradation

model.


3 Modelling Correlated Degradation Processes of Direct and Indirect Indicators

3.1 Introduction

Asset health inspections can produce two types of indicators: (1) direct indicators

(e.g. the thickness of a brake pad and the crack depth on a gear) which directly relate

to a failure mechanism; and (2) indirect indicators (e.g. the indicators extracted from

vibration signals and oil analysis data) which can only partially reveal a failure

mechanism.

Direct and indirect indicators both have advantages and disadvantages. Direct

indicators provide more accurate references for asset degradation modelling, while

they are often technically or economically impossible to sample frequently. For

example, the crack on the tooth of a gear cannot be measured online. Similarly, the

wear of the impeller in a pump cannot be measured during its operating period.

Directly applying degradation models to these direct indicators with limited sample

size is often not practically possible. Different from direct indicators, indirect

indicators can be often obtained easily through various CM techniques. However,

some statistical models (e.g., PHM (Jiang and Jardine 2008), and the logistic

regression model (Xu and Zhao 2005)) are required to identify the uncertain failure

threshold – “gray boundary” (Liao et al. 2006b) on an indirect indicator. These

statistical models require sufficient failure history which is sometimes not available

in practice.

Instead of directly applying degradation models to direct indicators or identifying

“gray boundaries” on indirect indicators, some researchers investigated quantitative

relationships between direct and indirect indicators. For example, the wear status of

the impeller in a slurry pump can be assessed through the cumulative amplitude

measure evaluated from its vane pass frequency (Mani et al. 2008). The average

vibration amplitude sampled at a bearing changes with the degrees of the angular


misalignment of the shaft (Sun et al. 2006). Indicators extracted from a vibration

signal relate to the crack size on a bearing (Shiroishi et al. 1997). These relationships

between the two types of indicators make it possible to estimate more desirable

direct indicators through the related indirect indicators which can be obtained more

easily.

An efficient approach to estimating direct indicators using indirect indicators is the

state space model. In the state space model, the degradation process of a direct

indicator is modelled by a state equation, and the relationship between a direct

indicator and its corresponding indirect indicator is described by an observation

equation. Subsequently, the state space model is able to consider both the

information from the stochastic degradation process of the direct indicator and the

uncertain relationship between a direct indicator and an indirect indicator.

This chapter develops a state space model that does not have discrete state, discrete

time, linear, and Gaussian assumptions to describe degradation processes of direct

and indirect indicators. Among non-Gaussian stochastic processes, the Gamma

process has been widely used to model a range of direct indicator degradation

processes, e.g. fatigue crack growth (Lawless and Crowder 2004), corrosion of

pressure vessel (Kallen and Van Noortwijk 2005), and brake-pad wear for

automobiles (Crowder and Lawless 2007). The prevalence of the Gamma process is

due to its monotonically increasing property which is consistent with most direct

indicator degradation processes. Therefore, a Gamma-based state space model is

investigated as an example of the non-Gaussian non-linear state space model. Monte

Carlo-based parameter estimation and life prediction algorithms are developed to

solve the Gamma-based state space model.

The body of this chapter is organised as follows. Section 3.2 introduces the

formulations and solving methods of the Gamma-based state space degradation

model. Then, simulation studies are performed in Section 3.3 to demonstrate the

performance of the solving algorithms. A case study using the data from an


accelerated life test of a gearbox is conducted in Section 3.4. Finally, a brief summary of this chapter is provided in Section 3.5.

3.2 Model Formulations and Solving Algorithms

3.2.1 Model Formulations

The formulation of the Gamma-based state space model can be divided into two

components, i.e. the state equation given by

$$x_{t+\tau} - x_t \sim Ga\big(\eta(t+\tau) - \eta(t),\ \lambda\big), \tag{3-1}$$

and the observation equation given by

$$y_t = \psi(x_t) + \varepsilon_t. \tag{3-2}$$

Here, $x_t$ denotes the direct indicator at time $t$, and $y_t$ represents the indirect indicator at time $t$. As shown in Equation (3-1), the direct indicator is assumed to follow a Gamma process whose increments follow the Gamma distribution, where $Ga\big(\eta(t+\tau) - \eta(t), \lambda\big)$ denotes the Gamma distribution with a shape parameter $\eta(t+\tau) - \eta(t)$ and a scale parameter $\lambda$. In asset degradation modelling, a commonly used formulation of $\eta(t)$ is given by

$$\eta(t) = \alpha \cdot t^{\beta}. \tag{3-3}$$

A brief introduction of the Gamma process has been given in Section 2.1.1.1, and readers can also refer to (van Noortwijk 2009). The observation equation (3-2) assumes that the indirect indicator follows a function of the corresponding direct indicator, i.e. $\psi(x_t)$, plus the observation noise $\varepsilon_t$. In this chapter, $\psi(x_t)$ follows a power formulation, i.e.

$$\psi(x_t) = \kappa \cdot x_t^{\gamma}. \tag{3-4}$$

This power formulation can model various nonlinear relationships and has only two

parameters. Subsequently, the power formulation is an appropriate candidate to

model the nonlinear relationship between direct and indirect indicators. For some practical datasets, other formulations may be more appropriate. These formulations can also be treated by the algorithms proposed in this chapter. The formulation


selection methods for a practical dataset are discussed in Section 3.2.4. The

observation noise at different inspections is presumed to be independent and identically normally distributed, i.e. $\varepsilon_t \sim N(0, \sigma^2)$. Both the underlying health state process $x_t$ and the observation process $y_t$ are continuous in time and state. In addition, the underlying state process follows a non-stationary Gamma process. Consequently, the proposed Gamma-based state space model can operate without the discrete state, discrete time, linear, and Gaussian assumptions.
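To make the model concrete, the following sketch simulates one degradation sequence: a direct indicator with independent Gamma increments whose shape follows a power-law shape function, and an indirect indicator obtained from a power function of the direct indicator plus Gaussian noise. The symbol names (`alpha`, `beta`, `lam`, `kappa`, `gam`, `sigma`), the zero initial state at time zero, and the mapping of the numerical values to individual parameters are assumptions made for illustration.

```python
import random


def simulate_gamma_state_space(alpha, beta, lam, kappa, gam, sigma, times, seed=0):
    """Simulate direct (x) and indirect (y) indicators at the given times.

    Assumes x = 0 at t = 0, shape function eta(t) = alpha * t**beta,
    scale lam, and observation y = kappa * x**gam + N(0, sigma**2) noise.
    """
    rng = random.Random(seed)
    x, t_prev, xs, ys = 0.0, 0.0, [], []
    for t in times:
        d_eta = alpha * (t ** beta - t_prev ** beta)   # shape of the increment
        x += rng.gammavariate(d_eta, lam)              # Gamma(shape, scale) increment
        xs.append(x)
        ys.append(kappa * x ** gam + rng.gauss(0.0, sigma))
        t_prev = t
    return xs, ys


# Hourly inspections over 200 hours; direct indicator kept every 20 hours only.
times = list(range(1, 201))
xs, ys = simulate_gamma_state_space(0.8, 1.0, 0.1, 1.5, 1.2, 0.5, times)
observed_direct = xs[19::20]
```

The simulated direct indicator is monotonically increasing, while the indirect indicator fluctuates around the power curve because of the observation noise.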

The degradation indicator observations used in this chapter are denoted as follows.

Only a single sequence of degradation indicators is considered to make the

formulations in parameter estimation algorithm more concise and understandable.

The formulations for multiple indicator sequences can be established by extending

the formulations developed in this chapter without any theoretical difficulties.

Inspection times are assumed to be $t_{1:n} = \{t_i;\ i = 1, 2, \ldots, n\}$, where $n$ is the number of inspections. The direct and indirect indicators at the $i$th inspection are denoted as $x_i$ and $y_i$, respectively. Only a part of the direct indicators $x_{1:n} = \{x_i;\ i = 1, 2, \ldots, n\}$ is assumed to be observable due to the difficulties of measurement, while the indirect indicators $y_{1:n} = \{y_i;\ i = 1, 2, \ldots, n\}$ are all known. A function $o(\cdot)$ given by

$$o(i) = \begin{cases} 0, & x_i \text{ is not observable} \\ 1, & x_i \text{ is observable} \end{cases} \tag{3-5}$$

is used to indicate the observability of a direct indicator. The inspection index of the $j$th observable direct indicator is denoted as $k_j$, $j = 1, \ldots, m$, where $m$ is the number of observable direct indicators. Obviously, $o(k_j) = 1$, $j = 1, \ldots, m$. The direct indicator is assumed to be observable at least at one inspection time, i.e., $m > 0$.

3.2.2 Parameter Estimation

The EM algorithm is adopted to estimate the parameters of the Gamma-based state

space model. Three issues need to be addressed to perform the EM algorithm.

Firstly, the state process of the Gamma-based state space model does not follow the

linear and Gaussian assumptions. Consequently, the expectation of the complete


likelihood function, the marginal likelihood function, and the variance-covariance

matrix of parameter estimates cannot be calculated analytically. Motivated by (Kim

2005), this research uses the particle filter and smoother that are based on Monte

Carlo simulations to deal with this non-Gaussian and non-linear situation. The

second issue is the combination of observable direct indicators into the marginal

likelihood function. These observable direct indicators are brought in by the particle

filter and smoother during the E step of the EM algorithm. The last issue is

enhancing the efficiency of the time-consuming Monte Carlo-based EM algorithm.

This issue is addressed by dividing the EM algorithm into two stages with different

numbers of particles, and improving the convergence checking strategy for the EM

iterations.

The whole process of the EM algorithm can be divided into four steps. The first step

estimates initial parameters. Inappropriate initial parameters may cause the final

optimisation result to become trapped in a local maximum point, or even make the

EM algorithm divergent (Wu 1983). The second step, namely the E step, estimates

the expectation of the complete likelihood function. Subsequently, a new set of

parameters are obtained by maximising the expected complete likelihood function

during the M step. The final step checks the convergence of the EM loop. If the

convergence condition is satisfied, the final result of parameter estimation is

obtained. Otherwise, an additional EM iteration begins. These four steps are

discussed in detail as follows:

3.2.2.1 Initial parameter identification

According to the assumptions of this research, direct indicators are known at some

inspection time points. The complete likelihood function can be established based on

these observable direct indicators : ; 1,2, … , and their

corresponding indirect indicators : ; 1,2, … , . Subsequently, initial

parameters can be obtained by maximising this complete likelihood function based

on : , and : .


3.2.2.2 E step

The E step estimates the expectation of the complete likelihood function that can be

decomposed into two components as

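$$\log p(x_{1:n}, y_{1:n} \mid \theta) = \log p(x_{1:n} \mid \theta_1) + \log p(y_{1:n} \mid \theta_2, x_{1:n}). \tag{3-6}$$

In Equation (3-6), the notations $\theta$, $\theta_1$, and $\theta_2$ denote the vectors of parameters to estimate, where $\theta = \{\alpha, \beta, \lambda, \kappa, \gamma, \sigma\}$, $\theta_1 = \{\alpha, \beta, \lambda\}$, and $\theta_2 = \{\kappa, \gamma, \sigma\}$. The expectations of the two components of Equation (3-6) can be further calculated as

$$E\big[\log p(x_{1:n} \mid \theta_1)\big] = E\Big[\log \prod_{i=2}^{n} Ga(\Delta x_i;\ \Delta\eta_i, \lambda)\Big] = \sum_{i=2}^{n} \Big[ -\Delta\eta_i \log\lambda - \log\Gamma(\Delta\eta_i) + (\Delta\eta_i - 1)\, E(\log \Delta x_i) - E(\Delta x_i)/\lambda \Big] \tag{3-7}$$

and

$$E\big[\log p(y_{1:n} \mid \theta_2, x_{1:n})\big] = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \Big[ y_i^2 - 2\kappa\, y_i\, E(x_i^{\gamma}) + \kappa^2 E(x_i^{2\gamma}) \Big], \tag{3-8}$$

respectively, where $\Delta x_i = x_i - x_{i-1}$, $i = 2, 3, \ldots, n$, and $\Delta\eta_i = \eta(t_i) - \eta(t_{i-1})$, $i = 2, 3, \ldots, n$. The four expected values (i.e., $E(x_i^{\gamma})$, $E(x_i^{2\gamma})$, $E(\Delta x_i)$, and $E(\log \Delta x_i)$) in Equations (3-7) and (3-8) are estimated through the particle smoothing algorithm.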

To perform the particle filter and smoother, the state process of the Gamma-based

state space model should be identified first. According to the Gamma Bridge

property (van Noortwijk 2009), the system state process changes from the Gamma

process to a hybrid stochastic process after the observations of direct indicators are

considered. The hybrid stochastic process can be written as

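$$p(x_i \mid x_{1:i-1}, x_{k_{1:m}}) = \begin{cases} p(x_i \mid x_{i-1}, x_{u_i}) = \dfrac{1}{x_{u_i} - x_{i-1}}\, Be\!\left(\dfrac{x_i - x_{i-1}}{x_{u_i} - x_{i-1}};\ \eta(t_i) - \eta(t_{i-1}),\ \eta(t_{u_i}) - \eta(t_i)\right), & \text{if a direct indicator is observable after } t_i, \\[2ex] p(x_i \mid x_{i-1}) = Ga\big(x_i - x_{i-1};\ \eta(t_i) - \eta(t_{i-1}),\ \lambda\big), & \text{otherwise}, \end{cases} \tag{3-9}$$

where $u_i$ is the inspection index of the next observable direct indicator given the current inspection index $i$, i.e. $u_i = \min\{l;\ l > i,\ o(l) = 1\}$, and $Be(\cdot;\ a, b)$ denotes the PDF of the Beta distribution with shape parameters $a$ and $b$.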

The derivation process of Equation (3-9) is given in the Appendix. When the state

process follows other stochastic processes, the corresponding hybrid stochastic

processes considering observable direct indictors can be calculated by a similar

process.

The formulations of the original particle filter and smoother are modified according

to the posterior stochastic process given by Equation (3-9). The weights of filtering

particles are updated recursively according to

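$$w_i^j \propto \begin{cases} w_{i-1}^j \cdot \dfrac{p(y_i \mid x_i^j)\, p(x_i^j \mid x_{i-1}^j, x_{u_i})}{q(x_i^j \mid x_{i-1}^j, x_{u_i}, y_i)}, & \text{if a direct indicator is observable after } t_i, \\[2ex] w_{i-1}^j \cdot \dfrac{p(y_i \mid x_i^j)\, p(x_i^j \mid x_{i-1}^j)}{q(x_i^j \mid x_{i-1}^j, y_i)}, & \text{otherwise}, \end{cases} \tag{3-10}$$

where $u_i$ denotes the inspection index of the next observable direct indicator. The two importance density functions, i.e. $q(x_i \mid x_{i-1}, x_{u_i}, y_i)$ and $q(x_i \mid x_{i-1}, y_i)$, follow the two equations in Equation (3-9), i.e. $p(x_i \mid x_{i-1}, x_{u_i})$ and $p(x_i \mid x_{i-1})$. At the time points $t_{k_{1:m}} = \{t_{k_j};\ j = 1, 2, \ldots, m\}$, when direct indicators are known, all the particles are simply set to the values of observable direct indicators $x_{k_{1:m}}$. At the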

other time steps, the SIR algorithm is performed. Similarly, the recursive weights

evaluation equation for smoothing particles is modified to

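$$w_{i|i+1}^{j} = \begin{cases} \dfrac{w_i^j\, p(\tilde{x}_{i+1} \mid x_i^j, x_{u_{i+1}})}{\sum_{l=1}^{N_s} w_i^l\, p(\tilde{x}_{i+1} \mid x_i^l, x_{u_{i+1}})}, & \text{if a direct indicator is observable after } t_{i+1}, \\[2ex] \dfrac{w_i^j\, p(\tilde{x}_{i+1} \mid x_i^j)}{\sum_{l=1}^{N_s} w_i^l\, p(\tilde{x}_{i+1} \mid x_i^l)}, & \text{otherwise}. \end{cases} \tag{3-11}$$

The two components in Equation (3-11), $p(\tilde{x}_{i+1} \mid x_i^j, x_{u_{i+1}})$ and $p(\tilde{x}_{i+1} \mid x_i^j)$, follow the two equations in Equation (3-9). At times $t_i$, when $o(i) = 1$ or $o(i+1) = 1$, the results of particle filtering are directly taken as the results of particle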

smoothing. At the other time points, particle smoothing is carried out by the

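backwards simulation algorithm. Finally, $N_s$ sequences of samples are generated, i.e. $\tilde{x}_{1:n}^{1:N_s} = \{\tilde{x}_i^j;\ i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N_s\}$. Using $\tilde{x}_{1:n}^{1:N_s}$, the expected values in Equations (3-7) and (3-8) are calculated according to

$$\begin{aligned} E(x_i^{\gamma}) &\approx \frac{1}{N_s}\sum_{j=1}^{N_s} (\tilde{x}_i^j)^{\gamma}, & E(x_i^{2\gamma}) &\approx \frac{1}{N_s}\sum_{j=1}^{N_s} (\tilde{x}_i^j)^{2\gamma}, \\ E(\Delta x_i) &\approx \frac{1}{N_s}\sum_{j=1}^{N_s} \Delta\tilde{x}_i^j, & E(\log \Delta x_i) &\approx \frac{1}{N_s}\sum_{j=1}^{N_s} \log \Delta\tilde{x}_i^j. \end{aligned} \tag{3-12}$$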
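The filtering idea can be sketched with a plain bootstrap SIR filter. This is a deliberately simplified version, not the modified filter described above: it assumes that no direct indicator is observable (so only the Gamma transition is used as the importance density), fixes the shape exponent to one, and uses hypothetical parameter names. It returns the filtered means of the direct indicator and a Monte Carlo estimate of the log marginal likelihood.

```python
import math
import random


def sir_filter(ys, times, alpha, lam, kappa, gam, sigma, n_particles=500, seed=1):
    """Bootstrap SIR filter for Gamma increments and a power-law observation."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles          # assumes x = 0 at t = 0
    t_prev, means, loglik = 0.0, [], 0.0
    for y, t in zip(ys, times):
        d_eta = alpha * (t - t_prev)         # shape increment (linear shape function)
        particles = [x + rng.gammavariate(d_eta, lam) for x in particles]
        # Gaussian observation log-likelihood without its normalising constant.
        logw = [-0.5 * ((y - kappa * x ** gam) / sigma) ** 2 for x in particles]
        top = max(logw)                      # subtract the maximum to avoid underflow
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        loglik += (top + math.log(total / n_particles)
                   - math.log(sigma * math.sqrt(2.0 * math.pi)))
        means.append(sum(wi * x for wi, x in zip(w, particles)) / total)
        # Multinomial resampling resets the weights to be equal.
        particles = rng.choices(particles, weights=w, k=n_particles)
        t_prev = t
    return means, loglik


# Synthetic data generated from the same (assumed) model.
rng = random.Random(0)
times = list(range(1, 101))
x, xs, ys = 0.0, [], []
for t in times:
    x += rng.gammavariate(0.8, 0.1)
    xs.append(x)
    ys.append(1.5 * x ** 1.2 + rng.gauss(0.0, 0.5))

means, loglik = sir_filter(ys, times, 0.8, 0.1, 1.5, 1.2, 0.5)
mae = sum(abs(m - xt) for m, xt in zip(means, xs)) / len(xs)
```

Pinning particles to observed direct indicators and sampling from the Beta-based bridge between them would be the additional steps needed to approach the filter described in this section.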

3.2.2.3 M step

After the expectation of the complete likelihood function is obtained by the E step,

the maximisation step (i.e. M step) can be carried out. Equations (3-7) and (3-8) are

optimised separately during the M step.

By calculating the partial derivative of Equation (3-7) with respect to the variable $\lambda$, the parameter $\lambda$ can be represented as:

$$\lambda = \sum_{i=2}^{n} E(\Delta x_i) \Big/ \sum_{i=2}^{n} \Delta\eta_i. \tag{3-13}$$

A new equation with parameters $\alpha$ and $\beta$ can be established by substituting Equation (3-13) into Equation (3-7). Estimates of parameters $\alpha$ and $\beta$ can then be achieved by maximising the new equation using a multivariate optimisation algorithm. Estimates of $\alpha$ and $\beta$ from the last EM iteration can be used as the initial values for the multivariate optimisation algorithm. According to the two estimates $\hat{\alpha}$ and $\hat{\beta}$, the parameter estimate $\hat{\lambda}$ can be calculated using Equation (3-13).

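Similarly, by calculating partial derivatives of Equation (3-8), relationships between parameters can be obtained as:

$$\kappa = \sum_{i=1}^{n} y_i\, E(x_i^{\gamma}) \Big/ \sum_{i=1}^{n} E(x_i^{2\gamma}), \tag{3-14}$$

and

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \Big[ y_i^2 - 2\kappa\, y_i\, E(x_i^{\gamma}) + \kappa^2 E(x_i^{2\gamma}) \Big]. \tag{3-15}$$

A new equation with the parameter $\gamma$ can be established by substituting Equations (3-14) and (3-15) into Equation (3-8). The parameter estimate $\hat{\gamma}$ can then be obtained by maximising the new equation. Subsequently, the parameter estimates $\hat{\kappa}$ and $\hat{\sigma}$ can be calculated according to Equations (3-14) and (3-15).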
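The closed-form part of the M step can be sketched as follows, assuming Gamma-distributed increments with a common scale and a Gaussian observation noise around a power function of the state. Under these assumptions, setting the partial derivatives to zero gives the scale as the ratio of summed expected increments to summed shape increments, the observation coefficient as a ratio of smoothed moments, and the noise variance as the mean expected squared residual. All argument names are hypothetical, and the observation exponent is assumed to be fixed at its current value.

```python
import math


def m_step_closed_form(y, e_xg, e_x2g, e_dx, d_eta):
    """Closed-form updates given smoothed expectations.

    y      : indirect indicators y_i
    e_xg   : E(x_i ** g) for the current exponent g
    e_x2g  : E(x_i ** (2 * g))
    e_dx   : E(x_i - x_{i-1}) for consecutive inspections
    d_eta  : shape increments eta(t_i) - eta(t_{i-1})
    """
    scale = sum(e_dx) / sum(d_eta)
    coeff = sum(yi * m for yi, m in zip(y, e_xg)) / sum(e_x2g)
    noise_var = sum(yi * yi - 2.0 * coeff * yi * m1 + coeff * coeff * m2
                    for yi, m1, m2 in zip(y, e_xg, e_x2g)) / len(y)
    return scale, coeff, math.sqrt(max(noise_var, 0.0))


# Degenerate check: states known exactly (x = 1, 2, 3), exponent 1, y = 2 x.
scale, coeff, noise_sd = m_step_closed_form(
    y=[2.0, 4.0, 6.0], e_xg=[1.0, 2.0, 3.0], e_x2g=[1.0, 4.0, 9.0],
    e_dx=[1.0, 1.0], d_eta=[2.0, 2.0])
```

In a full M step these updates would sit inside a numerical search over the remaining parameters, as described above.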


3.2.2.4 Convergence checking

The EM algorithm essentially maximises the marginal likelihood function.

Therefore, the increment of the marginal likelihood function is a commonly used

indicator for the convergence of the EM algorithm. However, the computation of the

marginal likelihood function of the Gamma-based state space model involves

recursive Monte Carlo sampling, which cannot be performed efficiently. An

alternative method is to calculate the relative likelihood function (Kim 2005). This

calculation is based on the result of the particle smoother in the E step with no

additional Monte Carlo sampling being required. Consequently, evaluating the

relative likelihood function is a more efficient solution than directly calculating the

marginal likelihood function. The relative likelihood function of the Gamma-based

state space model is given by

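$$\log\frac{p_{\theta^{(r)}}(y_{1:n}, x_{k_{1:m}})}{p_{\theta^{(r-1)}}(y_{1:n}, x_{k_{1:m}})} = \log E_{\theta^{(r-1)}}\!\left[\frac{p_{\theta^{(r)}}(x_{1:n}, y_{1:n})}{p_{\theta^{(r-1)}}(x_{1:n}, y_{1:n})} \,\middle|\, y_{1:n}, x_{k_{1:m}}\right] \approx \log \frac{1}{N_s}\sum_{j=1}^{N_s} \frac{p_{\theta^{(r)}}(\tilde{x}_{1:n}^{j,(r-1)}, y_{1:n})}{p_{\theta^{(r-1)}}(\tilde{x}_{1:n}^{j,(r-1)}, y_{1:n})}, \tag{3-16}$$

where $\tilde{x}_{1:n}^{j,(r-1)}$ is the $j$th sequence of smoothing particles during the $(r-1)$th EM loop, and $p_{\theta^{(r)}}(\cdot)$ denotes the PDF about direct and indirect indicators calculated using the parameters estimated at the $r$th EM loop.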
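A numerically stable way to evaluate such a quantity is to work with log densities and apply the log-sum-exp trick. The sketch below assumes that the complete-data log density of each smoothed particle sequence has already been evaluated under the new and the previous parameter estimates; the function and argument names are hypothetical.

```python
import math


def relative_log_likelihood(logp_new, logp_old):
    """log of the average density ratio over smoothed particle sequences.

    logp_new[j], logp_old[j]: complete-data log densities of sequence j
    under the new and the previous parameter estimates.
    """
    diffs = [a - b for a, b in zip(logp_new, logp_old)]
    top = max(diffs)   # factor out the largest ratio before exponentiating
    return top + math.log(sum(math.exp(d - top) for d in diffs) / len(diffs))


unchanged = relative_log_likelihood([-10.0, -12.0], [-10.0, -12.0])
doubled = relative_log_likelihood([math.log(2.0) - 5.0, math.log(2.0) - 7.0],
                                  [-5.0, -7.0])
```

When the parameters do not change between two EM loops, every ratio equals one and the relative likelihood is zero.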

Theoretically, the relative likelihood function converges to zero. However, the EM

algorithm used in this research is based on Monte Carlo approximation, and the

relative likelihood function itself is estimated using a Monte Carlo method.

Therefore, the relative likelihood function does not decrease smoothly to zero.

Rather, according to the simulation study, it decreases with an obvious trend during

the first several iterations and then becomes more fluctuating. Following on, if the

number of particles used in the EM algorithm increases, the relative likelihood

function decreases again and fluctuates within a smaller range closer to zero.


Therefore, the fluctuating value of the relative likelihood function indicates the

convergence of the EM algorithm for the current number of particles.

An alternative method to check the convergence of the EM algorithm is to directly

monitor the trend of parameter estimates. When EM iterations converge, parameter

estimates finally fluctuate within a certain range. This range becomes smaller when

more particles are used. Therefore, the development of parameter estimates is also

an indicator for the convergence of the EM algorithm. However, it usually takes a

relatively larger number of EM iterations to detect that parameter estimates are

fluctuating without an apparent trend.

Following the idea of (Kim 2005), this research develops a two-stage EM

algorithm to strike a balance between the efficiency and accuracy of parameter

estimation. During the first stage, 1,000 particles are used. The development

processes of parameter estimates are used as the indicator of convergence, because

the relative likelihood function calculated using 1,000 particles contains relatively

larger errors. Another reason is that EM iterations using 1,000 particles are still

efficient, and a small number of additional iterations do not cause a drop in overall

efficiency. In the second stage, 2,000 particles are adopted. At this stage, the relative

likelihood function – calculated using 2,000 particles – is preferred because it is

more accurate. Thus, the parameters estimated at the last loop of the second stage are

taken as the final results. The number of particles was chosen after some simulation

experiments that are discussed in Section 3.3.1.

3.2.3 Variance-Covariance Matrix of the Parameter Estimates

After the parameters of the Gamma-based state space model are estimated, the

variance-covariance matrix should be calculated to obtain the confidence intervals of

parameter estimates. Kim gives a method to calculate the Variance-Covariance

matrix via particle smoothing (Kim 2005). According to Kim’s method, the

observed information matrix can be written as


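$$I(\hat{\theta};\ y_{1:n}, x_{k_{1:m}}) \approx -\frac{1}{N_s}\sum_{j=1}^{N_s} \frac{\partial^2 \log p_{\theta}(\tilde{x}_{1:n}^{j}, y_{1:n})}{\partial\theta\,\partial\theta^{T}} - \frac{1}{N_s}\sum_{j=1}^{N_s} \frac{\partial \log p_{\theta}(\tilde{x}_{1:n}^{j}, y_{1:n})}{\partial\theta} \cdot \frac{\partial \log p_{\theta}(\tilde{x}_{1:n}^{j}, y_{1:n})}{\partial\theta^{T}} + \left[\frac{1}{N_s}\sum_{j=1}^{N_s} \frac{\partial \log p_{\theta}(\tilde{x}_{1:n}^{j}, y_{1:n})}{\partial\theta}\right] \left[\frac{1}{N_s}\sum_{j=1}^{N_s} \frac{\partial \log p_{\theta}(\tilde{x}_{1:n}^{j}, y_{1:n})}{\partial\theta^{T}}\right], \tag{3-17}$$

where all derivatives are evaluated at $\theta = \hat{\theta}$ and $\tilde{x}_{1:n}^{j}$ is the $j$th sequence of smoothing particles. Once the observed information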

matrix is calculated, the variance-covariance matrix can be obtained by taking the

inverse of it.
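For a single scalar parameter, this type of Monte Carlo information estimate reduces to three sample averages over the smoothed particle sequences: the negative mean second derivative, minus the mean squared score, plus the squared mean score. The sketch below shows only this scalar case with made-up derivative values; the multi-parameter case replaces the scalars by matrices and the reciprocal by a matrix inverse.

```python
def observed_information(scores, hessians):
    """Scalar Monte Carlo estimate of the observed information.

    scores[j]   : first derivative of the complete-data log density for sequence j
    hessians[j] : second derivative for sequence j
    """
    n = len(scores)
    mean_hessian = sum(hessians) / n
    mean_score = sum(scores) / n
    mean_sq_score = sum(s * s for s in scores) / n
    return -mean_hessian - mean_sq_score + mean_score ** 2


# Hypothetical derivative values for two particle sequences.
info = observed_information(scores=[0.5, -0.5], hessians=[-2.0, -2.0])
variance = 1.0 / info   # scalar analogue of inverting the information matrix
```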

3.2.4 Model Selection

In reality, a degradation dataset can be often fitted by state space models with

different formulations. Among these candidate formulations, the one with the best

fitness result can be identified by using a model selection criterion.

Some model selection methods have been proposed in the literature. The first type is sequential null hypothesis methods, which allow variables to be added or deleted at each step. These methods are mainly used to deal with nested models, and their results may depend on the subjectively chosen levels. When a model is used for estimation and prediction, a more straightforward solution is comparing the mean square error (MSE) of the results derived by candidate models. This approach is effective and easy to carry out. However, a large number of samples is required to reach a confident conclusion. Some methods, e.g. cross-validation, have been proposed to apply this approach when the dataset is limited. However, it is still computationally intensive to derive the MSE many times, which is required when the MSEs derived by two models are similar to each other. Two more quantitative and efficient

model selection methods are Akaike's information criterion (AIC) and Bayesian

information criterion (BIC). AIC, proposed by Akaike in 1974 (Akaike 1974), is

developed based on the information theory. On the other hand, BIC proposed by

Schwarz is based on Bayesian theory (Schwarz 1978). AIC and BIC have been used in some papers on degradation modelling (Park and Padgett 2005a; Park and Padgett


2006). AIC and BIC are both based on an important assumption: the candidate

models should use the same dataset, and AIC further requires a large sample size.

Cavanaugh and Shumway summarised and compared the results of model selection

criteria in dealing with the state space model (Cavanaugh and Shumway 1997). A

new bootstrap variant of AIC was developed in their research to deal with small sample sizes. This bootstrap variant (Cavanaugh and Shumway 1997) may give better results for state space models. However, its computational intensity makes it impractical for the proposed non-Gaussian state space model.

In this chapter, the choice of formulations for different components in the state space

model is conducted using the Akaike's information criterion with a second order

correction (AICc) (Cavanaugh and Shumway 1997). The AICc is a relative measure

of lost information when a given model is used to describe a real dataset. A smaller

value of AICc indicates a better fitness result. Compared with the commonly used

AIC, the AICc is more effective for a small sample size. As illustrated in

$$AICc = 2k - 2\log L + \frac{2k(k+1)}{n - k - 1}, \tag{3-18}$$

the AICc considers the value of the marginal likelihood function $L$, the parameter number $k$, and the sample size $n$. For a given dataset, a model with a high likelihood

and a small number of parameters is preferable.
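The criterion is straightforward to compute once the log marginal likelihood is available. The sketch below uses the standard second-order corrected form of AIC; the log-likelihood values in the example are hypothetical and serve only to show that an extra parameter must buy a sufficient likelihood gain.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Second-order corrected AIC; smaller values indicate a better fit."""
    k, n = n_params, n_samples
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n_samples > n_params + 1")
    return 2.0 * k - 2.0 * log_likelihood + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison of two candidate formulations on 200 samples.
score_five = aicc(-100.0, 5, 200)
score_six_same_fit = aicc(-100.0, 6, 200)    # extra parameter, no likelihood gain
score_six_better_fit = aicc(-97.0, 6, 200)   # extra parameter, clear likelihood gain
```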

The model selection criteria, i.e., BIC, AIC, AICc, all require the value of the

marginal likelihood function. For the Gamma-based state space model the marginal

likelihood function cannot be calculated analytically. An algorithm based on the

particle filter is developed in this section to obtain the marginal likelihood function

of the Gamma-based state space model.

The marginal likelihood function of the Gamma-based state space model is given by

: , :

| ∏ : , , , : , ,, (3-19)


where is the inspection index of the last observable direct indicator given the

current inspection index , max ; , 1 . Due to the non-

Gaussian non-linear property of the Gamma-based state space model, the conditional

PDFs at the right side of the equality sign in Equation (3-19) are calculated by

particle filtering.

Unlike the particle filter developed during the E step of the EM algorithm, the

particle filter used to estimate the conditional PDFs in Equation (3-19) does not

depend on the direct indicators after the current inspection time. Subsequently, the

importance density function is given by

, ; , . (3-20)

After filtering, the PDFs in Equation (3-19) can be calculated by

: , , | , : , ,

∑ · ; 0, (3-21)

and

, : , , | , : , ,

· ; 0, · ∑ ; · ,, (3-22)

using the filtering results :: and :

: , where : represent the samples

generated from the prior density function : , , :

3.2.5 Monte Carlo-Based Lifetime Prediction

In this chapter, a failure is assumed to happen when a direct indicator crosses a

failure threshold $\xi$. Two situations are considered here: one is that failures can be

detected during inspections; the other is that failures are unobservable during

inspections, which can happen when failures do not cause immediate breakdowns or

sharp changes of indirect indicators.


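When a failure is observable, the survival function of the lifetime $T$ (i.e., the complement of the lifetime CDF) is given by

$$\Pr(T > t \mid y_{1:c}, T > t_c) = \int \Pr(T > t \mid x_c)\, p(x_c \mid y_{1:c}, T > t_c)\, dx_c, \tag{3-23}$$

which consists of two components. The first component is the PDF of the current direct indicator $x_c$, given the observations of indirect indicators up to the current time and the fact that a failure has not yet happened; i.e. $p(x_c \mid y_{1:c}, T > t_c)$, where $c$ denotes the current inspection index. The PDF $p(x_c \mid y_{1:c}, T > t_c)$ can be obtained by particle filtering. After filtering, $p(x_c \mid y_{1:c}, T > t_c)$ is represented by a set of particles $x_c^{1:N_s}$. The second component is the conditional survival function given the current direct indicator $x_c$, which is obtained as

$$\Pr(T > t \mid x_c) = \Pr(x_t < \xi \mid x_c) = I(x_c < \xi) \cdot \gamma\big(\eta(t) - \eta(t_c),\ (\xi - x_c)/\lambda\big) \big/ \Gamma\big(\eta(t) - \eta(t_c)\big) \tag{3-24}$$

according to the properties of the Gamma process, where $\xi$ is the failure threshold, $\gamma(\cdot, \cdot)$ is the lower incomplete Gamma function, and $I(\cdot)$ is the indicator function given by

$$I(x_c < \xi) = \begin{cases} 0, & x_c \geq \xi \\ 1, & x_c < \xi \end{cases}. \tag{3-25}$$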
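The conditional survival probability of a Gamma process can be averaged over filtering particles as in the sketch below. It assumes equally weighted particles, a failure when the direct indicator reaches a fixed threshold, and a series expansion of the regularised lower incomplete Gamma function that is adequate for moderate arguments; all names and numbers are hypothetical.

```python
import math


def reg_lower_gamma(a, x, max_terms=500):
    """Regularised lower incomplete Gamma function P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def survival_probability(particles, d_eta, scale, threshold):
    """Average over particles of Pr(Gamma increment < threshold - particle)."""
    probs = [reg_lower_gamma(d_eta, (threshold - x) / scale) if x < threshold else 0.0
             for x in particles]
    return sum(probs) / len(probs)


# Shape increment 1 makes the increment exponential, so the answer is 1 - exp(-2).
s_alive = survival_probability([1.0, 1.0], d_eta=1.0, scale=1.0, threshold=3.0)
s_failed = survival_probability([5.0], d_eta=1.0, scale=1.0, threshold=3.0)
```

Evaluating the function on a grid of future times, with the shape increment growing with time, traces out a survival curve of the kind discussed in this section.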

After substituting Equation (3-24) into Equation (3-23), and using the result of

particle filtering, the survival function is obtained as

Pr | : , Λ Pr Λ Λ | | :

∑ Pr Λ Λ |. (3-26)

When $\eta(t)$ is differentiable, the conditional PDF of the lifetime can be calculated as

| : , Λ ∑

ln. (3-27)

When a failure is unobservable, the survival function becomes a piecewise function

given by


| : | :

∑,

∑∞

∑B ; ,

B ,

·∑

0

. (3-28)

Equation (3-28) is constructed using the results of particle smoothing, i.e. :: .

After the current inspection time , the survival function is similar to Equation

(3-26). Before , the survival function is inferred from smoothing particles.

Between two known points, the sample points of a Gamma process follow the Beta

distribution. According to the characteristics of the Beta distribution, the second

equation of Equation (3-28) is obtained, where denotes the next inspection

index given the time , Be , 1 is the Beta function, and

Be ; , 1 is the incomplete Beta function.

The lifetime PDF is also divided into two components by the current inspection

time . After , the lifetime PDF is similar to Equation (3-27); before , the

lifetime PDF is approximated by calculating the average values of inspection

intervals:

| : | :

∑ ·

log∞

∑ ∑

·0

.(3-29)

3.3 Simulation Study

To demonstrate the implemented process and the performance of the proposed

algorithms, a simulation study was conducted. When a modest sized training sample


was used, the standard deviation of the parameter estimate in the shape function

given by Equation (3-3) was significant. Moreover, the variance-covariance matrix

of parameter estimates showed that the estimates and were highly correlated.

Therefore, and cannot be regarded as unknown simultaneously for average sized

training data. This chapter only considers the situation when is fixed to one.

The parameters of the Gamma-based state space model investigated in this

simulation study were set as $(0.8,\ 0.1,\ 1.5,\ 1.2,\ 0.5)$.

The simulation data were assumed to be collected from a test lasting 200 hours. The

sampling intervals of direct and indirect indicators were 20 hours and one hour,

respectively. A sequence of simulated data was plotted in Figure 3-1.

Figure 3-1: The simulated indirect indicators and direct indicators


First of all, initial parameters were estimated using observable direct indicators and

the corresponding indirect indicators. In this situation, only 10 of 200 indirect

indicators were used. The initial parameters were then obtained as


$(0.8401,\ 0.0956,\ 1.4847,\ 1.2024,\ 0.3174)$.

Following on, the EM iterations were conducted in two stages. In the first stage,

1,000 particles were used. The EM loop converged after 71 iterations. In the second

stage, 2,000 particles were used for a better result. The EM loop converged after

eight additional iterations. The convergence processes of different parameter

estimates are presented in Figure 3-2 which shows that parameter estimates became

less fluctuating when 2,000 particles were used. The final results of the parameter

estimation were obtained as

$(0.7893,\ 0.1179,\ 1.5221,\ 1.1938,\ 0.4878)$.

The variance-covariance matrix of the parameter estimates was then calculated as

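$$\Sigma = \begin{pmatrix} 1.6255 & 1.0011 & 0.2171 & 0.0571 & 0.1071 \\ 1.0011 & 0.8298 & 0.1434 & 0.0393 & 0.0640 \\ 0.2171 & 0.1434 & 2.2141 & 0.7738 & 0.0194 \\ 0.0571 & 0.0393 & 0.7738 & 0.2758 & 0.0073 \\ 0.1071 & 0.0640 & 0.0194 & 0.0073 & 0.6991 \end{pmatrix} \times 10^{-3}.$$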

According to the variance-covariance matrix, the standard deviations of the

parameter estimates were:

$$\sqrt{\operatorname{diag}(\Sigma)} = (0.0403,\ 0.0288,\ 0.0471,\ 0.0166,\ 0.0264).$$

The estimation result shows that the proposed EM algorithm has the power to

recover unknown parameters.
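The reported standard deviations can be checked directly from the diagonal of the variance-covariance matrix, taking the diagonal entries as multiples of $10^{-3}$. The sketch below also forms approximate 95% intervals using a normal approximation, which is an assumption added here for illustration rather than a step stated in the text.

```python
import math

# Diagonal of the reported variance-covariance matrix (entries times 1e-3).
diag_cov = [1.6255e-3, 0.8298e-3, 2.2141e-3, 0.2758e-3, 0.6991e-3]
estimates = [0.7893, 0.1179, 1.5221, 1.1938, 0.4878]
true_values = [0.8, 0.1, 1.5, 1.2, 0.5]

std_devs = [math.sqrt(v) for v in diag_cov]

# Normal-approximation 95% intervals (an illustrative assumption).
intervals = [(e - 1.96 * s, e + 1.96 * s) for e, s in zip(estimates, std_devs)]
covered = [low <= t <= high for (low, high), t in zip(intervals, true_values)]
```

Under this approximation, each interval contains the parameter value used to generate the simulated data.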


Figure 3-2: The development of the parameter estimates

To investigate the performance of the two-stage EM algorithm, ten additional

degradation sequences were generated. The generated simulation data were

processed by EM algorithms using six different strategies, i.e., a single-stage

strategy with 1000 particles, a single-stage strategy with 1500 particles, a single-

stage strategy with 2000 particles, a two-stage strategy with 1000 and 1500 particles,

a two-stage strategy with 1000 and 2000 particles, and a two-stage strategy with

1500 and 2000 particles. Originally, 500 particles were also considered; this made

EM loops more efficient. However, the small population of particles caused EM

iterations to sometimes diverge. In contrast, 1,000 particles made EM loops more

stable, and the elapsed time of EM iterations was still satisfactory. Similarly, it is

possible to consider more than 2,000 particles. However, additional particles could

not improve parameter estimation results significantly while the efficiency of EM

loops dropped considerably.

The three single-stage strategies used the relative likelihood function as the convergence criterion. The simulation study was carried out on a laptop computer with an Intel T2400 processor and 1 GB of memory. The elapsed times and the mean likelihood function values of the last three EM iterations are recorded in Table 3-1. The results show that the two-stage strategy with 1000 and 2000 particles has the smallest relative likelihood value, which indicates a better parameter estimation result. In addition, the two-stage strategy with 1000-2000 particles consumes less time than the single-stage strategy with 2000 particles and the two-stage strategy with 1500-2000 particles. In practice, degradation models are often trained offline using historical data, so the efficiency requirement is relatively low. In this situation, a strategy that derives better parameter estimates is preferred. Therefore, the two-stage EM algorithm with 1000-2000 particles, which has the smallest mean relative likelihood value, is adopted.

Table 3-1: The mean likelihood function values and the elapsed times of six different strategies

Number of stages    Number of particles    Relative likelihood function (10^-3)    Elapsed time (seconds)
Single              1000                   5.816                                   1111
Single              1500                   5.094                                   1872
Single              2000                   3.296                                   3257
Two                 1000-2000              2.590                                   2648
Two                 1000-1500              3.271                                   2505
Two                 1500-2000              3.044                                   4509

3.3.2 Performance Investigation

This section demonstrates the advantages of the proposed EM algorithm and the

state space model by comparing three approaches to estimating direct indicators. In

the first approach, the parameters of the state space model were identified by the EM

algorithm, and then the direct indicators were estimated by the particle filter. In the

second and the third approaches, the model parameters were both estimated by the

maximum likelihood method which only considers observable direct indicators with

their corresponding indirect indicators. Direct indicators were estimated using the


particle filter in the second approach while they were estimated only using the

observation equation (3-2) in the third approach.
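The particle filtering step shared by the first two approaches can be illustrated with a minimal bootstrap particle filter. The sketch below is not the thesis implementation. It assumes a Gamma process with a linear shape function for the state, a linear observation equation with Gaussian noise in place of Equation (3-2), and hypothetical parameter values (`alpha`, `beta`, `c`, `sigma`) chosen only for illustration.

```python
import math
import random


def simulate(n_steps, dt, alpha, beta, c, sigma, rng):
    """Simulate a Gamma-process state and noisy linear observations of it."""
    x, states, observations = 0.0, [], []
    for _ in range(n_steps):
        x += rng.gammavariate(alpha * dt, beta)  # non-negative increment
        states.append(x)
        observations.append(rng.gauss(c * x, sigma))
    return states, observations


def bootstrap_filter(observations, dt, alpha, beta, c, sigma, n_particles, rng):
    """Return the filtered mean of the hidden state after each observation."""
    particles = [0.0] * n_particles
    means = []
    for y in observations:
        # Propagate every particle through the Gamma state equation.
        particles = [p + rng.gammavariate(alpha * dt, beta) for p in particles]
        # Weight each particle by the Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((y - c * p) / sigma) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; fall back to equal weights
            weights, total = [1.0] * n_particles, float(n_particles)
        means.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Multinomial resampling to limit weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


rng = random.Random(0)
# Hypothetical parameter values, used for both simulation and filtering.
states, observations = simulate(50, 1.0, 0.8, 0.1, 1.5, 0.5, rng)
estimates = bootstrap_filter(observations, 1.0, 0.8, 0.1, 1.5, 0.5, 500, rng)
mse = sum((e - s) ** 2 for e, s in zip(estimates, states)) / len(states)
print(f"MSE of filtered state estimates: {mse:.4f}")
```

Multinomial resampling is used here only because it is the shortest to write; other resampling schemes would fit the same structure.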

Firstly, the simulated training data were generated. To investigate the effects of the observation noise, two Gamma-based state space models with different observation noise levels (0.5 and 0.05) were considered. The other parameters were the same as those in Section 3.3.1. For each of the two state space models, 60 sequences of simulated data with 200 indirect indicator observations were generated. To explore the effects of the quantity of observable direct indicators, the 60 simulated sequences were divided into three equal-sized groups with 5, 10, and 20 observable direct indicators, respectively. Then, 60 additional sequences of simulated data were generated for testing. The training and testing data were processed by the three approaches discussed above, and the estimates of the direct indicators in the testing dataset were obtained. To evaluate the effectiveness of the three methods, the mean square errors (MSE) of the direct indicator estimates were calculated. The MSE values for observation noise levels of 0.5 and 0.05 are shown in Figure 3-3 and Figure 3-4, respectively. In these two figures, MSE1 denotes the MSE of the direct indicator estimates obtained by the particle filter whose parameters were identified by the EM algorithm (i.e. the first approach). MSE2 denotes the MSE of the direct indicator estimates obtained by the particle filter whose parameters were estimated using observable direct indicators and their corresponding indirect indicators (i.e. the second approach). MSE3 denotes the MSE of the direct indicator estimates obtained by the observation equation whose parameters were estimated using observable direct indicators and their corresponding indirect indicators (i.e. the third approach).


Figure 3-3: MSEs of the direct indicator estimates when the observation noise is 0.5

Figure 3-4: MSEs of the direct indicator estimates when the observation noise is 0.05


In Figure 3-3 (observation noise 0.5), MSE2 is 42.41%, 57.62%, and 66.43% smaller than MSE3 for the three different numbers (i.e., 5, 10, and 20) of observable direct indicators. This indicates that the more accurate the parameter estimates, the more the particle filter improves on the estimates obtained from the observation equation alone. In contrast, MSE1 is 43.63%, 41.68%, and 23.04% smaller than MSE2 in the three situations. The decreasing difference between MSE1 and MSE2 shows that the EM algorithm achieves a larger performance improvement when fewer observations of the underlying health state are available.
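The percentages quoted above are relative reductions in MSE, and two successive reductions combine multiplicatively. The short sketch below illustrates the arithmetic; the function names are illustrative, and only the two percentages for five observable direct indicators are taken from the text.

```python
def relative_reduction(mse_new, mse_ref):
    """Percentage by which mse_new is smaller than mse_ref."""
    return 100.0 * (mse_ref - mse_new) / mse_ref


def combined_reduction(first_pct, second_pct):
    """Overall percentage reduction after two successive relative reductions."""
    remaining = (1.0 - first_pct / 100.0) * (1.0 - second_pct / 100.0)
    return 100.0 * (1.0 - remaining)


# MSE2 is 42.41% smaller than MSE3, and MSE1 is 43.63% smaller than MSE2
# (five observable direct indicators), so MSE1 relative to MSE3 is:
overall = combined_reduction(42.41, 43.63)
print(f"MSE1 is about {overall:.2f}% smaller than MSE3")
```

Under these figures, the first approach reduces the MSE of the observation-equation estimate by roughly two thirds when only five direct indicators are observable.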

In Figure 3-4 (observation noise 0.05), when 10 or 20 direct indicators were observable, MSE1 was slightly larger than MSE2. In this situation, considering only the indirect indicators whose corresponding direct indicators were observable gave better parameter estimation results. The reason is that the sample size of direct indicators was relatively large, and the errors introduced by the Monte Carlo algorithms in the E step were significant compared with the small observation noise (0.05). In practice, the number of observable direct indicators is usually limited due to the difficulties of measurement. Moreover, due to the uncertain relationship between direct and indirect indicators and the measurement errors of direct indicators, the noise in the observation equation is often significant in reality. Therefore, in most real case studies it is beneficial to use the proposed EM algorithm to consider additional indirect indicators that have no corresponding observable direct indicators.

In most situations of the simulation experiment, MSE2 was smaller than MSE3. Therefore, the particle filter based on the state space model can give more accurate estimates than those obtained using the observation equation alone. The only exception occurred when the observation noise was 0.05 and only five direct indicators with their corresponding indirect indicators were used. In this situation, the estimates derived by the particle filter were less accurate than those achieved using the observation equation only. The reason is as follows. The parameter estimates for the state equation can be poor when only five direct indicators are available. This problem is known as overfitting: in statistical modelling, it refers to a model with too many degrees of freedom relative to the training sample size. Overfitting also occurs when the direct and indirect indicators are used to estimate the parameters of the observation equation. In Figure 3-3, the particle filter still achieves better underlying state estimates than the observation equation even when the parameter estimates of both the state equation and the observation equation are poor. When the observation noise is insignificant (0.05), the parameters of the observation equation can be identified accurately from a small sample. However, the overfitting problem of the state equation still exists. In this situation, considering the state equation cannot improve the underlying state estimates derived from the observation equation alone. This overfitting problem was solved by using the whole sequence of indirect indicators. In reality, indirect indicators can often be sampled easily. Therefore, the overfitting problem of the state space model can be overcome by increasing the sampling rate of indirect indicators.

3.3.3 Life Prediction

After the model parameters have been estimated, the lifetime of an engineering asset

can be predicted according to the algorithm developed in Section 3.2.5. The

parameter estimates obtained in Section 3.3.1 were used, i.e.,

0.7893 0.1179 1.5221 1.1938 0.4878 .

A sequence of simulated degradation indicators was generated for testing. After a

failure threshold Λ 8 was set on the sequence of direct indicators , the failure

time was obtained as 253.3.

In the first situation, the failure was assumed to be observable. The conditional survival function given the indirect indicator observations was then obtained using Equation (3-26). The survival functions predicted at different inspections are plotted in Figure 3-5, which shows that the jumps of the survival functions became sharper and closer to the actual failure time as more indirect indicators became available. The lifetime PDF can be calculated via Equation (3-27). The lifetime distributions derived at times 0 and 240 are shown in Figure 3-6. The figure shows that the lifetime PDF is initially biased and has a wide confidence interval, whereas at the last stage of life the failure time can be predicted accurately. The reason is that, at the initial stage, the lifetime PDF is a prior distribution, and this prior distribution is subsequently updated by the information from the indirect indicators and by the fact that failure has not yet occurred.

Figure 3-5: Life prediction results when the failure is observable


Figure 3-6: The lifetime distribution predicted at different time points

Figure 3-7: The lifetime distribution prediction at when the failure is not observable


The situation when the failure is unobservable was also considered. The survival function and the lifetime PDF derived by Equations (3-28) and (3-29) were similar to those plotted in Figure 3-5 and Figure 3-6. The difference was that the survival function and the lifetime PDF were not conditional on the fact that the item had survived to the current time. Therefore, as shown in Figure 3-7, the values of the lifetime PDF before the current time were not necessarily equal to zero. The piece-wise lifetime PDF in Figure 3-7 was calculated using Equation (3-29).

The life prediction result shows that the proposed life prediction method can

combine the information from indirect indicators and the age of an engineering

asset.

3.4 Case study: Crack Size Propagation Modelling

The data used in this case study were collected from the accelerated life test of a single-stage spur gearbox. The gear investigated in this research was 10 mm wide and had 27 teeth. The shaft speed was 2400 RPM. To accelerate the degradation process, a semi-circular notch of 1 mm radius was initially spark eroded at the root fillet of a tooth, and the gearbox was operated under an overload condition. The vibration signal was sampled at 73 time points with irregular intervals. In contrast, the crack depth was measured at only six time points (listed in Table 3-2) because it was difficult to measure. This case study modelled the development of the crack depth (a direct indicator) and the change of the related indicator (an indirect indicator) extracted from the vibration signals.

Table 3-2: The measurements of the crack size during the accelerated life test

Measurement time (hour)    0.0917    3.3383    3.7536    4.6383    5.5064    5.6864
Crack depth (mm)           1         2.57      2.73      3.11      3.81      4.16

To extract an indicator that relates to the crack depth on the gear, this numeric study

adopted an indicator extraction method discussed in (Wang 2003a). Firstly, a

residual signal was obtained from the signal average by filtering out gear meshing


harmonics (i.e. using a multi band-stop filter). This residual signal represents

random transmission errors for healthy gears. For faulty gears (e.g. gears with tooth

cracking or tooth pitting), the transmission errors include a sudden change (e.g. a

spike) and the signal becomes non-Gaussian. Kurtosis is a good measure of non-

Gaussianity (e.g. spikiness) in a signal. Therefore, the kurtosis of the residual signal

is an effective indicator of crack development. Previous research has also shown that the kurtosis of the residual signal correlates well with the crack on the test gear (Wang and Wong 2002; Wang 2003a). Therefore, the kurtosis of the residual signal was adopted as the indirect indicator of the crack depth.
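The sensitivity of kurtosis to spikes can be illustrated with a short sketch. It does not reproduce the extraction method of Wang (2003a): the band-stop filtering step is omitted, and both residual signals are synthetic stand-ins (Gaussian noise, with a few artificial spikes added for the faulty case).

```python
import random


def kurtosis(signal):
    """Fourth central moment divided by the squared variance (3 for a Gaussian)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 ** 2)


rng = random.Random(1)
# Synthetic stand-in for the residual signal of a healthy gear: Gaussian noise.
healthy = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Synthetic stand-in for a cracked tooth: the same noise plus five localised spikes.
faulty = list(healthy)
for index in range(100, 2000, 400):
    faulty[index] += 8.0

print(f"healthy residual kurtosis: {kurtosis(healthy):.2f}")
print(f"faulty residual kurtosis:  {kurtosis(faulty):.2f}")
```

The healthy residual gives a kurtosis near 3, while the spiky residual gives a clearly larger value, which is the behaviour exploited by the indirect indicator.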

In this case study, the Gamma-based state space model given by Equations (3-1),

(3-2), and (3-3) was considered first, where the crack depth was denoted by Λ ,

and the kurtosis of the residual vibration signal was represented by . The crack

depth on a gear cannot decrease during a degradation process. Therefore, the

monotonically increasing Gamma process was an appropriate candidate to model the

enlargement of the crack depth. The effectiveness of the Gamma process in

modelling the development of a crack has been verified by existing research

(Lawless and Crowder 2004; Park and Padgett 2005a). As the development of the

crack depth demonstrated a nonlinear relationship with the time, the nonlinear

Gamma process given by Equations (3-1) and (3-3) was used to model the

development of the crack depth. Due to the small sample size, the one parameter

shape function (i.e. ) was adopted by the nonlinear Gamma process. The

relationship between kurtosis values of the residual vibration signal and the crack

depth was also nonlinear. Consequently, the power formulation (i.e. Λ ·

Λ ) was used in Equation (3-2). For practical degradation process, the

observation noise in Equation (3-2) does not necessarily follow an independent and identically distributed normal distribution. Motivated by the research of Christer et al. (1997), four typical formulations of the observation noise were considered, as shown in Table 3-3, i.e. time-independent noise, noise with a linear standard deviation, noise with a linear variance, and noise with an exponential variance. More complex


formulations of the observation noise were not considered in this case study, because

the sample size in this case study was limited and more parameters may cause an

overfit problem.

Table 3-3: The AICc of different models

Observation noise                         PDF of the noise                        AICc
Time-independent noise                    Normal, constant standard deviation     117.9231
Noise with a linear standard deviation    Normal, linear standard deviation       138.4049
Noise with a linear variance              Normal, linear variance                 117.1404
Noise with an exponential variance        Normal, exponential variance            106.9659

The AICc was used to choose the most appropriate formulation of the observation

noise from the four candidates listed in Table 3-3. The AICc values of the four

different formulations of the observation noise were calculated (see Table 3-3).

According to the results, the noise with exponential variance had the lowest AICc

value and was selected to model the observation noise. Finally, the model

parameters were estimated as:

1.939 0.1087 0.2269 1.893 0.1456 .
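The selection step can be sketched as follows. The `aicc` function implements the standard small-sample correction of the Akaike information criterion; the log-likelihoods, parameter counts, and sample size behind the reported values are not given in this section, so the selection below simply works from the AICc values in Table 3-3.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Small-sample corrected AIC; requires n_samples > n_params + 1."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + 2 * n_params * (n_params + 1) / (n_samples - n_params - 1)


# AICc values reported in Table 3-3 for the four observation noise formulations.
reported_aicc = {
    "time-independent noise": 117.9231,
    "linear standard deviation": 138.4049,
    "linear variance": 117.1404,
    "exponential variance": 106.9659,
}

# The candidate with the lowest AICc is preferred.
best = min(reported_aicc, key=reported_aicc.get)
print(best)
```

The correction term grows quickly as the number of parameters approaches the sample size, which is why more complex noise formulations are penalised heavily on a small dataset.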

A state space model with the linear and Gaussian assumption, given by

\[
\Lambda(t+\Delta t) - \Lambda(t) \sim N\left(\mu \cdot \Delta t,\ \sigma \cdot \sqrt{\Delta t}\right) \tag{3-30}
\]

and

\[
Y(t) \sim N\left(c \cdot \Lambda(t),\ \sigma_{\varepsilon}\right), \tag{3-31}
\]

was also fitted to the dataset collected from the accelerated life test. When the linear and Gaussian assumption is adopted, the parameters can be estimated efficiently using the EM algorithm based on the Kalman smoother (Khan and Dutt 2007). The parameter estimation results were:

0.5557 2.288 0.5443 0.2487.

The corresponding AICc value was calculated as 1227.997, which is much greater than those of the Gamma-based state space models. In addition, the Gaussian assumption allows the modelled crack depth to fluctuate, which is not consistent with the fact that the crack depth increases monotonically. Therefore, the linear Gaussian assumptions are not appropriate for the dataset collected in the accelerated life test of the gearbox.
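The monotonicity argument can be made concrete by comparing the increments of the two kinds of state equation. The sketch below uses hypothetical parameter values, chosen so that both processes have the same mean increment; it is an illustration only and does not use the fitted parameters.

```python
import math
import random

rng = random.Random(2)
dt, n_steps = 0.1, 200

# Hypothetical parameters, chosen only for illustration.
drift, diffusion = 0.5, 2.0      # linear Gaussian increments
shape_rate, scale = 5.0, 0.1     # Gamma process increments (mean 0.5 per unit time)

gaussian_steps = [rng.gauss(drift * dt, diffusion * math.sqrt(dt)) for _ in range(n_steps)]
gamma_steps = [rng.gammavariate(shape_rate * dt, scale) for _ in range(n_steps)]

negative_gaussian = sum(1 for step in gaussian_steps if step < 0)
negative_gamma = sum(1 for step in gamma_steps if step < 0)
print(f"negative Gaussian increments: {negative_gaussian} of {n_steps}")
print(f"negative Gamma increments:    {negative_gamma} of {n_steps}")
```

Gamma increments are never negative, so a simulated crack depth can only grow, whereas a sizeable share of the Gaussian increments is negative and would imply a shrinking crack.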

3.5 Chapter Summary

This chapter models correlated degradation processes of direct and indirect

indicators using a Gamma-based state space model. An EM algorithm based on the

particle smoother has been developed to estimate the parameters of the Gamma-

based state space model. The results of the simulation experiment demonstrate that

the proposed EM algorithm can estimate the underlying parameters accurately.

When samples of a direct indicator are limited and observation noise is significant,

the proposed EM algorithm can improve the parameter estimates by considering

more indirect indicators that are easier to obtain in reality. In addition, a lifetime

prediction approach using the particle filter, the particle smoother, and Bayesian

theory has been developed. The simulation study shows that the lifetime prediction

algorithm can combine indirect indicator observations and age information to

estimate the failure time of an engineering asset. Finally, a case study using experimental data has been conducted to demonstrate the model selection method, which identifies the candidate model formulation with the best fit. The case study also shows that the linear and Gaussian assumption is not appropriate for some practical data.

The proposed Monte Carlo-based algorithms enable the state equation of the state

space model to adopt other non-linear non-Gaussian stochastic processes. The

observation equation can also use a range of non-linear formulations to describe the

relationship between direct and indirect indicators. These state space models –

without linear and Gaussian assumptions – are expected to be more effective when


fitted to practical data. Moreover, the parameter estimation method developed in this

research can deal with the situation when more indirect indicators than direct

indicators are known. Consideration of these additional indirect indicators can

improve parameter estimation results and avoid the overfit problem when the

observations of direct indicators are limited.


4 Joint Modelling of Failure Events and Multiple Indirect Indicators

4.1 Introduction

Chapter 3 investigated the state space degradation model that describes the correlated degradation processes of a direct indicator and an indirect indicator. However, for some engineering assets, the failure mechanism is complex and no physical direct indicator can be extracted to represent the underlying degradation process. For example, Wang used a generic wear condition as a direct indicator of aircraft engines (Wang 2007). This generic wear condition was not extracted directly from the condition monitoring data. When no direct indicator can be extracted from CM data, the underlying degradation process is only observable at failure times. Therefore, the indicators extracted from the CM data and the lifetime data should be combined to model the degradation process when a direct indicator is not available. Moreover, in some situations multiple indirect indicators can be extracted from CM data. The effectiveness of these indicators in life prediction should be evaluated, and the information from these indirect indicators should be fused properly. This chapter develops a state space model that describes an asset degradation process using multiple degradation indicators and failure events.

The state space model is an effective mathematical model that can combine multiple

degradation indicators and lifetime data. The state space model presumes the

existence of an underlying degradation process. When the underlying degradation

process crosses a predetermined threshold, a failure happens. The underlying

degradation process is partially revealed by multiple degradation indicators.

Compared with other degradation models, the state space model considers both the

stochastic underlying degradation process and uncertain relationships between the

underlying degradation process and the degradation indicators. Therefore,

degradation indicators are used more efficiently, and no additional mathematical

models for time dependent degradation indicators are needed when predicting asset


lives. Moreover, the state space model is an effective tool for indicator fusion. Compared with commonly used multivariate statistical approaches and multivariate time series analysis methods, the state space model can analyse degradation indicators with uneven sampling intervals.

Existing research on state space degradation models that combine degradation indicators and failure events largely adopts discrete time or discrete state assumptions. Wang proposed a state space model whose underlying health state increments followed a beta distribution (Wang 2007). Consequently, Wang's new model had a monotonically increasing underlying degradation process, similar to irreversible engineering asset wear processes. However, Wang's new model was discrete in time. Makis and Jiang developed a state space model based on a continuous time discrete state Markov process (Makis and Jiang 2003). The discrete state assumption requires discretising continuous degradation processes, which needs expert knowledge and may introduce additional errors. To remove the discrete time and state assumptions, state space models that are continuous in both time and state have also been developed. Wang et al. developed a state space model to predict the RUL of bearings using RMS values of vibration signals (Wang 2002). Wang's model used values of RUL as underlying health states. This deterministic underlying degradation process did not consider the stochastic, heterogeneous degradation processes of different individuals. Whitmore et al. proposed a bivariate Wiener process (Whitmore et al. 1998) to model a partially revealed degradation process. However, the bivariate Wiener process only considered the covariates collected at failure and censoring times, while degradation indicators at other occasions were ignored.

To address the limitations of the existing state space degradation models, this chapter applies the Gamma-based state space model to combine multiple degradation indicators and failure events. The continuous time property enables the proposed model to process irregular inspection intervals. Continuous states, on the other hand, avoid discretising indicators with continuous values. This chapter uses Monte Carlo based parameter estimation and lifetime prediction algorithms to process the Gamma-based state space model. The censored failure data problem, which has been ignored by most existing state space degradation models (Christer et al. 1997; Wang 2002; Makis and Jiang 2003; Wang 2007), is considered. In addition, a parametric bootstrap algorithm is developed to evaluate the effectiveness of different indicators in asset degradation modelling. The proposed algorithms are validated using both simulated data and field data.

4.2 Model Formulations and Solving Algorithms

4.2.1 Model Formulations and Notations

In this chapter, the system equation of the Gamma-based state space model, given by

\[
\Lambda(t+\Delta t) - \Lambda(t) \sim \mathrm{Ga}(\alpha \cdot \Delta t,\ \beta), \tag{4-1}
\]

is assumed to follow a Gamma process. The scalar variable $\Lambda(t) \ge 0$ denotes the underlying health state at time $t \ge 0$. A larger value of $\Lambda(t)$ indicates a worse health state, and a failure is assumed to happen when $\Lambda(t)$ crosses a predetermined threshold. An asset is assumed to be non-defective at the initial time, i.e., $\Lambda(0) = 0$. The increments of $\Lambda(t)$ follow a Gamma distribution given by Equation (4-1), where $\mathrm{Ga}(\alpha \cdot \Delta t, \beta)$ denotes the Gamma distribution with shape parameter $\alpha \cdot \Delta t$ and scale parameter $\beta$. The second component of the Gamma-based state space model is the observation equation. In this chapter, indirect indicators are assumed to follow a multivariate normal distribution given by

\[
\mathbf{Y}(t) \sim N(\mathbf{C} \cdot \Lambda(t),\ \Sigma), \tag{4-2}
\]

where $\mathbf{Y}(t)$ denotes the indirect indicator vector at time $t$, and $N(\mathbf{C} \cdot \Lambda(t), \Sigma)$ denotes the multivariate normal distribution with mean vector $\mathbf{C} \cdot \Lambda(t)$ and covariance matrix $\Sigma$. Here, the multivariate normal distribution is selected for the following reasons. It is the most important multivariate continuous distribution; its mathematical properties have been well investigated, and the inference algorithms are well developed. Furthermore, as the most commonly used multivariate distribution, it has been widely used to approximate joint random variables in practice. In addition, it is also widely used in state space models (Stathopoulos and Karlaftis 2003; Proust et al. 2006; Proust-Lima and Jacqmin-Gadda 2007).
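Because the Gamma process in Equation (4-1) is monotonically increasing, the asset has survived to a given time exactly when the health state at that time is still below the failure threshold. The following minimal sketch uses this fact to estimate a survival probability by Monte Carlo sampling. The parameter names `alpha` (shape rate), `beta` (scale), and `threshold`, and all numerical values, are assumptions made for illustration; the sketch ignores the indirect indicators and is not the life prediction algorithm of this thesis.

```python
import random


def survival_probability(t, s, x_s, alpha, beta, threshold, n_samples, rng):
    """Monte Carlo estimate of P(no failure by time t | state x_s at time s)."""
    if x_s >= threshold:
        return 0.0
    if t <= s:
        return 1.0
    survived = 0
    for _ in range(n_samples):
        # Increment of the Gamma process over (s, t]: shape alpha*(t-s), scale beta.
        increment = rng.gammavariate(alpha * (t - s), beta)
        if x_s + increment < threshold:
            survived += 1
    return survived / n_samples


rng = random.Random(3)
alpha, beta, threshold = 2.0, 0.05, 1.0  # hypothetical values
curve = [survival_probability(t, 0.0, 0.0, alpha, beta, threshold, 5000, rng)
         for t in (5.0, 10.0, 15.0)]
print(curve)
```

Conditioning on a worse current state (a larger `x_s`) lowers the estimated survival probability, which is the mechanism by which state information sharpens a lifetime prediction.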

To formulate the parameter estimation algorithm more concisely, only degradation

indicators from one degradation process are considered in this chapter. Inspection

times are denoted as 1,2, … , ), where is the number of inspections. The

values of the underlying health state and indirect indicator vector at the th inspection

are denoted as and respectively. The failure time and the failure threshold of the

underlying degradation state are denoted as and . Note that is assumed equal

to 1, because the identical life time distribution can be obtained by changing the

scale parameter for different values of . For an asset preventively replaced

before failure, the censoring time is denoted as . Unlike the commonly used PHM,

the Gamma-based state space model does not require the degradation indicators at the failure or censoring time for parameter estimation.


Similar to Chapter 3, this chapter uses the Monte Carlo based EM algorithm to estimate the parameters of the Gamma-based state space model. In this chapter, the underlying degradation process is only observable at the failure time. Therefore, the initial parameters cannot be estimated using the method adopted in Chapter 3. This chapter estimates the initial parameters using the method of moments. In addition, due to the differences in model formulations and assumptions, the E step of the EM algorithm also differs from that in Chapter 3. The initial parameter estimation method and the E step of the EM algorithm are introduced as follows, while the other steps, which are identical to those in Chapter 3, are not discussed in this chapter.


4.2.2.1 Initial parameters estimation

The initial parameters for the EM algorithm are estimated by the method of moments. Because of the uneven inspection intervals, the increments of the degradation indicator vectors should be scaled before the method of moments is applied. The method of moments used in this research is motivated by that adopted by Cinlar et al. (1977). Firstly, the equation

Λ · ·· ·

· · · · (4-3)

can be obtained according to the property of the Gamma process. Then the first-

order and second-order moments of the scaled increments of degradation indicator

vectors can be calculated as

∑ ∑ · 1 · (4-4)

and

∑ · ∑· · ·

∑ · · · · · · ·· ·

. (4-5)

After that, given an initial estimate , the estimate of , and Σ are calculated using

·⁄ , (4-6)

∑ ∑/

∑ , (4-7)

and

∑ · · · ∑

1 · · 2 ∑ (4-8)

The estimate is obtained by experience. When any diagonal element of is

negative, a bigger value of is required.


4.2.2.2 E step

The E step is to estimate the expectation of the complete likelihood function. In this

section, both complete and censored failure data are considered. When complete

failure data are available, the expected complete likelihood function given

degradation indicators and failure time can be written as:

: , log : , : ,

: , log : ,: , log : | : ,

, (4-9)

where , , and represent the model parameters

to estimate. To make the equations more concise, in this chapter, ; ,

1, … , is denoted by : ; similarly ; , 1, … , is denoted by : .

The two components of Equation (4-9) can be written as:

: , log : | ∑ log log Γ 1

·: , log

: , : , 1

(4-10)

and

: , log : | : , ⁄ log Σ

tr Σ ∑: , · · ·

, (4-11)

respectively, where , , 2,3, … , 1 , and is

the size of the indirect indicator vector. To achieve a shorter equation, denotes

, and represents in Equation (4-10). To calculate Equations (4-10) and

(4-11), three components (i.e., : , ,

: , , and : , log

should be estimated first. The three components are estimated through the particle

smoother algorithm. The particle smoother can approximate conditional distributions

of underlying health states given degradation indicators : and failure time by a

set of random samples :: ; 1,2, … , 1 1,2, … , as:

: , ∑ 1,2, … , . (4-12)


In Equation (4-12), · is the Dirac delta measure given by

0, 1,

. (4-13)

Using these smoothing results :: , the three components in Equations (4-10) and

(4-11) can be approximated as:

: ,1 ∑

: , log1 ∑ log 1

: ,2 1 ∑

: ,

2

: ,

2

. (4-14)
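The approximations in Equation (4-14) replace conditional expectations with averages over the smoothed particle trajectories. The sketch below shows this averaging under the assumption that the three components are the smoothed expectations of the state, of its square, and of the logarithm of its increment, and that the trajectories are equally weighted; the function name and the two example trajectories are hypothetical.

```python
import math


def smoothed_moments(trajectories, i):
    """Average x_i, x_i**2 and log(x_i - x_{i-1}) over smoothed trajectories.

    Requires i >= 1 and strictly increasing trajectories.
    """
    n = len(trajectories)
    mean_x = sum(path[i] for path in trajectories) / n
    mean_x2 = sum(path[i] ** 2 for path in trajectories) / n
    mean_log_inc = sum(math.log(path[i] - path[i - 1]) for path in trajectories) / n
    return mean_x, mean_x2, mean_log_inc


# Two hypothetical, equally weighted smoothed trajectories of the hidden state.
trajectories = [[1.0, 2.0], [1.0, 4.0]]
mean_x, mean_x2, mean_log_inc = smoothed_moments(trajectories, 1)
print(mean_x, mean_x2, round(mean_log_inc, 4))
```

The logarithm of the increment is averaged directly rather than computed from the averaged increment, because the expectation of a logarithm generally differs from the logarithm of an expectation.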

To run the particle smoother, the conditional PDF of the underlying health state at the next inspection time, given the failure time and the current health state, should be calculated first. In the developed model, the failure time is assumed to be the first crossing time of the underlying Gamma process $\{\Lambda(t);\ t \ge 0\}$ to a predetermined failure threshold. Therefore, the conditional PDF of the underlying health state at the next inspection time can be written as:

, Be ; , (4-15)

according to the Gamma bridge property.
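The Gamma bridge property can be illustrated numerically. For a Gamma process with a constant scale, conditioning on the state at the current inspection and on the level reached at a later time makes the normalised increment in between Beta distributed, with the scale parameter cancelling out. The sketch below assumes that the process sits exactly at the failure threshold at the failure time; the function name, the shape rate `alpha`, and all numerical values are hypothetical.

```python
import random


def sample_bridge_state(x_i, t_i, t_next, t_fail, threshold, alpha, rng):
    """Sample the state at t_next given x_i at t_i and the threshold at t_fail.

    Requires t_i < t_next < t_fail and x_i < threshold.
    """
    a = alpha * (t_next - t_i)      # shape of the increment up to t_next
    b = alpha * (t_fail - t_next)   # shape of the remaining increment up to t_fail
    fraction = rng.betavariate(a, b)
    return x_i + (threshold - x_i) * fraction


rng = random.Random(4)
samples = [sample_bridge_state(0.2, 1.0, 2.0, 5.0, 1.0, 2.0, rng) for _ in range(20000)]
mean_state = sum(samples) / len(samples)
print(f"mean sampled state: {mean_state:.3f}")
```

With these values the Beta mean is a / (a + b) = 0.25, so the sample mean should lie close to 0.2 + 0.8 × 0.25 = 0.4, and every sample stays between the current state and the threshold.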

For censored data, the expected complete likelihood function is similar to Equation (4-9), except that the failure time is replaced with the censoring time. The expected complete likelihood function for censored data is also approximated using the results of particle smoothing. When conducting particle smoothing, the conditional PDF of the

underlying health state at the 1 th inspection point is modified from Equation

(4-15) to

, Λ Λ Ga ; ,

· , / /, / /

1, … , 1. (4-16)


The derivation of Equation (4-16) is given in the Appendix.

4.2.3 Indicator Effectiveness Evaluation

In real applications, it is important to evaluate the relative effectiveness of different degradation indicators in parameter estimation and lifetime prediction. After effective indicators are identified, a more economical condition monitoring system can be built by installing only the necessary sensors. Moreover, the size of the database that stores condition monitoring data can also be reduced. In addition, the over-fitting problem that arises when applying a degradation model to a real dataset may be overcome by ignoring unnecessary degradation indicators. Some degradation models can identify the effectiveness of different degradation indicators. For example, the importance of different covariates of the PHM can be revealed by the regression coefficients. For the composite scale model, the effectiveness of different degradation indicators can be disclosed by the weight parameters and the mean values of the degradation indicators (Jiang and Jardine 2006). In the proposed Gamma-based state space model, the relationships between degradation indicators and underlying health states are modelled by an observation equation that can take various formulations. Consequently, the effectiveness of a degradation indicator cannot be evaluated simply from a single parameter.

This research develops a parametric bootstrap method to evaluate the effectiveness of indicators. Because the parameters of the Gamma-based state space model cannot be estimated efficiently, a bootstrap method that re-estimates the parameters for a large number of simulated datasets is not appropriate. An alternative method is to compare the influences of different indicators on the result of particle filtering. An indicator that affects particle filtering results significantly can have a considerable impact on the result of parameter estimation, because the estimation of the expected complete likelihood function during the EM algorithm is based on particle filtering and smoothing. In addition, the asset life prediction method also relies on the particle filter. Therefore, the influence of an indicator during particle filtering reveals the effectiveness of the indicator in degradation modelling and life prediction.

The proposed indicator effectiveness evaluation method proceeds as follows.
Firstly, the proposed model is fitted to a training dataset to obtain the parameter
estimates $\hat{\theta}$. Then, $M$ sequences of simulated data are generated using the
parameter estimates $\hat{\theta}$. After that, the particle filter is carried out to estimate
the underlying health states of the $M$ simulated degradation sequences. During the
particle filtering, each degradation indicator is omitted in turn, and the MSE of the
underlying health state estimates is calculated. This gives the MSEs
$\mathrm{MSE}_j$ ($j = 1, 2, \ldots, N$), where $N$ is the size of the degradation indicator
vector and $\mathrm{MSE}_j$ denotes the MSE of the underlying health state estimates when
the $j$th indicator is omitted. A particle filter that considers all the indicators is then
applied to the simulated data, and the MSE of the underlying health state estimates is
obtained as $\mathrm{MSE}_0$. A relative contribution ratio is calculated as
$r_j = \mathrm{MSE}_j / \mathrm{MSE}_0$ (obviously $1 \le r_j < \infty$) for the $j$th
degradation indicator. A bigger value of $r_j$ indicates that the $j$th degradation
indicator is more important. Conversely, if $r_j$ is close to one, the $j$th degradation
indicator can be omitted. However, degradation indicators that are highly correlated with
each other may all have relative contribution ratios close to one. Such indicators cannot
all be removed at once. One solution is to omit only the indicator with the smallest
relative contribution ratio, and then to recalculate the relative contribution ratios of the
remaining indicators. In this way, highly correlated degradation indicators are not
omitted simultaneously.
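The ratio calculation and the one-at-a-time omission rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `evaluate_mse` is a hypothetical callback standing in for "run the particle filter on the simulated sequences with this subset of indicators and return the MSE", and the stopping value `threshold=1.05` is an arbitrary choice that does not come from the thesis.

```python
def relative_contribution_ratios(mse_omitted, mse_all):
    """Ratio of the MSE with indicator j omitted to the MSE with all indicators."""
    if mse_all <= 0:
        raise ValueError("mse_all must be positive")
    return [m / mse_all for m in mse_omitted]


def backward_elimination(indicators, evaluate_mse, threshold=1.05):
    """Drop one indicator at a time while its ratio stays below `threshold`.

    `evaluate_mse(subset)` is a hypothetical callback: it should run the
    particle filter using only `subset` and return the MSE of the health
    state estimates. `threshold` is an assumed cut-off for "close to one".
    """
    kept = list(indicators)
    while len(kept) > 1:
        mse_all = evaluate_mse(kept)
        mse_omitted = [evaluate_mse([k for k in kept if k != j]) for j in kept]
        ratios = relative_contribution_ratios(mse_omitted, mse_all)
        smallest = min(range(len(kept)), key=lambda i: ratios[i])
        if ratios[smallest] >= threshold:
            break
        # Remove only the least useful indicator, then recompute all ratios,
        # so that two highly correlated indicators are never dropped together.
        kept.pop(smallest)
    return kept
```

Recomputing the ratios after every removal is what protects correlated indicators: once one of a correlated pair has been dropped, the ratio of the other rises and it is retained.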

4.3 Simulation Study

To investigate the performance of the proposed algorithms, a simulation study was

conducted. First of all, a set of simulation data was generated. The simulation dataset

consisted of two complete degradation sequences and two censored degradation

sequences of degradation indicators. The parameters adopted to generate a

simulation dataset were as follows:


0.005, 0.05, [2 2.5 3], and $\Sigma = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 5 & 2 \\ 1 & 2 & 6 \end{pmatrix}$

10 .

These parameter values are illustrative only and have no particular physical meaning. The

inspection interval was assumed to be 60 hours, i.e. 60. One of the four

sequences of degradation indicators is shown in Figure 4-1.

Figure 4-1: Three simulated degradation indicators


Given the four degradation sequences, parameter estimation was conducted. First of

all, according to Equations (4-6), (4-7), and (4-8), initial parameters were estimated

as:

0.01, 0.02535, 2.007 2.322 2.811 ,

and

$\begin{pmatrix} 5.057 & 0.317 & 2.298 \\ 0.317 & 5.624 & 3.572 \\ 2.298 & 3.572 & 7.892 \end{pmatrix}$

10 .


Then, EM iterations started with this initial parameter set. The EM iterations were

conducted in two stages. In the first stage, which lasted 57 iterations, 1,000 particles

were used to perform particle smoothing. In the second stage, 2,000 particles were

adopted for a better estimation result. As shown in Figure 4-2, the convergence

process of parameter estimates became much smoother when 2,000 particles were

used. After 67 iterations, the final results were acquired as:

0.005475, 0.04454, 2.024 2.516 3.037 ,

and

$\begin{pmatrix} 4.729 & 1.256 & 0.927 \\ 1.256 & 4.481 & 1.965 \\ 0.927 & 1.965 & 6.162 \end{pmatrix}$

10 .

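The final estimates are close to the values used to generate the data (for example,

0.005475 against 0.005, and 0.04454 against 0.05), which shows that the proposed EM

algorithm can recover the unknown parameters accurately.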

Figure 4-2: The convergence process of the EM algorithm

4.3.2 Lifetime Prediction

To test the lifetime prediction ability of the proposed model, an additional simulated

sequence of degradation indicators was generated. As described in Section 3.2.5, the


lifetime prediction algorithm is divided into two steps. The first step is estimating

the distribution of the current underlying health state using the particle filter. For the

simulated test sequence, the underlying health states at the different inspections were

estimated as shown in Figure 4-3. The second step is predicting the RUL based on the

underlying health state estimation results. The life prediction results and

corresponding confidence intervals are demonstrated in Figure 4-4. As shown in

Figure 4-4, when more condition monitoring indicators were available, the RUL

prediction results became more accurate and the confidence intervals were narrower.

The reason is that the prior estimate of the RUL was updated by more degradation

indicators and by the fact that the asset had survived so far. Therefore, the proposed lifetime

prediction algorithm can combine the information from degradation indicators and

survival time.
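The two-step idea (filter first, then predict forward from the filtered health state) can be illustrated with a small Monte Carlo sketch. It is not the algorithm of Section 3.2.5 itself. It assumes that the health state grows by Gamma-distributed increments with shape `alpha * dt` and scale `beta`, that failure occurs when a threshold is reached, and that a weighted particle set from the filter is already available. All function and argument names are hypothetical.

```python
import random


def simulate_rul(x0, alpha, beta, threshold, dt, rng, max_steps=100000):
    """Time until a simulated path starting at x0 first reaches `threshold`."""
    x, t = x0, 0.0
    for _ in range(max_steps):
        if x >= threshold:
            return t
        # Gamma increment over one step: shape alpha*dt, scale beta (assumed form).
        x += rng.gammavariate(alpha * dt, beta)
        t += dt
    return t


def predict_rul(particles, weights, alpha, beta, threshold, dt, n_draws=2000, seed=0):
    """Mean RUL and an empirical 95% interval from weighted filter particles."""
    rng = random.Random(seed)
    # Step 1 output (particles and weights) is resampled as starting states.
    starts = rng.choices(particles, weights=weights, k=n_draws)
    # Step 2: propagate each starting state until the failure threshold.
    ruls = sorted(simulate_rul(x, alpha, beta, threshold, dt, rng) for x in starts)
    mean = sum(ruls) / n_draws
    lower = ruls[int(0.025 * n_draws)]
    upper = ruls[int(0.975 * n_draws) - 1]
    return mean, lower, upper
```

Because every starting state is drawn from the current filtering result, a later inspection with a more concentrated particle set would be expected to give a narrower interval, which is the behaviour reported for Figure 4-4.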

Figure 4-3: Estimation of underlying health states


Figure 4-4: RUL prediction results

4.3.3 Effectiveness Evaluation of Indicators

The effectiveness evaluation method for indicators was also tested by a simulation

study. Firstly, forty-four complete sequences of simulated degradation indicators

were generated using the parameters:

0.005, 0.05, 0.2 2.5 3 ,

and $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.005 & 0 \\ 0 & 0 & 0.005 \end{pmatrix}$.

The inspection interval was still assumed to be 60 hours. Four sequences of these

simulated degradation indicators were used as training data; the other 40 sequences

were used as test data. Based on the training data, the parameters were estimated as:

0.004903, 0.04979, 0.2292 2.568 3.08 ,

and

$\begin{pmatrix} 1032 & 1.677 & 5.426 \\ 1.677 & 4.622 & 0.4843 \\ 5.426 & 0.4843 & 4.613 \end{pmatrix} \times 10^{-3}$.


The bootstrap algorithm developed in Section 4.2.3 was then conducted. Forty

sequences of simulated indicators were generated during the bootstrap process, and

the relative contribution ratios of the indicators are listed in the second row

of Table 4-1. On a laptop computer with an Intel T2400 CPU and 1 GB of memory, the

bootstrap algorithm took 126 seconds.

Table 4-1: The results of effectiveness evaluation for indicators

Index of the indicator j       1             2             3
Relative contribution ratio    1.012         1.655         2.205
MSE                            4.511×10^-4   6.729×10^-4   11.58×10^-4

To investigate the performance of the proposed effectiveness evaluation algorithm

for indicators, parameter estimation was conducted using the original training

dataset when different indicators were omitted. When the first indicator was not

considered, the parameters were estimated as:

0.004847, 0.05037, [2.577 3.091], $\begin{pmatrix} 4.67 & 0.4555 \\ 0.4555 & 4.67 \end{pmatrix} \times 10^{-3}$.

Similarly, when the other two indicators were omitted, the parameter estimates were:

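0.004746, 0.05144, [0.2343 3.11], $\begin{pmatrix} 1031 & 8.172 \\ 8.172 & 4.631 \end{pmatrix} \times 10^{-3}$,

and

0.004288, 0.05693, [0.2334 2.631], $\begin{pmatrix} 1032 & 2.761 \\ 2.761 & 4.534 \end{pmatrix} \times 10^{-3}$.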

Using these parameter estimates, the particle filter was carried out to process the test

data. The MSEs (denoted by ) of the underlying health state estimates are

given in the third row of Table 4-1. The MSE of the underlying health state

estimates using all three indicators was also calculated as 3.904×10^-4. The results

displayed in Table 4-1 show that ignoring an indicator with a larger relative

contribution ratio during parameter estimation causes a larger error in

underlying health state estimation. Conversely, including the first indicator,

whose relative contribution ratio is near one, does not improve the underlying health

state estimates significantly. Therefore, the proposed indicator effectiveness evaluation

method can recognise the importance of different degradation indicators.

4.4 Case Study: Lifetime Prediction for the Bearing on a

Liquefied Natural Gas (LNG) Pump

4.4.1 Data Introduction

LNG pumps are critical in the LNG industry. An unexpected breakdown of an LNG

pump can reduce the amount of LNG at the receiving terminal and cause

performance degradation of the whole plant. The specifications of LNG pumps

investigated in this case study are listed in Table 4-2, and the structure of an LNG

pump is shown in Figure 4-5. The LNG pump is enclosed within a suction vessel

and mounted with a vessel top plate. Three ball bearings are installed to support the

entire dynamic load of the integrated shaft of the pump and a motor. The three

bearings in the LNG pump are self-lubricated at both sides of the rotor shaft and tail

using LNG. Due to the low viscosity of LNG (about 0.16 cP), the three bearings

are poorly lubricated. In addition, the bearings operate at a high speed (3,600 rpm).

Therefore, the bearings installed in these LNG pumps are failure-prone.

Table 4-2: The specifications of the pump

Capacity      Pressure        Impeller Stage   Speed       Voltage   Rating   Current
241.8 m3/hr   88.7 kg/cm2.g   9                3,585 RPM   6,600 V   746 kW   84.5 A


Figure 4-5: Pump schematic

To monitor the health of the bearings, three accelerometers were installed for each bearing

on the housing near the bearing assembly, in the horizontal, vertical, and axial

directions respectively. In this case study, vibration signals from two bearings

installed on two LNG pumps were investigated. The vibration signals were sampled

at irregular intervals. In the early and final stages of life, the vibration signals

were measured more frequently, while in the middle stage of life the vibration

signals were collected at relatively larger intervals. This kind of irregular inspection

strategy is often used in practice, because it is not necessary to measure vibration

signals frequently when a bearing is running smoothly. The vibration signals

investigated in this case study were all measured in the horizontal direction. The

overall features of the vibration signals are listed in Table 4-3. The outer raceway

spalling and the inner raceway flaking on the bearings are shown in Figure 4-6 and

Figure 4-7. In this case study, vibration signals collected from the bearing installed


on Pump P301D were used to estimate the parameters of the proposed model, while

the vibration signals collected from the bearing installed on Pump P301C were used

to test the lifetime prediction ability of the proposed model.

Table 4-3: Vibration data features

Machine No   Life Time   Failure Mode            Sample Number   Sampling Frequency
P301C        4,698 Hrs   Outer raceway spall     120             12,800 Hz
P301D        3,511 Hrs   Inner raceway flaking   136             12,800 Hz

Figure 4-6: Outer raceway spall of P301C

Figure 4-7: Inner raceway flaking of P301D

4.4.2 Model Application

Bearing failures (e.g. inner race crack, outer race crack, and rolling element crack)
often generate shock pulses whose energy is concentrated in a relatively high frequency
band. Therefore, a vibration signal that has passed through a high pass filter (HPF) is
often more sensitive to early defects of a bearing. For a raw vibration signal, the
kurtosis and the crest factor, which reflect the presence of extreme deviations, can also
indicate early defects. After investigating different features of the vibration signals
used in this case study, three features were adopted as degradation indicators of the
proposed model: the entropy of the vibration signal after an HPF at 3,000 Hz, the crest
factor of the vibration signal after an HPF at 2,500 Hz, and the crest factor of the raw
vibration signal.
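To make the two amplitude-based features concrete, the sketch below computes the crest factor and the kurtosis of a sampled signal using their standard textbook definitions. It is an illustration only: the high pass filtering and the entropy feature used in the case study are not reproduced, and the test signals (a sine wave with and without a single artificial spike) are invented for the example.

```python
import math


def crest_factor(signal):
    """Peak absolute amplitude divided by the RMS value of the signal."""
    rms = math.sqrt(sum(v * v for v in signal) / len(signal))
    return max(abs(v) for v in signal) / rms


def kurtosis(signal):
    """Fourth standardised moment (3 for a Gaussian signal, 1.5 for a sine)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 * m2)


# Ten periods of a unit sine wave, and a copy with one impulsive spike
# standing in for a shock pulse from a localised bearing defect.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
spiky = list(sine)
spiky[500] = 8.0

features_healthy = (crest_factor(sine), kurtosis(sine))
features_defect = (crest_factor(spiky), kurtosis(spiky))
```

Both features increase when the spike is added, which is why they are commonly used as indicators of impulsive defects.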

Using vibration signals collected from Pump P301D, the parameters of the proposed

model were estimated as:

0.01087, 0.02621, 1.658 0.6134 2.392 ,

and

$\Sigma = \begin{pmatrix} 5.295 & 1.439 & 1.764 \\ 1.439 & 5.356 & 1.099 \\ 1.764 & 1.099 & 5.741 \end{pmatrix}$

10

Next, the effectiveness of the three indicators was investigated. Table 4-4

shows that the crest factor of the raw signals has the highest relative contribution

ratio. However, the relative contribution ratios of the three features are of similar

magnitude and all clearly above one. Therefore, none of the features can be omitted.

Table 4-4: Effectiveness evaluation for the three features extracted from the vibration signals

Features                      Entropy after HPF at 3000 Hz   Crest factor after HPF at 2500 Hz   Crest factor of the raw signal
Relative contribution ratio   1.594                          1.305                               2.155

Using the model parameters estimated from the vibration signals collected from

P301D, the RUL of the bearing installed on Pump P301C was estimated as shown in Figure

4-8. At the beginning, the prediction error was significant. This was caused by the

difference between the lifetimes of the training dataset and the test dataset. At the

beginning, only a few condition monitoring observations had been collected. The RUL

was largely predicted based on the lifetime of the training dataset which was much

shorter than that of the test data. Consequently, the predicted RUL was shorter than

the actual value. When a longer indicator history was considered, the slower

degradation progress of the bearing from P301C was detected. As a result, the

prediction error decreased. Especially at the last stage of the life, prediction results

were very close to real values. Figure 4-8 also illustrates that most actual RUL

values fall in the 95% confidence interval, even at the beginning of the life.


Figure 4-8: RUL prediction results of the bearing on P301C

This research also used degradation indicators from both P301C and P301D as

training data. The RUL of the bearing on P301C was then predicted using this

training result. The RUL prediction errors of P301C at different times, using both

degradation indicator sequences and using one degradation indicator sequence, are

shown in Table 4-5. After the degradation data sequence from P301C was also

used as training data, the prediction results were improved. The over-conservative

estimate of the RUL at the beginning was partially overcome; the non-conservative

estimate of the RUL was also resolved. This indicates that more reliable

prediction results can be obtained if more training datasets are available.

Table 4-5: RUL prediction results of the bearing on P301C

Operation hours (hours)                                     1      342   480   654   836   1170   2072   2876   3369   3482   3783   4228
RUL prediction errors, both degradation sequences (hours)   540    333   480   394   352   938    250    560    712    375    162    87
RUL prediction errors, one degradation sequence (hours)     1141   889   948   950   777   1038   390    346    373    45     126    207


4.4.3 Discussion

In this case study, the inspection intervals were extremely irregular, varying
from 3 hours to 133 hours. Converting these uneven observation intervals to equal
ones by interpolation is extremely difficult. Therefore, degradation models with the
discrete time assumption (e.g. Wang 2007) are not appropriate for this case
study. Moreover, discretising the degradation indicators is also difficult due to the
inadequate knowledge of the degradation process of a bearing on an LNG pump.
Therefore, the Gamma-based state space degradation model, which is continuous in time
and state, is preferable in this case study.

The results of this case study show that the proposed Gamma-based state space

model can combine the information from degradation indicators and lifetimes.

Furthermore, using the particle filtering method, the remaining useful life estimate

can be updated recursively by considering the current degradation indicators.

4.5 Chapter Summary

This chapter jointly models multiple indirect indicators and event data using a

Gamma-based state space model. To deal with the non-Gaussian property of the

proposed model, a Monte Carlo-based EM algorithm has been proposed to estimate

the parameters, and censored degradation data have been considered in the

parameter estimation algorithm. The asset life prediction algorithm has also been

developed using the Monte Carlo method and Bayesian theory. In addition, this

chapter has developed an effectiveness evaluation method for degradation indicators

to identify the relative importance of the degradation indicators adopted in the state

space model. The performance of the proposed algorithms has been evaluated in

simulation studies and a real application.

Compared with existing state space degradation models, the developed model is

continuous in time and states, and does not follow the Gaussian assumption. This

continuous property enables the proposed model to process irregular inspection


intervals and avoid discretising continuous degradation indicators. Furthermore, the

monotonically increasing Gamma process used in the proposed model is more

appropriate for modelling irreversible asset health degradation processes than the

commonly used Gaussian process. The monotonically increasing property of the

Gamma process also makes the construction of the likelihood function easier than

for non-monotonic stochastic processes when failure events are

considered.


5 Maintenance Strategy Optimisation Using the POSMDP

Chapters 3 and 4 develop degradation modelling methods that can consider direct

indicators, indirect indicators, and event data. Based on these degradation modelling

methods and additional information about costs and durations of maintenance

activities, optimal maintenance strategies with respect to long-run average cost per

unit time or availability can be further developed. This chapter develops a POSMDP

to optimise maintenance strategies of engineering assets with continuous

degradation processes and partially observable health states.

Due to the limitation of current CM technologies, the actual health state of an asset

may not be revealed accurately by health inspections. A maintenance strategy

ignoring this uncertainty of health inspections can cause additional costs or

downtime. Therefore, the maintenance decision-making should be based on a

degradation model that considers these imperfect inspections, when the uncertainty

of asset health inspections is not negligible. In this chapter, the state space model is

used to model degradation processes with imperfect inspections. In the state space

degradation model, the current health state can be represented as a distribution

conditional on historical maintenance activities and inspection results. For a state

space model discrete in state, the dimension of this distribution is equal to

the number of health states minus one. For a continuous state space model, the

dimension of this distribution can become infinite. Maintenance decision-making

based on these multi-dimensional health state distributions is more complex

than that based on known values of health states.

A commonly used approach to performing maintenance strategy optimisation for a

partially observable degradation process is the POMDP. As an extension of the

MDP, the POMDP can deal with the state dependent maintenance costs (or

durations) and multiple maintenance actions effectively. Moreover, when

performing maintenance strategy optimisation, the POMDP does not assume special

strategy structures (e.g., the control limit theory) which are not necessarily optimal.

However, it has been identified that the existing POMDPs adopted in maintenance

decision-making are largely discrete in time and have a limited number of health

states. While these two assumptions make the POMDP more mathematically

tractable, the discrete time assumption requires that health state transitions and

maintenance activities happen only at discrete epochs, which cannot model the

failure time accurately and is not cost-effective. A limited number of health states,

on the other hand, may not be fine-grained enough to improve the effectiveness of

maintenance.

To optimise maintenance strategy for the Gamma-based state space model that is

continuous in time and state, this chapter develops a POSMDP which is continuous

in time and state. When the state of a POMDP is continuous, the dimension number

of the health state distributions may become infinite. To reduce the dimension

number of the health state distributions, this research adopts the density projection

method that was used in the parametric POMDP (Brooks et al. 2006; Brooks and

Williams 2007; Zhou et al. to appear). By using a Monte Carlo-based density

projection method, the POSMDP is converted to a completely observable SMDP.

The converted SMDP is then solved using the policy iteration adopted in (Tijms and

van der Duyn Schouten 1985; Moustafa et al. 2004). Because Monte Carlo-based

methods are used during density projection, the proposed POSMDP can deal with

non-Gaussian non-linear state space models.

The remainder of this chapter is organised as follows. Section 5.1 introduces

formulations and notations used in this chapter. Section 5.2 applies the POSMDP to

optimise the maintenance strategy in which inspection intervals are fixed and

preventive replacement can only happen immediately after inspections. Section 5.3

optimises both the next maintenance activity and the waiting duration until the next

maintenance activity simultaneously. Section 5.4 further considers imperfect

maintenance with random effects and state dependent random durations.


5.1 Problem Formulation

In this chapter, the degradation process of an engineering asset is assumed to follow
the Gamma-based state space model given by the state equation

$x_{t+\Delta t} - x_t \sim \mathrm{Ga}(\alpha \cdot \Delta t, \beta)$, (5-1)

and the observation equation

$y_t \sim N(x_t, \sigma)$, (5-2)

where $x_t$ denotes the health state of an asset at time $t$. A larger value of $x_t$
indicates a worse health state. When $x_t$ crosses a predetermined failure threshold
$\Lambda$, a failure will occur. The underlying health state $x_t$ follows a Gamma process.
The function $\mathrm{Ga}(\alpha \cdot \Delta t, \beta)$ denotes the PDF of a Gamma
distribution with a shape parameter $\alpha \cdot \Delta t$ and a scale parameter $\beta$.
An asset does not have any initial defects ($x_0 = 0$). The observation $y_t$ of the health
state is assumed to follow a normal distribution with a mean value $x_t$ and a standard
deviation $\sigma$.
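A short simulation can make the model concrete. The sketch below assumes the reading of the model given in the text: independent Gamma increments whose shape is proportional to the inspection interval, a fixed scale, a new asset starting at zero, and an observation equal to the health state plus Gaussian noise. The names `alpha`, `beta`, `sigma`, `threshold`, and `dt` are illustrative, and the numerical values in the usage line are borrowed from the simulation study in Section 5.2.1.1 with an assumed assignment of 2.5 to the shape rate and 0.5 to the scale.

```python
import random


def simulate_degradation(alpha, beta, sigma, threshold, dt, seed=0, max_steps=10000):
    """Simulate inspected health states and noisy observations until failure.

    Increments follow a Gamma distribution with shape alpha*dt and scale beta;
    each observation is the true state plus Gaussian noise with std sigma.
    """
    rng = random.Random(seed)
    x = 0.0  # a new asset has no initial defect
    states, observations = [], []
    for _ in range(max_steps):
        x += rng.gammavariate(alpha * dt, beta)
        if x >= threshold:
            break  # failure: assumed to be detected immediately, no inspection
        states.append(x)
        observations.append(rng.gauss(x, sigma))
    return states, observations


states, observations = simulate_degradation(
    alpha=2.5, beta=0.5, sigma=0.3, threshold=4.0, dt=0.65
)
```

The simulated states never decrease, while the observations can, which is the situation the partially observable formulation is meant to handle.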

Three types of maintenance activities are considered in this chapter, i.e., health

inspections, replacement (preventive or corrective), and imperfect maintenance. The

costs of the inspection, replacement, and imperfect maintenance are denoted by ,

, and . The corresponding durations are , , and . An additional cost

and a breakdown of length will be incurred when a failure happens. The failure

is assumed to be detected immediately, and followed by a corrective replacement.

Both the corrective and preventive replacement bring an asset to a state as good as

new, while an imperfect maintenance improves the asset health to a state better than

old and worse than new.

In Sections 5.2 and 5.3, the objective function of the maintenance strategy

optimisation is the long-run expected cost per unit time. The optimisation with the

objective to maximise long-run availability is investigated in Section 5.4.


5.2 Regular Maintenance Intervals

This section demonstrates the solving process and the performance of the POSMDP

by investigating a CBM strategy with regular maintenance intervals. In this

maintenance strategy, only preventive replacement and health inspections are

considered, and the inspection interval ∆ is a state independent constant. The

degradation process and the inspection results are assumed to follow Equations (5-1)

and (5-2). A failure is assumed to be detected immediately, and is followed

immediately by a corrective maintenance. The durations to carry out maintenance

activities and the breakdown caused by a failure are assumed negligible (i.e.,

0 ). An optimal maintenance strategy minimising the long-run

expected cost per unit time is developed using the POSMDP. The obtained strategy

is compared with a strategy simply ignoring the observation noise and a heuristic

strategy setting a fixed threshold on average values of filtering particles.

5.2.1 Solving the POSMDP

When a degradation process follows the Gamma-based state space model, the

particle filter is used to estimate the health state, and the dimension of the

beliefs is very large. These high-dimensional beliefs make the POSMDP difficult to

solve. To reduce the dimension of the beliefs, this research first performs a Monte

Carlo-based density projection, which projects the beliefs onto a parametric distribution

space. Then grid points are selected in this projected belief space. After that, the

relative cost functions starting in these grid points are established and the policy

iteration is conducted to identify optimal maintenance strategies at these grid points.

The detailed solving process of the POSMDP is introduced as follows.

5.2.1.1 Density Projection

After particle filtering, the beliefs of the POSMDP can be obtained as a set of
particles and the corresponding weights, i.e. $b = \{x^i, w^i;\ i = 1, 2, \ldots, N_p\}$ with
$\sum_{i=1}^{N_p} w^i = 1$, where $x^i$ is the $i$th particle of the filtering result, $w^i$ is
the weight of $x^i$, and $N_p$ is the number of particles used during the filtering. The
dimension of the belief space $B$ is equal to $2N_p - 1$. A large number of particles are
often used during particle filtering to obtain accurate estimates of asset health states.
Consequently, the belief space $B$ has a high dimension. To reduce the belief dimension,
the space $B$ is projected to a new parametric density space
$\Omega = \{f(\cdot; \theta);\ \theta \in \Theta\}$, where $f(\cdot; \theta)$ is the PDF of a
certain parametric distribution, and $\Theta$ is the parameter space of this distribution.
This density projection is performed by maximum likelihood estimation as

$\mathrm{Proj}_{\Omega}(b) = \arg\max_{f \in \Omega} \sum_{i=1}^{N_p} w^i \log f(x^i; \theta)$. (5-3)

Thus the dimension of the projected belief space $\Omega$ is reduced to the
number of parameters used by the distribution $f(\cdot; \theta)$.
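As an illustration of density projection by weighted maximum likelihood, the sketch below projects a weighted particle set onto an ordinary (untruncated) Gaussian, for which the maximiser has a closed form: the weighted mean and the weighted standard deviation. This is a deliberate simplification. A distribution restricted to a bounded domain, such as the censored Gaussian considered next, would generally need a numerical optimiser instead. The function names are hypothetical.

```python
import math


def project_to_gaussian(particles, weights):
    """Weighted maximum likelihood fit of a Gaussian to a particle set."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise so the weights sum to one
    mu = sum(wi * xi for wi, xi in zip(w, particles))
    var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, particles))
    return mu, math.sqrt(var)


def weighted_log_likelihood(particles, weights, mu, sigma):
    """Objective maximised by the projection: sum of w_i * log f(x_i; mu, sigma)."""
    return sum(
        wi * (-0.5 * math.log(2 * math.pi * sigma ** 2)
              - (xi - mu) ** 2 / (2 * sigma ** 2))
        for xi, wi in zip(particles, weights)
    )
```

With this projection a belief carried by many particles is summarised by two numbers, which is what makes a grid over the parameter space feasible.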

The parametric distribution ·; should be able to closely approximate the

original belief with a small number of parameters. According to the particle

filtering results of the Gamma-based state space model, two candidates are

considered in this research, i.e. the Gaussian distribution and the Beta distribution.

The Gaussian distribution is one of the most commonly used distributions, and is

straightforward to apply. However, the domain of the Gaussian distribution is

∞ ∞ , while the filtering particles vary from 0 to Λ . Consequently, this

research censors the original Gaussian distribution to the domain 0 . The PDF

of the censored Gaussian distribution is given by

; , √·

√

. (5-4)

Unlike the Gaussian distribution, the Beta distribution is defined on the

domain $[0, 1]$. Therefore, the ratio of the health state to the failure threshold

can be assumed to follow the Beta distribution, and the PDF of this ratio is given by

; ,,

1 · . (5-5)

To compare the two candidate parametric distributions, a simulation study was carried
out. In the simulation study, the parameters of the Gamma-based state space
model were set as 2.5 and 0.5. Different levels of observation noise were
considered, i.e. $\sigma$ = 0.1, 0.3, 0.5, 0.7, and 0.9. The failure
threshold was $\Lambda = 4$. These parameters were selected for demonstration only and
have no particular physical meaning. The inspection interval was $\Delta t = 0.65$, which
is the optimal inspection interval when health states are completely observable. The
derivation of this optimal inspection interval is given in Section 5.2.2. One thousand
sequences of simulated degradation data were generated and processed by particle
filtering with 1,000 particles. The filtering results were then fitted by the censored
Gaussian distribution and the Beta distribution respectively. Because both
distributions have two parameters, the goodness of fit can be compared directly
through the likelihood values: the distribution with the higher likelihood value is preferred.
The mean likelihood values obtained when the censored Gaussian distribution and the
Beta distribution were fitted to the data are listed in Table 5-1. Table 5-1 shows that
the censored Gaussian distribution fitted better than the
Beta distribution, especially for small observation noise. The parameter estimates
obtained when the two distributions were fitted to the filtering results of the simulation
data with observation noise $\sigma = 0.3$ are plotted in Figure 5-1 and Figure 5-2. The
two figures show that the parameter spread of the censored Gaussian distribution
is more regular than that of the Beta distribution. Consequently, the parameter space
of the censored Gaussian distribution can be discretised more easily. For these
two reasons, the censored Gaussian distribution is adopted as the projected
parametric distribution.


Figure 5-1: Parameter spread of the censored Gaussian distribution

Figure 5-2: Parameter spread of the Beta distribution


Table 5-1: Mean likelihood values of the censored Gaussian distribution and the Beta distribution under different observation noise

Observation noise σ              0.1     0.3     0.5      0.7      0.9
Censored Gaussian distribution   956.5   17.62   -331.1   -522.2   -641.4
Beta distribution                -2331   -1384   -1037    -850.0   -732.0

To solve the POSMDP, the projected belief space Ω ·, ; Θ should be

discretised. The discretisation of Ω is essentially selecting a set of grid points

, ; 1,2, … , in the parameter space Θ, where is the number of the

grid points. The corresponding sampling points in the projected belief space are

Ω ·, ; 1,2, … , Ω. As shown in Figure 5-1, most points of

parameters appear in a certain area of the parameter space, i.e.,

, |0 4, 0.1 0.34 . In this situation, the grid points are only

chosen from this area with a proper resolution. The principle of selecting the grid

points in Θ depends on the relative cost function that is used in policy iteration and is

discussed in Section 5.2.1.2.

5.2.1.2 The Relative Cost Function of the POSMDP

The relative cost function is a crucial part of the policy iteration algorithm that

solves the SMDP (Maillart 2006). It formulates the relative cost of a single step in a

long-run decision process. According to the assumptions discussed at the beginning

of Section 5.2, the relative cost function can be written as

min , . (5-6)

Here, denotes the relative cost starting in the projected belief state , where

and denote the relative costs starting in if the “preventive

replacement” and “do nothing” strategies are adopted, respectively. Further,

is given by

, (5-7)

where denotes the relative cost when an asset is brand new, i.e., 0;

can be calculated as


1 ∆ |

∑ ∆ | ∆ |; (5-8)

∆ is the inspection interval; is the long-run minimum expected cost per unit time;

∆ | is the expected reliability at the next inspection epoch given that the

current belief state is projected as ; ∆ | is the expected survival time during

the next inspection interval when the current projected belief is . According to the

properties of the Gamma process, ∆ | and ∆ | can be calculated as

∆ | Pr Λ ∆ |·∆ ,

·∆ √

√

(5-9)

and

∆ | |∆

· ,· √

∆

√

. (5-10)

The matrix is the transition matrix in the discretised projected belief space

Ω over one inspection interval, i.e.

Pr ∆ , ∆ . (5-11)

The calculation of is discussed in detail later.
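The reliability term in the relative cost function is given in closed form by Equation (5-9). As a rough cross-check, the same quantity can be approximated by simulation. The sketch below is not the thesis formula. It assumes that the projected belief is a Gaussian with parameters `mu` and `sd` restricted to the interval from zero to the failure threshold, that the increment over one inspection interval is Gamma distributed with shape `alpha * dt` and scale `beta`, and that survival means staying below the threshold. All names are hypothetical.

```python
import random


def sample_truncated_gaussian(mu, sd, low, high, rng):
    """Draw from a Gaussian restricted to [low, high] by rejection sampling."""
    while True:
        x = rng.gauss(mu, sd)
        if low <= x <= high:
            return x


def reliability_after_interval(mu, sd, alpha, beta, dt, threshold, n=20000, seed=0):
    """Monte Carlo estimate of the probability of surviving one more interval."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n):
        x = sample_truncated_gaussian(mu, sd, 0.0, threshold, rng)
        # Gamma increment over the interval: shape alpha*dt, scale beta (assumed).
        if x + rng.gammavariate(alpha * dt, beta) < threshold:
            survived += 1
    return survived / n
```

A belief centred far below the threshold should give a reliability close to one, and a belief centred just below the threshold a much smaller value, which offers a quick sanity check on any implementation of the closed-form expression.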

After the relative cost functions are established, an efficient strategy to select grid

points from the parameter space Θ can be developed. According to Equation

(5-7), the relative cost is independent from the projected belief state when the

preventive replacement is the optimal maintenance action. The preventive

replacement is optimal only if an asset is in a poor health state. Subsequently, when

constructing grid points in Θ, only one value of is needed to represent the situation

when the optimal strategy is preventive replacement. On the other hand, a high

resolution should be applied when is near the preventive replacement threshold.

According to Equation (5-8), the calculation of the relative cost at when is near

the preventive replacement threshold may depend on the relative costs at all the


sample points. However, due to the monotonically increasing property of the

underlying degradation process, the value is close to zero if . Therefore,

the resolution of can be lower when it is much smaller than the preventive replacement

threshold.

5.2.1.3 Calculation of the Transition Matrix

Due to the non-Gaussian property of the Gamma-based state space model, the

transition matrix is calculated through Monte Carlo-based methods. The belief

state is obtained using the particle filter based on both the previous belief state

∆ and the current observation . Therefore, the observation is

considered during the calculation of . The elements of are calculated as

Pr ∆ , ∆Pr ∆ , ∆ , ∆

· ∆ , ∆. (5-12)

The first component of Equation (5-12) denotes the conditional probability density

of the observation after one inspection interval given the current belief state and the

fact that the failure does not happen during that inspection interval. The second

component of Equation (5-12) is the conditional probability that the discretised

projected belief state equals at the next inspection epoch given the current

projected belief state, the observation at the next inspection epoch, and the fact that

the failure does not happen. According to Equation (5-12) the Monte Carlo-based

algorithm that calculates is developed as in Table 5-2.

Table 5-2: The Monte Carlo-based method that calculates the transition matrix

Step 1: Generate 2 samples of the health state , , … , from the censored

Gaussian distribution ·, .

Step 2: Predict the corresponding health states , , … , after one inspection

interval according to the state equation (2-1).


Step 3: Resample 2 samples of the health state , , … , from ;

1,2, … ,2 and , i.e. the subset of the health state samples

, , … , whose values indicate that a failure does not happen.

Step 4: Generate observation samples , , … , corresponding to the

health state samples , , … , using the observation equation (5-2).

Step 5: Calculate the weights of the health state samples , , … , ,

using each observation sample , according to the observation equation

(5-2). The calculation process is as follows:

; , ∑ ; , ; 1,2, … , ;

1,2, … , .

Step 6: Project the sample-weight sets , ; 1,2, … , ;

1,2, … , to the parametric distribution space Ω according to the projection

function (5-3), and get samples in the projected belief space Ω as

; 1,2, … , .

Step 7: Find the nearest neighbour of each projected belief state in the

discretised projected belief space Ω ; 1,2, … , , record the

frequency of all the elements in Ω as ; 1,2, … , , obviously,

∑ .

Step 8: Obtain elements in the th row of the transition matrix , as

/ ; 1,2, … , . The elements in the other rows of can be obtained

in the same way.


At the first step, 2 instead of samples of the health state are generated so that

the independence between the observation samples , , … , and the health

state samples , , … , is guaranteed. The resampling in the third step is

to satisfy the condition contained in the first component of Equation (5-12), i.e. that the

failure does not happen. When the distance between two distributions is measured

during the seventh step, the commonly used Kullback–Leibler divergence (KL

divergence) is a candidate. However, in this research, the grid points

, ; 1,2, … , in the parameter space are all located at the vertices of rectangles.

Therefore, for any point in the parameter space, there is a unique grid point

that satisfies the following two equations simultaneously: arg min , ,…, |

| and arg min , ,…, | | . This research defines a distance measure

·,· given by

·, , ·, | |, (5-13)

which can be calculated much more efficiently than the KL divergence. Because the

calculation of the distance between distributions is performed times for each row

in , adopting ·,· instead of the commonly used KL divergence can improve the

overall efficiency of the algorithm in Table 5-2 significantly.
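
The steps of Table 5-2 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this research, and several details are assumptions made only for the example: the Gamma increments are drawn with shape `shape_rate * dt` and scale `scale`, the belief is a Gaussian censored at zero, the observation is the health state plus Gaussian noise, the projection is done by moment matching, and the nearest grid point is found separately on each axis (which, on a rectangular grid, coincides with the nearest point under a sum of absolute parameter differences). All function and parameter names are hypothetical.

```python
import math
import random


def nearest(grid, value):
    """Index of the grid value closest to `value`."""
    return min(range(len(grid)), key=lambda k: abs(grid[k] - value))


def transition_row(mu, sd, mean_grid, sd_grid, n=200, dt=0.65,
                   shape_rate=2.5, scale=0.5, noise_sd=0.3,
                   threshold=4.0, seed=0):
    """Estimate one row of the transition matrix by simulation.

    The model details (Gamma increments, additive Gaussian noise,
    censoring at zero, moment matching) are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Step 1: draw 2n health states from the current projected belief.
    x = [max(0.0, rng.gauss(mu, sd)) for _ in range(2 * n)]
    # Step 2: propagate each state over one inspection interval.
    x = [xi + rng.gammavariate(shape_rate * dt, scale) for xi in x]
    # Step 3: keep the states below the failure threshold and resample 2n.
    alive = [xi for xi in x if xi < threshold]
    if not alive:
        raise ValueError("no surviving samples for this belief state")
    x = [rng.choice(alive) for _ in range(2 * n)]
    # Step 4: the first n states generate the observation samples.
    obs = [xi + rng.gauss(0.0, noise_sd) for xi in x[:n]]
    particles = x[n:]  # the other n states stay independent of `obs`
    counts = [0] * (len(mean_grid) * len(sd_grid))
    for y in obs:
        # Step 5: normalised likelihood weights of the particles given y
        # (computed from log-weights to avoid numerical underflow).
        logw = [-0.5 * ((y - p) / noise_sd) ** 2 for p in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        # Step 6: project the weighted particles onto (mean, std).
        m = sum(wi * p for wi, p in zip(w, particles)) / total
        var = sum(wi * (p - m) ** 2 for wi, p in zip(w, particles)) / total
        s = math.sqrt(var)
        # Step 7: nearest grid point, found per axis on a rectangular grid.
        counts[nearest(mean_grid, m) * len(sd_grid) + nearest(sd_grid, s)] += 1
    # Step 8: relative frequencies form the row of the transition matrix.
    return [c / n for c in counts]


mean_grid = [round(0.2 * k, 10) for k in range(1, 20)]      # 0.2 ... 3.8
sd_grid = [round(0.1 + 0.03 * k, 10) for k in range(9)]     # 0.10 ... 0.34
row = transition_row(1.0, 0.2, mean_grid, sd_grid)
```

Each call produces one row, so the rows of the matrix can be computed independently of each other. Using the first half of the resampled states for the observations and the second half as particles mirrors the independence argument given above.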

5.2.1.4 Policy Iteration

Policy iteration is used to find an optimal maintenance policy that minimises the

long-run expected cost per unit time. A policy is denoted as

Ω and , , where represents “do nothing”, and stands for

“preventive replacement”. The main idea of policy iteration is calculating a new

policy · iteratively by minimising the relative cost obtained using the current

policy · . This iteration continues until · converges to an optimal policy

· . For this particular maintenance strategy optimisation problem, the process of

policy iteration is demonstrated in Table 5-3.


Table 5-3: The process of policy iteration for the POSMDP

Step 1: Set an initial maintenance policy: and for

1,2, … , , where the belief denotes the brand new health state.

Step 2: Solve the following system of equations of ; 1,2, … , and :

· · ; 1,2, … ,

, where 0, and are given by (5-8) and (5-7), and

· is the indicator function given by

0,1, (5-14)

Step 3: Calculate relative cost functions and 1,2, … , using

the solutions obtained in Step 2.

Step 4: Obtain the improved policy by:

,, 1,2, … , (5-15)

Step 5: If · · , the optimal maintenance strategy · is obtained as

· . Otherwise, go to Step 2 and start a new iteration.

The obtained optimal maintenance strategy · is defined in the discretised

projected belief space Ω . Therefore, when this strategy is implemented, the belief

obtained by the particle filter should be projected to the parametric distribution space

Ω, and then discretised to the space Ω using the nearest neighbourhood method.
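
The loop in Table 5-3 follows the general pattern of policy iteration for an average-cost semi-Markov decision process, which can be sketched as below. The sketch is generic and is not the exact system of equations of this research: it assumes that, for each action and state, a one-step expected cost `cost[a][i]`, an expected sojourn time `tau[a][i]` and a transition row `P[a][i]` are available, and it fixes the relative cost of state 0 (the brand new state) at zero. The three-state data at the end are toy numbers chosen only to make the example run.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def evaluate(policy, cost, tau, P):
    """Relative costs v (with v[0] = 0) and average cost g of a policy."""
    n = len(policy)
    a, b = [], []
    for i, act in enumerate(policy):
        # v_i - sum_j P_ij v_j + g * tau_i = c_i
        row = [-P[act][i][j] for j in range(n)] + [tau[act][i]]
        row[i] += 1.0
        a.append(row)
        b.append(cost[act][i])
    a.append([1.0] + [0.0] * n)  # reference state: v[0] = 0
    b.append(0.0)
    sol = solve(a, b)
    return sol[:n], sol[n]


def policy_iteration(cost, tau, P):
    """Alternate policy evaluation and improvement until the policy is stable."""
    n_actions, n = len(cost), len(cost[0])
    policy = [0] * n  # initial policy: action 0 in every state
    while True:
        v, g = evaluate(policy, cost, tau, P)

        def q(act, i):
            return cost[act][i] - g * tau[act][i] + sum(
                P[act][i][j] * v[j] for j in range(n))

        new = []
        for i in range(n):
            best = min(range(n_actions), key=lambda act: q(act, i))
            # keep the current action unless another one is strictly better
            if q(policy[i], i) <= q(best, i) + 1e-12:
                best = policy[i]
            new.append(best)
        if new == policy:
            return policy, g, v
        policy = new


# Toy data (purely illustrative): states 0 = new, 1 = worn, 2 = poor;
# action 0 = "do nothing", action 1 = "preventive replacement".
P = [
    [[0.6, 0.4, 0.0], [0.1, 0.5, 0.4], [0.7, 0.0, 0.3]],
    [[1.0, 0.0, 0.0]] * 3,
]
cost = [[0.1, 1.1, 7.1], [1.0, 1.0, 1.0]]
tau = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
policy, g, v = policy_iteration(cost, tau, P)
```

For these toy numbers the iteration stops at the policy `[0, 1, 1]` (do nothing when new, replace otherwise) with an average cost of 5/14 per unit time. Keeping the current action on ties prevents the loop from cycling between equally good policies.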

5.2.2 Simulation Study

The process of solving the POSMDP entails the projection of beliefs that are

obtained by particle filtering to a parametric distribution space, the discretisation of

the projected belief space, and the Monte Carlo-based method that calculates the


transition matrix. These approximations may affect the optimisation results.

Therefore, it is important to investigate the performance of the developed POSMDP

through a simulation study. During this simulation study, the maintenance strategy

developed using the POSMDP was compared with a strategy simply ignoring the

observation noise and a heuristic strategy setting a preventive replacement threshold

on the mean values of filtering particles.

In this simulation study, the parameters of the Gamma-based state space model were

selected as follows: 2.5, 0.5, and different values of were considered,

i.e., 0.1 , 0.3 , 0.5 , and 0.9 . The failure threshold on the

underlying Gamma process was set as 4 . A large number (i.e. 10^6) of

degradation sequences were generated for each value of . The costs of

maintenance activities were set as 0.1 (inspection), 1 (preventive replacement), and 10 (unexpected breakdown).

When the underlying health states can be observed deterministically, the renewal

theory can be applied to identify the preventive replacement threshold that

minimises the long-run expected cost per unit time. The derivation of the preventive

replacement threshold using the renewal theory was discussed in (Park 1988). Using

the algorithm developed in (Park 1988), the optimal thresholds for different

inspection intervals were calculated and the corresponding long-run expected costs

per unit time are plotted in Figure 5-3. As shown in Figure 5-3, the optimal inspection

interval is 0.65 and the corresponding preventive replacement threshold is 1.836.

Applying this preventive replacement threshold directly to the imperfect inspection results of

the simulation data gives the average costs per unit time listed in the first column of

Table 5-4. The preventive replacement threshold was also set on the mean values of

filtering particles, and the obtained average costs per unit time are listed in the

second column of Table 5-4. During the particle filtering, 400 particles were used.


Figure 5-3: Minimum long-run average cost according to different inspection intervals when

actual health states are observable

This maintenance decision-making problem was also solved through the POSMDP.

When discretising the projected belief space, the selection of grid points followed

the principle discussed in Section 5.2.1.2. For example, when the observation noise

0.3, the sampling points of the standard deviation were selected from 0.1 to

0.34 with a sample interval 0.03. The mean value was sampled with multiple

resolutions: From 0.2 to 1.4, the resolution was 0.2; from 1.5 to 1.86 the resolution

was 0.01; a single sample point at 1.9 was selected to represent the defective health

state that needs preventive replacement. Four hundred particles were used in the

Monte Carlo method that calculates the transition matrix . The obtained optimal

maintenance strategies when observation noise σ 0.3 are shown in Figure 5-4.

The figure shows that the optimal maintenance strategies depend not only on the

mean value parameter of the censored Gaussian distribution, but also on its

standard deviation parameter. Therefore, the maintenance strategy developed using

the POSMDP is different from the heuristic strategy that just sets a threshold on the


mean value of the filtering results. The average costs per unit time derived by the

POSMDP are listed in the last column of Table 5-4.
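
The multi-resolution grid described above for an observation noise of 0.3 can be written down explicitly. The following sketch only restates the sampling points given in the text; the helper `linspace` and the variable names are introduced for illustration.

```python
def linspace(start, stop, num):
    """`num` evenly spaced values from `start` to `stop`, both included."""
    step = (stop - start) / (num - 1)
    return [round(start + k * step, 10) for k in range(num)]


# Standard deviation parameter: 0.10 to 0.34 with an interval of 0.03.
sd_grid = linspace(0.10, 0.34, 9)

# Mean parameter with multiple resolutions.
mean_grid = (
    linspace(0.2, 1.4, 7)       # far below the threshold: interval 0.2
    + linspace(1.5, 1.86, 37)   # near the threshold: interval 0.01
    + [1.9]                     # single point for "preventive replacement"
)

# Rectangular grid of (mean, standard deviation) pairs.
grid = [(m, s) for m in mean_grid for s in sd_grid]
print(len(mean_grid), len(sd_grid), len(grid))  # 45 9 405
```

With these sampling points the discretised projected belief space contains 405 states, of which 333 lie in the narrow band of mean values between 1.5 and 1.86.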

Figure 5-4: The results of the policy iteration when maintenance intervals are regular and the

standard deviation of the observation noise is 0.3.

Table 5-4: The long-run average costs derived by three methods (i.e., the method simply

ignoring the observation noise, the heuristic method, and POSMDP) when the observation noise

level is different

σ      Ignoring the noise    Heuristic method    POSMDP
0.1    0.8028                0.8029              0.8028
0.3    0.8176                0.8163              0.8157
0.5    0.8474                0.8427              0.8397
0.7    0.8902                0.8779              0.8706
0.9    0.9363                0.9157              0.9002

Table 5-4 shows that the average costs are almost the same when the observation

noise is not significant. In contrast, when the observation noise becomes


considerable, the POSMDP outperforms the method simply ignoring the observation

noise and the heuristic method that applies a preventive replacement threshold on the

mean values of filtering particles. Therefore, the POSMDP shows its advantages

when dealing with partially observable degradation processes, although

approximations are involved in its solution algorithm. However, the advantages of

the POSMDP cannot be fully reflected under this simple maintenance strategy

structure. The benefits of using the POSMDP are more obvious when state-dependent

inspection intervals, state-dependent costs, and durations of maintenance activities

are involved in maintenance strategy optimisation. In these more complex situations,

the long-run expected cost per unit time derived by the commonly used renewal

theory is difficult to evaluate. The POSMDP is used to investigate the maintenance

decision-making problems under more complex situations in the following two

sections.
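
The size of the improvement reported in Table 5-4 can be made explicit with a short calculation. The sketch below uses only the values of Table 5-4; the function and variable names are introduced for illustration.

```python
# Table 5-4: noise standard deviation -> (ignoring the noise, heuristic, POSMDP)
table_5_4 = {
    0.1: (0.8028, 0.8029, 0.8028),
    0.3: (0.8176, 0.8163, 0.8157),
    0.5: (0.8474, 0.8427, 0.8397),
    0.7: (0.8902, 0.8779, 0.8706),
    0.9: (0.9363, 0.9157, 0.9002),
}


def saving(baseline, posmdp):
    """Relative cost reduction achieved by the POSMDP, in percent."""
    return 100.0 * (baseline - posmdp) / baseline


savings = {
    sd: (saving(ignore, posmdp), saving(heuristic, posmdp))
    for sd, (ignore, heuristic, posmdp) in table_5_4.items()
}
for sd, (vs_ignore, vs_heuristic) in savings.items():
    print(f"noise {sd}: {vs_ignore:.2f}% vs ignoring the noise, "
          f"{vs_heuristic:.2f}% vs the heuristic method")
```

The reduction relative to ignoring the noise grows from zero at a noise level of 0.1 to about 3.86% at a noise level of 0.9, which quantifies the trend described above.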

5.3 State-Dependent Maintenance Intervals

A fixed maintenance interval is often not cost-effective in practice, and a dynamic

maintenance interval that depends on the current health state of an asset is more

rational. When a defect is detected, a further inspection or a replacement should be

scheduled in a short time to avoid a failure without pre-alarm. On the other hand,

unnecessary maintenance activities should be avoided when an asset is still in a good

health condition, because these unnecessary maintenance activities can introduce

additional cost. A premature preventive replacement can reduce the useful life of an

asset. Some health inspections can also be expensive. For example, the inspection of

compressor blades on an aircraft engine involves engine disassembly (Hopp and

Kuo 1998). Some inspections in a process industry require disturbance of production

and the removal of highly corrosive and/or toxic chemicals from equipment.

Therefore, maintenance intervals should be optimised according to the current health

state to reduce the cost.

This section develops a POSMDP that can produce maintenance strategies with

state-dependent maintenance intervals. In this section, the action space of the


POSMDP consists of two components: one is the next maintenance activity (i.e. the

inspection or the preventive replacement), and the other is the waiting duration till

the next maintenance activity. Consequently, both the next maintenance activity and

its corresponding waiting time are optimised by the POSMDP.

In this section, the degradation process still follows Equations (5-1) and (5-2). The

costs of inspection, replacement, and unexpected breakdown (i.e., , , and )

are assumed to be state independent. The durations for maintenance activities and

the unexpected breakdowns are still not considered.

5.3.1 The Formulations and Solution Method of the POSMDP

When the waiting duration for the next maintenance activity is considered, the

relative cost function becomes

min , ,,

, ∆ , , ∆ . (5-16)

Here, , ∆ given by

, ∆1 ∆ |∆ | ∆ |

1, ,

0

(5-17)

denotes the relative cost when the current projected belief state is , and preventive

replacement is conducted after ∆ , where 0,1, , and ∆ is the

maximum waiting time for the next preventive replacement. Similarly,

, ∆ given by

, ∆ 1 ∆ |∑ ∆ | ∆ | (5-18)

denotes the relative cost when the current projected belief state is , and an

inspection is conducted after ∆ , where 1,2, , and ∆ is the

maximum delay time for the next inspection. In this research, the resolution of the

sampling points of the waiting time for the next preventive replacement is higher

than that of the waiting time for the next inspection, i.e. ∆ ∆ . This difference

in resolutions is due to two reasons. Firstly, the next replacement time needs to be


determined more accurately, because the cost for preventive replacement is much

higher than that of an inspection in most situations, i.e., . Secondly,

, ∆ can be calculated more efficiently and a higher resolution does not

bring down the overall efficiency of the solving algorithm.

In Equations (5-17) and (5-18), · | · and · | · are the expected conditional

reliability and survival time which are calculated according to Equations (5-9) and

(5-10) respectively. ( 1,2, , ) is the transition matrix of the

discretised projected beliefs given that the transition epoch is ∆ and a health

inspection is conducted. The calculation of follows the same process in Table

5-2. The values of · | · and · | · are calculated analytically, while ;

1,2, , are identified through a time-consuming Monte Carlo-based method.

Therefore, the calculation of ; 1,2, , is the bottleneck of the

efficiency of the whole solving algorithm. This research partially addresses this

bottleneck by avoiding calculating the unnecessary rows in ; 1,2, , .

According to Equations (5-17) and (5-18), is only used to calculate the relative

cost when an inspection is conducted after ∆ . Therefore, when the reliability of

an asset is above some threshold, i.e., ∆ | , the inspection is not

necessary and is not required. Similarly, when the reliability is below some

threshold, i.e. ∆ | , preventive replacement is preferred and is

not necessary either. By choosing proper thresholds and , the computing time

can be reduced significantly. Because rows in are calculated independently,

additional rows can be added into the original calculation results when optimisation

results show that the and are not set appropriately.

After the elements in Equations (5-17) and (5-18) have been calculated, the optimal

maintenance activity and waiting time according to each discretised projected belief

state can be identified using the policy iteration.
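
The row-skipping idea described above amounts to a simple filter on the expected reliabilities, combined with a store of rows that can be extended later. The sketch below is an illustration under stated assumptions: the reliabilities are taken as given inputs, the default thresholds are the values 0.95 and 0.9999 used in the simulation study of this section, the bounds are treated as inclusive, and `compute_row` is a placeholder for the Monte Carlo procedure of Table 5-2.

```python
def rows_to_compute(reliabilities, lower=0.95, upper=0.9999):
    """Indices of the belief states whose transition rows are needed.

    Above `upper` an inspection is not necessary; below `lower` preventive
    replacement is preferred. In both cases the row is skipped.
    """
    return [i for i, r in enumerate(reliabilities) if lower <= r <= upper]


def ensure_rows(cache, indices, compute_row):
    """Compute only the rows that are not in the cache yet."""
    for i in indices:
        if i not in cache:
            cache[i] = compute_row(i)
    return cache


# Hypothetical expected reliabilities of four discretised belief states.
reliabilities = [0.99999, 0.999, 0.97, 0.90]
todo = rows_to_compute(reliabilities)  # [1, 2]
cache = ensure_rows({}, todo, compute_row=lambda i: f"row {i}")

# If the optimisation results suggest that the lower threshold was too high,
# the missing rows are added without recomputing the existing ones.
cache = ensure_rows(cache, rows_to_compute(reliabilities, lower=0.85),
                    compute_row=lambda i: f"row {i}")
```

Because each row is computed independently, widening the thresholds afterwards only costs the additional rows.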


In this section, beliefs of POSMDP are still projected into the censored Gaussian

distribution given by Equation (5-4). However, the sampling strategy of the

parameters in Equation (5-4) is different from that used in Section 5.2. In Section 5.2

the mean parameter of the projected belief states near the preventive replacement

threshold was sampled at a higher resolution. In this section, both the maintenance

activity and its waiting duration are to be optimised. The approximate thresholds for

these action-duration combinations are difficult to estimate beforehand. Therefore,

regular grid points are adopted to discretise the parameter space of the projected

parametric density in this section.


A simulation study was conducted to investigate the performance of the developed

POSMDP with state-dependent maintenance intervals. Firstly, optimal maintenance

strategies for different observation noise levels and inspection costs were identified

and effects of these two parameters on the optimal maintenance strategies structures

were investigated. Secondly, optimisation results obtained using the POSMDP were

compared with those derived by another maintenance optimisation algorithm

developed by Wang (Wang and Christer 2000; Wang 2003b).

In this simulation study, parameters of the Gamma-based state space model were

assumed as: 2.5, 0.5, and 4. Different values of (i.e., 0.1,

0.3, 0.5, and 0.7) were used. The costs of preventive replacement

and unexpected breakdown were 1 and 10 . Different values of

inspection cost, i.e., 0.1, 0.3, and 0.5, were used.

In this simulation study, the sampling resolutions and the maximum sampling unit

number of the waiting duration till the next maintenance action were ∆ 0.01,

200, ∆ 0.05, and 26. The mean parameter of the projected belief

state was sampled from 0.2 to 3.8 with a fixed interval of 0.02. The sampling points

of the standard deviation parameter were selected according to different standard

deviations of observation noise. The upper and lower thresholds of the reliability


for calculating the elements of transition matrices were set as: 0.9999 ,

0.95. The results of the policy iteration showed that the interval

included all the situations when the inspection was the optimal action. When

calculating the elements of the transition matrices, 400 particles were used by the

Monte Carlo method. Optimal maintenance strategies were then obtained through

the policy iteration. Some results of the policy iteration for different standard

deviations of observation noise and inspection costs are demonstrated in Figure 5-5,

where the numbers above the marks are the waiting durations till the corresponding

maintenance actions. As shown in Figure 5-5, when the observation noise and

inspection cost were not substantial, an inspection was the optimal maintenance

activity when the asset health state was under a certain threshold. On the contrary,

when the observation noise was significant or the inspection was costly, the only

optimal maintenance activity was preventive replacement. In this situation,

performing an inspection was not economical for every projected belief state and the

optimal maintenance strategy became a time-based preventive maintenance strategy.

In this simulation study, when , 0.1, 0.5 , 0.3, 0.5 , 0.5, 0.3 ,

0.5, 0.5 , 0.7, 0.3 , 0.7, 0.5 , the inspection was not economical for every

projected belief state. The optimal time-based replacement interval obtained by the

POSMDP was 1.5. This result was consistent with that derived using the renewal

theory, i.e. 1.5014. Therefore, the proposed POSMDP can adopt different

maintenance strategy structures according to the observation noise level

and inspection cost.
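
The action space used in this simulation study can be enumerated directly from the sampling resolutions given above. In the sketch below, the resolution 0.01 with 200 units is assigned to the preventive replacement and the resolution 0.05 with 26 units to the inspection, following the statement in Section 5.3.1 that the waiting time for the preventive replacement is sampled at the higher resolution. The names `action_space` and `best_action`, and the cost function in the usage line, are hypothetical.

```python
def action_space(d_replace=0.01, k_max=200, d_inspect=0.05, m_max=26):
    """All (activity, waiting time) pairs considered by the policy iteration."""
    # Preventive replacement after k * d_replace, k = 0, 1, ..., k_max.
    actions = [("replace", round(k * d_replace, 10)) for k in range(k_max + 1)]
    # Inspection after m * d_inspect, m = 1, 2, ..., m_max.
    actions += [("inspect", round(m * d_inspect, 10)) for m in range(1, m_max + 1)]
    return actions


def best_action(actions, relative_cost):
    """Action with the smallest relative cost for one belief state."""
    return min(actions, key=relative_cost)


actions = action_space()
print(len(actions))  # 227

# Usage with a made-up relative cost that is smallest for a replacement
# after 1.5 time units; a real cost would come from the relative cost functions.
choice = best_action(
    actions, lambda a: abs(a[1] - 1.5) + (0.0 if a[0] == "replace" else 1.0))
```

With these settings each belief state has 227 candidate actions, with a maximum waiting time of 2.0 for the preventive replacement and 1.3 for the inspection.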

Figure 5-5 (a) shows that the waiting durations can increase with standard deviation

for small mean values at certain points. Similarly, Figure 5-5 (c) shows that the

optimal maintenance strategy shifts from preventive replacement to inspections as

the mean value increases. This is due to the approximations adopted in the

developed solving algorithms. The censored normal distribution is not close to the

filtering result of the Gamma-based state space model when the mean parameter is

small. Therefore, using different types of distributions to approximate filtering

results can improve the maintenance strategy optimisation result at the expense of


increasing complexity of algorithm. The result of this simulation study shows that

current algorithms have already obtained better performance than that of an existing

approximate optimisation algorithm proposed by Wang (Wang and Christer 2000;

Wang 2003b). The POSMDP that performs density projection to a space with multi-type

distributions will be investigated in the future.

Figure 5-5: Some results of the policy iteration for the POSMDP with irregular maintenance

intervals (the numbers in rectangles are the optimal waiting durations till the corresponding

maintenance actions). Panels, by observation noise standard deviation and inspection cost:

(a) 0.3, 0.1; (b) 0.3, 0.5; (c) 0.1, 0.3; (d) 0.7, 0.3

These obtained maintenance policies were applied to the simulated degradation data.

For each pair of an observation noise standard deviation value and an inspection

cost, 5 10 sequences of simulated degradation data were generated. The average

costs per unit time were then calculated as the third column of Table 5-5. The

simulation results showed that a lower average cost could be obtained when the

observation noise was small and the inspection was inexpensive. This is

consistent with intuition: the maintenance cost can be reduced by using accurate

and cost-effective health inspection technologies.

The proposed POSMDP was compared with another approximate maintenance

strategy optimisation method developed by Wang (Wang and Christer 2000; Wang

2003b). Wang identified the optimal replacement time given a fixed inspection

interval through an approximate renewal theory. The optimal inspection interval was


identified through simulation studies. The state space degradation model

investigated by Wang is different from the Gamma-based state space model.

However, the assumptions of the two models are similar, and the method developed

by Wang can also be used to process the Gamma-based state space model with slight

modification. For the Gamma-based state space model, the approximate renewal

theory used by Wang can be written as

1 | 1 ∆ 1 | , (5-19)

where is the remaining time to a replacement and denotes the health state

estimate at the th inspection point. For the Gamma-based state space model, the

health state estimates are obtained by particle filtering. The | is the expected

reliability before the next replacement given the current belief state . The |

is the expected survival time before the next replacement when the current belief

state is . An optimal replacement time can then be obtained by minimising

Equation (5-19). If ∆ , a preventive replacement is carried out after .

Otherwise, another inspection is performed after ∆ , and a new decision is to be

made based on the new inspection result.

To identify a cost-effective inspection interval, Wang performed repeated simulation

studies using different inspection intervals. The inspection interval that has the

lowest average cost per unit time is selected as the optimal inspection interval ∆ . In

this research, the number of simulations for each inspection interval is 6×10^4. The

average costs corresponding to the optimal inspection interval ∆ are listed in the last

column of Table 5-5.

As shown in Table 5-5 and Table 5-4, both the POSMDP that considers state-dependent

maintenance intervals and Wang’s method can develop more cost-effective

maintenance strategies than the POSMDP that adopts fixed maintenance

intervals. However, the POSMDP has better performance than the approximated

renewal theory developed by Wang, especially when the inspection noise and cost are

not significant. The reason is that the proposed POSMDP uses a state-dependent

inspection interval instead of the fixed inspection interval adopted by Wang. Wang


also provided a method to identify state dependent inspection intervals in (Wang

2003b) by conducting intensive simulation studies after every inspection. This

method that identifies the optimal inspection interval is applicable for Wang’s state

space model in which the approximate average cost per unit time can be evaluated

efficiently. For the Gamma-based state space model, Wang’s method is not

applicable due to its low efficiency. On the other hand, when the strategy derived by

the POSMDP is implemented, the optimal maintenance activities and their

corresponding waiting durations are identified according to the pre-calculated policy

function · . Therefore, the proposed POSMDP can consider the state dependent

inspection interval more efficiently.

Table 5-5: The long-run average costs per unit time derived by the POSMDP with irregular

inspection interval and the method proposed by Wang (Wang and Christer 2000; Wang 2003b).

Observation noise    Inspection cost    Average cost derived    Average cost derived
                                        by POSMDP               by Wang’s method
0.1                  0.1                0.7329                  0.7548
0.1                  0.3                0.8745                  0.8765
0.1                  0.5                0.8927                  0.8925
0.3                  0.1                0.7477                  0.7723
0.3                  0.3                0.8871                  0.8924
0.3                  0.5                0.8920                  0.8949
0.5                  0.1                0.7739                  0.7985
0.5                  0.3                0.8930                  0.8955
0.5                  0.5                0.8926                  0.8938
0.7                  0.1                0.8020                  0.8239
0.7                  0.3                0.8921                  0.8934
0.7                  0.5                0.8928                  0.8940
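
The comparison in Table 5-5 can be summarised row by row with a short script. The sketch below uses only the values of Table 5-5; the variable names are introduced for illustration.

```python
# Table 5-5: (observation noise, inspection cost, POSMDP, Wang's method)
table_5_5 = [
    (0.1, 0.1, 0.7329, 0.7548), (0.1, 0.3, 0.8745, 0.8765),
    (0.1, 0.5, 0.8927, 0.8925), (0.3, 0.1, 0.7477, 0.7723),
    (0.3, 0.3, 0.8871, 0.8924), (0.3, 0.5, 0.8920, 0.8949),
    (0.5, 0.1, 0.7739, 0.7985), (0.5, 0.3, 0.8930, 0.8955),
    (0.5, 0.5, 0.8926, 0.8938), (0.7, 0.1, 0.8020, 0.8239),
    (0.7, 0.3, 0.8921, 0.8934), (0.7, 0.5, 0.8928, 0.8940),
]

# Settings in which the POSMDP gives the lower average cost, and the rest.
wins = [(sd, ci) for sd, ci, posmdp, wang in table_5_5 if posmdp < wang]
others = [(sd, ci) for sd, ci, posmdp, wang in table_5_5 if posmdp >= wang]

# Setting with the largest relative reduction compared with Wang's method.
largest = max(table_5_5, key=lambda row: (row[3] - row[2]) / row[3])

print(len(wins), others, largest[:2])
```

The POSMDP gives the lower average cost in 11 of the 12 settings; the exception is the setting with noise 0.1 and inspection cost 0.5, where the two costs differ by 0.0002. The largest relative reduction, about 3.2%, occurs at noise 0.3 and inspection cost 0.1, and all four settings with inspection cost 0.1 show reductions above 2.5%.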


5.4 Maintenance Strategy Considering Imperfect

Maintenance

The above two sections only consider preventive replacement that brings an asset to

a brand new state. However, for some engineering assets, a more cost-effective

option for a moderate degradation state is imperfect maintenance. Compared with

preventive replacement, imperfect maintenance can be performed more

economically and efficiently, though it can only partially improve the health state of

an asset. Therefore, when preventive replacement and imperfect maintenance can be

both adopted, a maintenance strategy should strike a balance between maintenance

effects and maintenance costs (or durations). For the commonly used renewal

theory, the selection from preventive replacement and imperfect maintenance is not

straightforward. Some special strategy structure (e.g., control limit theory) should be

assumed to establish the ratio of the expected cost per renewal cycle and the

expected length of a renewal cycle. Unfortunately, these special strategy structures

are not optimal in all the situations (Moustafa et al. 2004).

As an extension of MDP, the proposed POSMDP simply treats preventive

replacement and imperfect maintenance as two different actions when modelling a

maintenance decision process. The two actions are selected according to the current

health state of an asset, and no special strategy structure is required. Subsequently,

the POSMDP can develop a strategy considering both preventive replacement and

imperfect maintenance under a flexible strategy structure. In addition, the POSMDP

decomposes a long-run maintenance decision process into single maintenance

cycles. Consequently, random effects and durations of imperfect maintenance can be

formulated easily.

In this section, imperfect maintenance is assumed to improve the health of an asset

to a random level which depends on the current health state. The duration of

imperfect maintenance is also assumed to be a random value that relates to the

current health state. In this section, the objective function of the maintenance

strategy optimisation is the long-run availability (i.e. the ratio of the running time to


the total time). In some real applications, the costs of maintenance activities and

failures are difficult to estimate accurately, and the cost objective function may be

sensitive to the uncertainty of these cost estimates. On the other hand the running

time and the down time of an asset can be measured accurately. Therefore, the

availability is a more effective objective to optimise maintenance strategies in these

applications, and this section shows that the proposed POSMDP can also be used to

develop a maintenance strategy that maximises the availability of an asset.

It is worth mentioning that the uncertainty of the costs of different maintenance

actions can also be considered by the algorithms developed in this section. Due to

limited course duration, this consideration has not been further explored in this

thesis.

5.4.1 The Formulations and the Solution Method of the POSMDP

In this section, the degradation process is still assumed to follow the Gamma-based

state space model given by Equations (5-1) and (5-2). Three types of maintenance

activities are considered, i.e. health inspections, replacement, and imperfect

maintenance. The duration of a health inspection is denoted by ; the duration of

the replacement is ; the length of a breakdown caused by a failure is denoted by

. It is assumed that , and , , are all state-independent. The

duration of imperfect maintenance follows the exponential distribution with a mean

value given by

· exp , (5-20)

where is the current underlying health state and denotes the corresponding

random duration of imperfect maintenance. The ratio of the underlying health states

after and before imperfect maintenance is modelled by a Beta distribution and the

PDF of the underlying health state after the imperfect maintenance can be calculated

as:

|,

1 · ,(5-21)


where and denote the underlying health states before and after the

imperfect maintenance respectively. It is assumed that inspections, replacement, and

imperfect maintenance can only be performed when an asset is shut down.
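
The two random effects of imperfect maintenance described above can be simulated with a few lines of Python. The sketch below is an illustration under explicit assumptions: the state-dependent mean duration is taken to have the form a + b·exp(c·x), which is only a guess at the structure of Equation (5-20), and the default values 0.02, 0.003, 0.75 (duration) and 2, 3 (Beta distribution) are the parameter values reported for the simulation study of this section, assigned to the roles `a`, `b`, `c`, `p` and `q` by assumption. The function names are hypothetical.

```python
import math
import random


def mean_duration(x, a=0.02, b=0.003, c=0.75):
    """Assumed state-dependent mean duration: a + b * exp(c * x)."""
    return a + b * math.exp(c * x)


def imperfect_maintenance(x_before, rng, p=2.0, q=3.0):
    """Sample the duration and the health state after imperfect maintenance."""
    # The duration is exponential with a mean that depends on the current state.
    duration = rng.expovariate(1.0 / mean_duration(x_before))
    # The ratio of the states after and before maintenance is Beta distributed.
    ratio = rng.betavariate(p, q)
    return ratio * x_before, duration


rng = random.Random(1)
x_after, duration = imperfect_maintenance(3.0, rng)
```

Because the ratio lies between 0 and 1, the state after maintenance never exceeds the state before it, and under the assumed form a more degraded asset takes longer to maintain on average.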

Maximising the long-run availability is equivalent to minimising the long-run expected

downtime per unit time. Consequently, the relative cost functions developed in

Sections 5.2 and 5.3 are modified to a relative downtime function given by

min , ,,

, ,

, ∆ , , ∆ , , ∆ . (5-22)

Here, , ∆ denotes the relative downtime when the initial projected

belief state is and preventive replacement is performed after ∆ . Similarly

the relative downtimes when the inspection and imperfect maintenance are selected are

denoted as , ∆ and , ∆ , respectively, where ∆ and

∆ are the delay time to carry out an inspection and imperfect maintenance. An

assumption used in this section is that every imperfect maintenance activity is

followed by an immediate inspection. This inspection is to measure the result of the

imperfect maintenance, and its duration is included in that of the imperfect

maintenance.

The relative downtime starting in projected belief state when preventive

replacement will be performed after ∆ is calculated as

, ∆1 ∆ |∆ | ∆ |

1, ,

0

, (5-23)

where ∆ | is the expected waiting time till the next maintenance activity.

The next maintenance activity can be the planned preventive replacement, or a

corrective replacement that follows an unexpected failure. ∆ | is

calculated as

∆ | |∆ 1 ∆ | . (5-24)


The relative downtime when the inspection is performed after ∆ is formulated

as

, ∆ 1 ∆ |∑ ∆ | ∆ | . (5-25)

The expected waiting time till the next maintenance activity, i.e., ∆ | , is

given by

∆ | |∆

∆ | 1 ∆ |. (5-26)

The relative downtime starting in projected belief state when the imperfect

maintenance is performed after ∆ is given by

, ∆1 ∆ |

∆ | ∑· ∆ | ∆ |

1, ,

0| ∑ 0| 0

. (5-27)

Here, Δ | is the expected duration of the imperfect maintenance performed

after Δ given the current projected belief state , which is given by

Δ | Δ , Δ

· exp Gam ; · ∆ , ; ,. (5-28)

After Δ | is worked out, the expected waiting time till the next maintenance

activity can be calculated as

∆ | |∆

∆ | 1 ∆ |. (5-29)

Another component in Equation (5-27) is the transition matrix (i.e.

0,1 , )) of the projected belief states after imperfect maintenance.

Elements of can be denoted as

Pr b ∆ |b , ∆ ,

∆ , Λ ∆, (5-30)


where ∆ means that the optimal action after ∆ is

imperfect maintenance, and ∆ is the observation after the imperfect

maintenance. The transition matrix is worked out through the Monte Carlo-based

method in Table 5-6.

Table 5-6: The process to calculate the transition matrix using the Monte Carlo-based method

Step 1: Generate 2 samples of the health state , , … , from the censored

Gaussian distribution ·, .

Step 2: Predict the corresponding health states , , … , after ∆

according to the system equation (5-1).

Step 3: Resample 2 samples of the health state , , … , from ;

1,2, … ,2 , i.e., the subset of the samples , , … ,

whose values indicate a failure does not happen.

Step 4: Generate 2 samples of the health state , , … , after the

imperfect maintenance corresponding to the original health state samples

, , … , through Equation (5-21).

Step 5: Generate observation samples , , … , corresponding to the

health state samples , , … , using the observation equation (5-2).

Step 6: For each observation sample , calculate the weights of the health state

samples , , … , , according to the observation equation (5-2):

; , ∑ ; , ; 1,2, … , ;

1,2, … , .


Step 7: Project the sample-weight sets , ; 1,2, … , ;

1,2, … , to the parametric distribution space Ω according to the projection

function (5-3), and get samples in the projected belief space Ω as

; 1,2, … , .

Step 8: Find the nearest neighbour of each projected belief state in the

discretised projected belief space Ω ; 1,2, … , , record the

frequency of all the elements in Ω as ; 1,2, … , , obviously,

∑ .

Step 9: Obtain the th row of the transition matrix , as / ;

1,2, … , . The elements in the other rows of can be obtained in the

same way.
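The procedure in Table 5-6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: the Gamma increment, the uniformly distributed maintenance effect, the Gaussian observation noise, the moment-matching projection, and the Euclidean nearest-neighbour search are hypothetical stand-ins for Equations (5-1), (5-21), (5-2), and (5-3), and a single sample size `n` is used throughout. All parameter values are arbitrary.

```python
import numpy as np


def transition_row(belief, grid, delta, n=300, seed=0, shape_rate=1.0,
                   scale=1.0, threshold=10.0, obs_std=0.5):
    """Monte Carlo estimate of one row of the belief transition matrix.

    Every model component below is an illustrative stand-in, not the
    thesis's Equations (5-1), (5-2), (5-3) or (5-21).
    """
    rng = np.random.default_rng(seed)
    mu, sigma = belief
    grid = np.asarray(grid, dtype=float)

    # Step 1: sample health states from the belief (a Gaussian restricted to
    # [0, threshold) by rejection; assumes the belief has mass in that range).
    x = np.empty(0)
    while x.size < n:
        draw = rng.normal(mu, sigma, size=2 * n)
        x = np.concatenate([x, draw[(draw >= 0.0) & (draw < threshold)]])
    x = x[:n]

    # Step 2: propagate over delta with a Gamma-distributed increment.
    x_pred = x + rng.gamma(shape_rate * delta, scale, size=n)

    # Step 3: keep the samples that have not failed and resample from them.
    survivors = x_pred[x_pred < threshold]
    if survivors.size == 0:
        raise ValueError("all samples failed before the maintenance time")
    x_surv = rng.choice(survivors, size=n, replace=True)

    # Step 4: imperfect maintenance leaves a random fraction of the
    # degradation (hypothetical uniform remaining fraction).
    x_after = x_surv * rng.uniform(0.2, 0.8, size=n)

    # Step 5: one noisy observation per post-maintenance sample.
    y = x_after + rng.normal(0.0, obs_std, size=n)

    # Step 6: for each observation, normalised likelihood weights of all samples.
    log_w = -0.5 * ((y[:, None] - x_after[None, :]) / obs_std) ** 2
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    # Step 7: project each weighted sample set to (mean, std) by moment matching.
    mean = w @ x_after
    var = np.maximum(w @ x_after**2 - mean**2, 0.0)
    projected = np.column_stack([mean, np.sqrt(var)])

    # Step 8: nearest grid belief state (Euclidean distance) and visit counts.
    dist = ((projected[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(dist.argmin(axis=1), minlength=len(grid))

    # Step 9: relative frequencies form the row of the transition matrix.
    return counts / counts.sum()
```

Calling `transition_row` once for each element of the discretised belief space and stacking the returned rows would give the full matrix. The weights in Step 6 are normalised in log space only to avoid numerical underflow.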

Once the components of the relative downtime function have been worked out,

policy iteration is used to identify the optimal maintenance strategy that maximises

the availability.
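For a finite (discretised) belief space, policy iteration under the long-run average criterion can be sketched as below. This is a generic textbook-style sketch for a semi-Markov decision process, not a transcription of the algorithm used in this research. It assumes that every stationary policy induces a single recurrent class, and the array names `P`, `downtime`, and `duration` are hypothetical: `P[i, a, j]` is the transition probability, `downtime[i, a]` the expected downtime of one decision step, and `duration[i, a]` its expected length.

```python
import numpy as np


def evaluate_policy(policy, P, downtime, duration):
    """Return (g, h) solving h[i] + g*tau[i] = d[i] + sum_j P[i, j]*h[j], h[0] = 0."""
    n = len(policy)
    idx = np.arange(n)
    P_pi = P[idx, policy]          # transition matrix under the policy
    d = downtime[idx, policy]      # expected downtime per decision step
    tau = duration[idx, policy]    # expected length of a decision step
    A = np.zeros((n, n))
    A[:, 0] = tau                          # coefficient of the unknown g
    A[:, 1:] = (np.eye(n) - P_pi)[:, 1:]   # coefficients of h[1:]; h[0] is fixed at 0
    sol = np.linalg.solve(A, d)
    return sol[0], np.concatenate([[0.0], sol[1:]])


def policy_iteration(P, downtime, duration, max_iter=100):
    """Minimise the long-run downtime per unit time, g (availability = 1 - g)."""
    n_states, _ = downtime.shape
    idx = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate_policy(policy, P, downtime, duration)
        # Improvement step: test quantity for every state-action pair.
        q = downtime - g * duration + P @ h
        best = q.argmin(axis=1)
        improve = q[idx, best] < q[idx, policy] - 1e-10
        if not improve.any():
            break
        policy = np.where(improve, best, policy)
    g, _ = evaluate_policy(policy, P, downtime, duration)
    return policy, g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.random((4, 3, 4))
    P /= P.sum(axis=2, keepdims=True)       # rows sum to one
    downtime = rng.random((4, 3)) * 0.1     # arbitrary illustrative values
    duration = 0.5 + rng.random((4, 3))
    policy, g = policy_iteration(P, downtime, duration)
    print("policy:", policy, "availability:", 1.0 - g)
```

The action is changed only where it strictly improves the test quantity, which prevents cycling between equally good actions.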


This simulation study investigated the effects of different maintenance activity

durations on policy iteration results. The parameters of the Gamma-based state space

model were selected as 2.5, 0.5, 0.3. The length of the breakdown

caused by an unexpected failure was assumed as 1. Different durations of

inspections and the preventive replacement were adopted, i.e., 0.003,

0.1 , 0.01, 0.1 , 0.003, 0.3 , and 0.003, 0.05 .

The expected duration of imperfect maintenance was assumed to follow Equation

(5-20), where: 0.02 , 0.003 , and 0.75 . The effects of imperfect

maintenance were given by Equation (5-21), where 2, and 3.


The optimal maintenance strategies were derived using the process discussed in Section

5.4.1. Figure 5-6 shows the strategies obtained for the different inspection and

replacement durations. Its four subfigures demonstrate four different maintenance

strategy structures. Figure 5-6 (a) shows a typical structure that uses all three types

of maintenance activities. Inspection is adopted when the asset is in a good health

state. After the asset degrades to a certain level, imperfect maintenance becomes the

better option. As the degradation continues, a preventive replacement becomes the

most cost-effective option. Figure 5-6 (b) shows that inspection is not the optimal

action for any belief state if the duration of an inspection is too long. Similarly, as

shown in Figure 5-6 (c), when a replacement is time-consuming, imperfect maintenance

is performed even when the asset is in a highly degraded state. In contrast, as shown

in Figure 5-6 (d), when the preventive replacement can be carried out efficiently,

imperfect maintenance is never optimal.

Panels, labelled by (inspection duration, preventive replacement duration): (a) (0.003, 0.1); (b) (0.01, 0.1); (c) (0.003, 0.3); (d) (0.003, 0.05)

Figure 5-6: Some results of the policy iteration for POMDP considering imperfect maintenance (the numbers in rectangles are the optimal waiting durations till the corresponding maintenance actions)

This simulation study shows that the POSMDP can develop maintenance strategies

with various structures according to the durations of the maintenance activities.

Therefore, the POSMDP is an effective tool for optimising the maintenance strategy

when multiple maintenance actions are available. Moreover, because the POSMDP

decomposes a long-run decision process into single steps, state-dependent durations

and effects of maintenance activities can be formulated easily.

Similar to the results obtained in Section 5.3, the next optimal maintenance activities

and the corresponding waiting times shown in Figure 5-6 do not change

monotonically. This is caused by the approximate solving algorithm used in this

research. Adopting a hybrid parametric distribution set as the projected belief space

may improve the result. However, the current solving algorithm is already effective

in identifying the structural properties of maintenance strategies, and the

non-monotonicity of the obtained optimal maintenance strategies is not significant. The

POSMDP using multi-type parametric distributions as the projected belief space will be

investigated in the future.

5.5 Chapter Summary

This chapter has developed a POSMDP for maintenance strategy optimisation when

the health state of an asset can only be partially observed. Unlike the

existing POMDP methods that optimise maintenance strategies, the developed

POSMDP does not assume discrete time or discrete states. Without these

two assumptions, degradation processes can be modelled more accurately and more

cost-effective maintenance strategies can be developed. In this chapter, the

formulations and solving methods of the POSMDP for three different maintenance

decision-making problems have been discussed in detail. Simulation studies have

been performed to validate the effectiveness of the POSMDP applied to maintenance

decision-making. The results show that the developed POSMDP can derive cost-

effective maintenance strategies with flexible structures.

The proposed maintenance decision-making method has several advantages. Firstly,

Monte Carlo-based methods are used to solve the POSMDP. Consequently, the

POSMDP can be adopted to deal with various state space models without Gaussian

and linear assumptions. Secondly, as an extension of MDP, the POSMDP can

optimise the maintenance strategies without specifying a predetermined strategy

structure. Therefore, the POSMDP can derive more flexible maintenance strategies

when multiple maintenance activities are available. Finally, the POSMDP

decomposes a long-run decision process into single steps. Therefore, some practical

issues (e.g., state-dependent maintenance costs or durations, and uncertain

maintenance effects) can be formulated easily.

Although this chapter has investigated several aspects of the POSMDP,

further research is still needed. One possible extension of the proposed POSMDP is

changing its horizon from infinite to finite. For a finite

horizon, the policy iteration used in this chapter is no longer effective and a new

solving method is required.


6 Conclusions and Future Research Directions

6.1 Conclusions

The state space model has proven to be an effective tool for modelling asset

degradation processes in which health states are only partially observable. A

comprehensive literature review has revealed that existing state space models used in

asset degradation modelling largely rely on discrete-time, discrete-state, linear, and

Gaussian assumptions:

1) The discrete-time assumption implies that inspections and failures only

happen at predetermined discrete times separated by fixed intervals.

2) The discrete-state assumption requires classifying continuous degradation

states into a finite number of states. This classification largely depends on

expert knowledge, and a finite number of states may be too coarse to

describe the asset health state.

3) The linear and Gaussian assumptions are inconsistent

with the nonlinear and monotonically increasing nature of most engineering

asset degradation processes between two adjacent maintenance activities.

To address these limitations, this research adopts a Gamma-based state space model

that describes partially observable asset degradation processes. The parameter

estimation, lifetime prediction, and maintenance strategy optimisation algorithms for

the Gamma-based state space degradation model have been developed. The

developed models and algorithms have been validated using simulation and field data.

The thesis has presented three pieces of original work as follows:

1) Degradation process modelling of direct and indirect indicators using the

Gamma-based state space model;

2) Joint modelling of failure events and multiple indirect indicators using the

Gamma based state space model;

3) Maintenance strategy optimisation using the continuous state POSMDP.

Detailed conclusions are summarised in the following sections.


6.1.1 Modelling Correlated Degradation Processes of Direct and Indirect Indicators

Direct indicators provide more accurate references for asset life prediction and

maintenance decision-making; however, they are often more difficult to obtain and

are often incomplete. Indirect indicators, on the other hand, can be collected more

easily through various condition monitoring techniques and are more readily available. The

state space model provides an efficient approach to estimate direct indicators using

indirect indicators. However, existing state space models that describe the

degradation processes of direct and indirect indicators largely follow the discrete

time, discrete state, linear and Gaussian assumptions.

To address this research gap, several original contributions have been made:

1) This research uses, for the first time, a Gamma-based state space model to

describe degradation processes that include both direct and indirect

indicators.

2) This research has developed a Monte Carlo-based EM algorithm to estimate

the model parameters. Both direct and indirect indicator observations are

considered by the EM algorithm.

The Gamma-based state space model has several advantages when modelling the

degradation processes of direct and indirect indicators:

1) The underlying Gamma process fits the monotonically increasing

degradation process of a direct indicator better than a Gaussian

process does.

2) More complex relationships between direct and indirect indicators can be

described by the nonlinear property of the Gamma-based state space model.

3) The EM algorithm developed in this research can handle the situation in

which fewer direct indicator observations than indirect indicator

observations are available.


6.1.2 Joint Modelling of Failure Events and Multiple Indirect Indicators

For some degradation processes, direct indicators are difficult to obtain; instead,

failure event information is more readily available. In these situations, failure event

data should be utilised, and in particular should be utilised jointly with multiple indirect

indicators, to improve the confidence of the model outcome for an asset degradation

process. Two issues need to be addressed in this regard. First, both the failure

times and the censoring times of assets should be considered. Second,

the information from multiple indirect indicators should be fused appropriately. This

research uses the Gamma-based state space model to address these two issues, and

has made the following original contributions:

1) A Monte Carlo-based EM algorithm that considers multiple indirect indicators,

failure times, and censoring times has been developed.

2) A parametric bootstrap method to evaluate the effectiveness of different

degradation indicators has also been developed.

The models and algorithms developed in this research for asset life prediction using

failure events and multiple degradation indicators have several advantages:

1) Insufficient event data can be compensated for by considering

both degradation indicators and event data.

2) The likelihood function that considers failure events can be established more

concisely using the monotonically increasing Gamma process.

3) Once the effectiveness of different degradation indicators has been identified,

a more cost-effective condition monitoring system can be established by

installing only the necessary sensors, and the size of the database that stores

degradation indicators can be reduced.

4) Other types of nonlinear non-Gaussian state space models can also be

processed by the algorithms developed in this research. Therefore, failure

times following different distributions can be modelled by the state space

degradation model.


6.1.3 Maintenance Strategy Optimisation Using the Continuous State POSMDP

This research models the maintenance decision-making process for the Gamma-

based state space model as a continuous state POSMDP. When the state of the

POSMDP is continuous, the dimension of the belief space can become infinite. This

research converts the POSMDP to an SMDP through a Monte Carlo density

projection method. The converted SMDP is then solved by policy iteration.

Compared with existing POMDPs that are also applied in maintenance strategy

optimisation, the new features of the POSMDP are as follows:

1) The POSMDP used in this research is continuous in time, so it can model

maintenance activities and failures that happen at random times.

2) The POSMDP is continuous in state, while existing POMDPs used in

maintenance strategy optimisation only have a limited number of states.

3) The POSMDP adopted in this research is based on a non-Gaussian state

space degradation model.

The simulation study shows that the continuous state POSMDP has several

advantages:

1) The POSMDP can derive more cost-effective maintenance strategies than

existing approximate maintenance strategy optimisation methods when state

space degradation models continuous in time and state are used.

2) Maintenance strategies with both regular and state-dependent inspection

intervals can be optimised by the POSMDP.

3) Flexible maintenance strategies with multiple maintenance activities can be

developed for different values of maintenance costs or durations.

4) The POSMDP decomposes a long-run decision process into single steps.

Therefore, concise formulations can be obtained.


6.2 Future Research

As stated earlier, this research systematically investigates, for the first time, the

application of nonlinear non-Gaussian state space models in asset degradation

modelling. Several potential future research directions are presented as follows.

1) The computational efficiency of the proposed Monte Carlo-based algorithms

still needs to be enhanced. The parameter estimation and maintenance

optimisation algorithms rely on Monte Carlo-based methods that are

computationally expensive. Although this research has developed strategies to make the

algorithms less computationally expensive, more efficient algorithms would

improve the applicability of nonlinear non-Gaussian state space models in

real-world scenarios.

2) During the parameter estimation of the Gamma-based state space model, the

choice of particle number and the convergence criterion largely follow

empirical approaches. A method for adaptively choosing the particle number

is to be developed. In addition, a more effective and efficient convergence

criterion is to be investigated.

3) In this research, an asset is regarded as a single component. In practice, an

asset often consists of multiple components. Two relationships between the

components should be taken into consideration, i.e. the stochastic

dependence and the economic dependence (Castanier et al. 2005).

Applications of the state space model to the life prediction and maintenance

decision-making of a multi-component system are required.

4) This research optimises maintenance strategies according to the long-run

expected cost per unit time or long-run expected availability. The

maintenance strategy optimisation method for a finite horizon has not been

discussed. When the optimisation horizon is finite, the POSMDP cannot be

solved by the policy iteration, and further research is required.


5) This research used the censored Gaussian distribution as the projected

belief space in the POSMDP. In practice, using multi-type parametric

distributions as the projected space may improve the results of maintenance

strategy optimisation. However, efficient projection and distance

measurement algorithms need to be developed before applying a projected

belief space with multi-type distributions.

6) This research optimises maintenance strategies by simply minimising or

maximising an objective function. In reality, some constraints need to be

considered. These constraints can be reliability, availability, costs, or the

resource used during maintenance activities.


7 References

Akaike, H. (1974). "A New Look at the Statistical Model Identification." Automatic

Control, IEEE Transactions on 19(6): 716-723.

Amari, S. V. and L. McLaughlin (2004). Optimal Design of a Condition-Based

Maintenance Model. Reliability and Maintainability, 2004 Annual

Symposium - RAMS: 528-533.

Amari, S. V., L. McLaughlin, et al. (2006). Cost-Effective Condition-Based

Maintenance Using Markov Decision Processes. Reliability and

Maintainability Symposium, 2006. RAMS '06. Annual: 464-469.

Andrieu, C., A. Doucet, et al. (2004). "Particle Methods for Change Detection,

System Identification, and Control." Proceedings of the IEEE 92(3): 423-

438.

Arulampalam, M. S., S. Maskell, et al. (2002a). "A Tutorial on Particle Filters for

Online Nonlinear/Non-Gaussian Bayesian Tracking." Signal Processing,

IEEE Transactions on 50(2): 174-188.

Arulampalam, M. S., S. Maskell, et al. (2002b). "A Tutorial on Particle Filters for

Online Nonlinear/Non-Gaussian Bayesian Tracking." IEEE Transactions on

Signal Processing 50(2): 174-188.

Banjevic, D. and A. K. S. Jardine (2006). "Calculation of Reliability Function and

Remaining Useful Life for a Markov Failure Time Process." IMA Journal of

Management Mathematics 17(2): 115-130.

Barata, J., C. G. Soares, et al. (2002). "Simulation Modelling of Repairable Multi-

Component Deteriorating Systems for `on Condition' Maintenance

Optimisation." Reliability Engineering & System Safety 76(3): 255-264.


Bertsekas, D. P. (2005). Dynamic Programming and Optimal Control. Belmont,

Mass., Athena Scientific.

Black, M., A. T. Brint, et al. (2005). "A Semi-Markov Approach for Modelling

Asset Deterioration." The Journal of the Operational Research Society

56(11): 1241.

Blischke, W. R. and D. N. P. Murthy (2000). Reliability: Modeling, Prediction, and

Optimization. New York, Wiley.

Bris, R., E. Châtelet, et al. (2003). "New Method to Minimize the Preventive

Maintenance Cost of Series-Parallel Systems." Reliability Engineering &

System Safety 82(3): 247-255.

Brooks, A., A. Makarenko, et al. (2006). "Parametric POMDPs for Planning in

Continuous State Spaces." Robotics and Autonomous Systems 54(11): 887-

897.

Brooks, A. and S. B. Williams (2007). A Monte Carlo Update for Parametric

POMDPs. International Symposium on Research Robotics.

Bunks, C., D. McCarthy, et al. (2000). "Condition-Based Maintenance of Machines

Using Hidden Markov Models." Mechanical Systems and Signal Processing

14(4): 597-612.

Cadini, F., E. Zio, et al. (2009). "Model-Based Monte Carlo State Estimation for

Condition-Based Component Replacement." Reliability Engineering &

System Safety 94(3): 752-758.

Cassandra, A. R., M. L. Littman, et al. (1997). Incremental Pruning: A Simple, Fast,

Exact Method for Partially Observable Markov Decision Processes.

Uncertainty in Artificial Intelligence (UAI).


Castanier, B., A. Grall, et al. (2005). "A Condition-Based Maintenance Policy with

Non-Periodic Inspections for a Two-Unit Series System." Reliability

Engineering & System Safety 87(1): 109-120.

Cavanaugh, J. and R. Shumway (1997). "A Bootstrap Variant of AIC for State-Space

Model Selection." Statistica Sinica 7: 473-496.

Chan, G. K. and S. Asgarpoor (2006). "Optimum Maintenance Policy with Markov

Processes." Electric Power Systems Research 76(6-7): 452-456.

Chen, D. and K. S. Trivedi (2005). "Optimization for Condition-Based Maintenance

with Semi-Markov Decision Process." Reliability Engineering & System

Safety 90(1): 25-29.

Chopin, N. (2002). "A Sequential Particle Filter Method for Static Models."

Biometrika 89(3): 539-551.

Christer, A. H. and W. Wang (1995). "A Simple Condition Monitoring Model for a

Direct Monitoring Process." European Journal of Operational Research

82(2): 258-269.

Christer, A. H., W. Wang, et al. (1997). "A State Space Condition Monitoring Model

for Furnace Erosion Prediction and Replacement." European Journal of

Operational Research 101(1): 1-14.

Cinlar, E., E. Osman, et al. (1977). "Stochastic Process for Extrapolating Concrete

Creep." Journal of the Engineering Mechanics Division 103(6): 1069-1088.

Cox, D. R. (1972). "Regression Models and Life-Tables." Journal of the Royal

Statistical Society. Series B (Methodological), 34(2): 187-220.

Crowder, M. and J. Lawless (2007). "On a Scheme for Predictive Maintenance."

European Journal of Operational Research 176(3): 1713-1722.


Dempster, A. P., N. M. Laird, et al. (1977). "Maximum Likelihood from Incomplete

Data Via the EM Algorithm." Journal of the Royal Statistical Society. Series

B (Methodological) 39(1): 1-38.

Doucet, A., S. Godsill, et al. (2000). "On Sequential Monte Carlo Sampling Methods

for Bayesian Filtering." Statistics and Computing 10(3): 197-208.

Doucet, A., S. J. Godsill, et al. (2002). "Marginal Maximum a Posteriori Estimation

Using Markov Chain Monte Carlo." Statistics and Computing 12(1): 77-84.

Doucet, A. and V. Tadić (2003). "Parameter Estimation in General State-Space

Models Using Particle Methods." Annals of the Institute of Statistical

Mathematics 55(2): 409-422.

Frangopol, D. M., M.-J. Kallen, et al. (2004). "Probabilistic Models for Life-Cycle

Performance of Deteriorating Structures: Review and Future Directions."

Steel Construction 6(4): 197-212.

Garcia Marquez, F. P., D. J. Pedregal Tercero, et al. (2007). "Unobserved

Component Models Applied to the Assessment of Wear in Railway Points: A

Case Study." European Journal of Operational Research 176(3): 1703-1712.

Ge, M., R. Du, et al. (2004). "Hidden Markov Model Based Fault Diagnosis for

Stamping Processes." Mechanical Systems and Signal Processing 18(2): 391-

408.

Ghasemi, A., S. Yacout, et al. (2008). "Optimal Strategies for Non-Costly and Costly

Observations in Condition Based Maintenance." IAENG International

Journal of Applied Mathematics 38(2).

Gibson, S. and B. Ninness (2005). "Robust Maximum-Likelihood Estimation of

Multivariable Dynamic Systems." Automatica 41(10): 1667-1682.

Godsill, S. J., A. Doucet, et al. (2004). "Monte Carlo Smoothing for Nonlinear Time

Series." Journal of the American Statistical Association 99(465): 156.


Goode, K. B., J. Moore, et al. (2000). "Plant Machinery Working Life Prediction

Method Utilizing Reliability and Condition-Monitoring Data." Proceedings

of the Institution of Mechanical Engineers 214(2): 109.

Grall, A., C. Berenguer, et al. (2002). "A Condition-Based Maintenance Policy for

Stochastically Deteriorating Systems." Reliability Engineering & System

Safety 76(2): 167-180.

Grosfeld-Nir, A. (2007). "Control Limits for Two-State Partially Observable

Markov Decision Processes." European Journal of Operational Research

182(1): 300-304.

Hashemi, R., H. Jacqmin-Gadda, et al. (2003). "A Latent Process Model for Joint

Modeling of Events and Marker." Lifetime Data Analysis 9(4): 331-343.

Heng, A., A. C. C. Tan, et al. (2009). "Intelligent Condition-Based Prediction of

Machinery Reliability." Mechanical Systems and Signal Processing 23(5):

1600-1614.

Hontelez, J. A. M., H. H. Burger, et al. (1996). "Optimum Condition-Based

Maintenance Policies for Deteriorating Systems with Partial Information."

Reliability Engineering & System Safety 51(3): 267-274.

Hopp, W. J. and Y.-L. Kuo (1998). "An Optimal Structured Policy for Maintenance

of Partially Observable Aircraft Engine Components." Naval Research

Logistics 45(4): 335-352.

Huitian, L., W. J. Kolarik, et al. (2001). "Real-Time Performance Reliability

Prediction." Reliability, IEEE Transactions on 50(4): 353-357.

Ilgin, M. and S. Tunali (2007). "Joint Optimization of Spare Parts Inventory and

Maintenance Policies Using Genetic Algorithms." The International Journal

of Advanced Manufacturing Technology 34(5): 594-604.


Jacquier, E., M. Johannes, et al. (2007). "MCMC Maximum Likelihood for Latent

State Models." Journal of Econometrics 137(2): 615-640.

Jardine, A. K. S., D. Lin, et al. (2006). "A Review on Machinery Diagnostics and

Prognostics Implementing Condition-Based Maintenance." Mechanical

Systems and Signal Processing 20(7): 1483-1510.

Jiang, R. and A. K. S. Jardine (2006). "Composite Scale Modeling in the Presence of

Censored Data." Reliability Engineering & System Safety 91(7): 756-764.

Jiang, R. and A. K. S. Jardine (2008). "Health State Evaluation of an Item: A

General Framework and Graphical Representation." Reliability Engineering

& System Safety 93(1): 89-99.

Jie, Y., T. Kirubarajan, et al. (2000). "A Hidden Markov Model-Based Algorithm for

Fault Diagnosis with Partial and Imperfect Tests." Systems, Man, and

Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 30(4):

463-473.

Julier, S. J. and J. K. Uhlmann (1997). A New Extension of the Kalman Filter to

Nonlinear Systems. Int. Symp. Aerospace/Defense Sensing, Simul. and

Controls: 182-193.

Kaelbling, L. P., M. L. Littman, et al. (1998). "Planning and Acting in Partially

Observable Stochastic Domains." Artificial Intelligence 101(1-2): 99-134.

Kallen, M. J. and J. M. Van Noortwijk (2005). "Optimal Maintenance Decisions

under Imperfect Inspection." Reliability Engineering & System Safety 90(2-

3): 177-185.

Khan, M. E. and D. N. Dutt (2007). "An Expectation-Maximization Algorithm

Based Kalman Smoother Approach for Event-Related Desynchronization

(ERD) Estimation from EEG." Biomedical Engineering, IEEE Transactions on

54(7): 1191-1198.


Kim, J. (2005). Parameter Estimation in Stochastic Volatility Models with Missing

Data Using Particle Methods and the EM Algorithm. United States --

Pennsylvania, University of Pittsburgh.

Klaas, M., M. Briers, et al. (2006). Fast Particle Smoothing: If I Had a Million

Particles. Proceedings of the 23rd international conference on Machine

learning. Pittsburgh, Pennsylvania, ACM.

Kobbacy, K. A. H., B. B. Fawzi, et al. (1997). "A Full History Proportional Hazards

Model for Preventive Maintenance Scheduling." Quality and Reliability

Engineering International 13(4): 187-198.

Kravdal, Ø. (1997). "The Attractiveness of an Additive Hazard Model: An Example

from Medical Demography." European Journal of Population/Revue

européenne de Démographie 13(1): 33-47.

Kumar, D. and U. Westberg (1996). "Proportional Hazards Modeling of Time-

Dependent Covariates Using Linear Regression: A Case Study [Mine Power

Cable Reliability]." Reliability, IEEE Transactions on 45(3): 386-392.

Kumar, D. and U. Westberg (1997). "Maintenance Scheduling under Age

Replacement Policy Using Proportional Hazards Model and TTT-Plotting."

European Journal of Operational Research 99(3): 507-515.

Lawless, J. and M. Crowder (2004). "Covariates and Random Effects in a Gamma

Process Model with Application to Degradation and Failure." Lifetime Data

Analysis 10(3): 213-227.

Lee, M.-L. T. and G. A. Whitmore (2006). "Threshold Regression for Survival

Analysis: Modeling Event Times by a Stochastic Process Reaching a

Boundary." Statistical Science 21(4): 501-513.

Lee, M.-L. T., G. A. Whitmore, et al. (2004). "Assessing Lung Cancer Risk in

Railroad Workers Using a First Hitting Time Regression Model."

Environmetrics 15(5): 501-512.


Li, W. and H. Pham (2005). "An Inspection-Maintenance Model for Systems with

Multiple Competing Processes." Reliability, IEEE Transactions on 54(2):

318-327.

Liao, H., E. A. Elsayed, et al. (2006a). "Maintenance of Continuously Monitored

Degrading Systems." European Journal of Operational Research 175(2): 821-

835.

Liao, H., W. Zhao, et al. (2006b). Predicting Remaining Useful Life of an Individual

Unit Using Proportional Hazards Model and Logistic Regression Model.

Reliability and Maintainability Symposium, 2006. RAMS '06. Annual: 127-

132.

Lin, D., D. Banjevic, et al. (2006). "Using Principal Components in a Proportional

Hazards Model with Applications in Condition-Based Maintenance." The

Journal of the Operational Research Society 57(8): 910.

Lin, D. Y. and Z. Ying (1994). "Semiparametric Analysis of the Additive Risk

Model." Biometrika 81(1): 61-71.

Lin, D. Y. and Z. Ying (1995). "Semiparametric Analysis of General Additive-

Multiplicative Hazard Models for Counting Processes." The Annals of

Statistics 23(5): 1712-1734.

Logan, B. T. and A. J. Robinson (1997). Enhancement and Recognition of Noisy

Speech within an Autoregressive Hidden Markov Model Framework Using

Noise Estimates from the Noisy Signal. Acoustics, Speech, and Signal

Processing, 1997. ICASSP-97., 1997 IEEE International Conference on. 2: 843-846.

Lu, S., H. Lu, et al. (2001). "Multivariate Performance Reliability Prediction in

Real-Time." Reliability Engineering & System Safety 72(1): 39-45.

Maillart, L. M. (2006). "Maintenance Policies for Systems with Condition

Monitoring and Obvious Failures." IIE Transactions 38: 463-475.


Makis, V. and A. K. S. Jardine (1992). "Optimal Replacement in the Proportional

Hazards Model." INFOR 30(2): 172-183.

Makis, V. and X. Jiang (2003). "Optimal Replacement under Partial Observations."

Mathematics of Operations Research 28(2): 382.

Makis, V., J. Wu, et al. (2006). "An Application of DPCA to Oil Data for CBM

Modeling." European Journal of Operational Research 174(1): 112-123.

Mani, G., D. Wolfe, et al. (2008). Slurry Pump Wear Assessment through Vibration

Monitoring. WCEAM-IMS 2008. Beijing, China, Springer-Verlag London

Ltd: 1068-1076.

Marseguerra, M., E. Zio, et al. (2002). "Condition-Based Maintenance Optimization

by Means of Genetic Algorithms and Monte Carlo Simulation." Reliability

Engineering & System Safety 77(2): 151-165.

McKeague, I. W. and P. D. Sasieni (1994). "A Partly Parametric Additive Risk

Model." Biometrika 81(3): 501-514.

Miao, Q. (2005). Application of Wavelets and Hidden Markov Model in Condition-

Based Maintenance. Canada, University of Toronto (Canada).

Mohanta, D. K., P. K. Sadhu, et al. (2007). "Deterministic and Stochastic Approach

for Safety and Reliability Optimization of Captive Power Plant Maintenance

Scheduling Using GA/SA-Based Hybrid Techniques: A Comparison of

Results." Reliability Engineering & System Safety 92(2): 187-199.

Monahan, G. E. (1982). "A Survey of Partially Observable Markov Decision

Processes: Theory, Models, and Algorithms." Management Science 28(1): 1-

16.

Morcous, G. (2006). "Performance Prediction of Bridge Deck Systems Using

Markov Chains." Journal of Performance of Constructed Facilities 20(2):

146-155.


Moustafa, M. S., E. Y. A. Maksoud, et al. (2004). "Optimal Major and Minimal

Maintenance Policies for Deteriorating Systems." Reliability Engineering &


Muñoz, A., S. Martorell, et al. (1997). "Genetic Algorithms in Optimizing

Surveillance and Maintenance of Components." Reliability Engineering &


Olsson, J., O. Cappé, et al. (2008). "Sequential Monte Carlo Smoothing with

Application to Parameter Estimation in Nonlinear State Space." Bernoulli

14(1): 155-179.

Orchard, M., G. Kacprzynski, et al. (2009). Advances in Uncertainty Representation

and Management for Particle Filtering Applied to Prognostics. Applications

of Intelligent Control to Engineering Systems: 23-35.

Park, C. and W. Padgett (2005a). "Accelerated Degradation Models for Failure

Based on Geometric Brownian Motion and Gamma Processes." Lifetime

Data Analysis 11(4): 511-527.

Park, C. and W. J. Padgett (2005b). "New Cumulative Damage Models for Failure

Using Stochastic Processes as Initial Damage." Reliability, IEEE

Transactions on 54(3): 530-540.

Park, C. and W. J. Padgett (2006). "Stochastic Degradation Models with Several

Accelerating Variables." Reliability, IEEE Transactions on 55(2): 379-390.

Park, K. S. (1988). "Optimal Continuous-Wear Limit Replacement under Periodic

Inspections." Reliability, IEEE Transactions on 37(1): 97-102.

Porta, J. M., M. T. J. Spaan, et al. (2005). Robot Planning in Partially Observable

Continuous Domains. Robotics: Science and Systems I. Cambridge,

Massachusetts.

Prasad, P. V. N. and K. R. M. Rao (2002). Reliability Models of Repairable Systems

Considering the Effect of Operating Conditions. Reliability and

Maintainability Symposium, 2002. Proceedings. Annual: 503-510.

Proust-Lima, C. and L. L. H. Jacqmin-Gadda (2007). "A Nonlinear Latent Class

Model for Joint Analysis of Multivariate Longitudinal Data and a Binary

Outcome." Statistics in Medicine 26(10): 2229-2245.

Proust, C., H. Jacqmin-Gadda, et al. (2006). "A Nonlinear Model with Latent

Process for Cognitive Evolution Using Multivariate Longitudinal Data."

Biometrics 62(4): 1014-1024.

Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic

Programming. Hoboken, N.J.; [Great Britain], Wiley-Interscience.

Ross, S. M. (1971). "Quality Control under Markovian Deterioration." Management

Science 17(9): 587-596.

Ross, S. M. (1996). Stochastic Processes. New York, Wiley.

Schön, T., A. Wills, et al. (2006). Maximum Likelihood Nonlinear System

Estimation. Proceedings of the 14th IFAC Symposium on System Identification.

Schwarz, G. (1978). "Estimating the Dimension of a Model." The Annals of

Statistics 6(2): 461-464.

Shiroishi, J., Y. Li, et al. (1997). "Bearing Condition Diagnostics Via Vibration and

Acoustic Emission Measurements." Mechanical Systems and Signal

Processing 11(5): 693-705.

Singpurwalla, N. D. (1995). "Survival in Dynamic Environments." Statistical

Science 10(1): 86-103.

Singpurwalla, N. D. (2006). Reliability and Risk: A Bayesian Perspective. New

York, J. Wiley & Sons.

Sondik, E. J. (1978). "The Optimal Control of Partially Observable Markov

Processes over the Infinite Horizon: Discounted Costs." Operations Research

26(2): 282-304.

Stathopoulos, A. and M. G. Karlaftis (2003). "A Multivariate State Space Approach

for Urban Traffic Flow Modeling and Prediction." Transportation Research

Part C: Emerging Technologies 11(2): 121-135.

Stavropoulos, C. N. and S. D. Fassois (2000). "Non-Stationary Functional Series

Modeling and Analysis of Hardware Reliability Series: A Comparative Study

Using Rail Vehicle Interfailure Times." Reliability Engineering & System

Safety 68(2): 169-183.

Sun, Y., L. Ma, et al. (2006). "Mechanical Systems Hazard Estimation Using

Condition Monitoring." Mechanical Systems and Signal Processing 20(5):

1189-1201.

Thrun, S. (2000). "Monte Carlo POMDPs." Advances in Neural Information

Processing Systems 12: 1064-1070.

Tijms, H. C. and F. A. van der Duyn Schouten (1985). "A Markov Decision

Algorithm for Optimal Inspections and Revisions in a Maintenance System

with Partial Information." European Journal of Operational Research 21(2):

245-253.

Martinussen, T. and T. H. Scheike (2002). "A Flexible Additive Multiplicative Hazard

Model." Biometrika 89(2): 283.

Van Der Merwe, R., A. Doucet, et al. (2000). The Unscented Particle Filter. Adv.

Neural Inform. Process. Syst.

van Noortwijk, J. M. (2009). "A Survey of the Application of Gamma Processes in

Maintenance." Reliability Engineering & System Safety 94(1): 2-21.

Vlok, P. J., J. L. Coetzee, et al. (2002). "Optimal Component Replacement Decisions

Using Vibration Monitoring and the Proportional-Hazards Model." The

Journal of the Operational Research Society 53(2): 193-202.

Wang, P. and D. W. Coit (2004). Reliability Prediction Based on Degradation

Modeling for Systems with Multiple Degradation Measures. Reliability and

Maintainability, 2004 Annual Symposium - RAMS: 302-307.

Wang, R. C. (1976). "Computing Optimal Quality Control Policies: Two Actions."

Journal of Applied Probability 13(4): 826-832.

Wang, W. (2002). "A Model to Predict the Residual Life of Rolling Element

Bearings Given Monitored Condition Information to Date." IMA Journal of

Management Mathematics 13(1): 3.

Wang, W. (2003a). "An Evaluation of Some Emerging Techniques for Gear Fault

Detection." Structural Health Monitoring 2(3): 225-242.

Wang, W. (2003b). "Modelling Condition Monitoring Intervals: A Hybrid of

Simulation and Analytical Approaches." The Journal of the Operational

Research Society 54(3): 273.

Wang, W. (2006). "Modelling the Probability Assessment of System State Prognosis

Using Available Condition Monitoring Information." IMA Journal of

Management Mathematics 17(3): 225.

Wang, W. (2007). "A Prognosis Model for Wear Prediction Based on Oil-Based

Monitoring." Journal of the Operational Research Society 58: 887-893.

Wang, W. (2009). "An Inspection Model for a Process with Two Types of

Inspections and Repairs." Reliability Engineering & System Safety 94(2):

526-533.

Wang, W. and A. H. Christer (2000). "Towards a General Condition Based

Maintenance Model for a Stochastic Dynamic System." The Journal of the

Operational Research Society 51(2): 145-155.

Wang, W., P. A. Scarf, et al. (2000). "On the Application of a Model of Condition-

Based Maintenance." The Journal of the Operational Research Society

51(11): 1218.

Wang, W. and A. K. Wong (2002). "Autoregressive Model-Based Gear Fault

Diagnosis." Journal of Vibration and Acoustics 124(2): 172-179.

Wang, W. and W. Zhang (2005). "A Model to Predict the Residual Life of Aircraft

Engines Based Upon Oil Analysis Data." Naval Research Logistics 52(3):

276-284.

White, C. C., III (1978). "Optimal Inspection and Repair of a Production Process

Subject to Deterioration." The Journal of the Operational Research Society

29(3): 235-243.

White, C. C., III (1979). "Bounds on Optimal Cost for a Replacement Problem with

Partial Observations." Naval Research Logistics Quarterly 26(3): 415-422.

Whitmore, G. and F. Schenkelberg (1997). "Modelling Accelerated Degradation

Data Using Wiener Diffusion with a Time Scale Transformation." Lifetime

Data Analysis 3(1): 27-45.

Whitmore, G. A., M. J. Crowder, et al. (1998). "Failure Inference from a Marker

Process Based on a Bivariate Wiener Model." Lifetime Data Analysis 4(3):

229-251.

Wills, A., T. B. Schön, et al. (2008). Parameter Estimation for Discrete-Time

Nonlinear Systems Using EM. 17th IFAC World Congress. COEX, Korea.

Wu, C. F. J. (1983). "On the Convergence Properties of the EM Algorithm." The

Annals of Statistics 11(1): 95-103.

Xu, D. and W. Zhao (2005). Reliability Prediction Using Multivariate Degradation

Data. Reliability and Maintainability Symposium, 2005. Proceedings.

Annual: 337-341.

Yashin, A. I., K. G. Arbeev, et al. (2007). "Stochastic Model for Analysis of

Longitudinal Data on Aging and Mortality." Mathematical Biosciences

208(2): 538-551.

Yashin, A. I. and K. G. Manton (1997). "Effects of Unobserved and Partially

Observed Covariate Processes on System Failure: A Review of Models and

Estimation Strategies." Statistical Science 12(1): 20-34.

Yu, B. M., K. V. Shenoy, et al. (2004). Derivation of Kalman Filtering and

Smoothing Equations, Department of Electrical Engineering Stanford

University.

Yuan, X. (2007). Stochastic Modeling of Deterioration in Nuclear Power Plant

Components. Civil and Environmental Engineering. Waterloo, University of

Waterloo.

Zeng, D., G. Yin, et al. (2005). "Inference for a Class of Transformed Hazards

Models." Journal of the American Statistical Association 100(471): 1000.

Zhou, E., M. C. Fu, et al. (to appear). "Solving Continuous-State POMDPs Via

Density Projection." IEEE Transactions on Automatic Control.

Zhou, J. (2007). Joint Decision Making on Preventive Maintenance and

Reconfiguration in Complex Manufacturing Systems. United States --

Michigan, University of Michigan.

Zuashkiani, A., D. Banjevic, et al. (2006). Incorporating Expert Knowledge When

Estimating Parameters of the Proportional Hazards Model. Reliability and

Maintainability Symposium, 2006. RAMS '06. Annual: 402-408.

Zuo, M. J., R. Jiang, et al. (1999). "Approaches for Reliability Modeling of

Continuous-State Devices." Reliability, IEEE Transactions on 48(1): 9-18.

8 Appendix

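The derivation of the conditional PDF of the underlying health state $\Lambda_t$ given the health states $\Lambda_{t_1}$ and $\Lambda_{t_2}$, where $t_1 < t < t_2$:

According to Bayes' theorem and the Markov property, the conditional PDF can be calculated as:

$$
p(\Lambda_t \mid \Lambda_{t_1}, \Lambda_{t_2})
= \frac{p(\Lambda_t, \Lambda_{t_2} \mid \Lambda_{t_1})}{p(\Lambda_{t_2} \mid \Lambda_{t_1})}
= \frac{p(\Lambda_{t_2} \mid \Lambda_t)\, p(\Lambda_t \mid \Lambda_{t_1})}{p(\Lambda_{t_2} \mid \Lambda_{t_1})}, \tag{8-1}
$$

where

$$
p(\Lambda_t \mid \Lambda_{t_1}) = \mathrm{Ga}(\Lambda_t - \Lambda_{t_1};\ \alpha_{t_1,t},\ \beta), \tag{8-2}
$$

$$
p(\Lambda_{t_2} \mid \Lambda_t) = \mathrm{Ga}(\Lambda_{t_2} - \Lambda_t;\ \alpha_{t,t_2},\ \beta), \tag{8-3}
$$

and

$$
p(\Lambda_{t_2} \mid \Lambda_{t_1}) = \mathrm{Ga}(\Lambda_{t_2} - \Lambda_{t_1};\ \alpha_{t_1,t_2},\ \beta). \tag{8-4}
$$

Here $\alpha_{s,u}$ denotes the shape parameter of the Gamma-distributed increment over $(s, u]$, so that $\alpha_{t_1,t} + \alpha_{t,t_2} = \alpha_{t_1,t_2}$, and $\beta$ is the scale parameter.

The $p(\Lambda_t \mid \Lambda_{t_1}, \Lambda_{t_2})$ can be finally obtained as

$$
p(\Lambda_t \mid \Lambda_{t_1}, \Lambda_{t_2})
= \frac{1}{\Lambda_{t_2} - \Lambda_{t_1}}\,
\mathrm{Be}\!\left(\frac{\Lambda_t - \Lambda_{t_1}}{\Lambda_{t_2} - \Lambda_{t_1}};\ \alpha_{t_1,t},\ \alpha_{t,t_2}\right), \tag{8-5}
$$

where $\mathrm{Be}(\cdot\,; a, b)$ is the Beta PDF with parameters $a$ and $b$.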
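As a numerical illustration of this derivation, the following minimal sketch evaluates a Bayes/Markov ratio of the form (8-1) for Gamma-distributed increments with a common scale, and compares it with the standard closed form for a Gamma bridge (a Beta density rescaled to the interval between the two known states). The function names and the arguments `shape_lo`, `shape_hi` and `scale` are assumptions made for this example; they stand for the shape parameters of the two increments and the shared scale parameter, and are not taken from the thesis.

```python
import math
import random


def gamma_pdf(x, shape, scale):
    """PDF of a Gamma distribution Ga(x; shape, scale); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def bridge_pdf_bayes(x, x_lo, x_hi, shape_lo, shape_hi, scale):
    """Conditional PDF of the middle state as a ratio of three Gamma PDFs.

    shape_lo / shape_hi are the (assumed) shapes of the increments
    x - x_lo and x_hi - x; both increments share the same scale.
    """
    num = (gamma_pdf(x_hi - x, shape_hi, scale)
           * gamma_pdf(x - x_lo, shape_lo, scale))
    den = gamma_pdf(x_hi - x_lo, shape_lo + shape_hi, scale)
    return num / den


def bridge_pdf_beta(x, x_lo, x_hi, shape_lo, shape_hi):
    """Same conditional PDF in closed form: a Beta PDF rescaled to (x_lo, x_hi)."""
    span = x_hi - x_lo
    u = (x - x_lo) / span
    if not 0.0 < u < 1.0:
        return 0.0
    log_beta_fn = (math.lgamma(shape_lo) + math.lgamma(shape_hi)
                   - math.lgamma(shape_lo + shape_hi))
    log_pdf = ((shape_lo - 1.0) * math.log(u)
               + (shape_hi - 1.0) * math.log(1.0 - u) - log_beta_fn)
    return math.exp(log_pdf) / span


def sample_bridge(x_lo, x_hi, shape_lo, shape_hi, rng=random):
    """Draw the middle state by rescaling a Beta(shape_lo, shape_hi) variate."""
    return x_lo + (x_hi - x_lo) * rng.betavariate(shape_lo, shape_hi)
```

The scale parameter cancels in the ratio, which is why the closed form does not take it as an argument. Drawing from the conditional distribution then needs only one Beta variate, which may be convenient when an intermediate health state has to be simulated between two known ones.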

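The derivation of the conditional PDF of underlying health states for censored data, where $c$ is the censored time:

$$
\begin{aligned}
p(\Lambda_t \mid \Lambda_{t_1}, \Lambda_c < \xi)
&= \frac{p(\Lambda_t, \Lambda_c < \xi \mid \Lambda_{t_1})}{\Pr(\Lambda_c < \xi \mid \Lambda_{t_1})} \\
&= \frac{p(\Lambda_t \mid \Lambda_{t_1})\, \Pr(\Lambda_c < \xi \mid \Lambda_t)}{\Pr(\Lambda_c < \xi \mid \Lambda_{t_1})} \\
&= \mathrm{Ga}(\Lambda_t - \Lambda_{t_1};\ \alpha_{t_1,t},\ \beta) \cdot
\frac{1 - \Gamma\big(\alpha_{t,c},\ (\xi - \Lambda_t)/\beta\big) / \Gamma(\alpha_{t,c})}
     {1 - \Gamma\big(\alpha_{t_1,c},\ (\xi - \Lambda_{t_1})/\beta\big) / \Gamma(\alpha_{t_1,c})},
\end{aligned}
$$

where $t_1 < t < c$, $\xi$ denotes the failure threshold, $\alpha_{s,u}$ is the shape parameter of the Gamma-distributed increment over $(s, u]$, $\beta$ is the scale parameter, and $\Gamma(\cdot, \cdot)$ is the incomplete Gamma function.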
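The structure of this result (a Gamma density for the increment, reweighted by the probability of remaining below a failure threshold until the censoring time) can be checked numerically. The sketch below is only an illustration under stated assumptions: `shape_step` and `shape_rest` are hypothetical names for the shape parameters of the increment up to the current time and from the current time to the censoring time, `threshold` is an assumed fixed failure threshold, and the regularized lower incomplete Gamma function is computed with a plain power series rather than a library routine.

```python
import math


def gamma_pdf(x, shape, scale):
    """PDF of a Gamma distribution Ga(x; shape, scale); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def reg_lower_gamma(a, z, tol=1e-14, max_terms=10_000):
    """Regularized lower incomplete Gamma function P(a, z) by power series.

    P(a, z) = 1 - Gamma(a, z) / Gamma(a), where Gamma(a, z) is the upper
    incomplete Gamma function, so it is the Gamma CDF with unit scale.
    The series is adequate for the moderate z used here (an assumption).
    """
    if z <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def censored_pdf(x, x_prev, shape_step, shape_rest, scale, threshold):
    """PDF of the current state given the previous state and survival.

    Survival means the state at the censoring time is still below
    `threshold`; shape_rest must be positive (current time before censoring).
    """
    if not x_prev < x < threshold:
        return 0.0
    step_density = gamma_pdf(x - x_prev, shape_step, scale)
    survive_from_x = reg_lower_gamma(shape_rest, (threshold - x) / scale)
    survive_from_prev = reg_lower_gamma(shape_step + shape_rest,
                                        (threshold - x_prev) / scale)
    return step_density * survive_from_x / survive_from_prev
```

Because the denominator is exactly the probability that the sum of the two increments stays below the remaining margin, the density integrates to one over the interval between the previous state and the threshold, which gives a quick sanity check for any chosen parameter values.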
