Availability and Maintainability · Introduction: reliability and availability • Non maintained...

42
1 Piero Baraldi Availability and Maintainability

Transcript of Availability and Maintainability · Introduction: reliability and availability • Non maintained...

Page 1: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

11Piero Baraldi

Availability and Maintainability

Page 2: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

22Piero Baraldi

Introduction: reliability and availability

• Non maintained systems: they

cannot be repaired after a failure

(a telecommunication satellite, a

F1 engine, a vessel of a nuclear

power plant).

• Maintained systems: they can be

repaired after the failure (pump of

an energy production plant, a

component of the reactor

emergency cooling systems)

Reliability (𝑹(𝑻𝑴))

It quantifies the system capability

of satisfying a specified mission

within an assigned period of time

(𝑇𝑀): 𝑅 𝑇𝑀 = 𝑃 𝑇 > 𝑇𝑀

Availability (𝑨(𝒕))

It quantifies the system ability to

fulfill the assigned mission at any

specific moment of the lifetime

Performance parametersSystem types

Page 3: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

33Piero Baraldi

Definition of Availability

• X(t) = indicator variable such that:

• X(t) = 1, system is operating at time t

• X(t) = 0, system is failed at time t

• Instantaneous availability p(t) and unavailability q(t)

X(t)

1

0

F F F

R R R t

F = Failed

R = under Repair

( ) ( ) 1 ( )p t P X t E X t ( ) ( ) 0 1 ( )q t P X t p t

Page 4: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

44Piero Baraldi

Contributions to Unavailability

• Repair

A component is unavailable because under repair

Page 5: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

55Piero Baraldi

Contributions to Unavailability

• Repair

A component is unavailable because under repair

• Testing / preventive maintenance

A component is removed from the system because:

a. it must undergo preventive maintenance

b. it has to be tested (safety system, standby components)

EMERGENCY CORE COLLING SYSTEM

Page 6: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

66Piero Baraldi

Contributions to Unavailability

• Repair

A component is unavailable because under repair

• Testing / preventive maintenance

A component is removed from the system because:

a. it must undergo preventive maintenance

b. it has to be tested (safety system, standby components)

• Unrevealed failure

A stand-by component fails unnoticed. The system goes on without noticing

the component failure until a test on the component is made or the component

is demanded to function

Page 7: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

77Piero Baraldi

Average availability descriptors

• How to compare different maintenance strategies?

Page 8: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

88Piero Baraldi

Average availability descriptors

• How to compare different maintenance strategies?

• We need to define quantities for an average description of the system

probabilistic behavior:

• If after some initial transient effects, the instantaneous availability assumes a time

independent value → Limiting or steady state availability:

)(lim tppt

Page 9: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

99Piero Baraldi

Average availability descriptors

• How to compare different maintenance strategies?

• We need to define quantities for an average description of the system

probabilistic behavior:

• If after some initial transient effects, the instantaneous availability assumes a time

independent value → Limiting or steady state availability:

• If the limit does not exist → Average availability over a period of time Tp:

p

p

p

p

T

pp

T

T

pp

T

T

DOWNTIMEdttq

Tq

T

UPTIMEdttp

Tp

0

0

...)(1

...)(1

)(lim tppt

Page 10: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

1010Piero Baraldi

Average Availability

10

p(t)

p(t)

p(t)

t

t

1

1

1

t

𝑇𝑝

𝑇𝑝

𝑇𝑝

pT

dttp0

)(

p

p

T

p

T

pT

dttp

0 )(

pT

dttp0

)(

UPTIMEDOWNTIME

2𝑇𝑝

pT

dttp0

)(

pT

dttp0

)(

pT

dttp0

)(

Page 11: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

1111Piero Baraldi

Maintenability

• How fast a system can be repaired after failure?

• Repair time TR depends from many factors such:• time necessary for the diagnosis

• time required to extract the component from the system

• time necessary for installing the new component or repair the component

• Repair time TR varies statistically from one failure to another, depending on the conditions associated to the particular maintenance events

• Let TR denotes the downtime random variable, distributed according to the pdf 𝑔𝑇𝑅(𝑡)

• Maintenability:

11

t

TR dgtTPR0

Page 12: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

1212Piero Baraldi

Failure classification

• Failure can be:

• revealed

• unrevealed (stanby components, safety systems)

12

Page 13: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

1313Piero Baraldi

REVEALED FAILURE

Page 14: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

• Objective:

• Computation of the availability p(t) of a component continuosly

monitored

• Hypotheses:

• Restoration starts immediately after the component failure

• Probability density function of the random time duration TR of

the repair process = 𝑔𝑇𝑅(𝑡)

• Constant failure rate 𝜆

• 𝑁0 = number of identical components at time t = 0 (assumption)

Balance equation between time t and time t+t

Availability of a continuously monitored component (1)

CONCEPTUAL EXPERIMENT

Page 15: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

working

at t+Δt-

Failure

between t

and t+ Δt

working

at t

t

ttgpNttpNtpNttpN0

)()()()()(

Repair between

t and t+ Δt+=

(1) (2) (3) (4)

1) Number of items UP at time t+t

2) Number of items UP at time t

3) Number of items failing during the interval t

4) Number of items that had failed in (, +) and whose

restoration terminates in (t, t+t);

Availability of a continuously monitored component (3)

0 0 0 0

Page 16: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

• The integral-differential form of the balance

• The solution can be obtained introducing the Laplace

transforms

t

dtgptpdt

tdp

0

)()()()(

1)0( p

Availability of a continuously monitored component (3)

𝑝 𝑡

𝑑 𝑝 𝑡

𝑑𝑡

𝑝 𝑡 ∗ 𝑔 𝑡 = න0

𝑡

𝑝 𝜏 𝑔 𝑡 − 𝜏 𝑑𝜏

𝐿 𝑝 𝑡 = න0

+∞

𝑒−𝑠𝑡𝑝 𝑡 𝑑𝑡 = 𝑝 (𝑠)

𝐿𝑑 𝑝 𝑡

𝑑𝑡= 𝑠 ∙ 𝑝 𝑠 − 𝑝 0

𝐿 𝑝 𝑡 ∗ 𝑔 𝑡 = 𝑝 (𝑠) ∙ 𝑔 (𝑠)

Time domain Laplace Transform

Page 17: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

• Applying the Laplace transform we obtain:

which yields:

• Inverse Laplace transform p(t)

• Limiting availability:

)(~)(~)(~1)(~ sgspspsps

)(~1

1)(~

sgssp

)(~1lim)(~lim)(lim

00 sgs

sspstpp

sst

Availability of a continuously monitored component (4)

Page 18: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

• As s 0, the first order approximation of 𝑔(𝑠) is:

R

s sdgsdgsdgesg

1)(1)()...1()()(~

000

R g RE T

cycle""pair failure/rea of timeaverage

UP iscomponent the timeaverage

MTTRMTTF

MTTF

1

1

1

1

ss

slimp

RRR0s

General result!

Availability of a continuously monitored component (5)

MTTR

MTTR=

Page 19: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

Exercise 1

• Find instantaneous and limiting availability for an

exponential component whose restoration probability

density istetg )(

Page 20: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

Exercise 1: Solution

• Find instantaneous and limiting availability for an

exponential component whose restoration probability

density is

• The Laplace transform of the restoration density is

tetg )(

stgLsg )()(~

)(

1)(~

ss

s

s

ss

sp 𝑝 𝑡 = 𝐿−1 𝑝(𝑠) = 𝐿−1𝑠 + 𝜇

𝑠(𝑠 + 𝜆 + 𝜇)

Page 21: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

Exercise 1: Solution

• Find instantaneous and limiting availability for an

exponential component whose restoration probability

density istetg )(

𝑝 𝑠 =𝐴

𝑠+

𝐵

(𝑠 + 𝜇 + 𝜆)=𝐴 𝑠 + 𝜇 + 𝜆 + 𝐵𝑠

𝑠(𝑠 + 𝜆 + 𝜇)=

𝑠(𝐴 + 𝐵) + 𝐴𝜇 + 𝐴𝜆

𝑠(𝑠 + 𝜆 + 𝜇)=

𝑠 + 𝜇

𝑠(𝑠 + 𝜆 + 𝜇)

ቊ1 = 𝐴 + 𝐵𝜇 = 𝐴𝜆 + 𝐴𝜇

𝐴 =𝜇

𝜇 + 𝜆

𝐵 =𝜇

𝜇 + 𝜆

𝑝 𝑡 = 𝐴 + 𝐵𝑒− 𝜇+𝜆 𝑡 =𝜇

𝜇 + 𝜆+

𝜇

𝜇 + 𝜆𝑒− 𝜇+𝜆 𝑡

𝐿−1𝑠 + 𝜇

𝑠(𝑠 + 𝜆 + 𝜇)

𝐿 1 =1

𝑠

𝐿1

𝑠 + 𝑎= 𝑒−𝑎𝑡

Page 22: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

0 20 40 60 80 100 120 140 160 180 2000.9

0.91

0.92

0.93

0.94

0.95

0.96

0.97

0.98

0.99

1

t [day]

p(t

)

constant failure rate =1/50 [day-1

]

constant repair rate = 1/5 [day-1

]

Exercise 1: Solution

Page 23: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

UNREVEALED FAILURE

Page 24: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

Safety Systems: Examples

Fire extinguisher

Risks: Ignition of the gases

in the oil tank, lightning

strike or generation of

sparks due to electrostatic

charges.

Fire protection

Page 25: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

Exercise 2: Instantaneous Availability of an Unattended Component

• Find the instantaneous unavailability of an unattended component (no repair is

allowed) whose cumulative failure time distribution is F(t)

Page 26: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

• Find the instantaneous availability of an unattended component (no repair is

allowed) whose cumulative failure time distribution is F(t)

SOLUTION

The istantaneous unavailability, i.e. the probability q(t) that at time t the component

is not functioning is equal to the probability that it fails before t

)()( tFtq

Exercise 2: Solution

Page 27: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

2727Piero Baraldi

Availability of a safety system under periodic maintenance (1)

• Safety systems are generally in standby until accident and their

components must be periodically tested (interval between two

consecutive test = Tp)

• The instantaneous unavailability is a periodic function of time

p(t)

R(t)1

tTp 2Tp 3Tp

Page 28: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

2828Piero Baraldi

Availability of a safety system under periodic maintenance (1)

• Safety systems are generally in standby until accident and their

components must be periodically tested

• The instantaneous unavailability is a periodic function of time

• The performance indicator used is the average unavailability over a

period of time Tp

complete maintenance cycle

Average time

the system is

not working

pp

T

TT

DOWNtime

T

dttqq

p

p

0 )(

Page 29: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

2929Piero Baraldi

EX. 3: Availability of a component under periodic maintenance

• Suppose the unavailability is due to unrevealed random failures, e.g.

with constant rate

• Assume also instantaneous and perfect testing and maintenance

procedures performed every Tp hours

• Draw the qualitative time evolution of the instantaneous availability

• Find the average unavailability of the component

Page 30: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3030Piero Baraldi

Ex 3. Solution (1)

• Suppose the unavailability is due to unrevealed random failures, e.g.

with constant rate

• Assume also instantaneous and perfect testing and maintenance

procedures

• The instantaneous availability within a period Tp coincides with the

reliability

p(t)

R(t)1

tTp 2Tp 3Tp

Page 31: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3131Piero Baraldi

• The average unavailability and availability are then:

For different systems, we can compute p and q by first computing their

failure probability distribution FT(t) and reliability R(t) and then applying

the above expressions

Ex. 3: Solution (2)

p

T

T

p

T

TT

dttF

T

dttqq

pp

p

00

)()(

p

T

p

T

TT

dttR

T

dttpp

pp

p

00

)()(

pp TT qp 1

Page 32: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3232Piero Baraldi

• The average unavailability and availability are then:

Ex. 3: Solution (3)

pTT Tqppp

2

111

p

p

p

p

T

p

Tt

p

T

T

p

T

T TT

T

T

dtt

T

dte

T

dttF

T

dttqq

pppp

p

2

1

2

)1()()( 2

0000

Page 33: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3333Piero Baraldi

Ex.4: A more realistic case

33

• The test is performed after a time 𝜏 of operation

• Assume a finite test time R:

• Draw the qualitative time evolution of the instantaneous availability

• Estimate the average unavailability and availability over the complete

maintenance cycle period Tp = + R

Page 34: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3434Piero Baraldi

Ex.4: Solution (1)

34

• Assume a finite test time R:

• Draw the qualitative time evolution of the instantaneous availability

t

p(t)

1

R R R

Page 35: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3535Piero Baraldi

• Assume a finite test time R

• The average unavailability and availability over the complete

maintenance cycle period Tp = +R are:

0 0

( ) ( )R T R T

R

R

F t dt F t dt

q

0 0

( ) ( )

R

R

R t dt R t dt

p

Ex. 4: Solution (2)

Page 36: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3636Piero Baraldi

Component under periodic maintenance: a more realistic case

• Objective: computation of the average unavailability over

the lifetime [0, TM]

• Hypoteses:

• The component is initially working: q(0) = 0; p(0) = 1

• Failure causes:

• random failure at any time T FT(t)

• on-line switching failure on demand Q0

• maintenance disables the component 0 (human error

during inspection, testing or repair)

M

TT

DOWNtimeq

M],0[

Page 37: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3737Piero Baraldi

1) The probability of finding the component DOWN at the generic time t is

due either to the fact that it was demanded to start but failed or to the fact

that it failed unrevealed randomly before t. The average DOWNtime is:

Component under periodic maintenance: a more realistic case

A B C Ft

T0 R RR

maintenance maintenance

(1) (2) (3) (4)

0 0 01A Tq t Q Q F t

(0 ) 0 0 0 0 0

0 0 0

( ) (1 ) ( ) (1 ) ( )D A A T TT q t dt Q Q F t dt Q Q F t dt

],0[ ADOWNtime

Page 38: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3838Piero Baraldi

Component under periodic maintenance: a more realistic case

2) During the maintenance period the component remains disconnected and

the average DOWNtime is the whole maintenance time:

3) The component can be found failed because, by error, it remained

disabled from the previous maintenance or because it failed on demand or

randomly before t. The average DOWNtime is:

( )D AB RT

0 0 0 0( ) (1 ) (1 ) ( )BC Tq t Q Q F t

( ) 0 0 0 0

0 0

( ) (1 ) (1 ) ( )D BC BC TT q t dt Q Q F t dt

],[ BADOWNtime

],[ CBDOWNtime

Page 39: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

3939Piero Baraldi

Component under periodic maintenance: a more realistic case

4) The normal maintenance cycle is repeated throughout the component

lifetime TM. The number of repetitions, i.e. the number of AB-BC

maintenance cycles, is:

• The total expected DOWNtime is:

(0 ) 0 0 0 0 0 0

0 0

(1 ) ( ) (1 ) (1 ) ( )M

MD T T R T

R

TT Q Q F t dt Q Q F t dt

(0 ) 0 00 0 0 0 0

0 0

1 1( ) (1 ) (1 ) ( )M

M

D T

T T R T

M M M R

T Q Qq F t dt Q Q F t dt

T T T

DT

MTq

DOWNtime

M

TT

DOWNtimeq

M

𝑘 =𝑇𝑀 − 𝜏

𝜏 + 𝜏𝑟

Page 40: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

4040Piero Baraldi

Component under periodic maintenance: a more realistic case

(0 ) 0 00 0 0 0 0

0 0

1 1( ) (1 ) (1 ) ( )M

M

D T

T T R T

M M M R

T Q Qq F t dt Q Q F t dt

T T T

MTq

• 𝜏 ≪ 𝑇𝑀

• 0𝜏𝐹𝑇 𝑡 𝑑𝑡 ≤ 𝜏

• 𝜏𝑅 ≪ 𝜏

(0 ) 0 00 0 0 0 0

0 0

1 1( ) (1 ) (1 ) ( )M

M

D T

T T R T

M M M R

T Q Qq F t dt Q Q F t dt

T T T

M

TT

DOWNtimeq

M

M

TT

DOWNtimeq

M

Page 41: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

4141Piero Baraldi

Component under periodic maintenance: a more realistic case

• Q0 and FT(t) are generally small, and since typically R and TM ,

the average unavailability can be simplified to:

• Consider an exponential component with small, constant failure rate

• Since typically 01, Q01, the average unavailability reads:

00 0 0 0

0

1(1 ) ( )

M

RT T

Qq Q F t dt

0 0 0

1

2M

RTq Q

Error after test Switching failure on demand

Random,

unrevealed failures

between testsMaintenance

MTq

( ) 1 t

TF t e t

M

TT

DOWNtimeq

M

M

TT

DOWNtimeq

M

Page 42: Availability and Maintainability · Introduction: reliability and availability • Non maintained systems: they cannot be repaired after a failure (a telecommunication satellite,

4242Piero Baraldi

Where to study?

• Slides

• Lecture 5: Reliability of simple systems

• Chapter 5 (Red Book): Theory

• Chapter 5 (Green Book): Exercises

• Lecture 6: Availability and Maintenability

• Chapter 6 (Red Book): Theory

• Chapter 6 (Green Book): Exercises

42