Reliability.Asset.Integrity Center Introduction to RELIABILITY and MAINTENANCE.



Session Objectives

To recognize the importance of reliability

To understand the basic definitions of reliability and its measures

To understand the concept of the bathtub reliability curve

To understand the basic methodology of reliability analysis and its relation to maintenance

1-2


Increased concern in safety and environment

Tight profit margin

Escalating operational cost

Increased system complexity

Depletion in oil and gas resources

Increase in demand

Changes in materials, operating conditions and equipment age

Highly competitive business environment

[Figure: the OPERATIONAL ISSUES listed above put pressure on a Safe, Reliable and Efficient Plant]

1-3


Why RELIABILITY?

PETROCHEMICAL BUSINESS DRIVERS
► Reduce operational cost
► Healthy, safe and environmentally friendly operation
► Maximize utilization
► Meet operation targets and customer demand
► Reduce waste, failures and downtime
► High availability
► Continuously improve plant performance

RELIABILITY DIRECTLY IMPACTS ALL THESE

1-4


Reliability and Organization’s profitability

The recent oil spill in the Gulf of Mexico caused an estimated loss of USD 23 billion to BP

What causes it?

• Bad cement job
• Failure of the shoe track barrier
• The negative pressure test was accepted when it should not have been
• Failure in well control procedures
• Failure of the blow-out preventer
• Rig’s fire and gas system failed to prevent ignition

Source: BP report, www.bp.com

1-5


System Performance Improvement

(Modarres et al., 1999)

Improve System performance

Prolong the life of equipment/component

Estimate and reduce Failure rate

Study Reliability Engineering issues

Improve Maintainability

Minimize Downtime

Improve Reliability

1-6


Failure Causes for Engineering Components and Systems

Causes and descriptions

1. Poor design – Improper design, dimensions, tolerances, stress concentration, no interchangeability of parts

2. Improper installation – Improper foundation, excessive vibration, inadequate inputs (e.g. voltage), wrong techniques/tools

3. Incorrect production – Outdated technology, wrong equipment, lack of process control and calibrated equipment, inadequate training

4. Improper maintenance – Under/over maintenance, wrong tools/techniques, poor spare part management, insufficient skills and training

5. Complexity – Larger number of components, interfaces and interconnections

6. Poor operational instruction / SOP – Wrong instruction, lack of clarity, difficult to understand, poor language

7. Human error – Lack of understanding of process and equipment, carelessness, forgetfulness, poor judgment

1-7


What is RELIABILITY?

“the probability that the item will perform its required function under given conditions for the time interval”

Probability – describes the stochastic (random) behaviour of the occurrence of failure

Required function – the designed function of the system

Given conditions – the external conditions in which the system usually operates

Time interval – the design life period of the system

1-8


RELIABILITY MEASURES

MEAN TIME TO FAILURE (MTTF)

The average time that elapses until a failure occurs. It applies to non-repairable items.

$$\mathrm{MTTF} = \frac{1}{n}\sum_{i=1}^{n} t_i$$

Example:

Consider six similar components with failure times of 23, 34, 32, 28, 19 and 27 days respectively.

MTTF = (23+34+32+28+19+27) / 6 = 27.2 days
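The MTTF example can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import mean

# Failure times in days for the six components in the example
failure_times_days = [23, 34, 32, 28, 19, 27]

# MTTF is the arithmetic mean of the observed times to failure
mttf_days = mean(failure_times_days)
print(round(mttf_days, 1))  # 27.2
```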

1-9



MEAN TIME BETWEEN FAILURE (MTBF)

The average time between successive failures. It is used for repairable systems when failure rate is assumed to be constant (random failure).

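[Figure: timeline in days alternating uptime and downtime periods, with four failures after uptimes of 50, 30, 60 and 46 days]

$$\mathrm{MTBF} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example:

MTBF = (50+30+60+46) / 4 = 46.5 days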
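A minimal Python sketch of the MTBF example follows. The last step, taking the failure rate as the reciprocal of the MTBF, relies on the constant failure rate assumption stated above; the variable names are illustrative only.

```python
from statistics import mean

# Uptime periods in days between successive failures, from the example
uptimes_days = [50, 30, 60, 46]

# MTBF is the mean of the uptime periods
mtbf_days = mean(uptimes_days)

# Under the constant failure rate assumption, failure rate = 1 / MTBF
failure_rate_per_day = 1.0 / mtbf_days

print(mtbf_days)                       # 46.5
print(round(failure_rate_per_day, 4))  # 0.0215
```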

1-10



FAILURE RATE (HAZARD RATE)

Failure rate (hazard rate) is the conditional probability that a component fails in a small time interval given that it has survived from time zero until the beginning of the time interval.

Note: The term failure rate has been widely used to describe the reliability of both non-repairable components and repairable systems. The more appropriate term for non-repairable components is hazard rate, and for repairable systems it is rate of occurrence of failures (ROCOF).

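[Figure: time axis showing the item surviving up to time t, followed by the small interval from t to t + Δt. What is the probability of failure in that interval?]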
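The verbal definition of the hazard rate is commonly written as a limit. The form below is the standard textbook expression, given here as a sketch; the symbols T (time to failure), f(t) (failure density) and R(t) (reliability function) are assumptions introduced for this illustration and do not appear on the slide.

```latex
h(t) = \lim_{\Delta t \to 0}
       \frac{\Pr\left(t < T \le t + \Delta t \mid T > t\right)}{\Delta t}
     = \frac{f(t)}{R(t)}
```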

1-11



FAILURE RATE (HAZARD RATE) CONT’D

Failure rate is an important function in reliability studies since it describes how the probability of failure changes over the lifetime of the item, and hence the item’s reliability performance.

Increasing rate = reliability deteriorates
Decreasing rate = reliability improves
Constant rate = reliability is maintained

1-12


Bathtub curve

The bathtub curve is a conceptual model of the reliability characteristics (failure rate) of a component or system over its lifetime. It is divided into three regions:

1. Early failures
2. Useful life
3. Wear out

[Figure: bathtub curve of failure rate against time, showing the three regions]

1-13


Bathtub curve

1. Early failures

Infant mortality or burn-in period. Failure rate is initially higher due to issues such as improper manufacturing, improper installation and poor materials.

[Figure: failure rate against time, with the early failures region highlighted]

1-14


Bathtub curve

2. Useful life

Failure rate is approximately constant because the failures, assumed to be mostly stress-related, occur at random. This flat portion of the bathtub curve is also referred to as the component’s or system’s ‘normal operating life’, where realistically many components or systems spend most of their lifetimes operating.

[Figure: failure rate against time, with the useful life region highlighted]

1-15


Bathtub curve

3. Wear out

Failure rate increases because of degradation phenomena due to wear out. Wear out is generally caused by fatigue, corrosion, creep, friction and other aging factors.

[Figure: failure rate against time, with the wear out region highlighted]
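One common way to illustrate decreasing, constant and increasing failure rates numerically is the Weibull hazard function. The slides do not name a distribution, so the Weibull model and every parameter value below are assumptions chosen only to show the three trends.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical times (hours) and parameters, for illustration only
times = [100.0, 500.0, 1000.0]
early = [weibull_hazard(t, shape=0.5, scale=1000.0) for t in times]     # shape < 1: decreasing
useful = [weibull_hazard(t, shape=1.0, scale=1000.0) for t in times]    # shape = 1: constant
wear_out = [weibull_hazard(t, shape=3.0, scale=1000.0) for t in times]  # shape > 1: increasing

for name, values in [("early", early), ("useful", useful), ("wear out", wear_out)]:
    print(name, [f"{v:.6f}" for v in values])
```

A shape parameter below 1 mimics early failures, a value of 1 gives the flat useful-life portion, and a value above 1 mimics wear out.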

1-16


Failure rate curve – Repairable system

[Figure: failure rate against time (t1, t2, …, tn) for a repairable system. Labels: original fielded system failure curve; original system decreasing failure rate phase; original system useful life phase; major maintenance action (twice); improvement #1 system wear out phase; improvement #2 system wear out phase; useful life extension]

Equipment / system useful life phase extension (Wasson, 2006)

1-17


Various types of Failure rate curve

1. Traditional view (random failure then wear out)
Typical equipment: Belts, chains, impellers
Maintenance strategy: Preventive maintenance

2. Bathtub curve
Typical equipment: Electro-mechanical components and motors
Maintenance strategy: Condition monitoring

3. Slow aging (steady increase in failure rate)
Typical equipment: Turbines, engines, compressors, piping
Maintenance strategy: Condition monitoring

1-18


Various types of Failure rate curve

4. Best New (sharp increase in failure rate, then levels off)
Typical equipment: Hydraulic and pneumatic equipment
Maintenance strategy: Condition based maintenance

5. Random failure (failure rate is constant, no age-related failure pattern)
Typical equipment: Ball and roller bearings
Maintenance strategy: Condition based maintenance

6. Worst New (high infant mortality, then random failure)
Typical equipment: Electronic equipment / components
Maintenance strategy: Condition based maintenance

1-19


Reliability Analysis

Statistical concepts play critical roles in reliability analysis and techniques.

Applications of reliability techniques to real-world problems generally involve three main elements:

Acquisition – effective and efficient data collection
Analysis – description and analysis of data (descriptive and inferential statistics)
Interpretation of data – use of the results to solve the problem

1-20


General Methodology for Reliability Analysis

1. Setting objectives

2. Definition of system and failure

3. Data gathering

4. Exploratory analysis

5. Distribution analysis

6. Estimation of reliability measures

7. Recommendations for operation and maintenance improvement

1-21


Setting Objective

A clear objective is a very important factor for a successful reliability study.

Have a clear definition of the specific purpose to be achieved at the end of the analysis.

The objective of the reliability study strongly influences the approach and the method of modeling and analysis used.

A precise objective sets proper conditions for the appropriate collection of relevant maintenance data to be used in the analysis.

1-22


System Definition

System Boundary

[Figure: system boundary diagram for a gas compression train. Labels: Inlet valve; Inlet Gas conditioning (Scrubber, Cooler etc.); Compressor unit (1st stage, 2nd stage); Inter-stage Conditioning (Scrubber, Cooler etc.); After Cooler; Recycle valve; Outlet valve; Gear Box; Power Turbine; Gas Generator; Air inlet Equipment; Local Fuel/Gas inlet Equipment; Fuel/Gas control valve; Exhaust; Starter system; Lubrication system; Shaft seal system; Control and monitoring; Miscellaneous; inputs crossing the system boundary: Air, Fuel/Gas, Power, Coolant, Remote Instr.]

Example: Gas Compression Train (adapted from OREDA (2002))

1-23


Source of Data

Historical data – test and field data on the same components / equipment

Vendor data – data from the manufacturer / vendor / consultant

Test data – experimental data on the parts

Operational data – field data collected under actual operating conditions

Handbook data – theoretical data from standard engineering handbooks and reliability databases, e.g. OREDA, MIL-HDBK-217F

Judgmental data – information based on expert opinion

Cost data – data on sales, maintenance and operational costs

1-24


Operational Data

Main categories of data for reliability analysis:

Inventory data – information on equipment related to design, operational, functional and environmental characteristics. Can be classified under equipment identification, manufacturing and design, maintenance and test, engineering and process data

Failure-event data – detailed records of failure incidents, e.g. event date; duration; modes; causes; codes; severity and effect on system; downtime date and duration

Operating time data – the time and duration of each operating state, i.e. operation, standby and downtime

1-25


Types of Data

Complete Data – Exact time to failure is known

Right Censored (Suspension) – Item is still running at the end of the observation time

Left Censored – Failure time is only known to be before a certain time

Interval Censored – Failure time is only known to lie within an interval

1-26


1-27

Exploratory Data Analysis

Use statistical tools and techniques to investigate data sets in order to gain insight into the data, understand their important characteristics, identify outliers and extract important factors.

Common Exploratory Tools

Histogram
Pie chart
Pareto chart
Box plot
Trend chart
Scatter plot


1-28

Exploratory Analysis

Example: Gas compression train

No.  Subsystem                      Code
1    Gas Turbine                    GT
2    Centrifugal Gas Compressor     GC
3    Starter System                 STS
4    Gearbox                        GB
5    Fuel System                    FS
6    Vibration Monitoring System    VMS
7    Anti-surge Valve System        AVS
8    Lube Oil System                LOS
9    Process and Utilities          PRO
10   Turbine Control System         TCS

PIE CHART

Train 1: GT 39%, GC 7%, STS 4%, VMS 4%, AVS 14%, LOS 7%, TCS 25%

Train 2: GT 31%, GC 18%, STS 3%, FS 6%, VMS 3%, AVS 9%, LOS 3%, PRO 18%, TCS 9%

PARETO

[Figure: Pareto chart for the gas compression train (overall). Bars show failures per subsystem in the order GT, TCS, GC, AVS, PRO, LOS, STS, FS, VMS, GB, with a cumulative % line]

TREND

[Figure: number of failures per year, 2002 to 2009, broken down by subsystem (TCS, PRO, LOS, AVS, VMS, FS, GB, STS, GC, GT)]
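A Pareto ranking like the one in this example can be sketched in a few lines of Python. The percentages below are those listed for the second pie chart; the function name and table layout are illustrative assumptions, not taken from the slides.

```python
# Percentages by subsystem, as listed for the second pie chart in the example
shares = {"GT": 31, "GC": 18, "STS": 3, "FS": 6, "VMS": 3,
          "AVS": 9, "LOS": 3, "PRO": 18, "TCS": 9}

def pareto_table(values):
    """Return (code, value, cumulative %) rows sorted from largest to smallest."""
    total = sum(values.values())
    rows, running = [], 0
    for code, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((code, value, 100.0 * running / total))
    return rows

table = pareto_table(shares)
for code, value, cumulative in table:
    print(f"{code:>4} {value:>3} {cumulative:6.1f}%")
```

Reading the table from the top shows which few subsystems account for most of the total, which is the purpose of a Pareto chart.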


Types of Configurations

Series

Parallel

Example RBD for Acid Gas Removal Unit

[Figure: reliability block diagram with blocks M201 Feed gas separator, T202A and T202B Feed/pure gas exchangers, T201A to T201D, A201 Absorber, T203-A to T203-D, and M202 Feed gas separator]

1-29


Series Configuration

Blocks are connected in a series.

It can be thought of as an “OR” relationship (i.e. The system fails if A OR B fails).

It implies no redundancy in the components.

If units are in series, then all units must work for the system to work. If any unit in the series fails, then the system fails.

The reliability of the system is given by:

Rs = R1 × R2 × … × Rn

R1 R2 R3

1-30


Reliability Calculation for Series System

Calculate system reliability given R1 = 0.90, R2 = 0.95 and R3 = 0.98.

R1 R2 R3

RS = R1 × R2 × R3

= (0.90)(0.95)(0.98) = 0.8379
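The series calculation can be reproduced with a short Python sketch. The function name is illustrative and not part of the slides.

```python
from math import prod

def series_reliability(reliabilities):
    """Reliability of a series system: the product of the unit reliabilities."""
    return prod(reliabilities)

rs = series_reliability([0.90, 0.95, 0.98])
print(round(rs, 4))  # 0.8379
```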

1-31


Reliability Calculation for Series System

What is the system reliability and failure rate?

Assuming that the components are having a constant failure rate.

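Then, the system reliability is

R1 R2 R3

$$R_S(t) = R_1(t)\,R_2(t)\,R_3(t) = e^{-\lambda_1 t}\,e^{-\lambda_2 t}\,e^{-\lambda_3 t} = e^{-(\lambda_1+\lambda_2+\lambda_3)t}$$

So, the failure rate for the system is

$$\lambda_S = \lambda_1 + \lambda_2 + \lambda_3$$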

1-32


Exercise for Series System

Consider a system with three components in series.

You are required to achieve a system reliability of 0.98 over 800 hours of non-stop operation.

1. What would be the target failure rate for the system?

R1 R2 R3

$$R_S(t) = e^{-\lambda_S t}$$

$$0.98 = e^{-\lambda_S (800)}$$

$$\ln(0.98) = -\lambda_S (800)$$

$$\lambda_S = -\frac{\ln(0.98)}{800} = 2.53 \times 10^{-5}\ \text{per hour}$$

1-33



Consider a system with three components in series.

You are required to achieve a system reliability of 0.98 over 800 hours of non-stop operation.

2. What would the system MTBF be?

$$\mathrm{MTBF}_S = \frac{1}{\lambda_S} = \frac{1}{2.53 \times 10^{-5}} = 39{,}599\ \text{hours} \approx 1650\ \text{days}$$

R1 R2 R3

1-34



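3. Assuming the component failures are identically distributed,

a) What should be the component failure rate?

$$\lambda_S = \lambda_1 + \lambda_2 + \lambda_3 = 3\lambda = 2.53 \times 10^{-5}$$

$$\lambda = \frac{2.53 \times 10^{-5}}{3} = 8.42 \times 10^{-6}\ \text{per hour}$$

b) What would be the component MTBF?

$$\mathrm{MTBF} = \frac{1}{\lambda} = \frac{1}{8.42 \times 10^{-6}} = 118{,}796\ \text{hours} \approx 4950\ \text{days}$$

c) What should be the component reliability?

$$R(t) = e^{-\lambda t} = e^{-(8.42 \times 10^{-6})(800)} = 0.993$$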
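The three parts of the series exercise can be checked numerically. The sketch below assumes constant failure rates as stated in the exercise; the variable names are illustrative. It carries full precision through each step, which is why the MTBF values agree with the slide figures and not with a calculation based on the rounded 2.53 × 10⁻⁵.

```python
import math

target_reliability = 0.98   # required system reliability
mission_hours = 800.0       # non-stop operating time
n_components = 3            # identical components in series

# Part 1: system failure rate from R_S(t) = exp(-lambda_S * t)
lambda_s = -math.log(target_reliability) / mission_hours

# Part 2: system MTBF = 1 / lambda_S
mtbf_s_hours = 1.0 / lambda_s

# Part 3: identical components, so lambda_S = 3 * lambda
lambda_c = lambda_s / n_components
mtbf_c_hours = 1.0 / lambda_c
r_c = math.exp(-lambda_c * mission_hours)

print(f"system failure rate    = {lambda_s:.3e} per hour")
print(f"system MTBF            = {mtbf_s_hours:,.0f} hours ({mtbf_s_hours / 24:,.0f} days)")
print(f"component failure rate = {lambda_c:.3e} per hour")
print(f"component MTBF         = {mtbf_c_hours:,.0f} hours ({mtbf_c_hours / 24:,.0f} days)")
print(f"component reliability  = {r_c:.3f}")
```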

R1 R2 R3

1-35


Parallel Configuration

The system fails only when all the units fail.

It can be thought of as an “AND” relationship (i.e. the system fails if 1 and 2 and … and n fail).

At least one unit must succeed for a successful mission.

The reliability of the system is given by:

Rs = 1 – [(1-R1) × (1-R2) × … × (1-Rn)]

[Figure: blocks 1, 2, 3, …, n in parallel]

1-36


Reliability Calculation for Parallel System

Calculate system reliability given R1 = 0.90 and R2 = 0.98.

RS = 1 – [(1 – R1)(1-R2)]

= 1 – [(1 – 0.90)(1 – 0.98)]

= 1 – (0.10)(0.02)

= 1 – 0.002

= 0.998
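The same parallel calculation can be written as a short Python sketch. The function name is illustrative and not part of the slides.

```python
from math import prod

def parallel_reliability(reliabilities):
    """Reliability of a parallel system: 1 minus the product of unit unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

rs = parallel_reliability([0.90, 0.98])
print(round(rs, 4))  # 0.998
```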

[Figure: blocks 1 and 2 in parallel]

1-37


Combination of Basic Configurations

Any of the previous configuration types can be used simultaneously in one diagram.

Consider a system having subsystems.

[Figure: reliability block diagram with six blocks (1 to 6) combining series and parallel arrangements]

1-38


Steps to calculate system reliability for combined series-parallel configuration

1. Break the system into smaller series and parallel arrangements.

2. Calculate reliability of each arrangement identified in step 1.

3. Finally, calculate RS using the reliability obtained in step 2.
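The three steps can be sketched in Python by composing a series function and a parallel function. The layout and reliability values below are hypothetical (block A in series with a parallel pair B1/B2, followed by block C) and are not the diagram from the slide.

```python
from math import prod

def series(*reliabilities):
    """Series arrangement: every block must work."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """Parallel arrangement: at least one block must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical block reliabilities, for illustration only
r_a, r_b1, r_b2, r_c = 0.95, 0.90, 0.90, 0.98

# Steps 1 and 2: identify the parallel pair and reduce it to a single block
r_b = parallel(r_b1, r_b2)        # 1 - (0.10)(0.10) = 0.99

# Step 3: the reduced system is a plain series of three blocks
r_system = series(r_a, r_b, r_c)  # (0.95)(0.99)(0.98)
print(round(r_system, 5))  # 0.92169
```

Nesting the calls, for example `series(r_a, parallel(r_b1, r_b2), r_c)`, mirrors the structure of the block diagram directly.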

1-39


k-out-of-n Redundancy

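At times, a system is such that at least k out of its n components need to be working for the system to function.

[Figure: four parallel blocks (1 to 4) feeding a 3/4 node, and the general case of n parallel blocks (1, 2, 3, 4, …, n) feeding a k/n node]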

1-40



A node is used to signify k-out-of-n redundancy.

The basic property of the node is to define the number of incoming paths that must be “Good” for the system to be “Good”.

1-41



For n identical components (i.e. same reliability values), the system reliability is calculated as

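[Figure: n parallel blocks (1, 2, 3, 4, …, n) feeding a k/n node]

$$R_s = \Pr(\text{at least } k \text{ components are working}) = \sum_{x=k}^{n} P(x)$$

where

$$P(x) = \binom{n}{x} R^{x} (1-R)^{n-x} \quad \text{and} \quad \binom{n}{x} = \frac{n!}{x!\,(n-x)!}$$

P(x) follows the binomial distribution.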

1-42


Example: k-out-of-n Redundancy

A high pressure boiler is fitted with 5 identical pressure relief valves. Pressure inside the boiler is successfully controlled by any three of these valves. If the failure probability of a relief valve is 0.05, compute the reliability of the pressure relief valve system.

Solution: This is a 3-out-of-5 system where n = 5, k = 3 and R = 1 – 0.05 = 0.95.

$$R_s = \sum_{x=k}^{n} \binom{n}{x} R^{x} (1-R)^{n-x}$$

$$= \frac{5!}{3!\,(5-3)!}(0.95)^{3}(1-0.95)^{5-3} + \frac{5!}{4!\,(5-4)!}(0.95)^{4}(1-0.95)^{5-4} + \frac{5!}{5!\,(5-5)!}(0.95)^{5}(1-0.95)^{5-5}$$

$$= 0.99884$$
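The relief valve example can be verified with a short Python sketch of the binomial sum for identical components. The function name is illustrative and not part of the slides.

```python
from math import comb

def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical components (reliability r) work."""
    return sum(comb(n, x) * r**x * (1 - r) ** (n - x) for x in range(k, n + 1))

# 3-out-of-5 relief valve system with R = 0.95
rs = k_out_of_n_reliability(k=3, n=5, r=0.95)
print(f"{rs:.5f}")  # 0.99884
```

Setting k = n reproduces the series result and k = 1 reproduces the parallel result for identical components.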

1-43


AVAILABILITY

Definition

“The probability that a system or component is performing its required function at a given point in time or over a stated period of time when operated and maintained in a prescribed manner”

(Ebeling, 1997)

1-44


AVAILABILITY

Three Types of Availability Measures

1. Inherent, Ai

$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

Steady state availability which considers only corrective maintenance (CM)

2. Achieved, Aa

$$A_a = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT}}$$

Steady state availability which includes both corrective maintenance (CM) and preventive maintenance (PM)

3. Operational, Ao

$$A_o = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT} + \mathrm{MLDT}}$$

where MLDT is made up of (LDT + ADT)

MTBF = mean time between failures
MTTR = mean time to repair
MTBM = mean time between maintenance
MMT = mean maintenance time
MLDT = mean logistics down time
LDT = logistics delay time
ADT = administrative delay time
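The three availability measures can be compared side by side with a small Python sketch. All input values below are hypothetical figures in hours, chosen only for illustration, and the function names are not part of the slides.

```python
def inherent_availability(mtbf, mttr):
    """Ai = MTBF / (MTBF + MTTR): corrective maintenance only."""
    return mtbf / (mtbf + mttr)

def achieved_availability(mtbm, mmt):
    """Aa = MTBM / (MTBM + MMT): corrective and preventive maintenance."""
    return mtbm / (mtbm + mmt)

def operational_availability(mtbm, mmt, mldt):
    """Ao = MTBM / (MTBM + MMT + MLDT): also counts logistics and administrative delays."""
    return mtbm / (mtbm + mmt + mldt)

# Hypothetical values in hours, for illustration only
a_i = inherent_availability(mtbf=1000.0, mttr=10.0)
a_a = achieved_availability(mtbm=500.0, mmt=8.0)
a_o = operational_availability(mtbm=500.0, mmt=8.0, mldt=12.0)

print(f"Ai = {a_i:.4f}, Aa = {a_a:.4f}, Ao = {a_o:.4f}")
```

For the same MTBM and MMT, Ao can never exceed Aa, because the delay time only adds to the denominator.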

1-45


Operational Availability

Ao = UPTIME / (UPTIME + DOWNTIME)

UPTIME
Standby Time
Operating Time

DOWNTIME
Logistics Delay Time (LDT): parts availability, waiting for items / services
Administrative Delay Time (ADT): locating tools, setting up test equipment, finding personnel, reviewing manuals
Corrective Maintenance Time (CMT): preparation time, fault location time, getting parts, correcting fault, test and check out
Preventive Maintenance Time (PMT): servicing, inspection, overhaul

1-46


THANK YOU


References

Modarres, M., Kaminskiy, M. and Krivtsov, V. (1999). Reliability Engineering and Risk Analysis. Marcel Dekker, New York.

OREDA Participants (2002). OREDA Offshore Reliability Data Handbook, 4th Edition.

Ebeling, C. (1997). An Introduction to Reliability and Maintainability Engineering. McGraw-Hill, Boston.

Wasson, C. S. (2006). System Analysis, Design, and Development. John Wiley & Sons, Hoboken, NJ.

1-48
