Accelerated Stress Testing and Reliability Workshop October 9-11, 2013 San Diego, CA Accelerating...
Accelerated Stress Testing and Reliability Workshop
October 9-11, 2013, San Diego, CA
Accelerating Reliability into the 21st Century
Keynote Presenter Day 1: Vice Admiral Walter Massenburg
Keynote Presenter Day 2: Alain Bensoussan, Thales Avionics
CALL FOR PRESENTATIONS: We are now Accepting Abstracts. Email to: [email protected].
Guidelines on website www.ieee-astr.org
For more details, join our LinkedIn Group: IEEE/CPMT Workshop on Accelerated Stress Testing and Reliability
This is the 3rd in a series of four webinars presented by Ops A La Carte, ASTR, and the ASQ Reliability Division.
Each webinar will also be presented as a full 2-hour tutorial at our ASTR Workshop, Oct 9-11, San Diego.
Abstracts for presentations are due Apr 30. www.ieee-astr.org
Agenda
Introduction: 5 min
Accelerated Reliability Growth Testing: 45 min
Questions: 10 min
Upcoming Reliability Webinars
Title: 40 Years of HALT: What Have We Learned
Author: Mike Silverman
Date: Sept 12, 2013, 12pm EST
http://reliabilitycalendar.org/webinars/english/40-years-of-halt-what-have-we-learned/
Location: Webinar
HALT began 40 years ago with a simple idea of testing beyond specifications in order to better understand design margins. Over the past 40 years, thousands of engineers around the world have been exposed to the concepts of HALT and have tried the techniques.
This tutorial will explore what we have learned in the past 40 Years and what the future of HALT could be.
Registration Demographics
For this webinar we have signed up:
– 250 Registrants
– 17 Countries
– 28 US States
Registration Question #1
Have you ever performed a Reliability Growth Test?
– Never 45%
– All the time 25%
– Tried Once 20%
Registration Question #2
For your last RGT, did you have a chance to plan the duration and stresses?
– Neither 50%
– Both 25%
– Duration Only 10%
– Stresses Only 10%
Copyright © 2012 Raytheon Company. All rights reserved. Customer Success Is Our Mission is a registered trademark of Raytheon Company.
Traditional and Accelerated Reliability Growth
The Case of Lost (and Found) Failure Rates
Milena Krasich, P. E. Raytheon, IDS
Tutorial Objectives
• Identify shortcomings of traditional reliability growth testing and offer alternatives
  – Reliability growth test objectives
  – Explain the traditional reliability growth test methodology along with its assumptions
  – Show the shortfalls of the traditional methods
    • Entire item failure rate not calculated and presented in results
    • Test duration too long for modern high-reliability items
    • Little or no relationship between reliability and the stresses on the tested item
  – Show the principles of the Physics of Failure (PoF) test methodology
  – Show how the reliability growth test based on PoF is constructed
  – Show how the expected stresses are applied and accelerated
  – Show how to account for total final failure rates
  – Show the considerable test cost reduction achieved.
Traditional RG Test Methodology
• Overall test duration is determined from the initial and goal reliability measures: failure rates, or Mean Time Between Failures, MTBF (or MTTF)
• The initial failure rate is estimated for the entire item and then used in the reliability growth calculations
• Reliability growth parameters and test duration are determined mathematically from the goal reliability
• The magnitude (stress level) of the applied operational and environmental stresses equals that in use, but their duration does not
  – Applied stress duration is determined by engineering judgment, and level by assumption of some "mean" stress
  – Overall test duration and stress application are unrelated to use profiles or to the required life or mission of the product; they follow only from the mathematics
• Additional errors:
  – Mathematical
Principles and Assumptions
• Goal:
  – Increase the current (existing) reliability, measured in mean time between failures
  – Goal magnitude guided by:
    • Requirement or commercial logic
• The item as designed contains design errors:
  – Those will appear in test reasonably within the determined test time
  – The errors found in test will be eliminated by design corrections (type B failure modes)
  – Continuation of the test will evaluate the success of each fix
  – Design errors that cannot be fixed (type A failure modes) will continue to be counted
  – Failures determined to be random will not be counted
  – Reliability growth will be measured.
Principles and Assumptions, cont.
• Failure rate during the test is constant while the tested item is unchanged
• Failure rate decreases in steps as design corrections are introduced, and remains constant until the next change
• The step curve is fitted with a curve representing a Non-Homogeneous Poisson Process (NHPP)
  – The process definition: failure rate is constant until changes occur.
• Facts not considered when that theory is applied:
  – The initial failure rate is just the total failure rate. There is no rationale for how much of it is attributed to:
    • Design problems that can be corrected
    • Random events (failure modes whose origin is unknown; they "just happen")
    • Design problems that cannot be corrected for one of these reasons:
      – Technically impossible
      – Economically not justifiable
      – Time-to-market constraints
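Mathematical Model - Refresher
• The expected accumulated number of failures up to test time T is given by:
  $$E[N(T)] = \lambda T^{\beta}, \quad \text{with } \lambda > 0,\ \beta > 0,\ T > 0$$
  where
  • λ is the scale parameter;
  • β is the shape parameter, a function of the general effectiveness of the improvements (0 < β < 1 corresponds to reliability growth; β = 1 corresponds to no reliability growth; β > 1 corresponds to negative reliability growth, that is, reliability degradation)
• The failure intensity, when it is changing as a result of design improvements, after T h of testing is given by:
  $$\omega(t) = \frac{d}{dt}E[N(t)] = \lambda\beta t^{\beta-1}, \quad \text{with } t > 0$$
  $$\omega(T) = \lambda\beta T^{\beta-1}$$
• The random and type A failure rates are constant, so the item failure rate has three components:
  $$\lambda_r(t) = \text{const.}, \qquad \lambda_A(t) = \text{const.}$$
  $$\lambda_{Item}(t) = \lambda_B(t) + \lambda_A(t) + \lambda_r(t)$$
  $$\lambda_{Item}(t) = \lambda\beta t^{\beta-1} + \lambda_A(t) + \lambda_r(t)$$
  The item failure rate is therefore $\lambda_{Item}(t) = \lambda\beta t^{\beta-1} + \lambda_A + \lambda_r$, rather than a single power law $\lambda_{Item}(t) = \lambda_{Item}\,\beta t^{\beta-1}$ for the whole item.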
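A minimal Python sketch of the power-law model in the refresher, for illustration only. The function names and the parameter values (scale 0.05, shape 0.6) are assumptions chosen for the example; they do not come from the slides.

```python
def expected_failures(t, lam, beta):
    """Expected accumulated number of failures, E[N(t)] = lam * t**beta."""
    return lam * t ** beta


def failure_intensity(t, lam, beta):
    """Failure intensity, d/dt E[N(t)] = lam * beta * t**(beta - 1)."""
    return lam * beta * t ** (beta - 1)


# Hypothetical parameters (not from the slides); beta < 1 means growth.
LAM, BETA = 0.05, 0.6

for hours in (100, 1000, 5000):
    n = expected_failures(hours, LAM, BETA)
    w = failure_intensity(hours, LAM, BETA)
    print(f"t = {hours:>5} h   E[N] = {n:6.2f}   intensity = {w:.6f} per h")
```

With a shape parameter below 1 the printed intensity falls as test time accumulates; setting the shape parameter to 1 makes it constant and equal to the scale parameter.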
Mathematics of Traditional Reliability Growth
• Failure mode types in test:
  – Systematic: corrected in test (Type B) or not corrected (Type A)
  – Random: constant
[Figure: failure intensity/failure rate (failures/hour, 0 to 0.06) versus test duration (0 to 6,000 hours), showing λ_S(t) = λ_A(t) + λ_r(t) + λ_B(t) and its components λ_r(t), λ_B(t) and λ_A(t). The B modes are the only failure modes with decreasing failure rates (power law).]
$$\lambda_{Item}(t) = \lambda_B(t) + \lambda_A(t) + \lambda_r(t)$$
$$\lambda_{Item}(t) = \lambda\beta t^{\beta-1} + \lambda_A(t) + \lambda_r(t)$$
Only the failure rates of the type B failure modes are accounted for in a reliability test program: those that show growth expressed by the power law model. The type A and random failure rates remain constant. The traditional program thus reports:
$$\lambda_{Item}(t) = \lambda\beta t^{\beta-1}$$
Planning Reliability Growth
• To plan reliability growth, the initial value of the failure rate, λ_I, or the initial mean time between failures, θ_I, was assumed known at some time t_I. This initial failure rate would have a value known from experience with that item or by similarity with a like item: ω_I(t_I) = constant
• The reasoning was that this initial failure rate would decrease under the rules of the power law and, at the end of the test with the corrections in place, would assume a final value (again a constant), ω_F(t_F).
• The Crow/AMSAA/Duane planning model is simple and easy to implement:
  $$\omega(t) = \omega_I(t_I)\left(\frac{t}{t_I}\right)^{\beta-1}$$
• But the initial failure rate has three components, and only one of them, the failure rate of the B failure modes, can be improved and fitted with the power law. The remaining components are constant.
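Planning Reliability Growth, cont.
• The remaining two components are constant. The final failure rate as a function of time also contains three components, two constant and only one that can be fitted with the power law:
  $$\lambda_I(t_I) = \lambda_{BI}(t_I) + \lambda_A(t_I) + \lambda_r(t_I) = \lambda_{BI}(t_I) + \lambda_A + \lambda_r$$
  $$\lambda_F(t_F) = \lambda\beta t_F^{\beta-1} + \lambda_A(t_F) + \lambda_r(t_F) = \lambda_{BF}(t_F) + \lambda_A + \lambda_r$$
• The final B-mode failure rate is then the improved failure rate of the B-type failure modes, and the total final item or system failure rate also contains the two additional constant components:
  $$\lambda_{BF}(t_F) = \lambda_{BI}(t_I)\left(\frac{t_F}{t_I}\right)^{\beta-1}$$
  $$\lambda_F(t_F) = \lambda_{BI}(t_I)\left(\frac{t_F}{t_I}\right)^{\beta-1} + \lambda_A + \lambda_r$$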
A Failure Modes
• The random failure rates are not recorded or taken into account; the A-type failures are considered in the number of failures
  – It is said that they are included in the shape parameter calculations, but no example in current handbooks shows how that was done
  – It is also stated that Type A failure modes are counted every time they show up, repetitions included; no example of that statement could be found
• Given that no improvement is applied, type A failure modes should be treated in the same manner as the random failure rates. They could be accounted for separately, but numerically their failure rate is added to the random failure rate.
• This means that during the test the A-type failure modes should be counted as another group of constant failure rates
  – In that case the methodology of fixed-duration testing should be applied to determine failure rates for both:
    • The A-type failure modes
    • All other random failure modes whose origin is not identifiable.
Present Method to Determine Test Duration
• Test duration is mathematically determined from the reciprocal of the "failure rate" as:
  $$\theta_F(t_F) = \theta_I(t_I)\left(\frac{t_F}{t_I}\right)^{1-\beta}$$
  $$\log t_F = \log t_I + \frac{1}{1-\beta}\log\frac{\theta_F}{\theta_I}, \qquad t_F = t_I\left(\frac{\theta_F}{\theta_I}\right)^{\frac{1}{1-\beta}}$$
  Where:
  θ_F = final product MTBF (for mitigated, "fixed" failure modes only); the given goal
  θ_I = initial product MTBF (for failure modes that will be mitigated); assumed
  t_F = test duration needed to achieve the final MTBF for fixed failure modes
  t_I = initial test time (has various explanations); assumed. What is it?
• Example, old school: θ_I = 4,000 hours, θ_F = 10,000 hours, β = 0.6
[Figure: test duration t_F (hours, 0 to 1×10⁴) versus initial test time t_I (hours, 0 to 800), with t_I plotted for comparison.]
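The dependence of the planned duration on the assumed initial test time can be sketched in a few lines of Python. The function name `growth_test_duration` is an assumption made for this illustration, and the initial test times in the loop are arbitrary sample values; the MTBF values and β are those of the "old school" example.

```python
def growth_test_duration(theta_i, theta_f, t_i, beta):
    """Planned duration t_F = t_I * (theta_F / theta_I) ** (1 / (1 - beta))."""
    return t_i * (theta_f / theta_i) ** (1.0 / (1.0 - beta))


# Example values from the slide: theta_I = 4,000 h, theta_F = 10,000 h, beta = 0.6.
THETA_I, THETA_F, BETA = 4000.0, 10000.0, 0.6

# Sample initial test times (assumed for illustration only).
for t_i in (100.0, 200.0, 400.0, 800.0):
    t_f = growth_test_duration(THETA_I, THETA_F, t_i, BETA)
    print(f"t_I = {t_i:5.0f} h  ->  t_F = {t_f:7.0f} h")
```

The planned duration scales linearly with the assumed initial test time (here by a factor of 2.5^2.5, roughly 9.9), which is why the choice of t_I matters so much to the result.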
Initial MTBF – What is It?
• In the traditional test design, the initial test MTBF is the MTBF assumed for the product, but:
  – The reciprocal of this initial MTBF is the initial failure rate, made up of three components, two of which are constant, not power law:
    • Design, correctable
    • Design, non-correctable
    • Random failure rates or failure modes
  – Only the design failure modes that can be corrected (B type) can be fitted by the power law (Weibull intensity function), thus:
    $$\theta_{BI}(t_I) = \frac{1}{\lambda_{BI}(t_I)}$$
    • What part of the item's assumed (estimated) initial failure rate could those correctable failure modes be?
    • Analytical prediction contains only the random failure rates
      – If Design Engineering is reasonably competent, Type A or B failure modes could be at most 40% of the assumed initial failure rate
      – The B failure rate could be only a small fraction of the estimated product failure rate before the test.
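Parameters and Results
• Recorded in test are the cumulative times of occurrence of A and B failure modes.
• A modes are not addressed, so they should not be part of the power law. The handbook text suggested they are counted; if they were, it would have been in error
  – From the test data, the shape and scale parameters are determined:
  $$\lambda_B(t) = \frac{d}{dt}E[N_B(t)] = \lambda\beta t^{\beta-1}, \quad \text{with } t > 0$$
  $$\hat{\beta} = \frac{N_B}{\sum_{i=1}^{N_B}\left(\ln T - \ln t_i\right)}, \qquad \hat{\lambda} = \frac{N_B}{T^{\hat{\beta}}}$$
  $$\text{Unbiased: } \bar{\beta} = \frac{N_B - 1}{\sum_{i=1}^{N_B}\left(\ln T - \ln t_i\right)}$$
• The reported failure rate and MTBF are:
  $$\lambda_B(T) = \hat{\lambda}\hat{\beta}T^{\hat{\beta}-1}, \qquad \theta_B(T) = \frac{1}{\lambda_B(T)}$$
• Random and A modes do not seem to be part of the achieved growth. They are, unfortunately, forgotten.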
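As an illustration of how the shape and scale parameters are obtained from test data, here is a minimal Python sketch of the standard maximum-likelihood estimates for a power-law process observed up to a fixed test time. The failure times and total test time are hypothetical, the function names are assumptions, and the sketch treats the data as B-mode failures from a time-terminated test.

```python
import math


def power_law_mle(failure_times, total_time):
    """Maximum-likelihood estimates (lam_hat, beta_hat) for a time-terminated test."""
    n = len(failure_times)
    log_sum = sum(math.log(total_time / t) for t in failure_times)
    beta_hat = n / log_sum
    lam_hat = n / total_time ** beta_hat
    return lam_hat, beta_hat


def reported_rate_and_mtbf(lam_hat, beta_hat, total_time):
    """Failure rate lam*beta*T**(beta-1) at the end of test, and its reciprocal."""
    rate = lam_hat * beta_hat * total_time ** (beta_hat - 1)
    return rate, 1.0 / rate


# Hypothetical cumulative B-mode failure times in hours (not from the slides).
times = [35.0, 110.0, 260.0, 540.0, 1100.0]
T = 2000.0

lam_hat, beta_hat = power_law_mle(times, T)
rate, mtbf = reported_rate_and_mtbf(lam_hat, beta_hat, T)
print(f"beta = {beta_hat:.3f}, lambda = {lam_hat:.4f}")
print(f"B-mode failure rate at T = {rate:.6f} per h, MTBF = {mtbf:.0f} h")
```

Only the failures passed in contribute to the result, which is the point made on the slide: any constant A-mode and random failure rates have to be estimated and added separately.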
Comparison
• If the initial test time is assumed to be 200 hours:
• Traditional test (all failure rates follow the power law):
  – Initial failure rate: λ_I = 2.5×10⁻⁴ f/hr
  – Initial MTBF: θ_I = 4,000 hours
  – Final MTBF: θ_F = 10,000 hours
  – Final test time: 1,976 hours (from the initial time)
• True status, only B-type failure modes improved (e.g. at most 40% of the old "initial" failure rate, λ_I = 2.5×10⁻⁴ f/hr):
  – Initial failure rate for B modes: λ_BI = 0.4 × 2.5×10⁻⁴ f/hr = 1×10⁻⁴ f/hr
  – Initial MTBF: θ_IB = 10,000 hours
  – Possible final MTBF for B modes: θ_FB = 30,000 hours
  – Overall final failure rate, B modes + random and A modes: 1.833×10⁻⁴ f/hr
  – Final overall MTBF: θ_F = 5,455 hours
  – Final test time: 3,118 hours (from the initial time)
• The forgotten, unreported failure rate: 1.5×10⁻⁴ f/hr
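The arithmetic behind this comparison can be retraced with a short Python sketch. The function and key names are assumptions made for the illustration; the input numbers are those of the comparison (200 h initial test time, β = 0.6, 40% of the initial failure rate attributed to B modes, B-mode MTBF goal of 30,000 h).

```python
def growth_test_duration(theta_i, theta_f, t_i, beta):
    """Planned duration t_F = t_I * (theta_F / theta_I) ** (1 / (1 - beta))."""
    return t_i * (theta_f / theta_i) ** (1.0 / (1.0 - beta))


def compare_growth_plans(lambda_i=2.5e-4, b_fraction=0.4, theta_bf=30000.0,
                         theta_f_traditional=10000.0, t_i=200.0, beta=0.6):
    """Return the traditional result and the B-modes-only result side by side."""
    lambda_bi = b_fraction * lambda_i          # only this part can grow
    lambda_const = lambda_i - lambda_bi        # A modes + random, unchanged
    lambda_final = 1.0 / theta_bf + lambda_const
    return {
        "traditional_test_hours": growth_test_duration(
            1.0 / lambda_i, theta_f_traditional, t_i, beta),
        "b_only_test_hours": growth_test_duration(
            1.0 / lambda_bi, theta_bf, t_i, beta),
        "unreported_rate": lambda_const,
        "overall_final_rate": lambda_final,
        "overall_final_mtbf": 1.0 / lambda_final,
    }


for name, value in compare_growth_plans().items():
    print(f"{name:>24}: {value:.6g}")
```

The overall final MTBF is the reciprocal of the summed rates, 1 / (1/30,000 + 1.5×10⁻⁴), which is why it stays far below the 30,000 h achieved for the B modes alone.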
The Solution – Way Forward
• The possible correct solution:
  – Prepare a reliability growth test for B failure modes only
  – Count A-type failure modes as if they were random
  – Count random failures
  – Calculate the final failure rate and MTBF of the B failure modes
  – Add the constant A and random failure rates to get the results
• Possible problems and difficulties:
  – The mathematically calculated test duration is unrelated to use stresses or use profile
  – The traditionally determined test duration is too short to account for the random failures; normally the test duration required for reasonable confidence is about 10 MTBFs (in our example, about 70,000 hours)
    • The traditional RG test duration does not support this test time
  – A short reliability growth test does not disclose cumulative damage, or failures with small failure rates that would start showing only after the test is complete, while the useful life of the item could be 10 or 20 years
• The proposed viable solution: accelerated reliability growth test.
Physics of Failure and Reliability
• Failures occur when an item is not strong enough to withstand one or more attributes of a stress:
  – Level, duration, or repetitions of its application
    • The higher the level, the shorter the duration or the fewer the repetitions needed to induce a failure
    • If the mean of the strength is k times the mean of the stress (load), and the standard deviations of each are a and b times their respective mean values, the reliability of an item regarding each use stress i, and the total reliability, will be:
  $$R_i(\mu_{L\_i}, k) = \Phi\left[\frac{k\mu_{L\_i} - \mu_{L\_i}}{\sqrt{a^2k^2\mu_{L\_i}^2 + b^2\mu_{L\_i}^2}}\right] = \Phi\left[\frac{k-1}{\sqrt{a^2k^2 + b^2}}\right]$$
  $$R_{Item}(t_0) = \prod_{i=1}^{S} R_{Stress\_i}(t_0)$$
The area of overlap of the strength and stress distributions represents the probability of failure for each of the stresses.
μ_L, σ_L = mean and standard deviation of the load distribution, σ_L = b × μ_L
μ_S, σ_S = mean and standard deviation of the strength distribution, σ_S = a × μ_S
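A minimal Python sketch of this stress-strength calculation, assuming normally distributed load and strength so that Φ is the standard normal distribution function. The function names are assumptions made for the illustration, and the sample values of k, a and b are picked only to exercise the formula.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stress_reliability(k, a, b):
    """Reliability for one stress: Phi((k - 1) / sqrt(a^2 k^2 + b^2)).

    k is the ratio of mean strength to mean load; a and b are the
    coefficients of variation of strength and load respectively.
    """
    return normal_cdf((k - 1.0) / math.sqrt((a * k) ** 2 + b ** 2))


def item_reliability(per_stress_reliabilities):
    """Total reliability as the product over the individual stresses."""
    return math.prod(per_stress_reliabilities)


# Sample margins (illustrative), with a = 0.05 and b = 0.2.
for k in (1.0, 1.1, 1.25, 1.5):
    print(f"k = {k:.2f}  R = {stress_reliability(k, 0.05, 0.2):.4f}")
```

At k = 1 the mean strength equals the mean load and the reliability is 0.5; increasing the margin k, or reducing the scatter a and b, moves it toward 1.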
Physics of Failure Reliability – Margin k Selection
• Allocate reliability regarding each of the expected stresses in use
  – The cumulative damage, and ultimately failure, due to a stress is proportional to the stress level and its duration. For a stress applied at the same level as in life, the cumulative damage model is:
    $$D(t) = \int_0^{t} S(\tau)\,d\tau$$
[Figure: reliability (0.50 to 1.00) versus multiplier k (1.00 to 1.50), with curves for b = 0.5, a = 0.05; b = 0.2, a = 0.05; b = 0.05, a = 0.05; b = 0.2, a = 0.02; b = 0.1, a = 0.02; b = 0.05, a = 0.02.]
• For the allocated reliability regarding each stress, select the value of margin k that multiplies its duration in use to give the duration applied in test
• Apply stresses simultaneously whenever possible
• If the same stress type is applied at different levels in use, recalculate their durations to the highest level (using acceleration factors)
• The most common values for a and b are a = 0.05, b = 0.2
Test Acceleration
• Each of the stresses is accelerated in test to allow a shorter test duration
• The total item failure rate is the sum of its failure rates regarding each individual stress (λ_0 is the item total failure rate in use conditions and λ_A is the accelerated item total failure rate; in reliability growth λ is equivalent to ω):
  $$\lambda_A = \lambda_{Test} = \sum_{i=1}^{N_S}\left(\lambda_{i0}\prod_{j} A_{ij}\right)$$
• The product over j exists when the stresses 1 to j produce the same failure mode.
• Stress acceleration models for different stresses, for example:
  – inverse power law model (usually applicable to thermal cycling, vibration, shock, humidity);
  – Arrhenius model (used for temperature acceleration using absolute temperature);
  – Eyring model (used also when the thermal stress is a factor in process acceleration);
  – step stress model, where the stress is increased in steps;
  – fatigue model, representing the degradation due to repetitive stress.
Test Example: B Failure Modes, duration k × life

Stresses:
  – Thermal cycling
  – Thermal exposure (thermal dwell)
  – Humidity
  – Vibration
  – Operational cycling

Parameter                 Symbol      Value
Required life             t_0         10 years = 87,600 h
Required reliability      R_0(t_0)    0.8
Time ON                   t_ON        2 h/day = 7,300 h
Temperature ON            T_ON        65 °C
Time OFF                  t_OFF       22 h/day = 80,300 h
Temperature OFF           T_OFF       35 °C
Thermal cycling           ΔT_Use      45 °C, two times per day
Total cycles              N_Use       7,300
Temperature ramp rate                 1.5 °C/min
Vibration, random         W_Use       16.68 m/s² r.m.s.
Relative humidity         RH_Use      50 %
Activation energy         E_a         1.2 eV

Determination of factor k for the major stresses: k = 1.5
$$R_i(t_0) = \left[R_0(t_0)\right]^{1/4} = 0.946$$
[Figure: reliability (0.5 to 1) versus multiplier k (1.00 to 1.50) for a = 0.1, b = 0.1.]

Thermal cycling (one thermal cycle in test = 24 hours in life):
$$A_{TC} = \left(\frac{\Delta T_{Test}}{\Delta T_{Use}}\right)^{m}, \qquad A_{Ramp\_Rate} = \left(\frac{\text{ramp rate}_{Test}}{\text{ramp rate}_{Use}}\right)^{1/3}$$
$$N_{TC\_Test} = \frac{k\,N_{TC\_Use}}{A_{TC}\,A_{Ramp\_Rate}}$$

Thermal dwell (normalize exposure when OFF to duration at ON temperature):
$$t_{ON\_N} = t_{ON} + t_{OFF}\exp\left[\frac{E_a}{k_B}\left(\frac{1}{T_{ON}+273} - \frac{1}{T_{OFF}+273}\right)\right] = 8{,}754\ \text{hours}$$

Duration of accelerated exposure:
$$t_{T\_Test} = k\,t_{ON\_N}\exp\left[\frac{E_a}{k_B}\left(\frac{1}{T_{Test}+273} - \frac{1}{T_{ON}+273}\right)\right] = 168.1\ \text{h}$$
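The reliability allocation and the thermal dwell normalization in this example can be checked with a short Python sketch. The function names and the rounded Boltzmann constant are assumptions made for the illustration; the input values are those of the parameter table.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K (rounded value, an assumption)


def allocated_reliability(r_item, n_stresses):
    """Equal allocation of the item reliability over n major stresses."""
    return r_item ** (1.0 / n_stresses)


def arrhenius_term(e_a, temp_c, temp_ref_c):
    """exp[(E_a / k_B) * (1/(T + 273) - 1/(T_ref + 273))], temperatures in deg C."""
    return math.exp(e_a / K_B * (1.0 / (temp_c + 273.0) - 1.0 / (temp_ref_c + 273.0)))


def normalized_on_time(t_on, t_off, temp_on_c, temp_off_c, e_a):
    """ON time plus the OFF time converted to its equivalent at ON temperature."""
    return t_on + t_off * arrhenius_term(e_a, temp_on_c, temp_off_c)


r_i = allocated_reliability(0.8, 4)
t_on_n = normalized_on_time(7300.0, 80300.0, 65.0, 35.0, 1.2)
print(f"R_i(t0) = {r_i:.3f}")
print(f"k x life = {1.5 * 87600:.0f} h")
print(f"normalized ON time = {t_on_n:.0f} h")
```

The normalized ON time comes out within a few hours of the 8,754 h quoted on the slide; the small difference depends on the rounding of the Boltzmann constant and of the 273 K offset.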
Test Example, Cont.
The thermal exposure is combined with the thermal cycling, distributed over the high temperature.

The test cycle profile:
$$t_{TC} = 2 \times (\text{ramp time}) + (\text{temperature stabilization} + \text{thermal dwell}) + \text{dwell at cold} = 52.5\ \text{min} = 0.875\ \text{h}$$

Humidity: test at 95% RH and temperature T_RH = 85 °C (65 °C chamber + 20 °C internal temperature rise):
$$t_{RH\_Test} = k\,t_{ON\_N}\left(\frac{RH_{Use}}{RH_{Test}}\right)^{h}\exp\left[\frac{E_a}{k_B}\left(\frac{1}{T_{RH}+273} - \frac{1}{T_{ON}+273}\right)\right] = 300\ \text{h}, \quad \text{with } h = 2.3$$

Vibration: 150,000 miles, 150 hours per axis of vibration at 1.7 g rms. Test level: 3.2 g rms.
$$t_{Vib\_Test} = k\,t_{Vib\_Use}\left(\frac{W_{Use}}{W_{Test}}\right)^{w} = 18\ \text{hours per axis}, \quad \text{with } w = 4$$

To project test time to life, multiply the test time by the acceleration factor.

Data for reliability plotting:
  – Initial B failure modes MTBF: 100,000 hours; final: 10⁶ hours
  – Initial test time: 100 hours
  – Total traditional test time: 4.6×10³ hours
  – Final test reliability (B failure modes): 0.99997
  – Final MTBF (improved failure modes): 1,431,964 hours
  – Total accelerated test time: 526 hours

Failure   Time to failure (h)   Cumulative time to failure, n = 24 (h)   θ(t) (h)     log(t)   log[θ(t)]
1         3,821.33              91,711.92                                91,711.92    4.96     4.96
2         5,781.33              138,751.92                               69,375.96    5.14     4.84
3         14,016                336,384.00                               112,128      5.53     5.05
4         18,563.44             445,522.56                               111,380.64   5.65     5.05
t0*k      131,400               3,153,600                                788,400      6.50     5.90
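The humidity and vibration durations and the derived columns of the data table can be retraced with a small Python sketch. The function names and the rounded Boltzmann constant are assumptions made for the illustration. The sketch reads n = 24 as a multiplier applied to each time to failure, and θ(t) as the cumulative time divided by the failure number, which is what the tabulated values imply.

```python
import math

K_B = 8.617e-5     # Boltzmann constant in eV/K (rounded value, an assumption)
K_MARGIN = 1.5     # margin k from the example
E_A = 1.2          # activation energy in eV
T_ON_N = 8754.0    # normalized ON time in hours, as quoted in the example


def humidity_test_hours(rh_use=50.0, rh_test=95.0, h=2.3,
                        temp_on_c=65.0, temp_rh_c=85.0):
    """k * t_ON_N * (RH_use / RH_test)**h * Arrhenius term for the test temperature."""
    rh_term = (rh_use / rh_test) ** h
    thermal_term = math.exp(
        E_A / K_B * (1.0 / (temp_rh_c + 273.0) - 1.0 / (temp_on_c + 273.0)))
    return K_MARGIN * T_ON_N * rh_term * thermal_term


def vibration_test_hours(t_use=150.0, w_use=1.7, w_test=3.2, w=4):
    """k * t_use * (W_use / W_test)**w, per axis."""
    return K_MARGIN * t_use * (w_use / w_test) ** w


def cumulative_mtbf(times_to_failure, n=24):
    """Rows of (failure number, cumulative time, theta, log10 t, log10 theta)."""
    rows = []
    for number, t in enumerate(times_to_failure, start=1):
        cumulative = n * t
        theta = cumulative / number
        rows.append((number, cumulative, theta,
                     math.log10(cumulative), math.log10(theta)))
    return rows


print(f"humidity test:  {humidity_test_hours():.0f} h")
print(f"vibration test: {vibration_test_hours():.1f} h per axis")
for row in cumulative_mtbf([3821.33, 5781.33, 14016.0, 18563.44]):
    print("failure {}: cumulative {:.2f} h, theta {:.2f} h, "
          "log t {:.2f}, log theta {:.2f}".format(*row))
```

Both durations land close to the 300 h and 18 h per axis quoted for the example, and the computed rows match the first four rows of the table.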
Why Accelerated Reliability Growth?
• The test duration covers the product's entire life
  – It allows detection of all design problems, not only those that appear in a small fraction of product life
  – It enables an estimate of the failure rate from random events, disregarded in traditional RG testing
  – The failure rate achieved by design improvement, together with the random failure rate, provides a realistic estimate of total product reliability
• Test duration is determined from the required total reliability in view of the product's physical cumulative damage from life stresses in use
• Test acceleration allows a very reasonable test duration, shorter than traditional mathematically derived testing
  – Reliability improvement through test is no longer cost prohibitive
• Test failure times are projected to their appearance in real life, and the analysis uses these data
• Even though it covers the product's expected life (durability information), the test is still considerably shorter than the traditional reliability growth test.
Biography
[email protected]
Milena Krasich is a Senior Principal Systems Engineer in Raytheon Integrated Defense Systems, Whole Life Engineering, in the RAM Engineering Group, Sudbury, MA.
Prior to joining Raytheon, she was a Senior Technical Lead of Reliability Engineering in Design Quality Engineering of Bose Corporation, Automotive Systems Division. Before joining Bose, she was a Member of Technical Staff in the Reliability Engineering Group of General Dynamics Advanced Technology Systems, formerly Lucent Technologies, after a five-year tenure at the Jet Propulsion Laboratory in Pasadena, California. While in California, she was a part-time professor at the California State University Dominguez Hills, where she taught graduate courses in System Reliability, Advanced Reliability and Maintainability, and Statistical Process Control. At that time, she was also a part-time professor at the California State Polytechnic University, Pomona, teaching undergraduate courses in Engineering Statistics, Reliability, SPC, Environmental Testing, and Production Systems Design.
She holds a BS and MS in Electrical Engineering from the University of Belgrade, Yugoslavia, and is a California registered Professional Electrical Engineer. She is also a member of the IEEE and ASQC Reliability Society, and a Fellow and the President Emeritus of the Institute of Environmental Sciences and Technology.
Currently, she is the Technical Advisor (Chair) to the US Technical Advisory Group (TAG) to the International Electrotechnical Commission (IEC) Technical Committee TC56, Dependability. As a part of the TC56 working groups she is working on dependability/reliability standards as a project leader for revision of many released and current international standards, such as IEC/IEEE/ANSI Reliability Growth IEC 61014 and IEC 61164; Fault Tree Analysis, IEC/ANSI 61025; Testing for the constant failure rate and failure intensity (Reliability compliance/demonstration tests), IEC/ANSI 61124; and FMEA, IEC/ANSI 60812; and for preparation of the new IEC standard on Accelerated Testing, IEC 62506.