Probabilistic Design for Reliability (PDfR) in Electronics
Part I
Dr. E. Suhir
©2011 ASQ & Presentation Suhir
Presented live on Jan 03~06th, 2011
http://reliabilitycalendar.org/The_Reliability_Calendar/Short_Courses/Short_Courses.html
ASQ Reliability Division Short Course Series
One of the monthly webinars on topics of interest to reliability engineers.
To view recorded webinars (available to ASQ Reliability Division members only), visit asq.org/reliability
To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events
Dr. E. Suhir Page 1
PROBABILISTIC DESIGN for RELIABILITY (PDfR) CONCEPT,
the Roles of Failure Oriented Accelerated Testing (FOAT)
and Predictive Modeling (PM), and
a Novel Approach to Qualification Testing (QT)
“You can see a lot by observing”
Yogi Berra, American Baseball Player
“It is easy to see, it is hard to foresee”
Benjamin Franklin, American Scientist and Statesman
E. Suhir Bell Laboratories, Physical Sciences and Engineering Research Division, Murray Hill, NJ (ret),
University of California, Dept. of Electrical Engineering, Santa Cruz, CA,
University of Maryland, Dept. of Mechanical Engineering, College Park, MD, and
ERS Co. LLC, 727 Alvina Ct. Los Altos, CA, 94024, USA
Tel. 650-969-1530, cell. 408-410-0886, e-mail: [email protected]
Four hour ASQ-IEEE RS Webinar short course
January 3-6, 2011
Contents
Session I
1. Introduction: background, motivation, incentive
2. Reliability engineering as part of applied probability and Probabilistic Risk
Management (PRM) bodies of knowledge
3. Failure Oriented Accelerated Testing (FOAT): its role, attributes, challenges, pitfalls
and interaction with other accelerated test categories
Session II
4. Predictive Modeling (PM): FOAT cannot do without it
5. Example of a FOAT: physics, modeling, experimentation, prediction
Session III
6. Probabilistic Design for Reliability (PDfR), its role and significance
Session IV
7. General PDfR approach using probability density functions (pdf)
8. Twelve steps to be conducted to add value to the existing practice
9. Do electronic industries need new approaches to qualify their devices into products?
10. Concluding remarks
1. Introduction: background, motivation, incentive
“Vision without action is a daydream.
Action without vision is a nightmare”
Japanese saying
“The problem is not that old age comes.
The problem is that young age passes”
Common Wisdom
Session I
Background
• The short-term, down-to-earth and practical goal of a particular electronic or photonic device manufacturer is to conduct and pass the established qualification tests, without questioning whether they are perfect or not
• The ultimate long-term and broad goal of the electronic, opto-electronic and photonic industries, regardless of a particular manufacturer or even a particular product, is to make the industries’ deliverables sufficiently reliable in the field and consistently good in performance, and so to earn the trust of the customer
• Qualification testing (QT), such as that prescribed by the JEDEC, Telcordia, AEC or MIL specs, is the major means that the electronic, opto-electronic and photonic industries use to make their viable-and-promising devices into reliable-and-marketable products.
Motivation
• It is well known, however, that devices and systems that passed the existing qualification tests often fail in the field. Should it be this way? Is this indeed a problem? Are the existing qualification specifications adequate? Do electronic and photonic industries need new approaches to qualify their devices into products?
• If they do, could today’s qualification specifications and testing procedures be improved to such an extent that, if a device passed these tests, its performance in the field would be satisfactory?
• On the other hand, there is a perception, perhaps a rather substantiated one, that some electronic components “never fail”. Although one should never say “never”, such a perception exists because some products might be too robust and, as a consequence, more costly than necessary. Could the situation be changed, and could the cost be brought down considerably, if one were able to assess the actual, most likely superfluous, probability of non-failure in the field and come up, for a particular product, with the best compromise between reliability, cost and time-to-market?
• Would it be possible to “prescribe” (specify), predict and, if necessary, even control a low enough probability of failure for a product that operates under the given stress (not necessarily mechanical, of course) conditions for the given time?
Incentive
• We argue that improvements in the QT, as well as in the existing best practices, are indeed possible, provided that the Probabilistic Design for Reliability (PDfR) concept is thoroughly developed and the corresponding methodologies are employed
• One effective way to improve the existing QT and specs is to
  ◦ conduct, on a wide scale, the appropriate Failure Oriented Accelerated Testing (FOAT) at both the design stage (DFOAT) and the manufacturing stage (MFOAT), and, since DFOAT cannot do without predictive modeling (PM),
  ◦ carry out, whenever and wherever possible, PM to understand the physics of failure, and to predict, based on the DFOAT, the probability of failure in the field,
  ◦ revisit, review and revise the existing QT practices, procedures, and specifications, considering the DFOAT and, to a lesser extent, MFOAT data obtained for the most vulnerable elements of the device of interest,
  ◦ develop and widely implement the PDfR methodologies and algorithms, having in mind that “nobody and nothing is perfect” and that the probability of failure in the field is never zero, but could be predicted and, if necessary, minimized, controlled and maintained at an acceptably low level during product operation.
2. Reliability engineering as part of applied probability
and Probabilistic Risk Management (PRM)
bodies of knowledge
“A pinch of probability is worth a pound of perhaps”
James G. Thurber, American writer and cartoonist
“In the long run we are all dead”
John Maynard Keynes, British economist
Reliability engineering
• deals with failure modes and mechanisms, “root” causes of occurrence of various failures, role of various defects, methods to estimate and prevent failures, and probability-based designs for reliability;
• provides guidance on how to make a viable device into a reliable and marketable product;
• in products for which a certain level of failures is considered acceptable (such as, e.g., consumer products), examines ways of bringing down the failure rate to an allowable level;
• for products for which a failure is a catastrophe, examines and considers ways of making the probability of failure as low as necessary or possible.
Reliability engineering as part of applied probability and
probabilistic risk management (PRM) bodies of knowledge
• Reliability is part of applied probability and probabilistic risk management (PRM) bodies of knowledge, and includes the item's (system's) dependability, durability, maintainability, reparability, availability, testability, etc., i.e., probabilities of the corresponding events or characteristics
• Each of these characteristics is measured as a certain probability and could be of greater or lesser importance depending on the particular function and operation conditions of the item or the system, and on the consequences of failure
• Applied probability and Probabilistic Risk Management (PRM) approaches and techniques put the art of Reliability Engineering on a solid “reliable” ground.
E. Suhir
“If a man will begin with certainties, he will end with doubts; but if he will be content to begin with doubts, he shall end in certainties.”
Sir Francis Bacon, English Philosopher and Statesman
“We see that the theory of probability is at heart only common sense reduced to calculations; it makes us appreciate with exactitude what reasonable minds feel by a sort of instinct, often without being able to account for it… The most important questions of life are, for the most part, really only problems of probability.”
Pierre-Simon, Marquis de Laplace
“Mathematical formulas have their own life, they are smarter than we, even smarter than their authors, and provide more than what has been put into them”
Heinrich Hertz, German Physicist
Reliability should be taken care of on a permanent basis
Reliability evaluation and assurance cannot be delayed until the device is made (although this is often the case in actual industrial practice). Reliability should be
• “conceived” at the early stages of the design (reliability and electronic engineers should start working together from the very beginning of the device/system development),
• implemented during manufacturing (through a high-quality manufacturing process),
• qualified and evaluated by electrical, optical, environmental and mechanical testing at both the design and the manufacturing stages (the customer requirements and the general qualification requirements are to be considered),
• checked (screened) during production (by implementing an adequate burn-in process) and, if necessary and appropriate,
• maintained in the field during the product’s operation, especially at the early stages of the product’s use (by employing, e.g., technical diagnostics, prognostication and health monitoring methods and instrumentation).
Three classes of engineering products
from the reliability point of view
• Class I. The product has to be made as reliable as possible. Failure should not be permitted. Examples are some military or space objects
• Class II. The product has to be made as reliable as possible, but only for a certain level of demand (stress, loading). Failure is a catastrophe. Examples are civil engineering structures, bridges, ships, aircraft, cars
• Class III. The reliability does not have to be very high. Failures are permitted, but should be restricted. Examples are consumer products, commercial electronics, agricultural equipment.
See E.Suhir, Applied probability for engineers and scientists, McGraw-Hill, 1997
Class I (military or similar) products
• The product (object) has to be made as reliable as possible. Failure is viewed as a catastrophe. Examples are some warfare systems, military aircraft, battleships, spacecraft
• Cost is not a dominating factor
• The products usually have a single customer, such as the government or a big firm
• The reliability requirements are defined in the form of government standards
• The standards not only formulate the reliability requirements for the product, but also specify the methods that are to be used to prove (demonstrate) the reliability, and often even prescribe how the system must be manufactured, tested and screened
• It is typically the customer, not the manufacturer, who sets the reliability standards.
Class II (industrial or similar) products
• The product (system, structure) has to be made as reliable as possible, but only for a certain specified level of loading (demand). If the actual load (waves, winds, earthquakes, etc.) happens to be larger than the design demand, then the product might fail, although the probability of such a failure should be determined beforehand and should (and can) be made very small
• Examples are: long-haul communication systems, civil engineering structures (bridges, tunnels, towers), passenger elevators, ocean-going vessels, offshore structures, commercial aircraft, railroad carriages, cars, some medical equipment
• These are highly expensive products, which are produced in large quantities, and therefore application of Class I requirements would lead to unjustifiable, unfeasible and unacceptable expenses. Failure is a catastrophe and might be associated with loss of human lives and with significant economic losses
• The products are typically intended for industrial, rather than government, markets. These markets are characterized by a rather high volume of production (buildings, bridges, ships, aircraft, automobiles, telecommunication networks, etc.), but also by fewer and more sophisticated customers than in the commercial (Class III) market.
Class III (consumer, commercial) products
• The typical market is the consumer market. An individual consumer is a very small part of the total consumer base. The product is inexpensive and manufactured in mass quantities
• The demand for the product is usually driven by the cost of the product and time-to-market, rather than by its reliability. As long as the product is “sellable”, its reliability does not have to be very high: it should only be adequate for customer acceptance and reasonable satisfaction. Simple and innovative products, which have a high degree of customer appeal and are in significant demand, may be able to prosper, at least for some time, even if they are not very reliable
• Failure is not a catastrophe: a certain reasonable level of failures during normal operation of the product is acceptable, as long as the failure rate is within the anticipated/expected range
• Reliability testing is limited, and improvements are often implemented based on field feedback
• It is typically the manufacturer, not the consumer, who sets the reliability standards, if any, for the product. Often no special reliability standards are followed, and it is customer satisfaction (on a statistical basis) that is the major criterion of the viability and quality of the product.
Reliability, cost-effectiveness, and time-to-market
• Reliability, cost effectiveness and time-to-market considerations play an important role in the design, materials selection and manufacturing decisions, and are the key issues in competing in the global marketplace. A company cannot be successful if its products are not cost effective, or do not have a worthwhile lifetime and service reliability to match the expectations of the customer. Too low a reliability can lead to a total loss of business
• Product failures have an immediate, and often dramatic, effect on the profitability and even the very existence of a company. Profits decrease as the failure rate increases. This is due not only to the increase in the cost of replacing or repairing parts, but, more importantly, to the losses due to the interruption in service, not to mention the “moral losses”. These make obvious dents in the company’s reputation and, as a consequence, affect its sales
• The time to develop and to produce products is rapidly decreasing. This circumstance places significant pressure on both business people and reliability engineers, who are supposed to come up with a reliable product and to confirm its long-term reliability in a short period of time, in order to make their device a product and to make this product successful in the marketplace
• Each business, whether small or large, should try to optimize its overall approach to reliability. “Reliability costs money”, and therefore a business must understand the cost of reliability, both the “direct” cost (the cost of its own operations) and the “indirect” cost (the cost to its customers and their willingness to make future purchases and to pay more for more reliable products).
3. Failure Oriented Accelerated Testing (FOAT): its role,
attributes, challenges, pitfalls and interaction with
other accelerated test categories
“Nothing is impossible. It is often merely for an excuse that we
say that things are impossible”
Francois de La Rochefoucauld, French philosopher
“The truth is rarely pure and never simple”
Oscar Wilde, British writer, “The Importance of Being Earnest”
Why accelerated tests?
• It is impractical and uneconomical to wait for failures when the mean-time-to-failure of a typical present-day electronic device (equipment) is on the order of hundreds of thousands of hours
• Accelerated testing (AT) enables one to gain greater control over the reliability of a product
• AT has become a powerful means for improving reliability. This is true regardless of whether (irreversible or reversible) failures will or will not actually occur during the FOAT (“testing to fail”) or QT (“testing to pass”)
• In order to accelerate the material’s (device’s) degradation and/or failure, one has to deliberately “distort” (“skew”) one or more parameters (temperature, humidity, load, current, voltage, etc.) affecting the device’s functional and/or mechanical performance and/or its environmental durability.
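The first point above can be made concrete with a rough calculation. The sketch below assumes an exponential time-to-failure model (constant failure rate), which the slides do not specify; the MTTF, sample size and test duration are illustrative values, and the function name is hypothetical.

```python
import math

def expected_failures(mttf_hours, n_units, test_hours):
    """Expected number of failures in a test, assuming exponential lifetimes."""
    # Probability that a single unit fails before test_hours
    p_fail = 1.0 - math.exp(-test_hours / mttf_hours)
    return n_units * p_fail

# Illustrative numbers (assumptions): MTTF of 200,000 h, 100 units, 1,000 h test
print(expected_failures(200_000, 100, 1_000))
```

Under these assumed numbers, fewer than one failure is expected in the whole test at normal conditions, which is why some form of acceleration is needed to observe failures at all.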
Accelerated test categories: traditional definitions

Accelerated test types (categories) compared:
  PDTs: product development (verification) tests
  QTs: qualification (“screening”) tests
  ALTs/HALTs/FOATs: accelerated life tests, highly accelerated life tests, and failure oriented accelerated tests

Objective
  PDTs: Technical feedback to ensure that the taken design approach is viable (acceptable)
  QTs: Proof of reliability; demonstration that the product is qualified to serve in the given capacity
  ALTs/HALTs/FOATs: Understand modes and mechanisms of failure and, time permitting, accumulate failure statistics

End point
  PDTs: Time, type, level, and/or number of failures
  QTs: Predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures
  ALTs/HALTs/FOATs: Predetermined number or percent of failures

Follow-up activity
  PDTs: Failure analysis, design decision
  QTs: Pass/fail decision
  ALTs/HALTs/FOATs: Failure analysis and, time permitting, statistical analysis of the test data

Perfect (ideal) test
  PDTs: Specific definition(s)
  QTs: No failure in a long time
  ALTs/HALTs/FOATs: Numerous failures in a short time

Accelerated test categories: updated definitions

Accelerated test types (categories) compared:
  PDT: product development (verification) testing
  QT: qualification (“screening”) testing, at the design stage (DQT) and at the manufacturing stage (MQT)
  ALT = FOAT: accelerated life testing = failure oriented accelerated testing
    DFOAT: at the design stage
    MFOAT: at the manufacturing stage (= Hobbs’ Highly ALT (HHALT) = accelerated burn-in)

Objective
  PDT: Technical feedback to ensure that the taken design approach is viable (acceptable)
  QT: Proof of reliability; demonstration that the item is qualified into a product, i.e., is able to serve in the given capacity
  DFOAT: Understand physics (modes and mechanisms) of failure, failure limits, and, time permitting, accumulate failure statistics
  MFOAT: Assess failure limits; weed out infant mortalities

End point
  PDT: Time, type, level, and/or number of failures
  QT: Predetermined time and/or number of cycles, and/or an excessive (unexpected) number of failures
  FOAT: Predetermined number or percent of failures

Follow-up activity
  PDT: Failure analysis, design decision
  QT: Pass/fail decision
  DFOAT: Failure analysis and, time permitting, also statistical analysis of the test data
  MFOAT: Pass/fail decision

Perfect (ideal) test
  PDT: Specific definitions
  QT: No failures in a long time
  DFOAT: Numerous failures in a short time
  MFOAT: No failures in a long time

Some of the most common accelerated test conditions (stimuli)
• High Temperature (Steady-State) Soaking/Storage/Baking/Aging/Dwell,
• Low Temperature Storage,
• Temperature (Thermal) Cycling,
• Power Cycling,
• Power Input and Output,
• Thermal Shock,
• Thermal Gradients,
• Fatigue (Crack Propagation) Tests,
• Mechanical Shock,
• Drop Shock (Tests),
• Random Vibration Tests,
• Sinusoidal Vibration Tests (with the given or variable frequency),
• Creep/Stress-Relaxation Tests,
• Electrical Current Extremes,
• Voltage Extremes,
• High Humidity,
• Radiation (UV, cosmic, X-rays),
• Altitude,
• Space Vacuum
Elevated Stress
• AT uses an elevated stress level and/or a higher stress-cycle frequency as effective stimuli to precipitate failures over a much shorter time frame
• The “stress” in reliability engineering does not necessarily have to be mechanical or thermo-mechanical: it could be electrical current or voltage, high (or low) temperature, high humidity, high frequency, high pressure or vacuum, cycling rate, or any other factor (stimulus) responsible for the reliability of the device or the equipment
• AT must be specifically designed for the product under test
• The experimental design of AT should consider the anticipated failure modes and mechanisms, typical use conditions, and the required or available test resources, approaches and techniques.
Qualification Testing (QT) is a must
• The objective of qualification testing (QT) is to prove that the reliability of the product under test is above a specified level. This level is usually measured by the percentage of failures per lot and/or by the number of failures per unit time (failure rate)
• The typical requirement is no more than a few percent failing parts out of the total lot (population)
• QT enables one to “reduce to a common denominator” different products, as well as similar products produced by different manufacturers
• QT reflects the state of the art in a particular field of engineering, and typical requirements for the performance of the product. Industry cannot do without QT
• Testing is time limited and is generally non-destructive (not failure oriented).
Today’s Qualification Testing (QT): shortcomings
• Today’s qualification standards and requirements are only good for what they are intended: to confirm that the given device is qualified into a product to serve in a particular capacity
• If a product passed the standardized qualification tests, it is not always clear why it was good, and if the product failed the tests, it is equally unclear what could be done to improve its reliability
• If a product passed the qualification tests, it does not mean that there will be no failures in the field, nor is it clear how likely or unlikely these failures might be
• Since QT is not failure oriented, it is unable to provide the most important ultimate information about the reliability of the product: the probability of its failure after the given time in service and under the given service (operation, stress) conditions.
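A small calculation illustrates why a passed test with no failures says little about how likely field failures are. The sketch below is an assumption-laden illustration, not part of the slides: it uses a simple binomial model with independent, identical units, and it treats the test as representative of the conditions of interest. The function name and sample sizes are hypothetical.

```python
def max_failure_fraction(n_units, confidence):
    """Upper confidence bound on the failure fraction after n_units pass with zero failures."""
    # With true failure fraction p, the chance that all n units pass is (1 - p)^n.
    # Solve (1 - p)^n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_units)

# Illustrative sample sizes (assumptions), evaluated at 90% confidence
for n in (22, 77, 230):
    print(n, round(max_failure_fraction(n, 0.90), 4))
```

Under these assumptions, 77 units passing without a single failure only bounds the failure fraction at roughly 3% with 90% confidence; the test outcome alone does not yield a probability of failure for a given time in service.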
Failure Oriented Accelerated Testing (FOAT)-1
• FOAT is aimed at revealing and understanding the physics of the expected or occurred failures. Unlike QTs, FOAT is able to detect the possible failure modes and mechanisms
• Another objective of the FOAT is to accumulate failure statistics. Thus, FOAT deals with the two major aspects of Reliability Engineering: the physics and the statistics of failure
• Adequately planned, carefully conducted, and properly interpreted FOAT provides a consistent basis for the prediction of the probability of failure after the given time in service. Well-designed and thoroughly implemented FOAT can dramatically facilitate the solutions to many engineering and business-related problems associated with cost effectiveness and time-to-market
• This information can be helpful in understanding what should be changed to design a viable and reliable product. Indeed, any structural, materials and/or technological improvement can be “translated”, using the FOAT data, into the probability of failure for the given duration of operation under the given service (environmental) conditions.
Failure Oriented Accelerated Testing (FOAT)-2
• FOAT should be conducted in addition to, and preferably long before, the qualification tests. There might also be situations when FOAT can be used as an effective substitute for the QT, especially for new products, when acceptable qualification standards do not yet exist
• While it is the QT that makes a device into a product, it is the FOAT that enables one to understand the reliability physics behind the product and, based on the appropriate PM, to create a reliable product with the predicted probability of failure
• There is always a temptation to broaden (enhance) the stress as far as possible to achieve the maximum “destructive effect” (FOAT effect) in the shortest period of time. Unfortunately, accelerated test conditions may sometimes hasten failure mechanisms that are different from those that could actually be observed in service conditions (a “shift” in the modes and/or mechanisms of failure).
FOAT pitfalls
• Because of the existence of such “shifts”, it is always necessary to correctly identify the expected failure modes and mechanisms, and to establish the appropriate stress limits, in order to prevent “shifts” in the original (actual) dominant failure mechanism
• Examples are: change in materials properties at high or low temperatures, time-dependent strain due to diffusion, creep at elevated temperatures, occurrence and movement of dislocations caused by an elevated stress, or a situation when a bimodal distribution of failures (a dual mechanism of failure) occurs
• Since infant mortality (“early”) failures, in particular, might occur concurrently with the anticipated (“operational”) failures, it is imperative to make sure that the “early” and “operational” failures are well separated in the tests
• Different failure mechanisms are characterized by different physical phenomena and different activation energies, and therefore a simple superposition of the effects of two mechanisms is unacceptable: it can result in erroneous reliability projections.
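The last point can be illustrated with the Arrhenius relation, a common temperature-acceleration model that the slides do not name explicitly. The sketch below is therefore an assumption: the activation energies and the use and test temperatures are illustrative values for two hypothetical mechanisms, and the function name is made up for this example.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_test_c):
    """Acceleration factor between use and test temperatures (Arrhenius model)."""
    t_use = t_use_c + 273.15    # convert Celsius to kelvin
    t_test = t_test_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))

# Two hypothetical mechanisms with different activation energies (assumed values)
for ea in (0.3, 0.7):
    print(ea, arrhenius_af(ea, t_use_c=55.0, t_test_c=125.0))
```

With these assumed values the two mechanisms are accelerated by factors that differ by about an order of magnitude, so extrapolating a mixed set of failures to use conditions with a single factor would misstate the field life of at least one mechanism.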
Burn-in testing (BIT) is a special type of FOAT-1
• Burn-in (“screening”) testing (BIT) is widely implemented to detect and eliminate infant mortality failures. BIT could be viewed as a special type of manufacturing FOAT (MFOAT). BIT is needed to stabilize the performance of the device in use
• BIT is supposed to stimulate failures in defective devices by accelerating the stresses that will cause these devices to fail, without damaging good items. The bathtub curve of a device that has undergone BIT is supposed to consist of the steady-state and wear-out portions only
• The rationale behind the BIT is based on the concept that mass production of electronic devices generates two categories of products that passed QT:
  ◦ 1) robust (“strong”) components that are not expected to fail in the field and
  ◦ 2) relatively unreliable (“weak”) components (“freaks”) that, if shipped to the customer, will most likely fail in the field
• BIT can be based on high temperatures, thermal cycling, voltage, current density, high humidity, etc., and is performed either by the manufacturer or by an independent test house.
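The two-population rationale can be sketched numerically. The example below is only an illustration under stated assumptions: both sub-populations are given exponential lifetimes (the slides specify no distribution), the MTTF values and the weak fraction are hypothetical, and all times are taken under the burn-in stress conditions.

```python
import math

def burn_in_effect(weak_fraction, mttf_weak, mttf_strong, burn_in_hours):
    """Return (share of weak units among survivors, fraction of strong units lost)."""
    surv_weak = math.exp(-burn_in_hours / mttf_weak)      # weak units surviving burn-in
    surv_strong = math.exp(-burn_in_hours / mttf_strong)  # strong units surviving burn-in
    weak_left = weak_fraction * surv_weak
    strong_left = (1.0 - weak_fraction) * surv_strong
    return weak_left / (weak_left + strong_left), 1.0 - surv_strong

# Assumed values: 2% weak units, MTTF 50 h (weak) and 200,000 h (strong), 168 h burn-in
print(burn_in_effect(0.02, 50.0, 200_000.0, 168.0))
```

In this sketch the share of weak units among the shipped population drops well below the initial 2%, while very few strong units fail. Because exponential lifetimes are memoryless, the model cannot represent the consumption of useful life by the burn-in itself; a wear-out model would be needed for that.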
Burn-ins – special type of FOAT-2
• For products that will be shipped out to the customer, BIT is nondestructive
• BIT is a costly process, and therefore its application must be thoroughly monitored. BIT is mandatory on most high-reliability procurement contracts, such as defense, space, and telecommunication systems. In today’s practice BIT is often used for consumer products as well. For military applications the BIT can last as long as a week (168 hours). For commercial applications burn-ins typically do not last longer than two days (48 hours)
• Optimum BIT conditions can be established by assessment of the main expected failure modes and their activation energies, and from the analysis of the failure statistics during BIT
• Special investigations are usually required if one wishes to ensure that cost-effective BIT of smaller quantities is acceptable. A cost-effective simplification can be achieved if BIT is applied to the complete equipment (assembly or subassembly), rather than to an individual component, unless it is a large system fabricated of several separately testable assemblies.
Burn-ins – special type of FOAT-3
• Although there is always a possibility that some defects might escape the BIT, it is more likely that BIT will introduce some damage to the “healthy” structure and/or might “consume” a certain portion of the useful service life of the product: BIT not only “fights” the infant mortality, but also accelerates the very degradation process that takes place in the actual operation conditions, unless the defectives have a much shorter lifetime than the healthy products and have a narrower (more “deterministic”, more “delta-like”) probability-of-failure distribution density
• Some BITs (e.g., high electric fields for dielectric breakdown screening, mechanical stresses below the fatigue limit) are harmless to the materials and structures under test, and do not lead to an appreciable “consumption” of the useful lifetime (field life loss). Others, although they do not trigger any new failure mechanisms, might consume some small portions of the device lifetime.
Burn-ins – special type of FOAT-4
• When planning and conducting the BIT and evaluating its results, one should make sure that the stress applied by the BIT is high enough to weed out infant mortalities, but low enough neither to consume a significant portion of the product’s lifetime nor to introduce permanent damage
• A natural concern associated with the BIT is that there is always a risk that BIT might trigger some failure mechanisms that would not be possible in the actual use conditions and/or might affect components that should not be viewed as defective
• In lasers, the “steady-state” portion is, in effect, not a horizontal, but a slowly rising curve. In addition, wear-out failures, which are characterized by a time-dependent failure rate, occupy a significant portion of the failure-rate (bathtub) diagram. For laser devices, standard production BIT should be combined with long-term life testing.
Wear-out failures
• For a well-designed and adequately manufactured product, the wear-out failures should occur at the late stages of operation and testing
• If one observes that this is not the case (the steady-state portion of the “bathtub” curve is not long enough or does not exist at all), one should revisit the design and choose different materials and/or different design solutions, and/or a different (more consistent) manufacturing process, etc.
• In some electronics materials (such as BGA and PGA systems) and in some photonics products (e.g., lasers) the wear-out part of the bathtub curve can occupy a significant portion of the product’s lifetime, and should be carefully analyzed.
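The portions of the bathtub curve are often described with a Weibull hazard rate, whose shape parameter distinguishes a decreasing failure rate (shape below 1), a constant one (shape equal to 1) and an increasing one (shape above 1). The slides do not prescribe this model; it is used here as a common assumption, and the shape, scale and time values are illustrative.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure) rate at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Assumed shape parameters for the three bathtub regions; scale of 1,000 h is illustrative
for shape, label in ((0.5, "infant mortality"), (1.0, "steady state"), (3.0, "wear-out")):
    rates = [weibull_hazard(t, shape, scale=1000.0) for t in (100.0, 500.0, 2000.0)]
    print(label, rates)
```

A fitted shape parameter clearly above 1 early in life would be one sign, under this model, that the wear-out portion starts too soon and the design or process should be revisited.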
What one should/could possibly do
to prevent failures-1
• Develop an in-depth understanding of the physics of possible failures. Neither failure statistics nor the most effective ways to accommodate failures (such as redundancy, trouble-shooting, diagnostics, prognostication, health monitoring, maintenance) can replace a good understanding of the physics of failure and a good (robust) physical design
• Assess the likelihood (the probability) that the anticipated failure modes and mechanisms might occur in service conditions, and minimize the likelihood of a failure by selecting the best materials and the best physical design for the product
• Understand and distinguish between different aspects of reliability: operational (functional) performance, structural/mechanical reliability (affected by mechanical loading) and environmental durability (affected by harsh environmental conditions).
What one should/could possibly do
to prevent failures-2
• Distinguish between materials reliability and structural reliability, and assess the effect of
the mechanical and environmental behavior of the materials and structures in your
design on the functional performance of the product
• Understand the difference between the requirements of the qualification
specifications and standards and the actual operating conditions. In other words,
understand the QT conditions well and design the product so that it not only
withstands the operating conditions in the short and long term, but also
passes the QT
• Understand the role and importance of FOAT, and conduct PM whenever and
wherever possible.
Session II
4. Predictive Modeling (PM): FOAT cannot do without it
“The probability of anything happening is in inverse ratio to its desirability”
John W. Hazard, American attorney-at-law
“Any equation longer than three inches is most likely wrong”
Unknown Experimental Physicist
FOAT cannot do without predictive modeling (PM)
• FOAT cannot do without simple and meaningful predictive models. It is on the
basis of such models that one decides which parameter should be accelerated, how
to process the experimental data and, most importantly, how to bridge the gap
between what one “sees” as a result of the accelerated testing and what one will
possibly “get” in the actual operating conditions
• By considering the fundamental physics that might constrain the final design, PM
can result in significant savings of time and expense and shed additional light on the
physics of failure
• PM can be very helpful in predicting reliability at conditions other than those of the FOAT and
can provide important information about the device performance
• Modeling can be helpful in optimizing the performance and lifetime of the device, as
well as in arriving at the best compromise between reliability, cost effectiveness
and time-to-market.
Requirements for a good predictive model
• A good FOAT PM does not need to reflect all possible situations, but it should be
simple, should clearly indicate what affects what in the given phenomenon or structure,
and should be suitable/flexible for new applications, new environmental conditions and
new technology developments, as well as for accumulating reliability statistics on its
basis.
• The scope of the model depends on the type and amount of information available.
• A FOAT PM does not have to be comprehensive, but it has to be sufficiently generic and
should include all the major variables affecting the phenomenon (failure mode) of interest.
It should contain all the most important parameters needed to describe and
characterize that phenomenon, while parameters of second-order
importance should not be included in the model.
• FOAT PMs take inputs from various theoretical analyses, test data, field data, customer
requirements, qualification spec requirements, the state of the art in the given field,
consequences of failure for the given failure mode, etc.
What the existing FOAT PMs predict
• Before deciding on a particular FOAT PM, one should anticipate the predominant
failure mechanism in advance and then apply the appropriate model
• The most widespread PMs identify the mean time-to-failure (MTTF) in steady-state
conditions
• If one assumes a certain probability density function for the particular failure
mechanism, then, for a two-parameter distribution (e.g., the normal one), one
can construct this function from the determined MTTF and the
measured standard deviation (STD)
• For a single-parameter probability density function, such as the exponential one,
knowledge of the MTTF is sufficient to determine the failure rate and the
probability of failure for the given time in operation.
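The two cases above can be sketched in a few lines. This is only an illustration: the MTTF, STD and operation time below are hypothetical values, not data from the course, and the function names are made up for the example.

```python
import math
from statistics import NormalDist


def failure_probability_normal(t, mttf, std):
    """Probability of failure by time t for a normally distributed
    time-to-failure (two parameters: MTTF and STD)."""
    return NormalDist(mu=mttf, sigma=std).cdf(t)


def failure_probability_exponential(t, mttf):
    """Probability of failure by time t for an exponentially distributed
    time-to-failure (one parameter: failure rate = 1 / MTTF)."""
    failure_rate = 1.0 / mttf
    return 1.0 - math.exp(-failure_rate * t)


# Hypothetical illustration values, all in hours.
mttf_hours = 50_000.0
std_hours = 10_000.0
t_hours = 30_000.0

p_normal = failure_probability_normal(t_hours, mttf_hours, std_hours)
p_exponential = failure_probability_exponential(t_hours, mttf_hours)
print(f"normal: {p_normal:.4f}  exponential: {p_exponential:.4f}")
```

With the same MTTF, the two assumed distributions give very different probabilities of failure at a given time, which is why the choice of the distribution matters as much as the MTTF itself.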
Most widespread predictive models (PMs)
• Power law (used when the PoF is unclear),
• Boltzmann-Arrhenius equation (used when there is a belief that the elevated temperature is
the major cause of failure),
• Coffin-Manson equation (inverse power law; used particularly when there is a need to
evaluate the low-cycle fatigue lifetime),
• Crack growth equations (used to assess the fracture toughness of brittle materials),
• Bueche-Zhurkov and Eyring equations (used to assess the MTTF when both the high
temperature and stress are viewed as the major causes of failure),
• Peck equation (used to consider the role of the combined action of the elevated temperature
and relative humidity),
• Black equation (used to consider the roles of the elevated temperature and current density),
• Miner-Palmgren rule (used to consider the role of fatigue when the yield stress is not
exceeded),
• Creep rate equations,
• Weakest link model (used to evaluate the MTTF in extremely brittle materials with defects),
• Stress-strength interference model, which is, perhaps, the most flexible and well
substantiated model.
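As a small illustration of one item in the list, the Miner-Palmgren rule in its usual linear form sums, over all load levels, the ratio of the cycles applied to the cycles-to-failure at that level; failure is expected when the sum reaches one. The sketch below assumes that linear form, and the load blocks are hypothetical numbers chosen only for the example.

```python
def miner_damage(load_blocks):
    """Linear cumulative damage: sum of (applied cycles / cycles-to-failure)
    over all load levels. A value of 1.0 or more indicates expected failure."""
    return sum(applied / to_failure for applied, to_failure in load_blocks)


# Hypothetical load history: (cycles applied, cycles-to-failure at that level).
load_blocks = [(2_000, 10_000), (500, 2_000), (100, 500)]

damage = miner_damage(load_blocks)
remaining_life_fraction = max(0.0, 1.0 - damage)
print(f"accumulated damage: {damage:.2f}, remaining: {remaining_life_fraction:.2f}")
```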
Example: Boltzmann-Arrhenius equation
• The Boltzmann-Arrhenius equation underlies many FOAT-related concepts. The MTTF, $\tau$ (tau), is proportional to an exponential function whose argument is a fraction, with the activation energy, $U_a$ (in eV), in the numerator and the product of Boltzmann’s constant, $k = 8.6174 \times 10^{-5}$ eV/K, and the absolute temperature, $T$, in the denominator:

$$\tau = \tau_0 \exp\left[\frac{U_a}{k\,(T - T^*)}\right]$$

The equation was first obtained by L. Boltzmann in the statistical theory of gases and then applied by S. Arrhenius to describe the inversion of sucrose. Arrhenius drew attention to the fact that physical processes and chemical reactions in solid bodies are also accelerated by elevated absolute temperature
• The Boltzmann-Arrhenius equation is applicable when the failure mechanisms are attributed to a combination of physical and chemical processes. Since the rates of many physical processes (such as solid-state diffusion and many semiconductor degradation mechanisms) and chemical reactions (such as those that determine battery life) are temperature dependent, it is the temperature that is the acceleration parameter.
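A common use of the Boltzmann-Arrhenius equation in accelerated testing is to relate the MTTF at an elevated test temperature to the MTTF at the use temperature. The sketch below assumes the classical form with $kT$ in the denominator; the optional `temp_shift_k` argument stands for the $T^*$ term in the slide's formula and is assumed to be zero by default. The activation energy and the two temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # Boltzmann's constant in eV/K, as quoted in the text


def arrhenius_mttf(tau0, activation_energy_ev, temp_k, temp_shift_k=0.0):
    """MTTF = tau0 * exp(Ua / (k * (T - T*))); T* = 0 is the classical form."""
    return tau0 * math.exp(
        activation_energy_ev / (BOLTZMANN_EV_PER_K * (temp_k - temp_shift_k))
    )


def acceleration_factor(activation_energy_ev, temp_use_k, temp_test_k):
    """Ratio MTTF(use) / MTTF(test) for the classical form; tau0 cancels out."""
    return math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_use_k - 1.0 / temp_test_k)
    )


# Hypothetical values: Ua = 0.7 eV, use at 55 C, test at 125 C.
ua_ev = 0.7
temp_use_k = 55.0 + 273.15
temp_test_k = 125.0 + 273.15

af = acceleration_factor(ua_ev, temp_use_k, temp_test_k)
print(f"acceleration factor: {af:.1f}")
```

Because the pre-exponential factor cancels in the ratio, only the activation energy and the two temperatures are needed to translate a test result into a use-condition estimate.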
Boltzmann-Arrhenius Equation and the PDfR concept
• The Boltzmann-Arrhenius equation addresses degradation processes and attributes
degradation, and possible failures caused by degradation, to elevated temperatures and
possibly to elevated humidity as well, i.e., to environmental factors.
• The failure rate for a system whose MTTF is given by the Boltzmann-Arrhenius
equation can be found as

$$\lambda = \frac{1}{\tau_0} \exp\left[-\frac{U_a}{k\,(T - T^*)}\right]$$

• The probability of failure by the time $t$ can be found as

$$P = 1 - e^{-\lambda t}$$

This formula is known as the exponential formula of reliability. If the probability of failure
$P$ is established for the given time $t$ in operation, then the exponential formula of
reliability can be used to determine the acceptable failure rate.
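The last statement amounts to inverting the exponential formula of reliability: $\lambda = -\ln(1 - P)/t$. A minimal sketch follows, assuming the classical Arrhenius form (no temperature shift) for the failure rate; the allowed probability, the time in operation, $\tau_0$ and $U_a$ are hypothetical values.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # eV/K


def arrhenius_failure_rate(tau0, activation_energy_ev, temp_k):
    """Failure rate = 1 / MTTF = (1 / tau0) * exp(-Ua / (k * T))."""
    return math.exp(-activation_energy_ev / (BOLTZMANN_EV_PER_K * temp_k)) / tau0


def probability_of_failure(failure_rate, t):
    """Exponential formula of reliability: P = 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-failure_rate * t)


def acceptable_failure_rate(p_allowed, t):
    """Largest failure rate for which P(t) does not exceed p_allowed."""
    return -math.log(1.0 - p_allowed) / t


# Hypothetical requirement: P <= 0.001 after 40,000 hours in operation.
p_allowed = 1.0e-3
t_hours = 40_000.0
rate_limit = acceptable_failure_rate(p_allowed, t_hours)
print(f"acceptable failure rate: {rate_limit:.3e} per hour")
```

For small allowed probabilities the result is close to $P/t$, which gives a quick sanity check on the computed value.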
Coffin-Manson Equation (Inverse Power Law)-1
• Many electronic materials, and especially solder joints, fail primarily because of
elevated mechanical stresses and deformations (strains). The numerous existing
empirical and semi-empirical methods and approaches that address the low-cycle-fatigue
lifetime of solders are, in one way or another, based on the pioneering work of Coffin and
Manson
• It has been established that materials that experience elevated stresses and strains
within the elastic range fail because of the elevated stresses, whether steady-state or variable,
while materials that experience high stresses exceeding the yield stress fail primarily
because of inelastic deformations. Such behavior, known as low-cycle-fatigue
conditions, is typical for solder materials, including even lead-free solders, whose yield
point might be substantially higher than that of tin-lead solders
• The original Coffin-Manson equation is just an inverse power law that is applicable to
highly compliant materials exhibiting significant plastic deformations prior to failure. The
inverse power law is also used in some other, physically quite different, applications, such
as the MTTF in random vibration tests (Steinberg’s formula), aging in high-power lasers, etc.
Coffin-Manson Equation (Inverse Power Law)-2
• The studies carried out in the 1990s addressed primarily flip-chip tin-lead solder joint
interconnections. Today’s studies address primarily the thermal fatigue life of ball-grid-array
(BGA) and pad-grid-array (PGA) systems, and especially lead-free solder joints
• The thermally induced stresses and strains in flip-chip solder joints are caused by
the CTE mismatch of the chip and the package substrate materials, as well as by the
temperature gradients due to the difference in temperature between the “hot” chip
and the “cold” substrate. In BGA and PGA systems the stresses and strains are caused
by the mismatch between the package structure and the PCB (“system’s substrate”)
• The numerous suggested phenomenological semi-empirical models aim at
predicting and improving the fatigue life of the solder material, which is governed by the accumulated cyclic
inelastic strain in the solder. This strain is due to the temperature fluctuations
resulting from changes in the ambient temperature (temperature cycling) and/or from
heat dissipation in the package (power cycling).
Coffin-Manson Equation (Inverse Power Law)-3
• The modified Coffin-Manson model

$$N_f = A\, f^{-\alpha}\, \Delta T^{-\beta} \exp\left(\frac{U}{k\,T_{\max}}\right)$$

can be used to model crack growth in solder and other metals due to temperature
cycling. In the above formula, $N_f$ is the number of cycles to failure, $f$ is the cycling
frequency, $\Delta T$ is the temperature range during a cycle, $T_{\max}$ is the maximum
temperature reached in each cycle, and $k$ is Boltzmann’s constant. Typical values for
the cycling frequency exponent $\alpha$ and the temperature range exponent $\beta$ are around
$-1/3$ and 2, respectively. A reduction in the cycling frequency reduces the number of
cycles to failure. The activation energy $U$ is around 1.25 eV.
• In recent years a visco-plastic, rate-dependent constitutive model, known as the Anand
model, has often been used in combination with FEA simulation to predict solder
joint reliability. In Anand’s model (which includes one flow equation and three evolution
equations) plasticity and creep phenomena are unified and described by the same set
of flow and evolution relations.
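The modified Coffin-Manson model can be used to compare the number of cycles to failure under chamber (test) cycling with that under field cycling. The sketch below uses the typical exponents and activation energy quoted above as defaults; the constant $A$ cancels in the ratio, and the field and chamber conditions are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.6174e-5  # eV/K


def cycles_to_failure(freq, delta_t, t_max_k, a=1.0,
                      alpha=-1.0 / 3.0, beta=2.0, u_ev=1.25):
    """N_f = A * f**(-alpha) * dT**(-beta) * exp(U / (k * T_max))."""
    return (a * freq ** (-alpha) * delta_t ** (-beta)
            * math.exp(u_ev / (BOLTZMANN_EV_PER_K * t_max_k)))


# Hypothetical conditions: frequency in cycles/day, temperatures in kelvin.
field = dict(freq=2.0, delta_t=40.0, t_max_k=60.0 + 273.15)
chamber = dict(freq=24.0, delta_t=100.0, t_max_k=125.0 + 273.15)

# Ratio of field cycles-to-failure to chamber cycles-to-failure (A cancels).
acceleration = cycles_to_failure(**field) / cycles_to_failure(**chamber)
print(f"one chamber cycle is worth about {acceleration:.0f} field cycles")
```

With $\alpha = -1/3$ the frequency term is $f^{1/3}$, so a lower cycling frequency gives fewer cycles to failure, in agreement with the statement above.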
Stress-strength (“interference”) model
Fig. 20. Stress-strength (“interference”) models-1: stress (demand) and strength (capacity) distributions
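For the special case in which both the stress (demand) and the strength (capacity) are assumed to be normally distributed and statistically independent, the probability that the stress exceeds the strength has a closed form. The sketch below relies on that assumption, which is not stated on the slide, and uses hypothetical means and standard deviations.

```python
import math
from statistics import NormalDist


def interference_failure_probability(mean_strength, std_strength,
                                     mean_stress, std_stress):
    """P(strength < stress) for independent, normally distributed
    strength (capacity) and stress (demand)."""
    margin_mean = mean_strength - mean_stress
    margin_std = math.hypot(std_strength, std_stress)
    # The safety margin (strength - stress) is normal; failure means margin < 0.
    return NormalDist().cdf(-margin_mean / margin_std)


# Hypothetical values in MPa: strength 100 +/- 8, stress 70 +/- 6.
p_fail = interference_failure_probability(100.0, 8.0, 70.0, 6.0)
print(f"probability of failure: {p_fail:.5f}")
```

The ratio of the mean safety margin to its standard deviation (here 30/10 = 3) plays the role of a safety factor expressed in probabilistic terms.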
5. Example of a FOAT: Physics, Modeling, Experimentation, Prediction
“A theory without an experiment is dead. An experiment without a theory is blind”
Unknown Reliability Engineer
Finite Element Analysis (FEA) Data
Predicted Stresses and Strains
in a Short Cylinder
Experimental bathtub curve for the solder joint
interconnections in a flip-chip multichip module
Probability of failure of the solder joint interconnections
vs. failure rate
© 2009
Thank you for taking my course