PDF of all Presentations
25th Anniversary Symposium
Promise of a Discipline:
Reliability & Risk in Theory and Practice
Dr. Ali Mosleh, Nicole J. Kim Eminent Professor of Reliability Engineering
WELCOME & OPENING REMARKS
25th ANNIVERSARY RELIABILITY ENGINEERING SYMPOSIUM
Dr. George Dieter, Professor Emeritus, Glenn L. Martin Institute Professor of Engineering
HISTORY OF THE RELIABILITY ENGINEERING PROGRAM
Dr. B. Balachandran, Minta Martin Professor & Chair, Department of Mechanical Engineering
RELIABILITY ENGINEERING IN THE DEPARTMENT OF MECHANICAL ENGINEERING
Dr. Elias L. Anagnostou, Northrop Grumman
Dr. Aris Christou, University of Maryland
Dr. Antoine B. Rauzy, Centrale-Supélec
Dr. Carol Smidts, The Ohio State University
Moderator: Dr. Ali Mosleh
Frontiers of Reliability Engineering
PANEL 1
Frontiers of Reliability Engineering
Panel 1 25th Anniversary Symposium
Promise of a Discipline: Reliability and Risk in Theory and Practice
University of Maryland College Park April 2, 2014
Frontiers…
• Integrated Probabilistic Simulation (for design and operational phases)
• Probabilistic Physics of Failure
• X-Ware Systems Reliability
– Hardware/Software/Human
– Interface failures
– Soft Causal Models
• Hybrid Methods
• Advanced Inference Methods (doing more with less)
• New Modeling Languages
• Model-Based System Health Management
• Model-Based System Engineering
• HAL-9000
• Resilience Engineering
Reliability and Risk in
Theory and Practice
University of Maryland
April 2, 2014
Elias Anagnostou Engineering Fellow, Research and Technology
Panel 1: Frontiers of Reliability Engineering
Approved for Public Release, Distribution Unlimited : Northrop Grumman Aerospace Systems Case 14-0601 Dated March 2014
Need For Risk-Based Fleet Management
• Issues
– Uncertainties in legacy approach force conservatisms
– Austere budgets now drive the need to extract all remaining capabilities while minimizing risks
– Pervasive objectives to increase readiness and lower life cycle costs necessitate a change in the current paradigm
– New vehicle requirements for reduced weight and longer life dictate a need for high-fidelity methods to manage risk
• Approach
– Advanced modeling and simulation tools that link materials-design-manufacturing-sustainment (Digital Thread)
– Virtual representation of a system as an integrated system of data, models, and analysis tools applied over the entire life cycle on a tail-number unique basis (Digital Twin)
– Concurrent Uncertainty Management across the material system life cycle
• Enables improved reliability, affordability and maintainability with an overall goal to reduce total ownership costs
– Sustainment - Approximately 40% more life can be extracted without structural modifications (DARPA/Navy Structural Integrity Prognosis System (SIPS) demonstrated results)
[Figure: Probabilistic Predictions Updated By Imperfect Sensor Evidence. Axis: Defect Size. Curves: Anticipated Usage; Actual Usage; Update With Sensor Data]
DARPA/Navy Structural Integrity Prognosis System (SIPS)
• Prognosis system to manage uncertainty and provide actionable information for risk-informed fleet management
– Increase asset availability and reduce cost w/o increasing risk
Approach
Develop the underlying critical technologies that enable prognosis and the demonstration of these in an integrated PROGNOSIS system:
– Physics-based modeling that captures interactions between structural damage drivers and material failure mechanisms
– Sensors that measure critical vehicle and materials parameters
– Reasoning and predictive modules that accept, compare, interpret and correlate the data from the sensors and models to provide structural reliability predictions
[Diagram: Software System combining Sensor Systems, Physics-based Models, and Reasoning & Prediction]
OUTPUT: Current and future state probabilities
SIPS Program Organization
Prognosis Program
• Program Manager - Madsen
• Principal Scientist - Papazian
• Materials & Modeling - Anagnostou
• Sensor Systems - Silberstein
• Reasoning & Predictions - Engel
• System Architecture - Teng
• Demonstrations - Anagnostou, Engel
An integrated team of ≈ 75 engineers, scientists, professors and graduate students
Disciplines: Structures, Material Science, Manufacturing, Computer Science, Experimentalists, Info Management, Mathematics, Sensor Science, Chemistry
SIPS Uncertainty Management
Combines all the available information while accounting for their respective uncertainties
[Diagram: Current State and Physics-Based Models, with the following uncertainty sources]
• Model uncertainty
• Usage uncertainty
• False and missed indications
• Assessment uncertainty
• Repair effectiveness
• As-manufactured state
• Material properties
• Environment
• Maintenance induced damage
• Missing & corrupted data
SIPS Research Progression to Flight Demonstration
• Fixed-Wing Structures Application
Multi-scale Environment
[Figure: Row of fastener holes; Hole #14; microstructurally small cracks; scale bars 5.7 mm, 1 mm, 100 µm]
Microstructural Origins of Fatigue (7075)
[Figure: Microstructurally small cracks; scale bar 1 mm]
Failure Progression From Initial State to Failure
Multiple cracked particles cause multiple micro-structurally small cracks. Some arrest, some grow then link together and form the dominant crack that leads to failure.
* Typical images from multiple samples
Physics-Based Models for Fatigue Life Prognosis
$$N_{\text{total}} = N_{\text{incubation}} + N_{\text{nucleation}} + N_{\text{small crack}} + N_{\text{large crack}}$$
• Geometric Approach: models for incubation & nucleation stages by coupling experimental observations with micro-mechanical crystal plasticity simulations
• Multi-Stage Fatigue / VPS-MICRO: models for nucleation & small crack growth
• FASTRAN (& Crack Coalescence): models for small crack growth & link-up
• FASTRAN/UniGrow
[Equations: incubation relation in C_inc and N_inc; MSC/PSC crack growth rate da/dN in terms of the crack tip displacement (CTD) and its threshold (TH); initial crack size a_i = 0.5625 D_p; CTD expression with an HCF-loading-dominated term and an LCF-loading-dominated term]
[Figure: 7075-T651 micrographs, panels (a), (b), (c), at 0, 1, 100, and 3000 cycles; Loading Direction marked; scale bars labeled 250 mm and 10 mm]
Multiscale Fatigue Modeling Environment
[Equations: crack growth rate dc/dN as a function of the effective stress intensity factor (K_eff) and K_max]
Multi-Scale Modeling: 3D Digital Materials
• Statistical Characterization of Material
• Digital Replication of microstructure
Two program materials: 7075-T651 & 7050-T7451, and seven 7075-T651 legacy wing panel materials
Investigation of Damage Mechanics
• Experimental methods to characterize damage evolution
• Calibrate fatigue models at various length scales/damage mechanisms
Three specimens tested, ~1000 particles monitored per test
Framework Validation
• Experimental characterization of damage evolution
• Validation of probabilistic framework
Total of 35 specimens tested, 5668 cracks measured
Model Integration
• Captures critical microstructurally-sensitive damage mechanisms
• Captures probability of occurrence of life-limiting fatigue mechanisms
• Produces naturally-occurring initial crack sizes for the start of small crack growth analysis to failure
• Tailored to the as-built manufactured state per aircraft tail number
[Diagram: model integration flow]
• Inputs: Geometry, Material & Fatigue Loading; Spectrum Load; Bulk Material µStructure Statistics (Grain Orientation, Particle Aspect Ratio, Particle Size)
• Material Cyclic Response at the Notch: Multiaxial Methods, Neuber & Glinka-ESEB
• Physics-Based Models for Crack Nucleation:
– Response Surface to Select Particles Most Likely to Crack (Incubation Filter)
– Response Surface to Select Cracked Particles Most Likely to Spawn a Crack into the Matrix (Nucleation Filter)
• Physics-Based Initial Crack Size Distribution: P(a) vs. Particle Size, a
• UniGrow/FASTRAN Predictions: Small & Large Crack Growth to failure
• Probabilistic Output p(t): Defect Size vs. Flight Hours, starting from the Initial Crack Size Distribution
• Probabilistic Predictions vs. Experiment
[Image labels: Incubation; 80% of 1st Cycle; Nucleated at Cycle 30; 1000 Cycles]
EA-6B Outer Wing Panel Fatigue Test Prognosis Validation
• Original life predictions
– Prior flight history of panel
– Panel swap history
– Pre-test NDI
– Distribution of constituent particle sizes in 7075-T651
• Predictions modified by:
– Null sensor readings, detection at sensor threshold, crack size estimates (all accounting for sensor accuracy and uncertainty characteristics)
• Bayesian reasoning system to make a probabilistic prediction based on uncertain input data
• As the test progressed:
– Significant decrease in life uncertainty
– Significant increase in predicted usable life
• Observed crack sizes validated predictions
[Figure: Predictions for largest crack. Predictions converged to truth as test progressed]
SIPS P-3 Flight Demonstration
“Workable Executable Prototype” demonstration of a combination of systems consistent with Navy fleet management practice
[Diagram labels: Onboard Sensors; NDI (Omni-Scan); Sensor; Physics Based Models; Reasoning & Prediction]
HAZARD CATEGORIZATION (FREQUENCY vs. SEVERITY)
FREQUENCY (per 100K Flight Hours) | CATASTROPHIC (1) | CRITICAL (2) | MARGINAL (3) | NEGLIGIBLE (4)
FREQUENT (A), = or > 100 | 1 | 3 | 7 | 13
PROBABLE (B), 10 - 99 | 2 | 5 | 9 | 16
OCCASIONAL (C), 1.0 - 9.9 | 4 | 6 | 11 | 18
REMOTE (D), 0.1 - 0.9 | 8 | 10 | 14 | 19
IMPROBABLE (E), = or < 0.1 | 12 | 15 | 17 | 20
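The hazard categorization matrix above can be read as a lookup from an event rate and a severity class to a risk index. The following is a minimal sketch of that lookup, not part of the SIPS software. The function names are hypothetical, and the handling of rates that fall between the printed bands (for example between 99 and 100, or exactly 0.1) is an assumption made only so that every rate maps to one row.

```python
# Risk indices transcribed from the slide's matrix, one tuple per frequency row.
# Tuple positions correspond to severity 1 (CATASTROPHIC) .. 4 (NEGLIGIBLE).
RISK_INDEX = {
    "A": (1, 3, 7, 13),     # FREQUENT
    "B": (2, 5, 9, 16),     # PROBABLE
    "C": (4, 6, 11, 18),    # OCCASIONAL
    "D": (8, 10, 14, 19),   # REMOTE
    "E": (12, 15, 17, 20),  # IMPROBABLE
}


def frequency_category(rate_per_100k_fh: float) -> str:
    """Map an event rate per 100K flight hours to a frequency letter.

    Assumption: band edges are closed on the lower side, except that a rate
    of exactly 0.1 is treated as IMPROBABLE, following "= or < 0.1".
    """
    if rate_per_100k_fh >= 100:
        return "A"
    if rate_per_100k_fh >= 10:
        return "B"
    if rate_per_100k_fh >= 1:
        return "C"
    if rate_per_100k_fh > 0.1:
        return "D"
    return "E"


def hazard_risk_index(rate_per_100k_fh: float, severity: int) -> int:
    """Return the matrix cell for a rate and a severity class (1 to 4)."""
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity must be 1, 2, 3, or 4")
    return RISK_INDEX[frequency_category(rate_per_100k_fh)][severity - 1]


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise two cells of the matrix.
    print(hazard_risk_index(0.5, 2))
    print(hazard_risk_index(150.0, 1))
```

A lower index corresponds to the upper-left corner of the matrix (frequent and catastrophic), so a prognosis system that lowers the estimated rate moves a hazard down the rows.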
[Figure: SIPS prediction of Crack size vs. Flight Hours]
NAVAIR Chose Vehicle & Sensor System
Results at a Critical Location
• Model predicted ~50% probability of a significant crack in a critical location
• Sensor had no indications
• Their combination reduced the probability to ~ 1%
The Combination of Model Predictions and Sensor Evidence Agreed With All Teardown Findings
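The drop from about 50% to about 1% is what Bayes' rule gives when a reliable sensor reports nothing. The sketch below is a single-step illustration only, not the SIPS reasoning system. The probability of detection (`pod`) and false-alarm probability (`pfa`) are hypothetical values chosen to show the order of magnitude; the source does not state the sensor characteristics.

```python
def posterior_crack_given_no_indication(prior: float, pod: float, pfa: float) -> float:
    """P(significant crack | sensor shows no indication), by Bayes' rule.

    prior: model-predicted probability of a significant crack
    pod:   probability the sensor indicates when a crack is present
    pfa:   probability the sensor indicates when no crack is present
    """
    p_quiet_given_crack = 1.0 - pod      # missed indication
    p_quiet_given_no_crack = 1.0 - pfa   # correct silence
    evidence = prior * p_quiet_given_crack + (1.0 - prior) * p_quiet_given_no_crack
    return prior * p_quiet_given_crack / evidence


prior = 0.50  # from the slide: model predicted ~50%
pod = 0.99    # hypothetical sensor probability of detection
pfa = 0.0     # hypothetical false-alarm probability

posterior = posterior_crack_given_no_indication(prior, pod, pfa)
print(f"posterior probability of a significant crack: {posterior:.4f}")
```

With these assumed values the posterior is 0.005 / 0.505, a little under 1%, which is consistent in magnitude with the combined result reported on the slide.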
• What Caused the Wide Disagreement Between Model and Sensor?
– During teardown inspections we discovered that the hole had been drilled out
– We presume this was done to remove an existing crack
– Based on the amount of material removed, model predicted repair would have been performed about 2001
– No repair records were available; however, Phased Depot Maintenance was performed 6/00 to 1/01
Reliability Physics and Engineering: Key to Transformative Research
Aris Christou, MSE and ME Department, University of Maryland; [email protected]
"Advanced manufacturing is a family of activities that (a) depend on the use and coordination of information, automation, computation, software, sensing, and networking, and/or (b) make use of cutting edge materials and emerging capabilities enabled by the physical and biological sciences, for example nanotechnology, chemistry, and biology. It involves both new ways to manufacture existing products, and the manufacture of new products emerging from new advanced technologies.” —President’s Council of Advisors on Science and Technology Report to the President on Ensuring American Leadership in Advanced Manufacturing,
Introduction and Motivation
• Industry profitability and success depend on yield and reliability.
• Advanced semiconductors (e.g., 2D and wide-bandgap systems) are key to numerous applications, from communications to automotive, defense, and security.
• Manufacturing of components depends strongly on in-depth reliability studies that include physics-based approaches, complementing current industry techniques that are not adequate for advancing the technology.
• Point-like nano/microscopic defects can often cause a macroscopic device to collapse.
• The challenge is a physics-based approach to reliability through an integration of science and engineering.
• The transformative breakthroughs will be based on reliability physics, chemistry, mathematics, and engineering.
Approach
• Meeting the challenge will require novel material and defect characterization techniques, which are necessary to locate the prevalent defects and to determine their concentration and dynamics over time.
• Dimensional reduction, lower and higher voltages, and higher frequencies negatively impact reliability.
• In-situ and ex-situ characterization will be necessary to satisfy the program’s objectives.
• Examples include reliability predictors such as spin, transport, Raman, and noise spectroscopy, and imaging of defects down to monolayer size.
• The types of defects existing in the fabricated devices need to be identified. Determining which defects are the cause of failure and which are effects of the failure is very important.
• Characterization techniques with nanometer resolution, considerably finer than the apparent average separation between traps, are required. Physics-based simulation and experimental validation to further the fundamental understanding of the degradation mechanisms must also be undertaken.
Reliability Grand Challenges
• Identify and quantify the failure mechanisms arising through smaller dimensions, high electric fields, coupled effects of heat, strain, and electric polarization, gate current, and the relatively high density of extended and point defects endemic in most semiconductors.
• Gain a physics-based knowledge through extensive and targeted characterizations and analyses and incorporate it into the failure models, which can then become the basis for the new robust manufacturing science.
• Establish the basis for the new methodology for reliability prediction and manufacturing science for future technologies.
• Take basic science all the way to manufacturing through education and research and enable a competitive industry to be realized.
PAST LESSONS FROM INNOVATIVE RELIABILITY ASSESSMENT TECHNIQUES
CONVENTIONAL METHOD (7 months): Fabrication → Fixture Mounting → Step-Stress Tests → Burn-in Tests → Reliability Assessment (stage durations: 1 week, 1 month, 6 months)
NEW METHOD (7 hours): Fabrication → Noise Measurements → Reliability Assessment
[Figure: Base Noise Power Density (A²)]
OUTCOME
• Determined a strong correlation between device reliability and baseband noise characteristics
• Temperature dependence of peaks in base noise power density indicates reliability
• Identified trap levels responsible for degradation from temperature dependent noise measurements
RECENT LESSONS FROM INNOVATIVE POTENTIAL RELIABILITY ASSESSMENT TECHNIQUES
[Figure: Contour map of I2D/IG, mapping on sample: 60 points over 63.5×45 μm² (X: 10 pts, step: 7.1 µm; Y: 6 pts, step: 9 µm)]
• Thin graphene layers (mono/bi/tri-layer): I2D/IG > 1
• Half of the graphene layers are covered with thin graphene layers
A. Reina et al., Nano Letters 9, 30 (2009)
Ref: H. Kim, E. Pichonat, D. Vignaud, D. Pavlidis and H. Happy; Graphene Layers Grown by RTP-CVD on Nickel and Their Properties, WOCSDICE 2012
An Interdisciplinary Approach
Device Physics and Electrical Engineering; Mathematics and Materials Science; Chemistry and Physics
[Flowchart nodes]
• Future Semiconductors: New Physics (High field effects - stress/temperature - Mechanical)
• Design Test structures: Basic test structures (Electrical Engineering); Future Materials test structures
• Process Science, Chemistry
• Technological effects (Physics and Engineering) + parameters for modeling
• Degradation model (Physics and Math)
• Characterization techniques (Engineering, Materials, Physics, Chemistry)
• Experimental Results for Future Semiconductor Devices
• Model fits exp. results?
– Yes: New model reliably predicts degradation and allows for Robust Manufacturing
– No: Change parameters/expand model, giving an “Updated” degradation model
Example of Carbon Nanotube Composite Interconnects
[Figure labels: Carbon Nanotube; Aluminum crystal structure]
Future Electronic Approach:
• Mathematical Simulation
• Process Science Modeling of Defects
[Chart: Yield (%) by Wafer, wafers 1 to 6, axis 0 to 100]
Education, Research and Innovation
[Diagram labels: REPRODUCIBLE ROBUST DESIGN; ADVANCED DESIGN CIRCUITS AND SYSTEMS; PHYSICAL PARAMETER - RELIABILITY CORRELATION]
• Establish Material and Device models
• Disseminate Results through publications
• Improve fabrication yield
• Improved robustness
• Develop compact designs
• Improve performance with compact designs
• Establish correlation between physical parameters and reliability
Outcome and Conclusions
• Promote cross-disciplinary approaches across scientific disciplines, e.g., reliability physics, materials, chemistry, and more, in addition to engineering.
• Initiate “transformative research” with societal impact, e.g., power electronics and transport, T-rays and medicine, communications and low-power technologies, that is robust and manufacturable.
• Establish new methodologies for reliability prediction and manufacturing science for future technologies.
• Provide education and research experience for future engineers in new semiconductor technologies.
Thank you for your attention
New Logic Modeling Paradigms for Complex System Reliability and Risk Analysis
Antoine Rauzy
Chair Blériot-Fabre* - Ecole Centrale de Paris Ecole Polytechnique
FRANCE [email protected]
http://www.lgi.ecp.fr/pmwiki.php/PagesPerso/ARauzy
*Sponsored by SAFRAN group
Probabilistic Risk Assessment …
• … is now established on a solid scientific ground
• … is a mature technology
• … is a great tool for decision making
So, what’s next?
• More openness
• Higher level modeling languages
• Wider spectrum of applications
Standard Representation Formats
Issues
• Models are tool-dependent
• Calculations are provably difficult, so calculation engines perform unwarranted approximations
<define-fault-tree name="FT1">
  <define-gate name="top">
    <or>
      <gate name="G" />
      <basic-event name="C" />
    </or>
  </define-gate>
  <define-gate name="G">
    <and>
      <basic-event name="A" />
      <basic-event name="B" />
    </and>
  </define-gate>
</define-fault-tree>
The Open-PSA Standard Representation Format for Fault Trees and Event Trees
Challenge/research direction:
Define standard representation formats, with all the necessary constructs, with clear and sound semantics
Version 3 of the Open-PSA standard is being drafted:
• Simplifications
• Block Diagrams
• Multi-phase Markov Chains with Rewards
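As an illustration of what a calculation engine does with such a model, the fault tree FT1 above (top = G or C, G = A and B) can be expanded into its minimal cut sets. This is a minimal sketch written for this tiny example, not an Open-PSA tool or API. The dictionary encoding, the function names, and the basic-event probabilities are all hypothetical.

```python
from itertools import product
from math import prod

# FT1 from the slide, encoded by hand: gate name -> (operator, children).
GATES = {
    "top": ("or", ["G", "C"]),
    "G": ("and", ["A", "B"]),
}


def cut_sets(node):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if node not in GATES:                      # basic event
        return {frozenset([node])}
    op, children = GATES[node]
    child_sets = [cut_sets(child) for child in children]
    if op == "or":                             # any child's cut set suffices
        return set().union(*child_sets)
    # "and": take one cut set from each child and merge them
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimize(sets):
    """Keep only cut sets that contain no other cut set as a proper subset."""
    return {s for s in sets if not any(t < s for t in sets)}


def cut_set_probability(cut_set, p):
    """Probability of a cut set, assuming independent basic events."""
    return prod(p[event] for event in cut_set)


mcs = minimize(cut_sets("top"))

# Hypothetical basic-event probabilities, for illustration only.
P = {"A": 0.1, "B": 0.2, "C": 0.05}

# Exact here only because the two cut sets share no basic event.
exact = 1.0 - prod(1.0 - cut_set_probability(cs, P) for cs in mcs)
# Rare-event approximation: sum of cut set probabilities (an upper bound).
rare_event = sum(cut_set_probability(cs, P) for cs in mcs)

print(sorted(sorted(cs) for cs in mcs))
print(exact, rare_event)
```

The gap between `exact` and `rare_event` is a small instance of the approximations mentioned under Issues: summing cut set probabilities is cheap but overestimates the top-event probability.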
New Algorithms for Model Assessment
Typical example (US plant):
• PSA model with ~2,500 Basic Events
What has been calculated:
• ~100,000 Minimal Cutsets
• 95% of the Core Damage Frequency with less than 5% of the Basic Events, 100% with 25%
In a word, 75% of the model is “useless”!
Issues:
• Finding the right level of abstraction is difficult to achieve
[Diagram: Original Model; Minimal Cutsets; Simplified Model]
Design filtering algorithms that build simpler models that are equivalent w.r.t. observation means
Categories of Models
Challenge/research direction: Many possibly very different models are indistinguishable by observation means, i.e., results of virtual experiments (typically, calculation of failure scenarios). They are equivalent in the Turing test sense. Equivalent models form a category. Design mathematical concepts, algorithms and tools to determine the most representative (simplest?) model of a category.
[Diagram: Original Models; MCS calculation; Minimal Cutsets; Representative Model (?)]
High Level Modeling Languages
Issues:
• Completeness of specifications with respect to safety concerns
• Distance between system specifications and safety models
• Integration with other system engineering disciplines
[Diagram labels: System Specification; AltaRica; Automated Generation; Fault Trees; Calculations]
class component
  state Boolean working (init = true);
  event failure (delay = exponential(lambda));
  transition
    failure: working -> working := false;
end
AltaRica features
• Formal
• Event-Based
• Textual & graphical
• Multiple assessment tools
AltaRica Mathematical Framework
domain componentState { STANDBY, WORKING, FAILED }
class spareComponent
  componentState s (init = WORKING);
  Boolean demanded (reset = false);
  event turnOn (delay = 0, expectation = 0.98),
        failureOnDemand (delay = 0, expectation = 0.02),
        turnOff (delay = 0),
        failure (delay = exponential(0.001)),
        repair (delay = exponential(0.1));
  transition
    turnOn: s==STANDBY and demanded -> s := WORKING;
    failureOnDemand: s==STANDBY and demanded -> s := FAILED;
    turnOff: s==WORKING and not demanded -> s := STANDBY;
    failure: s==WORKING -> s := FAILED;
    repair: s==FAILED -> s := STANDBY;
end
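[State diagram: s=STANDBY → s=WORKING on turnOn (demanded?); s=STANDBY → s=FAILED on failureOnDemand (demanded?); s=WORKING → s=STANDBY on turnOff (not demanded?); s=WORKING → s=FAILED on failure; s=FAILED → s=STANDBY on repair]
Guarded Transition Systems: well-founded generalization of:
• Fault Trees, Block Diagrams
• Markov chains, Stochastic Petri Nets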
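The spareComponent class above can be read as a guarded transition system: each transition pairs an event with a guard (a Boolean condition on the state) and an action (an update of the state). The sketch below mirrors only that logical part in Python. It is not AltaRica tooling, and the `Transition`, `enabled`, and `fire` names are hypothetical. Delays and expectations are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass(frozen=True)
class Transition:
    event: str
    guard: Callable[[State], bool]     # when may the event fire?
    action: Callable[[State], State]   # what does firing change?


def set_s(value: str) -> Callable[[State], State]:
    """Build an action that returns a copy of the state with s := value."""
    return lambda st: {**st, "s": value}


# Guards and actions transcribed from the spareComponent transitions.
TRANSITIONS: List[Transition] = [
    Transition("turnOn", lambda st: st["s"] == "STANDBY" and bool(st["demanded"]), set_s("WORKING")),
    Transition("failureOnDemand", lambda st: st["s"] == "STANDBY" and bool(st["demanded"]), set_s("FAILED")),
    Transition("turnOff", lambda st: st["s"] == "WORKING" and not st["demanded"], set_s("STANDBY")),
    Transition("failure", lambda st: st["s"] == "WORKING", set_s("FAILED")),
    Transition("repair", lambda st: st["s"] == "FAILED", set_s("STANDBY")),
]


def enabled(state: State) -> List[str]:
    """Names of the events whose guard holds in `state`."""
    return [t.event for t in TRANSITIONS if t.guard(state)]


def fire(state: State, event: str) -> State:
    """Apply the action of `event`, provided its guard holds."""
    for t in TRANSITIONS:
        if t.event == event and t.guard(state):
            return t.action(state)
    raise ValueError(f"event {event!r} is not enabled in {state}")


if __name__ == "__main__":
    initial = {"s": "WORKING", "demanded": False}  # init values from the class
    print(enabled(initial))
    print(fire(initial, "failure"))
```

In the STANDBY state with `demanded` true, both turnOn and failureOnDemand are enabled at once; in the AltaRica model the `expectation` attributes (0.98 and 0.02) resolve that choice, which this sketch does not model.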
The AltaRica 3.0 Project
[Diagram: AltaRica 3.0 models (class Pump … end) are translated into Guarded Transition Systems, which feed the assessment tools]
• Modeling side: GUI for modeling; Libraries, patterns; SysML; AADL; FMEA; Petri Nets; Dynamic Fault Trees
• Assessment tools: compilation to Fault Trees; compilation to Markov Chains; generation of sequences; model checking; stochastic simulation; reliability allocation; GUI for simulation
• Supporting elements: Reliability Data; Version & Configuration Management System
[Chart: probability of the feared event (“Probabilité de l'ER”), Pr[STop event]; x-axis 0 to 8000, y-axis 0.2 to 1.0]
Performance Assessment
Issues:
• The business model of industry is moving from selling products to selling capacities
• Companies have to make commitments and, to do so, must assess the performance of systems in the presence of hazards
⇒ PRA languages and tools are well suited to assess capacities (it mainly suffices to assess mathematical expectations rather than probabilities)
Carol Smidts Department of Mechanical and Aerospace Engineering
The Ohio State University [email protected]
To be presented at the Reliability Engineering 25th Anniversary Symposium
April 2, 2014 University of Maryland, College Park
CHARACTERISTICS IN CONTRAST
MORE RECENT
• First hardware reliability paper appears in 1952 in Proceedings of the Institute of Radio Engineers.
• First software reliability paper appears in 1975 in IEEE Transactions on Software Engineering.
MORE COMPLEX
• The complexity of typical hardware systems is several hundred components (e.g., nuclear power plants).
• The complexity of current software systems is millions of lines of source code (e.g., 15 million for the Linux kernel). Assuming a typical function consists of 200 lines of code, there are approximately 75,000 functions in the Linux kernel.
CHARACTERISTICS IN CONTRAST
EVOLVES EXTREMELY FAST
• The number of important programming languages introduced per decade is approximately 10. This number has been constant since 1950.
[Chart: Number of Important Programming Languages Emerged in each Decade, 1950 to 2000; y-axis 0 to 16]
CHARACTERISTICS IN CONTRAST
EVOLVES EXTREMELY FAST (Cont’d)
• Programming paradigms have changed from non-structured to structured, procedural to object-oriented.
• Six main paradigms currently coexist: imperative, declarative, functional, object-oriented, logic and symbolic.
ALWAYS TIED TO HARDWARE
• Software does not run in isolation
• Software is tied to a computer platform
• As such failures are never observed in isolation
• This has led some to not want software to be modeled at all
CHARACTERISTICS IN CONTRAST
DIFFERENT FAILURE MODE
• Hardware:
– Hardware wears out, leading to degraded performance
– Failures are triggered by harsh environments such as excess heat and radiation
• Software:
– Software does not wear out
– Failures are due to latent faults that are triggered and propagate into failures
HIGHLY DEPENDENT UPON ITS ENVIRONMENT
• Software is particularly sensitive to the environment
CONTINUITY ASSUMPTION ONLY VALID WITHIN THE CONFINES OF A LARGE NUMBER OF SMALL SUBDOMAINS
• Predicates create non-continuous behavior in program logic.
• The typical ratio of predicates to lines of code is on the order of 1/10.
ONE OF A KIND
• Data is difficult to collect
CURRENT AREAS OF RESEARCH
[Chart labels: Domain Characteristics; Embedded (71%); Web (14%); Service (14%); Dependable Systems; Architecture (50%); Modeling (50%); SRGM; Reliability / Test / Cost; Measures; Other]
Based on a review of papers published between 2008-2013 in the Proceedings of the International Symposium on Software Reliability Engineering (ISSRE) [excludes 2012]
AREAS OF RESEARCH EXAMPLES: OP DEFINITION
[Diagram: Environment; Institutions/Customers (Factory, Power Plant, Bank, School, Corporation); Network of Computer (1), Computer (2), …, Computer (n); Computer (Hardware); Software]
Extract from: Smidts, C., Mutha, C., Rodríguez, M., & Gerber, M. J. (2014). Software testing with an operational profile: OP definition. ACM Computing Surveys (CSUR), 46(3), 39.
AREAS OF RESEARCH EXAMPLES: OP DEFINITION
[UML class diagram of the operational profile (OP). Classes: OP; Application; Source Code; Context; Profile; Structure; Inputs; Input Data (Values, Variable Name, Data Types, Input Data Constraints); External Error; Critical Operations; Executive Scope; Abstraction Level (Component Level, System Level); Field-of-Interest. Attributes: Executable: Y/N; LPhase: Early/Later; ToolSupp: Y/N. Associations: considers, requires, uses, derived from, mapping, modifies, adds dimension, changes, extends, input]
[Pie charts (%)]
Profile: Single 68.5, Multiple 31.6
Structure: Tree 30, State 50, Set 20
Extract from Smidts, C., Mutha, C., Rodríguez, M., & Gerber, M. J. (2014). Software testing with an operational profile: OP definition. ACM Computing Surveys (CSUR), 46(3), 39.
[Pie charts (%)]
Originator: HW 13.1, SW 8.7, Human 43.5, Unspecified 34.8
Critical Operations: Aware 21.1, Unaware 79
Executive Scope: Aware 15.8, Unaware 84.3
External Error: Aware 10.6, Unaware 89.5
Tool Support: Auto 57.9, Non-auto 42.2
Abstraction Level: Component 15.8, System 84.3
Lifecycle phase: Early 84.3, Late 10.6, Unspecified 5.3
AREAS OF RESEARCH EXAMPLES: SOFTWARE AND HARDWARE RELIABILITY
ALU maps, showing usage and probability profiles. (a) Usage in terms of number of demands. (b) Delay probability profile. (c) Different-Function probability profile. (d) Stuck-at probability profile. (e) Combined failure probability profile.
Extracted from: Bing H.; Rodriguez, M.; Ming Li; Bernstein, J.B.; Smidts, C.S., "Hardware Error Likelihood Induced by the Operation of Software," IEEE Transactions on Reliability, vol. 60, no. 3, pp. 622-639, Sept. 2011.
AREAS OF RESEARCH EXAMPLES: SOFTWARE MEASURES TO SOFTWARE RELIABILITY
Group | Root Metric | Total Rate | Rank | Inaccuracy Ratio
I | BLOC | 0.4 | L | 5.3764
I | CMM | 0.6 | M | 5.5091
I | CC | 0.72 | H | 5.6927
I | FP | 0.5 | L | 5.2303
I | RSCR | 0.69 | M | 5.3095
I | SDC | 0.53 | M | 4.3765
II | CEG | 0.44 | L | 2.7243
II | CF | 0.81 | H | 1.4662
II | COM | 0.36 | L | 2.7211
II | DD | 0.83 | H | 0.1853
II | RT | 0.55 | M | 0.0334
III | FD | 0.72 | H | 0.7397
III | TC | 0.68 | M | 0.2146
Inaccuracy ~ Group + Strata + Group*Strata
Term | Sum Sq
Group | 54.556
Strata | 3.986
Group:Strata | 2.424
Residuals | 1.901
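One way to read the sums of squares above is as shares of the total variation in the inaccuracy ratio. The sketch below computes those shares from the four values on the slide. Treating the share as an effect-size measure is an interpretation added here for illustration, and the variable names are hypothetical.

```python
# Sums of squares as printed on the slide.
sum_sq = {
    "Group": 54.556,
    "Strata": 3.986,
    "Group:Strata": 2.424,
    "Residuals": 1.901,
}

total = sum(sum_sq.values())

# Share of the total sum of squares attributed to each term.
shares = {term: value / total for term, value in sum_sq.items()}

for term, share in shares.items():
    print(f"{term:>12}: {share:.1%}")
```

On these numbers the Group term accounts for most of the total sum of squares, far more than Strata or the interaction.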
AREAS OF RESEARCH EXAMPLES: CHARACTERIZING SOFTWARE FAILURE MECHANISMS
# | Defect Name
1 | Missing function
2 | Extra function
…
[Diagrams: functions F1, F2, F3]
NEW AREAS OF RESEARCH
THANK YOU
QUESTIONS?
BREAK 10:30 a.m. – 11:00 a.m.
Dr. George Apostolakis, Nuclear Regulatory Commission
Ms. Maria Korsnick, Constellation Energy
Mr. Thomas D. Whitmeyer, NASA
Moderator: Dr. Ali Mosleh
Risk-Informed Regulations, Oversight, and Emergency Response
PANEL 2
Commissioner George Apostolakis U.S. Nuclear Regulatory Commission
25th Anniversary of the Reliability Engineering Education Program
The Center for Risk and Reliability University of Maryland
April 2, 2014
Risk-Informed Regulation at the U.S. NRC
NRC Oversight
New Reactors
Uranium Enrichment
Power Reactors Transportation Storage
Waste Disposal
Uranium Conversion
Medical/Industrial
The Traditional Approach to Regulation (Before Risk Assessment)
• Management of uncertainty (unquantified at the time) was always a concern.
• Defense-in-depth and safety margins became embedded in the regulations (structuralist approach)
• “Defense-in-Depth is an element of the NRC’s safety philosophy that employs successive compensatory measures to prevent accidents or mitigate damage if a malfunction, accident, or naturally caused event occurs at a nuclear facility.” [Commission’s White Paper, February, 1999]
• Questions that defense in depth addresses:
– What if we are wrong?
– Can we protect ourselves from the unknown unknowns?
Design Basis Accidents
• A design basis accident is a postulated accident that a facility is designed and built to withstand without exceeding the offsite exposure guidelines of the NRC’s siting regulation
• They are very unlikely events
• They protect against “unknown unknowns”
Technological Risk Assessment (Reactors)
• Study the system as an integrated socio-technical system
• Probabilistic Risk Assessment (PRA) supports Risk Management by answering the questions:
– What can go wrong? (thousands of accident sequences or scenarios)
– How likely are these scenarios?
– What are their consequences?
– Which systems and components contribute the most to risk?
What Did We Learn from the Reactor Safety Study?
Prior Beliefs:
1. Protect against large loss-of-coolant accident (LOCA)
2. Core damage frequency (CDF) is low (about once every 100 million years, 10^-8 per reactor year)
3. Consequences of accidents would be disastrous
Major Findings:
1. Dominant contributors: small LOCAs and transients
2. CDF higher than earlier believed (best estimate: 5x10^-5 per reactor year, once every 20,000 years; upper bound: 3x10^-4 per reactor year, once every 3,333 years)
3. Consequences significantly smaller
4. Support systems and operator actions very important
Beckjord et al, Reliability Engineering and System Safety, 39 (1993) 159-170.
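The "once every N years" figures quoted for core damage frequency are reciprocals of the per-reactor-year frequencies. A minimal sketch of that conversion follows; the function name is illustrative and not taken from the presentation.

```python
def return_period_years(freq_per_reactor_year):
    """Average number of reactor years between events at a given frequency."""
    return 1.0 / freq_per_reactor_year


# Frequencies quoted on the slide (per reactor year).
prior_belief = return_period_years(1e-8)   # prior belief about CDF
best_estimate = return_period_years(5e-5)  # Reactor Safety Study best estimate
upper_bound = return_period_years(3e-4)    # Reactor Safety Study upper bound

for label, years in [("prior belief", prior_belief),
                     ("best estimate", best_estimate),
                     ("upper bound", upper_bound)]:
    print(f"{label}: about once every {years:,.0f} reactor years")
```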
PRA Model Overview and Subsidiary Objectives
Level I: PLANT MODEL. Results: accident sequences leading to plant damage states. CDF 10^-4/ry
Level II: CONTAINMENT MODEL. Results: containment failure/release sequences. LERF 10^-5/ry
Level III: SITE/CONSEQUENCE MODEL. Results: public health effects. QHOs
PLANT MODE: At-power Operation; Shutdown / Transition Evolutions
SCOPE: Internal Events; External Events
Uncertainties
PRA Policy Statement (1995)
• The use of PRA should be increased to the extent supported by the state of the art and data and in a manner that complements the defense-in-depth philosophy
• PRA should be used to reduce unnecessary conservatisms associated with current regulatory requirements
Risk-Informed Framework
Traditional “Deterministic” Approach
• Unquantified probabilities
• Design-basis accidents
• Defense in depth and safety margins
• Can impose unnecessary regulatory burden
• Incomplete
Risk-Based Approach
• Quantified probabilities
• Thousands of accident sequences
• Realistic
• Incomplete
Risk-Informed Approach
• Combination of traditional and risk-based approaches through a deliberative process
The Deliberation
[Figure 3-2, Deliberations. Elements shown: Stakeholder Input; Assumptions, Uncertainties and Sensitivities; Technical Analysis (one or more techniques); Decision Criteria; Resource and Schedule Constraints; Other Factors; Options; Deliberation; Decision & Implementation.]
Source: NUREG-2150, A Proposed Risk Management Regulatory Framework
Evolution of the NRC’s Risk-Informed Regulatory System
• 1980s: New or revised regulatory requirements based on PRA insights introduced
• 1990s: Risk-informed changes to a plant’s licensing basis allowed
• 2000: Change to a risk-informed reactor oversight process made
• 2004: Risk-informed alternative to comply with fire protection requirements introduced
• 2007: Regulation requiring PRAs for licensing new reactors issued
Risk-Informed Decision Making in Regulation
• Improves Safety
– New requirements (SBO, ATWS)
– Design of new reactors
– Focus on important systems and locations
• Makes regulatory system more rational
– Reduction of unnecessary burden
– Operating experience accounted for in regulations
– Consistency in regulations
The Experience
• Successes
– Maintenance rule
– Risk-informed inservice inspection
– Reactor oversight process
• Challenges
– Fire protection
– Special treatment requirements
– Risk-informing Emergency Core Cooling System rule
Summary
• Uncertainties have always been of concern in safety
• Traditional methods manage uncertainties through design basis accidents and conservatism
• Risk assessment provides a global view of accident sequences, quantifies uncertainties, and is more realistic
• Risk-informed regulation combines the best features of both approaches
Risk Informing the Commercial Nuclear Enterprise
Maria Korsnick Constellation Energy Nuclear Group, LLC
April 2, 2014
Promise of a Discipline: Reliability and Risk in Theory and in Practice
University of Maryland
How our Business is Risk-Informed
I. Managing Risk to the Business
II. Managing the Risk of Normal Plant Operation
III. Defining Extreme External Events
IV. Risk-Informed Lessons for External Events
V. The Path Forward
I. Managing Risk to the Business
Each CENG nuclear plant and the corporate office maintains a risk “Heat Map”
– An easy-to-read summary of the risks associated with a business unit
– A method for communicating the risks being managed
The “Delphi Method” for forecasting risk is used: experts come together to perform periodic assessments of Company risks
– Subjective (non-analytical) probability and impact assessment of each risk
– Identifies mitigating actions
Operating Fleet Heat Map (example)
[Figure: heat map of fleet risks. Probability axis: Rare 5%, Remote 20%, Moderate 50%, Likely 80%, Very Likely 95%. Impact axis: Insignificant, Minor, Significant, Major, Critical. Level of Control legend: High, Medium, Low.]
Top risks. Significant risks from site maps grouped / assigned based on significance to fleet.
Labels shown on the map: Prolonged Forced Outage; Key Staffing; Regulatory Compliance; Nuclear Risk; Corporate / Generation; Environmental; Industrial / Radiological; Fire Protection / NFPA 805; Extended Refueling Outage; Tritium; Short-term Output / Forced Outages; Post-Fukushima Response; New NRC Regulations; EPA Cooling Water Intake regulation; GSI 191; Cyber Security; Flood analysis White finding (G)
Heat Map Risk Table (example)
Issue: Fukushima Response
Risk: High cost of studies, modifications, uncertainty of outcomes. Impact on emergency planning
Category: Regulatory. Impact: Major. Probability: Likely. Level of Control: Medium
Mitigation: Active engagement with industry and NRC
Issue: EPA 316b Rule, Clean Water Act
Risk: Potential for significant modifications to intake structures at NY and MD sites
Category: Regulatory. Impact: Critical. Probability: Remote. Level of Control: Low
Mitigation: Industry proposing alternatives to federal and state EPA
Issue: Key Staffing
Risk: High rate of retirements over next ten years, loss of expertise/talent
Category: Corporate. Impact: Significant. Probability: Moderate. Level of Control: Medium
Mitigation: Implement Knowledge Transfer and Retention program
II. Managing Risk during Normal Operations
• Plant-specific PRAs model core damage and large early release frequency
• Risk impact of scheduled maintenance, plant evolutions, and system outages is analyzed
• Four risk levels used to communicate to plant staff and set controls
• Pre-established risk mitigation measures applied as higher risk conditions are entered
GREEN ORANGE YELLOW RED
Example Plant PRA Risk
[Figure: four panels. Colors correspond to the associated System Health Report status as of 4th quarter in 2013.]
Initiating Event Distribution: Fire, 42%; Flood, 6%; Div II AC, 8%; LOOP, 7%; Div I AC, 7%; Loss of 2 SWP Pumps, 5%; Lake Intake, 4%; Feedwater, 4%; MSIV, 3%; Seismic, 4%; Condenser, 2%; Other, 8%
Key Operator Actions (descriptions): Control Level to Prevent Boron Washout; Align RHR During ATWS; Align Fire Water for EDG Cooling; Manually Depressurize (Transient); Vent PC (Local Actions including use of Port. Powerpack); Isolate SW Header Flood in RB; Respond to Control Room Fire; Control Service Water and Open Room Doors (HVAC); Align Containment Heat Removal; Vent PC (Air or Div I AC lost)
Key Operator Actions (%CDF values shown): 30%, 15%, 9%, 4%, 3%, 3%, 2%, 2%, 2%, 1%
Potential Risk Increase Factor for Key Equipment (log scale, 1 to 1000): Div I Emergency Switchgear; Div II Emergency Switchgear; Div I Emergency DC; Div I 120V Emergency AC; Div II 120V Emergency AC; 2RHS A/LPCS Supp Pool Return; Div 1 600V Emergency Switchgear; 125V DC Switchgear; Div 2 600V Emergency Switchgear; 2RHS B/RHS C Supp Pool Return
Risk Thresholds: >x30, >x15, >x3, ≤x3
System Percentage Contribution to CDF (scale 0.0% to 7.0%)
Hypothetical PRA Risk Planetary Charts
[Figure: planetary charts for Plant 1, Plant 2, Plant 3, and Plant 4.]
Every Plant is Unique – design, internal / external events
Risk insights are gained by comparing plant risk profiles:
– Physical Modifications
– Protective Barriers
– Procedures
– Operator Response Times
– Maintenance Practices
– Housekeeping
III. Defining Extreme External Events
Original plant design for external events (security, seismic, flood, fire) based on regulations and best state of knowledge of risk at time of licensing
Industry understanding of risk has been highly dynamic
– 1975 Browns Ferry fire
– 2001 terrorist attacks
– 2011 Japan earthquake and tsunami (Fukushima)
Evolving risk insights from new data create constant “churn” in design and operation of our plants
– Fire: industrial fire code - to - “Appendix R” - to - NFPA 805
– Revised design basis security threat, robust defenses, cyber
– Post-Fukushima reassessment of earthquake frequency and intensity for central and eastern US plants (NRC GSI-199)
– Post-Fukushima reassessment of design basis flood/frequency
IV. Risk-informed Lessons for External Events
The uncertainties are real and unavoidable
– Extrapolation from internal event modeling experience is not applicable to other models
– Reliance on numerical mean values is not sufficient
– Data supporting rare events may have large uncertainty (e.g., floods)
Undue focus on numerical outcomes leads to a reduced emphasis on important insights
Adding conservatism in PRA is not an antidote; it can significantly distort sound risk-informed decision-making
Case in Point: NFPA-805 FPRA
Challenge: Deterministic PRA mentality distorts risk perspective
– Conservatisms added at every major step of the process to “bound” uncertainties
Results do not match operating experience benchmarks
– Risk-significant fires over-predicted
– Fires with significant spurious operations over-predicted
Outcome: Disproportionately large resources spent on model refinements and plant modifications
Significant Departure from Realism = Ineffective Decision-Making
Large Conservatism in Fire PRA Building Blocks:
Fire Frequencies Conservatism + Fire Severity Conservatism + Fire Suppression Conservatism + Conditional Core Damage Probability Conservatism
Compounding conservatism reduces effectiveness of decision making tool
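The compounding effect can be illustrated with a short calculation. The sketch below assumes, for illustration only, that the four fire PRA building blocks combine multiplicatively and that each carries a hypothetical conservatism factor (ratio of modeled value to realistic value); none of these numbers come from the presentation.

```python
from math import prod

# Hypothetical conservatism factors per building block (modeled / realistic).
# These values are illustrative assumptions, not data from the slides.
conservatism = {
    "fire frequency": 2.0,
    "fire severity": 3.0,
    "fire suppression": 2.0,
    "conditional core damage probability": 2.5,
}

# Assumes the blocks multiply, so their conservatisms multiply as well.
combined_factor = prod(conservatism.values())

print(f"largest single factor: {max(conservatism.values()):.1f}x")
print(f"combined over-prediction: {combined_factor:.1f}x")
```

Even modest factors at each step produce a combined over-prediction far larger than any single one, which is the distortion the slide warns about.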
V. The Path Forward
Objective and Proposed Actions (Industry, NRC)
Objective: Gain a more complete and balanced understanding of important risk contributors
– Industry: Continue development of more realistic and complete plant-specific PRAs
– NRC: Move away from embedding conservatism in PRA models; starts with fire PRA
Objective: Clarify risk-informed decision-making process that can deal with uncertainties
– Industry: Propose a practical integrated decision-making process
– NRC: Adapt/adopt a practical integrated decision-making process consistent with RG 1.174
Objective: Educate decision-making stakeholders on risk-informed decision-making
– Industry: Provide focused PRA training to industry staff and decision-makers
– NRC: Provide focused PRA training to NRC staff and decision-makers
Objective: Develop technical resources to support better risk-informed understanding
– Industry: Expand EPRI/OG commitment to training and technology
– NRC: Expand training on truly risk-informed decision-making
Key Takeaway
PRA has added tremendous value to the Nuclear Industry, allowing us to operate plants more safely.
Addressing very low probability / high consequence events can be as important as addressing high probability / high consequence events.
Challenges remain with the tools:
– Risk insights are masked by over-conservatism or a deterministic approach – back to basics.
– Uncertainty matters – what can we do to address and reduce uncertainty?
Launch Date | Name | Country | Result | Reason
1960 | Korabl 4 | USSR (flyby) | Failure | Didn't reach Earth orbit
1960 | Korabl 5 | USSR (flyby) | Failure | Didn't reach Earth orbit
1962 | Korabl 11 | USSR (flyby) | Failure | Earth orbit only; spacecraft broke apart
1962 | Mars 1 | USSR (flyby) | Failure | Radio failed
1962 | Korabl 13 | USSR (flyby) | Failure | Earth orbit only; spacecraft broke apart
1964 | Mariner 3 | US (flyby) | Failure | Shroud failed to jettison
1964 | Mariner 4 | US (flyby) | Success | Returned 21 images
1964 | Zond 2 | USSR (flyby) | Failure | Radio failed
1969 | Mars 1969A | USSR | Failure | Launch vehicle failure
1969 | Mars 1969B | USSR | Failure | Launch vehicle failure
1969 | Mariner 6 | US (flyby) | Success | Returned 75 images
1969 | Mariner 7 | US (flyby) | Success | Returned 126 images
1971 | Mariner 8 | US | Failure | Launch failure
1971 | Kosmos 419 | USSR | Failure | Achieved Earth orbit only
1971 | Mars 2 Orb/Lander | USSR | Failure | Orbiter arrived, but no useful data and Lander destroyed
1971 | Mars 3 Orb/Lander | USSR | Success | Orbiter obtained approximately 8 months of data and lander landed safely, but only 20 seconds of data
1971 | Mariner 9 | US | Success | Returned 7,329 images
1973 | Mars 4 | USSR | Failure | Flew past Mars
1973 | Mars 5 | USSR | Success | Returned 60 images; only lasted 9 days
1973 | Mars 6 Orb/Lander | USSR | Success/Failure | Occultation experiment produced data and Lander failure on descent
1973 | Mars 7 Lander | USSR | Failure | Missed planet; now in solar orbit
1975 | Viking 1 Orb/Lander | US | Success | Located landing site for Lander and first successful landing on Mars
1975 | Viking 2 Orb/Lander | US | Success | Returned 16,000 images and extensive atmospheric data and soil experiments
1988 | Phobos 1 Orbiter | USSR | Failure | Lost en route to Mars
1988 | Phobos 2 Orb/Lander | USSR | Failure | Lost near Phobos
1992 | Mars Observer | US | Failure | Lost prior to Mars arrival
1996 | Mars Global Surveyor | US | Success | More images than all Mars Missions
1996 | Mars 96 | Russia | Failure | Launch vehicle failure
1996 | Mars Pathfinder | US | Success | Technology experiment lasting 5 times longer than warranty
1998 | Nozomi | Japan | Failure | No orbit insertion; fuel problems
1998 | Mars Climate Orbiter | US | Failure | Lost on arrival
1999 | Mars Polar Lander | US | Failure | Lost on arrival
1999 | Deep Space 2 Probes | US | Failure | Lost on arrival (carried on Mars Polar Lander)
2001 | Mars Odyssey | US | Success | High resolution images of Mars
2003 | Mars Express Orbiter/Beagle 2 | ESA | Success/Failure | Orbiter imaging Mars in detail and lander lost on arrival
2003 | Mars Rover - Spirit | US | Success | Operating lifetime of more than 15 times original warranty
2003 | Mars Rover - Opportunity | US | Success | Operating lifetime of more than 15 times original warranty
2005 | Mars Reconnaissance Orbiter | US | Success | Returned more than 26 terabits of data (more than all other Mars missions combined)
2007 | Phoenix Mars Lander | US | Success | Returned more than 25 gigabits of data
2011 | Mars Science Laboratory | US | Success | Exploring Mars' habitability
2011 | Phobos-Grunt/Yinghuo-1 | Russia/China | Failure | Stranded in Earth orbit
2013 | Mangalyaan | India | En route | On way to Mars
2013 | MAVEN | US | En route | On way to Mars
Source: http://mars.nasa.gov/programmissions/missions/log/
• The path to Mars involves closing knowledge and performance gaps in a systematic manner:
– The health threat from exposure to high-energy cosmic rays and other ionizing radiation and negative effects of a prolonged low-gravity environment on human health, including eyesight loss.
– Human performance considerations related to a long-duration isolated mission in a confined habitable space.
– The inaccessibility of terrestrial medical facilities.
– Critical systems, including propulsion, habitation, and life support that are reliable, require little to no maintenance, and have a small mass/volume.
– Long duration navigation, and operations in deep space environment.
– Ability for crew to operate autonomously including onboard analysis of crew and environmental samples.
Distances: ISS 400 kilometers; Cis/trans lunar space 443,400 kilometers; Mars 228,000,000 kilometers
Timeline: Today, 2020’s, 2030’s
Mars:
• 1.5 year + crew duration
• Crew health and performance vital to a mission
• Habitation and life support and other critical systems mass/size limited and must have high reliability with limited consumable resupply
• Limited spares, systems must be reliable
• No opportunity for ground validation of crew/environmental samples or system failure
• Communication delay of up to 42 minutes
• No emergency crew return
• Heavy lift available to support Mars transit
ISS:
• 6 month crew duration
• Crew health and performance research in-work
• Habitation and life support and other critical systems are large and require regular maintenance and consumable resupply
• Prepositioned spares and regular resupply
• Ground analysis of crew/environmental samples and system failures
• Near real-time communications
• Any time crew return
• Heavy lift capability in development
Mission Formulation - System Design – Technical Management – Mission Operations
(1) Establish An Objective Hierarchy
(2) ISS to 2024 and Cis-lunar are Essential to Turn Unknown Risk to Known Risk
• Crew Health • Human Performance • System Reliability
(3) Make Risk Informed Decisions: Identify Alternates – Analyze Risk – Make Informed Decisions
LUNCH 12:30 p.m. – 2:00 p.m.
Dr. Wallace Loh, President, University of Maryland
REMARKS
Dr. Jeong H. Kim, Entrepreneur; Chairman, Kiswe Mobile, Inc.; Former President, Bell Labs
KEYNOTE SPEAKERS
Mr. Ken Farquhar, President & General Manager, Systems Engineering and Mission Support Business Unit, ManTech International
Dr. Hoang Pham, Rutgers University
Dr. Vasiliy Krivtsov, Ford Motor Company
Dr. J. Wesley Hines, University of Tennessee
Moderator: Dr. Marvin Roush
Reliability Education: Challenges and Potential of a Non-Traditional Engineering Discipline
PANEL 3
The Whereabouts of Reliability Education: Challenges & Opportunities
Hoang Pham Department of Industrial & Systems Engineering
Rutgers University
April 2, 2014
Reliability Education
Reliability is a discipline that has been studied for several decades.
Today several dozen graduate programs in the US and hundreds worldwide offer reliability courses, and some universities have entire reliability programs.
There is a gap between reliability theory and practice, between school and industry, book knowledge and real world applications.
Due to changes in technology, expectations of reliability engineers have been changing and rising.
Some Reliability books in … 1960s
Igor Bazovsky (1961), Reliability Theory and Practice
D. K. Lloyd and M. Lipow (1962), Reliability: Management, Methods, and Mathematics
N. H. Roberts (1964), Mathematical Models in Reliability Engineering
G. H. Sandler (1964), System Reliability Engineering
R. E. Barlow and F. Proschan (1965), Mathematical Theory of Reliability
[Diagram: Reliability Programs linked to Engineering, Computer Science, Operations Research & Management, and Statistics & Mathematics, with components Reliability Engineering, Reliability Management, and Reliability Statistics.]
In today’s global market, the only way to stay ahead of the competition is to provide:
Better products! Better service! Better customer experience every time!
Sample 3D TV
Boeing 787
Reliability Computing
Reliability requirement: 0.999999999
“The airplane systems and associated components … must be designed so that the occurrence of any failure condition which would prevent the continued safe flight and landing … is extremely improbable (1 per billion flights ~ 10^-9). Compliance … must be shown by analysis …”
FAA Federal Aviation Regulations 25.1309
Reliability Challenges From Theory to Practice
DATA QUALITY The Data of Everything!
Reliability Challenges From Theory to Practice PREDICTIVE MODELING * The Uncertainty in Modeling! * What Models Should Be Used?
Predictive Modeling
based on data and statistical methods
“Prediction is difficult, especially when it’s about the future!”
[Diagram: Testing Environments (controlled) and Operating Environments (random), linked by modelling, prediction, and application.]
Many reliability studies: Controlled Environment ≈ Operating Environment
Systemability model
$$\eta = \begin{cases} 1, & \text{controlled environment} \\ f(\eta), & \text{operating environment} \end{cases}$$
Reliability -- Definition
The probability that the system is still operating at time t:
$$R(t) = \int_t^{\infty} f(s)\,ds = e^{-\int_0^t h(s)\,ds} = e^{-H(t)}$$
where f(t) is the probability density function and h(t) is the failure intensity rate.
Systemability -- Definition
The probability that the system is still operating at time t subject to the uncertainty of the operating environments.
The systemability function is [Pham, 2005]:
$$R_s(t) = \int_{\eta} e^{-\eta \int_0^t h(s)\,ds}\, dF(\eta)$$
where F is a distribution function of η.
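Systemability approximations using Taylor series:
$$R_s(t) = \int_{\eta} e^{-\eta H(t)}\, dF(\eta) = E\left[e^{-\eta H(t)}\right] \approx e^{-\mu H(t)}\left[1 + \frac{\sigma^2 H^2(t)}{2!}\right]$$
where μ and σ² are the mean and variance of η.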
Loglog Distribution – Example
Assume system lifetime ~ Loglog(a, b) with failure rate
$$h(t) = b \ln(a)\, t^{b-1} a^{t^b}, \quad t > 0,\ a > 1,\ b > 0$$
System reliability function
$$R_1(t) = e^{1 - a^{t^b}}$$
Assume η ~ gamma(α, β)
[Figure: Loglog distribution. Failure rate h(t) versus t (0 to 250) for b = 0.5 and a = 1.1, 1.13, 1.15.]
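Loglog Dist. - Example
Systemability function
$$R_2(t) = \left[\frac{\beta}{\beta + a^{t^b} - 1}\right]^{\alpha}$$
Systemability approximations
$$R_3(t) = e^{-\frac{\alpha}{\beta}\left(a^{t^b} - 1\right)}\left[1 + \frac{\alpha}{2\beta^2}\left(a^{t^b} - 1\right)^2\right]$$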
Systemability vs Systemability approximation for a = 1.15, b = 0.05
[Figure: three panels plotting Systemability Functions R1, R2, R3 versus Time (0 to 100), for (alpha, beta) = (2, 3), (3, 2), and (12.5, 2.5).]
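The three curves compared in these plots can be evaluated numerically. The sketch below assumes the loglog cumulative hazard H(t) = a^(t^b) - 1, a gamma-distributed η with shape α and rate β (mean α/β, variance α/β²), and the second-order Taylor approximation around the mean; function names are illustrative and not taken from the presentation.

```python
import math


def cumulative_hazard(t, a, b):
    """Loglog cumulative hazard H(t) = a**(t**b) - 1 (t >= 0, a > 1, b > 0)."""
    return a ** (t ** b) - 1.0


def r1(t, a, b):
    """Reliability in the controlled environment (eta = 1): exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, a, b))


def r2(t, a, b, alpha, beta):
    """Systemability with eta ~ gamma(shape=alpha, rate=beta): (beta/(beta+H))**alpha."""
    return (beta / (beta + cumulative_hazard(t, a, b))) ** alpha


def r3(t, a, b, alpha, beta):
    """Second-order Taylor approximation of the systemability around E[eta]."""
    h = cumulative_hazard(t, a, b)
    mu = alpha / beta            # mean of eta (rate parameterization assumed)
    var = alpha / beta ** 2      # variance of eta
    return math.exp(-mu * h) * (1.0 + 0.5 * var * h ** 2)


if __name__ == "__main__":
    a, b, alpha, beta = 1.15, 0.05, 2.0, 3.0
    for t in (0, 25, 50, 75, 100):
        print(t, round(r1(t, a, b), 4),
              round(r2(t, a, b, alpha, beta), 4),
              round(r3(t, a, b, alpha, beta), 4))
```

For these parameter values the exact systemability and its approximation stay close over the plotted time range, while both sit above the controlled-environment reliability because the mean of η is below 1.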
What Models Should Be Used?
(H. Pham, “A New Software Reliability Model with Vtub-Shaped Fault-Detection Rate and the Uncertainty of Operating Environments”, Optimization, vol. 63, 2014; published online December 2013)
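Model Comparison
Model: m(t)
Goel-Okumoto (G-O): $m(t) = a\left(1 - e^{-bt}\right)$
Delayed S-shaped: $m(t) = a\left(1 - (1 + bt)e^{-bt}\right)$
Inflection S-shaped: $m(t) = \dfrac{a\left(1 - e^{-bt}\right)}{1 + \beta e^{-bt}}$
Yamada imperfect debugging: $m(t) = a\left[1 - e^{-bt}\right]\left[1 - \dfrac{\alpha}{b}\right] + \alpha a t$
PNZ model: $m(t) = \dfrac{a}{1 + \beta e^{-bt}}\left(\left[1 - e^{-bt}\right]\left[1 - \dfrac{\alpha}{b}\right] + \alpha t\right)$
Pham-Zhang model: $m(t) = \dfrac{1}{1 + \beta e^{-bt}}\left((c + a)\left(1 - e^{-bt}\right) - \dfrac{ab}{b - \alpha}\left(e^{-\alpha t} - e^{-bt}\right)\right)$
Dependent-parameter model 1: $m(t) = \alpha(1 + \gamma t)\left(\gamma t + e^{-\gamma t} - 1\right)$
Dependent-parameter model 2: $m(t) = m_0\left(\dfrac{\gamma t + 1}{\gamma t_0 + 1}\right)e^{-\gamma(t - t_0)} + \alpha(\gamma t + 1)\left[\gamma t - 1 + (1 - \gamma t_0)e^{-\gamma(t - t_0)}\right]$
Vtub-shaped fault-detection rate model: $m(t) = N\left[1 - \left(\dfrac{\beta}{\beta + a^{t^b} - 1}\right)^{\alpha}\right]$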
Criteria For Model Selection
MSE: measures the deviation between the predicted values and the actual observations
$$\mathrm{MSE} = \frac{\sum_{i=1}^{n}\left(\hat{m}(t_i) - y_i\right)^2}{n - l}$$
Predictive ratio risk (PRR): measures the distance of model estimates from the actual data against the model estimate
$$\mathrm{PRR} = \sum_{i=1}^{n}\left(\frac{\hat{m}(t_i) - y_i}{\hat{m}(t_i)}\right)^2$$
Predictive power (PP): measures the distance of model estimates from the actual data against the actual data
$$\mathrm{PP} = \sum_{i=1}^{n}\left(\frac{\hat{m}(t_i) - y_i}{y_i}\right)^2$$
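The three selection criteria can be computed directly from a list of model predictions and observations. The sketch below is a minimal illustration: it assumes the quantity subtracted from n in the MSE denominator is the number of model parameters (the slides do not define it), and the sample numbers are hypothetical, not results from the presentation.

```python
def mse(predicted, observed, n_params):
    """Sum of squared errors divided by (n - n_params).

    n_params is assumed to be the number of fitted model parameters.
    """
    n = len(observed)
    return sum((m - y) ** 2 for m, y in zip(predicted, observed)) / (n - n_params)


def prr(predicted, observed):
    """Predictive ratio risk: errors scaled by the model estimate."""
    return sum(((m - y) / m) ** 2 for m, y in zip(predicted, observed))


def pp(predicted, observed):
    """Predictive power: errors scaled by the actual observation."""
    return sum(((m - y) / y) ** 2 for m, y in zip(predicted, observed))


# Hypothetical cumulative fault counts and model estimates, for illustration.
observed = [3, 4, 4, 7]
predicted = [2.0, 4.0, 5.0, 8.0]

print(mse(predicted, observed, n_params=2))
print(prr(predicted, observed))
print(pp(predicted, observed))
```

Because PRR divides by the estimate and PP by the observation, PRR penalizes underestimation more heavily and PP penalizes overestimation more heavily; both are undefined when the divisor is zero.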
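Normalized Criteria Distance (NCD) value, D_k, measures the distance of the normalized criteria from the origin for the kth model:
$$D_k = \sqrt{\sum_{j=1}^{d}\left(\frac{C_{kj}}{\sum_{i=1}^{s} C_{ij}}\, w_j\right)^2}$$
where C_{ij} is the value of criterion j for model i (i = 1, 2, …, s) and w_j denotes the weight of criterion j for j = 1, 2, …, d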
Software System Test Data (System Software Reliability, 2006)
Week index | Exposure time (cum. system test hours) | Fault | Cum. fault
1 | 416 | 3 | 3
2 | 832 | 1 | 4
3 | 1248 | 0 | 4
4 | 1664 | 3 | 7
5 | 2080 | 2 | 9
6 | 2496 | 0 | 9
7 | 2912 | 1 | 10
8 | 3328 | 3 | 13
9 | 3744 | 4 | 17
10 | 4160 | 2 | 19
11 | 4576 | 4 | 23
12 | 4992 | 2 | 25
13 | 5408 | 5 | 30
14 | 5824 | 2 | 32
15 | 6240 | 4 | 36
16 | 6656 | 1 | 37
17 | 7072 | 2 | 39
18 | 7488 | 0 | 39
19 | 7904 | 0 | 39
20 | 8320 | 3 | 42
21 | 8736 | 1 | 43
Model Comparisons & Results
Model / Criteria | MSE (Rank) | PRR (Rank) | PP (Rank)
1. G-O Model | 6.61 (7) | 0.69 (1) | 1.10 (7)
2. Delayed S-shaped | 3.27 (5) | 44.27 (8) | 1.43 (8)
3. Inflection S-shaped | 1.87 (2) | 5.94 (5) | 0.90 (4)
4. Yamada imperfect debugging model | 4.98 (6) | 4.30 (4) | 0.81 (3)
5. PNZ model | 1.99 (3) | 6.83 (7) | 0.96 (6)
6. Pham-Zhang model | 2.12 (4) | 6.79 (6) | 0.95 (5)
7. Dependent-parameter model 1 | 43.69 (9) | 601.34 (9) | 4.53 (9)
8. Dependent-parameter model 2 | 24.79 (8) | 1.14 (2) | 0.73 (1)
9. Vtub-shaped fault-detection rate model | 1.80 (1) | 2.06 (3) | 0.77 (2)
Model Comparisons & Results (cont.)
Model / Criteria | MSE (Rank) | PRR (Rank) | PP (Rank) | NCD Value (Dk) | Model Rank
1. G-O Model | 6.61 (7) | 0.69 (1) | 1.10 (7) | 0.115843 | 6
2. Delayed S-shaped | 3.27 (5) | 44.27 (8) | 1.43 (8) | 0.139264 | 7
3. Inflection S-shaped | 1.87 (2) | 5.94 (5) | 0.90 (4) | 0.077194 | 2
4. Yamada imperfect debugging model | 4.98 (6) | 4.30 (4) | 0.81 (3) | 0.086315 | 5
5. PNZ model | 1.99 (3) | 6.83 (7) | 0.96 (6) | 0.082414 | 4
6. Pham-Zhang model | 2.12 (4) | 6.79 (6) | 0.95 (5) | 0.082015 | 3
7. Dependent-parameter model 1 | 43.69 (9) | 601.34 (9) | 4.53 (9) | 1.079700 | 9
8. Dependent-parameter model 2 | 24.79 (8) | 1.14 (2) | 0.73 (1) | 0.278587 | 8
9. Vtub-shaped fault-detection rate model | 1.80 (1) | 2.06 (3) | 0.77 (2) | 0.066303 | 1
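The NCD column can be recomputed from the MSE, PRR, and PP columns. The sketch below normalizes each criterion by its sum over the nine models and takes the Euclidean distance from the origin. It assumes equal weights of 1 for all criteria, which the slides do not state; the function name is illustrative.

```python
import math

# (MSE, PRR, PP) per model number, copied from the results table.
criteria = {
    1: (6.61, 0.69, 1.10),
    2: (3.27, 44.27, 1.43),
    3: (1.87, 5.94, 0.90),
    4: (4.98, 4.30, 0.81),
    5: (1.99, 6.83, 0.96),
    6: (2.12, 6.79, 0.95),
    7: (43.69, 601.34, 4.53),
    8: (24.79, 1.14, 0.73),
    9: (1.80, 2.06, 0.77),
}


def normalized_criteria_distance(table, weights=(1.0, 1.0, 1.0)):
    """Distance of each model's normalized, weighted criteria from the origin.

    Equal weights are an assumption made for this sketch.
    """
    n_criteria = len(weights)
    totals = [sum(values[j] for values in table.values()) for j in range(n_criteria)]
    return {
        model: math.sqrt(sum((weights[j] * values[j] / totals[j]) ** 2
                             for j in range(n_criteria)))
        for model, values in table.items()
    }


distances = normalized_criteria_distance(criteria)
ranking = sorted(distances, key=distances.get)  # smallest distance ranks first

for model in ranking:
    print(model, round(distances[model], 6))
```

With these weights the computed distances agree with the tabulated D_k values to the printed precision, and the Vtub-shaped model (model 9) ranks first.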
Reliability Opportunities: Big Data!
High Tech Companies in the past 20 years!
Amazon Inc. Founded: 1994
Yahoo Founded: 1994
eBay Founded: 1995
Google Founded: 1998
Facebook, Inc. Founded: 2004
YouTube Founded: 2005
Twitter Inc. Founded: 2006
Knowledge That Should Be Covered in Reliability Programs
[Diagram: Reliability Programs linked to Engineering Knowledge, Computer Skill, Statistics/Management Skill, and School-Industry Projects.]
Have a Wonderful Day!
Reliability Education Opportunity: “Reliability Analysis of Field Data”
25th Anniversary of Reliability Engineering @ University of Maryland
Vasiliy Krivtsov, PhD Sr. Staff Technical Specialist Reliability & Risk Analysis
Ford Motor Company
Discussion Outline
Introduction
Practical Importance of Reliability Analysis of Field Data
Modelling Peculiarities in Reliability Analysis of Field Data
Staggered Production/Sales
Bivariate Models (Time & Usage)
Seasonality
Data Maturation Issues
Illustrative Case Studies
Proposed Course Structure
Conclusions
Practical Importance of Reliability Analysis of Field Data
• Root cause analysis and future failure avoidance through statistical engineering inferences on the failure rate trends and factors (covariates) affecting them
• Lab test calibration by equating percentiles of the failure time distributions in the field and in the lab
• Cost avoidance through early detection of field reliability problems
• Cash flow optimization through the prediction of the required warranty reserve and/or the expected maintenance costs
Staggered Production/Sales
Nonparametric Estimation
Formalized Data Structure:
Number of vehicles | Time in service intervals (i = 1, …, k) | Failure time intervals (j = 1, …, k): 1 2 3 4 5 6 7 8 9 … k
v1 | 1 | r11
v2 | 2 | r21 r22
v3 | 3 | r31 r32 r33
v4 | 4 | r41 r42 r43 r44
v5 | 5 | r51 r52 r53 r54 r55
v6 | 6 | r61 r62 r63 r64 r65 r66
v7 | 7 | r71 r72 r73 r74 r75 r76 r77
v8 | 8 | r81 r82 r83 r84 r85 r86 r87 r88
v9 | 9 | r91 r92 r93 r94 r95 r96 r97 r98 r99
vk | k | rk1 rk2 rk3 rk4 rk5 rk6 rk7 rk8 rk9 … rkk
Number of failures at time unit interval j, with r0 = 0:
$$d_j = \sum_{p=j}^{k} r_{pj}$$
Risk set exposed at time unit interval j:
$$n_j = \sum_{p=j}^{k}\left(v_p - \sum_{q=1}^{j-1} r_{pq}\right)$$
Hazard function at the j-th failure time unit interval:
$$h_j = \frac{d_j}{n_j}$$
Numerical Example
Repairs by sales month (Jan'02 to Nov'02) and month in service:
Sales Month | Volume | Repairs in month in service 1, 2, 3, …
Jan'02 | 10,000 | 1 3 6 9 15 17 20 22 41 64
Feb'02 | 10,000 | 0 2 5 10 12 18 19 24 45
Mar'02 | 10,000 | 1 4 5 10 14 18 20 23
Apr'02 | 10,000 | 1 2 7 11 16 17 20
May'02 | 10,000 | 0 1 6 12 17 18
Jun'02 | 10,000 | 1 3 4 9 16
Jul'02 | 10,000 | 2 3 7 11
Aug'02 | 10,000 | 1 4 6
Sep'02 | 10,000 | 1 3
Oct'02 | 10,000 | 0
Nov'02 | 10,000 |
Time t | Risk Set n(t) | Repairs d(t) | Risk Set (corr) n'(t) | Hazard h(t)=d(t)/n'(t) | Cum Hazard H(t)=Σh(t) | Reliability R(t)=e^{-H(t)} | CDF F(t)=1-R(t)
0 | 110,000 | 0 | 110000 | 0 | 0 | 1 | 0
1 | 100,000 | 8 | 100000 | 0.00008 | 0.00008 | 0.99992 | 0.00008
2 | 90,000 | 25 | 89992 | 0.00028 | 0.00036 | 0.99964 | 0.00036
3 | 80,000 | 46 | 79967 | 0.00058 | 0.00093 | 0.99907 | 0.00093
4 | 70,000 | 72 | 69921 | 0.00103 | 0.00196 | 0.99804 | 0.00196
5 | 60,000 | 90 | 59849 | 0.00150 | 0.00347 | 0.99654 | 0.00346
6 | 50,000 | 88 | 49759 | 0.00177 | 0.00524 | 0.99478 | 0.00522
7 | 40,000 | 79 | 39671 | 0.00199 | 0.00723 | 0.99280 | 0.00720
8 | 30,000 | 69 | 29592 | 0.00233 | 0.00956 | 0.99049 | 0.00951
9 | 20,000 | 86 | 19523 | 0.00441 | 0.01396 | 0.98613 | 0.01387
10 | 10,000 | 64 | 9437 | 0.00678 | 0.02075 | 0.97947 | 0.02053
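The life-table columns of the numerical example can be reproduced from the repair counts by sales cohort. The sketch below uses the repair counts and the 10,000-unit monthly volume from the slide. It follows the convention that the corrected risk set n'(t) equals the nominal risk set minus all repairs recorded in earlier months in service, which is what the tabulated n'(t) values correspond to; variable names are illustrative.

```python
import math

# Repairs by month in service (1, 2, ...) for each monthly sales cohort,
# Jan'02 through Nov'02; the Nov'02 cohort has no service history yet.
repairs = [
    [1, 3, 6, 9, 15, 17, 20, 22, 41, 64],
    [0, 2, 5, 10, 12, 18, 19, 24, 45],
    [1, 4, 5, 10, 14, 18, 20, 23],
    [1, 2, 7, 11, 16, 17, 20],
    [0, 1, 6, 12, 17, 18],
    [1, 3, 4, 9, 16],
    [2, 3, 7, 11],
    [1, 4, 6],
    [1, 3],
    [0],
    [],
]
volume = 10_000          # vehicles sold per month
k = 10                   # longest observed time in service (months)

# d[t-1]: repairs in month-in-service t, summed over cohorts old enough.
d = [sum(row[t] for row in repairs if len(row) > t) for t in range(k)]

# Nominal risk set: cohorts with at least t months in service.
n = [volume * sum(1 for row in repairs if len(row) >= t) for t in range(1, k + 1)]

# Corrected risk set: nominal risk set minus all earlier repairs.
n_corr = [n[t] - sum(d[:t]) for t in range(k)]

hazard = [d[t] / n_corr[t] for t in range(k)]
cum_hazard = [sum(hazard[: t + 1]) for t in range(k)]
reliability = [math.exp(-h) for h in cum_hazard]
cdf = [1.0 - r for r in reliability]

for t in range(k):
    print(t + 1, n[t], d[t], n_corr[t],
          f"{hazard[t]:.5f}", f"{cum_hazard[t]:.5f}",
          f"{reliability[t]:.5f}", f"{cdf[t]:.5f}")
```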
Mechanical Transfuser Example: 24MIS/Unlm usage warranty plan
Mechanical Transfuser: Nonparametric Inferences
[Figure: nonparametric CDF estimate F(t) versus time t (0 to 12 months in service, F from 0 to 0.020), with points labeled by repair counts 8, 25, 46, 72, 90, 88, 79, 69, 86.]
~1.4% failing @ 9 MIS
Concavity is an indication of an IFR. Note: F(t) ≈ H(t), for small F(t).
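Nonparametric Estimation under a Bivariate Warranty Plan
Risk set exposed at time unit interval j:
$$n_j = \Pr(\text{mileage within warranty limit at interval } j)\,\sum_{p=j}^{k}\left(v_p - \sum_{q=1}^{j-1} r_{pq}\right)$$
where the first factor is the probability of mileage not exceeding the warranty mileage limit at failure time unit interval j.
[Figure: Mileage (12,000; 36,000; 60,000) versus t, MIS (12, 24, 36, 48).]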
Weibull Probability Plot: Mechanical Transfuser Data
[Figure: Weibull probability plot (ReliaSoft Weibull++ 7, www.ReliaSoft.com). Axes: Time t (log scale, 0.1 to 100) versus CDF F(t) in percent (0.001 to 99). Fit: Weibull-2P, RRX, SRM MED, FM; F=627/S=99373; 95% 2-sided confidence bounds. Data points labeled by repair counts 8, 25, 46, 72, 90, 88, 79, 69, 86, 64. Annotation: 13.64% at t = 24.]
Mechanical Transfuser – Warranty Forecast Summary:
Failure probability @ 24MIS: 0.1364
Population Size: 110,000
Total Expected Repairs: 15,004
Cost per repair: $30
Total Expected Warranty Cost: $450,120
Year-to-date Cost: $18,810
Required Warranty Reserve: $431,310
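The figures in the warranty forecast summary follow from a few multiplications and one subtraction. The sketch below reproduces that arithmetic using the values stated on the slide; variable names are illustrative.

```python
# Inputs as stated in the warranty forecast summary.
failure_prob_24mis = 0.1364    # failure probability at 24 months in service
population = 110_000           # vehicles in the population
cost_per_repair = 30           # dollars
year_to_date_cost = 18_810     # dollars already spent on repairs

expected_repairs = round(failure_prob_24mis * population)
total_expected_cost = expected_repairs * cost_per_repair
required_reserve = total_expected_cost - year_to_date_cost

print(f"Total expected repairs: {expected_repairs:,}")
print(f"Total expected warranty cost: ${total_expected_cost:,}")
print(f"Required warranty reserve: ${required_reserve:,}")
```

The year-to-date cost equals 627 repairs at $30 each, which is consistent with the F=627 failure count shown on the Weibull plot.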
Calendarized Forecasting
How will this total number of repairs be distributed along the calendar time, i.e. how many repairs to expect next month, the following month, etc.?
11
Calendarized Forecast (generic example)
Rows: time in service. Columns: calendar time.

Time  Parametric             Population Exposed                         Predicted Number of Repairs
                    thru      in      in   …      in      in       thru      in      in   …      in      in
                  Oct'02  Nov'02  Dec'02      Sep'04  Oct'04     Oct'02  Nov'02  Dec'02      Sep'04  Oct'04
   0           0  110000       0       0           0       0          0
   1      0.0001  100000   10000       0           0       0          6       1       0           0       0
   2      0.0003   89992   10008   10000           0       0         27       3       3           0       0
   3      0.0006   79967   10025   10008           0       0         49       6       6           0       0
   4      0.0010   69921   10046   10025           0       0         69      10      10           0       0
   5      0.0014   59849   10072   10046           0       0         84      14      14           0       0
   6      0.0019   49759   10090   10072           0       0         92      19      19           0       0
   7      0.0023   39671   10088   10090           0       0         93      24      24           0       0
   8      0.0029   29592   10079   10088           0       0         84      29      29           0       0
   9      0.0034   19523   10069   10079           0       0         66      34      34           0       0
  10      0.0039    9437   10086   10069           0       0         37      40      40           0       0
  11      0.0045       0    9437   10086           0       0          0      43      46           0       0
  12      0.0051       0       0    9437           0       0          0       0      48           0       0
  13      0.0057       0       0       0           0       0          0       0       0           0       0
  14      0.0063       0       0       0           0       0          0       0       0           0       0
  15      0.0069       0       0       0           0       0          0       0       0           0       0
  16      0.0076       0       0       0           0       0          0       0       0           0       0
  17      0.0082       0       0       0           0       0          0       0       0           0       0
  18      0.0088       0       0       0           0       0          0       0       0           0       0
  19      0.0094       0       0       0           0       0          0       0       0           0       0
  20      0.0100       0       0       0           0       0          0       0       0           0       0
  21      0.0106       0       0       0           0       0          0       0       0           0       0
  22      0.0112       0       0       0           0       0          0       0       0           0       0
  23      0.0118       0       0       0       10000       0          0       0       0         118       0
  24      0.0124       0       0       0       10008   10000          0       0       0         124     124
total ->                                                            609     222     272   …     242     124

Total over all calendar periods: 15,004

$d_j = \sum_{i=j}^{k} f(t_i)\, n_{ij}\, (t_i - t_{i-1})$
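A minimal sketch of the calendarized calculation, assuming the forecast for one calendar month is the sum over time-in-service rows of (parametric failure density) × (population exposed) × (time step). The function name and the reading of the columns are assumptions. The numbers are the "Parametric" values and the "in Nov'02" population-exposed column of the generic example, rows 1 through 11.

```python
def calendarized_repairs(density, exposed, dt=1.0):
    """Predicted repairs for one calendar period.

    density[i] : parametric failure density at time-in-service step i
    exposed[i] : units at that time in service during the calendar period
    dt         : width of a time-in-service step (one month here)
    """
    if len(density) != len(exposed):
        raise ValueError("density and exposed must have the same length")
    return sum(f * n * dt for f, n in zip(density, exposed))


# Rows 1..11 of the generic example
f_t = [0.0001, 0.0003, 0.0006, 0.0010, 0.0014, 0.0019,
       0.0023, 0.0029, 0.0034, 0.0039, 0.0045]
n_nov02 = [10000, 10008, 10025, 10046, 10072, 10090,
           10088, 10079, 10069, 10086, 9437]

d_nov02 = calendarized_repairs(f_t, n_nov02)
print(round(d_nov02))  # 222, the "in Nov'02" column total in the example
```

Repeating the call for each calendar month, with that month's exposure column, gives the month-by-month distribution of the total expected repairs.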
Time vs. Usage
Time or usage?
[Figure: two scatter plots of time versus mileage.]
First panel. Note: DFR in time domain; DFR in mileage domain.
Second panel. Note: IFR in time domain; DFR in mileage domain.
Depending on variability in mileage accumulation rates of individual vehicles, the same data may result in a contradicting inference in time and mileage domains.
Time or usage? (Hu, Lawless & Suzuki, 1998)
[Figure: cumulative hazard H(t) versus time (MIS), one curve per mileage accumulation rate: ~1.1K/mo, ~1K/mo, ~0.8K/mo, ~0.9K/mo, ~0.6K/mo.]
Note: the cumulative hazard functions in the time domain appear to depend on the mileage accumulation rate, which suggests that time may NOT be the appropriate domain for this failure mode.
[Figure: cumulative hazard H(t) versus mileage, for the same mileage accumulation rates: ~1.1K/mo, ~1K/mo, ~0.8K/mo, ~0.9K/mo, ~0.6K/mo.]
Note: the cumulative hazard functions in the mileage domain appear to be independent of the mileage accumulation rate, which suggests that mileage may be the appropriate domain for this failure mode.
Time or usage? (Kordonsky & Gertsbakh, 1997)
[Figure: failure density f(t) plotted against time (MIS) and against mileage.]
Choose the scale that provides the lower coefficient of variation of the respective failure distribution.
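The selection rule can be sketched as follows, assuming the coefficient of variation is estimated from sample data as standard deviation divided by mean. The failure values and scale names are hypothetical, chosen only to illustrate the comparison.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample coefficient of variation: standard deviation / mean."""
    return stdev(values) / mean(values)


def preferred_scale(failures_by_scale):
    """Return the name of the scale whose failure data has the lowest CV."""
    return min(failures_by_scale,
               key=lambda scale: coefficient_of_variation(failures_by_scale[scale]))


# Hypothetical failure data for the same five units, recorded on two scales
failures = {
    "time_mis": [3, 9, 14, 20, 31],        # months in service at failure
    "mileage_k": [18, 20, 22, 24, 26],     # thousands of miles at failure
}

for scale, data in failures.items():
    print(scale, round(coefficient_of_variation(data), 3))
print(preferred_scale(failures))  # mileage_k
```

In this made-up data set the failures cluster much more tightly in mileage than in time, so mileage would be the preferred scale under the rule.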
Data Maturity
Data Maturity: Lot Rot
Data Maturity Problem:
CDF estimates for a nominally homogeneous population at a fixed failure time change as a function of the observation time.
[Figure: F(t) versus t at various observation times (Jan’06, Mar’06, May’06), with a fixed failure time t0 marked.]
Possible cause:
“Lot Rot”, i.e., vehicle reliability degrades from sitting on the lot prior to being sold.
Solution:
Stratify the vehicle population by the time spent on lot (the difference between sale date and production date).
[Figure: F(t) versus t at observation times Jan’06, Mar’06, May’06, with t0 marked, for units with 0-10 days on lot.]
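One way to carry out the stratification is to bin vehicles by days on lot before estimating F(t) within each bin. This is a minimal sketch; the record layout (id, production date, sale date), the 10-day bin width, and the half-open bins are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def days_on_lot(production_date, sale_date):
    """Time on lot = sale date minus production date, in days."""
    return (sale_date - production_date).days


def stratify_by_lot_time(vehicles, bin_width_days=10):
    """Group vehicle ids into half-open bins [lower, lower + width) of days on lot."""
    strata = defaultdict(list)
    for vehicle_id, produced, sold in vehicles:
        lower = (days_on_lot(produced, sold) // bin_width_days) * bin_width_days
        strata[(lower, lower + bin_width_days)].append(vehicle_id)
    return dict(strata)


# Hypothetical records: (id, production date, sale date)
vehicles = [
    ("A", date(2006, 1, 2), date(2006, 1, 7)),    # 5 days on lot
    ("B", date(2006, 1, 2), date(2006, 2, 15)),   # 44 days on lot
    ("C", date(2006, 1, 10), date(2006, 1, 18)),  # 8 days on lot
]

strata = stratify_by_lot_time(vehicles)
print(strata)
```

CDF estimates would then be computed separately for each stratum, so that units with similar lot exposure are compared across observation times.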
Data Maturity: Reporting Delays
Data Maturity Problem:
CDF estimates for a nominally homogeneous population at a fixed failure time change as a function of the observation time.
[Figure: F(t) versus t at various observation times (Jan’06, Mar’06, May’06), with a fixed failure time t0 marked.]
Possible cause:
The number of claims processed at each observation time is under-reported due to the lag between repair date and warranty system entry date.
Solution:
Adjust* the risk set by the probability of the lag time, Wj.
[Equation: the adjusted risk set, a double sum (over p from j to k and over q) involving the counts n and the lag-time probabilities W; the full expression is not recoverable from the extracted text.]
[Figure: F(t) versus t at observation times Jan’06, Mar’06, May’06, with t0 marked. At each observation time, risk sets are adjusted to account for the under-reported claims.]
* J. Kalbfleisch, J. Lawless and J. Robinson, "Methods for the Analysis and Prediction of Warranty Claims", Technometrics, Vol. 33, No. 3, 1991, pp. 273-285.
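The idea of adjusting the risk set for reporting lag can be illustrated with a simplified sketch. It is not a transcription of the slide's formula: it assumes that a unit sold at time x contributes to the risk set at age t with weight P(lag ≤ T − x − t), where T is the observation time, so units whose claims could not yet have been entered count for less. All names and numbers are hypothetical.

```python
def adjusted_risk_set(units_sold, lag_cdf, obs_time, age):
    """Lag-adjusted number of units at risk at a given age.

    units_sold : dict mapping sale time -> number of units sold then
    lag_cdf    : function l -> P(reporting lag <= l), for l >= 0
    obs_time   : time at which the warranty data are extracted
    age        : time in service being analysed
    """
    total = 0.0
    for sale_time, count in units_sold.items():
        slack = obs_time - sale_time - age  # time left for a claim to be entered
        if slack >= 0:                      # unit has reached this age
            total += count * lag_cdf(slack)
    return total


def example_lag_cdf(lag):
    """Hypothetical lag distribution: 50% entered at once, 80% within 1, all within 2."""
    return {0: 0.5, 1: 0.8}.get(lag, 1.0)


units_sold = {0: 100, 1: 100, 2: 100}
plain = adjusted_risk_set(units_sold, lambda lag: 1.0, obs_time=3, age=1)
adjusted = adjusted_risk_set(units_sold, example_lag_cdf, obs_time=3, age=1)
print(plain, adjusted)  # 300.0 230.0
```

Dividing the reported claims at a given age by the adjusted (smaller) risk set raises the estimated claim rate, compensating for claims that have occurred but are not yet in the system.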
Data Maturity: Warranty Expiration Rush
Data Maturity Problem:
CDF estimates for a nominally homogeneous population increase disproportionately as a function of the observation time and proximity to the warranty expiration time.
[Figure: F(t) versus t at observation times Jan’06, Mar’06, May’06, with t0 and the warranty expiration time tw marked.]
Possible cause:
“Soft” (non-critical) failures tend not to get reported until the customer realizes the proximity of the warranty expiration date.
Solution:
Use historical data on similar components to empirically* adjust for the warranty-expiration rush phenomenon.
[Figure: F(t) versus t at observation times Mar’04 and May’04, with t0 and tw marked. Caption: A basis for adjustment.]
* B. Rai, N. Singh, “Modeling and analysis of automobile warranty data in presence of bias due to customer-rush near warranty expiration limit”, Reliability Engineering & System Safety, Vol. 86, Issue 1, pp. 83-94.
Development of a Successful Program in Reliability and Maintainability Engineering
Dr. Wes Hines, Head, Nuclear Engineering
College of Engineering, The University of Tennessee
UMD Reliability Engineering Symposium, April 2, 2014
Overview
• Goal
– Provide a case study that may be useful in developing new reliability programs.
• Outline
– What reliability programs do we have at UT
– History of how they were developed
– Components of the program
– What makes them successful
Reliability Programs at UT
• Reliability and Maintainability Center (RMC)
– University-industry association dedicated to improving industrial productivity, efficiency, safety & profitability through advanced maintenance and reliability technologies and management principles
– Industrial Center since 1996 with ~30 members
• Reliability and Maintainability Engineering Program (RME)
– Interdisciplinary Academic Program
  • Undergraduate Minor in RME
  • Graduate Certificate and/or MS in RME
– Local or Synchronous, Interactive Distance Delivery
• Prognostics, Reliability Optimization and Control Technologies (PROaCT) Laboratory
– Interdisciplinary research program with professors and students in industrial, mechanical, nuclear engineering, and statistics.
UT History in Industry-Focused RME
• 1988 - Preventive Maintenance Engineering Laboratory (PMEL) under Nuclear Engineering
• 1995 - Proposal to Develop College-wide Maintenance and Reliability Center (MRC)
– Industry roundtable in July
– Director named in September
• 1996 - Initial Meeting with 12 Charter Members
• 1997 - NSF Combined Research and Curricula Development (CRCD) Grant to develop 4 MRE courses
• 1997 - Internship Program Created
• 2000 - Initial Academic Program
– Undergraduate Certificate
– Graduate Certificate
• 2007 - New RME Programs Approved
– Master of Science in Reliability and Maintainability Engineering
– Undergraduate Minor in Reliability and Maintainability Engineering
• 2009 - MS with Specialization in Prognostics
• 2010 - RME Minor most utilized minor in the COE
UT Reliability and Maintainability Center
The Maintenance and Reliability Center is a university-industry association dedicated to improving industrial productivity, efficiency, safety & profitability through advanced maintenance and reliability technologies and management principles.
* Education
* Research & Technology Assessment
* Information Sharing
* Business Support & Alliances
50 Companies with a Desire to Improve
Components of Reliability and Maintainability Engineering Program
• Process vs. Product Focus
• Original Academic Programs
– Undergraduate Certificate with Industry Partnership
  • Coursework (2 courses)
  • Summer Bootcamp
  • Internship (12 weeks)
– Graduate Certificate
  • 4 courses: 12 hours
  • Stats 560 Mathematical Statistics for Reliability
  • NE 483 Introduction to Reliability Engineering
  • NE 484 Advanced Maintenance Engineering
  • NE 579 Advanced Monitoring and Diagnostic Techniques
Internship Class of 2000
Internship Class of 1998
Alcoa, Bayer, Dow, DuPont, Eastman, Energizer, Fluor Global, Harley Davidson, Jacobs, Nissan, NiSource, Novelis, ORNL, Owens Corning, Redstone Arsenal, SABIC, Schlumberger, SNL, Y-12, ….
Boot Camp Course
Internships
Maintenance Technology Teaching Labs
Real Time Interactive Distance Delivery
• Supports the working class.
• Courses are delivered live and interactively (i.e., synchronous delivery) to the student's desktop computer via the World Wide Web
• Taught in “Dual Delivery” format
• Instructor wears wireless microphone
• Local students attend class or log in from home
• Distance students
– Multipoint audio communication
– View slides, whiteboard, demos, etc.
– Students can raise hands
– Make presentations to class
– Courses archived
• Content Delivery Methods
– PowerPoint slides
– Whiteboard
– Windows application sharing
– Video or audio clips
Graduate Programs in Reliability and Maintainability Engineering
• Interdisciplinary program offered by the College of Engineering through one of the following six departments:
– Chemical and Biomolecular Engineering
– Electrical Engineering and Computer Science
– Industrial and Systems Engineering
– Materials Science and Engineering
– Mechanical, Aerospace and Biomedical Engineering
– Nuclear Engineering
• Offered on campus and through web-based, synchronous, interactive, distance education.
• The RME graduate certificate program (12 hours) is designed to allow the credits to be applied towards an M.S. degree.
Support and Integrate with Research Programs
Give your COE Graduates a Niche (RME Minor)
Fifteen hours of coursework are required:
Core courses: 6 hours
– Introduction to Maintenance Engineering
– Introduction to Reliability Engineering
Statistics or Math Requirement (choose 1): 3 hours
– Probability and Statistics for Scientists and Engineers (Stats 251)
– Probability and Statistics (Math 323)
– Chemical Engineering Data Analysis (ChE 301)
– Probability and Random Variables (ECE 313)
Electives (choose at least 2): 6 hours
– Process Dynamics and Control (ChE 360)
– Engineering Data Analysis and Process Improvement (IE 300)
– Statistical Process Control (Stats 365) (for non IE)
– Process Improvement through Planned Experimentation (IE 440)
– Signals and Systems (ECE 315)
– Introduction to Pattern Recognition (ECE 471)
– Mechanical Engineering Instrumentation and Measurement (ME 345)
– System Dynamics (ME 363)
– Nuclear and Radiological Engineering Laboratory (NE 304)
Total: 15 hours
• 10% of COE graduates have the RME Minor – most desired minor in COE
Summary
• Garner strong industrial support
– Get their input on curriculum and laboratories
– Partner through internship programs
– Partner with research opportunities
– Meet their needs!
• Make it available to a wide range of students
– An interdepartmental college-based program reaches more students
– Increase your reach through distance education
• Build expertise to increase industrial and government research opportunities
• Explain the employment benefits to increase enrollment and promote student success (students will figure this out themselves)
Questions?
BREAK 3:30 p.m. – 4:00 p.m.
Dr. Darryll Pines, Nariman Farvardin Professor; Dean, A. James Clark School of Engineering
FUTURE AND IMPACT OF RELIABILITY ENGINEERING AT THE CLARK SCHOOL
Special Presentation
Mpact and Future of Reliability Engineering
DARRYLL J. PINES
APRIL 2, 2014
25TH ANNIVERSARY OF CENTER ON RISK AND RELIABILITY
The Center for Risk and Reliability (CRR) was formed in 1989 as the umbrella organization for many of the risk and reliability research and development activities at the UMD Clark School of Engineering. CRR research covers a wide range of subjects involving systems and processes, and includes topics on predictive reliability modeling and simulation, physics of failure fundamentals, software reliability and human reliability analysis methods, advanced probabilistic inference methods, system-level health monitoring and prognostics, and risk analysis theory and applications to complex systems such as space missions, civil aviation, nuclear power plants, petro-chemical installations, medical devices, information systems, and civil infrastructures. Over 20 core and adjunct faculty from various engineering departments of the Clark School of Engineering form the pool of experts at CRR. CRR is also home to numerous research laboratories with extensive state-of-the-art equipment and high-performance computers. CRR is the research arm of the Reliability Engineering educational program, the largest and most comprehensive degree-granting graduate program in the field of reliability and risk analysis of engineered systems and processes. The program offers MS, PhD, and Graduate Certificate in Reliability Engineering and Risk Analysis. All courses are available through both traditional on-campus and online delivery modes.
Center for Risk and Reliability
Current Core Faculty
Professor Neil Goldsman (ECE)
Professor Carol Smidts (ME, OSU)
Professor Joseph Bernstein (ECE, Israel)
Adjunct Faculty and Lecturers
Dr. Stuart Katzke (NIST)
Dr. Nathan Siu (NRC)
Dr. Norman Eisenberg (Independent Consultant)
Dr. Mark Kaminskiy (CRR-CEE)
Dr. Roy Schuyler (Independent Consultant)
Affiliate and Adjunct Faculty
Al-Sheikhly, Mohamad: Professor, Materials Science and Engineering; 2309F Chemical and Nuclear Engineering Building; Phone: 301-405-5214
Desai, Jaydev: Professor, Mechanical Engineering; 0160 Glenn L. Martin Hall; Phone: 301-405-4427
di Marzo, Marino: Professor, Fire Protection Engineering; 3104B JM Patterson Building; Phone: 301-405-5257
Sandborn, Peter: Professor, Director of MTECH, Mechanical Engineering; 2106A Glenn L. Martin Hall; Phone: 301-405-3167
Schmidt, Linda: Associate Professor, Mechanical Engineering; 2104B Glenn L. Martin Hall; Phone: 301-405-0417
Mpact-Rankings
Microsoft Academic Ranking, Reliability Engineering (based on publications)
1. City University of Hong Kong
2. Sandia National Laboratories
3. University of Southern California
4. National University of Singapore
5. University of California Berkeley
6. Politecnico di Milano
7. University of Electronic Science & Technol...
8. University of Maryland
9. University of Manchester
10. Stanford University
2010 NRC Rankings (Industrial Engineering, Operations Research, Reliability Engineering)
1. Stanford University-Management Science and Engineering, 1-2
2. Massachusetts Institute of Technology-Operations Research, 2-4
3. Georgia Institute of Technology-Main Campus-Industrial Engineering, 1-3
4. Northwestern University-Industrial Engineering and Management Sciences, 4-12
5. Carnegie Mellon University-Operations Res/Information Systems/Manufacturing and Operating Systems, 4-17
6. University of California-Berkeley-Industrial Engineering and Operations Research, 3-10
7. University of Michigan-Ann Arbor-Industrial Operations and Engineering, 4-11
8. Cornell University-Operations Research, 6-18
9. Carnegie Mellon University-Engineering and Public Policy, 8-28
10. Purdue University-Main Campus-Industrial Engineering, 6-22
11. Princeton University-Operations Research and Financial Engineering, 11-29
12. University of Iowa-Industrial Engineering, 11-37
13. University of Nebraska-Lincoln-Industrial and Management Systems Engineering, 28-65
14. University of Wisconsin-Madison-Industrial Engineering, 6-22
15. Virginia Polytechnic Institute and State University-Industrial and Systems Engineering, 5-28
16. University of Florida-Industrial and Systems Engineering, 12-40
17. University at Buffalo-Industrial Engineering, 27-53
18. University of Pennsylvania
19. Operations and Information Management, 5-26
20. Arizona State University-Industrial Engineering, 11-34
21. Pennsylvania State University-Main Campus-Industrial and Manufacturing Engineering, 7-23
22. University of Pittsburgh-Pittsburgh Campus-Industrial Engineering, 30-55
23. University of Maryland-College Park-Reliability Engineering, 6-29
2015 US News Mechanical Engineering Ranking
1. Stanford University
1. Massachusetts Institute of Technology
3. California Institute of Technology
3. University of California--Berkeley
5. Georgia Institute of Technology
5. University of Illinois--Urbana-Champaign
5. University of Michigan--Ann Arbor
8. Princeton University
8. Cornell University
10. Purdue University--West Lafayette
10. Carnegie Mellon University
10. University of Texas--Austin (Cockrell)
13. University of California--Los Angeles (Samueli)
13. Northwestern University (McCormick)
13. Johns Hopkins University (Whiting)
16. University of Minnesota--Twin Cities
17. University of Maryland--College Park (Clark)
17. Pennsylvania State University--University Park
17. Texas A&M University--College Station (Look)
17. Virginia Tech
21. University of California--San Diego (Jacobs)
21. University of Wisconsin--Madison
23. Rensselaer Polytechnic Institute
23. Ohio State University
23. University of Washington
Mpact-Prestige
Professional Society Fellows of Center
• Mohammad Modarres
– Fellow, American Nuclear Society
• Ali Mosleh
– Fellow, Society for Risk Analysis
• Bilal Ayyub
– Fellow, ASEE
• Shapour Azarm
– Fellow, ASME
• Greg Baecher
– Fellow, ASCE
• Aris Christou
– Fellow, ASME
– Fellow, APS
Faculty Service on Leading Journals
• Editorial Boards/Associate Editors
– Reliability Engineering and System Safety Journal
– Journal of Risk and Reliability
– International Journal on Performability Engineering
– International Journal of Reliability and Safety (IJR)
– SNAME’s Journal of Ship Research, Ships and Offshore Structures Journal, Naval Engineers Journal (NEJ)
Mpact-NAE For contributions to the development of Bayesian methods and computational tools in probabilistic risk assessment and reliability engineering.
For contributions to national defense and security through improved battlefield communication. Also Inducted in May 2004 for innovative engineering and entrepreneurship in communications technologies.
For the development, explication, and implementation of probabilistic- and reliability-based approaches to geotechnical and water-resources engineering.
Mpact-Awards
Significant Junior Faculty Awards/Recognition
1. Michel Cukier, NSF CAREER
2. Jeffrey Herrmann, Innovator of year
3. Monifa Vaughn-Cooke
Mpact-Book/Monograph Contributions
Mpact-Partnerships
CRR Research Partnerships
• Cooperative Research Agreements with government agencies:
– US NRC
– US Navy / NAVAIR-NAWCAD
– NASA
– EC Halden Research Center, Norway
– EEC Joint Research Center, Italy
– ETH Center for System Safety, Switzerland
– Norwegian Institute of Technology
– Paul Scherrer Research Institute, Switzerland
• Partnership with the industry:
– ManTech
– Reliability Information Analysis Center (RIAC) Partnership
Mpact-Education Innovations
Professional Education-OAEE
• Online Professional Masters Degree
• Graduate Certificate
#1 Columbia University (Fu Foundation), New York, NY
#2 University of California--Los Angeles (Samueli), Los Angeles, CA
#3 University of Wisconsin--Madison, Madison, WI
#4 University of Southern California (Viterbi), Los Angeles, CA
#5 Pennsylvania State University--World Campus, College, PA
#6 Purdue University--West Lafayette, West Lafayette, IN
#7 University of Michigan--Ann Arbor, Ann Arbor, MI
#7 Virginia Tech, Blacksburg, VA
#9 North Carolina State University, Raleigh, NC
#9 Texas A&M University--Kingsville (Dotterweich), Kingsville, TX
#11 Arizona State University (Fulton), Tempe, AZ
#12 Polytechnic Institute of New York University, New York, NY
#12 South Dakota School of Mines and Technology, Rapid City, SD
#14 Johns Hopkins University (Whiting), Baltimore, MD
#14 University of Maryland--College Park (Clark), College Park, MD
#16 California State University--Fullerton, Fullerton, CA
#17 Cornell University, Ithaca, NY
#17 Lawrence Technological University, Southfield, MI
#17 Missouri University of Science & Technology, Rolla, MO
#20 Texas Tech University (Whitacre), Lubbock, TX
since 1993 are as follows:
MS – 211
PhD – 97
Per OAEE’s records, the Master of Engineering and Graduate Certificate in Engineering degrees awarded since 1994 and 2000, respectively, are as follows:
M. Eng. Reliability On-Campus: 46
M. Eng. Reliability Online: 16
Total M. Eng.: 62
GCEN Reliability On-Campus: 10
GCEN Reliability Online: 22
Total GCEN: 32
Mpact-Placement of MEng, MS and PhDs
2006 Kristine Fretz (currently with Johns Hopkins Applied Research Lab.)
2004 S. Chamberlain (currently ITT - Industrial Products Group Reliability Specialist and Area Manager, ITT Industries)
2003 Chi Yeh (currently Systems Engineering & Integration Branch, NASA, Glenn Research)
2001 F. Li (currently Materials Research Scientist, Corning, Inc.)
2000 F. Joglar (currently Manager, Fire Risk Group, SAIC)
2000 V. Krivtsov (currently Ford Technical Leader for Reliability & Statistical Analysis, Ford Motor Company)
1998 H. Hadavi (Energy Research Corp., Rockville, MD)
1998 J. O’Brien (currently Director of Office of Nuclear Safety, DOE)
1998 Y. Guan (President and CEO, Advanced System Technology Management, Inc.)
1998 K. Ouliddren (currently Staff Researcher, Nuclear Research Centre SCK-CEN, Mol, Belgium)
1997 T. Ni (currently Deputy Dean, Shanghai University, China)
1997 A. Thunem (currently Halden Reactor Project, Norway)
1994 Y-S. Hu (currently Dean, Beijing Technology & Business University, and CEO of DML International Corp.)
1991 L. Hammersten (currently Research Analyst, MITRE Corp.)
1990 L. Chen (VP at JP Morgan)
Work on Grand Challenge Problems
Disaster Resilience
• Risk and Reliability of Critical Infrastructure
Work on Grand Challenge Problems
Global Public and Human Health
• Risk and Reliability of Devices and Systems
What of the Future?
New Faculty Hires in ME-Reliability Engineering:
• Monifa Vaughn-Cooke
• Offers out to at least 2 individuals
Facilities:
• Upgrades to Virtual Reality Cave under review to support future research thrusts
Education:
• Develop MOOC Course Series in Reliability Engineering
Some Perspectives from Dilbert
Dr. Monifa Vaughn-Cooke Assistant Professor
FACULTY VISION FOR THE FUTURE OF RELIABILITY ENGINEERING
Dr. Mohammad Modarres, Minta Martin Professor
CLOSING REMARKS
Join us for the ANNIVERSARY RECEPTION AND ALUMNI REUNION
5:00 p.m. – 7:00 p.m.
Presentations by Dr. Marvin Roush, Professor Emeritus; Frank Groen, Ph.D., ’00; Tim Hajenko, M.S., ’13; Lesa Ross, Ph.D., ’09; Ken LaSala, Ph.D., ’93
THANK YOU TO OUR GENEROUS EVENT SPONSORS
ISSA Technologies, Inc.