Cantilever Beam End-to-End UQ Test Problem
and Evaluation Criteria for
UQ Methods Performance Assessment
Vicente Romero, Ben Schroeder, Matt Glickman,
Justin Winokur
Sandia National Laboratories
Albuquerque, NM
ASME Verification & Validation Symposium
May 3-5, 2017
Sandia is a multi-program laboratory managed and operated by Sandia Corporation, a
wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of
Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Sandia National Laboratories document SAND2017-4592 C (unlimited release).
Introduction
• This talk outlines an End-to-End (E2E) UQ model problem
– a “test” problem: the physics and uncertainty Truth Models will
be released in a year or two, possibly for self-checks and workshops
– has many features of real problems that make it useful for
significant assessment of UQ and Credibility approaches
and their potential practicality and effectiveness on real
problems
– this talk just outlines the problem; no approaches or results are presented
• Performance Metrics are introduced that can help
quantify UQ method reliability, conservatism, and
risk over thousands of realizations of incomplete
uncertainty information, and how the methods handle it
– Opportunity to benchmark and compare methods
using specified performance metrics
UQ Elements in End-to-End Analysis
[Figure: block diagram of UQ elements; recoverable labels listed below]
• Uncertainty Characterization and Representation
– Model and Experiment inputs & results
– corrections and/or uncertainties
– uncertainty categorization: probabilistic v. non-probabilistic; random v. systematic; aleatory v. epistemic; traveling v. non-traveling
– uncertainty representation: intervals, distributions, Pboxes, random fields, discrete
• Model Validation
• Model Calibration
• Model Conditioning
• Calculation or Solution Verification
• Model Input uncertainties: continuous, discrete, probabilistic, non-probabilistic
• Uncertainty Quantification
– propagation, extrapolation, aggregation
– model prediction-bias uncertainty
– solution bias uncertainty estimation
– performance margin UQ
• Uncertainty “Roll Up” (any or all of): propagated uncertainty + extrapolated uncertainty + solution error uncertainty
• Margin Assessment: design margins, safety margins, model acceptability
• Sensitivity Analysis
• UQ Based Resource Allocation
Motivation:
A Relatively Simple End-to-End UQ Problem
(generic, not the Beam Problem)
• Stochastic system, e.g. randomly varying geometries and material properties
• Several more layers and significant sources of uncertainty are usually involved
[Figure: workflow diagram; recoverable labels listed below]
– Experimental Data Set A (scalar input)
– Calibrated sub-model is state-variable dependent (uncertain function)
– Experimental Data Set B (scalar input)
– Model in validation conditions; propagate uncertainty realizations through model
– Model predictions and experimental results (inset plot: cumulative probability, 0 to 1, vs. “E# Al Box n fluence”, 2e+15 to 1.2e+16)
– Scalar response outputs (functional outputs much more difficult to effectively compare)
– Model Validation; validation information
– New uncertainties specific to application conditions
– Predictions at application conditions
– QMU analysis where a response threshold is specified for a system safety or performance goal
Sandia is working on identifying, evaluating, and developing practical and effective procedures for:
A) Integration of experimental information into models and predictions
– Experiment design and UQ analysis of experimental results
– Parameter Estimation, Model Calibration
– Model Validation
B) Extrapolation with UQ
C) QMU (Quantification of Margins and Uncertainty)
• Evaluate the most promising candidate methods on several engineering test problems relevant to our application space
• The Cantilever Beam E2E UQ Test Problem is a simplified end-to-end UQ problem we’re currently working on
Ultimate Goal: Systems Engineering of VVUQ in Experimental and M&S Workflows
Establish Credibility of:
Experiments → Model Development/Calibration → Validation → Extrapolation → QMU
Cantilever Beam E2E UQ Test Problem
• Simple prototype problem for stochastic physical systems with scalar
inputs and outputs
– population of beams with small random variations in geometry (Length,
Width, Height) and material property (E)
• Random and Systematic components of measurement uncertainty on
inputs and outputs of tests to characterize Deflection variability
• Probabilistic and Interval uncertainties
• Load/BC control errors (loads vary from test to test about the target load P at
which we want to determine Deflection variability)
• Very few experimental replicate tests (sparse variability data;
substantial aleatory and epistemic types of uncertainty)
• Model-discretization-related solution error/uncertainty
• Model Calibration, Validation, Extrapolative prediction, QMU
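A minimal sketch of the beam population described above, assuming the textbook Euler-Bernoulli tip deflection for an end-loaded cantilever, δ = P L³ / (3 E I) with I = w h³ / 12 for a rectangular section. The actual physics Truth Model has not been released, so this formula, the nominal values, the 1% Gaussian scatter, and the names `end_deflection` and `sample_population` are illustrative assumptions only.

```python
import random
import statistics


def end_deflection(P, L, w, h, E):
    """Tip deflection of an end-loaded cantilever (Euler-Bernoulli, small deflection)."""
    I = w * h**3 / 12.0  # second moment of area of a rectangular section
    return P * L**3 / (3.0 * E * I)


def sample_population(n, seed=0):
    """Deflections of n random beams under a fixed load (all values hypothetical)."""
    rng = random.Random(seed)
    # Hypothetical nominal values (SI units) with 1% coefficient of variation each.
    nominal = {"L": 1.0, "w": 0.05, "h": 0.02, "E": 70e9}
    P0 = 100.0  # hypothetical target end load
    deflections = []
    for _ in range(n):
        beam = {name: rng.gauss(value, 0.01 * value) for name, value in nominal.items()}
        deflections.append(end_deflection(P0, **beam))
    return deflections


if __name__ == "__main__":
    pop = sample_population(10000)
    print("mean deflection:", statistics.mean(pop))
    print("std. dev.      :", statistics.stdev(pop))
```

Because deflection scales with L³ and h⁻³, small scatter in length and height dominates the deflection variability in this sketch.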
Problem Part A:
Experimental Data UQ and QMU
A.1
• 4 beams tested at random from the population
• Loads vary about target Po
• Beams vary in geometry and material properties
• No measurement errors
• Use data from the 4 tests to estimate or bound:
– Deflection variability for the whole population
– Probability that deflection > Dcrit
A.2
• Same as A.1, but random and systematic
errors exist on measurements of output
deflection
• Errors are unknown but consistent with supplied
probabilistic and interval random and systematic
uncertainty characterizations
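As a point of reference for task A.1, the sketch below shows a deliberately naive baseline: fit a Normal distribution to the 4 measured deflections and read off the probability of exceeding Dcrit. The Normal assumption and the function name `naive_exceedance_probability` are hypothetical. The fit also ignores the epistemic uncertainty that comes from having only 4 samples, which is what the test problem is designed to probe.

```python
from statistics import NormalDist, mean, stdev


def naive_exceedance_probability(deflections, d_crit):
    """P(deflection > d_crit) from a plain Normal fit (assumed distribution)."""
    mu = mean(deflections)
    sigma = stdev(deflections)  # sample standard deviation (n - 1 denominator)
    return 1.0 - NormalDist(mu, sigma).cdf(d_crit)


if __name__ == "__main__":
    # Hypothetical deflection measurements from 4 tests (metres).
    tests = [0.0139, 0.0146, 0.0151, 0.0142]
    print(naive_exceedance_probability(tests, d_crit=0.016))
```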
Part A: Experimental Data UQ and QMU
A.3
• Add random and systematic errors on
measurements of input loads in the 4 tests
• Errors are unknown but consistent with
supplied probabilistic and interval random
and systematic uncertainty characterizations
• Isolate deflection variability due to beam
variations only (no load variations) and
predict for loads Po, 0.9Po, 1.1Po, at
which no experimental deflections exist
• A 5th test is supplied to yield load-deflection
relationship information
– potentially use it to address the isolation task
– and to account for measurement errors on input loads
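One possible way to approach the isolation task, sketched under the assumption that deflection is proportional to load (true for small-deflection Euler-Bernoulli theory, but not confirmed for the unreleased Truth Model; the 5th test is presumably there to check this). Each measured deflection is rescaled from its actual load to a common target load, so the remaining scatter reflects beam-to-beam variation only. The function name and data are hypothetical, and measurement errors on the loads are ignored here.

```python
def rescale_to_target_load(deflection, load, target_load):
    """Scale a measured deflection to target_load, assuming deflection is proportional to load."""
    return deflection * target_load / load


if __name__ == "__main__":
    P0 = 100.0  # hypothetical target load
    # Hypothetical (measured load, measured deflection) pairs from the 4 tests.
    tests = [(98.0, 0.0139), (103.0, 0.0146), (101.0, 0.0151), (97.0, 0.0142)]
    for factor in (0.9, 1.0, 1.1):
        scaled = [rescale_to_target_load(d, p, factor * P0) for p, d in tests]
        print(factor, scaled)
```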
Part B: Model Introduced: Calibrate and
Estimate Deflection Variability
and Exceedance Probability
Scenario 1 – beam dimensions can be varied in calibration
model
• Geom. UQ Case A – the 4 beams’ dimensions not
measured but are controlled to within stated
manufacturing tolerances
• Geom. UQ Case B – the 4 beams’ dimensions
individually measured but subject to specified
measurement uncertainties
• Predict population deflection variability and exceedance
probability for loads Po, 0.9Po, 1.1Po.
• Model discretization error/uncertainty
• Response-surface surrogate model error/uncertainty
• Compare to estimates using experimental data only
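A minimal sketch of per-beam calibration for Scenario 1, assuming the Euler-Bernoulli end-load formula δ = P L³ / (3 E I) can be inverted for the modulus E. This is an illustrative stand-in, not the problem's actual calibration model. For Geom. UQ Case A, the tolerance intervals on the dimensions map to an interval on E, because E increases with L and decreases with w and h. All names and numbers are hypothetical.

```python
def second_moment(w, h):
    """Second moment of area of a rectangular cross-section."""
    return w * h**3 / 12.0


def calibrate_modulus(P, L, w, h, deflection):
    """Modulus E that reproduces a measured tip deflection (assumed beam formula)."""
    return P * L**3 / (3.0 * second_moment(w, h) * deflection)


def modulus_interval(P, deflection, L_rng, w_rng, h_rng):
    """Case A: bounds on E when each dimension is only known as a (min, max) interval."""
    e_lo = calibrate_modulus(P, L_rng[0], w_rng[1], h_rng[1], deflection)
    e_hi = calibrate_modulus(P, L_rng[1], w_rng[0], h_rng[0], deflection)
    return e_lo, e_hi


if __name__ == "__main__":
    # Hypothetical test: 100 N end load, 14.3 mm measured deflection.
    print(calibrate_modulus(100.0, 1.0, 0.05, 0.02, 0.0143))
    print(modulus_interval(100.0, 0.0143, (0.99, 1.01), (0.0495, 0.0505), (0.0198, 0.0202)))
```

For Case B, the same inversion could be applied to sampled dimension values drawn from the stated measurement uncertainties.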
Part B: Model and Calibration to
Estimate Deflection Variability
and Exceedance Probability
Scenario 2 – beam dimensions cannot
be varied in calibration model
• Geom. UQ Case A – the 4 beams’ dimensions not
measured but are controlled to within stated
manufacturing tolerances
• Geom. UQ Case B – the 4 beams’ dimensions
individually measured but subject to specified
measurement uncertainties
• What strategy is used to set beam dimensions in the model?
What are the implications and impacts?
• Predict population deflection variability and exceedance
probability for loads Po, 0.9Po, 1.1Po.
• Compare to B.1 results and to use of experimental data only.
Part C: Model Validation, Potential associated
Adjustment of Prediction Model, and
Extrapolative Prediction and Analysis
Validation Configuration A
• Validation Beams have different dimensions than calibration beams
• Different loading (uniform distributed load)
• Same temperature
• Random and systematic probabilistic and interval measurement
uncertainties in the validation experiments
• Model discretization error/uncertainty
• Response-surface surrogate model error/uncertainty
• Predict deflection and compare to results of 1 test
• May elect to adjust model based on comparison results
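To illustrate why Validation Configuration A exercises the model differently from the calibration setting, the sketch below compares the textbook Euler-Bernoulli tip deflections for a point end load, δ = P L³ / (3 E I), and a uniform distributed load q per unit length, δ = q L⁴ / (8 E I). Both formulas are standard assumptions here, not the released Truth Model, and the names and values are hypothetical.

```python
def second_moment(w, h):
    """Second moment of area of a rectangular cross-section."""
    return w * h**3 / 12.0


def tip_deflection_point_load(P, L, E, I):
    """Cantilever tip deflection under an end load P."""
    return P * L**3 / (3.0 * E * I)


def tip_deflection_uniform_load(q, L, E, I):
    """Cantilever tip deflection under a uniform load q per unit length."""
    return q * L**4 / (8.0 * E * I)


if __name__ == "__main__":
    I = second_moment(0.05, 0.02)  # hypothetical validation-beam section
    L, E = 1.2, 70e9               # hypothetical length and modulus
    q = 80.0                       # hypothetical distributed load, N/m
    print(tip_deflection_uniform_load(q, L, E, I))
    print(tip_deflection_point_load(q * L, L, E, I))  # same total force at the tip
```

For the same total force, the uniform load produces 3/8 of the end-load tip deflection in this idealization.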
Part C: Model Validation, Potential associated
Adjustment of Prediction Model, and
Extrapolative Prediction and Analysis
Validation Configuration B
• Much higher temperature: possible material strength effects
• Validation Beams have different dimensions than calibration beams
• Point end-load Po, as in the calibration setting
• Random and systematic probabilistic and interval measurement
uncertainties in the validation experiments
• Model discretization error/uncertainty
• Response-surface surrogate model error/uncertainty
• Predict deflection and compare to results of 2 replicate tests
• May elect to adjust model based on comparison results
Part C: Model Validation, Potential associated
Adjustment of Prediction Model, and
Extrapolative Prediction and Analysis
Model Validation Relevance and Extrapolative Prediction and QMU
• Validation Beams have the same geometry/dimensions as the beams for
which we want to make post-validation predictions
• How is the model’s predictive ability characterized from the two
validation activities?
• What about after the model is adjusted, if adjusted (based on
the validation results)?
• What are the caveats, range of use conditions, uncertainty,
“confidence”, “credibility”, etc. of the model and/or model predictions
for end-load Po and for uniform loading at the validation conditions?
• What about for a specified higher temperature, T_hot?
• Operational Restriction: determine the “safe” temperature for 99.9%
reliability of end-deflection not exceeding Dcrit for end-load Po
Discussion
• Plan to introduce Beam Problem to ASME, AIAA,
Soc. Auto. Engrs., maybe SIAM
• Physics and Uncertainty “Truth Models” for the
Cantilever Beam E2E UQ problem will be released in
a year or two
– after the UQ community has a chance to try the problem
without knowledge of the truth models.
• Robustness, Efficiency, and Accuracy of UQ methods
will be distributional quantities that must be
characterized over thousands of realizations of
complex system dynamics involving sparse data and
errors/uncertainties in measurements, model forms,
discretization, surrogate models, and UQ strategies,
methods, and implementations.
Discussion
• It will be a major effort to characterize and
differentiate the performance and practicality of E2E
UQ and Credibility strategies and methods
• Some progress on performance characterization
measures and metrics already made in recent
sparse-data UQ work
• Next two slides define metrics that help quantify
performance tradeoffs between UQ method reliability,
conservatism, and risk over thousands of random
trials of application of the methods
Weighted Performance Metric considers
Proportions and Magnitudes of Overshoot and
Undershoot Errors on estimates of e.g.
95% Central Range of Response
• Winokur & Romero (ASCE/ASME J. UQ/Risk, 2017, in review):
– not all error types are usually equally bad; preference weighting via the denominator
– larger average magnitude of a given error type drives the numerator and the metric up
– lower metric value = better performance
– P_ij is the proportion of error type ij; w_ij is a relative preference for error type ij
• Errors on the estimated upper (97.5) and lower (2.5) percentiles (+ = overshoot, − = shortfall):
$\epsilon_U = \mathrm{\%ile}^{\mathrm{estimate}}_{97.5} - \mathrm{\%ile}^{\mathrm{exact}}_{97.5}$
$\epsilon_L = \mathrm{\%ile}^{\mathrm{exact}}_{2.5} - \mathrm{\%ile}^{\mathrm{estimate}}_{2.5}$
[Figure: sketch of overshoot and shortfall errors ε_L and ε_U on the lower and upper percentile estimates]
case   w++    w−−    w+−    w−+    Comments
A      1.00   0.00   0.00   0.00   2-sided coverage: bound/encompass both upper and lower percentiles
B      0.50   0.00   0.00   0.50   1-sided: upper bound on upper percentile
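The percentile errors and error-type proportions P_ij above can be tallied as in the sketch below. It computes only the errors ε_L and ε_U and the proportions of the four sign combinations; the weighted metric itself is not reproduced because its formula does not appear in the slide text. Two assumptions are made: the first sign in a label such as "−+" refers to ε_L and the second to ε_U (inferred from the weights in case B), and a zero error counts as "+". The function names are hypothetical.

```python
from collections import Counter


def error_type(eps_lower, eps_upper):
    """Label such as '++' or '-+': sign of eps_L, then sign of eps_U (assumed order)."""
    def sign(e):
        return "+" if e >= 0 else "-"
    return sign(eps_lower) + sign(eps_upper)


def error_type_proportions(trials):
    """trials: iterable of (est_2.5, est_97.5, exact_2.5, exact_97.5) tuples."""
    trials = list(trials)
    counts = Counter()
    for est_lo, est_hi, exact_lo, exact_hi in trials:
        eps_l = exact_lo - est_lo   # positive when the estimate reaches below the exact 2.5 %ile
        eps_u = est_hi - exact_hi   # positive when the estimate reaches above the exact 97.5 %ile
        counts[error_type(eps_l, eps_u)] += 1
    n = len(trials)
    return {label: counts[label] / n for label in ("++", "--", "+-", "-+")}


if __name__ == "__main__":
    # Hypothetical estimates of a central 95% range whose exact bounds are (-2, 2).
    trials = [(-2.5, 2.5, -2.0, 2.0), (-1.5, 2.4, -2.0, 2.0), (-2.2, 1.8, -2.0, 2.0)]
    print(error_type_proportions(trials))
```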
Performance Metric for Estimation of
Exceedance Probability
• Romero, Bonney, et al. (2017):
– not all error types are usually equally bad; preference weighting via a penalty
factor in the numerator
– larger average magnitude of a given error type drives the numerator and the metric up
– larger proportion of + (overshoot) errors in the denominator drives the metric
down
– lower metric value = better performance
Error metric (number of orders of magnitude by which the predicted probability is off):
$\Delta_{\log} = \log(EP_{\mathrm{estimated}}) - \log(EP_{\mathrm{true}})$
Unpenalized:
$\mathrm{EP\ performance\ metric} = \left[ \sum_{N_+} \Delta_{\log} + \sum_{N_-} |\Delta_{\log}| \right] / N_+$
With 10X penalty for negative errors (under-estimation of exceedance probability):
$\mathrm{EP\ performance\ metric} = \left[ \sum_{N_+} \Delta_{\log} + 10 \sum_{N_-} |\Delta_{\log}| \right] / N_+$
(sums run over the N_+ trials with positive errors and the N_− trials with negative errors)
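The exceedance-probability metric above can be computed as in the sketch below. It assumes a base-10 logarithm (consistent with "orders of magnitude"), counts a zero error as positive, and returns infinity when no trial has a positive error; these choices and the function name are assumptions, not taken from the cited work.

```python
import math


def ep_performance_metric(ep_estimated, ep_true, penalty=10.0):
    """[sum over + errors of dlog + penalty * sum over - errors of |dlog|] / N+."""
    pos_sum = 0.0
    neg_sum = 0.0
    n_pos = 0
    for est, true in zip(ep_estimated, ep_true):
        dlog = math.log10(est) - math.log10(true)  # orders of magnitude off
        if dlog >= 0.0:
            pos_sum += dlog
            n_pos += 1
        else:
            neg_sum += abs(dlog)
    if n_pos == 0:
        return math.inf  # assumed convention: no overshoots means worst score
    return (pos_sum + penalty * neg_sum) / n_pos


if __name__ == "__main__":
    # Hypothetical trials: true EP is 1e-3 in each.
    estimates = [2e-3, 5e-4, 1e-2, 8e-4]
    truths = [1e-3] * 4
    print("10X penalty:", ep_performance_metric(estimates, truths))
    print("unpenalized:", ep_performance_metric(estimates, truths, penalty=1.0))
```

Setting `penalty=1.0` gives the unpenalized form, so the same function covers both variants.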
Summary
• We need model E2E UQ problems with known answers
to get an idea of how well methods and strategies may be
working on the real problems we are applying them to.
• It will be a major effort to characterize and differentiate
the performance and practicality of E2E UQ and
Credibility paradigms, strategies, and methods.
• The Beam E2E UQ problem has many features of real
problems that make it useful for significant assessment of
UQ and Credibility approaches and their potential
practicality and effectiveness on real problems.
• Performance metrics are available to help quantify
tradeoffs between UQ method reliability, conservatism,
and risk over thousands of random trials.
Contact: [email protected]