
From the Proceedings of the 29th Symposium on the Interface: Computing Science and Statistics, LX-L-R-97-, May 14-17, 1997, Houston, TX.

Structural Model Uncertainty in Stochastic Simulation

Michael D. McKay and John D. Morrison

Technology and Safety Assessment Division, Los Alamos National Laboratory

Los Alamos, NM 87545-600

Abstract

Prediction uncertainty in stochastic simulation models can be described by a hierarchy of components: stochastic variability at the lowest level, input and parameter uncertainty at a higher level, and structural model uncertainty at the top. It is argued that a usual paradigm for analysis of input uncertainty is not suitable for application to structural model uncertainty. An approach more likely to produce an acceptable methodology for analyzing structural model uncertainty is one that uses characteristics specific to the particular family of models.

1 Introduction

Investigations into how to relate model structure to prediction uncertainty have become more visible since the 1993 U.S. Nuclear Regulatory Commission workshop “Model Uncertainty: Its Characterization and Quantification” (Mosleh 1993). The proceedings of that workshop, and a paper by David Draper (1995), summarize many interesting ideas on the subject. However, the general question of how one might effectively quantify variability in model prediction due to alternative model structures remains unanswered, this paper notwithstanding.

2 Modeling

Modeling proceeds from a mathematical abstraction of a conceptual notion. From the mathematical model, a software implementation is constructed and verified to agree with the abstraction. At this point, the model is a computer code, which should be calibrated and validated through comparison of code predictions to data. Variability of model prediction due to uncertainty in model input values can be assessed in computer experiments. It is not apparent how one ought to evaluate uncertainty about model structure.

A mathematical model m(·) is a formal statement of assumptions about a relationship between known quantities x and unknown quantities y. The structure of a model defines how characteristics of y are determined from those of x. Structure, in this sense, is a mathematical or computational algorithm. We call x the model inputs, and y the model outputs. We also allow models to have internal variables z, called simulation variables, that are derived from random number streams. Model parameters, as they often appear in the functional form, can be absorbed into the vector x.

3 Origins of Uncertainty

Structural uncertainty comes from plausible alternative model structures. The alternatives, however, may exist only hypothetically. Input uncertainty comes from alternative but plausible input values. Although a model structure defines what input variables are needed, the values of these variables are assigned with some ambiguity, which contributes to ambiguity in the model's predicted values. Finally, when there is simulation variability, it comes from sampling of random numbers within the model and is considered inherent in the model prediction. To associate this variability with the random number streams, we create the variables z. There are many methods for assessing the variability of model prediction arising from simulation variability and input uncertainty. Methodology for assessing structural uncertainty is not as mature.
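As a purely illustrative sketch of the three origins just described, the hypothetical function `toy_model` below (not a model from this paper) computes y from inputs x and from simulation variables z drawn from a random number stream identified by a seed. Holding the seed fixed freezes z and makes the model deterministic in x; varying the seed with x fixed exposes simulation variability; varying x with the seed fixed exposes input uncertainty. Structural uncertainty would correspond to replacing the body of the function itself.

```python
import random
from statistics import pvariance


def toy_model(x, seed):
    """Hypothetical stochastic simulation y = m(x, z), for illustration only.

    x    -- tuple of model inputs (x1, x2)
    seed -- labels a random number stream; the draws taken from it are
            the simulation variables z
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(5)]  # simulation variables z
    # The model structure is this algorithm: how y is computed from x and z.
    return x[0] + 2.0 * x[1] + 0.1 * sum(z)


x_nominal = (1.0, 0.5)

# Same stream, same z, identical prediction.
y_a = toy_model(x_nominal, seed=11)
y_b = toy_model(x_nominal, seed=11)

# Simulation variability: x fixed, 200 different streams.
sim_var = pvariance([toy_model(x_nominal, seed=s) for s in range(200)])

# Input uncertainty: plausible values of x1, one fixed stream.
plausible_x = [(1.0 + 0.1 * i, 0.5) for i in range(-5, 6)]
input_var = pvariance([toy_model(x, seed=11) for x in plausible_x])
```

Keying z to an explicit seed is one simple way to make simulation variability reproducible and separable from input variability in a computer experiment.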

Deductive statistical analyses begin with questions to be answered. The questions motivate and direct how both formal and informal analysis methods are used. Examples of questions related to model uncertainty analyses follow.

How do alternative input values affect model prediction?

Are calculations driven by only a subset of inputs?

How might strong and weak characteristics of alternative models and submodels be identified?

Are there formal analysis methods for ranking alternative models?

How can one identify a priori requirements for coupling models that operate at different levels of detail?

How might investment dollars be best spent toward model improvement?


With questions like these in mind, we take up an outline of the development of precise mathematical statements for uncertainty analysis. A more complete discussion is given by McKay (1995).

4 Contexts for Model Uncertainty

One short description underlies most methodology for input uncertainty. That description is

$$\{x \in \mathcal{D},\ x \sim f_x\} \xrightarrow{\;m\;} y \sim f_y .$$

Simply stated, the inputs x have a probability distribution (density) f_x defined on their domain D, usually a vector space over the real numbers. The model m(·) induces a probability distribution (density) f_y on the output y. From a practical perspective, interpreting f_x, and defining it and D, are difficult tasks.
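A minimal Monte Carlo sketch of this description, assuming a hypothetical deterministic model m(x) = x1 + x2² and taking f_x to be independent uniforms on D = [0, 1]²: draws of x from f_x are pushed through m, and the resulting sample of y serves as an empirical stand-in for f_y. Simple random sampling is used only for brevity, and the hard part the text mentions (choosing f_x and D) is simply assumed away here.

```python
import random
from statistics import fmean, quantiles


def m(x):
    """Hypothetical deterministic model standing in for m(.)."""
    return x[0] + x[1] ** 2


def sample_fx(rng):
    """Assumed f_x: independent uniforms on D = [0, 1) x [0, 1)."""
    return (rng.random(), rng.random())


def propagate(model, sampler, n_runs, seed=0):
    """Return model outputs for n_runs draws x ~ f_x (an empirical f_y)."""
    rng = random.Random(seed)
    return [model(sampler(rng)) for _ in range(n_runs)]


ys = propagate(m, sample_fx, n_runs=20_000)
# Under these assumptions the exact mean of y is 1/2 + 1/3.
print(f"mean of y: {fmean(ys):.3f}")
print("deciles of f_y:", [round(q, 3) for q in quantiles(ys, n=10)])
```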

A similar description for structural uncertainty is

$$\{m \in \mathcal{M},\ m \sim g_m\} \longrightarrow y \sim g_y ,$$

where

$$\mathcal{M} = \{\, m(x;\theta) \mid \theta \in \Theta \,\}$$

is the model-structure counterpart to the domain D of input values. Because the relationship between y and m(·), given x, is more obscure than that between y and x, given m(·), it is not apparent that the simple paradigm used to assess input uncertainty can be successfully applied to structural uncertainty. For related discussions, see Atwood (1993), Winkler (1993), Draper (1995) and Laskey (1996).

5 A Variance-based Quantification of Input Uncertainty

We assume that f_y contains all available information on prediction uncertainty for a fixed model m(·). The question is how to efficiently study f_y. We choose the (not unique) focus on the variance of the distribution, and use variance-based uncertainty analysis to answer the question “How much of the uncertainty (variance) of y is due to specific input subsets?”

Let there be p input variables partitioned into subsets of size s and p - s. Let the input partition be indicated by

$$x = x^{s} \cup x^{p-s} .$$

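For the partition, the variance of y can be written as the sum of three terms depending on the distributions of the inputs:

$$\mathrm{Var}(y) = \mathrm{Var}_{x^s}\!\left[E_{x^{p-s},z \mid x^s}(y \mid x^s)\right] + E_{x^s}\!\left\{\mathrm{Var}_{z \mid x^s}\!\left[E_{x^{p-s} \mid x^s,z}(y \mid x^s, z)\right]\right\} + E_{x^s}\!\left\{E_{z \mid x^s}\!\left[\mathrm{Var}_{x^{p-s} \mid x^s,z}(y \mid x^s, z)\right]\right\} .$$

We call the first term in the last equation the “Variance of the Conditional Expectation” (VCE):

$$\mathrm{VCE}(y \mid x^s) = \mathrm{Var}_{x^s}\!\left[E_{x^{p-s},z \mid x^s}(y \mid x^s)\right] .$$

The second term is the “Partial Variance of the Conditional Expectation” (PVCE):

$$\mathrm{PVCE}(y \mid z; x^s) = E_{x^s}\!\left\{\mathrm{Var}_{z \mid x^s}\!\left[E_{x^{p-s} \mid x^s,z}(y \mid x^s, z)\right]\right\} .$$

The last term is the “Residual Error”:

$$\sigma_\epsilon^2 = E_{x^s}\!\left\{E_{z \mid x^s}\!\left[\mathrm{Var}_{x^{p-s} \mid x^s,z}(y \mid x^s, z)\right]\right\} .$$

Therefore,

$$\mathrm{Var}(y) = \mathrm{VCE}(y \mid x^s) + \mathrm{PVCE}(y \mid z; x^s) + \sigma_\epsilon^2 .$$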

The ratio of the component of variance attributed to the input subset x^s to the variance of y is called the correlation ratio (Pearson 1903; Kendall and Stuart 1979). We denote it by

$$\eta^2_{x^s} = \mathrm{VCE}(y \mid x^s) / \mathrm{Var}(y) .$$

Similarly, the proportion of variance attributable to the simulation variables, “adjusted for” the subset x^s, is given by the partial correlation ratio

$$\eta^2_{z \cdot x^s} = \mathrm{PVCE}(y \mid z; x^s) / \mathrm{Var}(y) .$$

Finally, the residual variance, attributed to the remaining inputs x^{p-s}, is given by the Residual Error term σ_ε².
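Because the three-term decomposition is an identity (two applications of the law of total variance), it can be checked exactly on a small discrete case. The sketch below assumes a hypothetical nonlinear model `m`, a single input in x^s, a single input in x^{p-s}, and one simulation variable z, all independent with the discrete distributions shown; independence lets each conditional distribution equal its marginal. Exact rational arithmetic with `fractions.Fraction` makes the identity hold with no rounding error.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical independent discrete distributions: value -> probability.
XS = {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)}  # the input subset x^s
XPS = {0: F(1, 2), 1: F(1, 2)}             # the remaining inputs x^(p-s)
Z = {-1: F(1, 2), 1: F(1, 2)}              # the simulation variable z


def m(xs, xps, z):
    """Hypothetical nonlinear model with an x^(p-s)-by-z interaction."""
    return xs ** 2 + xs * xps + F(1, 2) * z * (1 + xps)


def expect(fn, dist):
    """E[fn(V)] for V distributed as dist."""
    return sum(p * fn(v) for v, p in dist.items())


def variance(fn, dist):
    """Var[fn(V)] for V distributed as dist."""
    mu = expect(fn, dist)
    return expect(lambda v: (fn(v) - mu) ** 2, dist)


def mean_given_xs_z(xs, z):
    """E[y | x^s, z]: average over x^(p-s) only."""
    return expect(lambda xps: m(xs, xps, z), XPS)


def mean_given_xs(xs):
    """E[y | x^s]: average over z and x^(p-s)."""
    return expect(lambda z: mean_given_xs_z(xs, z), Z)


vce = variance(mean_given_xs, XS)
pvce = expect(lambda xs: variance(lambda z: mean_given_xs_z(xs, z), Z), XS)
resid = expect(lambda xs: expect(
    lambda z: variance(lambda xps: m(xs, xps, z), XPS), Z), XS)

joint = {(a, b, c): XS[a] * XPS[b] * Z[c] for a, b, c in product(XS, XPS, Z)}
var_y = variance(lambda t: m(*t), joint)

assert vce + pvce + resid == var_y  # exact identity with Fraction arithmetic
eta2_xs = vce / var_y               # correlation ratio of x^s
eta2_z_xs = pvce / var_y            # partial correlation ratio of z given x^s
```

For this toy model the pieces are VCE = 79/18, PVCE = 9/16 and σ_ε² = 23/48, which sum to Var(y) = 391/72.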

This variance decomposition does not depend upon any linearity assumptions about the relation between x and y. However, when a linear analysis model is assumed, as in

$$y = x^s \beta + \epsilon ,$$

the correlation ratio becomes the square of the usual correlation coefficient because, with E(ε | x^s) = 0, the conditional expectation is E(y | x^s) = x^s β and the VCE becomes

$$\mathrm{VCE}(y \mid x^s) = \beta' \, \mathrm{Var}(x^s) \, \beta .$$

In practice, the use of both nonparametric analyses and analyses based on linear models is almost always worthwhile.
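The reduction of the correlation ratio to the squared correlation coefficient under a linear model can be checked exactly on a toy case. The sketch below assumes a single input x (s = 1) uniform on {0, 1, 2, 3}, an independent error ε = ±1 with equal probability, and β = 2; it computes η² = VCE/Var(y) and ρ² = Cov(x, y)²/(Var(x)Var(y)) for that linear model and for a hypothetical quadratic one. For the quadratic model ρ² is zero while η² is not, which illustrates why running both kinds of analysis is worthwhile.

```python
from fractions import Fraction as F
from itertools import product

X = {0: F(1, 4), 1: F(1, 4), 2: F(1, 4), 3: F(1, 4)}  # assumed f_x, s = 1
EPS = {-1: F(1, 2), 1: F(1, 2)}                       # assumed error distribution


def ratios(model):
    """Return (eta^2, rho^2) for y = model(x, eps) with x, eps independent."""
    joint = [(x, e, px * pe)
             for (x, px), (e, pe) in product(X.items(), EPS.items())]
    ex = sum(p * x for x, _, p in joint)
    ey = sum(p * model(x, e) for x, e, p in joint)
    var_x = sum(p * (x - ex) ** 2 for x, _, p in joint)
    var_y = sum(p * (model(x, e) - ey) ** 2 for x, e, p in joint)
    cov_xy = sum(p * (x - ex) * (model(x, e) - ey) for x, e, p in joint)
    # E[y | x] averages over eps only; its variance over x is the VCE.
    cond_mean = {x: sum(pe * model(x, e) for e, pe in EPS.items()) for x in X}
    vce = sum(px * (cond_mean[x] - ey) ** 2 for x, px in X.items())
    return vce / var_y, cov_xy ** 2 / (var_x * var_y)


eta2_lin, rho2_lin = ratios(lambda x, e: 2 * x + e)                 # y = x*beta + eps
eta2_quad, rho2_quad = ratios(lambda x, e: (x - F(3, 2)) ** 2 + e)  # nonlinear in x
```

Here the linear model gives η² = ρ² = 5/6, while the quadratic model gives η² = 1/2 and ρ² = 0.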


6 Variance Component Estimation for Input Subset x^s

We set aside discussion of how one identifies the input subset x^s in order to sketch how one might estimate the variance components from the last section. The approach follows McKay (1995). The procedure uses a sum-of-squares decomposition like the familiar one from an analysis of variance table. It should be noticed, however, that the usual linear, random effects analysis model is absent.

We perform a computer experiment in which the response data y_ij result from an experimental design on the inputs x and simulation variables z. Let the sample of values be y_ij, for i = 1, ..., n and j = 1, ..., k, where i indexes the n values of x^s and j the k runs made at each of them.

Estimation comes from the relevant sums of squares defined in Table 1. The correlation ratio and the square of the (multiple) correlation coefficient are estimated, with bias, as

$$\hat\eta^2_{x^s} = \mathrm{SSB}/\mathrm{SST} \quad\text{and}\quad \hat\rho^2_{x^s} = \mathrm{SSR}/\mathrm{SST},$$

respectively.

An important characteristic of this treatment of variance decomposition is that the variance is decomposed in a nested or conditional fashion. This approach is easily generalized for analysis of several subsets of x. It is much like what is done in multiple regression analyses. We continue now with a small example.

Table 1 Sums of squares estimation for variance decomposition

Source          df           Sum of squares                    E(Sum of squares)
Total           nk - 1       SST = Σ_i Σ_j (y_ij - ȳ)²
Inputs x^s      n - 1        SSB = k Σ_i (ȳ_i - ȳ)²             k(n - 1) Var[E(y | x^s)] + (n - 1) E(Var[y | x^s])
  Linear fit    s            SSR
  Lack of fit   n - s - 1    SSE = SSB - SSR
Variables z     n(k - 1)     SSW = Σ_i Σ_j (y_ij - ȳ_i)²        n(k - 1) E(Var[y | x^s])

(Sums run over i = 1, ..., n and j = 1, ..., k.)
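A minimal sketch of the Table 1 computations, assuming a balanced design stored as an n × k array `y[i][j]` and a single input in x^s (s = 1), so that the linear fit is a simple least-squares line. In a balanced design the slope of y_ij on x_i equals the slope fitted to the n cell means, which is what the code uses. The function name `ss_table` and the data in the usage lines are made up for illustration.

```python
from statistics import fmean


def ss_table(xs, y):
    """Sums of squares of Table 1 for a balanced design.

    xs -- the n values taken by a scalar input subset x^s (s = 1)
    y  -- n rows of k responses; y[i][j] is the j-th run with x^s = xs[i]
    """
    n, k = len(y), len(y[0])
    grand = fmean(v for row in y for v in row)
    cell_means = [fmean(row) for row in y]

    sst = sum((v - grand) ** 2 for row in y for v in row)
    ssb = k * sum((mb - grand) ** 2 for mb in cell_means)
    ssw = sum((v - mb) ** 2 for row, mb in zip(y, cell_means) for v in row)

    # Simple least-squares line (s = 1); in a balanced design its slope
    # equals the slope fitted to the n cell means.
    x_bar = fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (mb - grand) for x, mb in zip(xs, cell_means)) / sxx
    ssr = k * slope ** 2 * sxx

    return {"SST": sst, "SSB": ssb, "SSR": ssr, "SSE": ssb - ssr, "SSW": ssw,
            "df": {"total": n * k - 1, "inputs": n - 1, "linear": 1,
                   "lack_of_fit": n - 2, "within": n * (k - 1)}}


xs = [0.0, 1.0, 2.0, 3.0]  # made-up design: n = 4 values of x^s
y = [[1.0, 2.0, 3.0],      # k = 3 runs at each value
     [2.0, 4.0, 3.0],
     [6.0, 5.0, 7.0],
     [7.0, 9.0, 8.0]]

t = ss_table(xs, y)
eta2_hat = t["SSB"] / t["SST"]  # correlation ratio estimate
rho2_hat = t["SSR"] / t["SST"]  # squared correlation estimate
```

Since the least-squares line is one particular function of x^s, SSR never exceeds SSB, so the estimated squared correlation cannot exceed the estimated correlation ratio.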


7 Example Application

The example shows how nonparametric variance-based methods work to identify important inputs. The model is a discrete event simulation of time-dependent movements of various cargos by various types of military aircraft. We studied eight input variables (a small number): MOG, Max Wait, Use Rate, Enroute Time, Offload Time, Onload Time, Initial Hours and Fuel Flow. The model output predictions are cumulative hours flown and tons of cargo delivered by aircraft type. Aircraft types are designated by C-141, C-17 and C-5A. The computer experiment was based on a form of replicated Latin hypercube sampling (LHS) (McKay, Conover and Beckman 1979; McKay and Beckman 1994). The experiment was carried out in stages, each stage based on a design of size n = 12 replicated k = 4 times. Two replicates were made with different random number streams, for a total of N = 12 × 4 × 2 = 96 computer runs. Each stage concludes with the selection of potentially important inputs, which are fixed at their nominal values at subsequent stages. It turned out that simulation variability was very small and completely swamped by input variability. Therefore, the model is treated as deterministic in this example.
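The replicated Latin hypercube design can be sketched as follows, under an assumption about the replication scheme: every replicate reuses the same n stratified values of each input and changes only how they are paired, by independent random permutations. Each value of an input then appears exactly once per replicate, so for any single input the k runs at each of its n values form the one-way layout of Table 1, and SSB/SST screens inputs one at a time. The function names, the three-input model `toy` and its coefficients are hypothetical; as in the example, the model is treated as deterministic (no random number streams).

```python
import random
from statistics import fmean


def replicated_lhs(n, k, p, rng):
    """k replicates of an n-run Latin hypercube on [0, 1)^p.

    values[j][i]      -- i-th stratified value of input j (shared by all replicates)
    pairing[t][j][r]  -- index of the value input j takes in run r of replicate t
    """
    values = [[(i + rng.random()) / n for i in range(n)] for _ in range(p)]
    pairing = [[rng.sample(range(n), n) for _ in range(p)] for _ in range(k)]
    return values, pairing


def run_design(model, values, pairing):
    """ys[t][r] is the model output for run r of replicate t."""
    n = len(values[0])
    return [[model([values[j][perm[r]] for j, perm in enumerate(rep)])
             for r in range(n)]
            for rep in pairing]


def screening_ratios(pairing, ys, n):
    """Correlation ratio estimate SSB/SST for each input, one at a time."""
    k = len(ys)
    all_y = [v for rep_y in ys for v in rep_y]
    grand = fmean(all_y)
    sst = sum((v - grand) ** 2 for v in all_y)
    ratios = []
    for j in range(len(pairing[0])):
        groups = [[] for _ in range(n)]  # the k outputs at each value of input j
        for rep, rep_y in zip(pairing, ys):
            for r, v in enumerate(rep_y):
                groups[rep[j][r]].append(v)
        ssb = k * sum((fmean(g) - grand) ** 2 for g in groups)
        ratios.append(ssb / sst)
    return ratios


def toy(x):
    """Hypothetical model in which input 0 should dominate."""
    return 5.0 * x[0] + x[1] + 0.1 * x[2]


rng = random.Random(1)
values, pairing = replicated_lhs(n=12, k=4, p=3, rng=rng)
ys = run_design(toy, values, pairing)
ratios = screening_ratios(pairing, ys, n=12)
```

With n = 12 and k = 4 the estimates for unimportant inputs are biased well above zero (the (n - 1) E(Var[y | x^s]) term in E(SSB)), which is one reason a critical value is useful as a filter.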

To begin, all inputs were sampled according to f_x, taken as a joint uniform probability distribution. The resulting base case predictions for 6 outputs are presented as the bands in Figure 1. Correlation ratios were estimated for each input at each day (these are not independent estimates) and averaged in Table 2.

Figure 1 Base case, 8 inputs vary (panels include C-141B Tons and C-141B Hours)

Table 2 Correlation ratios from base case

Output       Avg s²   MOG    Max Wait   Use Rate   Enroute Time   Offload Time   Onload Time   Initial Hours   Fuel Flow
C-141 Tons   1227     0.28   0.23       0.45       0.14           0.27           0.28          0.17            0.40
C-           ...53    0.24   0.21       0.49       0.11           0.29           0.31          0.17            0.42

CV is a critical value from normal theory under a null hypothesis of independence of x and y. It is used here only as a filter.

The average variability in tons delivered by C-141 aircraft is 1227, from the first column of the table. Use Rate, by itself, accounts for 45% of it, from the first row of numbers. Therefore, Use Rate is selected and fixed at its nominal value for subsequent stages. The reduction in the bands of predicted values from fixing Use Rate is seen in Figure 2.

We selected inputs in this sequential manner until four inputs had been identified as driving the calculations: Use Rate, Fuel Flow, Enroute Time and MOG. We sampled the 4-dimensional domain of these inputs with a factorial design (2 times a 1/2 fraction of a two-level factorial, with a full 2^2 on Use Rate and Fuel Flow) to produce the results in Figure 3 for tons delivered by C-5A aircraft. Without providing interpretation, we state that the large differences between bands of predictions (due to the 4 inputs) and the small widths of the bands (due to the other 4 inputs) suggest that Use Rate, Fuel Flow, Enroute Time and MOG account for essentially all of the variability in the model predictions.


Figure 2 Use Rate set to nominal value (panels: C-141B Tons, C-17 Tons, C-141B Hours)

Figure 3 Use Rate, Fuel Flow, Enroute Time and MOG at 8 extreme points (panel: C-5A Tons versus day)

8 Whence an Analysis of Structural Model Uncertainty?

It was suggested earlier that there might be a useful general paradigm for structural uncertainty of the form

$$\{m \in \mathcal{M},\ m \sim g_m\} \longrightarrow y \sim g_y ,$$

which parallels that for input uncertainty,

$$\{x \in \mathcal{D},\ x \sim f_x\} \xrightarrow{\;m\;} y \sim f_y .$$

We then offered that the paradigm likely would not be sufficient for analysis. We said this for two reasons.

First, it seems too much to ask that a single (simple?) methodology would be adequate for a general notion of model. Second, and more practically, estimation for input uncertainty requires that model predictions y be obtained in computer experiments. This requirement is impossible to meet for a general family M of models. Therefore, we take the approach of investigating structural model uncertainty for a special case, which is outlined below.

9 Structural Uncertainty for a Discrete Event Simulation Model

We consider a discrete event simulation model which moves actors through activities. Actors are objects in the simulation, and activities are actions taken by actors or processes which operate upon actors. The discrete event simulation model produces a sequence of activities {A_1, A_2, ...} for each actor from a prescribed set of possible activities. Each activity A_i for an actor begins at an event time t_i and lasts for a duration or residence time τ_i. We think of the activity (process) as described by a compartmental model.

There are many relevant questions one might ask about predictions for each actor. Some of them are: (1) are activities (A_i) in a proper sequence? (2) are there missing or extra activities (A_i*)? and (3) can uncertainties in predicted activity event times (t_i) and residence times (τ_i) be quantified to support more realistic forecasting and prediction?

This developing context allows a restricted view of the family M of models. Therefore, a practical analysis methodology might be developed from it. To motivate this approach, Figure 4 presents a schematic description of activities for a single actor. Figure 5 presents a schematic description of how a compartmental model defines a single activity. Model predictions y of activity sequences, event times and residence times generally can be examined as previously described under input uncertainty. We propose that structural uncertainty might be examined by observing y as different submodels or individual compartmental models are replaced with others. In particular, we look at refinements of compartmental models as generating M. Refinements are indicated in Figure 5. The main model, 1, is refined by replacing it with 6 more-detailed submodels, 1.1, 1.2, 1.3, etc. Submodel 1.3 is further refined.

Figure 4 Actor's view: time line of 4 activities (A_1 = a_1, A_2 = a_6, A_3 = a_5, A_4 = a_3, with residence times τ_1, τ_2, τ_3, τ_4)
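To make the idea of observing y while a compartmental submodel is replaced by a refinement concrete, here is a hypothetical sketch that is not the authors' model. It assumes that each activity's residence time τ_i comes from a single exponential compartment in the coarse model, and that a "refinement" splits that compartment into n_sub compartments in series with the same total mean. The predicted completion time of a fixed 4-activity sequence then keeps its mean but changes its spread, so comparing y across members of a family M generated this way carries structural information that the mean alone would hide.

```python
import random
from statistics import fmean, pvariance


def residence_time(rng, mean, n_sub):
    """Residence time of one activity modeled as n_sub compartments in series.

    n_sub = 1 is a coarse one-compartment (exponential) model; n_sub > 1 is a
    hypothetical refinement into n_sub sub-compartments with the same total mean.
    """
    rate = n_sub / mean  # each sub-compartment has mean residence time mean / n_sub
    return sum(rng.expovariate(rate) for _ in range(n_sub))


def completion_times(means, n_sub, runs, seed=0):
    """Time at which one actor finishes a fixed sequence of activities."""
    rng = random.Random(seed)
    return [sum(residence_time(rng, mu, n_sub) for mu in means)
            for _ in range(runs)]


activity_means = [2.0, 5.0, 1.0, 3.0]  # hypothetical mean tau_i for 4 activities
results = {n_sub: completion_times(activity_means, n_sub, runs=20_000)
           for n_sub in (1, 6)}        # coarse model vs. a 6-part refinement
for n_sub, y in results.items():
    print(n_sub, round(fmean(y), 2), round(pvariance(y), 2))
```

Under these assumptions the exact mean completion time is 11 for both structures, while the exact variance drops from 39 to 39/6 = 6.5 under the refinement.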


A summary of points follows.

Structural uncertainty may refer to a family of models, which can be enumerated explicitly or exist hypothetically.

$$\mathcal{M} = \{\, m_\theta \mid \theta \in \Theta \,\}$$

is too general except for an explicitly enumerated set of models.

Suppose h(·) is an identifiable module or submodel of m(·):

$$m(x) = m_h\big(x, h(x)\big) .$$

In one context, the study of structural uncertainty is with respect to h(·) within m(·).

An important question, still under investigation, is whether anything has been simplified.

10 Final Thoughts

A generally acceptable method for analysis of input uncertainty exists, although in rather early stages of development. On the other hand, no general method for analysis of structural uncertainty is available. Should we look for that general one now, or should we focus on particular types of models in hopes of gaining insights? The question is open.

References

Atwood, C. L. (1993). Individual model evaluation and probabilistic weighting of models. In Proceedings of Workshop I in Advanced Topics in Risk and Reliability Analysis, Model Uncertainty: Its Characterization and Quantification, NUREG/CP-0138, pages 99-106, Annapolis, MD, October 20-22. U.S. Nuclear Regulatory Commission.

Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society, B, 57(1):45-97.

Kendall, M. and Stuart, A. (1979). The Advanced Theory of Statistics, volume 2, chapter 26. Macmillan Publishing Co., New York, fourth edition.

Laskey, K. B. (1996). Model uncertainty: theory and practical implications. IEEE Transactions on Systems, Man, and Cybernetics, 26(3):340-348.

McKay, M. D. (1995). Evaluating prediction uncertainty. Technical Report NUREG/CR-6311, U.S. Nuclear Regulatory Commission and Los Alamos National Laboratory.

McKay, M. D. and Beckman, R. J. (1994). Using variance to identify important inputs. In Proceedings of the American Statistical Association Section on Physical and Engineering Sciences, Toronto, August 14-18.

McKay, M. D., Conover, W. J., and Beckman, R. J. (1979). A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245.

Mosleh, A., Siu, N., Smidts, C., and Lui, C., editors (1993). Proceedings of Workshop I in Advanced Topics in Risk and Reliability Analysis, Model Uncertainty: Its Characterization and Quantification, NUREG/CP-0138, Annapolis, MD, October 20-22. U.S. Nuclear Regulatory Commission.

Pearson, K. (1903). Mathematical contributions to the theory of evolution. Proceedings of the Royal Society of London, 71:288-313.

Winkler, R. L. (1993). Modeling uncertainty: Probabilities for models? In Proceedings of Workshop I in Advanced Topics in Risk and Reliability Analysis, Model Uncertainty: Its Characterization and Quantification, NUREG/CP-0138, pages 107-116, Annapolis, MD, October 20-22. U.S. Nuclear Regulatory Commission.