Verification and Validation


What we will do…

• Verification and Validation

• Statistical analysis of steady-state simulations: warm-up and run length, truncated replications, batching


V & V Introduction

• In a simulation, the real-world system is abstracted by a conceptual model: a series of mathematical and logical relationships concerning the components and structure of the system.

• The conceptual model is then coded into a computer recognizable form (i.e., an operational model), which we hope is an accurate imitation of the real-world system.

• The accuracy of the simulation must be checked before we can make valid conclusions based on the results from a number of runs.

[Diagram: Real World → Conceptual Model (quantitative/analytical models) → Operational Model (“code”), with V & V linking the levels.]


Introduction Cont…

This checking process consists of two main components:

Verification: Is “Code” = Model? (debugging)
Determine if the computer implementation of the conceptual model is correct. Does the computer code represent the model that has been formulated?

Validation: Is Model = System?
Determine if the conceptual model is a reasonable representation of the real-world system.

V & V is an iterative process to correct the “Code” errors and modify the conceptual model to better represent the real-world system

The Truth: Can probably never completely verify, especially for large models


Common Errors While Developing Models

• Incorrect data

• Mixed units of measure (e.g., hours vs. minutes)

• Blockages and deadlocks: seizing a resource but forgetting to release it; forgetting to dispose of the entity at the end

• Incorrectly overwriting attributes and variables (e.g., by reusing names)

• Incorrect indexing: indexing beyond the available queues and resources


Verification

Verification is the debugging of code so that the conceptual model is accurately reflected by the operational model.

Various common-sense suggestions can be used in the verification process:

• Write the simulation program in a logical, well-ordered manner. Make use of detailed flowcharts when writing the code

• Make the code as self-documenting as possible. Define all variables and state the purpose of each section of the program.

• Have the computer code checked by more than one person.


Verification Cont…

• Check to see that the values of the input parameters have not been changed inadvertently during the course of a simulation run.

• For a variety of input parameter values, examine the output of simulation runs for reasonableness.

• Use traces to check that the program performs as intended (a rough non-Arena sketch follows this list).
Break point: stop at a particular block
Watch point: stop when a condition is true
– NQ(1) > 10 (if the queue length is > 10, stop and check)
Intercept: stop whenever a particular entity moves
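For models written in general-purpose code rather than Arena, the same watch-point idea can be mimicked directly. The sketch below is hypothetical (the queue list, threshold, and hook name are invented for illustration) and is not an Arena feature.

```python
# Hypothetical watch point in a hand-coded simulation loop (not Arena):
# pause in the debugger when a monitored condition becomes true,
# mirroring the NQ(1) > 10 check described above.
import pdb

queue_1 = []        # stand-in for the contents of queue 1
WATCH_LIMIT = 10    # stop-and-check threshold

def after_each_event():
    """Call this hook after every simulation event is processed."""
    if len(queue_1) > WATCH_LIMIT:   # watch-point condition
        pdb.set_trace()              # pause here, inspect model state, then continue
```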


Verification Cont…

• Some techniques to attempt verification:
Eliminate error messages (obviously)
Single-entity release, step through the logic
– Set Batch Size = 1 in Arrive
– Replace distributions with a constant
“Stress” the model under extreme conditions
Performance estimation
Look at the generated SIMAN .mod and .exp files
– Run > SIMAN > View


Validation

• The process of developing confidence that inferences drawn from the model tell us something about the real system

• Conceptual validity: Does the model, as structured, adequately represent the system?

– Rationalism

• Operational validity: Is the behavior of the model characteristic of the real-world system?

– Empiricism

• Believability: Do the ultimate users have confidence in this model?


Validation

A variety of subjective and objective techniques can be used to validate the conceptual model.

• Face Validity

• Validation of Model Assumptions

• Validating Input-Output Transformations


Face Validity

A conceptual model must be reasonable “on its face” to those who are knowledgeable about the real-world system.

• Have experts examine the assumptions or the mathematical relationships of the conceptual model for correctness. Such a critique by experts would aid in identifying any deficiencies or errors in the conceptual model (Turing test: compare the simulation vs. the actual system).

The credibility of the conceptual model would be enhanced as these deficiencies are corrected during the iterative verification and validation process.

If the conceptual model is not overly complicated, additional methods can be used to check face validity.

• Conduct a manual trace of the conceptual model.

• Perform elementary sensitivity analysis by varying selected “critical” input parameters and observing whether the model behaves as expected.


Validation of Model Assumptions

We consider two types of model assumptions:

• Structural assumptions, i.e., assumptions concerning the operation of the real-world system

• Data assumptions

Structural assumptions can be validated by observing the real-world system and by discussing the system with the appropriate personnel.


Validation of Model Assumptions – Examples

We could make the following structural assumptions about the queues that form in the customer service area at a bank:

• Patrons form one long line, with the person at the front of the line receiving service as soon as one of the tellers becomes idle.

• A customer might leave the line if the others in line are moving too slowly.

• A customer seeing 10 or more patrons in the system may decide not to join the line.


Validation of Model Assumptions – Examples

Assumptions concerning the data that are collected may also be necessary.

Consider the interarrival times at the above bank during peak banking periods. We could assume these interarrivals are i.i.d. exponential random variables. To validate this assumption, we should proceed as follows (a sketch of the statistical checks appears after this list):

• Consult with bank personnel to determine when peak banking periods occur.

• Collect interarrival data from these periods.

• Conduct a statistical test to check that the assumption of independent interarrivals is reasonable.

• Estimate the parameter of the (supposedly) exponential distribution.

• Conduct a statistical goodness-of-fit test to check that the assumption of exponential interarrivals is reasonable.
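As a concrete illustration of the last three steps, the following minimal sketch checks approximate independence, estimates the exponential parameter, and runs a goodness-of-fit test with SciPy. The data file name is a hypothetical placeholder, and (strictly speaking) the KS p-value is somewhat optimistic when the parameter is estimated from the same data.

```python
# A minimal sketch of the data-assumption checks, assuming interarrival times
# (in minutes) are stored one per line in "peak_interarrivals.txt" (hypothetical file).
import numpy as np
from scipy import stats

x = np.loadtxt("peak_interarrivals.txt")

# Independence (rough check): lag-1 autocorrelation of successive interarrivals.
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.3f}")        # should be near 0

# Parameter estimate: for an exponential, the MLE of the mean is the sample mean.
mean_ia = x.mean()
print(f"estimated mean interarrival: {mean_ia:.3f} min (rate {1 / mean_ia:.3f} per min)")

# Goodness of fit: Kolmogorov-Smirnov test against Expon(scale = sample mean).
ks_stat, p_val = stats.kstest(x, "expon", args=(0, mean_ia))
print(f"KS statistic {ks_stat:.3f}, p-value {p_val:.3f}")
# A small p-value would cast doubt on the exponential assumption.
```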


Validating Input-Output Transformations

We can treat the conceptual model as a function that transforms certain input parameters into output performance measures.

In the banking example, input parameters could include:

• The distributional forms of the patron interarrival times and teller service times.

• The number of tellers present.

• The customer queuing discipline.

The average customer waiting time and server utilization might be the output performance measures of interest.

The basic principle of input-output validation is the comparison of output from the verified operating model to data from the real-world system.

Input-output validation requires that the real-world system currently exist.


Example

One method of comparison uses the familiar t test.

• Suppose we collected data from the bank under study, and the average customer service time during a particular peak banking period was 2.50 minutes.

• Further suppose that five independent simulation runs of this banking period were conducted (and that the simulations were all initialized under the same conditions).

• The average customer service times from the five simulations were 1.60, 1.75, 2.12, 1.94, 1.89 minutes.


Example Cont…

We would expect the simulated average service times to be consistent with the observed average service time.

• Therefore, the hypothesis to be tested is H0: E[Xi] = 2.50 min versus H1: E[Xi] ≠ 2.50 min,

• where Xi is the random variable corresponding to the average customer service time from the ith simulation run.


Example Cont…

Define

μ0 = 2.50 (= E[Xi] under H0),

n = 5 (the number of independent simulation runs),

X̄ = (1/n) Σ_{i=1}^{n} Xi (sample mean of runs),

S² = (1/(n-1)) Σ_{i=1}^{n} (Xi - X̄)² (sample variance of runs).


Example Cont…

By design and a central limit theorem, the Xi's are approximately i.i.d. normal random variables. So

t0 = (X̄ - μ0) / (S / √n)

is approximately a t random variable with n - 1 degrees of freedom if H0 is true.

For this example, X̄ = 1.86, S² = 0.0387, and t0 = -7.28.

Taking α = 0.05, the t table gives t_{4, 0.025} = 2.78. Since |t0| = 7.28 > 2.78, H0 is rejected.

This suggests that our operational model does not produce realistic customer service times. Changes in the conceptual model or computer code may be necessary, leading to another iteration of the verification and validation process.
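For reference, here is a minimal sketch of the same test carried out in Python with SciPy, using the five simulated averages and the observed 2.50-minute mean from the example.

```python
# One-sample t test of H0: E[Xi] = 2.50 min against the five simulation results.
import numpy as np
from scipy import stats

sim_means = np.array([1.60, 1.75, 2.12, 1.94, 1.89])   # average service times (min)
mu0 = 2.50                                              # observed real-world average (min)

t0, p_value = stats.ttest_1samp(sim_means, popmean=mu0)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=len(sim_means) - 1)

print(f"xbar = {sim_means.mean():.2f}, S^2 = {sim_means.var(ddof=1):.4f}")
print(f"t0 = {t0:.2f}, t_4,0.025 = {t_crit:.2f}, p-value = {p_value:.4f}")
# |t0| = 7.28 > 2.78, so H0 is rejected at the 5% level, matching the slide.
```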


Robustness

• Suppose we have validated the conceptual model (and verified the associated simulation code) of the existing real-world system.

• So we can say that the simulation adequately mimics the real-world system, and we can assume that some non-existing system of interest differs only slightly from our conceptual model.

• If we wish to compare the real-world system to non-existing systems with alternative designs or with different input parameters, the conceptual model (and associated code) should be robust.

• We should be able to make small modifications to our operational model and then use this new version of the code to generate valid output performance values for the non-existing system.

• Such minor changes might involve certain numerical input parameters (e.g., the customer inter-arrival rate) or the form of a certain statistical distribution (e.g., the service time distribution).

• But it may be difficult to validate the model of a non-existing system if it differs substantially from the conceptual model of the real-world system.


Historical Data Validation

Instead of running the operational model with artificial input data, we could drive the model with the actual historical record.

Then it’s reasonable to expect the simulation to yield output results very close to those observed from the real-world system.


Example Outline

Suppose we have collected interarrival and service time data from the bank during n independent peak periods.

• Let Wj denote the observed average customer waiting time from the jth peak period, j = 1…n.

• For fixed j, we can drive the operational model with the actual interarrival and service times to get the (simulated) average customer waiting time Yj.

• We hope that Dj ≡ Wj – Yj ≈ 0 for all j.

• We could do a paired t test to test H0: E[Dj] = 0
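A minimal sketch of that paired t test follows; the arrays of observed and simulated average waiting times are invented placeholders for illustration.

```python
# Paired t test of H0: E[Dj] = 0, where Dj = Wj - Yj.
import numpy as np
from scipy import stats

W = np.array([4.2, 5.1, 3.8, 6.0, 4.7])   # observed average waits per peak period (hypothetical)
Y = np.array([4.0, 5.4, 3.5, 5.7, 4.9])   # simulated waits driven by the same historical inputs

D = W - Y
t0, p_value = stats.ttest_1samp(D, popmean=0.0)   # equivalent to stats.ttest_rel(W, Y)
print(f"mean difference = {D.mean():.3f}, t0 = {t0:.2f}, p-value = {p_value:.3f}")
# A large p-value gives no evidence that the model's waits differ systematically
# from the historical record.
```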


Steady State Simulation


Time Frame of Simulations

• Terminating: specific starting and stopping conditions
Run length is well-defined and finite (known starting and stopping conditions)

• Steady-state: long-run (technically forever)
Theoretically, initial conditions don't matter (but practically they usually do)
Not clear how to terminate a simulation run (theoretically infinite)
Interested in system response over a long period of time

• This is really a question of intent of the study

• Has major impact on how output analysis is done

• Sometimes it’s not clear which is appropriate


Techniques for Steady State Simulation

• The main difficulty is obtaining independent simulation runs that exclude the transient (warm-up) period.

• If the model warms up very slowly, truncated replications can be costly: you have to “pay” the warm-up on each replication.

• Two techniques commonly used for steady-state simulation are the method of batch means and independent replications.

• Neither method is superior to the other in all cases.


Warm Up and Run Length

• Most models start empty and idle
Empty: no entities are present at time 0
Idle: all resources are idle at time 0
In a terminating simulation this is OK if realistic
In a steady-state simulation, though, this can bias the output for a while after startup (a small numerical illustration follows this list)
– Usually downward (results are biased low) in queueing-type models that eventually get congested
– Depending on the model, parameters, and run length, the bias can be very severe
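To see the downward bias concretely, here is a small illustration (not taken from the slides): an M/M/1 queue started empty, simulated via the Lindley recursion. The early customers' waits fall well below the steady-state value.

```python
# Initialization bias in an M/M/1 queue started empty (Lindley recursion).
import numpy as np

rng = np.random.default_rng(1)
lam, mu, n = 0.9, 1.0, 200_000           # arrival rate, service rate, number of customers
A = rng.exponential(1 / lam, n)           # interarrival times
S = rng.exponential(1 / mu, n)            # service times

W = np.zeros(n)                           # waiting times in queue; W[0] = 0 (empty, idle start)
for i in range(1, n):
    W[i] = max(0.0, W[i - 1] + S[i - 1] - A[i])

print(f"steady-state mean wait (theory): {lam / (mu * (mu - lam)):.2f}")
print(f"average of first 100 waits:      {W[:100].mean():.2f}   <- biased low")
print(f"average over the whole run:      {W.mean():.2f}")
```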


Warm Up and Run Length (cont’d.)

• Remedies for initialization bias
Better starting state, more typical of steady state
– Throw some entities around the model
– How do you know how many to throw and where? This is what you're trying to estimate in the first place!
Make the run so long that the bias is overwhelmed
– Might work if the initial bias is weak or dissipates quickly
Let the model warm up, still starting empty and idle
– Run > Setup > Replication Parameters: Warm-up Period (mind the time units!)
– “Clears” all statistics at that point for the summary report and for any data saved via Outputs in the Statistic module, across replications


Method of Independent Replications


Method of Independent Replications (cont’d.)

• Suppose you have n equal batches (replications) of m observations each.

The mean of each batch is:  mean_i = (1/m) Σ_{j=1}^{m} X_ij

The overall estimate is:  Estimate = (1/n) Σ_{i=1}^{n} mean_i

The 100(1 - α)% confidence interval using the t table is:  Estimate ± t_{n-1, α/2} · S / √n

where the variance is  S² = (1/(n-1)) Σ_{i=1}^{n} (mean_i - Estimate)²
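A minimal sketch of this computation in Python, assuming rep_means holds the (post-warm-up) average from each replication; the numbers are hypothetical placeholders.

```python
# t-based confidence interval from n independent replication means.
import numpy as np
from scipy import stats

rep_means = np.array([4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.4, 4.2, 4.0, 4.6])  # hypothetical
n = len(rep_means)

estimate = rep_means.mean()                      # overall estimate
s = rep_means.std(ddof=1)                        # S, std. dev. of the replication means
half_width = stats.t.ppf(1 - 0.05 / 2, df=n - 1) * s / np.sqrt(n)

print(f"95% CI: {estimate:.3f} +/- {half_width:.3f}")
```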


Warm Up and Run Length (cont’d.)

• Warm-up and run-length times?
Most practical idea: preliminary runs and plots
Simply “eyeball” them (a plotting sketch follows this list)
Be careful about variability: make multiple replications and superimpose the plots
Also, be careful to note “explosions”

• Possibility: different warm-up periods for different output processes
To be conservative, take the max
Must specify a single Warm-up Period for the whole model
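One common way to do this eyeballing is to plot cumulative (or moving) averages from several replications on the same axes and look for where they flatten out. The sketch below uses made-up data purely to show the mechanics.

```python
# Superimposed cumulative-average plots for picking a warm-up point by eye.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_reps, n_periods = 5, 200
# Stand-in output that starts low and rises toward a steady state near 5.0.
signal = 5.0 * (1 - np.exp(-np.arange(n_periods) / 30.0))
rep_outputs = signal + rng.normal(0.0, 0.4, size=(n_reps, n_periods))

cum_avg = np.cumsum(rep_outputs, axis=1) / np.arange(1, n_periods + 1)
for r in range(n_reps):
    plt.plot(cum_avg[r], alpha=0.5)                      # individual replications
plt.plot(cum_avg.mean(axis=0), "k", lw=2, label="mean of replications")
plt.xlabel("period"); plt.ylabel("cumulative average")
plt.legend()
plt.show()   # choose the warm-up point where the curves have visibly leveled off
```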


Warm Up and Run Length (cont’d.)

Example: Lengthen Replications to 5 days (7200 min), do 10 Replications


Truncated Replications

• If you can identify appropriate warm-up and run-length times, just make replications as for terminating simulations

Only difference: specify the Warm-up Period in Run > Setup > Replication Parameters

Proceed with confidence intervals, comparisons, all statistical analysis as in terminating case

• So… What should be the length of warm-up period?

Abate, J., and W. Whitt, "Transient Behavior of Regulated Brownian Motion," Advances in Applied Probability, 19, 560-631, 1987.


Batching in a Single Run

• Alternative: Just one R E A L L Y long run

Only have to “pay” warm-up once

Problem: Have only one “replication” and you need more than that to form a variance estimate (the basic quantity needed for statistical analysis)

– Big no-no: Use the individual points within the run as “data” for variance estimate

– Usually correlated (not indep.), variance estimate biased


Batching in a Single Run (cont’d.)

• Break each output record from the run into a few large batches
Tally (discrete-time) outputs: observation-based
Time-persistent (continuous-time) outputs: time-based

• Take averages over batches as the “basic” statistics for estimation: batch means
Tally outputs: simple arithmetic averages
Time-persistent outputs: continuous-time averages

• Treat batch means as IID
Key: the batch size must be big enough for low correlation between successive batches
Still might want to truncate (once, time-based)


Batching in a Single Run (cont’d.)


• Suppose you have n equal batches of m observations each.

The mean of each batch is:  mean_i = (1/m) Σ_{j=1}^{m} X_ij

The overall estimate is:  Estimate = (1/n) Σ_{i=1}^{n} mean_i

The 100(1 - α)% confidence interval using the t table is:  Estimate ± t_{n-1, α/2} · S / √n

where the variance is  S² = (1/(n-1)) Σ_{i=1}^{n} (mean_i - Estimate)²
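A minimal sketch of the batch-means calculation on a single long run; the autocorrelated series below is synthetic (an AR(1) process) and stands in for within-run output after the warm-up has been removed.

```python
# Batch means and a t-based CI from one long (synthetic) output series.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
series = np.empty(10_000)
series[0] = 5.0
for t in range(1, series.size):                       # stationary AR(1) around 5.0
    series[t] = 5.0 + 0.8 * (series[t - 1] - 5.0) + rng.normal(scale=0.5)

n_batches = 20
m = series.size // n_batches                          # batch size
batch_means = series[: n_batches * m].reshape(n_batches, m).mean(axis=1)

estimate = batch_means.mean()
s = batch_means.std(ddof=1)
half_width = stats.t.ppf(0.975, df=n_batches - 1) * s / np.sqrt(n_batches)
print(f"95% CI: {estimate:.3f} +/- {half_width:.3f}")

# Rough adequacy check: lag-1 correlation of batch means should be near zero.
lag1 = np.corrcoef(batch_means[:-1], batch_means[1:])[0, 1]
print(f"lag-1 correlation of batch means: {lag1:.3f}")
```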


Batching in a Single Run (cont’d.)

• One replication of 50 days (about the same effort as 10 replications of 5 days each)

How to choose batch size?

Equivalently, how to choose the number of batches for a fixed run length?

Want batches big enough so that batch means appear uncorrelated.


Batching in a Single Run (cont’d.)

• Arena automatically attempts to form 95% confidence intervals on steady-state output measures via batch means from within each single replication

“Half Width” column in reports from one replication

– In Category Overview report if you just have one replication

– In Category by Replication report if you have multiple replications

Ignore if you’re doing a terminating simulation

Won’t report anything if your run is not long enough

– “(Insufficient)” if you don’t have the minimum amount of data Arena requires even to form a CI

– “(Correlated)” if you don’t have enough data to form nearly-uncorrelated batch means, required to be safe