
Marko Tainio, marko.tainio[at]thl.fi

Modeling and Monte Carlo simulation

Marko Tainio
Decision analysis and Risk Management course in Kuopio, 10.3.2011


Content

• Computer modeling
  – Why?

• Approximation
  – When data is not available

• Monte Carlo simulation
  – When and why to use it?

• Common uncertainty distributions
  – The normal distribution is not the only option


Computer modeling


Modeling

[Figure: two images, labelled "Correct" and "Wrong"]


Modeling, models

• http://en.wikipedia.org/wiki/Mathematical_model

• A mathematical model is a description of a system using mathematical language

• The process of developing a mathematical model is termed mathematical modelling (also spelled modeling)

• The terms "modeling" and "simulation" are often used interchangeably


Why models?

• Information can be created with measurements and with models

• Benefits of modeling in comparison to measurements:
  – Not everything can be measured (e.g. air pollution concentration all over the country)
  – Future scenarios cannot be measured
  – Modeling is often cheaper than measurements

• Measurements and models depend on each other!
  – Without measurements, models are impossible to create
  – Without modeling, measurements are difficult or impossible to generalize
  – Measurements and models can be used to design or validate each other


Classification of models

• Based on uncertainty
  – Deterministic (input and output variables are fixed values)
  – Stochastic, also called probabilistic (at least one of the input or output variables is probabilistic)

• Based on time
  – Static (time is not taken into account)
  – Dynamic (time-varying interactions among variables are taken into account)


Deterministic vs. Stochastic

Deterministic
• Input values and results are point values:
  – The model result is always the same!
• For example, the (classical) laws of physics are deterministic

Stochastic
• Some input values, and therefore the model result, are described by uncertainty distributions:
  – The model result is always a distribution!
• Most decision analysis models contain uncertainty
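As a minimal sketch of the difference, the Python snippet below runs the same toy model twice: once with point values, and once with one input drawn from a normal distribution. The model `inhaled_dose`, its input values and the random seed are hypothetical, chosen only for illustration.

```python
import random
import statistics


def inhaled_dose(concentration, breathing_rate):
    """Hypothetical toy model: dose = concentration x breathing rate."""
    return concentration * breathing_rate


# Deterministic: point values in, one point value out (the same on every run).
deterministic_result = inhaled_dose(8.0, 15.0)

# Stochastic: concentration is uncertain, Normal(mean 8.0, sd 1.0), so the
# result is a distribution, represented here by 10 000 Monte Carlo iterations.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
stochastic_results = [inhaled_dose(rng.gauss(8.0, 1.0), 15.0) for _ in range(10_000)]

print("deterministic:", deterministic_result)
print("stochastic mean:", round(statistics.fmean(stochastic_results), 1))
print("stochastic sd:", round(statistics.stdev(stochastic_results), 1))
```

Rerunning the deterministic line always gives 120.0; the stochastic part gives a whole set of results whose spread reflects the input uncertainty.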


The general steps when designing models

Step 1. Identify the problem.
Step 2. Formulate the problem.
Step 3. Collect and process data.
Step 4. Formulate and develop a model.
Step 5. Validate the model.
Step 6. Document the model for future use.

Ref: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=4B8162C34D9B9D492158493E25DC8F2C?doi=10.1.1.81.8350&rep=rep1&type=pdf


Modeling tools

• Paper and pencil
  – Computers are not a necessity!

• Microsoft Excel and its OpenOffice equivalent
  – A good and widely used modeling tool

• Simulation programs designed for computer modeling
  – For example R, SAS, Analytica, Matlab, Scilab

• The choice of tool depends on available time, money and other features. One tool might not fit all situations!


Sayings about models

• “A good model is a judicious tradeoff between realism and simplicity.”

• Make things as simple as possible, but not simpler (Albert Einstein)

• A good modeler knows when he/she has achieved the correct level of simplicity!
  – Also, some methods exist to calculate the correct level of simplicity


Approximation


Approximation

• Definition: An approximation is an inexact representation of something that is still close enough to be useful

• In decision analysis, risk assessment and computer modeling, approximation is a necessity
  – Without approximation, assessments would be impossible to complete


Example of approximation

• Case: You need to define the fine particulate matter (PM2.5) concentration in Kuopio for the year 2008.

• You have the following information available:
  – PM2.5 concentration for Jyväskylä (city 100 km west of Kuopio) for 2008: 8.0 μg/m3
  – PM2.5 concentration for Joensuu (city 100 km east of Kuopio) for 2008: 7.0 μg/m3
  – PM2.5 concentration for Kuopio for 2000: 9.0 μg/m3

• Which value would you use, and why?


Approximation in calculation of integrals

• Integrals are calculated, for example, when estimating the life expectancy of a population

• The challenges in approximating integrals are the following:
  – Values are provided only for fixed points (e.g. the first of January every year)
  – Often, values are required for points that were not measured!
  – To calculate the results, the modeler needs to approximate the function
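One common way to approximate such an integral is the trapezoidal rule, which joins neighboring measured points with straight lines and sums the areas under them. The sketch below applies it to the imaginary cohort counts from the population example (ages 50 to 55). Reading the result as person-years lived between those ages is an interpretation assumed here, and `trapezoid_integral` is a hypothetical helper.

```python
def trapezoid_integral(xs, ys):
    """Approximate the integral of y(x) from tabulated points (trapezoidal rule).

    Between each pair of measured points the unknown function is approximated
    by a straight line, and the areas of the resulting trapezoids are summed.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    total = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        total += width * (ys[i] + ys[i + 1]) / 2
    return total


# Imaginary cohort counts on 1 January at ages 50..55 (population example)
ages = [50, 51, 52, 53, 54, 55]
population = [55381, 53720, 51034, 45420, 38607, 30499]

person_years = trapezoid_integral(ages, population)
print(person_years)  # 231721.0
```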


Example with population data

[Figure: population of an imaginary cohort by age, 50 to 55 years]

Age (years)   50      51      52      53      54      55
Population    55 381  53 720  51 034  45 420  38 607  30 499

How many people were alive at the age of 52 and a half years?

Imaginary data on the number of people in a defined cohort at different ages. The number of people has been calculated on the 1st of January of each year.
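Assuming the population changes linearly between two consecutive 1 January counts (an assumption, not something the data show), the question can be answered by linear interpolation. The helper `linear_interpolate` is hypothetical:

```python
from bisect import bisect_right


def linear_interpolate(xs, ys, x):
    """Estimate y at x with a straight line between the two nearest measured points.

    xs must be sorted in increasing order and x must lie within [xs[0], xs[-1]].
    """
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range (that would be extrapolation)")
    i = bisect_right(xs, x) - 1
    if i == len(xs) - 1:  # x equals the last measured point
        return float(ys[-1])
    fraction = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + fraction * (ys[i + 1] - ys[i])


ages = [50, 51, 52, 53, 54, 55]
population = [55381, 53720, 51034, 45420, 38607, 30499]
print(linear_interpolate(ages, population, 52.5))  # 48227.0
```

Under this assumption about 48 227 people were alive at 52.5 years; assuming a different shape between the measured points would give a different answer.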


Example with PM2.5 concentration

[Figure: PM2.5 concentration by weekday, Monday to Sunday; values listed in the chart data: 14.5, 13.2, 19.4, 23, 13.1, 17.2]


Approximation - summary

• Approximation is required in decision analysis and risk assessment

• The more data you can collect, the better the approximation

• The better you understand the problem, the better the approximation


Monte Carlo simulation


Monte Carlo - definition

• http://en.wikipedia.org/wiki/Monte_Carlo_simulation

• Monte Carlo methods (or Monte Carlo experiments) are a class of computational algorithms that rely on repeated random sampling to compute their results

• Monte Carlo methods are often used in simulating physical and mathematical systems

• Monte Carlo methods are most suited to calculation by a computer and tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm

• In risk & decision analysis, Monte Carlo is the most common way of propagating uncertainty through the model!


Monte Carlo name?

• http://en.wikipedia.org/wiki/Monte_Carlo_simulation

• The modern Monte Carlo method was developed at Los Alamos National Laboratory, USA
  – Los Alamos is famous for the Manhattan Project (atomic bomb)

• In the late 1940s, the scientists at Los Alamos were faced with problems that could not be solved with analytical calculations

• John von Neumann and Stanislaw Ulam suggested that the problem be solved by modeling the experiment on a computer using chance

• Being secret, their work required a code name. Von Neumann chose the name "Monte Carlo".
  – The name is a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble.


Why Monte Carlo?

• Monte Carlo allows uncertainties in the model to be combined
  – For example: multiplying different uncertain variables (see next slide)

• Analytical methods for combining uncertainties are more complicated or, in some cases, impossible to calculate

• Since decision models always involve uncertainties, a method to propagate these uncertainties through the model is needed!


[Figure: probability density of Variable a, Normal (1,1); of Variable b, Lognormal (1,1.2); and of the Results]

Variable a × Variable b = Results
Normal (1,1) × Lognormal (1,1.2) = Results
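The multiplication in the figure can be sketched with Python's standard `random` module. It is assumed here that Lognormal (1,1.2) means a median of 1 and a geometric standard deviation (gsdev) of 1.2, following the median/gsdev notation of the lognormal slide; the seed and the 10 000 iterations are illustrative choices.

```python
import math
import random
import statistics

N_ITERATIONS = 10_000
rng = random.Random(2011)  # arbitrary seed, for reproducibility

# Variable a ~ Normal(mean 1, sd 1)
a = [rng.gauss(1.0, 1.0) for _ in range(N_ITERATIONS)]
# Variable b ~ Lognormal(median 1, gsdev 1.2)  <- ASSUMED parameterization.
# random.lognormvariate expects the mean and sd of ln(b): ln(median), ln(gsdev).
b = [rng.lognormvariate(math.log(1.0), math.log(1.2)) for _ in range(N_ITERATIONS)]

# Propagate the uncertainty: evaluate the model (a x b) once per iteration,
# pairing the i-th sample of a with the i-th sample of b.
results = [ai * bi for ai, bi in zip(a, b)]

q = statistics.quantiles(results, n=40)  # cut points at 2.5 %, 5 %, ..., 97.5 %
print("mean:", round(statistics.fmean(results), 2))
print("95 % interval:", round(q[0], 2), "to", round(q[-1], 2))
```

Pairing the samples index by index treats a and b as independent; correlated inputs would have to be sampled jointly.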


Estimation of Pi

The ratio of the area of an inscribed circle to that of the surrounding square is π/4.

Since the two areas are in the ratio π/4, objects (random points) scattered uniformly over the square should fall in the two areas in approximately the same ratio. Thus, counting the number of objects in the circle and dividing by the total number of objects in the square yields an approximation of π/4.

Multiplying the result by 4 will then yield an approximation for π itself.
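The procedure above can be written as a short Monte Carlo loop. This sketch uses the quarter circle inside the unit square, which covers the same fraction π/4 of its square as the inscribed circle does; `estimate_pi` is a hypothetical name.

```python
import random


def estimate_pi(n_points, seed=None):
    """Monte Carlo estimate of pi (hypothetical helper).

    Points are scattered uniformly over the unit square [0, 1) x [0, 1).
    The quarter circle of radius 1 covers pi/4 of that square, so the share
    of points landing inside it approximates pi/4; multiplying by 4 gives pi.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / n_points


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n, seed=1))
```

The estimate improves slowly: its random error shrinks roughly in proportion to 1/√n.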


Iteration of uncertainty function with Monte Carlo

[Figure: sampled values of Va1 plotted against iteration (run) number, 1 to 1000]

Iteration   Value
1           -1.0
2            2.6
3            1.4
4            1.3
5            0.2
6            2.2
7           -0.8
8            1.5
9            1.5
10          -0.1
11          -0.9
12           1.5
13           1.0
14           1.3
15           1.1
16           0.2
17           1.2
18           0.5
19           0.5
20           1.0
21           2.5
22           0.2
23           0.8
24           0.9
25           0.6
26           1.6
27           0.0
…            …
1000         1.9

Normal distribution with a mean of 1 and a standard deviation of 1
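The sketch below mimics the iteration table and the density plot that follows it: it draws 1000 independent iterations from Normal(1, 1) and turns them into a crude text histogram scaled as a probability density. The bin width, plotting range and seed are arbitrary assumptions; values outside the range (very rare here) are simply not counted.

```python
import random

N_ITERATIONS = 1000
rng = random.Random(10)  # arbitrary seed

# Each iteration (run) draws one independent value of Va1 from Normal(1, 1).
iterations = [rng.gauss(1.0, 1.0) for _ in range(N_ITERATIONS)]
for run, value in enumerate(iterations[:5], start=1):
    print(run, round(value, 1))

# Histogram scaled as a density: count per bin / (iterations x bin width),
# so that the bar areas add up to (almost) 1.
LOW, HIGH, BIN_WIDTH = -3.0, 5.0, 0.5
n_bins = int((HIGH - LOW) / BIN_WIDTH)
counts = [0] * n_bins
for v in iterations:
    if LOW <= v < HIGH:  # values outside the plotting range are not counted
        k = min(int((v - LOW) / BIN_WIDTH), n_bins - 1)
        counts[k] += 1
density = [c / (N_ITERATIONS * BIN_WIDTH) for c in counts]

for k, d in enumerate(density):
    print(f"{LOW + k * BIN_WIDTH:5.1f} {'#' * round(d * 100)}")
```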


[Figure: probability density of Va1]


Critical issues in Monte Carlo

• How the iteration is done:
  – Iterations should be independent (uncorrelated) of each other
  – Generating random numbers is a science of its own, and we will not focus on that issue

• How many iterations are required?
  – The more iterations, the more computing power is needed
  – In practice, we prefer 10 000 iterations


Example: effect of the number of iterations

[Figure: probability density of Va1 estimated with 10, 100, 1000 and 10000 iterations (four panels)]
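The effect of the number of iterations can also be seen numerically. This sketch (seed arbitrary) reports, for 10 to 10 000 iterations from Normal(1, 1), the sample mean, the sample standard deviation and the standard error of the mean:

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed
results = {}
for n in (10, 100, 1000, 10_000):
    sample = [rng.gauss(1.0, 1.0) for _ in range(n)]
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    # Standard error of the Monte Carlo mean: shrinks roughly as sd / sqrt(n)
    std_error = sd / n ** 0.5
    results[n] = (mean, sd, std_error)
    print(f"{n:>6} iterations: mean {mean:.3f}, sd {sd:.3f}, standard error {std_error:.3f}")
```

Because the standard error scales as 1/√n, going from 100 to 10 000 iterations makes the Monte Carlo error of the mean roughly ten times smaller.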


Common uncertainty distributions


Most used uncertainty distributions in our risk models

• Bernoulli

• Lognormal

• Normal

• Triangular

• Uniform


Normal distribution

[Figure: probability density of a normal distribution]

The range [mean - standard deviation, mean + standard deviation] encloses about 68% of the probability.
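The 68% figure can be checked with `statistics.NormalDist` from the Python standard library, here for the Normal(1, 1) used in the earlier examples (the choice of parameters does not affect the result):

```python
from statistics import NormalDist

dist = NormalDist(mu=1.0, sigma=1.0)  # Normal(1, 1), as in the earlier examples

within_one_sd = dist.cdf(dist.mean + dist.stdev) - dist.cdf(dist.mean - dist.stdev)
within_two_sd = dist.cdf(dist.mean + 2 * dist.stdev) - dist.cdf(dist.mean - 2 * dist.stdev)
print(round(within_one_sd, 4))  # 0.6827
print(round(within_two_sd, 4))  # 0.9545
```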


Normal distribution

• Alternative names: Gaussian, bell-shaped

• Most common distribution
  – Theoretically, the sum of a large number of independent random variables tends to a normal distribution (central limit theorem)

• Properties:
  – Symmetric around the mean
  – The upper and lower bounds are unknown, possibly very large or very small (unbounded)
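A small simulation illustrates the central limit theorem. Under the assumptions in the comments (12 Uniform(0, 1) terms per sum, 10 000 iterations, arbitrary seed), the sums come out with a mean close to 6, a standard deviation close to 1 and about 68% of values within one standard deviation, as for a normal distribution:

```python
import random
import statistics

N_TERMS = 12  # independent terms per sum (assumed)
N_ITERATIONS = 10_000
rng = random.Random(3)  # arbitrary seed

# Each term is Uniform(0, 1): flat, clearly not normal (mean 0.5, variance 1/12),
# so each sum has mean 12 x 0.5 = 6 and variance 12 x 1/12 = 1.
sums = [sum(rng.random() for _ in range(N_TERMS)) for _ in range(N_ITERATIONS)]

mean, sd = statistics.fmean(sums), statistics.stdev(sums)
share_within_one_sd = sum(mean - sd <= s <= mean + sd for s in sums) / N_ITERATIONS
print(f"mean {mean:.2f}, sd {sd:.2f}, share within one sd {share_within_one_sd:.3f}")
```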


Lognormal distribution

[Figure: probability density of a lognormal distribution]

The range [median/gsdev, median × gsdev], where gsdev is the geometric standard deviation, encloses about 68% of the probability.
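Because ln(X) is normally distributed with mean ln(median) and standard deviation ln(gsdev), the interval [median/gsdev, median × gsdev] corresponds to mean ± one standard deviation on the log scale, hence the same 68%. A quick sampling check, assuming median 1 and gsdev 1.2 and using `random.lognormvariate` (which takes the mean and standard deviation of ln(X)):

```python
import math
import random

median, gsdev = 1.0, 1.2                       # assumed Lognormal(1, 1.2) parameters
mu, sigma = math.log(median), math.log(gsdev)  # ln(X) ~ Normal(mu, sigma)

rng = random.Random(5)  # arbitrary seed
samples = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]

# Share of samples inside [median / gsdev, median x gsdev]; should be near 0.683
share = sum(median / gsdev <= x <= median * gsdev for x in samples) / len(samples)
print(round(share, 3))
```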


Lognormal distribution

• Alternative names: log normal, log-normal, Galton distribution.

• Also a common distribution:
  – The multiplicative version of the central limit theorem says that the product or ratio of many independent positive variables tends to be lognormal, just as their sum tends to a normal distribution.

• Properties:
  – Asymmetric around the mean (right-skewed); values are always positive
  – The upper bound is unknown, possibly very large (unbounded)
  – Mean and median are different!
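The multiplicative central limit theorem and the mean/median difference can both be seen in one sketch. The factor distribution Uniform(0.8, 1.2), the 30 factors and the seed are arbitrary assumptions; for a lognormal distribution, mean = median × exp(σ²/2), where σ = ln(gsdev).

```python
import math
import random
import statistics

rng = random.Random(11)  # arbitrary seed
N_FACTORS, N_ITERATIONS = 30, 20_000

# Each iteration multiplies 30 independent positive factors, each Uniform(0.8, 1.2).
products = [
    math.prod(rng.uniform(0.8, 1.2) for _ in range(N_FACTORS))
    for _ in range(N_ITERATIONS)
]

# The log of a product is a sum of logs, so by the central limit theorem the
# logs are roughly normal, i.e. the products are roughly lognormal.
logs = [math.log(p) for p in products]
median_fit = math.exp(statistics.fmean(logs))                   # median = exp(mean of logs)
gsdev_fit = math.exp(statistics.stdev(logs))                    # geometric standard deviation
mean_fit = median_fit * math.exp(math.log(gsdev_fit) ** 2 / 2)  # lognormal mean

print("sample median", round(statistics.median(products), 3), "fitted median", round(median_fit, 3))
print("sample mean  ", round(statistics.fmean(products), 3), "fitted mean  ", round(mean_fit, 3))
```

Both the sample and the fitted values show the right skew: the mean lies above the median.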


Triangular distribution

[Figure: probability density of a triangular distribution]


Triangular distribution

• Properties:
  – Min, max and mode defined
  – Mean = (min + mode + max)/3; mean and median are the same only when the mode lies halfway between min and max

• Good to use:
  – When you want closed boundaries for your distribution
  – And when you have a strong candidate for the most likely value (= mode)
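Sampling a triangular distribution is built into Python's `random` module; note that `random.triangular` takes its arguments in the order (low, high, mode). The min, mode and max below are hypothetical:

```python
import random
import statistics

low, mode, high = 0.5, 1.0, 1.8  # hypothetical min, most likely value and max
rng = random.Random(4)           # arbitrary seed
# Argument order of random.triangular is (low, high, mode)
samples = [rng.triangular(low, high, mode) for _ in range(50_000)]

print("theoretical mean:", round((low + mode + high) / 3, 3))
print("sample mean:     ", round(statistics.fmean(samples), 3))
print("sample median:   ", round(statistics.median(samples), 3))
print("all within bounds:", low <= min(samples) and max(samples) <= high)
```

With the mode off-center, the sample median differs from the mean, as the properties above state.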


Uniform

[Figure: probability density of a uniform distribution]


Uniform

• Properties:
  – Min and max defined
  – Mean and median are the same: (min + max)/2

• Good to use:
  – When you want closed boundaries for your distribution
  – When the shape of the distribution is unknown

• Random number sampling, e.g. in Excel, is based on the uniform distribution
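Spreadsheet tools typically turn uniform random numbers into other distributions by inverse transform sampling, for example the Excel idiom `NORM.INV(RAND(), mean, sd)`. The Python sketch below shows both plain uniform sampling and the inverse transform idea; the min/max values, the Normal(1, 1) target and the seed are illustrative assumptions.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(8)  # arbitrary seed
low, high = 1.0, 2.0    # hypothetical min and max

# Uniform(min, max): every value in the range is equally likely.
uniform_samples = [rng.uniform(low, high) for _ in range(20_000)]
print("uniform mean", round(statistics.fmean(uniform_samples), 3), "expected", (low + high) / 2)

# Inverse transform sampling: push Uniform(0, 1) numbers through the inverse
# CDF of the target distribution to get samples from that distribution.
target = NormalDist(mu=1.0, sigma=1.0)
normal_samples = []
while len(normal_samples) < 20_000:
    u = rng.random()  # in [0, 1)
    if u > 0.0:       # inv_cdf is undefined at exactly 0
        normal_samples.append(target.inv_cdf(u))
print("normal mean", round(statistics.fmean(normal_samples), 3),
      "sd", round(statistics.stdev(normal_samples), 3))
```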


Bernoulli

[Figure: probabilities of the two outcomes, 0 and 1, of a Bernoulli distribution]


Bernoulli

• Related distribution: the binomial distribution (a Bernoulli distribution is a binomial distribution with a single trial)

• Properties:
  – Defines a discrete probability distribution with probability p of result 1 and probability (1 - p) of result 0
  – "Coin flipping distribution"

• Good to use:
  – When you want to combine two sets of data (e.g. two model results)
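One way to read "combine two sets of data" is a Bernoulli switch: in each iteration, a Bernoulli draw with probability p decides whether the result comes from model A or from model B. The sketch below assumes p = 0.7 and two hypothetical normally distributed model results; none of these values come from the slides.

```python
import random
import statistics

N_ITERATIONS = 10_000
P_MODEL_A = 0.7  # hypothetical probability (weight) given to model A
rng = random.Random(12)  # arbitrary seed


def model_a(r):
    """Hypothetical result of model A: Normal(mean 8.0, sd 1.0)."""
    return r.gauss(8.0, 1.0)


def model_b(r):
    """Hypothetical result of model B: Normal(mean 7.0, sd 0.5)."""
    return r.gauss(7.0, 0.5)


choices, combined = [], []
for _ in range(N_ITERATIONS):
    # Bernoulli(p) draw: 1 with probability P_MODEL_A, otherwise 0
    use_a = 1 if rng.random() < P_MODEL_A else 0
    choices.append(use_a)
    combined.append(model_a(rng) if use_a else model_b(rng))

print("share of iterations from model A:", sum(choices) / N_ITERATIONS)
print("mean of combined result:", round(statistics.fmean(combined), 2))
```

The combined mean should land near 0.7 × 8.0 + 0.3 × 7.0 = 7.7, and the combined distribution keeps the spread of both models.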


Example of uncertainty distributions

• Case: You need to define the fine particulate matter (PM2.5) concentration in Kuopio for the year 2008.

• You have the following information available:
  – PM2.5 concentration for Jyväskylä (city 100 km west of Kuopio) for 2008: 8.0 μg/m3
  – PM2.5 concentration for Joensuu (city 100 km east of Kuopio) for 2008: 7.0 μg/m3
  – PM2.5 concentration for Kuopio for 2000: 9.0 μg/m3

• Which distribution, and with which parameters, would you use to describe the concentration?
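Purely as an illustration of how a chosen distribution could be encoded, and not as the intended answer to the exercise, the sketch below assumes a triangular distribution with min 7.0 μg/m3 (Joensuu 2008), max 9.0 μg/m3 (Kuopio 2000) and mode 7.5 μg/m3 (the average of the two 2008 values), and summarizes it by Monte Carlo:

```python
import random
import statistics

# ASSUMED distribution, for illustration only (units: ug/m3)
low = 7.0               # Joensuu 2008, the lowest available value
high = 9.0              # Kuopio 2000, the highest available value
mode = (8.0 + 7.0) / 2  # average of the two 2008 neighbor-city values = 7.5

rng = random.Random(2008)  # arbitrary seed
samples = [rng.triangular(low, high, mode) for _ in range(10_000)]
q = statistics.quantiles(samples, n=40)  # cut points at 2.5 %, 5 %, ..., 97.5 %
print(f"mean {statistics.fmean(samples):.2f}, 95 % interval {q[0]:.2f} to {q[-1]:.2f}")
```

A uniform distribution between the same bounds, or a different mode, would be equally easy to encode; the choice itself is the modeling judgment the exercise asks for.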