Introduction to Mathematical Probability


This is a lecture I've put together summarizing the topics of mathematical probability. The presentation is at an undergraduate science (Math, Physics, Engineering) level. During the upload process some of the figures and equations were lost. For a better version of this presentation please visit my website at http://solohermelin.com, in the Math folder, and open the Probability presentation. Please feel free to comment and suggest improvements to [email protected]!

Transcript of Introduction to Mathematical Probability

Page 1: Introduction to Mathematical Probability

1

Probability

SOLO HERMELIN

Updated: 6.06.09

Page 2: Introduction to Mathematical Probability

2

SOLO

Table of Content

Probability

Set Theory

Probability Definitions

Theorem of Addition

Conditional Probability

Total Probability Theorem

Statistical Independent Events

Theorem of Multiplication

Conditional Probability - Bayes Formula

Random Variables

Probability Distribution and Probability Density Functions

Conditional Probability Distribution and Conditional Probability Density Functions

Expected Value or Mathematical Expectation

Variance

Moments

Functions of one Random Variable

Jointly Distributed Random Variables

Characteristic Function and Moment-Generating Function

Existence Theorems (Theorem 1 & Theorem 2)

Page 3: Introduction to Mathematical Probability

3

SOLO

Table of Content (continue - 1)

Probability

Law of Large Numbers (History)

Markov's Inequality

Chebyshev's Inequality

Bienaymé's Inequality

Chernoff's and Hoeffding's Bounds

Chernoff's Bound

Hoeffding's Bound

Convergence Concepts

The Law of Large Numbers

Central Limit Theorem

Distributions

Bernoulli Trials – The Binomial Distribution

Poisson Asymptotical Development (Law of Rare Events)

Normal (Gaussian) Distribution

De Moivre-Laplace Asymptotical Development

Laplacian Distribution

Gamma Distribution

Beta Distribution

Page 4: Introduction to Mathematical Probability

4

SOLO

Table of Content (continue - 2)

Probability

Cauchy Distribution

Exponential Distribution

Chi-square Distribution

Student's t-Distribution

Uniform Distribution (Continuous)

Rayleigh Distribution

Rice Distribution

Weibull Distribution

Kinetic Theory of Gases

Maxwell's Velocity Distribution

Molecular Models

Boltzmann Statistics

Bose-Einstein Statistics

Fermi-Dirac Statistics

Monte Carlo Method

Generating Continuous Random Variables

Importance Sampling

Generating Discrete Random Variables

Metropolis & Metropolis-Hastings Algorithms

Markov Chain Monte Carlo (MCMC)

Gibbs Sampling

Monte Carlo Integration

Page 5: Introduction to Mathematical Probability

5

SOLO

Table of Content (continue - 3)

Probability

Appendices

Permutations

Combinations

References

Random Processes

Stationarity of a Random Process

Ergodicity

Markov Processes

White Noise

Markov Chains

Existence Theorems (Theorem 3)

Page 6: Introduction to Mathematical Probability

6

SOLO Set Theory

A set A is a collection of objects (elements of the set) ζ1, ζ2, …, ζn.

Examples

A = (ζ1, ζ2, …, ζn) – a set of n elements

A(x) = { x : |x| < 1 } – the set of all numbers whose absolute value is smaller than 1

A(x, y) = { (x, y) : 0 < x < T, 0 < y < T } – a set of points (x, y) in a square

∅ – the empty set, the set that contains no elements

S – the set that contains all elements (set space)

A = set space of a die: the six elementary events {1}, {2}, {3}, {4}, {5}, {6}

Page 7: Introduction to Mathematical Probability

7

SOLO Set Theory

Set Operations

Inclusion: A is included in B (A ⊂ B) if x ∈ A implies x ∈ B.

Equality: A = B if and only if A ⊂ B and B ⊂ A.

Addition (union): x ∈ A ∪ B if x ∈ A or x ∈ B.

(A ∪ B) ∪ C = A ∪ (B ∪ C), A ∪ A = A, A ∪ ∅ = A, A ∪ S = S

Multiplication (intersection): x ∈ A ∩ B if x ∈ A and x ∈ B.

(A ∩ B) ∩ C = A ∩ (B ∩ C), A ∩ A = A, A ∩ ∅ = ∅, A ∩ S = A

Complement of A: the set \bar{A} such that A ∪ \bar{A} = S and A ∩ \bar{A} = ∅.

Difference: A − B = A ∩ \bar{B}.

(Venn diagrams of A ∪ B, A ∩ B, \bar{A} and A − B inside S illustrate these operations; the figures are missing from the upload.)

Page 8: Introduction to Mathematical Probability

8

SOLO Set Theory

Set Operations

Incompatible Sets: A and B are incompatible (disjoint) iff A ∩ B = ∅.

Decomposition of a Set

If A = A_1 ∪ A_2 ∪ … ∪ A_n and A_i ∩ A_j = ∅ for i ≠ j, we say that A is decomposed in incompatible sets.

If S = A_1 ∪ A_2 ∪ … ∪ A_n and A_i ∩ A_j = ∅ for i ≠ j, we say that the set space S is decomposed in exhaustive and incompatible sets.

De Morgan Law

\overline{A \cup B} = \bar{A} \cap \bar{B}, \qquad \overline{A \cap B} = \bar{A} \cup \bar{B}

To find the complement of a set expression we must interchange ∪ and ∩, and use the complements of the sets.

August De Morgan (1806 – 1871)

Another form of De Morgan's Law:

\overline{\bigcup_i A_i} = \bigcap_i \bar{A}_i, \qquad \overline{\bigcap_i A_i} = \bigcup_i \bar{A}_i

Table of Content

Page 9: Introduction to Mathematical Probability

9

SOLO Probability

Probability Axiomatic Definition

Pr(A) is the probability of the event A if

(1) \Pr(A) \ge 0

(2) \Pr(S) = 1

(3) If A = A_1 \cup A_2 \cup \cdots \cup A_n and A_i \cap A_j = \emptyset for i \ne j, then \Pr(A) = \Pr(A_1) + \Pr(A_2) + \cdots + \Pr(A_n)

Probability Geometric Definition

Assume that the probability of an event in a geometric region A is defined as the ratio between the surface of A and the surface of S:

\Pr(A) = \mathrm{Surface}(A) / \mathrm{Surface}(S)

This definition satisfies the three axioms:

(1) \Pr(A) \ge 0  (2) \Pr(S) = 1  (3) if A = A_1 \cup \cdots \cup A_n with A_i \cap A_j = \emptyset for i \ne j, then \Pr(A) = \Pr(A_1) + \cdots + \Pr(A_n)

Page 10: Introduction to Mathematical Probability

10

SOLO Probability

From those definitions we can prove the following:

(1') \Pr(\emptyset) = 0

Proof: S = S \cup \emptyset and S \cap \emptyset = \emptyset, hence by (3) \Pr(S) = \Pr(S) + \Pr(\emptyset), so \Pr(\emptyset) = 0.

(2') \Pr(\bar{A}) = 1 - \Pr(A)

Proof: S = A \cup \bar{A} and A \cap \bar{A} = \emptyset, hence by (2) and (3) 1 = \Pr(S) = \Pr(A) + \Pr(\bar{A}).

(3') 0 \le \Pr(A) \le 1

Proof: By (2') \Pr(A) = 1 - \Pr(\bar{A}) \le 1, and by (1) \Pr(A) \ge 0.

(4') If A \subset B then \Pr(A) \le \Pr(B)

Proof: B = A \cup (B - A) and A \cap (B - A) = \emptyset, hence by (3) \Pr(B) = \Pr(A) + \Pr(B - A) \ge \Pr(A).

(5') \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)

Proof: A \cup B = A \cup (B - A \cap B) with A \cap (B - A \cap B) = \emptyset, and B = (A \cap B) \cup (B - A \cap B) with (A \cap B) \cap (B - A \cap B) = \emptyset. By (3):

\Pr(A \cup B) = \Pr(A) + \Pr(B - A \cap B), \qquad \Pr(B) = \Pr(A \cap B) + \Pr(B - A \cap B)

hence \Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B).

Table of Content

Page 11: Introduction to Mathematical Probability

11

SOLO Probability

Theorem of Addition

(6') \Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n}\Pr(A_i) - \sum_{i<j}\Pr(A_i \cap A_j) + \sum_{i<j<k}\Pr(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1}\Pr(A_1 \cap A_2 \cap \cdots \cap A_n)

Proof by induction:

For n = 2 we found (5') that \Pr(A_1 \cup A_2) = \Pr(A_1) + \Pr(A_2) - \Pr(A_1 \cap A_2), which satisfies the equation.

Assume the equation true for n - 1:

\Pr\Big(\bigcup_{i=1}^{n-1} A_i\Big) = \sum_{i=1}^{n-1}\Pr(A_i) - \sum_{i<j \le n-1}\Pr(A_i \cap A_j) + \cdots + (-1)^{n-2}\Pr(A_1 \cap \cdots \cap A_{n-1})

Let us calculate for n. Using (5') with the two events \bigcup_{i=1}^{n-1} A_i and A_n:

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \Pr\Big(\bigcup_{i=1}^{n-1} A_i\Big) + \Pr(A_n) - \Pr\Big(\bigcup_{i=1}^{n-1}(A_i \cap A_n)\Big)

but, applying the induction hypothesis to the n - 1 events A_i \cap A_n,

\Pr\Big(\bigcup_{i=1}^{n-1}(A_i \cap A_n)\Big) = \sum_{i=1}^{n-1}\Pr(A_i \cap A_n) - \sum_{i<j \le n-1}\Pr(A_i \cap A_j \cap A_n) + \cdots + (-1)^{n-2}\Pr(A_1 \cap \cdots \cap A_n)

Theorem of Addition

Page 12: Introduction to Mathematical Probability

12

SOLO Probability

(6') Proof by induction (continue):

Substituting the last expansion into the expression for n gives

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n}\Pr(A_i) - \Big[\sum_{i<j \le n-1}\Pr(A_i \cap A_j) + \sum_{i=1}^{n-1}\Pr(A_i \cap A_n)\Big] + \cdots + (-1)^{n-1}\Pr(A_1 \cap \cdots \cap A_n)

Use the fact that

\binom{n-1}{k} + \binom{n-1}{k-1} = \frac{(n-1)!}{k!\,(n-1-k)!} + \frac{(n-1)!}{(k-1)!\,(n-k)!} = \frac{n!}{k!\,(n-k)!} = \binom{n}{k}

to verify that the order-k terms coming from \bigcup_{i=1}^{n-1} A_i and from \bigcup_{i=1}^{n-1}(A_i \cap A_n) combine to give all \binom{n}{k} intersections of k of the n events, each with sign (-1)^{k-1}, to obtain

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n}\Pr(A_i) - \sum_{i<j}\Pr(A_i \cap A_j) + \sum_{i<j<k}\Pr(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1}\Pr(A_1 \cap \cdots \cap A_n)

q.e.d.

Theorem of Addition (continue)

Table of Content

Page 13: Introduction to Mathematical Probability

13

SOLO Probability

Conditional Probability

Given two events A and B decomposed in elementary events of the set space S (A_i \cap A_j = \emptyset for i \ne j):

A = A_1 \cup A_2 \cup \cdots \cup A_n, \qquad B = A_1 \cup A_2 \cup \cdots \cup A_m, \qquad A \cap B = A_1 \cup A_2 \cup \cdots \cup A_r, \quad r \le m, n

\Pr(A) = \Pr(A_1) + \cdots + \Pr(A_n), \qquad \Pr(B) = \Pr(A_1) + \cdots + \Pr(A_m), \qquad \Pr(A \cap B) = \Pr(A_1) + \cdots + \Pr(A_r)

We want to find the probability of the event A under the condition that the event B has occurred, designated as Pr(A|B):

\Pr(A|B) = \frac{\Pr(A_1) + \cdots + \Pr(A_r)}{\Pr(A_1) + \cdots + \Pr(A_m)} = \frac{\Pr(A \cap B)}{\Pr(B)}

Page 14: Introduction to Mathematical Probability

14

SOLO Probability

Conditional Probability (continue)

If the events A and B are statistically independent, the fact that B occurred will not affect the probability of A to occur:

\Pr(A|B) = \Pr(A), \qquad \Pr(B|A) = \Pr(B)

so that

\Pr(A \cap B) = \Pr(A|B)\,\Pr(B) = \Pr(B|A)\,\Pr(A) = \Pr(A)\,\Pr(B)

Definition:

n events A_i, i = 1, 2, …, n are statistically independent if

\Pr\Big(\bigcap_{l=1}^{r} A_{i_l}\Big) = \prod_{l=1}^{r}\Pr(A_{i_l}) \quad \text{for every subset of indices } \{i_1, \dots, i_r\},\ r = 2, \dots, n

Table of Content

Page 15: Introduction to Mathematical Probability

15

SOLO Probability

Conditional Probability - Bayes Formula

Using the relation:

\Pr(A_l \cap B) = \Pr(A_l|B)\,\Pr(B) = \Pr(B|A_l)\,\Pr(A_l)

and, since B = \bigcup_{k=1}^{m}(A_k \cap B) with (A_k \cap B) \cap (A_l \cap B) = \emptyset for k \ne l,

\Pr(B) = \sum_{k=1}^{m}\Pr(A_k \cap B) = \sum_{k=1}^{m}\Pr(B|A_k)\,\Pr(A_k)

we obtain Bayes' Formula:

\Pr(A_l|B) = \frac{\Pr(B|A_l)\,\Pr(A_l)}{\Pr(B)} = \frac{\Pr(B|A_l)\,\Pr(A_l)}{\sum_{k=1}^{m}\Pr(B|A_k)\,\Pr(A_k)}

Thomas Bayes 1702 - 1761

Table of Content
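The following is a minimal numerical sketch, not from the slides, showing Bayes' formula and the total probability theorem on a hypothetical two-hypothesis example (illustrative prevalence and test accuracies chosen only for the demonstration):

```python
# Hypothetical example: 1% prevalence, 95% sensitivity, 90% specificity.
prior = {"ill": 0.01, "healthy": 0.99}        # Pr(A_k)
likelihood = {"ill": 0.95, "healthy": 0.10}   # Pr(B | A_k), B = "test positive"

# Total probability theorem: Pr(B) = sum_k Pr(B|A_k) Pr(A_k)
pr_b = sum(likelihood[k] * prior[k] for k in prior)

# Bayes' formula: Pr(A_l | B) = Pr(B|A_l) Pr(A_l) / Pr(B)
posterior = {k: likelihood[k] * prior[k] / pr_b for k in prior}
print(posterior)   # {'ill': ~0.088, 'healthy': ~0.912}
```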

Page 16: Introduction to Mathematical Probability

16

SOLO Probability

Total Probability Theorem

If S = A_1 \cup A_2 \cup \cdots \cup A_n with A_i \cap A_j = \emptyset for i \ne j, we say that the set space S is decomposed in exhaustive and incompatible (exclusive) sets.

The Total Probability Theorem states that for any event B its probability can be decomposed in terms of conditional probabilities as follows:

\Pr(B) = \sum_{i=1}^{n}\Pr(B \cap A_i) = \sum_{i=1}^{n}\Pr(B|A_i)\,\Pr(A_i)

Proof: For any event B, B = \bigcup_{k=1}^{n}(A_k \cap B) with (A_k \cap B) \cap (A_l \cap B) = \emptyset for k \ne l, so

\Pr(B) = \sum_{k=1}^{n}\Pr(A_k \cap B)

Using the relation \Pr(A_k \cap B) = \Pr(B|A_k)\,\Pr(A_k) we obtain the result.

Table of Content

Page 17: Introduction to Mathematical Probability

17

SOLO Probability

Statistical Independent Events

n events A_i, i = 1, 2, …, n are statistically independent if

\Pr\Big(\bigcap_{l=1}^{r} A_{i_l}\Big) = \prod_{l=1}^{r}\Pr(A_{i_l}) \quad \text{for every subset of indices, } r = 2, \dots, n

From the Theorem of Addition

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_i\Pr(A_i) - \sum_{i<j}\Pr(A_i \cap A_j) + \cdots + (-1)^{n-1}\Pr(A_1 \cap \cdots \cap A_n)

which for statistically independent events becomes

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_i\Pr(A_i) - \sum_{i<j}\Pr(A_i)\Pr(A_j) + \cdots + (-1)^{n-1}\Pr(A_1)\cdots\Pr(A_n)

Therefore

\Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \prod_{i=1}^{n}\big(1 - \Pr(A_i)\big)

Since S = \bigcup_i A_i \cup \overline{\bigcup_i A_i} and \bigcup_i A_i \cap \overline{\bigcup_i A_i} = \emptyset:

\Pr\Big(\overline{\bigcup_{i=1}^{n} A_i}\Big) = 1 - \Pr\Big(\bigcup_{i=1}^{n} A_i\Big) = \prod_{i=1}^{n}\big(1 - \Pr(A_i)\big)

By De Morgan, \overline{\bigcup_i A_i} = \bigcap_i \bar{A}_i, hence

\Pr\Big(\bigcap_{i=1}^{n}\bar{A}_i\Big) = \prod_{i=1}^{n}\Pr(\bar{A}_i)

If the n events A_i, i = 1, 2, …, n are statistically independent, then the complements \bar{A}_i are also statistically independent.

Table of Content

Page 18: Introduction to Mathematical Probability

18

SOLO Probability

Theorem of Multiplication

\Pr(A_1 \cap A_2 \cap \cdots \cap A_n) = \Pr(A_1)\,\Pr(A_2|A_1)\,\Pr(A_3|A_1 \cap A_2)\cdots\Pr(A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1})

Proof: Start from \Pr(A \cap B) = \Pr(B|A)\,\Pr(A). Then

\Pr(A_1 \cap \cdots \cap A_n) = \Pr(A_n|A_1 \cap \cdots \cap A_{n-1})\,\Pr(A_1 \cap \cdots \cap A_{n-1})

and in the same way

\Pr(A_1 \cap \cdots \cap A_{n-1}) = \Pr(A_{n-1}|A_1 \cap \cdots \cap A_{n-2})\,\Pr(A_1 \cap \cdots \cap A_{n-2}), \quad \dots, \quad \Pr(A_1 \cap A_2) = \Pr(A_2|A_1)\,\Pr(A_1)

From those results we obtain

\Pr(A_1 \cap A_2 \cap \cdots \cap A_n) = \Pr(A_1)\,\Pr(A_2|A_1)\cdots\Pr(A_n|A_1 \cap \cdots \cap A_{n-1})

q.e.d.

Table of Content

Page 19: Introduction to Mathematical Probability

19

SOLO Review of Probability

Random Variables

Let us ascribe to each outcome or event a real number, such that we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function would be more correct, but this is the accepted terminology).

(Figure: the die outcomes {1}, …, {6} mapped by the random variable X to the points x = 1, 2, 3, 4, 5, 6 on the real axis.)

The random variables can be:

- Discrete random variables for discrete events

- Continuous random variables for continuous events

Table of Content

Page 20: Introduction to Mathematical Probability

20

SOLO Review of Probability

Probability Distribution and Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Probability Distribution Function or Cumulative Probability Distribution Function of x is defined as:

P_X(x) := \Pr\{X \le x\}

The Probability Distribution Function has the following properties:

(1) 0 \le P_X(x)

(2) P_X(x) \le 1

(3) P_X(x) is a monotonic increasing function: P_X(x_1) \le P_X(x_2) for x_1 \le x_2

The probability that X lies in the interval (a, b] is given by:

\Pr\{a < X \le b\} = P_X(b) - P_X(a) \ge 0

If P_X(x) is a continuous differentiable function of x we can define

p_X(x) := \lim_{\Delta x \to 0}\frac{\Pr\{x < X \le x + \Delta x\}}{\Delta x} = \lim_{\Delta x \to 0}\frac{P_X(x + \Delta x) - P_X(x)}{\Delta x} = \frac{dP_X(x)}{dx} \ge 0

the Probability Density Function of x.

Page 21: Introduction to Mathematical Probability

21

SOLO Review of Probability

Probability Distribution and Probability Density Functions (continue – 1)

The Probability Distribution and Probability Density Functions of x can be defined also for discrete random variables.

Example: set space of a die, six independent events {x=1}, {x=2}, {x=3}, {x=4}, {x=5}, {x=6}, each with probability 1/6.

Probability Density Function (using Dirac delta functions):

p_X(x) := \frac{1}{6}\sum_{i=1}^{6}\delta(x - i)

where δ(x) is the Dirac delta function:

\delta(x) = 0 \ \text{for } x \ne 0, \qquad \int_{-\infty}^{+\infty}\delta(x)\,dx = 1

Cumulative Probability Distribution Function:

P_X(x) = \Pr\{X \le x\} = \int_{-\infty}^{x}p_X(u)\,du = \frac{k}{6} \quad \text{for } k \le x < k+1, \ k = 0, 1, \dots, 6

so P_X takes the staircase values 1/6, 1/3, 1/2, 2/3, 5/6, 1 at x = 1, 2, 3, 4, 5, 6.
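A minimal sketch, not from the slides, comparing the staircase CDF of a fair die with an empirical estimate from simulated rolls (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)      # die outcomes 1..6

for k in range(1, 7):
    empirical = np.mean(rolls <= k)           # estimated Pr{X <= k}
    print(k, round(empirical, 3), round(k / 6, 3))
```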

Page 22: Introduction to Mathematical Probability

22

SOLO Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(1) Binomial (Bernoulli)

p(k, n) = \binom{n}{k}p^{k}(1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k}, \qquad k = 0, 1, \dots, n

(2) Poisson's Distribution

p(k, \lambda) = \frac{\lambda^{k}}{k!}\exp(-\lambda), \qquad k = 0, 1, 2, \dots, \ \lambda > 0

(3) Normal (Gaussian)

p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)

(4) Laplacian Distribution

p(x; \mu, b) = \frac{1}{2b}\exp\Big(-\frac{|x-\mu|}{b}\Big)

Page 23: Introduction to Mathematical Probability

23

SOLO Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(5) Gamma Distribution

p(x; k, \theta) = \frac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^{k}} \ \text{for } x > 0; \qquad p = 0 \ \text{for } x \le 0

(6) Beta Distribution

p(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du}, \qquad 0 \le x \le 1

(7) Cauchy Distribution

p(x; x_0, \gamma) = \frac{1}{\pi}\,\frac{\gamma}{(x-x_0)^2 + \gamma^2}

Page 24: Introduction to Mathematical Probability

24

SOLO Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(8) Exponential Distribution

p(x; \lambda) = \lambda\exp(-\lambda x) \ \text{for } x \ge 0; \qquad p = 0 \ \text{for } x < 0

(9) Chi-square Distribution

p(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}\exp(-x/2) \ \text{for } x > 0; \qquad p = 0 \ \text{for } x \le 0

Γ is the gamma function: \Gamma(a) = \int_0^{\infty}t^{a-1}\exp(-t)\,dt

(10) Student's t-Distribution

p(x; \nu) = \frac{\Gamma\big((\nu+1)/2\big)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,\Big(1 + \frac{x^2}{\nu}\Big)^{-(\nu+1)/2}

Page 25: Introduction to Mathematical Probability

25

SOLO Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(11) Uniform Distribution (Continuous)

p(x; a, b) = \frac{1}{b-a} \ \text{for } a \le x \le b; \qquad p = 0 \ \text{for } x < a \ \text{or } x > b

(12) Rayleigh Distribution

p(x; \sigma) = \frac{x}{\sigma^2}\exp\Big(-\frac{x^2}{2\sigma^2}\Big), \qquad x \ge 0

(13) Rice Distribution

p(x; \nu, \sigma) = \frac{x}{\sigma^2}\exp\Big(-\frac{x^2+\nu^2}{2\sigma^2}\Big)\,I_0\Big(\frac{x\nu}{\sigma^2}\Big), \qquad x \ge 0

where I_0 is the modified Bessel function of the first kind of order zero.

Page 26: Introduction to Mathematical Probability

26

SOLO Review of Probability

Probability Distribution and Probability Density Functions (Examples)

(14) Weibull Distribution

p(x; k, \lambda) = \frac{k}{\lambda}\Big(\frac{x}{\lambda}\Big)^{k-1}\exp\big(-(x/\lambda)^{k}\big) \ \text{for } x \ge 0, \ k, \lambda > 0; \qquad p = 0 \ \text{for } x < 0

(Figure: Weibull Distribution.)

Table of Content

Page 27: Introduction to Mathematical Probability

27

SOLO Review of Probability

Conditional Probability Distribution and Conditional Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Conditional Probability Distribution Function or Cumulative Conditional Probability Distribution Function of x given y ∈ Y is defined as:

P_{X/Y}(x/y) := \Pr\{X \le x \,/\, Y = y\}

It has the following properties:

(1) 0 \le P_{X/Y}(x/y)

(2) P_{X/Y}(x/y) \le 1

(3) P_{X/Y}(x/y) is a monotonic increasing function of x: P_{X/Y}(x_1/y) \le P_{X/Y}(x_2/y) for x_1 \le x_2

The probability that X lies in the interval (a, b] given y ∈ Y is:

\Pr\{a < X \le b \,/\, y\} = P_{X/Y}(b/y) - P_{X/Y}(a/y) \ge 0

If P_{X/Y}(x/y) is a continuous differentiable function of x we can define

p_{X/Y}(x/y) := \lim_{\Delta x \to 0}\frac{\Pr\{x < X \le x + \Delta x \,/\, y\}}{\Delta x} = \frac{dP_{X/Y}(x/y)}{dx} \ge 0

the Conditional Probability Density Function of x.

Page 28: Introduction to Mathematical Probability

28

SOLO Review of Probability

Conditional Probability Distribution and Conditional Probability Density Functions

Example 1: Given P_X(x) and p_X(x), find P_{X/Y}(x / x \le a) and p_{X/Y}(x / x \le a).

P_{X/Y}(x / x \le a) = \frac{P_X(x)}{P_X(a)} \ \text{for } x \le a; \qquad 1 \ \text{for } x > a

p_{X/Y}(x / x \le a) = \frac{p_X(x)}{P_X(a)} \ \text{for } x \le a; \qquad 0 \ \text{for } x > a

Example 2: Given P_X(x) and p_X(x), find P_{X/Y}(x / b < x \le a) and p_{X/Y}(x / b < x \le a).

P_{X/Y}(x / b < x \le a) = 0 \ \text{for } x \le b; \qquad \frac{P_X(x) - P_X(b)}{P_X(a) - P_X(b)} \ \text{for } b < x \le a; \qquad 1 \ \text{for } x > a

p_{X/Y}(x / b < x \le a) = \frac{p_X(x)}{P_X(a) - P_X(b)} \ \text{for } b < x \le a; \qquad 0 \ \text{otherwise}

Table of Content

Page 29: Introduction to Mathematical Probability

29

SOLO Review of Probability

Expected Value or Mathematical Expectation

Given a Probability Density Function p_X(x) we define the Expected Value.

For a Continuous Random Variable:

E(x) := \int_{-\infty}^{+\infty}x\,p_X(x)\,dx

For a Discrete Random Variable:

E(x) := \sum_k x_k\,p_X(x_k)

For a general function g(x) of the Random Variable x:

E[g(x)] := \int_{-\infty}^{+\infty}g(x)\,p_X(x)\,dx

Since \int_{-\infty}^{+\infty}p_X(x)\,dx = 1 we can write

E(x) = \frac{\int x\,p_X(x)\,dx}{\int p_X(x)\,dx}

i.e. the Expected Value is the centroid of the surface enclosed between the Probability Density Function and the x axis.

Table of Content

Page 30: Introduction to Mathematical Probability

30

SOLO Review of Probability

Variance

Given a Probability Density Function p(x) we define the Variance

\mathrm{Var}(x) := E\big[(x - E(x))^2\big] = E\big[x^2 - 2xE(x) + E(x)^2\big] = E(x^2) - E(x)^2

Moments

Given a Probability Density Function p(x) we define the Moment of order k about the origin

\mu'_k := E(x^k)

and the Central Moment of order k about the Mean E(x)

\mu_k := E\big[(x - E(x))^k\big] = \sum_{j=0}^{k}\binom{k}{j}(-1)^{k-j}\,E(x^j)\,E(x)^{k-j}

Table of Content

Page 31: Introduction to Mathematical Probability

31

SOLO Review of Probability

Moments

Normal Distribution (zero mean):

p_X(x; \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{x^2}{2\sigma^2}\Big)

E(x^n) = 1\cdot 3\cdots(n-1)\,\sigma^{n} = \frac{(2k)!}{2^{k}\,k!}\,\sigma^{2k} \ \text{for } n = 2k \ \text{even}; \qquad E(x^n) = 0 \ \text{for } n \ \text{odd}

Proof: Start from \int_{-\infty}^{+\infty}\exp(-a x^2)\,dx = \sqrt{\pi/a} and differentiate k times with respect to a:

\int_{-\infty}^{+\infty}x^{2k}\exp(-a x^2)\,dx = \frac{1\cdot 3\cdots(2k-1)}{2^{k}}\,\sqrt{\pi}\,a^{-(2k+1)/2}

Substitute a = 1/(2\sigma^2) to obtain

E(x^{2k}) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}x^{2k}\exp\Big(-\frac{x^2}{2\sigma^2}\Big)dx = 1\cdot 3\cdots(2k-1)\,\sigma^{2k}

(for odd n the integrand is odd, so E(x^n) = 0).

Now let us compute:

E(x^4) = 3\,\sigma^{4} = 3\,\big[E(x^2)\big]^{2}

Chi-square

Page 32: Introduction to Mathematical Probability

32

SOLO Review of Probability

Moments

Gamma Distribution:

p(x; k, \theta) = \frac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^{k}}, \qquad x > 0

E(x^n) = \frac{1}{\Gamma(k)\,\theta^{k}}\int_0^{\infty}x^{n+k-1}\exp(-x/\theta)\,dx = \frac{\Gamma(n+k)}{\Gamma(k)}\,\theta^{n} = k(k+1)\cdots(k+n-1)\,\theta^{n}

where Γ is the gamma function, \Gamma(a) = \int_0^{\infty}t^{a-1}\exp(-t)\,dt.

Beta Distribution:

p(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du}, \qquad 0 \le x \le 1

Page 33: Introduction to Mathematical Probability

33

SOLO Review of Probability

Moments

Uniform Distribution (Continuous) on [-c, +c]:

p(x; c) = \frac{1}{2c} \ \text{for } -c \le x \le c; \qquad 0 \ \text{elsewhere}

E(x^n) = \frac{1}{2c}\int_{-c}^{+c}x^{n}\,dx = \frac{x^{n+1}}{2c(n+1)}\Big|_{-c}^{+c} = \frac{c^{n}}{n+1} \ \text{for } n \ \text{even}; \qquad 0 \ \text{for } n \ \text{odd}

Rayleigh Distribution:

p(x; \sigma) = \frac{x}{\sigma^2}\exp\Big(-\frac{x^2}{2\sigma^2}\Big), \qquad x \ge 0

E(x^n) = \int_0^{\infty}x^{n+1}\frac{1}{\sigma^2}\exp\Big(-\frac{x^2}{2\sigma^2}\Big)dx = 1\cdot 3\cdots n\,\sigma^{n}\sqrt{\pi/2} \ \text{for } n \ \text{odd}; \qquad 2^{k}\,k!\,\sigma^{2k} \ \text{for } n = 2k \ \text{even}

Page 34: Introduction to Mathematical Probability

34

SOLO Review of Probability

Example

Repeat an experiment m times to obtain X_1, X_2, …, X_m, where the X_i are uncorrelated, with E(X_i) = μ and E[(X_i - μ)^2] = σ².

Define the Statistical Estimation (sample mean):

\bar{X}_m := \frac{X_1 + X_2 + \cdots + X_m}{m}

and the Sample Variation:

V_m := \frac{(X_1 - \bar{X}_m)^2 + (X_2 - \bar{X}_m)^2 + \cdots + (X_m - \bar{X}_m)^2}{m}

Since the experiments are uncorrelated, E[(X_i - \mu)(X_j - \mu)] = 0 for i \ne j.

E(\bar{X}_m) = \frac{E(X_1) + E(X_2) + \cdots + E(X_m)}{m} = \frac{m\mu}{m} = \mu

\mathrm{Var}(\bar{X}_m) = E\big[(\bar{X}_m - \mu)^2\big] = \frac{1}{m^2}E\Big[\Big(\sum_{i=1}^{m}(X_i - \mu)\Big)^2\Big] = \frac{m\sigma^2}{m^2} = \frac{\sigma^2}{m}

Page 35: Introduction to Mathematical Probability

35

SOLO Review of Probability

Example (continue)

Statistical Estimation: \bar{X}_m := (X_1 + \cdots + X_m)/m. Sample Variation: V_m := \frac{1}{m}\sum_{i=1}^{m}(X_i - \bar{X}_m)^2.

Let us compute E(V_m).

E\big[(X_i - \bar{X}_m)^2\big] = E\big[(X_i - \mu)^2\big] - 2E\big[(X_i - \mu)(\bar{X}_m - \mu)\big] + E\big[(\bar{X}_m - \mu)^2\big]

Since E[(X_i - \mu)(X_j - \mu)] = 0 for i \ne j:

E\big[(X_i - \mu)(\bar{X}_m - \mu)\big] = \frac{1}{m}\sum_{j=1}^{m}E\big[(X_i - \mu)(X_j - \mu)\big] = \frac{\sigma^2}{m}

Therefore:

E\big[(X_i - \bar{X}_m)^2\big] = \sigma^2 - 2\frac{\sigma^2}{m} + \frac{\sigma^2}{m} = \frac{m-1}{m}\,\sigma^2

E(V_m) = \frac{1}{m}\sum_{i=1}^{m}E\big[(X_i - \bar{X}_m)^2\big] = \frac{m-1}{m}\,\sigma^2

i.e. the sample variation is a biased estimate of σ².
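A minimal sketch, not from the slides, checking numerically that E(V_m) = (m-1)/m · σ² (parameters m, σ and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
m, sigma, trials = 10, 2.0, 200_000

x = rng.normal(0.0, sigma, size=(trials, m))
v_m = np.mean((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)   # V_m per experiment

print(round(v_m.mean(), 3))              # ~ (m-1)/m * sigma^2
print((m - 1) / m * sigma**2)            # 3.6
```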

Table of Content

Page 36: Introduction to Mathematical Probability

36

SOLO Review of Probability

Functions of one Random Variable

Let y = g(x) be a given function of the random variable x, defined on the domain Ω, with probability density p_X(x). We want to find p_Y(y).

Fundamental Theorem

Assume x_1, x_2, …, x_n are all the solutions of the equation y = g(x_1) = g(x_2) = \cdots = g(x_n). Then

p_Y(y) = \frac{p_X(x_1)}{|g'(x_1)|} + \frac{p_X(x_2)}{|g'(x_2)|} + \cdots + \frac{p_X(x_n)}{|g'(x_n)|}, \qquad g'(x) := \frac{dg(x)}{dx}

Proof

\Pr\{y < Y \le y + dy\} = \sum_{i=1}^{n}\Pr\{x_i < X \le x_i + dx_i\} = \sum_{i=1}^{n}p_X(x_i)\,|dx_i| = \sum_{i=1}^{n}\frac{p_X(x_i)}{|g'(x_i)|}\,dy

q.e.d.

Cauchy Distribution

Derivation of Chi-square

Page 37: Introduction to Mathematical Probability

37

SOLO Review of Probability

Functions of one Random Variable (continue – 1)

Example 1: y = a x + b

p_Y(y) = \frac{1}{|a|}\,p_X\Big(\frac{y - b}{a}\Big)

Example 2: y = a / x

p_Y(y) = \frac{|a|}{y^2}\,p_X\Big(\frac{a}{y}\Big)

Example 3: y = a x^2, a > 0

p_Y(y) = \frac{1}{2\sqrt{a y}}\Big[p_X\big(\sqrt{y/a}\big) + p_X\big(-\sqrt{y/a}\big)\Big]\,U(y)

Example 4: y = |x|

p_Y(y) = \big[p_X(y) + p_X(-y)\big]\,U(y)

where U(y) is the unit step function.
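A minimal sketch, not from the slides, checking Example 3 against a histogram of transformed samples (x is taken standard normal and a = 2 only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
a = 2.0
x = rng.normal(size=1_000_000)
y = a * x**2

def p_y(y):
    # fundamental theorem applied to y = a*x^2 with x ~ N(0, 1)
    px = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
    root = np.sqrt(y / a)
    return (px(root) + px(-root)) / (2 * np.sqrt(a * y))

hist, edges = np.histogram(y, bins=50, range=(0.1, 10.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p_y(centers))))   # small residual (sampling error only)
```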

Table of Content

Page 38: Introduction to Mathematical Probability

38

SOLO Review of Probability

Jointly Distributed Random Variables

We are interested in functions of several random variables.

The Joint Cumulative Probability Distribution of the random variables X_1, X_2, …, X_n is defined as:

P_{X_1 X_2 \cdots X_n}(x_1, x_2, \dots, x_n) := \Pr\{X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n\}

The Cumulative Probability Distribution of the random variable X_i can be obtained from it as

P_{X_i}(x_i) = \Pr\{X_i \le x_i\} = P_{X_1 \cdots X_n}(+\infty, \dots, +\infty, x_i, +\infty, \dots, +\infty)

If the Joint Cumulative Probability Distribution is continuous and differentiable in each of the components then we can define the Joint Probability Density Function as:

p_{X_1 \cdots X_n}(x_1, \dots, x_n) := \frac{\partial^{n}P_{X_1 \cdots X_n}(x_1, \dots, x_n)}{\partial x_1\,\partial x_2 \cdots \partial x_n}

and the marginal density

p_{X_i}(x_i) = \int \cdots \int p_{X_1 \cdots X_n}(x_1, \dots, x_n)\,\prod_{k \ne i}dx_k

Page 39: Introduction to Mathematical Probability

39

SOLO Review of Probability

Jointly Distributed Random Variables (continue – 1)

We define:

E\big[g(x_1, \dots, x_n)\big] := \int \cdots \int g(x_1, \dots, x_n)\,p_{X_1 \cdots X_n}(x_1, \dots, x_n)\,dx_1 \cdots dx_n

Example: Given the Sum of m Variables

S_m := X_1 + X_2 + \cdots + X_m

E(S_m) = \int \cdots \int \Big(\sum_{i=1}^{m}x_i\Big)\,p_{X_1 \cdots X_n}(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \sum_{i=1}^{m}E(x_i)

\mathrm{Var}(S_m) = E\big[(S_m - E(S_m))^2\big] = E\Big[\Big(\sum_{i=1}^{m}\big(X_i - E(X_i)\big)\Big)^2\Big]
= \sum_{i=1}^{m}E\big[(X_i - E(X_i))^2\big] + \sum_{i=1}^{m}\sum_{j \ne i}E\big[(X_i - E(X_i))(X_j - E(X_j))\big]
= \sum_{i=1}^{m}\mathrm{Var}(X_i) + 2\sum_{i<j}\mathrm{Cov}(X_i, X_j)

Page 40: Introduction to Mathematical Probability

40

SOLO Review of Probability

Jointly Distributed Random Variables (continue – 2)

Given the joint density function of n random variables X_1, X_2, …, X_n: p_{X_1 \cdots X_n}(x_1, \dots, x_n),

we want to find the joint density function of n random variables Y_1, Y_2, …, Y_n that are related to X_1, X_2, …, X_n through

Y_1 = g_1(X_1, \dots, X_n), \quad Y_2 = g_2(X_1, \dots, X_n), \quad \dots, \quad Y_n = g_n(X_1, \dots, X_n)

The differentials are related by

(dY_1, dY_2, \dots, dY_n)^{T} = \big[\partial g_i / \partial X_j\big]_{i,j=1}^{n}\,(dX_1, dX_2, \dots, dX_n)^{T}

Assuming that the Jacobian

J(X_1, \dots, X_n) = \det\big[\partial g_i / \partial X_j\big]_{i,j=1}^{n}

is nonzero for each (X_1, …, X_n), there exists a unique solution (Y_1, …, Y_n).

Page 41: Introduction to Mathematical Probability

41

SOLO Review of Probability

Jointly Distributed Random Variables (continue – 3)

Assume that for a given (Y_1, …, Y_n) we can find k solutions (X_1, …, X_n)_1, …, (X_1, …, X_n)_k.

\Pr\{y_1 < Y_1 \le y_1 + dy_1, \dots, y_n < Y_n \le y_n + dy_n\} = \sum_{i=1}^{k}\Pr\{x_1 < X_1 \le x_1 + dx_1, \dots, x_n < X_n \le x_n + dx_n\}_i
= \sum_{i=1}^{k}p_{X_1 \cdots X_n}(x_1, \dots, x_n)_i\,|dx_1 \cdots dx_n|_i = \sum_{i=1}^{k}\frac{p_{X_1 \cdots X_n}(x_1, \dots, x_n)_i}{|J(x_1, \dots, x_n)_i|}\,dy_1 \cdots dy_n

Therefore

p_{Y_1 \cdots Y_n}(y_1, \dots, y_n) = \sum_{i=1}^{k}\frac{p_{X_1 \cdots X_n}(x_1, \dots, x_n)_i}{|J(x_1, \dots, x_n)_i|}

The relation between the differential volume in (Y_1, …, Y_n) and the differential volume in (X_1, …, X_n) is given by

dy_1\,dy_2 \cdots dy_n = |J(x_1, \dots, x_n)|\,dx_1\,dx_2 \cdots dx_n

Page 42: Introduction to Mathematical Probability

42

SOLO Review of Probability

Jointly Distributed Random Variables (continue – 4)

Example 1

X and Y are independent gamma random variables with parameters (α, λ) and (β, λ), respectively:

p_{X,Y}(x, y) = \frac{\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}}{\Gamma(\alpha)}\cdot\frac{\lambda^{\beta}y^{\beta-1}e^{-\lambda y}}{\Gamma(\beta)}, \qquad x, y > 0

Compute the joint density of U = X + Y and V = X/(X+Y).

U = g_1(X, Y) = X + Y, \quad V = g_2(X, Y) = \frac{X}{X+Y} \quad \Longrightarrow \quad X = UV, \quad Y = U(1-V)

J = \det\begin{pmatrix}1 & 1 \\ Y/(X+Y)^2 & -X/(X+Y)^2\end{pmatrix} = -\frac{1}{X+Y} = -\frac{1}{u}

p_{U,V}(u, v) = \frac{p_{X,Y}(x, y)}{|J(x, y)|} = \frac{\lambda^{\alpha+\beta}(uv)^{\alpha-1}\big(u(1-v)\big)^{\beta-1}e^{-\lambda u}}{\Gamma(\alpha)\,\Gamma(\beta)}\,u
= \frac{\lambda^{\alpha+\beta}u^{\alpha+\beta-1}e^{-\lambda u}}{\Gamma(\alpha+\beta)}\;\cdot\;\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,v^{\alpha-1}(1-v)^{\beta-1}

Therefore U has a gamma distribution with parameters (α+β, λ), V has a beta distribution with parameters (α, β), and U and V are independent.

Table of Content

Page 43: Introduction to Mathematical Probability

43

SOLO Review of Probability

Characteristic Function and Moment-Generating Function

Given a Probability Density Function p_X(x) we define the Characteristic Function or Moment-Generating Function

\Phi_X(\omega) := E\big[\exp(j\omega x)\big] = \int_{-\infty}^{+\infty}\exp(j\omega x)\,p_X(x)\,dx \ (x \ \text{continuous}); \qquad \sum_k \exp(j\omega x_k)\,p_X(x_k) \ (x \ \text{discrete})

This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined since the condition for the existence of a Fourier Transform,

\int_{-\infty}^{+\infty}|p_X(x)|\,dx = \int_{-\infty}^{+\infty}p_X(x)\,dx = 1 < \infty,

is always fulfilled.

Given the Characteristic Function we can recover the Probability Density Function p_X(x) using the Inverse Fourier Transform:

p_X(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\Phi_X(\omega)\exp(-j\omega x)\,d\omega

Page 44: Introduction to Mathematical Probability

44

SOLO Review of Probability

Properties of Moment-Generating Function

\Phi_X(\omega) = \int_{-\infty}^{+\infty}\exp(j\omega x)\,p_X(x)\,dx, \qquad \Phi_X(0) = \int_{-\infty}^{+\infty}p_X(x)\,dx = 1

\frac{d\Phi_X(\omega)}{d\omega} = j\int_{-\infty}^{+\infty}x\exp(j\omega x)\,p_X(x)\,dx \quad \Longrightarrow \quad \frac{d\Phi_X}{d\omega}\Big|_{\omega=0} = j\,E(x)

\frac{d^2\Phi_X(\omega)}{d\omega^2} = j^2\int_{-\infty}^{+\infty}x^2\exp(j\omega x)\,p_X(x)\,dx \quad \Longrightarrow \quad \frac{d^2\Phi_X}{d\omega^2}\Big|_{\omega=0} = j^2\,E(x^2)

\frac{d^{n}\Phi_X(\omega)}{d\omega^{n}} = j^{n}\int_{-\infty}^{+\infty}x^{n}\exp(j\omega x)\,p_X(x)\,dx \quad \Longrightarrow \quad \frac{d^{n}\Phi_X}{d\omega^{n}}\Big|_{\omega=0} = j^{n}\,E(x^{n})

This is the reason why Φ_X(ω) is also called the Moment-Generating Function.

Page 45: Introduction to Mathematical Probability

45

SOLO Review of Probability

Properties of Moment-Generating Function

Develop Φ_X(ω) in a Taylor series about ω = 0:

\Phi_X(\omega) = \int_{-\infty}^{+\infty}\exp(j\omega x)\,p_X(x)\,dx

\Phi_X(\omega) = \Phi_X(0) + \frac{d\Phi_X}{d\omega}\Big|_{0}\frac{\omega}{1!} + \frac{d^2\Phi_X}{d\omega^2}\Big|_{0}\frac{\omega^2}{2!} + \cdots
= 1 + j\,E(x)\,\frac{\omega}{1!} + j^2\,E(x^2)\,\frac{\omega^2}{2!} + \cdots + j^{n}\,E(x^{n})\,\frac{\omega^{n}}{n!} + \cdots

Page 46: Introduction to Mathematical Probability

46

SOLO Review of Probability

Moment-Generating Function

Binomial Distribution: p(k, n) = \binom{n}{k}p^{k}(1-p)^{n-k}

\Phi(\omega) = E\big[\exp(j\omega k)\big] = \sum_{k=0}^{n}\exp(j\omega k)\frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k} = \sum_{k=0}^{n}\frac{n!}{k!\,(n-k)!}\big(p\,e^{j\omega}\big)^{k}(1-p)^{n-k} = \big[1 + p\,(e^{j\omega} - 1)\big]^{n}

Poisson Distribution: p(k; \lambda) = \frac{\lambda^{k}}{k!}\exp(-\lambda), k a nonnegative integer

\Phi(\omega) = \sum_{k=0}^{\infty}\exp(j\omega k)\frac{\lambda^{k}}{k!}\exp(-\lambda) = \exp(-\lambda)\sum_{k=0}^{\infty}\frac{(\lambda e^{j\omega})^{k}}{k!} = \exp\big[\lambda\,(e^{j\omega} - 1)\big]

Exponential Distribution: p(x; \lambda) = \lambda\exp(-\lambda x), x \ge 0

\Phi(\omega) = \int_0^{\infty}\exp(j\omega x)\,\lambda\exp(-\lambda x)\,dx = \frac{\lambda}{\lambda - j\omega}

Page 47: Introduction to Mathematical Probability

47

SOLO Review of Probability

Moment-Generating Function

Normal Distribution: p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)

\Phi(\omega) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}\exp(j\omega x)\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)dx

Let us write

-\frac{(x-\mu)^2}{2\sigma^2} + j\omega x = -\frac{\big(x - \mu - j\omega\sigma^2\big)^2}{2\sigma^2} + j\omega\mu - \frac{\omega^2\sigma^2}{2}

Therefore

\Phi(\omega) = \exp\Big(j\omega\mu - \frac{\omega^2\sigma^2}{2}\Big)\,\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}\exp\Big(-\frac{(x - \mu - j\omega\sigma^2)^2}{2\sigma^2}\Big)dx = \exp\Big(j\omega\mu - \frac{\omega^2\sigma^2}{2}\Big)

Using p_X(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\Phi_X(\omega)\exp(-j\omega x)\,d\omega one can verify that this characteristic function transforms back to the normal density.

Central Limit Theorem

Poisson Distribution

Page 48: Introduction to Mathematical Probability

48

SOLO Review of Probability

Properties of Moment-Generating Function

Moment-Generating Function of the Sum of Independent Random Variables

Given the Sum of Independent Random Variables S_m := X_1 + X_2 + \cdots + X_m, with (by independence)

p_{X_1 \cdots X_m}(x_1, \dots, x_m) = p_{X_1}(x_1)\,p_{X_2}(x_2)\cdots p_{X_m}(x_m)

\Phi_{S_m}(\omega) = E\big[\exp\big(j\omega(X_1 + \cdots + X_m)\big)\big] = \int \cdots \int \exp\big(j\omega(x_1 + \cdots + x_m)\big)\,p_{X_1 \cdots X_m}(x_1, \dots, x_m)\,dx_1 \cdots dx_m
= \prod_{i=1}^{m}\int \exp(j\omega x_i)\,p_{X_i}(x_i)\,dx_i = \Phi_{X_1}(\omega)\,\Phi_{X_2}(\omega)\cdots\Phi_{X_m}(\omega)

Example 1: Sum of Poisson Independent Random Variables, S_m := X_1 + X_2 + \cdots + X_m

p_{X_i}(k_i; \lambda_i) = \frac{\lambda_i^{k_i}}{k_i!}\exp(-\lambda_i), \qquad \Phi_{X_i}(\omega) = \exp\big[\lambda_i(e^{j\omega} - 1)\big], \quad i = 1, 2, \dots, m

\Phi_{S_m}(\omega) = \prod_{i=1}^{m}\Phi_{X_i}(\omega) = \exp\big[(\lambda_1 + \lambda_2 + \cdots + \lambda_m)(e^{j\omega} - 1)\big]

so the Sum of Poisson Independent Random Variables is a Poisson Random Variable with \lambda_{S_m} = \lambda_1 + \lambda_2 + \cdots + \lambda_m.
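A minimal numerical sketch, not from the slides, of Example 1 (the rates 1.5, 2.0 and 0.5 are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
s = rng.poisson(1.5, n) + rng.poisson(2.0, n) + rng.poisson(0.5, n)

# For a Poisson variable the mean equals the variance, here both ~ 1.5 + 2.0 + 0.5 = 4.0
print(round(s.mean(), 3), round(s.var(), 3))
```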

Page 49: Introduction to Mathematical Probability

49

SOLO Review of Probability

Properties of Moment-Generating Function

Example 2: Sum of Normal Independent Random Variables, S_m := X_1 + X_2 + \cdots + X_m

p_{X_i}(x_i; \mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\Big(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\Big), \qquad \Phi_{X_i}(\omega) = \exp\Big(j\omega\mu_i - \frac{\omega^2\sigma_i^2}{2}\Big)

\Phi_{S_m}(\omega) = \prod_{i=1}^{m}\Phi_{X_i}(\omega) = \exp\Big(j\omega(\mu_1 + \cdots + \mu_m) - \frac{\omega^2(\sigma_1^2 + \cdots + \sigma_m^2)}{2}\Big)

so the Sum of Normal Independent Random Variables is a Normal Random Variable with

\mu_{S_m} = \mu_1 + \mu_2 + \cdots + \mu_m, \qquad \sigma_{S_m}^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_m^2

Therefore the S_m probability distribution is:

p_{S_m}(x; \mu_{S_m}, \sigma_{S_m}) = \frac{1}{\sqrt{2\pi}\,\sigma_{S_m}}\exp\Big(-\frac{(x-\mu_{S_m})^2}{2\sigma_{S_m}^2}\Big)

Table of Content

Page 50: Introduction to Mathematical Probability

50

SOLO Review of Probability

Existence Theorems

Existence Theorem 1

Given a function G(x) such that

G(-\infty) = 0, \qquad G(+\infty) = 1, \qquad G(x_1) \le G(x_2) \ \text{for } x_1 \le x_2 \ (G(x) \ \text{is monotonic non-decreasing}), \qquad \lim_{x_n \to x^{+}}G(x_n) = G(x)

we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).

Proof of Existence Theorem 1

Assume that the outcome of the experiment X is any real number -∞ < x < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axis.

To specify the probability of those events we define P(x) = Prob{x ≤ x_1} = G(x_1). From our definition of G(x) it follows that P(x) is a distribution function.

Existence Theorem 2 Existence Theorem 3

Page 51: Introduction to Mathematical Probability

51

SOLO Review of Probability

Existence Theorems

Existence Theorem 2

If a function F(x, y) is such that

F(-\infty, y) = 0, \qquad F(x, -\infty) = 0, \qquad F(+\infty, +\infty) = 1

F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \ge 0

for every x_1 < x_2, y_1 < y_2, then two random variables x and y can be found such that F(x, y) is their joint distribution function.

Proof of Existence Theorem 2

Assume that the outcome of the experiment X is any real number -∞ < x < +∞, and that the outcome of the experiment Y is any real number -∞ < y < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axes x and y.

To specify the probability of those events we define P(x, y) = Prob{x ≤ x_1, y ≤ y_1} = F(x_1, y_1). From our definition of F(x, y) it follows that P(x, y) is a joint distribution function.

The proof is similar to that in Existence Theorem 1.

Page 52: Introduction to Mathematical Probability

52

SOLO Review of Probability

Histogram

A histogram is a mapping m_i that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram.

Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the following condition:

n = \sum_{i=1}^{k}m_i

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_i is defined as:

M_i = \sum_{j=1}^{i}m_j

(Figure: an ordinary and a cumulative histogram of the same data; the data shown is a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.)

Cumulative Histogram
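A minimal sketch, not from the slides, reproducing an ordinary and a cumulative histogram of 10,000 samples from N(0, 1) (bin count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(0.0, 1.0, size=10_000)

m, edges = np.histogram(data, bins=20)   # m_i = counts per bin
M = np.cumsum(m)                         # M_i = cumulative counts

assert m.sum() == len(data)              # n = sum_i m_i
print(M[-1])                             # 10000
```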

Page 53: Introduction to Mathematical Probability

53

SOLO Review of Probability

Law of Large Numbers (History)

The Weak Law of Large Numbers was first proved by the Swiss mathematician James (Jacob) Bernoulli in the fourth part of his work "Ars Conjectandi", published posthumously in 1713.

Jacob Bernoulli 1654-1705

The Law of Large Numbers has three versions:
• Weak Law of Large Numbers (WLLN)
• Strong Law of Large Numbers (SLLN)
• Uniform Law of Large Numbers (ULLN)

The French mathematician Siméon Poisson generalized Bernoulli's theorem around 1800.

Siméon Denis Poisson 1781-1840

The next contribution was by Bienaymé and later, in 1866, by Chebyshev, and is known as the Bienaymé-Chebyshev Inequality.

Pafnuty Lvovich Chebyshev 1821 - 1894

Irénée-Jules Bienaymé 1796 - 1878

Weak Law of Large Numbers (WLLN)

Page 54: Introduction to Mathematical Probability

54

SOLO Review of Probability

Law of Large Numbers (History - continue)

Francesco Paolo Cantelli

1875-1966

Félix Edouard Justin Ėmile Borel

1871-1956

Andrey Nikolaevich Kolmogorov 1903 - 1987

Table of Content

Borel-Cantelli Lemma

Page 55: Introduction to Mathematical Probability

55

SOLO Review of Probability

Markov's Inequality

If X is a random variable which takes only nonnegative values, then for any value a > 0

\Pr\{X \ge a\} \le \frac{E(X)}{a}

Proof: Suppose X is continuous with probability density function p_X(x). Then

E(X) = \int_0^{\infty}x\,p_X(x)\,dx = \int_0^{a}x\,p_X(x)\,dx + \int_a^{\infty}x\,p_X(x)\,dx \ge \int_a^{\infty}x\,p_X(x)\,dx \ge a\int_a^{\infty}p_X(x)\,dx = a\,\Pr\{X \ge a\}

Since a > 0:

\Pr\{X \ge a\} \le \frac{E(X)}{a}

Note that \Pr\{X \ge a\} = 1 - P_X(a).

Andrey Markov 1856 - 1922

Table of Content

Page 56: Introduction to Mathematical Probability

56

SOLO Review of Probability

Chebyshev's Inequality

If X is a random variable with mean μ = E(X) and variance σ² = E[(X - μ)²], then for any value k > 0

\Pr\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}

Proof: Since (X - μ)² is a nonnegative random variable, we can apply Markov's inequality with a = k² to obtain

\Pr\{(X - \mu)^2 \ge k^2\} \le \frac{E\big[(X - \mu)^2\big]}{k^2} = \frac{\sigma^2}{k^2}

But since (X - μ)² ≥ k² if and only if |X - μ| ≥ k, the above is equivalent to

\Pr\{|X - \mu| \ge k\} \le \frac{\sigma^2}{k^2}

Take kσ instead of k to obtain

\Pr\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}

Pafnuty Lvovich Chebyshev 1821 - 1894

Weak Law of Large Numbers

Bernoulli's Theorem
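A minimal sketch, not from the slides, checking Chebyshev's bound empirically on an exponential sample (distribution, rate and seed chosen only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    freq = np.mean(np.abs(x - mu) >= k * sigma)   # empirical Pr{|X - mu| >= k*sigma}
    print(k, round(freq, 4), "<=", round(1 / k**2, 4))
```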

Table of Content

Page 57: Introduction to Mathematical Probability

57

SOLO Review of Probability

Bienaymé's Inequality

If X is a random variable, then for any values a, k > 0

\Pr\{|X - a| \ge k\} \le \frac{E\big[|X - a|^{n}\big]}{k^{n}}

Proof: Let us prove first that if the random variable y takes only positive values, then for any α > 0 (Markov's inequality)

\Pr\{y \ge \alpha\} \le \frac{E(y^{n})}{\alpha^{n}}

Indeed, E(y^{n}) = \int_0^{\infty}y^{n}p_Y(y)\,dy \ge \int_{\alpha}^{\infty}y^{n}p_Y(y)\,dy \ge \alpha^{n}\int_{\alpha}^{\infty}p_Y(y)\,dy = \alpha^{n}\,\Pr\{y \ge \alpha\}.

Define y := |X - a| > 0 and choose α = k^{n} > 0; since |X - a| \ge k \iff |X - a|^{n} \ge k^{n},

\Pr\{|X - a| \ge k\} = \Pr\{|X - a|^{n} \ge k^{n}\} \le \frac{E\big[|X - a|^{n}\big]}{k^{n}}

For n = 2 and a = μ we obtain Chebyshev's Inequality. For this reason Chebyshev's Inequality is also known as the Bienaymé - Chebyshev Inequality.

Irénée-Jules Bienaymé 1796 - 1878

Markov's Inequality

Table of Content

Page 58: Introduction to Mathematical Probability

58

SOLO Review of Probability

Chernoff's and Hoeffding's Bounds

Markov, Chebyshev and Bienaymé inequalities use only Expectation Value information. Let us try to obtain a tighter bound when the probability distribution function is known.

Start from Markov's Inequality for a nonnegative random variable Z and γ > 0:

\Pr\{Z \ge \gamma\} \le \frac{E(Z)}{\gamma}, \qquad \gamma > 0

Now let us take a random variable Y and define the logarithmic generating function

\Lambda_Y(t) := \ln E\big[\exp(tY)\big] \ \text{if } E\big[\exp(tY)\big] < \infty; \qquad +\infty \ \text{otherwise}

Using the fact that exp(x) is a monotonic increasing function, Y \ge \gamma \iff \exp(tY) \ge \exp(t\gamma) for t ≥ 0, and applying Markov's inequality with Z := \exp(tY) and threshold \exp(t\gamma) we obtain:

\Pr\{Y \ge \gamma\} = \Pr\{\exp(tY) \ge \exp(t\gamma)\} \le \frac{E\big[\exp(tY)\big]}{\exp(t\gamma)} = \exp\big[\Lambda_Y(t) - t\gamma\big], \qquad t \ge 0

Therefore:

\Pr\{Y \ge \gamma\} \le \inf_{t \ge 0}\exp\big[\Lambda_Y(t) - t\gamma\big] \qquad (\inf_{t \ge 0} = \text{infimum over } t \ge 0)

To compute Λ_Y(t) we need to know the distribution function p_Y(y). From this inequality, by using different Y, we obtain the Chernoff and Hoeffding bounds.

Table of Content

Page 59: Introduction to Mathematical Probability

59

SOLO Review of Probability

Chernoff's Bound

Let X_1, X_2, … be independent Bernoulli random variables with Pr(X_i = 1) = p and Pr(X_i = 0) = 1 - p.

Define: Y := (X_1 + \cdots + X_m)/m.

\Lambda_{X_i}(t) = \ln E\big[\exp(tX_i)\big] = \ln\big[p\exp(t\cdot 1) + (1-p)\exp(t\cdot 0)\big] = \ln\big[1 + p\,(e^{t} - 1)\big]

\Lambda_Y(t) = \ln E\big[\exp(tY)\big] = \sum_{i=1}^{m}\ln E\big[\exp(tX_i/m)\big] = m\ln\big[1 + p\,(e^{t/m} - 1)\big]

Use \Pr\{Y \ge \gamma\} \le \inf_{t \ge 0}\exp\big[\Lambda_Y(t) - t\gamma\big] = \exp\big[-\sup_{t \ge 0}\big(t\gamma - \Lambda_Y(t)\big)\big]

t\gamma - \Lambda_Y(t) = t\gamma - m\ln\big[1 + p\,(e^{t/m} - 1)\big]

\frac{d}{dt}\big[t\gamma - \Lambda_Y(t)\big] = \gamma - \frac{p\,e^{t/m}}{1 + p\,(e^{t/m} - 1)} = 0 \quad \Longrightarrow \quad e^{t^{*}/m} = \frac{\gamma(1-p)}{p(1-\gamma)}

t^{*}\gamma - \Lambda_Y(t^{*}) = m\Big[\gamma\ln\frac{\gamma}{p} + (1-\gamma)\ln\frac{1-\gamma}{1-p}\Big]

Herman Chernoff 1921 -

Page 60: Introduction to Mathematical Probability

60

SOLO Review of Probability

Chernoff's Bound (continue – 1)

Using the result above,

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} \ge \gamma\Big\} \le \inf_{t \ge 0}\exp\big[\Lambda_Y(t) - t\gamma\big] = \exp\big[-m\,H(\gamma|p)\big], \qquad 0 \le p \le \gamma \le 1

Define:

H(\gamma|p) := \gamma\ln\frac{\gamma}{p} + (1-\gamma)\ln\frac{1-\gamma}{1-p}, \qquad \gamma, p \in (0, 1)

H(γ|p) has the following properties:

H(\gamma|p)\big|_{\gamma=p} = 0, \qquad \frac{dH(\gamma|p)}{d\gamma} = \ln\frac{\gamma}{p} - \ln\frac{1-\gamma}{1-p}, \qquad \frac{dH(\gamma|p)}{d\gamma}\Big|_{\gamma=p} = 0

\frac{d^2H(\gamma|p)}{d\gamma^2} = \frac{1}{\gamma} + \frac{1}{1-\gamma} = \frac{1}{\gamma(1-\gamma)} \ge 4, \qquad 0 \le \gamma \le 1

(the minimum value 4 of the second derivative is attained at γ = 0.5).

Page 61: Introduction to Mathematical Probability

61

SOLO Review of Probability

Chernoff's Bound (continue – 2)

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} \ge \gamma\Big\} \le \exp\big[-m\,H(\gamma|p)\big], \qquad H(\gamma|p) = \gamma\ln\frac{\gamma}{p} + (1-\gamma)\ln\frac{1-\gamma}{1-p}, \qquad \gamma, p \in (0, 1)

Expanding H(γ|p) in a Taylor series about γ = p and using H(p|p) = 0, dH/dγ|_{γ=p} = 0 and d²H/dγ² ≥ 4:

H(\gamma|p) = 0 + 0\cdot(\gamma - p) + \frac{1}{2}\frac{d^2H}{d\gamma^2}\Big|_{\tilde\gamma}(\gamma - p)^2 \ge 2(\gamma - p)^2

From which we arrive at the Chernoff Bound:

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} \ge \gamma\Big\} \le \exp\big[-2m(\gamma - p)^2\big], \qquad \gamma, p \in (0, 1)

Define γ = p + ε:

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} - p \ge \epsilon\Big\} \le \exp(-2m\epsilon^2), \qquad \epsilon > 0

Page 62: Introduction to Mathematical Probability

62

SOLO Review of Probability

Chernoff's Bound (continue – 3)

Define now Y := 1 - (X_1 + \cdots + X_m)/m, i.e. the mean of the Bernoulli variables 1 - X_i, which have success probability 1 - p. Using the Chernoff Bound we obtain

\Pr\Big\{1 - \frac{X_1 + \cdots + X_m}{m} - (1 - p) \ge \epsilon\Big\} \le \exp(-2m\epsilon^2)

or, since 1 - (X_1 + \cdots + X_m)/m - (1 - p) = p - (X_1 + \cdots + X_m)/m,

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} - p \le -\epsilon\Big\} \le \exp(-2m\epsilon^2), \qquad \epsilon > 0

together with:

\Pr\Big\{\frac{X_1 + \cdots + X_m}{m} - p \ge \epsilon\Big\} \le \exp(-2m\epsilon^2)

By summing those two inequalities we obtain the two-sided Chernoff Bound:

\Pr\Big\{\Big|\frac{X_1 + \cdots + X_m}{m} - p\Big| \ge \epsilon\Big\} \le 2\exp(-2m\epsilon^2), \qquad \epsilon > 0

Herman Chernoff 1921 -
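A minimal sketch, not from the slides, checking the two-sided Chernoff bound empirically (m, p, ε and the seed are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(6)
m, p, eps, trials = 100, 0.3, 0.1, 100_000

means = rng.binomial(m, p, size=trials) / m        # (X_1 + ... + X_m)/m per trial
freq = np.mean(np.abs(means - p) >= eps)           # empirical Pr{|mean - p| >= eps}

print(round(freq, 4), "<=", round(2 * np.exp(-2 * m * eps**2), 4))
```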

Table of Content

Page 63: Introduction to Mathematical Probability

63

SOLO Review of Probability

Hoeffding's Bound

Let us start with a simpler problem: suppose that Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and assume E(Y) = 0.

Define: \alpha := \frac{b - Y}{b - a} \in [0, 1] for a ≤ Y ≤ b.

We have: Y = \alpha a + (1 - \alpha)b, i.e. Y is a convex combination of a and b.

Since exp(·) is a convex function, for any t ≥ 0 we have:

\exp(tY) = \exp\big(t(\alpha a + (1-\alpha)b)\big) \le \alpha\exp(ta) + (1-\alpha)\exp(tb) = \frac{b - Y}{b - a}\exp(ta) + \frac{Y - a}{b - a}\exp(tb)

Let us take the expectation of this inequality and define p := -a/(b - a) \in [0, 1] (using E(Y) = 0):

E\big[\exp(tY)\big] \le \frac{b - E(Y)}{b - a}\exp(ta) + \frac{E(Y) - a}{b - a}\exp(tb) = (1 - p)\exp(ta) + p\exp(tb) =: \exp\big[\phi(u)\big], \qquad u := t(b - a)

Page 64: Introduction to Mathematical Probability

64

SOLO Review of Probability

Hoeffding's Bound (continue – 1)

E\big[\exp(tY)\big] \le (1 - p)\exp(ta) + p\exp(tb) = \exp\big[\phi(u)\big]

where

u := t(b - a), \qquad \phi(u) := -pu + \ln\big(1 - p + p\,e^{u}\big), \qquad 0 \le p \le 1

Differentiating we obtain:

\frac{d\phi(u)}{du} = -p + \frac{p\,e^{u}}{1 - p + p\,e^{u}}, \qquad \frac{d\phi}{du}\Big|_{u=0} = 0

\frac{d^2\phi(u)}{du^2} = \frac{p\,e^{u}(1 - p)}{\big(1 - p + p\,e^{u}\big)^2} = \frac{p\,e^{u}}{1 - p + p\,e^{u}}\Big(1 - \frac{p\,e^{u}}{1 - p + p\,e^{u}}\Big) \le \frac{1}{4}

(since s(1 - s) ≤ 1/4 for any s ∈ [0, 1]).

Page 65: Introduction to Mathematical Probability

65

SOLO Review of Probability

Hoeffding's Bound (continue – 2)

E\big[\exp(tY)\big] \le \exp\big[\phi(u)\big], \qquad u := t(b - a), \qquad \phi(u) := -pu + \ln\big(1 - p + p\,e^{u}\big)

By Taylor's theorem, using φ(0) = 0, φ'(0) = 0 and φ''(u) ≤ 1/4:

\phi(u) = \phi(0) + \phi'(0)\,u + \tfrac{1}{2}\phi''(\tilde u)\,u^2 \le \frac{u^2}{8} = \frac{t^2(b - a)^2}{8}

End of the simpler problem: if Y is a random variable with a ≤ Y ≤ b almost surely for some finite a and b, and E(Y) = 0, then

E\big[\exp(tY)\big] \le \exp\Big[\frac{t^2(b - a)^2}{8}\Big], \qquad t \ge 0

Page 66: Introduction to Mathematical Probability

66

SOLO Review of Probability

Hoeffding's Bound (continue – 3)

Generalize the result. Suppose X_1, X_2, …, X_m are independent random variables with a_i ≤ X_i ≤ b_i for i = 1, 2, …, m. Define Z_i := X_i - E(X_i), meaning E(Z_i) = 0 and a_i - E(X_i) ≤ Z_i ≤ b_i - E(X_i). Therefore we have

E\big[\exp(tZ_i)\big] \le \exp\Big[\frac{t^2(b_i - a_i)^2}{8}\Big], \qquad t \ge 0

Use \Pr\{Y \ge \gamma\} \le \frac{E[\exp(tY)]}{\exp(t\gamma)} with Y := Z_1 + Z_2 + \cdots + Z_m:

\Pr\Big\{\sum_{i=1}^{m}Z_i \ge \gamma\Big\} \le \exp(-t\gamma)\,E\Big[\exp\Big(t\sum_{i=1}^{m}Z_i\Big)\Big] = \exp(-t\gamma)\prod_{i=1}^{m}E\big[\exp(tZ_i)\big]
\le \exp(-t\gamma)\prod_{i=1}^{m}\exp\Big[\frac{t^2(b_i - a_i)^2}{8}\Big] = \exp\Big[-t\gamma + \frac{t^2}{8}\sum_{i=1}^{m}(b_i - a_i)^2\Big]

Page 67: Introduction to Mathematical Probability

67

SOLO Review of Probability

Hoeffding's Bound (continue – 4)

\Pr\Big\{\sum_{i=1}^{m}Z_i \ge \gamma\Big\} \le \inf_{t \ge 0}\exp\Big[-t\gamma + \frac{t^2}{8}\sum_{i=1}^{m}(b_i - a_i)^2\Big], \qquad Z_i = X_i - E(X_i)

but the infimum of the exponent over t ≥ 0 is attained at t^{*} = \frac{4\gamma}{\sum_{i=1}^{m}(b_i - a_i)^2}, giving

\inf_{t \ge 0}\Big[-t\gamma + \frac{t^2}{8}\sum_{i=1}^{m}(b_i - a_i)^2\Big] = -\frac{2\gamma^2}{\sum_{i=1}^{m}(b_i - a_i)^2}

We finally obtain Hoeffding's Bound:

\Pr\Big\{\sum_{i=1}^{m}\big(X_i - E(X_i)\big) \ge \gamma\Big\} \le \exp\Big[-\frac{2\gamma^2}{\sum_{i=1}^{m}(b_i - a_i)^2}\Big]

Wassily Hoeffding 1914 - 1991
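A minimal sketch, not from the slides, checking Hoeffding's bound for m uniform [0, 1] variables, so a_i = 0 and b_i = 1 (m, γ and the seed are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(7)
m, gamma, trials = 50, 5.0, 200_000

sums = rng.uniform(0.0, 1.0, size=(trials, m)).sum(axis=1) - m * 0.5   # sum of Z_i
freq = np.mean(sums >= gamma)                                          # empirical tail

print(round(freq, 5), "<=", round(np.exp(-2 * gamma**2 / m), 5))
```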

Table of Content

Page 68: Introduction to Mathematical Probability

68

SOLO Review of Probability

Convergence Concepts

Convergence Almost Everywhere (a.e.) (or with Probability 1, or Strongly)

We say that the sequence X_n converges to X with probability 1 if the set of outcomes x such that \lim_{n\to\infty}X_n(x) = X(x) has probability 1, or

\Pr\big\{\lim_{n\to\infty}X_n = X\big\} = 1

Convergence in the Mean-Square sense (m.s.)

We say that the sequence X_n converges to X in the mean-square sense if

E\big[(X_n - X)^2\big] \to 0 \quad \text{as } n \to \infty

Convergence in Probability (p) (or Stochastic Convergence or Convergence in Measure)

We say that the sequence X_n converges to X in probability if, for every ε > 0,

\Pr\big\{|X_n - X| > \epsilon\big\} \to 0 \quad \text{as } n \to \infty

Convergence in Distribution (d) (weak convergence)

We say that the sequence X_n converges to X in distribution if

p_{X_n}(x) \to p_X(x) \quad \text{as } n \to \infty

Implications between the convergence modes: almost-everywhere (a.e.) convergence implies convergence in probability (p); mean-square (m.s.) convergence implies convergence in probability (p); convergence in probability implies convergence in distribution (d).

Weak Law of Large Numbers

Central Limit Theorem

Bernoulli's Theorem

Page 69: Introduction to Mathematical Probability

69

SOLO Review of Probability

Convergence Concepts (continue – 1)

Cauchy Criterion of Convergence

According to the Cauchy Criterion of Convergence the sequence X_n converges to an (unknown) limit if

|X_{n+m} - X_n| \to 0 \quad \text{as } n \to \infty, \ \text{for any } m > 0

Augustin Louis Cauchy (1789-1857)

Applied to the different convergence modes:

Convergence Almost Everywhere (a.e.): \Pr\big\{|X_{n+m} - X_n| < \epsilon, \ \text{any } m > 0\big\} \to 1 as n → ∞

Convergence in the Mean-Square sense (m.s.): E\big[(X_{n+m} - X_n)^2\big] \to 0 as n → ∞, for any m > 0

Convergence in Probability (p): Using the Chebyshev Inequality,

\Pr\big\{|X_n - X| \ge \epsilon\big\} \le \frac{E\big[(X_n - X)^2\big]}{\epsilon^2}

If X_n → X in the m.s. sense, then for a given ε the right hand side tends to zero, and so does the left hand side, i.e. X_n → X in probability:

\Pr\big\{|X_n - X| > \epsilon\big\} \to 0 \quad \text{as } n \to \infty

The opposite is not true: convergence in probability doesn't imply convergence in m.s.

Table of Content

Page 70: Introduction to Mathematical Probability

70

SOLO Review of Probability

The Laws of Large Numbers

The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected sample of a large population is likely to be close to the average of the whole population. There are two laws of large numbers, the Weak Law and the Strong Law.

The Weak Law of Large Numbers

The Weak Law of Large Numbers states that if X_1, X_2, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample mean

\bar{X}_n := (X_1 + X_2 + \cdots + X_n)/n

converges in probability (a weak convergence sense) to μ:

\Pr\big\{|\bar{X}_n - \mu| < \epsilon\big\} \to 1 \quad \text{as } n \to \infty

The Strong Law of Large Numbers

The Strong Law of Large Numbers states that if X_1, X_2, …, X_n, … is an infinite sequence of random variables that have the same expected value μ and variance σ², are uncorrelated, and E(|X_i|) < ∞, then

\Pr\big\{\lim_{n\to\infty}\bar{X}_n = \mu\big\} = 1

i.e. \bar{X}_n converges almost surely to μ.

Page 71: Introduction to Mathematical Probability

71

SOLO Review of Probability

The Law of Large Numbers

Differences between the Weak Law and the Strong Law

The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ. Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite number of times, although it happens at infrequent intervals.

The Strong Law shows that this almost surely will not occur. In particular, it implies that with probability 1, we have for any positive value ε, the inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to an infinite, but infrequent, number of times).

Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.

Page 72: Introduction to Mathematical Probability

72

SOLO Review of Probability

The Law of Large Numbers

Proof of the Weak Law of Large Numbers

Given E(X_i) = \mu, \ \mathrm{Var}(X_i) = \sigma^2, \ E\big[(X_i - \mu)(X_j - \mu)\big] = 0 \ \text{for } i \ne j, we have:

E(\bar{X}_n) = \frac{E(X_1) + \cdots + E(X_n)}{n} = \frac{n\mu}{n} = \mu

\mathrm{Var}(\bar{X}_n) = E\big[(\bar{X}_n - \mu)^2\big] = \frac{1}{n^2}E\Big[\Big(\sum_{i=1}^{n}(X_i - \mu)\Big)^2\Big] = \frac{1}{n^2}\sum_{i=1}^{n}E\big[(X_i - \mu)^2\big] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}

Using Chebyshev's inequality on \bar{X}_n we obtain:

\Pr\big\{|\bar{X}_n - \mu| \ge \epsilon\big\} \le \frac{\sigma^2}{n\,\epsilon^2}

Using this equation we obtain:

\Pr\big\{|\bar{X}_n - \mu| < \epsilon\big\} = 1 - \Pr\big\{|\bar{X}_n - \mu| \ge \epsilon\big\} \ge 1 - \frac{\sigma^2}{n\,\epsilon^2}

As n approaches infinity, the right hand side, and hence the probability, approaches 1. q.e.d.

Chebyshev's inequality

Table of Content

Monte Carlo Integration
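A minimal simulation sketch, not from the slides, illustrating the Weak Law: the sample mean of uniform [0, 1] variables (μ = 0.5) concentrates around μ as n grows (seed, ε and the n values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)

for n in (10, 100, 1_000, 10_000):
    means = rng.uniform(0.0, 1.0, size=(2_000, n)).mean(axis=1)
    # frequency of |mean - mu| >= 0.05 shrinks toward 0 as n grows
    print(n, round(np.mean(np.abs(means - 0.5) >= 0.05), 4))
```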

Page 73: Introduction to Mathematical Probability

73

SOLO Review of Probability

Central Limit Theorem

The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre in 1733, using the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in "The Doctrine of Chance", 3rd Ed.

Pierre-Simon Laplace (1749-1827)

Abraham de Moivre (1667-1754)

This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace recovered it in his work "Théorie Analytique des Probabilités", in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre – Laplace Theorem.

De Moivre – Laplace Theorem

The present form of the Central Limit Theorem was given by the Russian mathematician Alexandr Lyapunov in 1901.

Alexandr Mikhailovich Lyapunov

(1857-1918)

Page 74: Introduction to Mathematical Probability

74

SOLO Review of Probability

Central Limit Theorem (continue – 1)

Let X_1, X_2, …, X_m be a sequence of independent random variables with the same probability distribution function p_X(x), mean μ and variance σ². Define the statistical mean:

\bar{X}_m = \frac{X_1 + X_2 + \cdots + X_m}{m}

We have:

E(\bar{X}_m) = \frac{E(X_1) + \cdots + E(X_m)}{m} = \mu

\mathrm{Var}(\bar{X}_m) = E\big[(\bar{X}_m - \mu)^2\big] = \frac{1}{m^2}E\Big[\Big(\sum_{i=1}^{m}(X_i - \mu)\Big)^2\Big] = \frac{\sigma^2}{m}

Define also the new random variable

Y := \frac{\bar{X}_m - E(\bar{X}_m)}{\sigma_{\bar{X}_m}} = \frac{X_1 + X_2 + \cdots + X_m - m\mu}{\sigma\sqrt{m}}

The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean μ and the variance σ² are finite.

Page 75: Introduction to Mathematical Probability

75

SOLO Review of Probability

Central Limit Theorem (continue – 2)

Proof

Y := \frac{X_1 + X_2 + \cdots + X_m - m\mu}{\sigma\sqrt{m}} = \sum_{i=1}^{m}\frac{X_i - \mu}{\sigma\sqrt{m}}

The Characteristic Function of Y is

\Phi_Y(\omega) = E\big[\exp(j\omega Y)\big] = E\Big[\exp\Big(j\omega\sum_{i=1}^{m}\frac{X_i - \mu}{\sigma\sqrt{m}}\Big)\Big] = \prod_{i=1}^{m}E\Big[\exp\Big(j\omega\frac{X_i - \mu}{\sigma\sqrt{m}}\Big)\Big]

Develop the characteristic function of (X_i - μ)/(σ√m) in a Taylor series:

E\Big[\exp\Big(j\omega\frac{X_i - \mu}{\sigma\sqrt{m}}\Big)\Big] = 1 + \frac{j\omega}{1!}\frac{E(X_i - \mu)}{\sigma\sqrt{m}} + \frac{(j\omega)^2}{2!}\frac{E\big[(X_i - \mu)^2\big]}{\sigma^2 m} + \frac{(j\omega)^3}{3!}\frac{E\big[(X_i - \mu)^3\big]}{\sigma^3 m^{3/2}} + \cdots
= 1 - \frac{\omega^2}{2m} + \varepsilon(\omega, m), \qquad \lim_{m\to\infty}m\,\varepsilon(\omega, m) = 0

Page 76: Introduction to Mathematical Probability

76

SOLO Review of Probability

Central Limit Theorem (continue – 3)

Proof (continue – 1)

The Characteristic Function of Y is

\Phi_Y(\omega) = \Big[1 - \frac{\omega^2}{2m} + \varepsilon(\omega, m)\Big]^{m}, \qquad \lim_{m\to\infty}m\,\varepsilon(\omega, m) = 0

Therefore

\Phi_Y(\omega) \to \exp\Big(-\frac{\omega^2}{2}\Big) \quad \text{as } m \to \infty

which is the Characteristic Function of the Normal Distribution with zero mean and unit variance. Using the inverse transform,

p_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}\Phi_Y(\omega)\exp(-j\omega y)\,d\omega \to \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{y^2}{2}\Big)

The probability distribution of Y tends to become Gaussian (normal) as m tends to infinity (Convergence in Distribution).

Characteristic Function of Normal Distribution

Convergence Concepts

Table of Content

Monte Carlo Integration
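A minimal simulation sketch, not from the slides, showing the Central Limit Theorem for standardized means of exponential samples (m, sample counts and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
m, mu, sigma = 200, 1.0, 1.0                    # exponential(1): mu = sigma = 1

x = rng.exponential(1.0, size=(50_000, m))
y = (x.sum(axis=1) - m * mu) / (sigma * np.sqrt(m))   # standardized sums

print(round(y.mean(), 3), round(y.var(), 3))    # ~0 and ~1
print(round(np.mean(y <= 1.0), 3))              # ~0.841, i.e. the standard normal CDF at 1
```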

Page 77: Introduction to Mathematical Probability

77

SOLO Review of Probability

Bernoulli Trials – The Binomial Distribution

Probability Density Function:

p(k, n) = \frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k} = \binom{n}{k}p^{k}(1-p)^{n-k}, \qquad k = 0, 1, \dots, n

Cumulative Distribution Function:

P(K; n, p) = \sum_{k=0}^{K}\binom{n}{k}p^{k}(1-p)^{n-k}

Mean Value: E(x) = n p

Variance: Var(x) = n p (1 - p)

Moment Generating Function: \Phi(\omega) = \big[1 + p\,(e^{j\omega} - 1)\big]^{n}

Jacob Bernoulli 1654-1705

Distribution Examples

Page 78: Introduction to Mathematical Probability

78

SOLO Review of Probability

Bernoulli Trials – The Binomial Distribution (continue – 1)

Given a random event r = {0, 1}:

p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial, p + q = 1
n – number of independent trials
p(k, n) – probability of k successes in n independent trials (Bernoulli Trials)

The number of ways of obtaining k successful trials out of n independent trials (see Permutations and Combinations) is

\binom{n}{k} = \frac{n!}{k!\,(n-k)!}

each with probability p^{k}(1-p)^{n-k}. The probability of k successful trials in n independent trials is therefore

p(k, n) = \frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k} = \binom{n}{k}p^{k}(1-p)^{n-k}

Using the binomial theorem we obtain

(p + q)^{n} = \sum_{k=0}^{n}\binom{n}{k}p^{k}q^{n-k} = 1

therefore the previous distribution is called the binomial distribution.

Jacob Bernoulli 1654-1705

Distribution Examples

Page 79: Introduction to Mathematical Probability

79

SOLO Review of Probability

Bernoulli Trials – The Binomial Distribution (continue – 2)

p(k, n) = \binom{n}{k}p^{k}(1-p)^{n-k}

Mean Value

E(X) = \sum_{i=0}^{n}i\,\frac{n!}{i!\,(n-i)!}p^{i}(1-p)^{n-i} = np\sum_{i=1}^{n}\frac{(n-1)!}{(i-1)!\,(n-i)!}p^{i-1}(1-p)^{n-i} = np\,\big[p + (1-p)\big]^{n-1} = np

Moment Generating Function

E\big[e^{j\omega X}\big] = \sum_{k=0}^{n}e^{j\omega k}\frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k} = \sum_{k=0}^{n}\frac{n!}{k!\,(n-k)!}\big(p\,e^{j\omega}\big)^{k}(1-p)^{n-k} = \big[1 + p\,(e^{j\omega} - 1)\big]^{n}

Page 80: Introduction to Mathematical Probability

80

SOLO Review of Probability

Bernoulli Trials – The Binomial Distribution (continue – 3)

p(k, n) = \binom{n}{k}p^{k}(1-p)^{n-k}

Second Moment

E(X^2) = \sum_{i=0}^{n}i^2\,\frac{n!}{i!\,(n-i)!}p^{i}(1-p)^{n-i} = \sum_{i=0}^{n}\big[i(i-1) + i\big]\frac{n!}{i!\,(n-i)!}p^{i}(1-p)^{n-i}
= n(n-1)p^2\sum_{i=2}^{n}\frac{(n-2)!}{(i-2)!\,(n-i)!}p^{i-2}(1-p)^{n-i} + np = n(n-1)p^2\big[p + (1-p)\big]^{n-2} + np = n(n-1)p^2 + np

Variance

\mathrm{Var}(X) = E(X^2) - \big[E(X)\big]^2 = n(n-1)p^2 + np - n^2p^2 = np\,(1 - p)

Page 81: Introduction to Mathematical Probability

81

SOLO Review of Probability

Bernoulli Trials – The Binomial Distribution (continue – 4)

Mean Value: E(X) = n p. Variance: Var(X) = E\big[(X - E(X))^2\big] = n p (1 - p).

Let us apply Chebyshev's Inequality:

\Pr\big\{|X - E(X)| \ge k\big\} \le \frac{E\big[(X - E(X))^2\big]}{k^2}

We obtain:

\Pr\big\{|X - np| \ge k\big\} \le \frac{np\,(1 - p)}{k^2}

An upper bound to this inequality, when p varies (0 ≤ p ≤ 1), can be obtained by taking the derivative of p(1 - p), equating it to zero, and solving for p. The result is p = 0.5, so p(1 - p) ≤ 1/4 and

\Pr\big\{|X - np| \ge k\big\} \le \frac{n}{4k^2}

Taking k = nε with ε > 0 fixed, \Pr\{|X/n - p| \ge \epsilon\} \le \frac{1}{4n\epsilon^2} \to 0 as n → ∞, i.e. X/n converges in Probability to the mean value per trial, p. This is known as Bernoulli's Theorem.

Chebyshev's Inequality

Convergence in Probability

Page 82: Introduction to Mathematical Probability

82

SOLO Review of Probability

Generalized Bernoulli Trials

Consider now r mutually exclusive events A_1, A_2, …, A_r, A_i ∩ A_j = ∅ for i ≠ j (i, j = 1, 2, …, r), with their sum equal to the certain event S: A_1 ∪ A_2 ∪ … ∪ A_r = S,

and the probabilities of occurrence p(A_1) = p_1, p(A_2) = p_2, …, p(A_r) = p_r.

Therefore p(A_1) + p(A_2) + \cdots + p(A_r) = p_1 + p_2 + \cdots + p_r = 1.

We want to find the probability that in n trials we will obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_r k_r times, such that k_1 + k_2 + \cdots + k_r = n.

The number of possible combinations of k_1 events A_1, k_2 events A_2, …, k_r events A_r is

\frac{n!}{k_1!\,k_2!\cdots k_r!}

and the probability of each combination is p_1^{k_1}p_2^{k_2}\cdots p_r^{k_r}.

We obtain the probability of the Generalized Bernoulli Trials as

p(k_1, k_2, \dots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1}p_2^{k_2}\cdots p_r^{k_r}

Permutations & Combinations

Table of Content

Page 83: Introduction to Mathematical Probability

83

SOLO Review of Probability

Poisson Asymptotical Development (Law of Rare Events)

Start with the Binomial Distribution

p(k, n) = \frac{n!}{k!\,(n-k)!}p^{k}(1-p)^{n-k}

We assume that n ≫ 1 and p = k_0/n ≪ 1, with k_0 = n p fixed and k of the order of k_0 ≪ n. Then

p(k, n) = \frac{n(n-1)\cdots(n-k+1)}{k!}\Big(\frac{k_0}{n}\Big)^{k}\Big(1 - \frac{k_0}{n}\Big)^{n-k}
= \frac{k_0^{k}}{k!}\;\underbrace{1\Big(1-\frac{1}{n}\Big)\cdots\Big(1-\frac{k-1}{n}\Big)}_{\to 1}\;\underbrace{\Big(1 - \frac{k_0}{n}\Big)^{n}}_{\to e^{-k_0}}\;\underbrace{\Big(1 - \frac{k_0}{n}\Big)^{-k}}_{\to 1}

so, for n → ∞,

p(k, k_0) = \frac{k_0^{k}}{k!}\exp(-k_0), \qquad k = 0, 1, 2, \dots

This is the Poisson Asymptotical Development (Law of Rare Events).

Siméon Denis Poisson 1781-1840

Distribution Examples
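A minimal numerical sketch, not from the slides, comparing binomial probabilities with their Poisson limit for a rare event (the values n = 1000 and p = 0.003, so k_0 = 3, are arbitrary illustration values):

```python
from math import comb, exp, factorial

n, p = 1000, 0.003
k0 = n * p

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)     # exact binomial p(k, n)
    poisson = k0**k / factorial(k) * exp(-k0)        # Poisson limit p(k, k0)
    print(k, round(binom, 5), round(poisson, 5))
```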

Page 84: Introduction to Mathematical Probability

84

SOLO Review of Probability

Poisson Distribution

p(k; \lambda) = \frac{\lambda^{k}}{k!}\exp(-\lambda), \qquad k \ \text{a nonnegative integer}, \ \lambda > 0

Mean Value

E(X) = \sum_{i=0}^{\infty}i\,\frac{\lambda^{i}}{i!}\exp(-\lambda) = \lambda\exp(-\lambda)\sum_{i=1}^{\infty}\frac{\lambda^{i-1}}{(i-1)!} = \lambda\exp(-\lambda)\exp(\lambda) = \lambda

Second Moment

E(X^2) = \sum_{i=0}^{\infty}i^2\,\frac{\lambda^{i}}{i!}\exp(-\lambda) = \sum_{i=0}^{\infty}\big[i(i-1)+i\big]\frac{\lambda^{i}}{i!}\exp(-\lambda) = \lambda^2\exp(-\lambda)\sum_{i=2}^{\infty}\frac{\lambda^{i-2}}{(i-2)!} + \lambda = \lambda^2 + \lambda

Variance

\mathrm{Var}(X) = E(X^2) - \big[E(X)\big]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda

Moment Generating Function

\Phi(\omega) = E\big[e^{j\omega k}\big] = \sum_{m=0}^{\infty}e^{j\omega m}\frac{\lambda^{m}}{m!}\exp(-\lambda) = \exp(-\lambda)\sum_{m=0}^{\infty}\frac{(\lambda e^{j\omega})^{m}}{m!} = \exp\big[\lambda\,(e^{j\omega} - 1)\big]

Siméon Denis Poisson 1781-1840

Page 85: Introduction to Mathematical Probability

85

SOLO Review of Probability

Poisson Distribution

Moment Generating Function: \Phi(\omega) = \exp\big[\lambda\,(e^{j\omega} - 1)\big]

Approximation to the Gaussian Distribution

\Phi(\omega) = \exp\big[\lambda(\cos\omega - 1) + j\lambda\sin\omega\big] = \exp\big[-2\lambda\sin^2(\omega/2) + j\lambda\sin\omega\big]

For λ sufficiently large, Φ(ω) is negligible for all but very small values of ω, in which case \sin^2(\omega/2) \approx \omega^2/4 and \sin\omega \approx \omega:

\Phi(\omega) \approx \exp\Big(-\frac{\lambda\omega^2}{2} + j\lambda\omega\Big)

For a normal distribution with mean μ and variance σ² we found the Moment Generating Function \exp\big(j\omega\mu - \tfrac{1}{2}\omega^2\sigma^2\big).

Therefore the Poisson Distribution can be approximated by a Gaussian Distribution with mean μ = λ and variance σ² = λ:

p(k; \lambda) = \frac{\lambda^{k}}{k!}\exp(-\lambda) \approx \frac{1}{\sqrt{2\pi\lambda}}\exp\Big(-\frac{(k-\lambda)^2}{2\lambda}\Big)

Page 86: Introduction to Mathematical Probability

86

SOLO Review of Probability

Poisson Distribution

Probability Density Function:

p(k; \lambda) = \frac{\lambda^{k}}{k!}\exp(-\lambda), \qquad k \ \text{a nonnegative integer}

Cumulative Distribution Function:

P(K; \lambda) = \sum_{i=0}^{K}\frac{\lambda^{i}}{i!}e^{-\lambda} = 1 - \frac{\gamma(K+1, \lambda)}{K!}

where γ is the incomplete gamma function, \gamma(a, x) = \int_0^{x}t^{a-1}\exp(-t)\,dt.

Mean Value: E(x) = λ

Variance: Var(x) = λ

Moment Generating Function: \Phi(\omega) = \exp\big[\lambda\,(e^{j\omega} - 1)\big]

Siméon Denis Poisson 1781-1840

Table of Content

Page 87: Introduction to Mathematical Probability

87

SOLO Review of Probability

Normal (Gaussian) Distribution

Probability Density Function:

p(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big)

Cumulative Distribution Function:

P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x}\exp\Big(-\frac{(u-\mu)^2}{2\sigma^2}\Big)du

Mean Value: E(x) = μ

Variance: Var(x) = σ²

Moment Generating Function:

\Phi(\omega) = E\big[\exp(j\omega x)\big] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty}\exp(j\omega u)\exp\Big(-\frac{(u-\mu)^2}{2\sigma^2}\Big)du = \exp\Big(j\omega\mu - \frac{\omega^2\sigma^2}{2}\Big)

Karl Friedrich Gauss 1777-1855

Distribution Examples

Table of Content

Page 88: Introduction to Mathematical Probability

88

SOLO Review of Probability

De Moivre-Laplace Asymptotical Development

Start with the Binomial Distribution

p(k, n) = \frac{n!}{k!\,(n-k)!}p^{k}q^{n-k}, \qquad q = 1 - p

Use the Stirling asymptotic approximation n! \approx \sqrt{2\pi n}\,n^{n}e^{-n}:

p(k, n) \approx \frac{\sqrt{2\pi n}\,n^{n}e^{-n}}{\sqrt{2\pi k}\,k^{k}e^{-k}\,\sqrt{2\pi(n-k)}\,(n-k)^{n-k}e^{-(n-k)}}\,p^{k}q^{n-k} = \sqrt{\frac{n}{2\pi k(n-k)}}\,\Big(\frac{np}{k}\Big)^{k}\Big(\frac{nq}{n-k}\Big)^{n-k}

Define k_0 := np and the deviation δ_k := k - k_0 = k - np, so that n - k = nq - δ_k.

(One also checks from the ratio p(k+1, n)/p(k, n) = \frac{(n-k)\,p}{(k+1)\,q} that p(k, n) has its maximum near k = np; the development continues on the next slide.)

Page 89: Introduction to Mathematical Probability

89

SOLO Review of Probability

De Moivre-Laplace Asymptotical Development (continue – 1)

p(k, n) \approx \sqrt{\frac{n}{2\pi k(n-k)}}\,\Big(\frac{np}{k}\Big)^{k}\Big(\frac{nq}{n-k}\Big)^{n-k}

With k_0 = np, δ_k = k - k_0 (so k = np + δ_k and n - k = nq - δ_k) and σ² := npq:

\sqrt{\frac{n}{2\pi k(n-k)}} = \frac{1}{\sqrt{2\pi npq}}\,\frac{1}{\sqrt{\big(1 + \frac{\delta_k}{np}\big)\big(1 - \frac{\delta_k}{nq}\big)}} \approx \frac{1}{\sqrt{2\pi npq}} = \frac{1}{\sqrt{2\pi}\,\sigma}

p(k, n) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\,\Big(1 + \frac{\delta_k}{np}\Big)^{-(np+\delta_k)}\Big(1 - \frac{\delta_k}{nq}\Big)^{-(nq-\delta_k)}

Page 90: Introduction to Mathematical Probability

90

SOLO Review of Probability

De Moivre-Laplace Asymptotical Development (continue – 2)

p(k, n) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\,\Big(1 + \frac{\delta_k}{np}\Big)^{-(np+\delta_k)}\Big(1 - \frac{\delta_k}{nq}\Big)^{-(nq-\delta_k)}

Take the logarithm and use \ln(1+x) \approx x - x^2/2 for |x| ≪ 1:

\ln\big[\sqrt{2\pi}\,\sigma\,p(k, n)\big] = -(np+\delta_k)\ln\Big(1 + \frac{\delta_k}{np}\Big) - (nq-\delta_k)\ln\Big(1 - \frac{\delta_k}{nq}\Big)
\approx -(np+\delta_k)\Big(\frac{\delta_k}{np} - \frac{\delta_k^2}{2n^2p^2}\Big) - (nq-\delta_k)\Big(-\frac{\delta_k}{nq} - \frac{\delta_k^2}{2n^2q^2}\Big) \approx -\frac{\delta_k^2}{2npq} = -\frac{\delta_k^2}{2\sigma^2}

from which

p(k, n) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{(k-np)^2}{2\sigma^2}\Big), \qquad \sigma^2 = npq

This result was first published by De Moivre in 1756 in "The Doctrine of Chance", 3rd Ed., and reviewed by Laplace, "Théorie Analytique des Probabilités", 1820.

Abraham de Moivre (1667-1754), Pierre-Simon Laplace (1749-1827)

Distribution Examples

Central Limit Theorem

Page 91: Introduction to Mathematical Probability

91

SOLO Review of Probability

De Moivre-Laplace Asymptotical Development for Generalized Bernoulli Trials

Consider the r mutually exclusive events A_1, A_2, …, A_r, A_i ∩ A_j = ∅ for i ≠ j, with their sum equal to the certain event S: A_1 ∪ A_2 ∪ … ∪ A_r = S, and probabilities of occurrence p(A_1) = p_1, …, p(A_r) = p_r with p_1 + p_2 + \cdots + p_r = 1.

The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_r k_r times, with k_1 + k_2 + \cdots + k_r = n, is

p(k_1, k_2, \dots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1}p_2^{k_2}\cdots p_r^{k_r}

For n going to infinity, with the deviations k_i - np_i of order \sqrt{np_i}, we have

p(k_1, \dots, k_r; n) \approx \frac{1}{\sqrt{(2\pi n)^{r-1}\,p_1 p_2\cdots p_r}}\exp\Big[-\frac{(k_1 - np_1)^2}{2np_1} - \cdots - \frac{(k_r - np_r)^2}{2np_r}\Big]

Page 92: Introduction to Mathematical Probability

92

SOLO Review of Probability

De Moivre-Laplace Asymptotical Development for Generalized Poisson Trials

Consider the r - 1 mutually exclusive events A_1, A_2, …, A_{r-1}, A_i ∩ A_j = ∅ for i ≠ j, with small probabilities of occurrence p(A_1) = p_1 ≪ 1, …, p(A_{r-1}) = p_{r-1} ≪ 1, such that

p(A_1) + \cdots + p(A_{r-1}) = p_1 + \cdots + p_{r-1} =: 1 - p_r \ll 1

The probability that in n trials we obtain A_1 k_1 times, A_2 k_2 times, and so on, and A_{r-1} k_{r-1} times, with k_r := n - (k_1 + \cdots + k_{r-1}), is

p(k_1, k_2, \dots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\,p_1^{k_1}p_2^{k_2}\cdots p_r^{k_r}

For n going to infinity (with the n p_i fixed),

p(k_1, \dots, k_{r-1}; n) \approx \frac{(np_1)^{k_1}}{k_1!}\exp(-np_1)\cdots\frac{(np_{r-1})^{k_{r-1}}}{k_{r-1}!}\exp(-np_{r-1})

Table of Content

Page 93: Introduction to Mathematical Probability

93

SOLO Review of Probability

Laplacian Distribution

Probability Density Function:

p(x; \mu, b) = \frac{1}{2b}\exp\Big(-\frac{|x-\mu|}{b}\Big)

Cumulative Distribution Function:

P(x; \mu, b) = \frac{1}{2b}\int_{-\infty}^{x}\exp\Big(-\frac{|u-\mu|}{b}\Big)du

Mean Value: E(x) = μ

Variance: Var(x) = 2b²

Moment Generating Function:

\Phi_X(\omega) = E\big[\exp(j\omega x)\big] = \frac{1}{2b}\int_{-\infty}^{+\infty}\exp(j\omega u)\exp\Big(-\frac{|u-\mu|}{b}\Big)du = \frac{\exp(j\omega\mu)}{1 + b^2\omega^2}

Pierre-Simon Laplace (1749-1827)

Distribution Examples

Table of Content

Page 94: Introduction to Mathematical Probability

94

SOLO Review of Probability

Gamma Distribution

Probability Density Function:

p(x; k, \theta) = \frac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^{k}} \ \text{for } x > 0; \qquad 0 \ \text{for } x \le 0

Cumulative Distribution Function:

P(x; k, \theta) = \frac{\gamma(k, x/\theta)}{\Gamma(k)} \ \text{for } x > 0; \qquad 0 \ \text{for } x \le 0

Mean Value: E(x) = kθ

Variance: Var(x) = kθ²

Moment Generating Function:

\Phi_X(\omega) = E\big[\exp(j\omega x)\big] = \big(1 - j\omega\theta\big)^{-k}

where Γ is the gamma function, \Gamma(a) = \int_0^{\infty}t^{a-1}\exp(-t)\,dt, and γ is the incomplete gamma function, \gamma(a, x) = \int_0^{x}t^{a-1}\exp(-t)\,dt.

Distribution Examples

Table of Content

Page 95: Introduction to Mathematical Probability

95

SOLO Review of Probability

Beta Distribution

Probability Density Function:

p(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du}, \qquad 0 \le x \le 1

Cumulative Distribution Function:

P(x; \alpha, \beta) = \frac{\int_0^{x}u^{\alpha-1}(1-u)^{\beta-1}\,du}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du}

Mean Value: E(x) = \frac{\alpha}{\alpha+\beta}

Variance: Var(x) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}

Moment Generating Function:

\Phi_X(\omega) = E\big[\exp(j\omega x)\big] = 1 + \sum_{k=1}^{\infty}\Big(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\Big)\frac{(j\omega)^{k}}{k!}

where Γ is the gamma function, \Gamma(a) = \int_0^{\infty}t^{a-1}\exp(-t)\,dt.

Distribution Examples

Beta Distribution Example

Table of Content

Page 96: Introduction to Mathematical Probability

96

SOLO Review of Probability

Cauchy Distribution

Probability Density Function:

p(x; x_0, \gamma) = \frac{1}{\pi}\,\frac{\gamma}{(x-x_0)^2 + \gamma^2} = \frac{1}{\pi\gamma}\,\frac{1}{1 + \big(\frac{x-x_0}{\gamma}\big)^2}

Cumulative Distribution Function:

P(x; x_0, \gamma) = \frac{1}{\pi}\arctan\Big(\frac{x-x_0}{\gamma}\Big) + \frac{1}{2}

Mean Value: not defined

Variance: not defined

Moment Generating Function: not defined

Augustin Louis Cauchy (1789-1857)

Distribution Examples

Page 97: Introduction to Mathematical Probability

97

SOLO Review of Probability

Cauchy Distribution

Example of Cauchy Distribution Derivation

Assume a particle leaves the origin, moving with constant velocity toward a wall situated at a distance a from the origin. The angle θ between the particle velocity vector and the Ox axis is a random variable uniformly distributed between -θ_1 and +θ_1. Find the probability distribution function of y, the distance from the Ox axis at which the particle hits the wall: y = a tan θ.

p_\Theta(\theta) = \frac{1}{2\theta_1} \ \text{for } -\theta_1 \le \theta \le \theta_1; \qquad 0 \ \text{elsewhere}

Using the Fundamental Theorem for Functions of One Random Variable, with \frac{dy}{d\theta} = \frac{a}{\cos^2\theta} = \frac{a^2 + y^2}{a}:

p_Y(y) = \frac{p_\Theta(\theta)}{|dy/d\theta|} = \frac{1}{2\theta_1}\,\frac{a}{a^2 + y^2} \ \text{for } |y| \le a\tan\theta_1; \qquad 0 \ \text{elsewhere}

Therefore we obtain a (truncated) Cauchy-type distribution; for θ_1 = π/2 it becomes the Cauchy distribution p_Y(y) = \frac{1}{\pi}\,\frac{a}{a^2 + y^2}.

Functions of One Random Variable

Table of Content

Page 98: Introduction to Mathematical Probability

98

SOLO Review of Probability

Exponential Distribution

Probability Density Function:

p(x; \lambda) = \lambda\exp(-\lambda x) \ \text{for } x \ge 0; \qquad 0 \ \text{for } x < 0

Cumulative Distribution Function:

P(x; \lambda) = \int_0^{x}\lambda\exp(-\lambda u)\,du = 1 - \exp(-\lambda x) \ \text{for } x \ge 0; \qquad 0 \ \text{for } x < 0

Mean Value (integrating by parts):

E(x) = \int_0^{\infty}x\,\lambda\exp(-\lambda x)\,dx = \big[-x\exp(-\lambda x)\big]_0^{\infty} + \int_0^{\infty}\exp(-\lambda x)\,dx = \frac{1}{\lambda}

Second Moment and Variance:

E(x^2) = -\frac{d^2\Phi_X(\omega)}{d\omega^2}\Big|_{\omega=0} = \frac{2}{\lambda^2}, \qquad \mathrm{Var}(x) = E(x^2) - \big[E(x)\big]^2 = \frac{1}{\lambda^2}

Moment Generating Function:

\Phi_X(\omega) = E\big[\exp(j\omega x)\big] = \int_0^{\infty}\exp(j\omega x)\,\lambda\exp(-\lambda x)\,dx = \frac{\lambda}{\lambda - j\omega}

Distribution Examples

Table of Content

Page 99: Introduction to Mathematical Probability

99

SOLO Review of Probability

Chi-square Distribution

Probability Density Function:

p(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}\exp(-x/2) \ \text{for } x > 0; \qquad 0 \ \text{for } x \le 0

Cumulative Distribution Function:

P(x; k) = \frac{\gamma(k/2,\; x/2)}{\Gamma(k/2)} \ \text{for } x > 0; \qquad 0 \ \text{for } x \le 0

Mean Value: E(x) = k

Variance: Var(x) = 2k

Moment Generating Function:

\Phi_X(\omega) = E\big[\exp(j\omega x)\big] = (1 - 2j\omega)^{-k/2}

where Γ is the gamma function, \Gamma(a) = \int_0^{\infty}t^{a-1}\exp(-t)\,dt, and γ is the incomplete gamma function, \gamma(a, x) = \int_0^{x}t^{a-1}\exp(-t)\,dt.

Distribution Examples

Page 100: Introduction to Mathematical Probability

100

SOLO Review of Probability

Derivation of Chi and Chi-square Distributions

Given k normal random independent variables X1, X2,…, Xk with zero mean values and the same variance σ², their joint density is given by

p(x_1,\dots,x_k) = \prod_{i=1}^{k}\frac{1}{(2\pi)^{1/2}\sigma}\exp\left(-\frac{x_i^2}{2\sigma^2}\right) = \frac{1}{(2\pi)^{k/2}\sigma^{k}}\exp\left(-\frac{x_1^2+\cdots+x_k^2}{2\sigma^2}\right)

Define

Chi-square: y = \chi_k^2 := x_1^2+\cdots+x_k^2 \ge 0
Chi: \chi_k := \sqrt{x_1^2+\cdots+x_k^2} \ge 0

\Pr\{\chi \le \chi_k \le \chi + d\chi\} = p_{X_k}(\chi_k)\,d\chi_k

The region in χk space, where p(x_1,…,x_k) is constant, is a hyper-shell of a volume

dV = A\,\chi_k^{k-1}\,d\chi_k \qquad (A\ \text{to be defined})

\Pr\{\chi \le \chi_k \le \chi + d\chi\} = \frac{1}{(2\pi)^{k/2}\sigma^{k}}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right) A\,\chi_k^{k-1}\,d\chi_k

p_{X_k}(\chi_k) = \frac{A}{(2\pi)^{k/2}\sigma^{k}}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)

[Figure: for k = 3, the axes x1, x2, x3 and the spherical shell of volume dV = 4\pi\chi^2\,d\chi.]

Page 101: Introduction to Mathematical Probability

101

SOLO Review of Probability

Derivation of Chi and Chi-square Distributions (continue – 1)

p_{X_k}(\chi_k) = \frac{A}{(2\pi)^{k/2}\sigma^{k}}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)

Chi-square: y = \chi_k^2 := x_1^2+\cdots+x_k^2 \ge 0

Using the rule for a Function of One Random Variable, with \chi_k = \sqrt{y} and \left|\dfrac{d\chi_k}{dy}\right| = \dfrac{1}{2\sqrt{y}}:

p_Y(y) = p_{X_k}(\chi_k)\left|\frac{d\chi_k}{dy}\right| = \begin{cases}\dfrac{A}{2\,(2\pi)^{k/2}\sigma^{k}}\,y^{k/2-1}\exp\left(-\dfrac{y}{2\sigma^2}\right) & y \ge 0\\ 0 & y < 0\end{cases}

A is determined from the condition \int_{-\infty}^{\infty} p_Y(y)\,dy = 1:

\int_0^{\infty} y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)dy = (2\sigma^2)^{k/2}\,\Gamma(k/2) \quad\Rightarrow\quad A = \frac{2\,\pi^{k/2}}{\Gamma(k/2)}

p_Y(y; k, \sigma) = \frac{1}{2^{k/2}\,\sigma^{k}\,\Gamma(k/2)}\,y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right)U(y) \qquad\text{(Chi-square)}

p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\,\sigma^{k}\,\Gamma(k/2)}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k) \qquad\text{(Chi)}

Γ is the gamma function: \Gamma(a) = \int_0^{\infty} t^{a-1}\exp(-t)\,dt

U(a) := \begin{cases}1 & a \ge 0\\ 0 & a < 0\end{cases}

Page 102: Introduction to Mathematical Probability

102

SOLO Review of Probability

Derivation of Chi and Chi-square Distributions (continue – 2)

Table of Content

Chi-square: y = \chi_k^2 := x_1^2+\cdots+x_k^2 \ge 0, where the xi are Gaussian with zero mean and variance σ² (Gauss' Distribution), so that E\{x_i\} = 0,\ E\{x_i^2\} = \sigma^2,\ E\{x_i^3\} = 0,\ E\{x_i^4\} = 3\sigma^4 (moments of a Gauss Distribution).

Mean Value:
E\{\chi_k^2\} = E\{x_1^2\}+\cdots+E\{x_k^2\} = k\,\sigma^2

Second moment (the k main-diagonal terms give E\{x_i^4\}; the k²−k off-diagonal terms give E\{x_i^2\}E\{x_j^2\}):

E\{(\chi_k^2)^2\} = E\left\{\left(\sum_{i=1}^{k} x_i^2\right)^2\right\} = \sum_{i=1}^{k} E\{x_i^4\} + \sum_{i\ne j} E\{x_i^2\}\,E\{x_j^2\} = 3k\,\sigma^4 + k(k-1)\,\sigma^4

Variance:
Var\{\chi_k^2\} = E\{(\chi_k^2)^2\} - \left(E\{\chi_k^2\}\right)^2 = 3k\,\sigma^4 + k(k-1)\,\sigma^4 - k^2\sigma^4 = 2k\,\sigma^4

Page 103: Introduction to Mathematical Probability

103

SOLO Review of Probability

Derivation of Chi and Chi-square Distributions (continue – 3)

Tail probabilities of the chi-square and normal densities.

The Table presents the points on the chi-square distribution for a given upper tail probability

Q = \Pr\{y > x\}

where y = \chi_n^2 and n is the number of degrees of freedom. This tabulated function is also known as the complementary distribution.

An alternative way of writing the previous equation is \Pr\{y = \chi_n^2 \le x\} = 1 - Q, which indicates that at the left of the point x the probability mass is 1 – Q. This is the 100 (1 – Q) percentile point.

Examples

1. The 95 % probability region for a \chi_2^2 variable can be taken as the one-sided probability region (cutting off the 5 % upper tail): \Pr\{0 \le \chi_2^2 \le 5.99\} = 0.95.

2. Or the two-sided probability region (cutting off both 2.5 % tails): \Pr\{0.05 \le \chi_2^2 \le 7.38\} = 0.95.

3. For a \chi_{100}^2 variable, the two-sided 95 % probability region (cutting off both 2.5 % tails) is: \Pr\{74 \le \chi_{100}^2 \le 130\} = 0.95.
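These tabulated points can be reproduced numerically; the following sketch assumes SciPy is available and uses its chi-square percent-point function (the slide rounds the printed values).

```python
from scipy.stats import chi2

# One-sided 95 % region for 2 degrees of freedom: upper 5 % tail cut off
print(chi2.ppf(0.95, df=2))                              # ~5.99

# Two-sided 95 % region for 2 degrees of freedom (2.5 % cut off in each tail)
print(chi2.ppf(0.025, df=2), chi2.ppf(0.975, df=2))      # ~0.051, ~7.38

# Two-sided 95 % region for 100 degrees of freedom
print(chi2.ppf(0.025, df=100), chi2.ppf(0.975, df=100))  # ~74.2, ~129.6
```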

Page 104: Introduction to Mathematical Probability

104

SOLO Review of Probability

Derivation of Chi and Chi-square Distributions (continue – 4)

Note the skewness of the chi-square distribution: the above two-sided regions are not symmetric about the corresponding means

E\{\chi_n^2\} = n

Tail probabilities of the chi-square and normal densities.

For degrees of freedom above 100, the following approximation of the points on the chi-square distribution can be used:

\chi_n^2(1-Q) \approx \frac{1}{2}\left[G(1-Q) + \sqrt{2n-1}\right]^2

where G( ) is given in the last line of the Table and shows the point x on the standard (zero-mean and unity-variance) Gaussian distribution for the same tail probabilities. In the case Pr{y} = N(y; 0, 1) and with Q = Pr{y > x}, we have x(1−Q) := G(1−Q).

Table of Content

Page 105: Introduction to Mathematical Probability

105

SOLO Review of Probability

Student's t-Distribution

Probability Density Function (ν degrees of freedom):
p(x; \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}

Cumulative Distribution Function:
P(x; \nu) = \frac{1}{2} + x\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)}\sum_{n=0}^{\infty}\frac{\left(\frac{1}{2}\right)^{(n)}\left(\frac{\nu+1}{2}\right)^{(n)}}{\left(\frac{3}{2}\right)^{(n)}\,n!}\left(-\frac{x^2}{\nu}\right)^{n}

where a^{(n)} := a\,(a+1)\cdots(a+n-1) and Γ is the gamma function: \Gamma(a) = \int_0^{\infty} t^{a-1}\exp(-t)\,dt

Mean Value:
E\{x\} = 0 for \nu > 1, otherwise undefined

Variance:
Var\{x\} = \dfrac{\nu}{\nu-2} for \nu > 2, otherwise undefined

Moment Generating Function: not defined

It gets its name after W.S. Gosset, who wrote under the pseudonym "Student".

William Sealey Gosset (1876 - 1937)

Distribution Examples

Table of Content

Page 106: Introduction to Mathematical Probability

106

SOLO Review of Probability

Uniform Distribution (Continuous)

Probability Density Function:
p(x; a, b) = \begin{cases}\dfrac{1}{b-a} & a \le x \le b\\ 0 & x < a\ \text{or}\ x > b\end{cases}

Cumulative Distribution Function:
P(x; a, b) = \begin{cases}0 & x < a\\ \dfrac{x-a}{b-a} & a \le x < b\\ 1 & x \ge b\end{cases}

Mean Value:
E\{x\} = \frac{a+b}{2}

Variance:
Var\{x\} = \frac{(b-a)^2}{12}

Moment Generating Function:
E\{\exp(j\omega x)\} = \frac{\exp(j\omega b)-\exp(j\omega a)}{j\omega\,(b-a)}

Distribution Examples

Moments

Table of Content

Page 107: Introduction to Mathematical Probability

107

SOLO Review of Probability

Rayleigh Distribution

Probability Density Function:
p(x; \sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0

Cumulative Distribution Function:
P(x; \sigma) = 1-\exp\left(-\frac{x^2}{2\sigma^2}\right)

Mean Value:
E\{x\} = \sigma\sqrt{\frac{\pi}{2}}

Variance:
Var\{x\} = \frac{4-\pi}{2}\,\sigma^2

Moment Generating Function:
E\{\exp(j\omega x)\} = 1-\sigma\omega\exp\left(-\frac{\sigma^2\omega^2}{2}\right)\sqrt{\frac{\pi}{2}}\left[\operatorname{erfi}\left(\frac{\sigma\omega}{\sqrt{2}}\right)-j\right]

John William Strutt, Lord Rayleigh (1842-1919)

Distribution Examples

Moments

Rayleigh Distribution is the chi-distribution with k = 2:

p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\,\sigma^{k}\,\Gamma(k/2)}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)

Page 108: Introduction to Mathematical Probability

108

SOLO Review of Probability

Rayleigh Distribution

Example of Rayleigh Distribution

Given X and Y, two independent Gaussian random variables, with zero means and the same variances σ²,

p_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)

find the distributions of R and Θ given by: R = \sqrt{X^2+Y^2}\ \ \&\ \ \Theta = \tan^{-1}(Y/X)

Solution

With x = r\cos\theta,\ y = r\sin\theta and dx\,dy = r\,dr\,d\theta:

p_{R\Theta}(r,\theta)\,dr\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right) r\,dr\,d\theta

where:

p_\Theta(\theta) = \frac{1}{2\pi}, \qquad 0 \le \theta \le 2\pi \qquad\text{(Uniform Distribution)}

p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right), \qquad r \ge 0 \qquad\text{(Rayleigh Distribution)}

Table of Content
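A minimal simulation sketch of this result (assuming NumPy; the sample size and σ are illustrative): generate pairs of zero-mean Gaussians, form R = √(X² + Y²), and compare the sample moments with the Rayleigh values.

```python
import numpy as np

sigma = 2.0
N = 200_000
rng = np.random.default_rng(1)

x = rng.normal(0.0, sigma, size=N)
y = rng.normal(0.0, sigma, size=N)
r = np.hypot(x, y)                      # R = sqrt(X^2 + Y^2)

print("sample mean of R:", r.mean())
print("Rayleigh mean sigma*sqrt(pi/2):", sigma * np.sqrt(np.pi / 2))
print("sample variance of R:", r.var())
print("Rayleigh variance (4-pi)/2*sigma^2:", (4 - np.pi) / 2 * sigma**2)
```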

Page 109: Introduction to Mathematical Probability

109

SOLO Review of Probability

Rice Distribution

Probability Density Function:
p(x; \nu, \sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\left(\frac{x\,\nu}{\sigma^2}\right), \qquad x \ge 0

where I_0 is the zero-order modified Bessel function of the first kind:
I_0\left(\frac{x\,\nu}{\sigma^2}\right) = \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{x\,\nu}{\sigma^2}\cos\theta'\right) d\theta'

Cumulative Distribution Function:
P(x; \nu, \sigma) = 1 - Q_1\left(\frac{\nu}{\sigma}, \frac{x}{\sigma}\right), \qquad Q_1\ \text{being the Marcum Q-function}

Mean Value:
E\{x\} = \sigma\sqrt{\frac{\pi}{2}}\;L_{1/2}\!\left(-\frac{\nu^2}{2\sigma^2}\right)

Variance:
Var\{x\} = 2\sigma^2 + \nu^2 - \frac{\pi\sigma^2}{2}\,L_{1/2}^2\!\left(-\frac{\nu^2}{2\sigma^2}\right)

where L_{1/2} is the Laguerre polynomial of order 1/2. For ν = 0 these reduce to the Rayleigh values \sigma\sqrt{\pi/2} and \frac{4-\pi}{2}\sigma^2.

Stephen O. Rice (1907 - 1986)

Distribution Examples

Page 110: Introduction to Mathematical Probability

110

SOLO Review of Probability

Rice Distribution

The Rice Distribution applies to the statistics of the envelope of the output of a bandpass filter consisting of signal plus noise.

Example of Rice Distribution

s(t) = A\cos(\omega_0 t+\varphi) + n(t) = \left[A\cos\varphi + n_C(t)\right]\cos\omega_0 t - \left[A\sin\varphi + n_S(t)\right]\sin\omega_0 t

X = nC(t) and Y = nS(t) are Gaussian random variables, with zero mean and the same variances σ², and φ is the unknown but constant signal phase.

Define the output envelope R and phase Θ:

R = \sqrt{\left(A\cos\varphi+n_C\right)^2 + \left(A\sin\varphi+n_S\right)^2}, \qquad \Theta = \tan^{-1}\frac{A\sin\varphi+n_S}{A\cos\varphi+n_C}

Solution

With r\cos\theta = A\cos\varphi + x,\ r\sin\theta = A\sin\varphi + y and dx\,dy = r\,dr\,d\theta:

p_{R\Theta}(r,\theta)\,dr\,d\theta = p_{XY}(x,y)\,dx\,dy = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r^2+A^2-2Ar\cos(\theta-\varphi)}{2\sigma^2}\right) r\,dr\,d\theta

p_R(r) = \int_0^{2\pi} p_{R\Theta}(r,\theta)\,d\theta = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{Ar}{\sigma^2}\cos\theta'\right)d\theta'

Page 111: Introduction to Mathematical Probability

111

SOLO Review of Probability

Rice Distribution

Example of Rice Distribution (continue – 1)

p_R(r) = \int_0^{2\pi} p_{R\Theta}(r,\theta)\,d\theta = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{Ar}{\sigma^2}\cos\theta'\right)d\theta'

where:

I_0\left(\frac{Ar}{\sigma^2}\right) = \frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{Ar}{\sigma^2}\cos\theta'\right)d\theta'

is the zero-order modified Bessel function of the first kind, so that

p_R(r; A, \sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right) I_0\left(\frac{Ar}{\sigma^2}\right) \qquad\text{Rice Distribution}

Since I_0(0) = 1, if in the Rice Distribution we take A = 0 we obtain:

p_R(r; 0, \sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right) \qquad\text{Rayleigh Distribution}

Table of Content

Page 112: Introduction to Mathematical Probability

112

SOLO Review of Probability

Weibull Distribution

Probability Density Function:
p(x; \lambda, k) = \begin{cases}\dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1}\exp\left[-\left(\dfrac{x}{\lambda}\right)^{k}\right] & x \ge 0\\ 0 & x < 0\end{cases}

Cumulative Distribution Function:
P(x; \lambda, k) = \int_{-\infty}^{x} p(u; \lambda, k)\,du = 1-\exp\left[-\left(\frac{x}{\lambda}\right)^{k}\right]

Mean Value:
E\{x\} = \lambda\,\Gamma\left(1+\frac{1}{k}\right)

Variance:
Var\{x\} = \lambda^2\,\Gamma\left(1+\frac{2}{k}\right) - E^2\{x\}

Γ is the gamma function: \Gamma(a) = \int_0^{\infty} t^{a-1}\exp(-t)\,dt

Ernst Hjalmar Waloddi Weibull (1887 - 1979)

Distribution Examples

Table of Content

Page 113: Introduction to Mathematical Probability

113

SOLO KINETIC THEORY OF GASES

MAXWELL'S VELOCITY DISTRIBUTION

IN 1859 MAXWELL PROPOSED THE FOLLOWING MODEL:

ASSUME THAT THE VELOCITY COMPONENTS OF N MOLECULES, ENCLOSED IN A CUBE WITH SIDE l, ALONG EACH OF THE THREE COORDINATE AXES ARE INDEPENDENTLY AND IDENTICALLY DISTRIBUTED ACCORDING TO THE DENSITY f0(α) = f0(-α), I.E.,

JAMES CLERK MAXWELL (1831 – 1879)

f_0(\vec v)\,d^3 v = f_0(v_x)\,f_0(v_y)\,f_0(v_z)\,dv_x\,dv_y\,dv_z = A\exp\left[-B\left(v_x^2+v_y^2+v_z^2\right)\right]dv_x\,dv_y\,dv_z

f (vi) d vi = THE PROBABILITY THAT THE i VELOCITY COMPONENT IS BETWEEN vi AND vi + d vi ; i = x, y, z. MAXWELL ASSUMED THAT THE DISTRIBUTION DEPENDS ONLY ON THE MAGNITUDE OF THE VELOCITY.

(1) Maxwell J.C., "Illustration of the dynamical theory of gases", Phil. Mag. Ser. 4, 19, 19; 20, 21, 23, 1860
(2) Maxwell J.C., "On the dynamical theory of gases", Phil. Trans. Roy. Soc. (London), 157, 49, 1867
(3) Maxwell J.C., "On the dynamical evidence of the molecular constitution of bodies", Nature 11, 53, 1874
Page 114: Introduction to Mathematical Probability

114

SOLO KINETIC THEORY OF GASES

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

SINCE THE DEFINITION OF THE TOTAL NUMBER OF PARTICLES N IS:

N = \int d^3 r\int d^3 v\, f(\vec r, \vec v, t)

WE HAVE IN EQUILIBRIUM

N = \int_V d^3 r\int d^3 v\, f_0(\vec v) = V\,A\int_{-\infty}^{\infty}\exp(-B v_x^2)\,dv_x\int_{-\infty}^{\infty}\exp(-B v_y^2)\,dv_y\int_{-\infty}^{\infty}\exp(-B v_z^2)\,dv_z = V\,A\left(\frac{\pi}{B}\right)^{3/2}

WHERE V IS THE VOLUME OF THE CONTAINER V = \int d^3 r

IT FOLLOWS THAT B > 0 AND

A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2}

LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp\left[-B\left(\vec v-\vec v_0\right)^2\right]

Page 115: Introduction to Mathematical Probability

115

SOLO KINETIC THEORY OF GASES

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

LET US FIND THE CONSTANTS A, B AND \vec v_0 IN f_0(\vec v) = A\exp\left[-B\left(\vec v-\vec v_0\right)^2\right]

THE AVERAGE VELOCITY IS GIVEN BY:

\langle\vec v\rangle = \frac{\int d^3 v\,\vec v\, f_0(\vec v)}{\int d^3 v\, f_0(\vec v)} = \frac{A\,V}{N}\int d^3 v\,\left(\vec v-\vec v_0\right)\exp\left[-B\left(\vec v-\vec v_0\right)^2\right] + \vec v_0\,\frac{A\,V}{N}\int d^3 v\,\exp\left[-B\left(\vec v-\vec v_0\right)^2\right] = \vec v_0

THE AVERAGE KINEMATIC ENERGY OF THE MOLECULES ε WHEN \vec v_0 = 0 IS

\varepsilon = \frac{\int d^3 v\,\frac{1}{2}m v^2\, f_0(\vec v)}{\int d^3 v\, f_0(\vec v)} = \frac{m}{2}\,\frac{A\,V}{N}\int d^3 v\, v^2\exp\left(-B v^2\right) = \frac{3\,m}{4\,B}

WE FOUND ALSO THAT FOR A MONOATOMIC GAS \varepsilon = \frac{3}{2}\,k\,T

THEREFORE

B = \frac{m}{2\,k\,T}, \qquad A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2} = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}

Page 116: Introduction to Mathematical Probability

116

SOLO KINETIC THEORY OF GASES

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

MAXWELL VELOCITY DISTRIBUTION BECOMES

f_0(\vec v) = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}\exp\left(-\frac{m\,\vec v\cdot\vec v}{2\,k\,T}\right)

OR

f_0(\vec v)\,d^3 v = f_0(v_x)\,f_0(v_y)\,f_0(v_z)\,dv_x\,dv_y\,dv_z = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}\exp\left(-\frac{m\left(v_x^2+v_y^2+v_z^2\right)}{2\,k\,T}\right)dv_x\,dv_y\,dv_z

Page 117: Introduction to Mathematical Probability

117

SOLO KINETIC THEORY OF GASES

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

f_0(\vec v) = \frac{N}{V}\left(\frac{M}{2\pi\,k\,T}\right)^{3/2}\exp\left(-\frac{M\,\vec v\cdot\vec v}{2\,k\,T}\right), \qquad \frac{1}{2}\,M\,\langle v^2\rangle = \frac{3}{2}\,k\,T

THE DISTRIBUTION OF THE SPEED v = |\vec v| IS

f_0(v) = \frac{N}{V}\,4\pi v^2\left(\frac{M}{2\pi\,k\,T}\right)^{3/2}\exp\left(-\frac{M v^2}{2\,k\,T}\right)

Most probable speed: v_{mp} = \sqrt{\frac{2\,k\,T}{M}}

Mean speed: \langle v\rangle = \sqrt{\frac{8\,k\,T}{\pi M}}

Root Mean squared speed: v_{rms} = \sqrt{\frac{3\,k\,T}{M}}

Maxwell's Distribution is the chi-distribution with k = 3:

p_{X_k}(\chi_k) = \frac{1}{2^{k/2-1}\,\sigma^{k}\,\Gamma(k/2)}\,\chi_k^{k-1}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)U(\chi_k)

Table of Content

Page 118: Introduction to Mathematical Probability

118

SOLO KINETIC THEORY OF GASES

MOLECULAR MODELS

BOLTZMANN STATISTICS
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:
w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}

BOSE-EINSTEIN STATISTICS
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE.
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:
w_{B-E} = \prod_j \frac{(N_j+g_j-1)!}{N_j!\,(g_j-1)!}

FERMI-DIRAC STATISTICS
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.
NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:
w_{F-D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}

WITH THE CONSTRAINTS

N = \sum_j N_j, \qquad E = \sum_j \varepsilon'_j\,N_j

LUDWIG BOLTZMANN
SATYENDRANATH N. BOSE, ALBERT EINSTEIN
ENRICO FERMI, PAUL A.M. DIRAC

Table of Content

Page 119: Introduction to Mathematical Probability

119

SOLO KINETIC THEORY OF GASES

MOLECULAR MODELS

BOLTZMANN STATISTICS
• DISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE.

LUDWIG BOLTZMANN

NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:
w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}

MASS (N) FIXED: N = \sum_j N_j
VOLUME (V) FIXED
ENERGY (E) FIXED: E = \sum_j \varepsilon'_j\,N_j

A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \dots, \varepsilon'_j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj

NUMBER OF WAYS N DISTINGUISHABLE PARTICLES CAN BE DIVIDED IN GROUPS WITH N1, N2, …, Nj, … PARTICLES IS

\frac{N!}{\prod_j N_j!}, \qquad N = \sum_j N_j

NUMBER OF WAYS Nj PARTICLES CAN BE PLACED IN THE gj STATES IS g_j^{N_j}

Page 120: Introduction to Mathematical Probability

120

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOLTZMANN STATISTICS

w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}

USING STIRLING'S FORMULA \ln a! \approx a\ln a - a

\ln w = \ln N! + \sum_j\left(N_j\ln g_j - \ln N_j!\right) \overset{\text{STIRLING}}{\approx} N\ln N - N + \sum_j\left(N_j\ln g_j - N_j\ln N_j + N_j\right)

TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL

d\left(\ln w\right) = \sum_j\left(\ln g_j - \ln N_j\right)dN_j = 0

CONSTRAINED BY:

dN = \sum_j dN_j = 0, \qquad dE = \sum_j \varepsilon'_j\,dN_j = 0

Page 121: Introduction to Mathematical Probability

121

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOLTZMANN STATISTICS (CONTINUE)

w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}

WE OBTAIN

d\left(\ln w\right) = \sum_j\left(\ln g_j - \ln N_j\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0

LET US ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta

\sum_j\left(\ln\frac{g_j}{N_j} - \alpha - \beta\,\varepsilon'_j\right)dN_j = 0

TO OBTAIN

\ln\frac{g_j}{N_j^*} = \alpha + \beta\,\varepsilon'_j

OR

N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}\,e^{-\beta\,\varepsilon'_j} \qquad\text{BOLTZMANN MOST PROBABLE MACROSTATE}

Table of Content

Page 122: Introduction to Mathematical Probability

122

SOLO KINETIC THEORY OF GASES

MOLECULAR MODELS

BOSE-EINSTEIN STATISTICS
• INDISTINGUISHABLE PARTICLES
• NO LIMIT ON THE NUMBER OF PARTICLES PER QUANTUM STATE.

SATYENDRANATH N. BOSE (1894-1974), ALBERT EINSTEIN (1879-1955)

MASS (N) FIXED: N = \sum_j N_j
VOLUME (V) FIXED
ENERGY (E) FIXED: E = \sum_j \varepsilon'_j\,N_j

A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \dots, \varepsilon'_j
- NUMBER OF PARTICLES N1, N2, …, Nj IN STATES g1, g2, …, gj

NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE gj STATES IS

\frac{(N_j+g_j-1)!}{N_j!\,(g_j-1)!}

NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:

w_{B-E} = \prod_j \frac{(N_j+g_j-1)!}{N_j!\,(g_j-1)!}

Page 123: Introduction to Mathematical Probability

123

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOSE-EINSTEIN STATISTICS (CONTINUE)

w_{B-E} = \prod_j \frac{(N_j+g_j-1)!}{N_j!\,(g_j-1)!} \approx \prod_j \frac{(N_j+g_j)!}{N_j!\,g_j!}

USING STIRLING'S FORMULA \ln a! \approx a\ln a - a

\ln w \overset{\text{STIRLING}}{\approx} \sum_j\left[\left(N_j+g_j\right)\ln\left(N_j+g_j\right) - N_j\ln N_j - g_j\ln g_j\right] = \sum_j\left[N_j\ln\left(\frac{g_j}{N_j}+1\right) + g_j\ln\left(\frac{N_j}{g_j}+1\right)\right]

TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL

d\left(\ln w\right) = \sum_j \ln\left(\frac{g_j}{N_j}+1\right)dN_j = 0

Page 124: Introduction to Mathematical Probability

124

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOSE-EINSTEIN STATISTICS (CONTINUE)

d\left(\ln w\right) = \sum_j \ln\left(\frac{g_j}{N_j}+1\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0

WE OBTAIN, ADJOINING THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta

\sum_j\left[\ln\left(\frac{g_j}{N_j}+1\right) - \alpha - \beta\,\varepsilon'_j\right]dN_j = 0

TO OBTAIN

\ln\left(\frac{g_j}{N_j^*}+1\right) = \alpha + \beta\,\varepsilon'_j

OR

N_j^*\big|_{B-E} = \frac{g_j}{e^{\alpha}\,e^{\beta\,\varepsilon'_j}-1} \qquad\text{BOSE-EINSTEIN MOST PROBABLE MACROSTATE}

Table of Content

Page 125: Introduction to Mathematical Probability

125

SOLO KINETIC THEORY OF GASES

MOLECULAR MODELS

FERMI-DIRAC STATISTICS
• INDISTINGUISHABLE PARTICLES
• ONE PARTICLE PER QUANTUM STATE.

ENRICO FERMI (1901-1954), PAUL A.M. DIRAC (1902-1984)

MASS (N) FIXED: N = \sum_j N_j
VOLUME (V) FIXED
ENERGY (E) FIXED: E = \sum_j \varepsilon'_j\,N_j

A MACROSTATE IS DEFINED BY
- QUANTUM STATES g1, g2, …, gj AT THE ENERGY LEVELS \varepsilon'_1, \varepsilon'_2, \dots, \varepsilon'_j
- NUMBER OF PARTICLES N1, N2, …, Nj AT THE ENERGY LEVELS IN STATES g1, g2, …, gj

NUMBER OF WAYS Nj INDISTINGUISHABLE PARTICLES CAN BE PLACED IN THE gj STATES IS

\frac{g_j!}{N_j!\,(g_j-N_j)!}

NUMBER OF MICROSTATES FOR A GIVEN MACROSTATE:

w_{F-D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}

Page 126: Introduction to Mathematical Probability

126

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

FERMI-DIRAC STATISTICS (CONTINUE)

w_{F-D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}

USING STIRLING'S FORMULA \ln a! \approx a\ln a - a

\ln w \overset{\text{STIRLING}}{\approx} \sum_j\left[g_j\ln g_j - N_j\ln N_j - \left(g_j-N_j\right)\ln\left(g_j-N_j\right)\right]

TO CALCULATE THE MOST PROBABLE MACROSTATE WE MUST COMPUTE THE DIFFERENTIAL

d\left(\ln w\right) = \sum_j\left[\ln\left(g_j-N_j\right) - \ln N_j\right]dN_j = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0

Page 127: Introduction to Mathematical Probability

127

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

FERMI-DIRAC STATISTICS (CONTINUE)

d\left(\ln w\right) = \sum_j \ln\left(\frac{g_j-N_j}{N_j}\right)dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\,dN_j = 0

WE OBTAIN, ADJOINING THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS \alpha, \beta

\sum_j\left[\ln\left(\frac{g_j-N_j}{N_j}\right) - \alpha - \beta\,\varepsilon'_j\right]dN_j = 0

TO OBTAIN

\ln\left(\frac{g_j-N_j^*}{N_j^*}\right) = \alpha + \beta\,\varepsilon'_j

OR

N_j^*\big|_{F-D} = \frac{g_j}{e^{\alpha}\,e^{\beta\,\varepsilon'_j}+1} \qquad\text{FERMI-DIRAC MOST PROBABLE MACROSTATE}

Page 128: Introduction to Mathematical Probability

128

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOLTZMANN STATISTICS: w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}

BOSE-EINSTEIN STATISTICS: w_{B-E} = \prod_j \frac{(N_j+g_j-1)!}{N_j!\,(g_j-1)!}

FERMI-DIRAC STATISTICS: w_{F-D} = \prod_j \frac{g_j!}{N_j!\,(g_j-N_j)!}

FOR GASES AT LOW PRESSURES OR HIGH TEMPERATURE THE NUMBER OF QUANTUM STATES gj AVAILABLE AT ANY LEVEL IS MUCH LARGER THAN THE NUMBER OF PARTICLES IN THAT LEVEL Nj: g_j \gg N_j

\frac{(g_j+N_j-1)!}{(g_j-1)!} = g_j\,(g_j+1)\cdots(g_j+N_j-1) \approx g_j^{N_j}, \qquad \frac{g_j!}{(g_j-N_j)!} = g_j\,(g_j-1)\cdots(g_j-N_j+1) \approx g_j^{N_j}

AND THEREFORE

w_{B-E} \approx w_{F-D} \approx \prod_j \frac{g_j^{N_j}}{N_j!} = \frac{w_{Boltz}}{N!}

N_j^*\big|_{B-E} \approx N_j^*\big|_{F-D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}\,e^{-\beta\,\varepsilon'_j}

Page 129: Introduction to Mathematical Probability

129

SOLO KINETIC THEORY OF GASES

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

w_{B-E} \approx w_{F-D} \approx \prod_j \frac{g_j^{N_j}}{N_j!} = \frac{w_{Boltz}}{N!}, \qquad N_j^*\big|_{B-E} \approx N_j^*\big|_{F-D} \approx N_j^*\big|_{Boltz} = g_j\,e^{-\alpha}\,e^{-\beta\,\varepsilon'_j}

DIVIDING THE VALUE OF w FOR BOLTZMANN STATISTICS, WHICH ASSUMED DISTINGUISHABLE PARTICLES, BY N! HAS THE EFFECT OF DISCOUNTING THE DISTINGUISHABILITY OF THE N PARTICLES.

Table of Content

Page 130: Introduction to Mathematical Probability

130

SOLO Review of Probability

Monte Carlo Method

Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.

The term Monte Carlo method was coined in the 1940s by physicists Stanislaw Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear weapon projects in the Los Alamos National Laboratory (reference to the Monte Carlo Casino in Monaco where Ulam's uncle would borrow money to gamble)

Stanislaw Ulam (1909 - 1984)

Enrico Fermi (1901 - 1954)

John von Neumann (1903 - 1957)

Nicholas Constantine Metropolis (1915 - 1999)

Monte Carlo Casino

Page 131: Introduction to Mathematical Probability

131

SOLO Review of Probability

Monte Carlo Approximation

Monte Carlo runs generate a set of random samples x^{(L)}, L = 1, \dots, P, that approximate the distribution p(x). So, with P samples, expectations with respect to the filtering distribution are approximated by

\int f(x)\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} f\!\left(x^{(L)}\right)

and, in the usual way for Monte Carlo, this can give all the moments etc. of the distribution up to some degree of approximation:

E\{x\} = \int x\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P} x^{(L)}, \qquad E\{x^n\} = \int x^n\,p(x)\,dx \approx \frac{1}{P}\sum_{L=1}^{P}\left(x^{(L)}\right)^n

x^{(L)} are generated (drawn) samples from the distribution p(x):\quad x^{(L)} \sim p(x)

Table of Content
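A small sketch of this Monte Carlo approximation (assuming NumPy; the distribution p(x) and the function f are illustrative choices, not taken from the slide): the sample average of f(x^(L)) approximates the integral ∫f(x)p(x)dx.

```python
import numpy as np

rng = np.random.default_rng(0)
P = 100_000

# Illustrative choice: p(x) is N(1, 2^2) and f(x) = x^2, so E{f(x)} = 1^2 + 2^2 = 5
samples = rng.normal(loc=1.0, scale=2.0, size=P)     # x^(L) ~ p(x)
estimate = np.mean(samples**2)                       # (1/P) sum f(x^(L))
print("Monte Carlo estimate of E{x^2}:", estimate)   # close to 5
```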

Page 132: Introduction to Mathematical Probability

132

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to compute the sample mean, \hat m_k, and sample variance, \hat\sigma_k^2, as estimates of the population mean, m, and variance, σ².

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:

E\{x_i\} = m, \qquad E\{(x_i-m)(x_j-m)\} = \begin{cases}\sigma^2 & i = j\\ 0 & i \ne j\end{cases}

Define the estimation of the population mean:

\hat m_k := \frac{1}{k}\sum_{i=1}^{k} x_i \qquad\Rightarrow\qquad E\{\hat m_k\} = \frac{1}{k}\sum_{i=1}^{k}E\{x_i\} = m \qquad\text{(Unbiased)}

Compute the expectation of the straightforward sample variance. Since

E\{(\hat m_k-m)^2\} = \frac{1}{k^2}\sum_{i=1}^{k}\sum_{j=1}^{k}E\{(x_i-m)(x_j-m)\} = \frac{\sigma^2}{k}

(only the k main-diagonal terms survive; the k²−k cross terms vanish because xi and xj are independent), we obtain

E\left\{\frac{1}{k}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2\right\} = E\left\{\frac{1}{k}\sum_{i=1}^{k}\left(x_i-m\right)^2 - \left(\hat m_k-m\right)^2\right\} = \sigma^2 - \frac{\sigma^2}{k} = \frac{k-1}{k}\,\sigma^2 \qquad\text{(Biased)}

Page 133: Introduction to Mathematical Probability

133

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 1)

Since

E\left\{\frac{1}{k}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2\right\} = \frac{k-1}{k}\,\sigma^2 \qquad\text{(Biased)}

the unbiased estimation of the sample variance of the population is defined as:

\hat\sigma_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2 \qquad\Rightarrow\qquad E\{\hat\sigma_k^2\} = \frac{1}{k-1}\,E\left\{\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2\right\} = \sigma^2 \qquad\text{(Unbiased)}

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
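The biased and unbiased sample variances differ only in the divisor; a short sketch (assuming NumPy) that contrasts the two on i.i.d. samples.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 50
x = rng.normal(loc=3.0, scale=2.0, size=k)       # i.i.d. samples, true variance 4

m_hat = x.mean()                                 # sample mean, unbiased for m
var_biased = np.mean((x - m_hat)**2)             # divisor k   -> E{.} = (k-1)/k * sigma^2
var_unbiased = np.sum((x - m_hat)**2) / (k - 1)  # divisor k-1 -> E{.} = sigma^2
print(m_hat, var_biased, var_unbiased)
print(np.var(x, ddof=1))                         # same unbiased estimate via ddof=1
```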

Page 134: Introduction to Mathematical Probability

134

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 2)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to compute the sample mean, \hat m_k, and sample variance, \hat\sigma_k^2, as estimates of the population mean, m, and variance, σ².

\hat m_k := \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad E\{\hat m_k\} = m

\hat\sigma_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2, \qquad E\{\hat\sigma_k^2\} = \sigma^2

[Figure: the samples xi scattered about the population mean m, versus the index 1, 2, 3, …, k.]

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.

Page 135: Introduction to Mathematical Probability

135

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 3)

We found:

E\{\hat m_k\} = \frac{1}{k}\sum_{i=1}^{k}E\{x_i\} = m, \qquad E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2\right\} = \sigma^2

Let us compute the variance of the mean estimator:

\sigma_{\hat m_k}^2 := E\left\{\left(\hat m_k-m\right)^2\right\} = E\left\{\left(\frac{1}{k}\sum_{i=1}^{k}(x_i-m)\right)^2\right\} = \frac{1}{k^2}\left[\sum_{i=1}^{k}E\{(x_i-m)^2\} + \sum_{i=1}^{k}\sum_{j\ne i}E\{(x_i-m)(x_j-m)\}\right] = \frac{\sigma^2}{k}

Page 136: Introduction to Mathematical Probability

136

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 4)

Let us compute the variance of the variance estimator

\sigma_{\hat\sigma_k^2}^2 := E\left\{\left(\hat\sigma_k^2-\sigma^2\right)^2\right\}, \qquad \hat\sigma_k^2 = \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2

Writing x_i-\hat m_k = (x_i-m)-(\hat m_k-m) and expanding,

\hat\sigma_k^2 = \frac{1}{k-1}\sum_{i=1}^{k}(x_i-m)^2 - \frac{k}{k-1}\left(\hat m_k-m\right)^2

so the required expectation involves the second and fourth central moments of the xi and of \hat m_k-m = \frac{1}{k}\sum_{i=1}^{k}(x_i-m). Only terms containing E\{(x_i-m)^2\} = \sigma^2 and \mu_4 := E\{(x_i-m)^4\} survive, since (xi – m) and (xj – m) are independent for i ≠ j.

Page 137: Introduction to Mathematical Probability

137

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 4)

Collecting terms, with \mu_4 := E\{(x_i-m)^4\} and using the independence of (xi – m), (xj – m) for i ≠ j:

E\left\{\left(\sum_{i=1}^{k}(x_i-m)^2\right)^2\right\} = k\,\mu_4 + k(k-1)\,\sigma^4

E\left\{\left(\hat m_k-m\right)^2\sum_{i=1}^{k}(x_i-m)^2\right\} = \frac{1}{k}\,\mu_4 + \frac{k-1}{k}\,\sigma^4

E\left\{\left(\hat m_k-m\right)^4\right\} = \frac{1}{k^3}\,\mu_4 + \frac{3(k-1)}{k^3}\,\sigma^4

which, substituted into \hat\sigma_k^2 = \frac{1}{k-1}\sum_{i=1}^{k}(x_i-m)^2 - \frac{k}{k-1}\left(\hat m_k-m\right)^2, leads to

\sigma_{\hat\sigma_k^2}^2 = E\left\{\left(\hat\sigma_k^2-\sigma^2\right)^2\right\} = \frac{1}{k}\left(\mu_4 - \frac{k-3}{k-1}\,\sigma^4\right)

Page 138: Introduction to Mathematical Probability

138

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 5)

We found:

E\{\hat m_k\} = m, \qquad E\{\hat\sigma_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat m_k\right)^2\right\} = \sigma^2

\sigma_{\hat\sigma_k^2}^2 := E\left\{\left(\hat\sigma_k^2-\sigma^2\right)^2\right\} = \frac{1}{k}\left(\mu_4 - \frac{k-3}{k-1}\,\sigma^4\right), \qquad \mu_4 := E\{(x_i-m)^4\}

Define the Kurtosis of the random variable xi:

\lambda := \frac{\mu_4}{\sigma^4} = \frac{E\{(x_i-m)^4\}}{\sigma^4}

so that

\sigma_{\hat\sigma_k^2}^2 = \frac{\sigma^4}{k}\left(\lambda - \frac{k-3}{k-1}\right) \approx \frac{(\lambda-1)\,\sigma^4}{k} \qquad\text{for large } k

Page 139: Introduction to Mathematical Probability

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 6)

For high values of k, according to the Central Limit Theorem, the estimations of mean and of variance are approximately Gaussian Random Variables:

\hat m_k \sim N\!\left(\hat m_k;\, m,\, \sigma^2/k\right) \qquad\&\qquad \hat\sigma_k^2 \sim N\!\left(\hat\sigma_k^2;\, \sigma^2,\, (\lambda-1)\,\sigma^4/k\right)

We want to find a region around \hat\sigma_k^2 that will contain σ² with a predefined probability φ as a function of the number of iterations k:

\Pr\left\{\left|\hat\sigma_k^2-\sigma^2\right| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right\} = \varphi

Since \hat\sigma_k^2 is approximately a Gaussian Random Variable, nσ is given by solving:

\varphi = \frac{1}{\sqrt{2\pi}}\int_{-n_\sigma}^{+n_\sigma}\exp\left(-\frac{1}{2}\zeta^2\right)d\zeta

Cumulative Probability within ± nσ Standard Deviations of the Mean for a Gaussian Random Variable:

nσ      φ
1.000   0.6827
1.645   0.9000
1.960   0.9500
2.576   0.9900

With \sigma_{\hat\sigma_k^2} = \sqrt{(\lambda-1)/k}\,\sigma^2 this becomes

\Pr\left\{\left(1 - n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2 \le \hat\sigma_k^2 \le \left(1 + n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2\right\} = \varphi

Page 140: Introduction to Mathematical Probability

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 7)

\Pr\left\{\left|\hat\sigma_k^2-\sigma^2\right| \le n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2\right\} = \varphi

\left(1-n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2 \le \hat\sigma_k^2 \le \left(1+n_\sigma\sqrt{\frac{\lambda-1}{k}}\right)\sigma^2

Inverting the interval to bound σ² in terms of the estimate:

\frac{\hat\sigma_k^2}{1+n_\sigma\sqrt{\frac{\lambda-1}{k}}} \le \sigma^2 \le \frac{\hat\sigma_k^2}{1-n_\sigma\sqrt{\frac{\lambda-1}{k}}}

so the confidence interval multipliers on the estimated standard deviation are

\frac{1}{\sqrt{1+n_\sigma\sqrt{\frac{\lambda-1}{k}}}} \le \frac{\sigma}{\hat\sigma_k} \le \frac{1}{\sqrt{1-n_\sigma\sqrt{\frac{\lambda-1}{k}}}}

Page 141: Introduction to Mathematical Probability

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 8)

[Figure: Typical Confidence Interval Multipliers for the Estimated Standard Deviation of a Gaussian Random Variable (λ = 3). The Upper and Lower Limits 1/\sqrt{1 \mp n_\sigma\sqrt{(\lambda-1)/k}} on \sigma/\hat\sigma_k are plotted versus the number of Monte Carlo trials, k (100 to 500), for 95 % confidence (n_\sigma = 1.96, marked at k = 256) and 99 % confidence (n_\sigma = 2.576, marked at k = 440).]

Page 142: Introduction to Mathematical Probability

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 9)

[Figure: Effect of Kurtosis on Confidence Interval Limits. The Estimated Standard Deviation Confidence Interval Multipliers 1/\sqrt{1 \mp n_\sigma\sqrt{(\lambda-1)/k}} are plotted versus the kurtosis λ (5 to 20), for a degree of confidence of 95 % (n_\sigma = 1.96) and k = 256 Monte Carlo trials performed.]

Page 143: Introduction to Mathematical Probability

143

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 10)

Monte-Carlo Procedure

1  Choose the Confidence Level φ and find the corresponding nσ using the normal (Gaussian) distribution:

   nσ      φ
   1.000   0.6827
   1.645   0.9000
   1.960   0.9500
   2.576   0.9900

2  Run a few samples k0 > 20 and estimate the mean and the kurtosis λ according to

\hat m_{k_0} := \frac{1}{k_0}\sum_{i=1}^{k_0} x_i, \qquad \hat\lambda = \frac{\frac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^4}{\left[\frac{1}{k_0}\sum_{i=1}^{k_0}\left(x_i-\hat m_{k_0}\right)^2\right]^2}

3  Compute the Lower and Upper Limits \frac{1}{\sqrt{1+n_\sigma\sqrt{(\hat\lambda-1)/k}}} and \frac{1}{\sqrt{1-n_\sigma\sqrt{(\hat\lambda-1)/k}}} as functions of k.

4  Find k for which \Pr\left\{\left|\hat\sigma_k^2-\sigma^2\right| \le n_\sigma\,\sigma_{\hat\sigma_k^2}\right\} = \varphi meets the required interval.

5  Run the remaining k - k0 simulations.

Page 144: Introduction to Mathematical Probability

144

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue – 11)

Monte-Carlo Procedure Example: assume a Gaussian distribution, λ = 3.

1  Choose the Confidence Level φ = 95 %, which gives the corresponding nσ = 1.96.

2  The kurtosis λ = 3.

3  Find k for which

\Pr\left\{\left|\hat\sigma_k^2-\sigma^2\right| \le n_\sigma\sqrt{\frac{\lambda-1}{k}}\,\sigma^2\right\} = \Pr\left\{\left|\hat\sigma_k^2-\sigma^2\right| \le 1.96\sqrt{\frac{2}{k}}\,\sigma^2\right\} = 0.95

   Assume also that we require, with probability φ = 95 %, that \left|\hat\sigma_k^2-\sigma^2\right| \le 0.1\,\sigma^2:

1.96\sqrt{\frac{2}{k}} \le 0.1 \qquad\Rightarrow\qquad k \ge \left(\frac{1.96}{0.1}\right)^2\cdot 2 \approx 768

4  Run k > 800 simulations.
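The required number of trials follows directly from nσ, the kurtosis λ and the desired relative accuracy; a small sketch (plain Python; the function name is illustrative) reproducing the k > 800 figure of this example.

```python
import math

def required_trials(n_sigma, kurtosis, rel_accuracy):
    """Smallest k with n_sigma * sqrt((lambda - 1)/k) <= rel_accuracy."""
    return math.ceil((n_sigma / rel_accuracy) ** 2 * (kurtosis - 1.0))

# Gaussian case: lambda = 3, 95 % confidence (n_sigma = 1.96), 10 % relative accuracy
print(required_trials(1.96, 3.0, 0.1))   # 769, rounded up to ~800 on the slide
```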

Page 145: Introduction to Mathematical Probability

145

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 12)

Kurtosis

Kurtosis of the random variable xi:

\lambda := \frac{E\{(x_i-m)^4\}}{\left[E\{(x_i-m)^2\}\right]^2} = \frac{E\{(x_i-m)^4\}}{\sigma^4}

Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.

In 1905 Pearson defined Kurtosis as a measure of departure from normality in a paper published in Biometrika. λ = 3 for the normal distribution, and the terms 'leptokurtic' (λ>3), 'mesokurtic' (λ=3) and 'platykurtic' (λ<3) were introduced.

Karl Pearson (1857 – 1936)

A leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). A platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values).

Page 146: Introduction to Mathematical Probability

146

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 13)

Distribution          Functional Representation                                  Kurtosis λ    Excess Kurtosis λ-3
Normal                (1/(√(2π)σ)) exp(-(x-μ)²/(2σ²))                            3             0
Laplace               (1/(2b)) exp(-|x-μ|/b)                                     6             3
Hyperbolic-Secant     (1/2) sech(πx/2)                                           5             2
Uniform               1/(b-a) for a ≤ x ≤ b, 0 for x < a or x > b                1.8           -1.2
Wigner semicircle     (2/(πR²)) √(R²-x²) for |x| ≤ R, 0 for |x| > R              2             -1

[The Graphical Representation column of the original table showed a plot of each density.]

Page 147: Introduction to Mathematical Probability

147

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable (continue - 14)

Skewness

Skewness of the random variable xi:

\gamma_1 := \frac{E\{(x_i-m)^3\}}{\left[E\{(x_i-m)^2\}\right]^{3/2}}

Karl Pearson (1857 – 1936)

[Figure: examples of Negative Skew and Positive Skew densities.]

1  Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed. More data lie in the left tail than would be expected in a normal distribution.

2  Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed. More data lie in the right tail than would be expected in a normal distribution.

Karl Pearson suggested two simpler calculations as a measure of skewness:
• (mean - mode) / standard deviation
• 3 (mean - median) / standard deviation

Page 148: Introduction to Mathematical Probability

148

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to estimate the sample mean, \hat x_k, and the variance, pk, by a Recursive Filter.

We found that using k measurements the estimated mean and variance are given in batch form by:

\hat x_k := \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad p_k := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i-\hat x_k\right)^2

The k+1 measurement will give:

\hat x_{k+1} = \frac{1}{k+1}\sum_{i=1}^{k+1} x_i = \frac{1}{k+1}\left(k\,\hat x_k + x_{k+1}\right)

Therefore the Recursive Filter form for the k+1 measurement will be:

\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)

p_{k+1} = \frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2

Page 149: Introduction to Mathematical Probability

149

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 1)

We found that using k+1 measurements the estimated variance is given in batch form by:

p_{k+1} = \frac{1}{k}\sum_{i=1}^{k+1}\left(x_i-\hat x_{k+1}\right)^2, \qquad \hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)

Substituting x_i-\hat x_{k+1} = \left(x_i-\hat x_k\right)-\left(\hat x_{k+1}-\hat x_k\right), using \sum_{i=1}^{k}\left(x_i-\hat x_k\right) = 0 and \hat x_{k+1}-\hat x_k = \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right):

p_{k+1} = \frac{1}{k}\left[\sum_{i=1}^{k}\left(x_i-\hat x_k\right)^2 + \frac{k}{k+1}\left(x_{k+1}-\hat x_k\right)^2\right] = \frac{k-1}{k}\,p_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)^2

Page 150: Introduction to Mathematical Probability

150

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue – 2)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to estimate the sample mean, \hat x_k, and the variance, pk, by a Recursive Filter:

\hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)

p_{k+1} = \frac{k-1}{k}\,p_k + \frac{1}{k+1}\left(x_{k+1}-\hat x_k\right)^2
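A direct sketch of these recursions (assuming NumPy only for the test data; the function name is illustrative). After processing all samples the recursive estimates agree with the batch formulas.

```python
import numpy as np

def recursive_mean_var(samples):
    """Recursive sample mean and variance:
       x_{k+1} = x_k + (z_{k+1} - x_k)/(k+1)
       p_{k+1} = (k-1)/k * p_k + (z_{k+1} - x_k)^2/(k+1),  k >= 1."""
    x_hat = samples[0]
    p = 0.0
    for k, z in enumerate(samples[1:], start=1):   # k samples already processed
        innovation = z - x_hat
        x_hat = x_hat + innovation / (k + 1)
        p = (k - 1) / k * p + innovation**2 / (k + 1)
    return x_hat, p

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.5, size=1000)
m_rec, p_rec = recursive_mean_var(data)
print(m_rec, p_rec)                       # recursive estimates
print(data.mean(), data.var(ddof=1))      # batch estimates, should agree
```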

Page 151: Introduction to Mathematical Probability

151

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter

Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r0. The scalar equations describing this situation are:

System: x_{k+1} = x_k \qquad (\Phi_k = 1,\ Q_k = 0)

Measurement: z_k = x_k + v_k, \qquad v_k \sim N(0, r_0) \qquad (H_k = 1)

The Discrete Kalman Filter (general form \hat x_{k+1}(-) = \Phi_k\,\hat x_k(+),\ p_{k+1}(-) = \Phi_k\,p_k(+)\,\Phi_k^T + G_k Q_k G_k^T) is given by:

\hat x_{k+1}(-) = \hat x_k(+), \qquad p_{k+1}(-) = p_k(+)

\hat x_{k+1}(+) = \hat x_{k+1}(-) + \underbrace{p_{k+1}(-)\left[p_{k+1}(-)+r_0\right]^{-1}}_{K_{k+1}}\left[z_{k+1}-\hat x_{k+1}(-)\right]

p_{k+1}(+) = p_{k+1}(-) - p_{k+1}(-)\left[p_{k+1}(-)+r_0\right]^{-1}p_{k+1}(-) = \frac{p_{k+1}(-)\,r_0}{p_{k+1}(-)+r_0}

Page 152: Introduction to Mathematical Probability

152

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter (continue – 1)

Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated Gaussian noise sequence with zero mean and variance r0.

We found that the Discrete Kalman Filter is given by:

\hat x_{k+1} = \hat x_k + K_{k+1}\left(z_{k+1}-\hat x_k\right), \qquad K_{k+1} = \frac{p_{k+1}}{p_{k+1}+r_0}

p_{k+1} = \frac{p_k\,r_0}{p_k+r_0} \qquad\Rightarrow\qquad p_1 = \frac{p_0\,r_0}{p_0+r_0},\quad p_2 = \frac{p_1\,r_0}{p_1+r_0} = \frac{p_0\,r_0}{2p_0+r_0},\ \dots,\quad p_k = \frac{p_0\,r_0}{k\,p_0+r_0}

K_{k+1} = \frac{p_0}{(k+1)\,p_0+r_0}

\hat x_{k+1} = \hat x_k + \frac{p_0}{(k+1)\,p_0+r_0}\left(z_{k+1}-\hat x_k\right)

For p0 → ∞ (no a-priori information) this reduces to \hat x_{k+1} = \hat x_k + \frac{1}{k+1}\left(z_{k+1}-\hat x_k\right), the recursive sample-mean filter.

Page 153: Introduction to Mathematical Probability

153

SOLO Review of Probability

Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Continuous Recursive Filter

Estimate the value of a constant x, given continuous measurements of x corrupted by an uncorrelated Gaussian noise with zero mean and intensity r. The scalar equations describing this situation are:

System: \dot x = 0 \qquad (A = 0,\ Q = 0)

Measurement: z(t) = x(t) + v(t), \qquad v \sim N(0, r) \qquad (H = 1)

The Continuous Kalman Filter (general form \dot{\hat x} = A\,\hat x + p\,H^T r^{-1}\left(z-H\hat x\right),\ \dot p = A\,p + p\,A^T + G\,Q\,G^T - p\,H^T r^{-1} H\,p) is given by:

\dot{\hat x}(t) = \frac{p(t)}{r}\left[z(t)-\hat x(t)\right], \qquad \hat x(0) = \hat x_0

\dot p(t) = -\frac{p^2(t)}{r}, \qquad p(0) = p_0

or:

\int_{p_0}^{p}\frac{dp}{p^2} = -\frac{1}{r}\int_0^{t} dt \qquad\Rightarrow\qquad p(t) = \frac{p_0}{1+\dfrac{p_0}{r}\,t}, \qquad K(t) = \frac{p(t)}{r} = \frac{p_0}{r+p_0\,t}

\dot{\hat x}(t) = \frac{p_0}{r+p_0\,t}\left[z(t)-\hat x(t)\right], \qquad \hat x(0) = \hat x_0

Page 154: Introduction to Mathematical Probability

154

SOLO Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators

• First attempts to generate "random numbers":
  - Draw balls out of a stirred urn
  - Roll dice

• 1927: L.H.C. Tippett published a table of 40,000 digits taken “at random” from census reports.

• 1939: M.G. Kendall and B. Babington-Smith create a mechanical machine to generate random numbers. They published a table of 100,000 digits.

• 1946: J. Von Neumann proposed the “middle square method”.

• 1948: D.H. Lehmer introduced the “linear congruential method”.

• 1955: RAND Corporation published a table of 1,000,000 random digits obtainedfrom electronic noise.

• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruentialgenerators.

• 1989: R.S. Wikramaratna proposed the additive congruential method.

Routine RANDU (IBM Corp): "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random."

Page 155: Introduction to Mathematical Probability

155

SOLO Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators

On a computer the “random numbers” are not random at all – they are strictlydeterministic and reproducible, but they look like a stream of random numbers.For this reason the computer programs are called “Pseudo-Random Number Generators”.

Essential Properties of a Pseudo-Random Number Generator

Repeatability – the same sequence should be produced with the same initial values (or seeds)

Randomness – should produce independent uniformly distributed random variables that passes all statistical tests for randomness.

Long Period – a pseudo-random number sequence uses finite precision arithmetic, so the sequence must repeat itself with a finite period. This period should be much longer than the number of random numbers needed for the simulation.

Insensitive to seeds – period and randomness properties should not depend on the initial seeds.

Page 156: Introduction to Mathematical Probability

156

SOLO Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators

Essential Properties of a Pseudo-Random Number Generator (continue -1)

Portability – should give the same results on different computers

Efficiency – should be fast (small number of floating point operations) and not use much memory.

Disjoint subsequences – different seeds should produce long independent (disjoint) subsequences so that there are no correlations between simulations with different initial seeds.

Homogeneity – sequences of all bits should be random.

Page 157: Introduction to Mathematical Probability

157

SOLO Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators

A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although they are deterministically generated, have all the appearances of being independent and uniformly distributed on (0,1).

One approach:

1. Define x0 = integer initial condition or seed

2. Using integers a and m, recursively compute

x_{n+1} = a\,x_n\ \text{modulo}\ m \qquad\text{i.e.}\quad a\,x_n = k\,m + x_{n+1},\quad k\ \text{integer},\quad 0 \le x_{n+1} < m

Therefore xn takes the values 0, 1, …, m−1, and the quantity un = xn/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.

In general the integers a and m should be chose to satisfy three criteria:

1. For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent (0,1) random variables.

2. For any initial seed, the number of variables that can be generated before repetitionbegins is large.

3. The values can be computed efficiently on a digital computer.

Multiplicative congruential method

Page 158: Introduction to Mathematical Probability

158

SOLO Review of Probability

Generating Discrete Random Variables

Pseudo-Random Number Generators (continue – 1)

A guideline is to choose m to be a large prime number comparable to the computer word size.

Examples:

32 bits word computer (some IBM systems): m = 2^{31}-1,\quad a = 7^5 = 16{,}807

36 bits word computer: m = 2^{35}-31,\quad a = 5^5 = 3{,}125

Another generator of pseudo-random numbers uses recursions of the type (mixed congruential method):

x_{n+1} = \left(a\,x_n + c\right)\ \text{modulo}\ m \qquad\text{i.e.}\quad a\,x_n + c = k\,m + x_{n+1},\quad k\ \text{integer},\quad 0 \le x_{n+1} < m

32 bits word computer (VAX): m = 2^{32},\quad a = 69{,}069

32 bits word computer (transputers): m = 2^{32},\quad a = 1{,}664{,}525

48 bits word computer (UNIX, RAND48 routine): m = 2^{48},\quad a = \text{5DEECE66D}_{16},\quad c = \text{B}_{16}

48 bits word computer (CDC vector machine): m = 2^{47},\quad a = 5^{15},\quad c = 0

48 bits word computer (Cray vector machine): m = 2^{48},\quad a = \text{2875A2E7B175}_{16},\quad c = 0

64 bits word computer (Numerical Algorithms Group): m = 2^{59},\quad a = 13^{13},\quad c = 0

Return to Table of Content
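A minimal sketch of the congruential recursion with the 32-bit constants quoted above (a = 7^5, m = 2^31 − 1); this is for illustration only, not a recommendation for serious simulations.

```python
def lcg(seed, a=16807, m=2**31 - 1, c=0):
    """Multiplicative (c = 0) or mixed (c != 0) congruential generator.
       Yields pseudo-random numbers u_n = x_n / m in (0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=12345)
print([round(next(gen), 6) for _ in range(5)])
```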

Page 159: Introduction to Mathematical Probability

159

SOLO Review of Probability

Generating Discrete Random Variables

Histograms

Return to Table of Content

A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.

Histograms are used to plot density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram always equals 1. If the length of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is defined as:

An ordinary and a cumulative histogram of the same data. The data shown is a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.

M_i = \sum_{j=1}^{i} m_j

Mathematical Definition

In a more general mathematical sense, a histogram is a mapping mi that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram mi meets the following condition:

n = \sum_{i=1}^{k} m_i

Page 160: Introduction to Mathematical Probability

160

SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method

Suppose we want to generate a discrete random variable X having probability density function:

P\{X = x_j\} = p_j, \qquad j = 0, 1, \dots, \qquad \sum_j p_j = 1

[Figure: the discrete probabilities p(x) at the points x0, x1, x2, x3, x4, x5, x6.]

To accomplish this, let us generate a random number U that is uniformly distributed over (0,1) and set:

X = \begin{cases} x_0 & U < p_0\\ x_1 & p_0 \le U < p_0+p_1\\ \ \vdots\\ x_j & \sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\\ \ \vdots\end{cases}

Since U is uniformly distributed, for any a and b such that 0 < a < b < 1 we have P(a ≤ U < b) = b − a, and therefore:

P\{X = x_j\} = P\left\{\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right\} = p_j

and so X has the desired distribution.
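A short sketch of this discrete inverse-transform step (assuming NumPy; the probability vector is an illustrative choice): accumulate the p_j and return the first x_j whose cumulative sum exceeds U.

```python
import numpy as np

def draw_discrete(x_values, probs, rng):
    """Inverse transform for a discrete distribution P{X = x_j} = p_j."""
    u = rng.random()                 # U ~ uniform(0, 1)
    cumulative = 0.0
    for x_j, p_j in zip(x_values, probs):
        cumulative += p_j
        if u < cumulative:
            return x_j
    return x_values[-1]              # guard against round-off

rng = np.random.default_rng(0)
x_values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]
samples = [draw_discrete(x_values, probs, rng) for _ in range(10_000)]
print(np.bincount(samples) / len(samples))   # close to [0.1, 0.4, 0.3, 0.2]
```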

Page 161: Introduction to Mathematical Probability

SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue – 1)

Suppose we want to generate a discrete random variable X having probability density function:

[Flowchart: Generate a random number U (uniformly distributed on (0,1)); set i = 0 and P = p0; while U ≥ P, set i := i + 1 and P := P + pi; when U < P, set X = xi. Draw X, N times, from p(x) in this way.]

[Figure: the discrete density p(x) at x0, …, x6, and the histogram of the N drawn values (bin height 1/N per sample).]

Page 162: Introduction to Mathematical Probability

SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue – 2)

Generating a Poisson Random Variable:

p_i = P\{X = i\} = e^{-\lambda}\,\frac{\lambda^i}{i!}, \qquad i = 0, 1, \dots

The successive probabilities can be computed recursively:

\frac{p_{i+1}}{p_i} = \frac{e^{-\lambda}\,\lambda^{i+1}/(i+1)!}{e^{-\lambda}\,\lambda^{i}/i!} = \frac{\lambda}{i+1} \qquad\Rightarrow\qquad p_{i+1} = \frac{\lambda}{i+1}\,p_i

[Flowchart: Generate a random number U (uniformly distributed on (0,1)); set i = 0, p = e^{-λ}, P = p; while U ≥ P, update p := p·λ/(i+1), P := P + p, i := i + 1; when U < P, set X = i.]

[Figure: the Poisson probabilities for λ = 4, and the histogram of N values of X drawn from the Poisson distribution.]
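A sketch of this Poisson generator using the recursion p_{i+1} = λ p_i/(i+1) (plain Python with the standard random and math modules; λ is an illustrative value).

```python
import math
import random

def draw_poisson(lam, rng=random):
    """Inverse transform for the Poisson distribution, using p_{i+1} = lam/(i+1) * p_i."""
    u = rng.random()                 # U ~ uniform(0, 1)
    i = 0
    p = math.exp(-lam)               # p_0 = e^{-lam}
    cumulative = p
    while u >= cumulative:
        p = p * lam / (i + 1)        # p_{i+1}
        cumulative += p
        i += 1
    return i

samples = [draw_poisson(4.0) for _ in range(100_000)]
print(sum(samples) / len(samples))   # close to lambda = 4
```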

Page 163: Introduction to Mathematical Probability

SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue – 3)

Generating a Binomial Random Variable:

p_i = P\{X = i\} = \frac{n!}{i!\,(n-i)!}\,p^i\,(1-p)^{n-i}, \qquad i = 0, 1, \dots, n

The successive probabilities can be computed recursively:

\frac{p_{i+1}}{p_i} = \frac{n-i}{i+1}\,\frac{p}{1-p} \qquad\Rightarrow\qquad p_{i+1} = \frac{n-i}{i+1}\,\frac{p}{1-p}\,p_i

[Flowchart: Generate a random number U (uniformly distributed on (0,1)); set i = 0, q = (1-p)^n, P = q; while U ≥ P, update q := q·[(n-i)/(i+1)]·[p/(1-p)], P := P + q, i := i + 1; when U < P, set X = i.]

[Figure: histogram of the results, P(k, n) versus k = 0, 1, 2, …, 14.]

Return to Table of Content

Page 164: Introduction to Mathematical Probability

164

SOLO Review of Probability

Generating Discrete Random Variables

The Accaptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having aprobability density function { qj, j ≥0 }. We want to use this to obtain a randomvariable that has the probability density function { pj, j ≥0 }.

Let c be a constant such that:

\frac{p_j}{q_j} \le c \qquad\text{for all } j \text{ such that } p_j > 0

If such a c exists, it must satisfy: 1 = \sum_j p_j \le c\sum_j q_j = c, i.e. c ≥ 1.

Rejection Method

Step 1: Simulate the value of Y, having probability density function qj.
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < pY/(c qY), set X = Y and stop. Otherwise return to Step 1.

[Flowchart: Start → generate Y with probability density function qj → generate a random number U (uniformly distributed on (0,1)) → if U < pY/(c qY) accept X = Y, otherwise return to the start.]

[Figure: the target density p(x), the scaled proposal c q(x), and an accepted/rejected sample point u.]

Page 165: Introduction to Mathematical Probability

165

SOLO Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue – 1)

[Flowchart: Start → generate Y with probability density function qj → generate a random number U (uniformly distributed on (0,1)) → if U < pY/(c qY) accept X = Y, otherwise return to the start.]

Theorem

The random variable X obtained by the rejection method has probability density function P{X = i} = pi.

Proof

P\{X = i\} = P\{Y = i \mid \text{Acceptance}\} = \frac{P\{Y = i,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\{\text{Acceptance} \mid Y = i\}\,P\{Y = i\}}{P\{\text{Acceptance}\}} \qquad\text{(Bayes)}

Since U is uniformly distributed on (0,1) and independent of Y:

P\{\text{Acceptance} \mid Y = i\}\,P\{Y = i\} = P\left\{U < \frac{p_i}{c\,q_i}\right\} q_i = \frac{p_i}{c\,q_i}\,q_i = \frac{p_i}{c} \qquad\Rightarrow\qquad P\{X = i\} = \frac{p_i}{c\,P\{\text{Acceptance}\}}

Summing over all i yields

1 = \sum_i P\{X = i\} = \frac{\sum_i p_i}{c\,P\{\text{Acceptance}\}} = \frac{1}{c\,P\{\text{Acceptance}\}} \qquad\Rightarrow\qquad P\{\text{Acceptance}\} = \frac{1}{c} \qquad\Rightarrow\qquad P\{X = i\} = p_i

q.e.d.

Page 166: Introduction to Mathematical Probability

166

SOLO Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue – 2)

[Flowchart: Start → generate Y with probability density function q → generate a random number U (uniformly distributed on (0,1)) → if U < p(Y)/(c q(Y)) accept X = Y, otherwise return to the start.]

Example

Generate a truncated Gaussian using the Accept-Reject method. Consider the case with

p(x) = \begin{cases}\dfrac{1}{\sqrt{2\pi}}\exp\left(-x^2/2\right) & x \in (-4, 4)\\ 0 & \text{otherwise}\end{cases}

Consider the Uniform proposal function

q(x) = \begin{cases}1/8 & x \in (-4, 4)\\ 0 & \text{otherwise}\end{cases}

In the Figure we can see the results of the Accept-Reject method using N = 10,000 samples.

Return to Table of Content
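A sketch of this example (assuming NumPy): the proposal is uniform on (−4, 4) with q(x) = 1/8, and c is any constant with p(x) ≤ c q(x), e.g. c = 8/√(2π).

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Standard Gaussian density restricted to (-4, 4); normalization on this interval ~1
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

q_const = 1.0 / 8.0                             # uniform proposal density on (-4, 4)
c = (1.0 / np.sqrt(2 * np.pi)) / q_const        # p(0)/q, so p(x) <= c*q(x) everywhere

def draw_truncated_gaussian():
    while True:
        y = rng.uniform(-4.0, 4.0)              # Y ~ q
        u = rng.random()                        # U ~ uniform(0, 1)
        if u < p(y) / (c * q_const):            # accept with probability p(Y)/(c q(Y))
            return y

samples = np.array([draw_truncated_gaussian() for _ in range(10_000)])
print(samples.mean(), samples.std())            # approximately 0 and 1
```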

Page 167: Introduction to Mathematical Probability

167

SOLO Review of Probability

Generating Continuous Random Variables

The Inverse Transform Algorithm

Let U be a uniform (0,1) random variable. For any continuous distribution function F the random variable X defined by

X = F^{-1}(U)

has distribution F. [F^{-1}(u) is defined to be that value of x such that F(x) = u.]

[Figure: a continuous distribution function F(x) rising from 0 to 1.0.]

Proof

Let PX(x) denote the Probability Distribution Function of X = F^{-1}(U):

P_X(x) = P\{X \le x\} = P\{F^{-1}(U) \le x\}

Since F is a distribution function, F(x) is a monotonically increasing function of x, and so the inequality "a ≤ b" is equivalent to the inequality "F(a) ≤ F(b)"; therefore

P_X(x) = P\{F(F^{-1}(U)) \le F(x)\} = P\{U \le F(x)\} \overset{U\ \text{uniform}\,(0,1)}{=} F(x)

[Flowchart: generate a random number U (uniformly distributed on (0,1)); set X = F^{-1}(U).]

Return to Table of Content
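A standard illustration of this result (assuming NumPy): for the exponential distribution F(x) = 1 − e^{−λx} the inverse is F^{-1}(u) = −ln(1 − u)/λ, so X = −ln(1 − U)/λ is exponential.

```python
import numpy as np

lam = 2.0
N = 100_000
rng = np.random.default_rng(0)

u = rng.random(N)                    # U ~ uniform(0, 1)
x = -np.log(1.0 - u) / lam           # X = F^{-1}(U), exponential with rate lam

print("sample mean:", x.mean(), " expected 1/lambda =", 1.0 / lam)
print("sample variance:", x.var(), " expected 1/lambda^2 =", 1.0 / lam**2)
```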

Page 168: Introduction to Mathematical Probability

168

SOLO Review of Probability

Generating Continuous Random Variables

The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a probability density function g(x). We want to use this to obtain a random variable that has the probability density function f(x).

Let c be a constant such that:

\frac{f(y)}{g(y)} \le c \qquad\text{for all } y

If such a c exists, it must satisfy: 1 = \int f(y)\,dy \le c\int g(y)\,dy = c, i.e. c ≥ 1.

Rejection Method

Step 1: Simulate the value of Y, having probability density function g(Y).
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < f(Y)/(c g(Y)), set X = Y and stop. Otherwise return to Step 1.

[Flowchart: Start → generate Y with probability density function g → generate a random number U (uniformly distributed on (0,1)) → if U < f(Y)/(c g(Y)) accept X = Y, otherwise return to the start.]

[Figure: the target density f(x), the scaled proposal c g(x), and an accepted/rejected sample point u.]

Page 169: Introduction to Mathematical Probability

169

SOLO Review of Probability

Generating Continuous Random Variables

The Acceptance-Rejection Technique (continue – 1)

[Flowchart: Start → generate Y with probability density function g → generate a random number U (uniformly distributed on (0,1)) → if U < f(Y)/(c g(Y)) accept X = Y, otherwise return to the start.]

Theorem

The random variable X obtained by the rejection method has probability density function f(y).

Proof

P\{Y \le y \mid \text{Acceptance}\} = \frac{P\{Y \le y,\ \text{Acceptance}\}}{P\{\text{Acceptance}\}} = \frac{P\{\text{Acceptance} \mid Y \le y\}\,P\{Y \le y\}}{P\{\text{Acceptance}\}} \qquad\text{(Bayes)}

Since U is uniformly distributed on (0,1) and independent of Y, the density of the accepted values is proportional to

P\left\{U < \frac{f(y)}{c\,g(y)}\right\} g(y) = \frac{f(y)}{c\,g(y)}\,g(y) = \frac{f(y)}{c}

i.e. the accepted values have density \frac{f(y)}{c\,P\{\text{Acceptance}\}}. Integrating over all y yields

1 = \int\frac{f(y)}{c\,P\{\text{Acceptance}\}}\,dy = \frac{1}{c\,P\{\text{Acceptance}\}} \qquad\Rightarrow\qquad P\{\text{Acceptance}\} = \frac{1}{c}

and therefore the accepted X has density f(y).

q.e.d.

Return to Table of Content

Page 170: Introduction to Mathematical Probability

170

SOLO

The Bootstrap

• Popularized by Bradley Efron (1979)

• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assesed from the data themselves, in other words

“pulling yourself up by your bootstraps”

The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and has a tendency to be overly optimistic.The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples) where these would be more formally stated in other approaches.

The advantage of bootstrapping over analytical methods is its great simplicity - it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratio, and correlation coefficients.

Generating Discrete Random Variables

Bradley Efron (1938 - ), Stanford U.

Review of Probability

Page 171: Introduction to Mathematical Probability

171

SOLO

The Bootstrap (continue -1)

• Given n observations zi, i = 1,…,n, and a calculated statistic S, what is the uncertainty in S?

• The Procedure:

Generating Discrete Random Variables

- Draw m values z’i i=1,…,m from the original data with replacement

- Calculate the statistic S’ from the “bootstrapped” sample

- Repeat L times to build a distribution of uncertainty in S.

Review of Probability

Return to Table of Content
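A minimal sketch of this procedure (assuming NumPy; the statistic S here is the sample mean and the data are synthetic): resample with replacement L times and look at the spread of the recomputed statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(10.0, 3.0, size=200)      # the original n observations

L = 2000                                  # number of bootstrap replications
boot_stats = np.empty(L)
for b in range(L):
    resample = rng.choice(z, size=z.size, replace=True)   # draw with replacement
    boot_stats[b] = resample.mean()                        # statistic S' on the resample

print("statistic S on the data:", z.mean())
print("bootstrap estimate of its standard error:", boot_stats.std(ddof=1))
print("theoretical standard error sigma/sqrt(n):", 3.0 / np.sqrt(z.size))
```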

Page 172: Introduction to Mathematical Probability

172

SOLO Review of Probability

Importance Sampling (IS)

Let Y = (Y1,…,Ym) be a vector of random variables having a joint probability density function p(y1,…,ym), and suppose that we are interested in estimating

\theta = E_p\{g(Y_1,\dots,Y_m)\} = \int\!\!\cdots\!\!\int g(y_1,\dots,y_m)\,p(y_1,\dots,y_m)\,dy_1\cdots dy_m

Suppose that a direct generation of the random vector Y so as to compute g(Y) is inefficient, possibly because (a) it is difficult to generate the random vector Y, or (b) the variance of g(Y) is large, or (c) both of the above.

Suppose that W = (W1,…,Wm) is another random vector, which takes values in the same domain as Y and has a joint density function q(w1,…,wm) that can be easily generated. The estimation θ can be expressed as:

\theta = E_p\{g(Y_1,\dots,Y_m)\} = \int\!\!\cdots\!\!\int g(w_1,\dots,w_m)\,\frac{p(w_1,\dots,w_m)}{q(w_1,\dots,w_m)}\,q(w_1,\dots,w_m)\,dw_1\cdots dw_m = E_q\left\{g(W)\,\frac{p(W)}{q(W)}\right\}

Therefore, we can estimate θ by generating values of random vector W, and thenusing as the estimator the resulting average of the values g (W) p (W)/ q (W).

Generating Discrete Random Variables

Page 173: Introduction to Mathematical Probability

173

SOLO Review of Probability

Importance Sampling (IS) (continue – 1)

\theta = E_p\{g(x)\} = \int g(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx = E_q\left\{g(x)\,\frac{p(x)}{q(x)}\right\} \approx \frac{1}{N}\sum_{i=1}^{N} g(x_i)\,\frac{p(x_i)}{q(x_i)} = \frac{1}{N}\sum_{i=1}^{N} w_i\,g(x_i)

Generating Discrete Random Variables

Example: Importance Sampling for a Bi-Modal Distribution

Consider the following distribution:

p(x) = \frac{1}{2}\,N(x;\,0,\,1) + \frac{1}{2}\,N(x;\,3,\,1/2)

We want to calculate the mean value (g(x) = x) using Importance Sampling.

Use: g(x) = x and q(x) = U(-5, 5)

For i = 1,…,N, sample (draw) xi using q(x): x_i \sim q(x),\ i = 1,\dots,N, and form the Importance Weights

w_i := \frac{p(x_i)}{q(x_i)}

For N = 10,000 samples we obtain Ep[x] = 1.4915 instead of 1.5.

In the Figure the Histogram weighted by the Importance Weights wi is presented together with the true PDF.

Return to Table of Content
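A sketch of this bi-modal example (assuming NumPy): draw from the uniform proposal on (−5, 5), weight by w_i = p(x_i)/q(x_i), and average w_i·x_i.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

def p(x):
    # 0.5*N(x; 0, 1) + 0.5*N(x; 3, 1/2)  (second component has variance 1/2)
    n1 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    n2 = np.exp(-(x - 3)**2 / (2 * 0.5)) / np.sqrt(2 * np.pi * 0.5)
    return 0.5 * n1 + 0.5 * n2

x = rng.uniform(-5.0, 5.0, size=N)    # x_i ~ q(x) = U(-5, 5)
q = 1.0 / 10.0                        # proposal density on (-5, 5)
w = p(x) / q                          # importance weights
print("importance-sampling estimate of E_p{x}:", np.mean(w * x))   # ~1.5
```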

Page 174: Introduction to Mathematical Probability

174

SOLO

Metropolis Algorithm

• This method of generation of an arbitrary probability distribution was invented by Metropolis, Rosenbluth and Teller (supposedly at a Los Alamos dinner party) and published in June 1953.

Generating Discrete Random Variables

Review of Probability

Procedure• Set up a Markov Chain that has as a unique stationary solution the required π (x) Probability Distribution Function (PDF)

• Run the chain until stationary.

• All subsequent samples are from stationary distribution π (x) as required.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.,

“ Equations of state calculations by fast computing machine”,

Journal of Chemical Physics, 1953, Vol. 21(6), pp.1087-1092

Nicholas Constantine Metropolis ( 1915 – 1999)

This is also called Markov Chain Monte Carlo (MCMC) method.

[Figure: a three-state Markov Chain with states X1, X2, X3 and its transition probabilities.]

Page 175: Introduction to Mathematical Probability

175

SOLO

Metropolis Algorithm (continue – 1)

Generating Discrete Random Variables

Review of Probability

Nicholas Constantine Metropolis ( 1915 – 1999)

Proof of the Procedure

Pr(X, t) - the probability of being in the state X at time t.
Pr(X→Y) = Pr(Y|X) - the probability, per unit time, of the transition from state X to state Y.

\Pr(X, t+1) = \Pr(X, t) + \sum_Y\left[\Pr(X \mid Y)\,\Pr(Y, t) - \Pr(Y \mid X)\,\Pr(X, t)\right]

At large t, once the arbitrary initial state is "forgotten", we want Pr(X, t) → Pr(X).

Clearly a sufficient (but not necessary) condition for an equilibrium (time-independent) probability distribution is the so-called Detailed Balance Condition:

\Pr(Y \mid X)\,\Pr(X, t) = \Pr(X \mid Y)\,\Pr(Y, t)

This method can be used for any probability distribution, but Metropolis used:

\Pr(B \mid A) = \begin{cases}e^{-\Delta E/kT} & \Delta E := E(B)-E(A) > 0\\ 1 & \Delta E \le 0\end{cases}

Note: E(A) is equivalent to the Energy level of state A.

\Pr(X \mid X) = 1 - \sum_{Y \ne X}\Pr(Y \mid X) \qquad\text{(sum of probabilities of all states reached from } X\text{)}

[Figure: two states X and Y with transition probabilities Pr(Y|X), Pr(X|Y) and self-transitions Pr(X|X), Pr(Y|Y).]

Page 176: Introduction to Mathematical Probability

176

SOLO

Metropolis Algorithm (continue – 2)

Generating Discrete Random Variables

Review of Probability

Detailed Balance Condition: \Pr(Y \mid X)\,\Pr(X, t) = \Pr(X \mid Y)\,\Pr(Y, t)

Metropolis defined a symmetric Q(Y|X) = Q(X|Y) as a candidate generating density for Pr(Y|X), such that \sum_Y Q(Y \mid X) = 1.

In general Q(Y|X) will not satisfy the "Detailed Balance" condition; for example:

Q(Y \mid X)\,\Pr(X, t) > Q(X \mid Y)\,\Pr(Y, t)

The process moves from X to Y too often and from Y to X too rarely.

A convenient way to correct this is to reduce the number of moves from X to Y by introducing a probability 0 < A(Y|X) ≤ 1. This is called the Acceptance Probability:

\Pr(Y \mid X) = Q(Y \mid X)\,A(Y \mid X)

[Figure: two states X and Y with transition probabilities Pr(Y|X), Pr(X|Y) and self-transitions Pr(X|X), Pr(Y|Y).]

Page 177: Introduction to Mathematical Probability

177

SOLO

Metropolis Algorithm (continue – 3)

Generating Discrete Random Variables

Review of Probability

[Figure: two states X and Y with transition probabilities Pr(Y|X), Pr(X|Y) and self-transitions Pr(X|X), Pr(Y|Y).]

Let us define the Acceptance Probability as:

A(Y \mid X) = \min\left[1,\ \frac{\Pr(Y)}{\Pr(X)}\right], \qquad A(X \mid Y) = \min\left[1,\ \frac{\Pr(X)}{\Pr(Y)}\right]

so that

\Pr(Y \mid X) = Q(Y \mid X)\,A(Y \mid X), \qquad \Pr(X \mid Y) = Q(X \mid Y)\,A(X \mid Y)

If Pr(X) ≤ Pr(Y) then A(Y|X) = 1, A(X|Y) = Pr(X)/Pr(Y).

If Pr(X) > Pr(Y) then A(Y|X) = Pr(Y)/Pr(X), A(X|Y) = 1.

In both cases:

\frac{\Pr(Y \mid X)}{\Pr(X \mid Y)} = \frac{Q(Y \mid X)\,A(Y \mid X)}{Q(X \mid Y)\,A(X \mid Y)} \overset{Q(Y|X)=Q(X|Y)}{=} \frac{A(Y \mid X)}{A(X \mid Y)} = \frac{\Pr(Y)}{\Pr(X)}

|Pr ||

which is just the Detailed Balance condition.

Page 178: Introduction to Mathematical Probability

178

SOLO

Metropolis Algorithm (continue – 2)

Generating Discrete Random Variables

Review of Probability

Detailed Balance Condition: \Pr(B \mid A)\,\Pr(A, t) = \Pr(A \mid B)\,\Pr(B, t)

This method can be used for any probability distribution, but Metropolis used:

\Pr(B \mid A) = \begin{cases}e^{-\Delta E/kT} & \Delta E := E(B)-E(A) > 0\\ 1 & \Delta E \le 0\end{cases}

Therefore

\frac{\Pr(B \mid A)}{\Pr(A \mid B)} = \frac{e^{-\left[E(B)-E(A)\right]/kT}}{1} = \frac{e^{-E(B)/kT}}{e^{-E(A)/kT}} = \frac{\Pr(B, t)}{\Pr(A, t)}\Bigg|_{\text{equilibrium}}

so the equilibrium distribution is the Boltzmann distribution \Pr(A) \propto e^{-E(A)/kT}.

[Figure: two states A and B with transition probabilities Pr(B|A), Pr(A|B) and self-transitions Pr(A|A), Pr(B|B).]

Page 179: Introduction to Mathematical Probability

179

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Discrete Random Variables

Review of Probability

• Set up a Markov Chain T(x'|x) that has as a unique stationary solution the required π(x') Probability Distribution Function (PDF):

\pi(x') = \int T(x' \mid x)\,\pi(x)\,dx

W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetrical Candidate Generating Density.Hastings, W., “Monte Carlo Simulation Methods Using Markov Chains and Their Applications”, Biometrica, 1970, No. 57, pp. 97 - 109

Here we give the development for Continuous Random Variables(for Discrete Random Variables the development is similar to that used forMetropolis Algorithm).

Page 180: Introduction to Mathematical Probability

180

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

• The problem is to find the conditional transition probability distribution T(x'|x) of the Markov Chain that has states converging, after a transition time, to π(x'):

\pi(x') = \int T(x' \mid x)\,\pi(x)\,dx

To satisfy this requirement a sufficient (but not necessary) condition is the "Detailed Balance" (or "Reversibility", or "Time Reversibility") condition:

T(x' \mid x)\,\pi(x) = T(x \mid x')\,\pi(x')

Proof:

\int T(x' \mid x)\,\pi(x)\,dx = \int T(x \mid x')\,\pi(x')\,dx = \pi(x')\underbrace{\int T(x \mid x')\,dx}_{1} = \pi(x') \qquad\text{q.e.d.}

Let Q(x'|x) be a candidate generating density for T(x'|x), such that \int Q(x' \mid x)\,dx' = 1.

In general Q(x'|x) will not satisfy the "Detailed Balance" condition; for example:

Q(x' \mid x)\,\pi(x) > Q(x \mid x')\,\pi(x')

Loosely speaking, the process moves from x to x' too often and from x' to x too rarely.

Page 181: Introduction to Mathematical Probability

181

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

In general Q(x'|x) will not satisfy the "Detailed Balance" condition; for example:

Q(x' \mid x)\,\pi(x) > Q(x \mid x')\,\pi(x')

Loosely speaking, the process moves from x to x' too often and from x' to x too rarely.

A convenient way to correct this is to reduce the number of moves from x to x' by introducing a probability 0 < α(x'|x) ≤ 1. This is called the Acceptance Probability:

T(x' \mid x) = Q(x' \mid x)\,\alpha(x' \mid x)

If the move is not made, the process again returns x as a value from the target distribution.

The Detailed Balance is

Q(x' \mid x)\,\alpha(x' \mid x)\,\pi(x) = Q(x \mid x')\,\alpha(x \mid x')\,\pi(x')

from which, taking the larger acceptance probability equal to 1:

\alpha(x' \mid x) = \min\left[1,\ \frac{\pi(x')\,Q(x \mid x')}{\pi(x)\,Q(x' \mid x)}\right]

In the same way (by interchanging x' with x):

\alpha(x \mid x') = \min\left[1,\ \frac{\pi(x)\,Q(x' \mid x)}{\pi(x')\,Q(x \mid x')}\right]

Page 182: Introduction to Mathematical Probability

182

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

Let us prove that the "Detailed Balance" condition is satisfied:

   T (x'|x) π (x) = Q (x'|x) min { 1, [π (x') Q (x|x')] / [π (x) Q (x'|x)] } π (x)

   T (x|x') π (x') = Q (x|x') min { 1, [π (x) Q (x'|x)] / [π (x') Q (x|x')] } π (x')

Suppose Q (x|x') π (x') > Q (x'|x) π (x). Then α (x'|x) = 1, so that

   T (x'|x) π (x) = Q (x'|x) π (x)

   T (x|x') π (x') = Q (x|x') π (x') [π (x) Q (x'|x)] / [π (x') Q (x|x')] = Q (x'|x) π (x)

Therefore T (x'|x) π (x) = T (x|x') π (x')   q.e.d.

Page 183: Introduction to Mathematical Probability

183

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

The Transition Kernel of the Metropolis-Hastings Algorithm is

   T (x'|x) = Q (x'|x) α (x'|x) + [1 − ∫ Q (x''|x) α (x''|x) dx''] δ_x (x')

where δ_x is the Dirac mass on {x} (the second term accounts for remaining at x when the candidate is rejected).

Page 184: Introduction to Mathematical Probability

184

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

Therefore the M-H Algorithm will:

1. Use the previously generated x(t).

2. Draw a new value x_new from the candidate distribution Q (x_new|x(t)):
   x_new ~ Q (x_new|x(t))

3. Compute the acceptance probability α (x_new|x(t)):
   α (x_new|x(t)) = min { 1, [π (x_new) Q (x(t)|x_new)] / [π (x(t)) Q (x_new|x(t))] }

4. Generate a random number U, uniformly distributed on (0,1) (Acceptance/Rejection with c = 1), and set
   x(t+1) = x_new   if U ≤ α (x_new|x(t))
   x(t+1) = x(t)    otherwise,
   then t := t+1 and return to step 2.

A minimal code sketch of this loop is given below.
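The sketch below implements the four steps above as a generic Metropolis-Hastings loop. The names target_pdf, proposal_sample and proposal_pdf are assumptions supplied by the user (an unnormalized target density, a sampler for Q (x'|x), and the value of Q (x'|x)); they do not come from the original slides.

```python
import random

def metropolis_hastings(target_pdf, proposal_sample, proposal_pdf, x0, n_steps):
    """Return draws whose stationary distribution is target_pdf (known up to a constant)."""
    x = x0
    chain = [x]
    for _ in range(n_steps):
        x_new = proposal_sample(x)                      # step 2: draw from Q(x_new | x)
        # step 3: alpha = min(1, pi(x') Q(x|x') / (pi(x) Q(x'|x)))
        num = target_pdf(x_new) * proposal_pdf(x, x_new)
        den = target_pdf(x) * proposal_pdf(x_new, x)
        alpha = min(1.0, num / den) if den > 0 else 1.0
        # step 4: accept with probability alpha, otherwise keep the current state
        if random.random() < alpha:
            x = x_new
        chain.append(x)
    return chain
```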

Page 185: Introduction to Mathematical Probability

185

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

[Flowchart: Start with t = 0 and x0 → generate a candidate x_new from the probability density Q (x_new|x(t)) → compute α (x_new|x(t)) = min { 1, [π (x_new) Q (x(t)|x_new)] / [π (x(t)) Q (x_new|x(t))] } → generate U uniform on (0,1) → if U ≤ α (x_new|x(t)) set x(t+1) = x_new, else x(t+1) = x(t) → t := t+1 and repeat.]

Page 186: Introduction to Mathematical Probability

186

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

[Figure sequence ("Run This Example"): successive panels of the target density p (x) showing candidate samples 1 to 5 being accepted or rejected in turn.]

Page 187: Introduction to Mathematical Probability

187

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

The M-H Algorithm converges to the desired unique stationary solution, the required π (x), under the following conditions:

• Irreducibility: every state is eventually reachable from any start state; for all x there exists a t such that π (x,t) > 0.

• Aperiodicity: the chain does not get caught in cycles.

The process is ergodic if it is both irreducible and aperiodic.

In the M-H algorithm the draws are used as samples from the target density π (x) only after the Markov Chain has passed its transient stage, when the effect of the chosen starting value x0 has become small enough to be ignored. The rate of convergence of the Markov Chain is a function of the chosen candidate generating density Q (x',x); the efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.

Page 188: Introduction to Mathematical Probability

188

SOLO

Metropolis-Hastings (M-H) Algorithm

Generating Continuous Random Variables

Review of Probability

Example:

Target density (up to a normalizing constant):

   π (x) ∝ 0.3 exp (−0.2 x²) + 0.7 exp (−0.2 (x − 10)²)

Proposed Candidate Distribution:

   Q (x_new|x(t)) = N (x(t), 100)

Ramon Sagarna, "Lecture 19: Markov Chain Monte Carlo Methods (MCMC)".
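Below is a self-contained reproduction sketch of this example: the bimodal target above with a N (x(t), 100) random-walk proposal (standard deviation 10). Since this proposal is symmetric, the Hastings ratio reduces to π (x_new)/π (x(t)), i.e. the plain Metropolis rule. The starting value and chain length are illustrative choices.

```python
import math
import random

def target(x):
    # unnormalized bimodal target from the example
    return 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10.0)**2)

x = 0.0                                       # arbitrary starting value
chain = []
for _ in range(50_000):
    x_new = random.gauss(x, 10.0)             # candidate from N(x, variance 100)
    alpha = min(1.0, target(x_new) / target(x))
    if random.random() < alpha:
        x = x_new
    chain.append(x)
# after discarding a burn-in, the histogram of `chain` approximates the bimodal target
```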

Page 189: Introduction to Mathematical Probability

189

SOLO

Metropolis Algorithm

Generating Continuous Random Variables

Review of Probability

If we choose a symmetric candidate generating density, Q (x'|x) = Q (x|x') for every x', x, then

   α (x'|x) = min { 1, π (x') / π (x) }
   α (x|x') = min { 1, π (x) / π (x') }

and we obtain the Metropolis Algorithm.

Metropolis chose the Boltzmann acceptance: with ΔE := E (x') − E (x),

   α (x'|x) = exp (−ΔE / kT)   if ΔE > 0
   α (x'|x) = 1                if ΔE ≤ 0

Page 190: Introduction to Mathematical Probability

190

SOLO

Metropolis Algorithm

Generating Continuous Random Variables

Review of Probability

[Flowchart: Start with t = 0 and x0 → generate a candidate x_new from the symmetric density Q (x_new|x(t)) = Q (x(t)|x_new) → compute α (x_new|x(t)) = min { 1, π (x_new) / π (x(t)) } → generate U uniform on (0,1) → if U ≤ α (x_new|x(t)) set x(t+1) = x_new, else x(t+1) = x(t) → t := t+1 and repeat.]

Return to Table of Content

Page 191: Introduction to Mathematical Probability

191

SOLO

Gibbs Sampling

Generating Discrete Random Variables

Review of Probability

Stuart Geman (Brown University), Donald Geman (Johns Hopkins University), Josiah Willard Gibbs (1839 - 1903)

In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution, or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm. The algorithm is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was devised by Stuart Geman and Donald Geman, some eight decades after the passing of Gibbs, and is also called the Gibbs sampler.

Geman, S. and Geman, D., "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 6, 1984, pp. 721-741.

Page 192: Introduction to Mathematical Probability

192

SOLO

Gibbs Sampling (continue – 1)

Generating Discrete Random Variables

Review of Probability

Suppose that x = (x1, x2, ..., xk) is k (≥ 2) dimensional.

The Gibbs sampler uses what are called the full (or complete) conditional distributions (by Bayes' rule):

   π (xj | x1, ..., x(j−1), x(j+1), ..., xk) = π (x1, ..., xj, ..., xk) / ∫ π (x1, ..., xj, ..., xk) dxj

The Gibbs sampler samples one variable at a time, in turn:

   x1(t+1) ~ π (x1 | x2(t), x3(t), ..., xk(t))
   x2(t+1) ~ π (x2 | x1(t+1), x3(t), ..., xk(t))
   x3(t+1) ~ π (x3 | x1(t+1), x2(t+1), x4(t), ..., xk(t))
   ...
   xk(t+1) ~ π (xk | x1(t+1), x2(t+1), ..., x(k−1)(t+1))

[Figure: zig-zag path of the successive Gibbs draws (0), (1), (2), (3) in the (X1, X2) plane.]

The Gibbs sampler always uses the most recent values. A minimal sketch for a two-dimensional case follows.
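The sketch below is not from the slides: it applies the update scheme above to a bivariate normal with zero means, unit variances and correlation rho, whose full conditionals are x1 | x2 ~ N (rho x2, 1 − rho²) and x2 | x1 ~ N (rho x1, 1 − rho²). The value of rho and the number of steps are illustrative.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n_steps=20_000):
    """Gibbs sampling of a zero-mean bivariate normal with correlation rho."""
    sd = math.sqrt(1.0 - rho**2)
    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(n_steps):
        x1 = random.gauss(rho * x2, sd)   # draw x1 from pi(x1 | x2), using the newest x2
        x2 = random.gauss(rho * x1, sd)   # draw x2 from pi(x2 | x1), using the newest x1
        samples.append((x1, x2))
    return samples

draws = gibbs_bivariate_normal()
```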

Page 193: Introduction to Mathematical Probability

193

SOLO

Gibbs Sampling (continue – 2)

Generating Discrete Random Variables

Review of Probability

Gibbs Sampling is a special case of the Metropolis-Hastings Algorithm. To see this, define the candidate generating density Q (x_new|x(t)) as

   Q (x_new|x(t)) = Pr (xj_new | x_−j)   if x_new agrees with x(t) in all components except the j-th
   Q (x_new|x(t)) = 0                    otherwise

where x_−j := (x1(t+1), ..., x(j−1)(t+1), x(j+1)(t), ..., xk(t)) denotes all components except the j-th. At any moment only the single variable xj is drawn,

   xj_new ~ π (xj | x_−j)

so the candidate is x_new = (x1(t+1), ..., x(j−1)(t+1), xj_new, x(j+1)(t), ..., xk(t)).

The acceptance probability is

   α (x_new|x(t)) = min { 1, [Pr (x_new) Q (x(t)|x_new)] / [Pr (x(t)) Q (x_new|x(t))] }
                  = min { 1, [Pr (x_new) Pr (xj(t) | x_−j)] / [Pr (x(t)) Pr (xj_new | x_−j)] }

Page 194: Introduction to Mathematical Probability

194

SOLO

Gibbs Sampling (continue – 3)

Generating Discrete Random Variables

Review of Probability

With the candidate generating density

   Q (x_new|x(t)) = Pr (xj_new | x_−j)  if x_new agrees with x(t) in all components except the j-th,  0 otherwise

the acceptance probability is

   α (x_new|x(t)) = min { 1, [Pr (x_new) Pr (xj(t) | x_−j)] / [Pr (x(t)) Pr (xj_new | x_−j)] }

Using Bayes' rule,

   Pr (xj(t) | x_−j) = Pr (xj(t), x_−j) / Pr (x_−j) = Pr (x(t)) / Pr (x_−j)
   Pr (xj_new | x_−j) = Pr (xj_new, x_−j) / Pr (x_−j) = Pr (x_new) / Pr (x_−j)

so that

   α (x_new|x(t)) = min { 1, [Pr (x_new) Pr (x(t))] / [Pr (x(t)) Pr (x_new)] } = 1

Gibbs Sampling therefore always accepts xj_new: it is a special case of the Metropolis-Hastings Algorithm with acceptance probability equal to 1.

Page 195: Introduction to Mathematical Probability

195

SOLO

Gibbs Sampling (continue – 4)

Generating Discrete Random Variables

Review of Probability

Return to Table of Content

Page 196: Introduction to Mathematical Probability

SOLO Review of Probability

Monte Carlo Integration

The Monte Carlo Method can be used to numerically evaluate multidimensional integrals

   I = ∫ g (x1, ..., xm) dx1 ... dxm = ∫ g (x) dx

To use Monte Carlo we factorize  g (x) = f (x) p (x)  in such a way that p (x) is interpreted as a Probability Density Function:

   p (x) ≥ 0   and   ∫ p (x) dx = 1

We assume that we can draw NS samples { x^i, i = 1, ..., NS } from p (x):

   x^i ~ p (x),   i = 1, ..., NS

Using Monte Carlo we can approximate

   p (x) ≈ (1/NS) Σ_{i=1}^{NS} δ (x − x^i)

   I = ∫ f (x) p (x) dx ≈ I_NS = (1/NS) Σ_{i=1}^{NS} f (x^i)
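A minimal sketch of this estimator (not from the slides): here p is taken to be a standard normal density and f (x) = x², so the exact value of the integral is E [x²] = 1. These choices are purely illustrative.

```python
import random

N_S = 100_000
samples = (random.gauss(0.0, 1.0) for _ in range(N_S))    # x^i ~ p(x), standard normal
I_hat = sum(x * x for x in samples) / N_S                  # (1/N_S) * sum f(x^i), f(x) = x^2
print(I_hat)                                               # close to the exact value 1
```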

Page 197: Introduction to Mathematical Probability

SOLO Review of Probability

Monte Carlo Integration

We draw NS samples { x^i, i = 1, ..., NS } from p (x), x^i ~ p (x), and form

   I_NS = (1/NS) Σ_{i=1}^{NS} f (x^i) ≈ I = ∫ f (x) p (x) dx

If the samples x^i are independent, then I_NS is an unbiased estimate of I.

According to the Law of Large Numbers, I_NS converges almost surely to I:

   I_NS → I   (a.s.)   as NS → ∞

If the variance of f (x) is finite, i.e.

   σ_f² := ∫ (f (x) − I)² p (x) dx < ∞

then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:

   lim_{NS→∞} √NS (I_NS − I) ~ N (0, σ_f²)

The error of the MC estimate, e = I_NS − I, is of the order of O (NS^(−1/2)), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.

Return to Table of Content

Page 198: Introduction to Mathematical Probability

198

SOLO Random Processes

Random Variable: a variable x determined by the outcome Ω of a random experiment, x = x (Ω).

Random Process or Stochastic Process: a function of time x determined by the outcome Ω of a random experiment,

   x (t) = x (t, Ω)

[Figure: an ensemble of sample functions x (t, Ω1), x (t, Ω2), x (t, Ω3), x (t, Ω4) versus time t.]

This is a family or an ensemble of functions of time, in general different for each outcome Ω.

Mean or Ensemble Average of the Random Process:

   x̄ (t) := E [x (t)] = ∫ ξ p_x(t) (ξ) dξ

Autocorrelation of the Random Process:

   R (t1, t2) := E [x (t1) x (t2)] = ∫∫ ξ1 ξ2 p_x(t1),x(t2) (ξ1, ξ2) dξ1 dξ2

Autocovariance of the Random Process:

   C (t1, t2) := E [(x (t1) − x̄ (t1)) (x (t2) − x̄ (t2))] = R (t1, t2) − x̄ (t1) x̄ (t2)

Table of Content

Page 199: Introduction to Mathematical Probability

199

SOLO

Stationarity of a Random Process

1. Wide-Sense Stationarity of a Random Process:
• The Mean of the Random Process is time invariant:
   x̄ (t) := E [x (t)] = ∫ ξ p_x(t) (ξ) dξ = x̄ = const.
• The Autocorrelation of the Random Process is of the form
   R (t1, t2) := E [x (t1) x (t2)] = R (t2 − t1) =: R (τ),   τ := t2 − t1
Since R (t1, t2) = R (t2, t1) we have R (τ) = R (−τ).

Power Spectrum or Power Spectral Density of a Stationary Random Process:

   S (ω) := ∫ R (τ) exp (−jωτ) dτ

2. Strict-Sense Stationarity of a Random Process: all probability density functions are time invariant, p_x(t) (ξ) = p_x (ξ) = const. in t.

Ergodicity:
A Stationary Random Process for which Time Average = Ensemble Average:

   ⟨x⟩ := lim_{T→∞} (1/2T) ∫_{−T}^{T} x (t, Ω) dt = E [x (t, Ω)]

Random Processes

Page 200: Introduction to Mathematical Probability

200

SOLO

Time Autocorrelation:
For an Ergodic Random Process define

   ⟨x (t) x (t+τ)⟩ := lim_{T→∞} (1/2T) ∫_{−T}^{T} x (t, Ω) x (t+τ, Ω) dt = R (τ)

Finite Signal Energy Assumption:

   R (0) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x² (t, Ω) dt < ∞

Define the truncated process

   x_T (t, Ω) := x (t, Ω)  for |t| ≤ T,   0 otherwise

and its time autocorrelation

   R_T (τ) := (1/2T) ∫_{−T}^{T} x_T (t, Ω) x_T (t+τ, Ω) dt

Since x_T and x coincide on (−T, T), the difference between R_T (τ) and the time average of x (t, Ω) x (t+τ, Ω) over (−T, T) vanishes as T → ∞, so that

   lim_{T→∞} R_T (τ) = R (τ)

[Figure: the truncated sample function x_T (t) on the interval (−T, T).]

Random Processes

Page 201: Introduction to Mathematical Probability

201

SOLO

Ergodicity (continue):

Define the finite-time Fourier transform of the truncated process

   X_T (ω) := ∫_{−T}^{T} x (t, Ω) exp (−jωt) dt

Then

   ∫ R_T (τ) exp (−jωτ) dτ = (1/2T) ∫∫ x_T (t, Ω) x_T (t+τ, Ω) exp (−jωτ) dt dτ = X_T (ω) X_T* (ω) / (2T)

where * denotes the complex conjugate.

Taking expectations and the limit T → ∞, and using the wide-sense stationarity of the ergodic process, E [x_T (t, Ω) x_T (t+τ, Ω)] = R (τ), gives

   S (ω) := lim_{T→∞} E [X_T (ω) X_T* (ω)] / (2T) = ∫ R (τ) exp (−jωτ) dτ

Random Processes

Page 202: Introduction to Mathematical Probability

202

SOLO

Ergodicity (continue):

We obtained the Wiener-Khinchine Theorem (Wiener 1930):

   S (ω) := lim_{T→∞} E [X_T (ω) X_T* (ω)] / (2T) = ∫ R (τ) exp (−jωτ) dτ

Norbert Wiener (1894 - 1964), Aleksandr Yakovlevich Khinchin (1894 - 1959)

The Power Spectrum or Power Spectral Density S (ω) of a Stationary Random Process is the Fourier Transform of the Autocorrelation Function R (τ).

Random Processes
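The snippet below is a numerical illustration of the Wiener-Khinchine theorem, not part of the original slides (it assumes numpy is available and uses a simple first-order AR process as the stationary signal): the Fourier transform of the biased sample autocorrelation coincides with the periodogram |X_T (ω)|²/N.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096
# a simple stationary process: first-order AR filter driven by white noise
x = np.zeros(N)
for n in range(1, N):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()

# biased sample autocorrelation R[k] = (1/N) * sum_n x[n] x[n+k]
R = np.correlate(x, x, mode="full") / N
S_from_R = np.abs(np.fft.fft(np.fft.ifftshift(R)))          # Fourier transform of R(tau)
S_periodogram = np.abs(np.fft.fft(x, n=2 * N - 1))**2 / N    # |X(omega)|^2 / N
# S_from_R and S_periodogram agree up to numerical error, as the theorem states
```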

Page 203: Introduction to Mathematical Probability

203

SOLO

White Noise

A (not necessarily stationary) Random Process whose Autocorrelation is zero for any two different times is called white noise in the wide sense:

   R (t1, t2) := E [x (t1) x (t2)] = σ² (t1) δ (t1 − t2)

where σ² (t) is the instantaneous variance.   (Wide-Sense Whiteness)

A (not necessarily stationary) Random Process in which the outcomes at any two different times are independent is called white noise in the strict sense:

   p_x(t1),x(t2) (ξ1, ξ2) = p_x(t1) (ξ1) p_x(t2) (ξ2),   t1 ≠ t2   (Strict-Sense Whiteness)

A Stationary White Noise process has the Autocorrelation

   R (τ) = E [x (t) x (t+τ)] = σ² δ (τ)

Note: In general, whiteness requires Strict-Sense Whiteness. In practice we have only moments (typically up to second order) and thus only Wide-Sense Whiteness.

Random Processes

Page 204: Introduction to Mathematical Probability

204

SOLO

White Noise

A Stationary White Noise process has the Autocorrelation

   R (τ) = E [x (t) x (t+τ)] = σ² δ (τ)

The Power Spectral Density is given by the Fourier Transform of the Autocorrelation:

   S (ω) = ∫ R (τ) exp (−jωτ) dτ = ∫ σ² δ (τ) exp (−jωτ) dτ = σ²

We can see that the Power Spectral Density contains all frequencies at the same amplitude; this is the reason it is called White Noise.

The Power of the Noise is defined as  P := ∫ R (τ) dτ = S (0) = σ².

Random Processes

Page 205: Introduction to Mathematical Probability

205

SOLO

Table of Content

Markov Processes

A Markov Process is defined by:

Andrei Andreevich Markov (1856 - 1922)

   p (x (t), t | x (τ), τ ≤ t1) = p (x (t), t | x (t1), t1),   t ≥ t1

i.e. for a Markov Process, the past up to any time t1 is fully summarized by the state of the process at t1.

Examples of Markov Processes:

1. Continuous Dynamic System
   dx (t)/dt = f (t, x, u, v)
   z (t) = h (t, x, u, w)

2. Discrete Dynamic System
   x(k+1) = f (tk, xk, uk, vk)
   zk = h (tk, xk, uk, wk)

x - state-space vector (n x 1)
u - input vector (m x 1)
v - white input noise vector (n x 1)
z - measurement vector (p x 1)
w - white measurement noise vector (p x 1)

Random Processes

Page 206: Introduction to Mathematical Probability

206

SOLO

Table of Content

Markov Processes

Examples of Markov Processes:

3. Continuous Linear Dynamic System

   dx (t)/dt = A x (t) + v (t)
   z (t) = C x (t)

Using the Fourier Transform we obtain

   Z (ω) = H (ω) V (ω),   H (ω) = C (jωI − A)^(−1)

Using the Inverse Fourier Transform we obtain the convolution of the impulse response h (t) with the input noise:

   z (t) = ∫ h (t − τ) v (τ) dτ

[Block diagram: v (t) → h (t) → z (t).]

Random Processes

Page 207: Introduction to Mathematical Probability

207

SOLO

Table of Content

Markov Processes

Examples of Markov Processes:

3. Continuous Linear Dynamic System (continued)

   dx (t)/dt = A x (t) + v (t),   z (t) = C x (t),   z (t) = ∫ h (t − τ) v (τ) dτ

The Autocorrelation of the output is

   R_zz (τ) = E [z (t) z (t+τ)] = ∫∫ h (α1) h (α2) E [v (t − α1) v (t + τ − α2)] dα1 dα2
            = ∫∫ h (α1) h (α2) R_vv (τ + α1 − α2) dα1 dα2

For a white input noise, R_vv (τ) = E [v (t) v (t+τ)] = σ_v² δ (τ) and S_vv (ω) = ∫ R_vv (τ) exp (−jωτ) dτ = σ_v².

Taking the Fourier Transform of R_zz (τ) gives

   S_zz (ω) = H (ω) H* (ω) S_vv (ω)

Random Processes

Page 208: Introduction to Mathematical Probability

208

SOLO

Table of Content

Markov Processes

Examples of Markov Processes:

4. Continuous Linear Dynamic System (first-order filter)

   z (t) = ∫ h (t − τ) v (τ) dτ,   H (ω) = K / (1 + jω/ω_x)

with white input noise  R_vv (τ) = E [v (t) v (t+τ)] = σ_v² δ (τ),  S_vv (ω) = σ_v².

The Power Spectral Density of the output is

   S_zz (ω) = H (ω) H* (ω) S_vv (ω) = K² σ_v² / [1 + (ω/ω_x)²]

The Autocorrelation of the output is obtained from the inverse Fourier Transform (evaluating the integral by residues, closing the contour in the upper half-plane for τ > 0 and in the lower half-plane for τ < 0):

   R_zz (τ) = (1/2π) ∫ S_zz (ω) exp (jωτ) dω = (K² σ_v² ω_x / 2) exp (−ω_x |τ|)

Random Processes

Page 209: Introduction to Mathematical Probability

209

SOLO

Markov Processes

Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients

   dx (t)/dt = F (t) x (t) + G (t) w (t)

Define the zero-mean deviations  e_x (t) := x (t) − E [x (t)]  and  e_w (t) := w (t) − E [w (t)], with

   E [e_w (t) e_w^T (t1)] = Q (t) δ (t − t1)

[Block diagram: w (t) → G (t) → integrator with feedback F (t) → x (t).]

The solution of the Linear System is

   x (t) = Φ (t, t0) x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) w (τ) dτ

where the transition matrix Φ satisfies

   dΦ (t, t0)/dt = F (t) Φ (t, t0),   Φ (t0, t0) = I,   Φ (t3, t1) = Φ (t3, t2) Φ (t2, t1)

Subtracting the mean equation  dE [x (t)]/dt = F (t) E [x (t)] + G (t) E [w (t)]  gives the same relation for the deviations:

   e_x (t) = Φ (t, t0) e_x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) e_w (τ) dτ

Random Processes

Page 210: Introduction to Mathematical Probability

210

SOLO

Markov Processes

Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 1)

With  e_x (t) = Φ (t, t0) e_x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) e_w (τ) dτ  define

   V_x (t) := Var [x (t)] = E [e_x (t) e_x^T (t)]
   R_x (t, t+τ) := E [e_x (t) e_x^T (t+τ)]

Then the covariance propagates as

   V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) Q (τ) G^T (τ) Φ^T (t, τ) dτ

Random Processes

Page 211: Introduction to Mathematical Probability

211

SOLO Markov Processes

Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 2)

For the autocorrelation of the state deviation, with τ ≥ 0,

   R_x (t, t+τ) := E [e_x (t) e_x^T (t+τ)] = V_x (t) Φ^T (t+τ, t)
   R_x (t+τ, t) := E [e_x (t+τ) e_x^T (t)] = Φ (t+τ, t) V_x (t)

where

   V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, λ) G (λ) Q (λ) G^T (λ) Φ^T (t, λ) dλ

Random Processes

Page 212: Introduction to Mathematical Probability

212

SOLO Markov Processes

Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 3)

Differentiating

   V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, λ) G (λ) Q (λ) G^T (λ) Φ^T (t, λ) dλ

with respect to t, and using dΦ (t, t0)/dt = F (t) Φ (t, t0), gives the covariance (Lyapunov) differential equation

   dV_x (t)/dt = F (t) V_x (t) + V_x (t) F^T (t) + G (t) Q (t) G^T (t)

Random Processes
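Below is a minimal numerical sketch (assuming numpy; the matrices F, G, Q and the time step are illustrative choices, not values from the slides) of propagating the covariance equation dV_x/dt = F V_x + V_x F^T + G Q G^T with a simple Euler step.

```python
import numpy as np

F = np.array([[0.0, 1.0], [-1.0, -0.5]])   # system matrix
G = np.array([[0.0], [1.0]])               # noise input matrix
Q = np.array([[0.1]])                      # white-noise intensity: E[e_w e_w^T] = Q delta
V = np.zeros((2, 2))                       # V_x(t0) = 0
dt = 1e-3
for _ in range(int(10.0 / dt)):            # propagate from t0 = 0 to t = 10
    V_dot = F @ V + V @ F.T + G @ Q @ G.T
    V = V + dt * V_dot
# V now approximates the steady-state covariance of the linear system driven by white noise
print(V)
```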

Page 213: Introduction to Mathematical Probability

213

SOLO Markov Processes

Examples of Markov Processes:

5. Continuous Linear Dynamic System with Time-Variable Coefficients (continue – 4)

Differentiating the autocorrelation R_x (t, t+τ) = V_x (t) Φ^T (t+τ, t) with respect to τ (for fixed t and τ ≥ 0), and using dΦ (t+τ, t)/dτ = F (t+τ) Φ (t+τ, t), gives

   dR_x (t, t+τ)/dτ = R_x (t, t+τ) F^T (t+τ),   R_x (t, t) = V_x (t)

and similarly, for R_x (t+τ, t) = Φ (t+τ, t) V_x (t),

   dR_x (t+τ, t)/dτ = F (t+τ) R_x (t+τ, t),   R_x (t, t) = V_x (t)

Random Processes

Page 214: Introduction to Mathematical Probability

214

SOLO Markov Processes

Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise

Given a Continuous Linear System

   dx (t)/dt = F (t) x (t) + G (t) w (t)

we want to decide whether w (t) can be approximated by a white noise.

Start with a first-order linear system driven by a white noise w' (t):

   dw (t)/dt = −(1/T) w (t) + (1/T) w' (t),   H (s) = 1 / (1 + Ts),   Φ_w (t, t0) = exp (−(t − t0)/T),   t ≥ t0

   E [w' (t)] = 0,   E [w' (t) w' (τ)] = Q δ (t − τ)
   E [w (t)] = 0,    R_ww (t, t+τ) := E [w (t) w (t+τ)],   V_ww (t) := R_ww (t, t) = E [w² (t)]

Applying the covariance equation dV_x/dt = F V_x + V_x F^T + G Q G^T to this system (F = −1/T, G = 1/T) gives

   dV_ww (t)/dt = −(2/T) V_ww (t) + Q / T²

Random Processes

Page 215: Introduction to Mathematical Probability

215

SOLO Markov Processes

Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 1)

Solving  dV_ww (t)/dt = −(2/T) V_ww (t) + Q/T²  gives

   V_ww (t) = exp (−2t/T) V_ww (0) + (Q / 2T) [1 − exp (−2t/T)]

so the steady-state variance is

   V_ww (steady state) = Q / (2T)

[Figure: V_ww (t) rising from V_ww (0) toward Q/(2T) with time constant T/2.]

For t ≥ 5T (steady state) the autocorrelation becomes stationary:

   R_ww (t, t+τ) = V_ww (t) exp (−τ/T) → V_ww (steady state) exp (−|τ|/T) = (Q / 2T) exp (−|τ|/T)

Random Processes

Page 216: Introduction to Mathematical Probability

216

SOLO Markov Processes

Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 2)

At steady state

   R_ww (τ) = V_ww (steady state) exp (−|τ|/T) = (Q / 2T) exp (−|τ|/T)

[Figure: the exponential autocorrelation (Q/2T) exp (−|τ|/T); the area under it is ∫ V_ww exp (−|τ|/T) dτ = Q.]

T is the correlation time of the noise w (t); it can be found from V_ww (τ) as the lag at which the autocorrelation drops to (steady-state value)/e.

Another way to find T is by taking the two-sided Laplace Transform L2 (in τ) of R_ww (τ):

   S_ww (s) = L2 { (Q/2T) exp (−|τ|/T) } = Q / [(1 + Ts)(1 − Ts)] = H (s) Q H (−s)

so that on the frequency axis, s = jω,

   S_ww (ω) = Q / [1 + (ωT)²]

T can be found by taking the half-power frequency ω_1/2, where S_ww (ω_1/2) = Q/2, and T = 1/ω_1/2.

Random Processes

Page 217: Introduction to Mathematical Probability

217

SOLO Markov Processes

Examples of Markov Processes:

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 3)

[Figure: the power spectrum S_ww (ω) = Q / [1 + (ωT)²], with the half-power level Q/2 at ω_1/2 = 1/T.]

Let us return to the original system

   dx (t)/dt = F (t) x (t) + G (t) w (t)

Compute the power spectrum of w (t) (at s = jω) and from it define Q and T. Then:

• If T ≤ (1/5) × (minimum time constant of F) = (1/5) / |maximum eigenvalue of F|, then w (t) can be approximated by the white noise w' (t) with
   E [w' (t)] = 0,   E [w' (t) w' (τ)] = Q δ (t − τ)

• If T > (1/5) × (minimum time constant of F), then w (t) should be treated as a colored noise, obtained by passing the predefined white noise w' (t) through the shaping filter
   H (s) = 1 / (1 + Ts)

Random Processes

Page 218: Introduction to Mathematical Probability

218

SOLO Markov Processes

Examples of Markov Processes: 7. Digital Simulation of a Continuous Process

Start again with the first-order linear system driven by the white noise w' (t):

   dw (t)/dt = −(1/T) w (t) + (1/T) w' (t),   H (s) = 1/(1 + Ts),   Φ_w (t, t0) = exp (−(t − t0)/T),   t ≥ t0

   E [w' (t)] = 0,   E [w' (t) w' (τ)] = Q δ (t − τ)

The solution over one interval is

   w (t) = exp (−(t − t0)/T) w (t0) + (1/T) ∫_{t0}^{t} exp (−(t − τ)/T) w' (τ) dτ

Choosing t = (k+1) ΔT and t0 = k ΔT,

   w ((k+1) ΔT) = exp (−ΔT/T) w (k ΔT) + (1/T) ∫_{kΔT}^{(k+1)ΔT} exp (−((k+1)ΔT − τ)/T) w' (τ) dτ

Random Processes

Page 219: Introduction to Mathematical Probability

219

SOLO Markov Processes

Examples of Markov Processes: 7. Digital Simulation of a Continuous Process (continue – 1)

Define  Φ := exp (−ΔT/T).

The variance of the integrated noise term over one step is

   E [ ((1/T) ∫_{kΔT}^{(k+1)ΔT} exp (−((k+1)ΔT − τ)/T) w' (τ) dτ)² ] = (Q/T²) ∫_0^{ΔT} exp (−2λ/T) dλ = (Q / 2T) (1 − exp (−2ΔT/T)) = (1 − Φ²) Q / (2T)

Define a discrete white sequence w' (k) such that

   E [w' (k)] = 0,   E [w'² (k)] = Q / (2T)

Therefore the discrete-time simulation of the colored noise is

   w ((k+1) ΔT) = Φ w (k ΔT) + √(1 − Φ²) w' (k)

Random Processes
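The sketch below (not from the slides) simulates the discrete recursion just derived: a first-order Gauss-Markov (colored) noise with correlation time T whose steady-state variance is Q/(2T). The values of T, Q, ΔT and the number of steps are illustrative assumptions.

```python
import math
import random

T, Q, dT, n_steps = 1.0, 2.0, 0.01, 10_000
phi = math.exp(-dT / T)                     # Phi = exp(-DeltaT / T)
sigma_w = math.sqrt(Q / (2.0 * T))          # standard deviation of the discrete white sequence
w = 0.0
history = []
for _ in range(n_steps):
    w_prime = random.gauss(0.0, sigma_w)    # discrete white sequence with variance Q/(2T)
    w = phi * w + math.sqrt(1.0 - phi**2) * w_prime
    history.append(w)
# the sample variance of `history` (after a transient) approaches Q/(2T) = 1.0
```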

Page 220: Introduction to Mathematical Probability

220

SOLO Markov Chains

Random Processes

[Figure: a three-state Markov chain X1, X2, X3 with the transition probabilities 0.1, 0.5, 0.6, 0.6, 0.2, 0.3, 0.3, 0.3, 0.1 labeled on its arrows (the same chain as on the following slides).]

A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property. Having the Markov property means that, given the present state, future states are independent of the past states. In other words, the description of the present state fully captures all the information that could influence the future evolution of the process. Being a stochastic process means that all state transitions are probabilistic.

Andrey Andreevich Markov (1856 - 1922)

At each step the system may change its state from the current state to another state (or remain in the same state) according to a probability distribution. The changes of state are called transitions, and the probabilities associated with various state-changes are called transition probabilities

Definition of Markov Chains

A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent:

   Pr (X_{n+1} = x | X_1 = x_1, ..., X_n = x_n) = Pr (X_{n+1} = x | X_n = x_n)

Page 221: Introduction to Mathematical Probability

221

SOLO Markov Chains

Random Processes

Properties of Markov Chains

Define the probability of going from state i to state j in m time steps as

   p_ij^(m) = Pr (X_m = j | X_0 = i)

and the single-step transition probability as

   p_ij = Pr (X_1 = j | X_0 = i)

[Figure: the three-state chain X1, X2, X3 with p11 = 0.1, p12 = 0.5, p13 = 0.6, p21 = 0.6, p22 = 0.2, p23 = 0.3, p31 = 0.3, p32 = 0.3, p33 = 0.1.]

For a time-homogeneous Markov Chain

   p_ij^(m) = Pr (X_{k+m} = j | X_k = i)   and   p_ij = Pr (X_{k+1} = j | X_k = i)   for every k

so the n-step transition satisfies the Chapman-Kolmogorov equation: for any k such that 0 < k < n,

   p_ij^(n) = Σ_{r∈S} p_ir^(k) p_rj^(n−k)

Page 222: Introduction to Mathematical Probability

222

SOLO Markov Chains

Random Processes

Properties of Markov Chains (continue – 1)

The marginal distribution Pr (X_k = x) is the distribution over states at time k. For a time-homogeneous Markov Chain,

   Pr (X_{k+1} = j) = Σ_{r∈S} Pr (X_{k+1} = j | X_k = r) Pr (X_k = r) = Σ_{r∈S} p_rj Pr (X_k = r)

[Figure: the same three-state chain X1, X2, X3 with its transition probabilities p_ij.]

In matrix form it can be written as

   [Pr (X1), Pr (X2), ..., Pr (XN)]^T at time k+1 = K [Pr (X1), Pr (X2), ..., Pr (XN)]^T at time k

where N is the number of states of the Markov Chain. For the chain above

   K = | 0.1  0.5  0.6 |
       | 0.6  0.2  0.3 |
       | 0.3  0.3  0.1 |

Properties of the Transition Matrix K:

   1.  0 ≤ p_ij^(n) ≤ 1
   2.  Σ_j p_ij^(n) = 1  (the transition probabilities out of each state sum to one)

Note that each column of the numerical matrix K above sums to one, so total probability is conserved under the iteration Pr_{k+1} = K Pr_k.
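A small numerical sketch (not from the slides) using the three-state matrix shown above, written in the column-stochastic convention Pr_{k+1} = K Pr_k; it assumes numpy is available. Iterating the marginal distribution forward shows the convergence toward the stationary distribution discussed in the Perron-Frobenius remark later in this deck.

```python
import numpy as np

K = np.array([[0.1, 0.5, 0.6],
              [0.6, 0.2, 0.3],
              [0.3, 0.3, 0.1]])   # columns sum to one

p = np.array([1.0, 0.0, 0.0])     # start in state X1 with probability 1
for _ in range(50):               # iterate Pr_{k+1} = K Pr_k
    p = K @ p
print(p)                          # approaches the stationary distribution pi = K pi
```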

Page 223: Introduction to Mathematical Probability

223

SOLO Markov Chains

Random Processes

Properties of Markov Chains (continue – 2)

Reducibility

A state j is said to be accessible from a state i (written i → j) if a system started in state i has a non-zero probability of transitioning into state j at some point. Formally, state j is accessible from state i if there exists an integer n ≥ 0 such that

   Pr (X_n = j | X_0 = i) = p_ij^(n) > 0

Allowing n to be zero means that every state is defined to be accessible from itself. A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i. A set of states C is a communicating class if every pair of states in C communicates with each other, and no state in C communicates with any state not in C. It can be shown that communication in this sense is an equivalence relation and thus that communicating classes are the equivalence classes of this relation. A communicating class is closed if the probability of leaving the class is zero, namely that if i is in C but j is not, then j is not accessible from i. Finally, a Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state.

Page 224: Introduction to Mathematical Probability

224

SOLO Markov Chains

Random Processes

Properties of Markov Chains (continue – 3)

Periodicity

A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state is defined as

   k := greatest common divisor { n : Pr (X_n = i | X_0 = i) > 0 }

Note that even though a state has period k, it may not be possible to reach the state in k steps. For example, suppose it is possible to return to the state in {6,8,10,12,...} time steps; then k would be 2, even though 2 does not appear in this list. If k = 1, then the state is said to be aperiodic; otherwise (k>1), the state is said to be periodic with period k. It can be shown that every state in a communicating class must have the same period.

Page 225: Introduction to Mathematical Probability

225

SOLO Review of Probability

Existence Theorems

Existence Theorem 3

Given a function S (ω)= S (-ω) or, equivalently, a positive-defined function R (τ),(R (τ) = R (-τ), and R (0)=max R (τ), for all τ ), we can find a stochastic process x (t)having S (ω) as its power spectrum or R (τ) as its autocorrelation.

Proof of Existence Theorem 3

Define

   a² := (1/π) ∫ S (ω) dω   and   f (ω) := S (ω) / (π a²)

Since f (ω) ≥ 0 and ∫ f (ω) dω = 1, according to Existence Theorem 1 we can find a random variable ω with the even density function f (ω) and probability distribution function

   P (ω) := ∫_{−∞}^{ω} f (ν) dν

We now form the process

   x (t) := a cos (ωt + φ)

where φ is a random variable uniformly distributed in the interval (−π, +π) and independent of ω.

Page 226: Introduction to Mathematical Probability

226

SOLO Review of Probability

Existence Theorems

Existence Theorem 3

Proof of Existence Theorem 3 (continue – 1)

Since φ is uniformly distributed in the interval (−π, +π) and independent of ω,

   E [e^{jφ}] = (1/2π) ∫_{−π}^{π} e^{jφ} dφ = 0,   so   E [cos φ] = E [sin φ] = 0   and   E [cos 2φ] = E [sin 2φ] = 0

Therefore

   E [x (t)] = a E [cos ωt] E [cos φ] − a E [sin ωt] E [sin φ] = 0

and

   E [x (t) x (t+τ)] = a² E [cos (ωt + φ) cos (ω (t+τ) + φ)]
                     = (a²/2) E [cos ωτ] + (a²/2) E [cos (ω (2t+τ) + 2φ)]
                     = (a²/2) E [cos ωτ]

since the second term vanishes by the independence of ω and φ.

Page 227: Introduction to Mathematical Probability

227

SOLO Review of Probability

Existence Theorems

Existence Theorem 3

Proof of Existence Theorem 3 (continue – 2)

We have  x (t) := a cos (ωt + φ)  with

   E [x (t)] = 0

   E [x (t) x (t+τ)] = (a²/2) E [cos ωτ] = (a²/2) ∫ f (ω) cos (ωτ) dω = (1/2π) ∫ S (ω) cos (ωτ) dω = R (τ)

using the definition f (ω) = S (ω)/(π a²) and the inverse Fourier transform of the even function S (ω).

Because of these two properties x (t) is wide-sense stationary, with power spectrum

   S_x (ω) = ∫ R (τ) cos (ωτ) dτ = π a² f (ω) = S (ω)

Therefore S_x (ω) = S (ω).   q.e.d.
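The snippet below is a numerical sketch of the construction used in this proof (it assumes numpy; the even, nonnegative example spectrum S (ω) is an arbitrary illustrative choice, not from the slides): draw ω from f (ω) = S (ω)/(π a²), draw φ uniform on (−π, π), form x (t) = a cos (ωt + φ), and check that the ensemble autocorrelation matches R (τ) = (1/2π) ∫ S (ω) cos (ωτ) dω.

```python
import numpy as np

rng = np.random.default_rng(1)
omega_grid = np.linspace(-20.0, 20.0, 2001)
d_omega = omega_grid[1] - omega_grid[0]
S = 1.0 / (1.0 + omega_grid**2)              # an even, nonnegative example spectrum
a2 = S.sum() * d_omega / np.pi               # a^2 := (1/pi) * integral of S(omega)
f = S / (np.pi * a2)                         # density f(omega) := S(omega) / (pi a^2)

n = 200_000
p = f * d_omega
p /= p.sum()                                 # discrete approximation of f(omega)
omega = rng.choice(omega_grid, size=n, p=p)  # omega ~ f(omega)
phi = rng.uniform(-np.pi, np.pi, size=n)     # phi uniform on (-pi, pi), independent of omega

tau = 0.7
# ensemble estimate of R(tau) = E[x(t) x(t+tau)] at t = 0, with x(t) = a cos(omega t + phi)
R_est = a2 * np.mean(np.cos(phi) * np.cos(omega * tau + phi))
R_true = (S * np.cos(omega_grid * tau)).sum() * d_omega / (2.0 * np.pi)
print(R_est, R_true)                         # the two values agree as n grows
```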

Page 228: Introduction to Mathematical Probability

228

SOLO Permutation & Combinations

Permutations

Given n objects that can be arranged in a row, how many different permutations (new orderings of the objects) are possible?

[Figure: the n objects 1, 2, 3, ..., n−1, n in a row.]

To count the possible permutations, let us start by moving only the first object {1}.

[Figure: the n rows obtained by placing object {1} in each of the n possible positions.]

By moving only the first object {1}, we obtain n permutations.

Page 229: Introduction to Mathematical Probability

229

SOLO Permutation & Combinations

Permutations (continue -1)

Since we have obtained all the possible positions of the first object, we perform the same procedure with the second object {2}, which changes position with all the other objects in each of the n permutations obtained before.

[Figure: for example, from group 1 we obtain n−1 new permutations by moving object {2}.]

Since this is true for all permutations (n−1 new permutations for each of the first n permutations), we obtain a total of n (n−1) permutations.

Page 230: Introduction to Mathematical Probability

230

SOLO Permutation & Combinations

Permutations (continue -2)

Performing the same procedure with the third object {3}, which changes position with all the other objects (apart from the exchanges with objects {1} and {2} that we already obtained) in each of the n (n−1) permutations obtained before, we obtain a total of n (n−1) (n−2) permutations.

We continue the procedure with the objects {4}, {5}, ..., {n}, to obtain finally the total number of permutations of the n objects:

   n (n−1) (n−2) (n−3) ... 1 = n !

Gamma Function Γ

The gamma function Γ is defined as

   Γ (a) := ∫_0^∞ t^(a−1) exp (−t) dt

If a = n is an integer then, integrating by parts,

   Γ (n+1) = ∫_0^∞ t^n exp (−t) dt = [−t^n exp (−t)]_0^∞ + n ∫_0^∞ t^(n−1) exp (−t) dt = n Γ (n),   Γ (1) = ∫_0^∞ exp (−t) dt = 1

Therefore  Γ (n+1) = n (n−1) (n−2) ... 2 · 1 · Γ (1) = n !

Table of Content

Page 231: Introduction to Mathematical Probability

231

SOLO Permutation & Combinations

Combinations

Given k boxes, each box having a maximum capacity (for box i the maximum object capacity is ni).

[Figure: n objects 1, 2, 3, ..., n and k boxes of capacities n1, n2, ..., nk.]

Given also n objects that must be arranged in the k boxes, each box being filled to its maximum capacity:

   n1 + n2 + ... + nk = n

The order of the objects within a box is not important.

Example: a box with a capacity of three objects in which we arranged the objects {2}, {4}, {7}: the 3! = 6 possible orderings within the box are all equivalent, i.e. they count as 1 outcome.

Page 232: Introduction to Mathematical Probability

232

SOLO Permutation & Combinations

Combinations (continue - 1)

[Figure: the n! different arrangements of the n objects, partitioned into the first n1 positions (box 1), the next n2 positions (box 2), ..., and the last nk positions (box k).]

In order to count the different combinations we start with the n! different arrangements of the n objects.

In each of the n! arrangements the first n1 objects go to box no. 1, the next n2 objects to box no. 2, and so on, the last nk objects going to box no. k; since

   n1 + n2 + ... + nk = n

all the objects end up in one of the boxes.

Page 233: Introduction to Mathematical Probability

233

SOLO Permutation & Combinations

Combinations (continue - 2)

But since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of permutations n! by n1!, because of box no. 1, as seen in the example below, where n1 = 2.

[Figure: arrangements that differ only by the order of the two objects in box no. 1 represent the same combination; each group of n1! = 2 arrangements collapses to one combination.]

Therefore, since the order of the objects in the boxes is not important, and because box no. 1 contains only n1 objects, the number of combinations is

   n! / n1!

Page 234: Introduction to Mathematical Probability

234

SOLO Permutation & Combinations

Combinations (continue - 3)

Since the order of the objects in the boxes is not important, to obtain the number of different combinations we must divide the total number of arrangements n! by n1! (because of box no. 1), by n2! (because of box no. 2), and so on, up to nk! (because of box no. k), to obtain

   Combinations = n! / (n1! n2! ... nk!)

To Bernoulli Trials / To Generalized Bernoulli Trials

Table of Content
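A small check of the combination count n!/(n1! n2! ... nk!) derived above, using Python's factorial; the box sizes 2, 3, 4 are illustrative values.

```python
from math import factorial

def combinations_count(*box_sizes):
    """Number of ways to distribute sum(box_sizes) objects into boxes of the given sizes."""
    n = sum(box_sizes)
    count = factorial(n)
    for n_i in box_sizes:
        count //= factorial(n_i)
    return count

print(combinations_count(2, 3, 4))   # 9! / (2! 3! 4!) = 1260
```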

Page 235: Introduction to Mathematical Probability

235

SOLO Review of Probability

References

[1] W.B. Davenport, Jr., and W.I. Root, "An Introduction to the Theory of Random Signals and Noise", McGraw-Hill, 1958

[2] A. Papoulis, "Probability, Random Variables and Stochastic Processes", McGraw-Hill, 1965

[3] K. Sam Shanmugan and A.M. Breipohl, "Random Signals – Detection, Estimation and Data Analysis", John Wiley & Sons, 1988

[4] S.M. Ross, "Introduction to Probability Models", 4th Ed., Academic Press, 1989

[5] S.M. Ross, "A Course in Simulation", Macmillan & Collier Macmillan Publishers, 1990

[6] R.M. McDonough and A.W. Whalen, "Detection of Signals in Noise", 2nd Ed., Academic Press, 1995

[7] Al. Spătaru, "Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii" (Information Transmission Theory – Signals and Disturbances, in Romanian), Editura Tehnică, Bucureşti, 1965

[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm

[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics

[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies

Table of Content

Page 236: Introduction to Mathematical Probability

236

SOLO Review of Probability

Integrals Used in Probability

∫_0^1 u^n (1 − u)^m du = n! m! / (n + m + 1)!

∫ x exp (−a x²) dx = − exp (−a x²) / (2a)

∫ x² exp (−a x²) dx = − x exp (−a x²) / (2a) + (1/2a) ∫ exp (−a x²) dx

∫_0^∞ exp (−x²) dx = √π / 2

∫_0^∞ exp (−a x²) dx = (1/2) √(π/a),   a > 0

∫_{−∞}^{∞} exp (−a x²) dx = √(π/a),   a > 0

∫_0^∞ x^n exp (−x) dx = n!,   n = 0, 1, 2, 3, ...

∫_0^∞ x^n exp (−a x) dx = n! / a^(n+1),   a > 0,   n = 0, 1, 2, 3, ...

Page 237: Introduction to Mathematical Probability

237

SOLO Review of Probability

Gamma Function

Page 238: Introduction to Mathematical Probability

238

SOLO Review of Probability

Incomplete Gamma Function

Page 239: Introduction to Mathematical Probability

239

SOLO

Technion – Israeli Institute of Technology: 1964 – 1968 BSc EE, 1968 – 1971 MSc EE

Israeli Air Force: 1970 – 1974

RAFAEL – Israeli Armament Development Authority: 1974 – 2013

Stanford University: 1983 – 1986 PhD AA

Page 240: Introduction to Mathematical Probability

240

SOLO Review of Probability

Ferdinand Georg Frobenius (1849 –1919)

Perron–Frobenius Theorem

In linear algebra, the Perron–Frobenius Theorem, named after Oskar Perron and Georg Frobenius, asserts that a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components. This theorem has important applications to probability theory (ergodicity of Markov chains) and the theory of dynamical systems (subshifts of finite type).

Oskar Perron (1880 – 1975)

Page 241: Introduction to Mathematical Probability

SOLO Review of Probability

Monte Carlo Categories

1. Monte Carlo Calculations

Design various random or pseudo-random number generators.

2. Monte Carlo Sampling

Develop efficient (variance – reduction oriented) sampling techniques for estimation.

3. Monte Carlo Optimization

Optimize some (non-convex, non-differentiable) functions using, to name a few, simulated annealing, dynamic weighting, or genetic algorithms.