Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1,...

23
Meeting 10: Wrapping up Assignment 1

Transcript of Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1,...

Page 1: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Meeting 10:

Wrapping up Assignment 1…

Page 2: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Assignment 1, Task 1

A bank official is concerned about the rate at which the bank’s tellers provide service

for their customers.

He feels that all of the tellers work at about the same speed, which is either 30, 40 or

50 customers per hour.

States of the world: speed = ሚ𝜆 {30, 40, 50} (customers/hour)

Furthermore, 40 customers per hour is twice as likely as each of the two other

values, which are assumed to be equally likely.

Prior probabilities: 𝑃 ሚ𝜆 = 40 = 2 ∙ 𝑃 ሚ𝜆 = 30 = 2 ∙ 𝑃 ሚ𝜆 = 50

𝑃 ሚ𝜆 = 30 = 0,25 ; 𝑃 ሚ𝜆 = 40 = 0,5 ; 𝑃 ሚ𝜆 = 50 = 0,25

In order to obtain more information, the official observes all five tellers for a two-

hour period, noting that 380 customers are served during that period.

380 is the sum of all tellers work during 2 hours

Should be interpreted as same average speed

Show that the posterior probabilities for the three possible speeds are

approximately 0.000045, 0.99996 and 0.00000012 respectively.

Page 3: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Serving customers can be assumed to be a Poisson process

It is actually a process that takes a break when there are no customers waiting, and

part of a birth-and-death process. However, in this examples we can set the number

of customers served during 2 hours as approximately equal to the number of

customers arriving during the same time.

Let 𝑥𝑖 denote the number of customers served by teller i during the two hours.

Then, 𝑥𝑖 is Poisson distributed with mean ሚ𝜆 ∙ 2.

Assume further that given ሚ𝜆 = 𝜆 , the number of customers served by one teller is

conditionally independent of the number of customers served by another teller.

Let 𝑦 denote the total number of customers served by the tellers during the two

hours.

Then, 𝑦 = σ𝑖=15 𝑥𝑖 , which is Poisson distributed with mean 5 ∙ ሚ𝜆 ∙ 2 = 10 ሚ𝜆.

Page 4: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Likelihoods:

𝐿 𝜆; 𝑦 = 𝑃 𝑦 = 𝑦ห ሚ𝜆 = 𝜆 =10𝜆 𝑦

𝑦!⋅ 𝑒−10𝜆

⇒ 𝐿 𝜆 = 30; 𝑦 = 380 = 𝑃 𝑦 = 380ห ሚ𝜆 = 30 =10 ∙ 30 380

380!⋅ 𝑒−10∙30 =

300380

380!⋅ 𝑒−300

𝐿 𝜆 = 40; 𝑦 = 380 = 𝑃 𝑦 = 380ห ሚ𝜆 = 40 =400380

380!⋅ 𝑒−400

𝐿 𝜆 = 50; 𝑦 = 380 = 𝑃 𝑦 = 380ห ሚ𝜆 = 50 =500380

380!⋅ 𝑒−500

Issue: It is not possible to calculate factorials or powers of these magnitudes using

a calculator or manually using mathematical operators in software.

But, using dpois in R works: > dpois(x=380,lambda=300)

[1] 1.103549e-06

> dpois(x=380,lambda=400)

[1] 0.01230448

> dpois(x=380,lambda=500)

[1] 3.064927e-09

However, R was not invented when the book was written.

Page 5: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Posterior probabilities:

𝑃 ሚ𝜆 = 𝜆𝑖ȁ 𝑦 = 𝑦 =𝑃 𝑦 = 𝑦 ቚ ሚ𝜆 = 𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 𝑃 𝑦 = 𝑦 ቚ ሚ𝜆 = 𝜆𝑗 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

=

10 ⋅ 𝜆𝑖𝑦

𝑦! 𝑒−10⋅𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 10 ⋅ 𝜆𝑗

𝑦

𝑦! 𝑒−10⋅𝜆𝑗𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

=

10 ⋅ 𝜆𝑖𝑦𝑒−10⋅𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 10 ⋅ 𝜆𝑗

𝑦𝑒−10⋅𝜆𝑗 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

=

10 ∙ 𝜆1 = 10 ∙ 30 = 300 = 3 ∙ 10010 ∙ 𝜆2 = 10 ∙ 40 = 400 = 4 ∙ 10010 ∙ 𝜆3 = 10 ∙ 50 = 500 = 5 ∙ 100

=

100 ⋅ Τ𝜆𝑖 10𝑦𝑒−10⋅𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 100 ⋅ Τ𝜆𝑖 10

𝑦𝑒−10⋅𝜆𝑗 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

=100𝑦 ∙ Τ𝜆𝑖 10 𝑦𝑒−10⋅𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 100𝑦 ∙ Τ𝜆𝑖 10 𝑦𝑒−10⋅𝜆𝑗 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

=

Τ𝜆𝑖 10 𝑦𝑒−10⋅𝜆𝑖 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑖

σ𝑗=13 Τ𝜆𝑖 10 𝑦𝑒−10⋅𝜆𝑗 ⋅ 𝑃 ሚ𝜆 = 𝜆𝑗

The factorials disappear!

The intractable powers are

replaced by tractable ones !

Page 6: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Thus,

𝑃 ሚ𝜆 = 30ȁ 𝑦 =380 =3380𝑒−300 ⋅ 0.25

3380𝑒−300 ⋅ 0.25 + 4380𝑒−400 ⋅ 0.5 + 5380𝑒−500 ⋅ 0.25≈ 0.0000448

𝑃 ሚ𝜆 = 40ȁ 𝑦 =380 =4380𝑒−400 ⋅ 0.5

3380𝑒−300 ⋅ 0.25 + 4380𝑒−400 ⋅ 0.5 + 5380𝑒−500 ⋅ 0.25≈ 0.999955

approx.5.8073 ⋅ 1054

5.8076 ⋅ 1054

approx.2.6042 ⋅ 1050

5.8076 ⋅ 1054

𝑃 ሚ𝜆 = 50ȁ 𝑦 =380 =5380𝑒−500 ⋅ 0.25

3380𝑒−300 ⋅ 0.25 + 4380𝑒−400 ⋅ 0.5 + 5380𝑒−500 ⋅ 0.25≈ 0.00000012

approx.7.2327 ⋅ 1047

5.8076 ⋅ 1054

Page 7: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Alternatively…

Since the first edition of the textbook was published in 1972, it was not expected

that one could manage powers with very high (380) or very low (–300, –400, –300)

exponents – there were no pocket calculators or PC:s

Use normal approximation:

𝑃 𝑦 = 𝑦 𝑦 ∈ Po 𝜇 ≈ 𝑃𝑦 − 0,5 − 𝜇

𝜇< ǁ𝑧 ≤

𝑦 + 0,5 − 𝜇𝜇

ǁ𝑧 ∈ 𝑁 0,1

if 𝜇 > 15

⇒ 𝑃 𝑦 = 𝑦 𝑦 ∈ Po 𝜇 ≈ Φ𝑦 + 0,5 − 𝜇

𝜇− Φ

𝑦 − 0,5 − 𝜇

𝜇

Page 8: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

𝑃 ሚ𝜆 = 30ȁ 𝑦 =380 ≈

Φ 4,69 − Φ 4,65 ⋅ 0.25

Φ 4,69 − Φ 4,65 ⋅ 0.25 + Φ −0,98 − Φ −1,03 ⋅ 0.5 + Φ −5,34 − Φ −5,39 ⋅ 0.25

≈ with very extended tables ≈

0.9999986 − 0,9999983 ⋅ 0.25

0.9999986 − 0,9999983 ⋅ 0.25 + 0,1635 − 0,1515 ⋅ 0.5 + 0,000000046 − 0,000000035 ⋅ 0.25

≈ 0,000031

The likelihood are then approximated as

𝐿 𝜆 = 30; 𝑦 = 380 ≈ Φ380,5 − 300

300− Φ

379,5 − 300

300≈ Φ 4,69 − Φ 4,65

𝐿 𝜆 = 40; 𝑦 = 380 ≈ Φ380,5 − 400

400− Φ

379,5 − 400

400≈ Φ −0,98 − Φ −1,03

𝐿 𝜆 = 50; 𝑦 = 380 ≈ Φ380,5 − 500

500− Φ

379,5 − 500

500≈ Φ −5,34 − Φ −5,39

…and the posterior probabilities as

Page 9: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

𝑃 ሚ𝜆 = 40ȁ 𝑦 =380 ≈

Φ −0,98 − Φ −1,03 ⋅ 0.5

Φ 4,69 − Φ 4,65 ⋅ 0.25 + Φ −0,98 − Φ −1,03 ⋅ 0.5 + Φ −5,34 − Φ −5,39 ⋅ 0.25≈

0,1635 − 0,1515 ⋅ 0.5

0.9999986 − 0,9999983 ⋅ 0.25 + 0,1635 − 0,1515 ⋅ 0.5 + 0,000000046 − 0,000000035 ⋅ 0.25

≈ 0,999968

𝑃 ሚ𝜆 = 50ȁ 𝑦 =380 ≈

Φ −5,34 − Φ −5,39 ⋅ 0.25

Φ 4,69 − Φ 4,65 ⋅ 0.25 + Φ −0,98 − Φ −1,03 ⋅ 0.5 + Φ −5,34 − Φ −5,39 ⋅ 0.25≈

0,000000046 − 0,000000035 ⋅ 0.25

0.9999986 − 0,9999983 ⋅ 0.25 + 0,1635 − 0,1515 ⋅ 0.5 + 0,000000046 − 0,000000035 ⋅ 0.25

≈ 0,0000011

𝑃 ሚ𝜆 = 30ȁ 𝑦 =380 ≈ 0,000031

𝑃 ሚ𝜆 = 40ȁ 𝑦 =380 ≈ 0,999968

𝑃 ሚ𝜆 = 50ȁ 𝑦 =380 ≈ 0,0000011

Hence, with normal approximation: Compare with ”true” Poisson:

𝑃 ሚ𝜆 = 30ȁ 𝑦 =380 ≈ 0,000045

𝑃 ሚ𝜆 = 40ȁ 𝑦 =380 ≈ 0,999955

𝑃 ሚ𝜆 = 50ȁ 𝑦 =380 ≈ 0,00000012

<≈≫

Page 10: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Couldn’t we just ”project” to single tellers?

• 380/5 = 76 customers per teller for 2 hours ?

• 380/10 = 38 customers per teller for one hour ?

With 76:

Likelihoods:

𝑃 𝑥 = 76 ሚ𝜆 = 30 =30 ∙ 2 76

76!∙ 𝑒− 30∙2 =

6076

76!∙ 𝑒−60 ≈ 0,00640

Number of customers served = 𝑥 ∈ Po ሚ𝜆 ∙ 2

𝑃 𝑥 = 76 ሚ𝜆 = 40 =40 ∙ 2 76

76!∙ 𝑒− 40∙2 =

8076

76!∙ 𝑒−80 ≈ 0,04129

𝑃 𝑥 = 76 ሚ𝜆 = 50 =50 ∙ 2 76

76!∙ 𝑒− 50∙2 =

10076

76!∙ 𝑒−100 ≈ 0,00197

Compare with the true likelihoods:

0,0000011 ; 0,01230 ; 0,0000000031

> dpois(x=380,lambda=300)

[1] 1.103549e-06

> dpois(x=380,lambda=400)

[1] 0.01230448

> dpois(x=380,lambda=500)

[1] 3.064927e-09

Page 11: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

…but we must of course extend to five tellers (with independent contributions):

Likelihoods:

𝑃 𝑦 = 380ห ሚ𝜆 = 30 = 𝑃 𝑥 = 76 ሚ𝜆 = 305=

6076

76!∙ 𝑒−60

5

=60380

76! 5 ∙ 𝑒−300 ≈

≈ Powers are to large for R ≈ 0,00640 5 ≈ 0,000000000011

𝑃 𝑦 = 380ห ሚ𝜆 = 40 = 𝑃 𝑥 = 76 ሚ𝜆 = 405=80380

76! 5 ∙ 𝑒−400 ≈ 0,04129 5 ≈ 0,00000012

𝑃 𝑦 = 380ห ሚ𝜆 = 50 = 𝑃 𝑥 = 76 ሚ𝜆 = 505=100380

76! 5 ∙ 𝑒−500 ≈ 0,00197 5 ≈

≈ 0,000000000000030

Not even close to the true likelihoods: 0,0000011 ; 0,01230 ; 0,0000000031

Page 12: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

…but Nota bene!:

The posterior probabilities become

𝑃 ሚ𝜆 = 30ȁ 𝑦 =380 =

60380

76! 5 ∙ 𝑒−300 ∙ 0,25

60380

76! 5 ∙ 𝑒−300 ∙ 0,25 +

80380

76! 5 ∙ 𝑒−400 ∙ 0,5 +

100380

76! 5 ∙ 𝑒−500 ∙ 0,25

=

=60380 ∙ 𝑒−300 ∙ 0,25

60380 ∙ 𝑒−300 ∙ 0,25 + 80380 ∙ 𝑒−400 ∙ 0,5 + 100380 ∙ 𝑒−500 ∙ 0,25=

=20380 ∙ 3380 ∙ 𝑒−300 ∙ 0,25

20380 ∙ 3380 ∙ 𝑒−300 ∙ 0,25 + 20380 ∙ 4380 ∙ 𝑒−400 ∙ 0,5 + 20380 ∙ 5380 ∙ 𝑒−500 ∙ 0,25=

=3380 ∙ 𝑒−300 ∙ 0,25

3380 ∙ 𝑒−300 ∙ 0,25 + 4380 ∙ 𝑒−400 ∙ 0,5 + 5380 ∙ 𝑒−500 ∙ 0,25

Same expression as on slide 6, hence the same result: ≈ 0.0000448

Analogously: 𝑃 ሚ𝜆 = 40ȁ 𝑦 =380 ≈ 0,999955

𝑃 ሚ𝜆 = 50ȁ 𝑦 =380 ≈ 0,00000012

The incorrect factorials cancel

out and the powers have

common factors that cancel out

Page 13: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

But why did the likelihoods go wrong?

𝑃 𝑦 = 380ȁ 𝑦 ∈ Po 10𝜆 =10𝜆 380

380!∙ 𝑒−10𝜆

𝑃 𝑥 = 76ȁ 𝑥 ∈ Po 2𝜆5=

2𝜆 76

76!∙ 𝑒−2𝜆

5

=2𝜆 380

76! 5 ∙ 𝑒−10𝜆

Numerical comparison of 10𝜆 380

380!and

2𝜆 380

76! 5is difficult.

However if 𝑦 ∈ Po 10𝜆 what distribution has 𝑥 = Τ𝑦 5 ?

𝐸 𝑦 = 10𝜆 ; 𝑉𝑎𝑟 𝑦 = 10𝜆

𝐸 𝑥 = ΤΤ𝐸 𝑦 5 = 10𝜆 5 =2𝜆 ; 𝑉𝑎𝑟 𝑥 = Τ𝑉𝑎𝑟 𝑦 25 = Τ10𝜆 25 =0,4𝜆 ≠ 2𝜆

Hence, 𝑥 ∉ Po 2𝜆

Page 14: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

10 000 simulations of y Po(400) (blue) and the corresponding x = y/5

Page 15: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

The mean value ”tunnel vision”

Assume you wish to obtain the total weight of 5 heavy balls that seem identical in

appearance, but only one ball is possible to weigh at a time. Which of these two

methods gives the highest accuracy?

• Randomly sampling one ball, weigh it, and multiply its weight with 5 ?

• Weigh each ball (independently of each other) and sum their weights ?

Let Xi denote the weight of ball i for i = 1, 2, 3, 4, 5. From the visual inspection it

seems reasonable to assume that E(Xi) = and Var(Xi ) = 2 (i.e. the same for all

balls).

Then,

𝐸 5 ∙ 𝑋𝑖 = 5 ∙ 𝐸 𝑋𝑖 = 5 ∙ 𝜇 and 𝑉𝑎𝑟 5 ∙ 𝑋𝑖 = 25 ∙ 𝑉𝑎𝑟 𝑋𝑖 = 25 ∙ 𝜎2

while

𝐸 𝑖=1

5

𝑋𝑖 =𝑖=1

5

𝐸 𝑋𝑖 = 5 ∙ 𝜇 and 𝑉𝑎𝑟 𝑖=1

5

𝑋𝑖 =𝑖=1

5

𝑉𝑎𝑟 𝑋𝑖 = 5 ∙ 𝜎2

Page 16: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Assignment 1, Task 2

Assume that data follows a beta distribution with parameters a and b, i.e. the

probability density function is 𝑓 𝑦 𝑎, 𝑏 = Τ𝑦𝑎−1 1 − 𝑦 𝑏−1 𝐵 𝑎, 𝑏 where

𝐵 𝑎, 𝑏 is the beta function also equal to ΤΓ 𝑎 Γ 𝑏 Γ 𝑎 + 𝑏 . This could be the

case when data consists of the analysed purity (in percent) of a narcotic substance

in a number of seized packages.

a) Show – by using the properties of distributions belonging to the exponential

class – that the probability density function of a conjugate family to the family

of beta distributions can be expressed as

𝑓′ 𝑎, 𝑏 𝛾, 𝛿, 𝜃 ∝𝑒𝛾𝑎+𝛿𝑏

𝐵 𝑎, 𝑏𝜃

Page 17: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

The general expression for the pdf of a conjugate prior distribution when the data

distribution belongs to the exponential class is

𝑓′ 𝜃 = 𝑒σ𝑗=1𝑘 𝐴𝑗 𝜃 ⋅𝛼𝑗+𝛼𝑘+1⋅𝐷 𝜃 +𝐾 𝛼1,…,𝛼𝑘+1

The pdf of the beta distribution is

𝑓 𝑥ȁ𝜃 = 𝑓 𝑥ȁ𝑎, 𝑏 =𝑥𝑎−1 1−𝑥 𝑏−1

𝐵 𝑎,𝑏= 𝑒 𝑎−1 ⋅𝑙𝑜𝑔 𝑥 + 𝑏−1 ⋅𝑙𝑜𝑔 1−𝑥 −𝑙𝑜𝑔 𝐵 𝑎,𝑏 ;

𝑎 > 0, 𝑏 > 0

(i.e. the parameter is two-dimensional: (a, b))

The likelihood from a sample of size n: x = (x1,…,xn) is

𝐿 𝜃; 𝒙, 𝑛 = 𝐿 𝑎, 𝑏; 𝒙, 𝑛 = ς𝑖=1𝑛 𝑓 𝑥𝑖ȁ𝑎, 𝑏 =

𝑖=1

𝑛

𝑒 𝑎−1 ⋅𝑙𝑜𝑔 𝑥𝑖 + 𝑏−1 ⋅𝑙𝑜𝑔 1−𝑥𝑖 −𝑙𝑜𝑔 𝐵 𝑎,𝑏 =

= 𝑒 𝑎−1 σ𝑖=1𝑛 𝑙𝑜𝑔 𝑥𝑖 + 𝑏−1 σ𝑖=1

𝑛 𝑙𝑜𝑔 1−𝑥𝑖 −𝑛⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏

Page 18: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Hence, the density of the conjugate prior distribution can be written

𝑓′ 𝑎, 𝑏ȁ𝛾, 𝛿, 𝜃 = 𝑒 𝑎−1 ⋅𝛼1+ 𝑏−1 ⋅𝛼2−𝛼3⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏 +𝐾1 𝛼1,𝛼2,𝛼3

= 𝑒 𝑎−1 ⋅𝛾+ 𝑏−1 ⋅𝛿−𝜃⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏 +𝐾1 𝛾,𝛿,𝜃

= 𝑒𝑎⋅𝛾+𝑏⋅𝛿−𝜃⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏 +𝐾2 𝛾,𝛿,𝜃 ∝𝑒𝛾⋅𝑎+𝛿⋅𝑏

𝐵 𝑎, 𝑏𝜃

,

where 𝛾 < 0, 𝛿 < 0 and 𝜃 ≥ 0

b) Find the corresponding expression for the probability density function of the posterior

distribution when a sample of size n has been obtained

Page 19: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

The density of the posterior distribution is obtained as

𝑓" 𝑎, 𝑏ȁ𝒙, 𝑛, 𝛾, 𝛿, 𝜃 =𝐿 𝑎, 𝑏; 𝑥, 𝑛 ⋅ 𝑓′ 𝑎, 𝑏ȁ𝛾, 𝛿, 𝜃

𝑢>0𝑣>0

𝐿 𝑢, 𝑣; 𝑥, 𝑛 ⋅ 𝑓′ 𝑢, 𝑣ȁ𝛾, 𝛿, 𝜃 𝑑𝑢𝑑𝑣

that is proportional to

𝐿 𝑎, 𝑏; 𝒙, 𝑛 ⋅ 𝑓′ 𝑎, 𝑏ȁ𝛾, 𝛿, 𝜃 =

= 𝑒 𝑎−1 σ𝑖=1𝑛 𝑙𝑜𝑔 𝑥𝑖 + 𝑏−1 σ𝑖=1

𝑛 𝑙𝑜𝑔 1−𝑥𝑖 −𝑛⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏 ⋅ 𝑒𝑎⋅𝛾+𝑏⋅𝛿−𝜃⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏 ∝

∝ 𝑒𝑎⋅ 𝛾+σ𝑖=1𝑛 𝑙𝑜𝑔 𝑥𝑖 +𝑏 𝛿+σ𝑖=1

𝑛 𝑙𝑜𝑔 1−𝑥𝑖 − 𝜃+𝑛 ⋅𝑙𝑜𝑔 𝐵 𝑎,𝑏

=𝑒 𝛾+σ𝑖=1

𝑛 𝑙𝑜𝑔 𝑥𝑖 ⋅𝑎+ 𝛿+σ𝑖=1𝑛 𝑙𝑜𝑔 1−𝑥𝑖 ⋅𝑏

𝐵 𝑎, 𝑏𝜃+𝑛

for a > 0 and b > 0.

Page 20: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

Since 0 < xi < 1, and log(xi) and log(1–xi) will both be between – and 0 and the

sums σ𝑖=1𝑛 𝑙𝑜𝑔 𝑥𝑖 and σ𝑖=1

𝑛 𝑙𝑜𝑔 1 − 𝑥𝑖 will thus decrease towards – when the

sample size n increases.

This can be illustrated by sampling three sets of observations with 5, 100 and here

400 observations (1000 will cause numerical problems) respectively from a beta

distribution and graph the function

𝑒 𝛾+σ𝑖=1𝑛 𝑙𝑜𝑔 𝑥𝑖 ⋅𝑎+ 𝛿+σ𝑖=1

𝑛 𝑙𝑜𝑔 1−𝑥𝑖 ⋅𝑏

𝐵 𝑎,𝑏𝜃+𝑛

against a and b.

Note that this expression does not constitute a proper density since the normalising

constant is left out.

Graphs of such kind with a = b = 2, and with initial hyper-parameters set to = –1,

= –1 and = 0 are seen on next slide.

Page 21: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers
Page 22: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

y1 <- rbeta(5,shape1=2,shape2=2)

y2 <- rbeta(100,shape1=2,shape2=2)

y3 <- rbeta(400,shape1=2,shape2=2)

prifun <- function(a,b) {

return(exp((-1)*a+(-1)*b-

(0)*log(beta(a,b))))

}

postfun <- function(a,b,y) {

lsum1 <- sum(log(y))

lsum2 <- sum(log(1-y))

return(exp((-1+lsum1)*a+(-1+lsum2)*b-

(0+NROW(y))*log(beta(a,b))))

}

aa <- seq(0.10,4,by=0.02)

bb <- aa

aa_bb <- mesh(aa,bb)

pri_surf <- prifun(aa_bb$x,aa_bb$y)

post_surf1 <- postfun(aa_bb$x,aa_bb$y,y1)

post_surf2 <- postfun(aa_bb$x,aa_bb$y,y2)

post_surf3 <- postfun(aa_bb$x,aa_bb$y,y3)

The following R-script was used to create graphs:

Page 23: Meeting 10: Wrapping up Assignment 1732A66/info/Meeting10_20.pdf · 2020. 10. 9. · Assignment 1, Task 1 A bank official is concerned about the rate at which the bank’s tellers

persp(x=aa,y=bb,z=post_surf1,col="green",colvar=NULL,phi=40,the

ta=40,border="black",xlab="a",ylab="b",zlab="f'(a,b)",box=TRUE,

axes=TRUE,ticktype="detailed",nticks=2,main="prior",cex.main=1.

5,cex.lab=1.3)

persp(x=aa,y=bb,z=post_surf1,col="green",colvar=NULL,phi=40,the

ta=40,border="black",xlab="a",ylab="b",zlab="f''(a,b)",box=TRUE

,axes=TRUE,ticktype="detailed",nticks=2,main="sample

size=5",cex.main=1.5,cex.lab=1.3)

readline("For next graph...Press Enter")

persp(x=aa,y=bb,z=post_surf2,col="green",colvar=NULL,phi=40,the

ta=40,border="black",xlab="a",ylab="b",zlab="f''(a,b)",box=TRUE

,axes=TRUE,ticktype="detailed",nticks=2,main="sample

size=100",cex.main=1.5,cex.lab=1.3)

readline("For next graph...Press Enter")

persp(x=aa,y=bb,z=post_surf3,col="green",colvar=NULL,phi=40,the

ta=40,border="black",xlab="a",ylab="b",zlab="f''(a,b)",box=TRUE

,axes=TRUE,ticktype="detailed",nticks=2,main="sample

size=400",cex.main=1.5,cex.lab=1.3)

…but you can most probably write better scripts than I ☺