
Probability inequalities

--- Law of Large Numbers

May 15, 2019

来嶋 秀治 (Shuji Kijima)

Dept. Informatics,

Graduate School of ISEE

Today's topics

• expectation,

• Markov’s inequality

• variance, covariance, moment

• Chebyshev’s inequality

• Law of large numbers

確率統計特論 (Probability & Statistics)

Lesson 4

Expectation, variance, moment

Today’s topic 2

Expectation

Expectation (期待値) of a discrete random variable X is defined by

$$\mathrm{E}[X] = \sum_{x \in \Omega} x \cdot f(x)$$

only when the right-hand side converges absolutely (絶対収束),

i.e., $\sum_{x \in \Omega} |x| \cdot f(x) < \infty$ holds.

If that is not the case, we say "the expectation does not exist."

Expectation (期待値) of a continuous random variable X is defined by

$$\mathrm{E}[X] = \int_{-\infty}^{+\infty} x \cdot f(x) \, \mathrm{d}x.$$
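Both definitions can be checked numerically. A minimal Python sketch, using a fair die for the discrete case and an exponential density for the continuous case (both distributions are illustrative choices):

```python
from math import exp

# Discrete: E[X] = sum over the sample space of x * f(x).
# Illustrative pmf: a fair six-sided die, f(x) = 1/6 for x in {1,...,6}.
pmf = {x: 1 / 6 for x in range(1, 7)}
print(sum(x * p for x, p in pmf.items()))  # 3.5

# Continuous: E[X] = integral of x * f(x) dx.
# Illustrative density: exponential with rate a, f(x) = a*exp(-a*x) for x >= 0,
# approximated by a Riemann sum over the truncated domain [0, 50].
a, dx = 2.0, 1e-4
print(sum(i * dx * a * exp(-a * i * dx) * dx for i in range(int(50 / dx))))
# close to 1/a = 0.5
```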

Compute expectations of distributions

*Ex 2.

Discrete

(*i) Bernoulli distribution B(1, p).

(*ii) Binomial distribution B(n, p).

(iii) Geometric distribution Ge(p).

(iv) Poisson distribution Po(λ).

Continuous

(v) Exponential distribution Ex(α).

(vi) Normal distribution N(μ, σ²).

Ex. Expectation of the binomial distribution

Thm.

The expectation of X ~ B(n, p) is np.

proof.

$$\begin{aligned}
\sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}
&= \sum_{k=0}^{n} k \, \frac{n!}{k!\,(n-k)!} \, p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} k \, \frac{n!}{k!\,(n-k)!} \, p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!} \, p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} np \, \frac{(n-1)!}{(k-1)!\,(n-k)!} \, p^{k-1} (1-p)^{n-k} \\
&= np \sum_{k'=0}^{n-1} \binom{n-1}{k'} p^{k'} (1-p)^{n-1-k'} \\
&= np
\end{aligned}$$

(The last sum equals 1, being the total probability of B(n − 1, p).)
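A quick empirical sanity check of E[B(n, p)] = np (a Python sketch; n, p, and the trial count are illustrative choices, and each binomial draw is simulated as n Bernoulli trials):

```python
import random

n, p, trials = 20, 0.3, 100_000
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
print(sum(samples) / trials, n * p)  # both near 6.0
```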

Ex. Expectation of the geometric distribution

Thm.

The expectation of X ~ Ge(p) is (1 − p)/p.

Proof.

$$\begin{aligned}
\mathrm{E}[X] &= 0 \cdot p + 1 \cdot (1-p)p + 2 \cdot (1-p)^2 p + 3 \cdot (1-p)^3 p + \cdots \\
(1-p)\,\mathrm{E}[X] &= 0 \cdot (1-p)p + 1 \cdot (1-p)^2 p + 2 \cdot (1-p)^3 p + \cdots
\end{aligned}$$

Subtracting the second line from the first,

$$p\,\mathrm{E}[X] = (1-p)p + (1-p)^2 p + (1-p)^3 p + \cdots = \frac{(1-p)\,p}{1-(1-p)} = 1 - p.$$

Thus E[X] = (1 − p)/p.
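An empirical check of E[Ge(p)] = (1 − p)/p, under this slide's convention that Ge(p) counts the failures before the first success (a Python sketch; p and the trial count are illustrative):

```python
import random

def failures_before_success(p):
    k = 0
    while random.random() >= p:  # this draw is a failure
        k += 1
    return k

p, trials = 0.25, 200_000
print(sum(failures_before_success(p) for _ in range(trials)) / trials,
      (1 - p) / p)  # both near 3.0
```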

Properties of Expectations

Thm.

For an arbitrary constant c,

$$\mathrm{E}[c] = c, \qquad \mathrm{E}[cX] = c \cdot \mathrm{E}[X], \qquad \mathrm{E}[X + c] = \mathrm{E}[X] + c.$$

Linearity of expectations (discrete random variables)

Thm. (linearity of expectation; 期待値の線形性)

$$\mathrm{E}\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$$

proof.

$$\begin{aligned}
\mathrm{E}[X + Y]
&= \sum_{x}\sum_{y} (x+y) \Pr[X = x \cap Y = y] \\
&= \sum_{x}\sum_{y} (x+y) f(x, y) \\
&= \sum_{x}\sum_{y} x f(x, y) + \sum_{x}\sum_{y} y f(x, y) \\
&= \sum_{x} x \sum_{y} f(x, y) + \sum_{y} y \sum_{x} f(x, y) \\
&= \sum_{x} x f(x) + \sum_{y} y f(y) \\
&= \mathrm{E}[X] + \mathrm{E}[Y]
\end{aligned}$$

The general case follows by induction on n.
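Note that the proof never uses independence. An illustrative Python sketch where Y is completely determined by X (the die and the parity indicator are illustrative choices), yet linearity still holds:

```python
import random

# X is a fair die roll; Y = 1 if X is even, so Y depends entirely on X.
# E[X] = 3.5 and E[Y] = 0.5, and E[X + Y] still matches E[X] + E[Y] = 4.0.
trials = 100_000
xs = [random.randint(1, 6) for _ in range(trials)]
ys = [1 if x % 2 == 0 else 0 for x in xs]
print(sum(x + y for x, y in zip(xs, ys)) / trials)  # near 4.0
```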

Linearity of expectations (continuous random variables)

Thm. (linearity of expectation; 期待値の線形性)

$$\mathrm{E}\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i]$$

proof.

$$\begin{aligned}
\mathrm{E}[X + Y]
&= \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} (x+y)\, f(x, y) \,\mathrm{d}x\,\mathrm{d}y \\
&= \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} x f(x, y) \,\mathrm{d}x\,\mathrm{d}y + \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} y f(x, y) \,\mathrm{d}x\,\mathrm{d}y \\
&= \int_{-\infty}^{+\infty} x \left(\int_{-\infty}^{+\infty} f(x, y) \,\mathrm{d}y\right) \mathrm{d}x + \int_{-\infty}^{+\infty} y \left(\int_{-\infty}^{+\infty} f(x, y) \,\mathrm{d}x\right) \mathrm{d}y \\
&= \int_{-\infty}^{+\infty} x f(x) \,\mathrm{d}x + \int_{-\infty}^{+\infty} y f(y) \,\mathrm{d}y \\
&= \mathrm{E}[X] + \mathrm{E}[Y]
\end{aligned}$$

Application of linearity of expectation

Thm.

The expectation of X ~ B(n; p) is np.

proof.

Suppose X₁, …, Xₙ are i.i.d. B(1; p); then Y := X₁ + ⋯ + Xₙ follows B(n; p).

$$\mathrm{E}[X_i] = 1 \cdot p + 0 \cdot (1-p) = p$$

$$\mathrm{E}[Y] = \mathrm{E}\!\left[\textstyle\sum_i X_i\right] = \sum_i \mathrm{E}[X_i] = \sum_i p = np$$

Moment & Variance

Today’s topic 2


Motivation

Consider the following three distributions.

Distr. 1.
• Pr[X = 0] = 1/3
• Pr[X = 1] = 1/3
• Pr[X = 2] = 1/3

Distr. 2.
• Pr[X = k] = 1/2^(k+1) for k = 0, 1, 2, …

Distr. 3.
• Pr[X = 0] = 2/3
• Pr[X = 1] = 0
• Pr[X = 2^k] = 1/4^k for k = 1, 2, …

All three distributions have E[X] = 1, yet their tails differ:

Distr. 1: Pr[X > 1] = 1/3, Pr[X > 2] = 0, Pr[X > 8] = 0
Distr. 2: Pr[X > 1] = 1/4, Pr[X > 2] = 1/8, Pr[X > 8] = 1/512
Distr. 3: Pr[X > 1] = 1/3, Pr[X > 2] = 1/12, Pr[X > 8] = 1/192

The expectation alone does not tell these distributions apart; we need a measure of spread.

Definitions

k-th moment (k次の積率) of X:
$$\mathrm{E}[X^k]$$

variance (分散) of X:
$$\mathrm{Var}[X] := \mathrm{E}\big[(X - \mathrm{E}[X])^2\big]$$

standard deviation (標準偏差) of X:
$$\sigma(X) := \sqrt{\mathrm{Var}[X]}$$

covariance (共分散) of X and Y:
$$\mathrm{Cov}[X, Y] := \mathrm{E}\big[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\big]$$
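These definitions can be evaluated exactly on a small joint pmf. A Python sketch (the two-point joint distribution, with Y = X, is an illustrative choice; here Cov[X, Y] should equal Var[X]):

```python
# Pr[X = x and Y = y] for a fair coin X in {0, 1} and Y = X.
joint = {(0, 0): 0.5, (1, 1): 0.5}
ex = sum(x * p for (x, _), p in joint.items())
ey = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - ex) ** 2 * p for (x, _), p in joint.items())
cov_xy = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
sigma_x = var_x ** 0.5
print(ex, var_x, sigma_x, cov_xy)  # 0.5, 0.25, 0.5, 0.25
```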

Compute the variances of distributions

*Ex 2.

Discrete

(*i) Bernoulli distribution B(1, p).

(*ii) Binomial distribution B(n, p).

(iii) Geometric distribution Ge(p).

(iv) Poisson distribution Po(λ).

Continuous

(v) Exponential distribution Ex(α).

(vi) Normal distribution N(μ, σ²).

Properties of variance and covariance

Thm.

$$\mathrm{Var}[X] = \mathrm{E}[X^2] - \mathrm{E}[X]^2$$
$$\mathrm{Cov}[X, Y] = \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y]$$
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]$$

proof (first two identities).

$$\begin{aligned}
\mathrm{E}\big[(X - \mathrm{E}[X])^2\big] &= \mathrm{E}\big[X^2 - 2X\,\mathrm{E}[X] + \mathrm{E}[X]^2\big] \\
&= \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + \mathrm{E}[X]^2 \\
&= \mathrm{E}[X^2] - \mathrm{E}[X]^2
\end{aligned}$$

$$\begin{aligned}
\mathrm{Cov}[X, Y] &= \mathrm{E}\big[(X - \mathrm{E}[X])(Y - \mathrm{E}[Y])\big] \\
&= \mathrm{E}\big[XY - X\,\mathrm{E}[Y] - Y\,\mathrm{E}[X] + \mathrm{E}[X]\,\mathrm{E}[Y]\big] \\
&= \mathrm{E}[XY] - 2\,\mathrm{E}[X]\,\mathrm{E}[Y] + \mathrm{E}[X]\,\mathrm{E}[Y] \\
&= \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y]
\end{aligned}$$

Properties of variance and covariance (contd.)

proof (third identity).

$$\begin{aligned}
\mathrm{Var}[X + Y] &= \mathrm{E}\big[(X+Y)^2\big] - \big(\mathrm{E}[X+Y]\big)^2 \\
&= \mathrm{E}[X^2 + 2XY + Y^2] - \big(\mathrm{E}[X] + \mathrm{E}[Y]\big)^2 \\
&= \big(\mathrm{E}[X^2] - \mathrm{E}[X]^2\big) + \big(\mathrm{E}[Y^2] - \mathrm{E}[Y]^2\big) + 2\big(\mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y]\big) \\
&= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]
\end{aligned}$$
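A numerical check of the third identity on deliberately correlated samples (a Python sketch; the construction Y = 0.5·X + noise is an illustrative choice):

```python
import random

trials = 200_000
xs = [random.gauss(0, 1) for _ in range(trials)]
ys = [0.5 * x + random.gauss(0, 1) for x in xs]  # correlated with X

def mean(vs):
    return sum(vs) / len(vs)

mx, my = mean(xs), mean(ys)
var_x = mean([(x - mx) ** 2 for x in xs])
var_y = mean([(y - my) ** 2 for y in ys])
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
lhs = mean([(x + y - mx - my) ** 2 for x, y in zip(xs, ys)])  # Var[X + Y]
print(lhs, var_x + var_y + 2 * cov)  # both near 3.25
```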

Properties of var and cov (for independent X and Y)

Thm. If X and Y are independent,

$$\mathrm{E}[XY] = \mathrm{E}[X]\,\mathrm{E}[Y]$$
$$\mathrm{Cov}[X, Y] = 0$$
$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$$

proof.

$$\begin{aligned}
\mathrm{E}[XY] &= \sum_{x}\sum_{y} xy \Pr[X = x \wedge Y = y] \\
&= \sum_{x}\sum_{y} xy \Pr[X = x] \Pr[Y = y] && \text{(since $X$ and $Y$ are independent)} \\
&= \left(\sum_{x} x \Pr[X = x]\right)\left(\sum_{y} y \Pr[Y = y]\right) \\
&= \mathrm{E}[X]\,\mathrm{E}[Y]
\end{aligned}$$

Hence

$$\mathrm{Cov}[X, Y] = \mathrm{E}[XY] - \mathrm{E}[X]\,\mathrm{E}[Y] = 0.$$

Properties of Var and Cov

Thm. If X₁, …, Xₙ are mutually independent,

$$\mathrm{Var}[X_1 + \cdots + X_n] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]$$

Linearity of independent variance: binomial distr.

Thm.

The variance of X ~ B(n; p) is np(1 − p).

proof.

Suppose X₁, …, Xₙ are independent and identically distributed B(1; p); then Y := X₁ + ⋯ + Xₙ follows B(n; p).

$$\mathrm{E}[X_i^2] = 1^2 \cdot p + 0^2 \cdot (1-p) = p$$
$$\mathrm{Var}[X_i] = \mathrm{E}[X_i^2] - \mathrm{E}[X_i]^2 = p - p^2 = p(1-p)$$

Since the Xᵢ are independent,

$$\mathrm{Var}[Y] = \mathrm{Var}\!\left[\textstyle\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = \sum_{i=1}^{n} p(1-p) = np(1-p)$$
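An empirical check of Var[B(n, p)] = np(1 − p), simulating each draw as a sum of n independent Bernoulli(p) variables (a Python sketch; n, p, and the trial count are illustrative):

```python
import random

n, p, trials = 20, 0.3, 100_000
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
m = sum(samples) / trials
print(sum((s - m) ** 2 for s in samples) / trials, n * p * (1 - p))
# both near 4.2
```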

Expectation (contd.)

Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

• Bikkuriman stickers (ビックリマンシール)
• Pokémon cards (ポケモンカード)

Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

Suppose you have already drawn k − 1 kinds of coupon,
and let X_k denote the number of draws needed to go from k − 1 kinds to k kinds.

Each draw yields a new kind with probability

$$p_k := \frac{n - (k-1)}{n},$$

so the expected number of draws is

$$\mathrm{E}[X_k] = \frac{1}{p_k} = \frac{n}{n - k + 1}.$$

Thm.

$$n \ln n \le \mathrm{E}[X] \le n(1 + \ln n)$$

Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

By linearity of expectation,

$$\mathrm{E}[X] = \mathrm{E}\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i'=1}^{n} \frac{1}{i'}$$

where the last sum is the harmonic number, bounded by

$$\ln n = \int_1^n \frac{1}{x}\,\mathrm{d}x \le \sum_{k=1}^{n} \frac{1}{k} = 1 + \sum_{k=2}^{n} \frac{1}{k} \le 1 + \int_1^n \frac{1}{x}\,\mathrm{d}x = 1 + \ln n.$$
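A simulation of the coupon collector shows the empirical mean landing between the two bounds (a Python sketch; n and the simulation size are illustrative):

```python
import random
from math import log

def draws_to_complete(n):
    seen, draws = set(), 0
    while len(seen) < n:          # draw until every kind has appeared
        seen.add(random.randrange(n))
        draws += 1
    return draws

n, trials = 100, 2_000
mean = sum(draws_to_complete(n) for _ in range(trials)) / trials
print(n * log(n), mean, n * (1 + log(n)))  # ~460.5 <= ~518.7 <= ~560.5
```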

Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

What is the probability of completion after m trials?

Markov’s inequality

Today’s topic 1

Markov's inequality

Thm. Markov's inequality

Let X be a nonnegative random variable. Then

$$\Pr[X \ge a] \le \frac{\mathrm{E}[X]}{a}$$

holds for any a > 0.

Proof.

$$\frac{\mathrm{E}[X]}{a} = \int_0^{\infty} \frac{x}{a} f(x)\,\mathrm{d}x = \int_0^{a} \frac{x}{a} f(x)\,\mathrm{d}x + \int_a^{\infty} \frac{x}{a} f(x)\,\mathrm{d}x \ge \int_a^{\infty} \frac{x}{a} f(x)\,\mathrm{d}x \ge \int_a^{\infty} f(x)\,\mathrm{d}x = \Pr[X \ge a]$$

Thus,

$$\Pr[X \ge a] \le \frac{\mathrm{E}[X]}{a}.$$
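An illustrative check of Markov's inequality on exponential(1) samples, which are nonnegative with E[X] = 1 (a Python sketch; the distribution, a, and the trial count are illustrative choices):

```python
import random

trials, a = 200_000, 3.0
xs = [random.expovariate(1.0) for _ in range(trials)]
tail = sum(x >= a for x in xs) / trials   # true value exp(-3), about 0.0498
bound = (sum(xs) / trials) / a            # Markov bound E[X]/a, about 1/3
print(tail, bound)                        # the tail stays below the bound
```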

Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

What is the probability of completion after m trials?

Using Markov's inequality,

$$\Pr[X \ge m] \le \frac{\mathrm{E}[X]}{m} \le \frac{n(1 + \ln n)}{m}$$

(rem. $n \ln n \le \mathrm{E}[X] \le n(1 + \ln n)$).

e.g., n = 100, m = 1000:
Pr[completion] = 1 − Pr[X ≥ 1001] ≥ 1 − 0.56 ≃ 0.44

e.g., n = 100, m = 10000:
Pr[completion] = 1 − Pr[X ≥ 10001] ≥ 1 − 0.056 ≃ 0.94

too loose?

Chebyshev’s inequality

Today’s topic 3

Chebyshev's inequality

Thm. Chebyshev's inequality

For any a > 0,

$$\Pr\big[|X - \mathrm{E}[X]| \ge a\big] \le \frac{\mathrm{Var}[X]}{a^2}$$

proof.

Remark that

$$\Pr\big[|X - \mathrm{E}[X]| \ge a\big] = \Pr\big[(X - \mathrm{E}[X])^2 \ge a^2\big].$$

Using Markov's inequality,

$$\Pr\big[(X - \mathrm{E}[X])^2 \ge a^2\big] \le \frac{\mathrm{E}\big[(X - \mathrm{E}[X])^2\big]}{a^2} = \frac{\mathrm{Var}[X]}{a^2}.$$
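An illustrative check of Chebyshev's inequality on B(100, 1/2), where E[X] = 50 and Var[X] = 25 (a Python sketch; the distribution, a, and the trial count are illustrative choices):

```python
import random

n, a, trials = 100, 10, 100_000
samples = [sum(random.random() < 0.5 for _ in range(n)) for _ in range(trials)]
tail = sum(abs(s - 50) >= a for s in samples) / trials
print(tail, 25 / a**2)  # empirical tail (~0.06) is below the bound 0.25
```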

Chebyshev's inequality (contd.)

Cor. Chebyshev's inequality

For any t > 0 (and E[X] > 0),

$$\Pr\big[X \ge (1+t)\,\mathrm{E}[X]\big] \le \frac{\mathrm{Var}[X]}{\big(t\,\mathrm{E}[X]\big)^2}$$

proof.

$$\begin{aligned}
\Pr\big[X \ge (1+t)\,\mathrm{E}[X]\big] &= \Pr\big[X - \mathrm{E}[X] \ge t\,\mathrm{E}[X]\big] \\
&\le \Pr\big[|X - \mathrm{E}[X]| \ge t\,\mathrm{E}[X]\big] \\
&\le \frac{\mathrm{Var}[X]}{\big(t\,\mathrm{E}[X]\big)^2}
\end{aligned}$$


Ex. Coupon collector

There are n kinds of coupons.

How many coupons do you need to draw, in expectation,
before having drawn each coupon at least once?

What is the probability of completion after m trials?

Using Chebyshev's inequality,

$$\Pr\big[X \ge (1+t)\,\mathrm{E}[X]\big] \le \frac{\mathrm{Var}[X]}{\big(t\,\mathrm{E}[X]\big)^2}$$

(rem. $n \ln n \le \mathrm{E}[X] \le n(1 + \ln n)$), so we also need a bound on Var[X].

Ex. Coupon collector

Since the Xᵢ are independent and geometrically distributed with success probability pᵢ (for the variance of the geometric distribution, cf. Ex. 2),

$$\mathrm{Var}[X] = \sum_{i=1}^{n} \mathrm{Var}[X_i] = \sum_{i=1}^{n} \frac{1 - p_i}{p_i^2} \le \sum_{i=1}^{n} \frac{1}{p_i^2} = \sum_{i=1}^{n} \left(\frac{n}{n - i + 1}\right)^2 = n^2 \sum_{i=1}^{n} \frac{1}{i^2} \le n^2 \cdot \frac{\pi^2}{6}$$

Ex. Coupon collector

Using Chebyshev's inequality,

$$\Pr\big[X \ge (1+t)\,\mathrm{E}[X]\big] \le \frac{\mathrm{Var}[X]}{\big(t\,\mathrm{E}[X]\big)^2} \le \frac{n^2 \pi^2 / 6}{t^2 (n \ln n)^2} = \frac{\pi^2}{6 t^2 (\ln n)^2}$$

(rem. $n \ln n \le \mathrm{E}[X] \le n(1 + \ln n)$).

e.g., n = 100, m = 1000 ($t \simeq \frac{m}{n \ln n} - 1 \simeq 1.1$):
Pr[completion] ≥ 1 − Pr[X ≥ 1000] ≃ 0.94

still loose?
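To see how loose these guarantees still are, one can compare both bounds with a simulation (a Python sketch reproducing the slides' n = 100, m = 1000 example; the simulation size is an illustrative choice):

```python
import random
from math import log, pi

def completed(n, m):
    return len({random.randrange(n) for _ in range(m)}) == n

n, m, trials = 100, 1000, 2_000
empirical = sum(completed(n, m) for _ in range(trials)) / trials
markov = 1 - n * (1 + log(n)) / m                 # guarantee via Markov
t = m / (n * log(n)) - 1
chebyshev = 1 - pi**2 / (6 * t**2 * log(n)**2)    # guarantee via Chebyshev
print(markov, chebyshev, empirical)               # ~0.44, ~0.94, ~0.996
```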

Chernoff’s bound

Law of large numbers

Law of large numbers (大数の法則)

Def.

A series {Yₙ} converges to Y in probability (Yに確率収束する) if

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\big[|Y_n - Y| < \varepsilon\big] = 1.$$

Thm. (law of large numbers; 大数の法則)

Let r.v. X₁, …, Xₙ be i.i.d. (independent and identically distributed; 独立同一分布) with expectation μ and variance σ². Then Yₙ := (X₁ + ⋯ + Xₙ)/n converges to μ in probability; i.e.,

$$\forall \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\!\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| < \varepsilon\right] = 1.$$

Toward the proof, compute the expectation and variance of Yₙ := (X₁ + ⋯ + Xₙ)/n. By linearity of expectation and independence,

$$\mathrm{E}[Y_n] = \mathrm{E}\!\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{\mathrm{E}[X_1] + \cdots + \mathrm{E}[X_n]}{n} = \mu$$

$$\mathrm{Var}[Y_n] = \mathrm{Var}\!\left[\frac{X_1 + \cdots + X_n}{n}\right] = \frac{\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]}{n^2} = \frac{\sigma^2}{n}$$

Proof (of the law of large numbers).

Using Chebyshev's inequality ($\Pr[|X - \mathrm{E}[X]| \ge a] \le \mathrm{Var}[X]/a^2$) with E[Yₙ] = μ and Var[Yₙ] = σ²/n,

$$\Pr\!\left[\left|\frac{X_1 + \cdots + X_n}{n} - \mu\right| \ge \varepsilon\right] \le \frac{\sigma^2}{n \varepsilon^2} \;\longrightarrow\; 0 \quad (n \to \infty).$$
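An illustrative simulation of the law of large numbers (a Python sketch; the fair die, with μ = 3.5, and the sample sizes are illustrative choices):

```python
import random

# Sample means of n i.i.d. fair-die rolls concentrate around mu = 3.5.
for n in [10, 100, 1_000, 10_000, 100_000]:
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, mean)  # deviations from 3.5 shrink roughly like 1/sqrt(n)
```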