Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem...

32
Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018

Transcript of Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem...

Page 1: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Lecture 18:Central Limit TheoremLisa YanAugust 6, 2018

Page 2: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Announcements

PS5 due today

◦ Pain poll

PS6 out today

◦ Due next Monday 8/13 (1:30pm) (will not be accepted after Wed 8/15)

◦ Programming part: Java, C, C++, or Python

Lecture Notes published for all material

2

Page 3: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Summary from last time

! " ≥ $ ≤ & '( for all $ > 0 if " ≥ 0 (Markov’s inequality)

! |" − -| ≥ . ≤ /010 for all . > 0 (Chebyshev’s inequality)

if 2 " = -, Var " = 45

! " ≥ $ ≤ /0/06(0 for all $ > 0 (One-sided Chebyshev’s

if 2 " = 0, Var " = 45 inequality)

2 7 " ≥ 7 2 " if 7′′ 9 ≥ 0 for all x (Jensen’s inequality)

3

(and others)

Page 4: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Bounds on sample mean

Consider a sample of n i.i.d random variables X1, X2, ... Xn, whereXk has distribution F with E[Xk] = µ and Var(Xk) = s2

Sample mean !" = $%∑'($

% "'

) !" = * , Var !" = +,%

4

Page 5: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Summary from last time

Weak Law of Large Numberslim$→&' () − + ≥ - = 0

à “probability goes to zero in the limit”à Will still have a deviation in the limit (at most -)à We have an infinite number of non-zero probabilitiesStrong Law of Large Numbers

' lim$→& () = + = 1à Implies WLLNà After some number n, ' () − + ≥ - = 0à We have a finite number of non-zero probabilities

5

Page 6: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Where are we going with all of this?As engineers, we want to:

1. Model things that can be random.

2. Find average values of situations that let us make good decisions.We perform multiple experiments to help us with engineering.

Counts and averages are both sums of RVs.

6

(RVs)

(expectation)

Weeks 4 and 5:

Learning how to model multiple RVs

This week (week 6):• Finding expectation

of complex situations• Defining sampling and

using existing data• Mathematical bounds

w.r.t modeling the average

Week 7: bringing it all together

• Sampling + average= Central Limit Theorem

• Samples + modeling= finding the best model

parameters given data

Page 7: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Goals for today

Central Limit Theorem!◦Confidence intervals, re-defined◦ Estimates for sums of IID RVsIntroduction to Parameter estimation

7

Page 8: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Central Limit TheoremIf ! "# = %,Var "# = '(,

Sample means of IID RVs are normally distributed.

)" = 1+,#-.

/"# ∼ 1 %, '

(

+Sums of IID RVs are normally distributed.

,#-.

/"# ∼ 1 +%, +'(

8

Page 9: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

CLT explains a lot

9

Normal approximationto Binomial when

np(1-p) > 100.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

70 80 90 100 110 120 130 140 150

P(X

= x)

x

!~Bin # = 200, ( = 0.6

Page 10: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

A short history of the CLT

10

1700

1800

1900

2000

1733: CLT for X ~ Ber(1/2) postulated by Abraham de Moivre

1823: Pierre-Simon Laplace extends de Moivre’swork to approximating Bin(n,p) with Normal

1901: Alexandr Lyapunov provides precisedefinition and rigorous proof of CLT

2012: Drake releases “Started from the Bottom”• As of July 4th, he has a total of 190 songs• Mean quality of subsamples of songs is normally distributed

(thanks to the Central Limit Theorem)

Page 11: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Implications of CLT

Anything that is a sum/average of independent random variables is normal

…meaning in real life, many things are normally distributed:◦ Exam scores: sum of individual problems◦Movie ratings: averages of independent viewer scores◦ Polling:◦ Ask 100 people if they will vote for candidate X◦ p1 = # “yes”/100

◦ Sum of Bernoulli RVs (each person independently says “yes” w.p. p)

◦ Repeat this process with different groups to get p1, …, pn (different sample statistics)

◦ Normal distribution over sample means pk

◦ Confidence interval: “How likely is it that an estimate for true p is close?”

11

Page 12: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

No more convolution when ! → ∞!

For independent discrete random variables X and Y, and Z = X + Y,

$% & = ( ) + + = & =,-$.(0)$2(& − 0) =,

4$.(& − 5)$2(5)

pZ is defined as the convolution of pX and pY.

Sums of IID RVs are normally distributed.

,678

9)6 ∼ ; !<, !>?

12

(caveat: must be independent andidentically distributed)

Page 13: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

CLT vs. Water speed statWater Pokemon speed statistic, !" :• # !" = 65, Var !" = 510• 126 Water Pokemon• Let Y = average of 50 Water Pokemon.What is the distribution of Y?

13

* = +! = ,-∑"/,

- !"# * = # !" = 65Var * = Var 01

- = 2,323 = 10.2

+! = 167"/,

-!" ∼ 9 :, ;

<

6

Solution:

* ∼ 9 65,10.2

Page 14: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Confidence interval, fully defined

Consider a sample of IID RVs X1, X2, …, Xn, where n is large,

"# = %&∑()%

& #(, Var "# = *+& ,- = %

&.%∑()%& #( − "# -

, SE = 2&

def For large n, a 100(1–α)% confidence interval of 3 is defined as:

"# − 45/- 7 SE , "# + 45/- 7 SE"# ± 45/- 7 (SE)

where Φ(45/-) = 1 − (>/2).

ex: 95% confidence interval: "# ± 1.96(SE)> = 0.05, >/2 = 0.025 Φ 45/- = 0.975, 45/- = 1.96In other words: 95% of time that we sample, true 3 will be in interval "# ± 1.96(SE)

NOT: 3 is 95% likely to be in this particular interval. 14

Page 15: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

SE with confidence intervalIdle CPUs are the bane of our existence.• A large company monitors 225 computers (single CPU) for idle hours in order to

estimate the average number of idle hours per CPU.• Sample had !" = 11.6 hrs, '( = 16.81 hrs2 (so ' = 4.1 hrs)

Give the 90% confidence interval for the estimate of +, mean idle hrs/CPU.

15

, = 0.10, ,/2 = 0.05 Φ 34/( = 0.95, 34/( = 1.645Standard error, SE = 8

9 =:.;((< =

:.;;<

90% confidence interval: !" ± 34/((SE)à 11.6 ± 1.645 :.;;<

11.15, 12.05Interpret: 90% of time that such an interval is computed, the true + is it.

Solution:

Page 16: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

BreakAttendance: t inyurl.com/cs109summer2018

16

Page 17: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Sampling distribution

CLT tells us sampling distribution of !" is approx. normal when n is large.

“large” n: n > 30

(larger is better): n > 100

We can also use CLT to decide values of n for good confidence intervals

17http://onlinestatbook.com/stat_sim/sampling_dist/

Page 18: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Estimating clock running timeWant to test runtime of algorithm.• Run algorithm n times and measure average time• Know variance is !" = 4 sec2

• Want to estimate mean runtime: % = &How large should n be so that P(average time is within 0.5 of t) = 0.95?

18

'( = )*∑,-)

* (, ∼ / &, 1* (by Central Limit Theorem)

2 & − 0.5 ≤ '( ≤ & + 0.5 = 0.95= 2 −0.5 ≤ '( − & ≤ 0.5 ( '( − &) ∼ / 0, 1*= 2 −<.= *

" ≤ > ≤ <.= *"

'( = 1@A,-)

*(, ∼ / %, !

"

@

Solution:

(SD: "*)

> ='( − &2/ @~/(0,1)

Page 19: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Estimating clock running timeWant to test runtime of algorithm.• Run algorithm n times and measure average time• Know variance is !" = 4 sec2

• Want to estimate mean runtime: % = &

How large should n be so that P(average time is within 0.5 of t) = 0.95?

19

' & − 0.5 ≤ -. ≤ & + 0.5 = ' −0.5 ≤ -. − & ≤ 0.5 = ' −0.1 2

"≤ 3 ≤

0.1 2

"= 0.95

= Φ2

6− Φ

7 2

6= Φ

2

6− 1 − Φ

2

6= 2Φ

2

6− 1 = 0.95

⇒ Φ2∗

6= 0.975 Solve for n*:

2∗

6= 1.96 ⇒ >∗ = 7.84 " = 62

What say you, Chebyshev?

' -. − & ≤ 0.5 ≥ 0.95 ⇒ ' -. − & ≥ 0.5 ≤6/2

0.1 B ≤ .05

16/> ≤ 0.05 ⇒ > ≥ 320

-. =1

>DEFG

2

.E ∼ I %,!"

>

Solution:

-. =G

2∑EFG2 .E ∼ I &,

6

2

Thanks for playing, Pafnuty!

Page 20: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

The power of the central limit theorem

Sample means of IID RVs are normally distributed.!" = 1

%&'()

*"' ∼ , -, /

0

%• Can estimate confidence interval• Can give probabilities on sample mean

Sums of IID RVs are normally distributed.

&'()

*"' ∼ , %-, %/0

• Can estimate any sum of IID RVs (if -, /0 known)

20

our nextgoal

Page 21: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Crashing websiteLet X = number of visitors to a website/minute: X~Poi(100)• The server crashes if ≥ 120 requests/minute.What is P(crash in next minute)?

21

Recall: sum of IID Poissons is PoissonCLT says sums are normally distributed (as n à ∞)

Let:

" # ≥ 120 ≈ " ) ≥ 119.5 = " ./011011 ≥ 002.3/011

011 = 1 − Φ 1.95 ≈ 0.0256

Solution 1: (exact, using Poisson)

Solution 2: (using CLT)

7890

:

#8 ∼ < =>, =@A

" # ≥ 120 = 7B90A1

CD/011 100 B

E!

Poi(100)~7B90

:

Poi(100/=) Using CLT on discrete sum à continuity correction

Attempted solution 3: (using one-sided Chebyshev)

" # ≥ 120 = " # ≥ J # + L ≤@A

@A + LA =100

100 + 20A = 0.2

≈ 0.0282

Page 22: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

possible

continuity correction

Summary: Approximations

22

Binomial:n trials

P(success) = p,Independent trials

Normal:p medium, np(1–p)>10

Independent trials

Poisson:p < 0.05, n > 20 AND/OR

slight dependence

continuity

correction

Poisson:Poisson = sum of IID Poissons

where n is arbitrarily large Normal:(by Central Limit Theorem)

Sum of IID RVswith known E[X], Var(X)

continuity correction

Page 23: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Another dice gameYou will roll 10 6-sided dice (X1, X2, …, X10)Let: X = total value of all 10 dice = X1 + X2 + … + X10

Win if: X ≤ 25 or X ≥ 45What is P(win)?

23

Roll!!!!!!!

And now the truth (according to CLT)…

Page 24: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Another dice game

24

WTF: ! " ≤ 25 or " ≥ 45 = 1 − !(25 < " < 45)

By CLT, " = ∑/012 "/ approx. 3 45, 478 where E "/ = 5 = 3.5,Var "/ = 78 = <=18

" ∼ 3 10 3.5 = 35, 10 35/12 = 350/12

1 − ! 25 < " < 45 ≈ 1 − ! 25.5 ≤ " ≤ 44.5

= 1 − ! 8=.=B<=<=C/18

≤ DB<=<=C/18

≤ EE.=B<=<=C/18

≈ 1 − 2Φ 1.76 − 1 ≈ 2 1 − 0.9608 = 0.0784

Solution:

continuity correction

You will roll 10 6-sided dice (X1, X2, …, X10)Let: X = total value of all 10 dice = X1 + X2 + … + X10

Win if: X ≤ 25 or X ≥ 45What is P(win)?

Page 25: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Another dice game

25

! " ≤ 25 or " ≥ 45 = 1 − !(25 < " < 45) = 0.0784Solution:

You will roll 10 6-sided dice (X1, X2, …, X10)Let: X = total value of all 10 dice = X1 + X2 + … + X10

Win if: X ≤ 25 or X ≥ 45What is P(win)?

Page 26: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Finding Statistics

26

What do you know? What statistic

can you find?

Distribution

with parameters

Anything your

heart desiresRVs

Sample of

populationBootstrapping

Estimates of

E[X] or Var(X),

confidence

interval, p-values

E[X] or

Var(X)

Probability

bounds

CLT

Sum of IID RVs?

Page 27: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

Now…

27

What do you know? What statisticcan you find?

Distributionwith parameters

Anything yourheart desiresRVs

What if you don’t knowthe parameters of your distribution?

Page 28: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Parameter estimationdef parametric model – probability distribution that has parameters θ.

28

Distribution

• Ber(p)

• Poi(λ)

• Multinomial(p1, p2, …, pm)

• Unif(α, β)

• Normal(µ, σ2)

• etc.

Model

Bernoulli

Poisson

Multinomial

Uniform

Normal

Parameters θ! = #! = $

! = #%, #', … , #)! = *, +! = ,, -'

Note θ can bea vector of parameters!

Page 29: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

“Point estimate” of parameter:Find the best single value for parameter estimate (as opposed to distribution of !")• Better understand of process producing data• Future predictions based on model• Simulation of future processes

Why do we care?

def Parameter estimation

1. We observe data that has a known model with unknown parameter ".

2. Use data to estimate model parameters as !".Note: estimators !" are RVs estimating parameter "

(e.g., sample mean: !" = $% estimates population mean " = &)

29

Machine Learning:Learn model

of systemDo stuff

with model

Page 30: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Estimator biasdef bias of estimator: ! "# − #• “unbiased” estimator: bias = 0

Example Sample MeanEstimator: '( = )

*∑,-)* (, Estimating: .

Bias: ! '( = .Example 2 Sample Variance

Estimator: /0 = )*1)∑2-)

* (2 − '( 0 Estimating: 30

Bias: ! /0 = 30

30

(unbiased estimator)

(unbiased estimator)

Page 31: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Estimator consistencydef consistent estimator: lim$→&' () − ) < , = 1 for , > 0• As we get more data, the estimate () should deviate from the true )

by at most a small amount. • Actually known as “weak” consistency

Example Sample MeanEstimator: 12 = 3

$∑563$ 25 Estimating: 7

By strong law of large numbers (SLLN): ' lim$→& 12 = 7 = 1Implying weak law of large numbers (WLLN): lim$→&' 12 − 7 ≥ , = 0 for , > 0Equivalently: lim$→&' 12 − 7 < , = 1 for , > 0

31(consistent estimator)

Page 32: Lecture 18: Central Limit Theorem - Stanford University€¦ · Lecture 18: Central Limit Theorem Lisa Yan August 6, 2018. Announcements PS5 due today Pain poll PS6 out today Due

!

Sample variance consistencydef biased estimator: ! "# − # ≠ 0def consistent estimator: lim*→,- "# − # < / for / > 0

As 1 → ∞, estimate "# should deviate from true # by at most a small amount.

Are the below two estimators for variance 34 biased? Consistent?

1. 54 = 7*87∑:;7

* <: − =< 4 (sample variance)

unbiased: ! 54 = 34consistent: 54 = *

*877* ∑:;7

* <:4 − =<4 ,

- lim*→,7*∑:;7

* <:4 = ! <4 = 1, - lim*→, =<4 = @4 = 1 (by SLLN, <: i.i.d.)

2. A = 7*∑:;7

* <: − <̄ 4

biased: ! A = ! *87* 54 = *87

* 34consistent: for similar reasons as above #1 , where now lim*→,

*87* = 1 32