
STAT 311: Sampling Distributions

Y. Samuel Wang

Summer 2016


Logistics

Midterm passed back at end of class

Homework posted later today, due next Friday


Long run behavior

Each individual outcome of a random variable is unpredictable

Probability models define the long run behavior

How can we apply these ideas to the real-world data that we gather?

Theorem

Law of Large Numbers: Let $X_1, X_2, \ldots$ be a sequence of mutually independent random variables with common finite mean $\mu$ and finite variance $\sigma^2$. For $n \ge 1$ define

$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$

Then for any $\varepsilon, \delta > 0$ there exists some $N$ such that for all $n \ge N$

$P(|\bar{X}_n - \mu| < \varepsilon) \ge 1 - \delta$
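The theorem can be watched in action with a small simulation. The sketch below is only an illustration: it assumes Exponential(rate = 1) draws (so the true mean is 1), and the names `running_means` and `lln_demo` are made up for this example.

```python
import random

def running_means(draws):
    """Return the running sample means after 1, 2, ..., n draws."""
    means = []
    total = 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

def lln_demo(n=100_000, rate=1.0, seed=311):
    """Draw n independent Exponential(rate) values and track the running mean.

    The exponential distribution is an assumption made for illustration;
    its true mean is mu = 1 / rate.
    """
    rng = random.Random(seed)
    draws = [rng.expovariate(rate) for _ in range(n)]
    return running_means(draws)

if __name__ == "__main__":
    means = lln_demo()
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: sample mean = {means[n - 1]:.4f}  (mu = 1)")
```

The printed sample means should drift toward 1 as n grows, although any single run can wander for small n.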


Implications of the LLN

Given a large enough sample, we can achieve any level of precision (determined by ε and δ)

Exactly how large n has to be depends on σ², which in practice we don’t always know

Typically, taking a larger sample will give us a more precise estimate of the true parameters

Sample statistic will “converge” to the truth

LLN cannot make up for bad study design
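One way to make "how large n has to be" concrete is Chebyshev's inequality, which gives $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \sigma^2 / (n \varepsilon^2)$, so any $n \ge \sigma^2 / (\delta \varepsilon^2)$ is enough. This bound is not from the slides; it is a standard, conservative sufficient condition, and the helper name `chebyshev_sample_size` is hypothetical.

```python
import math

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n with sigma2 / (n * eps**2) <= delta.

    By Chebyshev's inequality this n guarantees
    P(|sample mean - mu| < eps) >= 1 - delta. It is a conservative
    (sufficient) bound, not the smallest n that works.
    """
    if sigma2 < 0 or eps <= 0 or not 0 < delta <= 1:
        raise ValueError("need sigma2 >= 0, eps > 0 and 0 < delta <= 1")
    return max(1, math.ceil(sigma2 / (delta * eps ** 2)))

if __name__ == "__main__":
    print(chebyshev_sample_size(4.0, 0.5, 0.25))   # 64
    print(chebyshev_sample_size(8.0, 0.5, 0.25))   # 128: doubling sigma^2 doubles n
    print(chebyshev_sample_size(4.0, 0.25, 0.25))  # 256: halving eps quadruples n
```

The required n grows linearly in σ² and like 1/ε², which is why an unknown σ² makes it hard to pick n in advance.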

Central Limit Theorem

Let $n \ge 1$, and let $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables such that $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$

Note that because we are summing random variables, $E(S_n) = \mu_1 + \mu_2 + \cdots + \mu_n$

Since the random variables are independent, $\mathrm{Var}(S_n) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$

If $Y = S_n - E(S_n)$, then $E(Y) = 0$ and $\mathrm{Var}(Y) = \mathrm{Var}(S_n)$

If $Z = Y / \sqrt{\mathrm{Var}(S_n)}$, then $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$
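The centering and scaling steps can be checked numerically. The sketch below assumes a made-up mix of three independent, non-identical variables (a uniform, an exponential, and a normal); the function names are hypothetical and the distributions are chosen only for illustration.

```python
import math
import random
import statistics

def make_components(rng):
    """Hypothetical components: (sampler, mean mu_i, variance sigma_i^2)."""
    return [
        (lambda: rng.uniform(0.0, 1.0), 0.5, 1.0 / 12.0),  # Uniform(0, 1)
        (lambda: rng.expovariate(2.0), 0.5, 0.25),         # Exponential(rate 2)
        (lambda: rng.gauss(3.0, 2.0), 3.0, 4.0),           # Normal(mean 3, sd 2)
    ]

def standardized_sum(components):
    """Draw each X_i once and return Z = (S_n - E(S_n)) / sqrt(Var(S_n))."""
    s_n = sum(draw() for draw, _, _ in components)
    mean_s = sum(mu for _, mu, _ in components)   # E(S_n): sum of the means
    var_s = sum(var for _, _, var in components)  # Var(S_n): needs independence
    return (s_n - mean_s) / math.sqrt(var_s)

def simulate_z(reps=20_000, seed=311):
    """Return `reps` independent draws of Z."""
    rng = random.Random(seed)
    components = make_components(rng)
    return [standardized_sum(components) for _ in range(reps)]

if __name__ == "__main__":
    zs = simulate_z()
    print(f"mean of Z     = {statistics.fmean(zs):.3f}  (theory: 0)")
    print(f"variance of Z = {statistics.pvariance(zs):.3f}  (theory: 1)")
```

Matching the first two moments says nothing yet about the shape of the distribution of Z.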

Can we say more?

Theorem

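Central Limit Theorem: Let $n \ge 1$, and let $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables such that $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$. Then, under mild regularity conditions (which hold automatically when the $X_i$ are identically distributed with finite, nonzero variance), as $n \to \infty$

$Z_n = \frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}} \to_d N(0, 1)$

The $\to_d$ symbol means "converges in distribution". This means that the distribution of my random variable $Z_n$, which is simply formed from the $X_i$’s, begins to look more and more like a standard normal distribution as n increases.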
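Convergence in distribution is a statement about probabilities such as $P(Z_n \le z)$, so one rough check is to estimate that probability by simulation and compare it with the standard normal CDF. The sketch below assumes independent Uniform(0, 1) variables (mean 1/2, variance 1/12) purely for illustration; the function names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_cdf_of_zn(n, z, reps=5_000, seed=311):
    """Estimate P(Z_n <= z) when the X_i are independent Uniform(0, 1).

    Uniform(0, 1) is an illustrative assumption: mu = 1/2, sigma^2 = 1/12.
    """
    rng = random.Random(seed)
    mean_s, sd_s = n * 0.5, math.sqrt(n / 12.0)
    hits = 0
    for _ in range(reps):
        s_n = sum(rng.random() for _ in range(n))
        if (s_n - mean_s) / sd_s <= z:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    for n in (1, 5, 30):
        est = empirical_cdf_of_zn(n, z=1.0)
        print(f"n = {n:>2}: P(Z_n <= 1) is about {est:.3f}; normal gives {normal_cdf(1.0):.3f}")
```

The simulated values carry Monte Carlo noise, so small differences from the normal value are expected even for large n.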

What if all my random variables are identical?

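$E(S_n) = n\mu$

$\mathrm{Var}(S_n) = n\sigma^2$

$Z_n = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{n\left(\frac{1}{n}S_n - \mu\right)}{\sqrt{n}\,\sqrt{\sigma^2}} = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}$

The distribution of (almost) any sample mean, once centered at the population mean $\mu$ and scaled by $\sigma / \sqrt{n}$, approaches a standard normal distribution.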
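As a quick sanity check on the algebra, the two ways of writing $Z_n$ (from the sum and from the sample mean) can be evaluated on the same numbers. The data and the helper names `z_from_sum` and `z_from_mean` are made up for this sketch.

```python
import math

def z_from_sum(xs, mu, sigma):
    """Z_n = (S_n - n*mu) / sqrt(n * sigma^2)."""
    n = len(xs)
    return (sum(xs) - n * mu) / math.sqrt(n * sigma ** 2)

def z_from_mean(xs, mu, sigma):
    """Z_n = sqrt(n) * (sample mean - mu) / sigma."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(n) * (x_bar - mu) / sigma

if __name__ == "__main__":
    xs = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample, n = 4
    print(z_from_sum(xs, mu=4.0, sigma=2.0))   # 1.0
    print(z_from_mean(xs, mu=4.0, sigma=2.0))  # 1.0
```

Here $S_n = 20$ and $n\mu = 16$, so the first form gives $4 / \sqrt{16} = 1$; the sample mean is 5, so the second gives $2 \cdot (5 - 4) / 2 = 1$.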


Central Limit Theorem Caveats

Note that this does not say anything about individual random variables

This only makes a statement about $\bar{X}_n$ as n grows large

Imagine taking many samples of size n = 100, computing the sample mean of each, and plotting a histogram of those means; it will look roughly normal. Then take many samples of size n = 1000; the histogram of their means will look even more like a normal distribution, and so on.

How large n has to be for a “roughly normal” histogram of $\bar{X}_n$ depends on many things, but as long as n keeps growing, it will eventually get there
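One crude way to quantify "roughly normal" is skewness, which is 0 for a normal distribution. The sketch below assumes Exponential(1) data, for which the skewness of the sample mean is $2 / \sqrt{n}$; it uses smaller n than the slide's 100 and 1000 only to keep the run short, and all function names are hypothetical.

```python
import random

def sample_means(n, reps, rng):
    """Return `reps` sample means, each from n Exponential(1) draws."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Sample skewness m3 / m2**1.5; it is 0 for a symmetric sample."""
    k = len(values)
    mean = sum(values) / k
    m2 = sum((v - mean) ** 2 for v in values) / k
    m3 = sum((v - mean) ** 3 for v in values) / k
    return m3 / m2 ** 1.5

def skewness_by_n(sizes=(2, 20, 200), reps=2_000, seed=311):
    """Map each sample size n to the skewness of its simulated sample means."""
    rng = random.Random(seed)
    return {n: skewness(sample_means(n, reps, rng)) for n in sizes}

if __name__ == "__main__":
    for n, skew in skewness_by_n().items():
        print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, but it says nothing about any single exponential draw, which stays strongly skewed.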


Central Limit Theorem Example

For a given sample size n, take the sample average of n exponential random variables with rate = 1. Then repeat with a new sample 1000 times.

For a given sample size n, take the sample average of n exponential random variables with rate = 1 and standardize the sample average. Then repeat with a new sample 1000 times.
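The experiment described above can be sketched as follows. This is not the code behind the original plots, which is not shown; it is a minimal reconstruction that uses the fact that an Exponential(rate) variable has mean and standard deviation both equal to 1 / rate. The function names and the 1.96 cutoff are choices made for this example.

```python
import math
import random

def standardized_exp_means(n, reps=1000, rate=1.0, seed=311):
    """Return `reps` standardized sample means of n Exponential(rate) draws."""
    rng = random.Random(seed)
    mu = 1.0 / rate      # mean of Exponential(rate)
    sigma = 1.0 / rate   # standard deviation of Exponential(rate)
    zs = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(rate) for _ in range(n)) / n
        zs.append(math.sqrt(n) * (x_bar - mu) / sigma)
    return zs

def share_within(zs, bound=1.96):
    """Fraction of values with absolute value at most `bound`."""
    return sum(abs(z) <= bound for z in zs) / len(zs)

if __name__ == "__main__":
    for n in (1, 10, 100):
        zs = standardized_exp_means(n)
        print(f"n = {n:>3}: share of |Z_n| <= 1.96 is {share_within(zs):.3f}")
```

For a standard normal variable about 95% of values fall within ±1.96, so the printed share gives one rough measure of how close the standardized means are to normal at each n.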