STAT 311: Sampling Distributions
Y. Samuel Wang
Summer 2016
Y. Samuel Wang STAT 311: Sampling Distributions
Logistics
Midterm passed back at end of class
Homework posted later today, due next Friday
Long run behavior
Each individual outcome of a random variable is unpredictable
Probability models define the long run behavior
How can we apply these ideas to the real-world data that we gather?
Theorem
Law of Large Numbers: Let $n \ge 1$, and let $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables with finite mean $\mu$ and finite variance $\sigma^2$. Define
$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n},$$
then for any $\varepsilon, \delta > 0$ there exists some $n$ such that
$$P(|\bar{X}_n - \mu| < \varepsilon) \ge 1 - \delta.$$
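The Law of Large Numbers can be checked with a short simulation. This is a minimal sketch, assuming Exp(1) draws (so the true mean is 1); the function name `coverage`, the tolerance of 0.1, and the replication count are illustrative choices and not part of the lecture.

```python
import random

def coverage(n, eps=0.1, reps=2000, seed=0):
    """Estimate P(|xbar_n - mu| < eps) for the mean of n Exp(1) draws (mu = 1)."""
    rng = random.Random(seed)
    mu = 1.0
    hits = 0
    for _ in range(reps):
        # One sample of size n and its sample mean
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / reps

# The estimated probability should climb toward 1 as n grows
for n in (10, 100, 1000):
    print(n, coverage(n))
```

For a fixed tolerance, the estimated probability rises toward 1 as n increases, which is the behavior the theorem describes.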
Implications of the LLN
Given a large enough sample, we can achieve any level of precision (determined by $\varepsilon$ and $\delta$)
Exactly how large $n$ has to be depends on $\sigma^2$, which in practice we don’t always know
Typically, taking a larger sample will give us a more precise estimate of the true parameters
Sample statistic will “converge” to the truth
LLN cannot make up for bad study design
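One way to make "how large n has to be" concrete is Chebyshev's inequality, which gives $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \sigma^2 / (n\varepsilon^2)$. The slides do not name this bound, so treat the sketch below as one possible (and conservative) way to pick n; the function name `chebyshev_sample_size` is hypothetical.

```python
import math

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n with sigma2 / (n * eps**2) <= delta (Chebyshev bound).

    Any n at least this large guarantees P(|xbar_n - mu| < eps) >= 1 - delta.
    The bound is conservative: the n actually needed is usually smaller.
    """
    return math.ceil(sigma2 / (delta * eps ** 2))

# Larger variance, tighter eps, or smaller delta all demand a larger sample
print(chebyshev_sample_size(sigma2=4.0, eps=0.5, delta=0.25))
print(chebyshev_sample_size(sigma2=8.0, eps=0.5, delta=0.25))
```

The required n scales linearly with the variance, which is why not knowing the variance makes it hard to say in advance how much data is enough.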
Central Limit Theorem
Let $n \ge 1$, and let $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables such that $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$
Note that because expectation is linear, $E(S_n) = \mu_1 + \mu_2 + \cdots + \mu_n$
Since the random variables are independent, $\mathrm{Var}(S_n) = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$
If $Y = S_n - E(S_n)$, then $E(Y) = 0$ and $\mathrm{Var}(Y) = \mathrm{Var}(S_n)$
If $Z = Y / \sqrt{\mathrm{Var}(S_n)}$, then $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$
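The centering and scaling steps above can be checked numerically. This is a sketch under an assumed setup that is not in the slides: $X_i \sim \text{Uniform}(0, i)$ for $i = 1, \ldots, 5$, so the variables are independent but not identically distributed. The function name `standardized_sums` is illustrative.

```python
import math
import random
import statistics

def standardized_sums(n=5, reps=20000, seed=0):
    """Draw Z = (S_n - E(S_n)) / sqrt(Var(S_n)) with X_i ~ Uniform(0, i)."""
    rng = random.Random(seed)
    mean_s = sum(i / 2 for i in range(1, n + 1))       # E(X_i) = i / 2
    var_s = sum(i ** 2 / 12 for i in range(1, n + 1))  # Var(X_i) = i^2 / 12
    zs = []
    for _ in range(reps):
        s = sum(rng.uniform(0, i) for i in range(1, n + 1))
        zs.append((s - mean_s) / math.sqrt(var_s))
    return zs

zs = standardized_sums()
# Sample mean of Z should be near 0 and sample variance near 1
print(statistics.fmean(zs), statistics.pvariance(zs))
```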
Can we say more?
Central Limit Theorem
Theorem
Central Limit Theorem: Let $n \ge 1$, and let $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables such that $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$. Define $S_n = X_1 + X_2 + \cdots + X_n$ and
$$Z_n = \frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}}.$$
Then, provided no single $X_i$ dominates the sum (a regularity condition such as Lindeberg's), $Z_n \to_d N(0, 1)$ as $n \to \infty$.
The $\to_d$ symbol means "converges in distribution": the distribution of the random variable $Z_n$, which is formed entirely from the $X_i$’s, looks more and more like a standard normal distribution as $n$ increases.
Central Limit Theorem
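What if all my random variables are identically distributed?
$E(S_n) = n\mu$
$\mathrm{Var}(S_n) = n\sigma^2$
$$Z_n = \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{n\left(\frac{1}{n}S_n - \mu\right)}{\sqrt{n}\sqrt{\sigma^2}} = \sqrt{n}\,\frac{\bar{x}_n - \mu}{\sigma}$$
The distribution of (almost) any sample mean approaches a normal distribution; once we standardize it using the mean and standard deviation of the population, that limit is the standard normal.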
Central Limit Theorem Caveats
Note that this does not say anything about individual random variables
This only makes a statement about $\bar{x}_n$ as $n$ grows large
Imagine taking many samples of size $n = 100$ and plotting a histogram of their sample means: it will look roughly normal. Then take many samples of size $n = 1000$: the histogram of those means will look even more like a normal distribution, and so on.
How large $n$ has to be for a “roughly normal” histogram of $\bar{x}_n$ depends on many things, but as long as $n$ keeps growing, it will eventually get there
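One rough way to quantify "looks more normal" is the skewness of the sample means, which is 0 for a normal distribution. The sketch below assumes Exp(1) data and adds n = 2 to the sample sizes mentioned above so the contrast is easy to see; both choices, and the helper names `skewness` and `sample_means`, are illustrative and not from the lecture.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

def sample_means(n, reps=2000, seed=0):
    """Return reps sample means, each computed from n fresh Exp(1) draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

# Skewness of the sample means should shrink toward 0 as n grows
for n in (2, 100, 1000):
    print(n, round(skewness(sample_means(n)), 3))
```

A single Exp(1) draw has skewness 2, so the individual observations stay far from normal no matter how large n is; only the distribution of the sample mean becomes symmetric.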
Central Limit Theorem Example
For a given sample size n, take the sample average of n exponential random variables with rate = 1. Then repeat with a new sample 1000 times.
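The experiment described here might be reproduced along the following lines. This is a sketch, not the code used to produce the lecture's figures; the function name `exp_sample_means` and the particular sample sizes are assumptions.

```python
import math
import random
import statistics

def exp_sample_means(n, reps=1000, seed=0):
    """Return reps sample averages, each from n fresh Exp(rate=1) draws."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

# Exp(rate=1) has mean 1 and standard deviation 1, so the sample averages
# should centre near 1 with spread near 1 / sqrt(n)
for n in (1, 5, 30, 100):
    means = exp_sample_means(n)
    print(n, statistics.fmean(means), statistics.stdev(means), 1 / math.sqrt(n))
```

Plotting a histogram of each list of means would show the shape moving from the skewed exponential toward a bell curve as n increases.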
Central Limit Theorem Example
For a given sample size n, take the sample average of n exponential random variables with rate = 1 and standardize the sample average. Then repeat with a new sample 1000 times.
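A sketch of the standardized version follows, using $\sqrt{n}(\bar{x}_n - \mu)/\sigma$ with $\mu = \sigma = 1$ for Exp(rate = 1). It compares the fraction of standardized averages at or below $-1$ with the standard normal value; the cutoff of $-1$ and the helper names are illustrative choices, not from the lecture.

```python
import math
import random
from statistics import NormalDist

def standardized_means(n, reps=1000, seed=0):
    """Return reps values of sqrt(n) * (xbar - mu) / sigma for Exp(rate=1) samples."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # Exp(rate=1) has mean 1 and standard deviation 1
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (xbar - mu) / sigma)
    return out

def frac_below(zs, q):
    """Fraction of values in zs that are at most q."""
    return sum(z <= q for z in zs) / len(zs)

# Compare the simulated fraction below -1 with the standard normal CDF at -1
for n in (1, 10, 100):
    zs = standardized_means(n)
    print(n, frac_below(zs, -1.0), NormalDist().cdf(-1.0))
```

With n = 1 the standardized value is just X - 1, which essentially never falls below -1, so the mismatch with the normal curve is obvious; by n = 100 the two fractions should be close.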
Normal Distributions in Nature
Why is height (approximately) normally distributed?
How tall are you today?
That is the sum of how much you grew each day over the X days you have been alive so far.
So your height is a sum over a sample of size X
Each person is a separate sample
How else can we use the CLT?
We know a hypothetical distribution of a sample statistic x̄
In practice we don’t do many “resamples”
We can still estimate the hypothetical distribution we would see if we did resample many times, provided we know $\sigma^2$
If we don’t know $\sigma^2$, we can estimate it, but this changes things slightly
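As a sketch of how a single sample can stand in for many resamples, the function below builds an approximate interval for $\mu$ from the normal approximation $\bar{x} \sim N(\mu, \sigma^2/n)$. The function name `clt_interval`, the 95% level, and the Exp(1) test data are assumptions made for illustration. Plugging in the sample standard deviation for $\sigma$ is shown only as a rough approximation, since the lecture notes that estimating $\sigma^2$ changes things slightly.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def clt_interval(sample, sigma, level=0.95):
    """Approximate interval for mu, treating xbar as N(mu, sigma^2 / n)."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / math.sqrt(n)
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(200)]  # true mu = sigma = 1

print(clt_interval(sample, sigma=1.0))            # sigma treated as known
print(clt_interval(sample, sigma=stdev(sample)))  # sigma estimated from the data
```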
Sampling distributions
If the assumptions are satisfied,
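$$\bar{x} \sim N(\mu, \sigma^2/n) \quad \text{(approximately, for large } n\text{)}$$
$$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n}\sum_i x_i\right) = \frac{1}{n^2}\mathrm{Var}\left(\sum_i x_i\right) = \frac{1}{n^2}\, n\sigma^2 = \sigma^2/n$$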
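The variance calculation can be checked by simulation. This is a minimal sketch assuming normal data with mean 3 and $\sigma = 2$ (so $\sigma^2 = 4$); those values and the function name `var_of_mean` are illustrative, not from the lecture.

```python
import random
import statistics

def var_of_mean(n, reps=5000, seed=0, mu=3.0, sigma=2.0):
    """Empirical variance of the sample mean of n independent N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]
    return statistics.pvariance(means)

# Compare the simulated variance of xbar with sigma^2 / n = 4 / n
for n in (4, 16, 64):
    print(n, var_of_mean(n), 4.0 / n)
```

Quadrupling n should cut the variance of the sample mean to roughly a quarter, which halves its standard deviation.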