Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE
Part 3:
• Summary of CI for µ
• Confidence Interval for a Population Proportion p (Section 8-4)
Summary for creating a 100(1-α)% CI for µ:
• When $\sigma^2$ is known and the parent population is normal, use a z-value (this works for any n).
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
• When $\sigma^2$ is unknown and n is REALLY large, use a z-value and replace $\sigma^2$ with the observed sample variance $s^2$ (this works for any parent population distribution because n is large).
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
• When $\sigma^2$ is unknown, n is relatively small, and the POPULATION IS NEARLY NORMAL, use a t-value and the sample variance.
$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$
• Before the data are collected, a CI is a random interval: it depends on the sample chosen, but X̄ will ALWAYS be at the center of the two-sided CI.
[Figure: 16 different CIs for µ, each based on a different sample.]
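The idea that the interval is random while µ stays fixed can be sketched with a small simulation. This is only an illustration, not the data behind the slide's figure: the population mean 50, standard deviation 4, sample size 12, and the seed are all hypothetical choices, and 2.201 is the t-table value $t_{0.025,11}$ that matches n = 12.

```python
import random
import statistics

def t_interval(sample, t_crit):
    """Two-sided CI  xbar ± t * s / sqrt(n); the caller supplies the t critical value."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / n ** 0.5
    return xbar - half_width, xbar + half_width

random.seed(1)                  # fixed seed so the run is repeatable
MU, SIGMA, N = 50.0, 4.0, 12    # hypothetical normal population and sample size
T_CRIT = 2.201                  # t_{0.025,11} from a t-table (matches N = 12)

# Draw 16 independent samples and build one 95% CI from each.
intervals = []
for _ in range(16):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    intervals.append(t_interval(sample, T_CRIT))

covered = sum(lo <= MU <= hi for lo, hi in intervals)
for lo, hi in intervals:
    flag = "" if lo <= MU <= hi else "  <-- misses mu"
    print(f"[{lo:6.2f}, {hi:6.2f}]{flag}")
print(f"{covered} of 16 intervals contain mu = {MU}")
```

Each interval is centered at its own sample mean, so the intervals shift from sample to sample; over many repetitions roughly 95% of them would be expected to contain µ.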
• Example: Fuel rods in a reactor (problem 8-43 in book)
An article in Nuclear Engineering International gives the following measurements on the percentage of enrichment of 12 fuel rods in a reactor in Norway:
2.94 3.00 2.90 2.75 3.00 2.95
2.90 2.75 2.95 2.82 2.81 3.05
Calculate a 95% CI for the mean percentage of enrichment. Provide a normal probability plot to verify the assumption of normality.
ANS: This is a small sample with unknown $\sigma^2$, so we will use the procedure with the t-value.
$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$
$\bar{x} = 2.9017$
$s^2 = 0.0099$
$s = 0.0993$
$t_{\alpha/2,\,n-1} = t_{0.025,\,12-1} = t_{0.025,\,11} = 2.201$
95% CI for µ:
$$2.9017 \pm 2.201 \cdot \frac{0.0993}{\sqrt{12}}$$
$$2.9017 \pm 0.0631$$
[2.8386, 2.9648]
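The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library; the critical value 2.201 is copied from the t-table lookup above rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Percentage enrichment of the 12 fuel rods, as listed in the example.
enrichment = [2.94, 3.00, 2.90, 2.75, 3.00, 2.95,
              2.90, 2.75, 2.95, 2.82, 2.81, 3.05]

n = len(enrichment)
xbar = statistics.mean(enrichment)
s = statistics.stdev(enrichment)       # sample standard deviation (n - 1 divisor)
t_crit = 2.201                         # t_{0.025,11}, taken from the t-table

half_width = t_crit * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"xbar = {xbar:.4f}, s = {s:.4f}")
print(f"95% CI for mu: [{lower:.4f}, {upper:.4f}]")
```

Small differences in the last decimal place relative to the hand calculation come from rounding the intermediate values.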
Checking normality with a normal probability plot:
[Figure: normal probability plot. Horizontal axis: Sample Quantiles (2.75 to 3.05); vertical axis: Theoretical Quantiles (−1.5 to 1.5).]
Looks pretty good. The points fall randomly around the diagonal line. So, we can believe that the data were generated from a normal distribution (i.e. the parent population was at least ‘approximately’ normal).
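The coordinates of such a plot can be computed without a plotting library. The sketch below pairs each sorted observation with a standard normal quantile; the plotting positions (i − 0.5)/n are an assumption (other conventions exist), and the correlation at the end is just one rough numerical summary of how close the points are to a straight line, not a formal test.

```python
import math
from statistics import NormalDist

enrichment = [2.94, 3.00, 2.90, 2.75, 3.00, 2.95,
              2.90, 2.75, 2.95, 2.82, 2.81, 3.05]

n = len(enrichment)
sample_q = sorted(enrichment)
# Assumed plotting positions (i - 0.5)/n, mapped to standard normal quantiles.
theo_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(sample_q, theo_q)
for sq, tq in zip(sample_q, theo_q):
    print(f"sample quantile {sq:.2f}   theoretical quantile {tq:+.3f}")
print(f"correlation between the two sets of quantiles: {r:.3f}")
```

A correlation close to 1 is consistent with the visual impression that the points follow the diagonal.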
Population Proportion Parameter p
• Moving on to another potential parameter of interest...
A population proportion is denoted p.
A sample proportion is denoted p̂.
• A population proportion is based on a yes/no or 0/1 type variable.
– What proportion of the population favors Hillary Clinton?
yes/no
– What proportion of a manufactured good is defective?
defective/not defective
– What proportion of the U.S. is Republican?
Republican/not Republican
– What proportion of students entering college successfully complete a degree?
succeed/fail
• To estimate a population proportion, we will use a sample proportion...
... and the bottom line for proportions is that you just plain need a large sample.
There’s sparse information in each observation (just yes/no); we don’t have a nice continuum like in a continuous random variable measurement (like weight).
Large-Sample Confidence Interval for a Population Proportion p (Section 8-4)
• The construction of the CI for p relies on the fact that we took a large sample (large n).
• I will introduce this concept using something we are familiar with, the binomial...
– We will let the category of interest be called the ‘success’ category (arbitrary).
– Let $X_i = 1$ if observation i falls into the ‘success’ category.
– Let $X_i = 0$ if observation i falls into the other or ‘fail’ category.
– $X_i$ is called an indicator variable.
– Thus, $\sum_{i=1}^{n} X_i$ = a count of all individuals in the ‘success’ category.
– The sample proportion of individuals falling into the ‘success’ category is
$$\hat{P} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\text{\# in sample who are in ‘success’ category}}{n}$$
– $\hat{P}$ (upper case) is a random variable and is the point estimator for p.
– $\hat{p}$ (lower case) is a realized point estimate from an observed sample.
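As a small illustration of the indicator-variable notation, the sketch below computes a realized p̂ from a list of 0/1 values. The ten observations are made up for the example.

```python
# Hypothetical sample: 1 = 'success' category, 0 = 'fail' category.
x = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

n = len(x)
count = sum(x)        # sum of the indicators = number of successes
p_hat = count / n     # realized sample proportion

print(f"{count} successes out of {n}: p_hat = {p_hat}")
```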
• Note that p and n are actually the parameters for a binomial distribution (i.e. probability of a success and number of trials).
– There are n trials (i.e. n independent draws of individuals to form the sample).
– The probability of getting a ‘success’ remains constant at p (assuming we have a large population and n is not too large).
– Let Y = total number of successes.
– So $Y \sim \text{Binomial}(n, p)$
– $E(Y) = np$ and $V(Y) = np(1-p)$
– In our notation, $Y = \sum_{i=1}^{n} X_i$, so
$$Y = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$$
$$E\left(\sum_{i=1}^{n} X_i\right) = np \quad \text{and} \quad V\left(\sum_{i=1}^{n} X_i\right) = np(1-p)$$
• Thus, if n is large, we have
$$Z = \frac{\sum_{i=1}^{n} X_i - np}{\sqrt{np(1-p)}} = \frac{\frac{\sum_{i=1}^{n} X_i}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
where Z is approximately standard normal.
(NOTE: This is a normal approximation to the binomial. For the approximation to be reasonable, we should have np ≥ 5 and n(1 − p) ≥ 5.)
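One way to get a feel for the rule of thumb is to compare the exact binomial probability that Z lands within ±1.96 against the normal value of about 0.95. The sketch below does this for two hypothetical (n, p) pairs, one that satisfies the rule and one that does not; the function names are illustrative.

```python
import math
from statistics import NormalDist

def rule_of_thumb_ok(n, p):
    """np >= 5 and n(1 - p) >= 5, as in the note above."""
    return n * p >= 5 and n * (1 - p) >= 5

def exact_prob_within(n, p, z):
    """Exact P(-z <= Z <= z) for Z = (Y - np)/sqrt(np(1-p)), Y ~ Binomial(n, p)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    total = 0.0
    for y in range(n + 1):
        if abs(y - mean) <= z * sd:
            total += math.comb(n, y) * p ** y * (1 - p) ** (n - y)
    return total

z = 1.96
normal_prob = NormalDist().cdf(z) - NormalDist().cdf(-z)

# Hypothetical cases: the first meets the rule of thumb, the second does not.
for n, p in [(100, 0.5), (100, 0.02)]:
    exact = exact_prob_within(n, p, z)
    print(f"n={n}, p={p}: rule ok = {rule_of_thumb_ok(n, p)}, "
          f"exact = {exact:.4f}, normal = {normal_prob:.4f}")
```

Because Y is discrete, the exact probability will not match the normal value exactly even when the rule is satisfied; the comparison only indicates how rough the approximation is.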
• Normal approximation for a sample proportion $\hat{P}$
If n is large, the distribution of
$$Z = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
is approximately standard normal.
Or similarly, $\hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right)$, approximately.
This means the sampling distribution for $\hat{P}$ is approximately normal...
See applet at:
http://www.rossmanchance.com/applets/OneProp/OneProp.htm?candy=1
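Thus, we can use a z-value to form a 100(1-α)% CI for p:
$$P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} \le z_{\alpha/2}\right) = 1 - \alpha$$
rearranging...
$$P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) = 1 - \alpha$$
and we have the lower and upper bounds...
Lower bound: $\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
Upper bound: $\hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
BUT WE DON’T KNOW p, SO WE CAN’T GET ACTUAL VALUES FOR THE BOUNDS!
Solution ⇒ replace p with $\hat{P}$ in the formulas.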
• Approximate 100(1-α)% CI for a population proportion p
If p̂ is the proportion of observations in a random sample of size n that belong to a class of interest, an approximate 100(1-α)% CI on the proportion p of the population that belongs to this class is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
• Things you need for the appropriate behavior of $\hat{P}$:
– The population is large, and you don’t take too many individuals for your sample. Maybe no more than 10% of the total population.
– The sample is a simple random sample.
– np ≥ 5 and n(1 − p) ≥ 5.
This statement means that if you have a really rare event, you’re going to need a very large sample... just logical though.
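The interval formula and the last condition can be wrapped in two small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the z-value is computed with the standard library's `NormalDist` instead of a table, and the size check substitutes p̂ for the unknown p, which is a common practical shortcut and not something the notes prescribe. The counts 120 out of 400 are made up.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Approximate large-sample CI: p_hat ± z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 point
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def large_sample_ok(successes, n):
    """Check n*p_hat >= 5 and n*(1 - p_hat) >= 5 (p_hat stands in for p: an assumption)."""
    return successes >= 5 and (n - successes) >= 5

# Hypothetical data: 120 'successes' in a simple random sample of 400.
lo, hi = proportion_ci(120, 400, conf=0.95)
print(f"size condition met: {large_sample_ok(120, 400)}")
print(f"95% CI for p: [{lo:.4f}, {hi:.4f}]")
```

The size check uses counts directly because n·p̂ is just the number of successes and n·(1 − p̂) is the number of failures.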
• Example: Interpolation methods are used to estimate heights above sea level for locations where direct measurements are unavailable.
After verifying the estimates, it was found that the interpolation method made “large” errors at 26 of 74 randomly sampled test locations.
Find a 90% CI for the overall proportion of locations at which this method will make “large” errors.
ANS:
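The answer is left blank in the notes. One way to work it, applying the approximate CI formula for p with $\hat{p} = 26/74$ and the table value $z_{0.05} = 1.645$ (rounded intermediate values shown):

$$\hat{p} = \frac{26}{74} \approx 0.3514$$
$$0.3514 \pm 1.645\sqrt{\frac{0.3514\,(1 - 0.3514)}{74}} = 0.3514 \pm 1.645\,(0.0555) = 0.3514 \pm 0.0913$$

This gives a 90% CI of approximately [0.260, 0.443]. The sample contains 26 ‘successes’ and 48 ‘failures’, both well above 5, so the large-sample approximation looks reasonable here.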
• Choice of sample size for estimating p
For a specified error $E = |p - \hat{P}|$ in your estimate, the previously stated behavior of $\hat{P}$ suggests you should choose a sample size as:
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \cdot p(1-p)$$
But since we don’t know p (that’s what we’re trying to estimate!), we can’t compute a sample size from this formula... unless we estimate p first.
Here, we choose to err on the conservative side. It turns out the largest variance for $\hat{P}$ [where $V(\hat{P}) = \frac{p(1-p)}{n}$] occurs when p = 0.5, no matter what n is:
[Figure: plot of the variance of $\hat{P}$ vs. p. Horizontal axis: p (0.0 to 1.0); vertical axis: p(1 − p)/n (0.000 to 0.008).]
So, if we don’t know p, we’ll plug in p = 0.5 to make sure we don’t underestimate the variance of our estimate (using the ‘worst case scenario’ idea).
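The claim that the variance peaks at p = 0.5 can be verified with a short calculus argument, treating n as fixed:

$$V(\hat{P}) = \frac{p(1-p)}{n}, \qquad \frac{d}{dp}\,\frac{p(1-p)}{n} = \frac{1 - 2p}{n} = 0 \;\Rightarrow\; p = \frac{1}{2}$$
$$\frac{d^2}{dp^2}\,\frac{p(1-p)}{n} = -\frac{2}{n} < 0, \qquad \max_p V(\hat{P}) = \frac{0.5\,(1 - 0.5)}{n} = \frac{0.25}{n}$$

The second derivative is negative, so p = 1/2 is a maximum, and the factor 0.25 is the largest value that p(1 − p) can take.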
• Thus, the working sample size formula for estimating p with a 100(1 − α)% CI and an error of E is:
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$
We use this method because, before we collect data, we don’t have any information on p, so there’s no observed p̂ to plug in.
In contrast, when doing a CI for p, we DO have an observed estimate p̂ for p, so in that case we use our plug-in estimate.
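The sample size formula is easy to wrap in a helper. This is a sketch with assumptions: the function name is hypothetical, the z-value comes from the standard library's `NormalDist` instead of a table, and the result is rounded up to the next whole number, a customary step the formula itself does not spell out. The default p = 0.5 is the conservative choice described above; a different p can be passed if a prior estimate exists.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(error, conf=0.95, p=0.5):
    """n = (z_{alpha/2} / E)^2 * p(1 - p), rounded up; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z / error) ** 2 * p * (1 - p))

# Hypothetical target: error of at most 0.03 with 95% confidence.
print(sample_size_for_proportion(0.03, conf=0.95))          # conservative, p = 0.5
print(sample_size_for_proportion(0.03, conf=0.95, p=0.1))   # with a prior guess p = 0.1
```

The conservative choice never gives a smaller n than any other value of p, which is the point of the worst-case approach.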
• Example: In the interpolation example, how large a sample would you need if you wanted to be at least 95% confident that the error in your estimate (i.e. $|p - \hat{p}|$) is less than 0.08?
ANS:
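The answer is left blank in the notes. One way to work it with the conservative formula, using $z_{0.025} = 1.96$ and E = 0.08:

$$n = \left(\frac{1.96}{0.08}\right)^2 (0.25) = (24.5)^2\,(0.25) = 150.06$$

Rounding up, a sample of about 151 locations would be needed. If the earlier sample's estimate $\hat{p} = 26/74$ were used as a prior guess in place of 0.5, then $\hat{p}(1-\hat{p}) \approx 0.2279$ and the formula gives $600.25 \times 0.2279 \approx 136.8$, or about 137; the conservative value is the safer of the two.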
• Example: Gallup Poll
From February 9 to 11, 2007, adults were randomly sampled (by phone) and asked:
“Would you favor or oppose Congress taking action to set a time-table for withdrawing all U.S. troops from Iraq by the end of next year.”
Let p = the proportion of the population wanting a withdrawal time-table.
How large is their error if they sample 1000 people and form a 95% CI?
ANS:
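The answer is left blank in the notes. One way to work it is to solve the conservative sample size formula for E, using p = 0.5 because the observed p̂ is not given here, with $z_{0.025} = 1.96$ and n = 1000:

$$E = z_{\alpha/2}\sqrt{\frac{0.25}{n}} = 1.96\sqrt{\frac{0.25}{1000}} = 1.96\,(0.0158) \approx 0.031$$

That is a margin of error of roughly ±3 percentage points.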
See USA Today, February 12, 2007.
http://www.usatoday.com/news/polls/tables/live/2007-02-12-poll.htm
What can go wrong in the estimation?
Bias in selecting the sample, for one thing.
“Gallup identifies flaws in 2012 election polls”
http://www.usatoday.com/story/news/politics/2013/06/04/gallup-poll-election-obama-romney/2388921/