Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE

Part 3:

• Summary of CI for µ

• Confidence Interval for a Population Proportion p (Section 8-4)

Summary for creating a 100(1-α)% CI for µ:

• When σ² is known and the parent population is normal, use a z-value (this works for any n).

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

• When σ² is unknown and n is REALLY large, use a z-value and replace σ² with the observed sample variance s² (this works for any parent population distribution because n is large).

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$


• When σ² is unknown, n is relatively small, and POPULATION IS NEARLY NORMAL, use a t-value and the sample variance.

$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

• Before you make a CI, it is a random interval... it depends on the sample chosen, but X̄ will ALWAYS be at the center of the 2-sided CI.

Example of 16 different CIs for µ, each based on a different sample:
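The idea of a random interval can be illustrated with a small simulation. The sketch below is only an illustration: the population values (mu = 10, sigma = 2), the sample size n = 25, and the helper name `z_interval` are all hypothetical choices, not taken from the slides. It draws 16 samples and builds a 95% z-interval from each, using the known-σ formula from the summary above.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, conf=0.95):
    """Two-sided z-interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half_width = z * sigma / len(sample) ** 0.5
    xbar = mean(sample)                             # the interval is centered at xbar
    return xbar - half_width, xbar + half_width

random.seed(1)                  # fixed seed so the run is repeatable
mu, sigma, n = 10.0, 2.0, 25    # hypothetical population and sample size

intervals = []
for _ in range(16):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    intervals.append(z_interval(sample, sigma))

covered = sum(lo <= mu <= hi for lo, hi in intervals)
for lo, hi in intervals:
    print(f"[{lo:.3f}, {hi:.3f}]  covers mu: {lo <= mu <= hi}")
print(f"{covered} of 16 intervals cover mu")
```

Every interval has the same width (σ and n are fixed), but its center moves with the sample. Over many repetitions, about 95% of such intervals would contain µ.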


• Example: Fuel rods in a reactor (problem 8-43 in book)

An article in Nuclear Engineering International gives the following measurements on the percentage of enrichment of 12 fuel rods in a reactor in Norway:

2.94 3.00 2.90 2.75 3.00 2.95
2.90 2.75 2.95 2.82 2.81 3.05

Calculate a 95% CI for the mean percentage of enrichment. Provide a normal probability plot to verify the assumption of normality.

ANS: This is a small sample with unknown σ², so we will use the procedure with the t-value.

$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$


$$\bar{x} = 2.9017$$

$$s^2 = 0.0098$$

$$s = 0.0993$$

$$t_{\alpha/2,\,n-1} = t_{0.025,\,12-1} = t_{0.025,\,11} = 2.201$$

95% CI for µ:

$$2.9017 \pm 2.201 \cdot \frac{0.0993}{\sqrt{12}}$$

$$2.9017 \pm 0.0631$$

$$[2.8386,\ 2.9648]$$
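The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library. The critical value 2.201 is taken from the t table as in the text rather than computed, and the variable names are illustrative.

```python
from statistics import mean, stdev

# Percentage of enrichment for the 12 fuel rods
enrichment = [2.94, 3.00, 2.90, 2.75, 3.00, 2.95,
              2.90, 2.75, 2.95, 2.82, 2.81, 3.05]

n = len(enrichment)
xbar = mean(enrichment)
s = stdev(enrichment)      # sample standard deviation (divides by n - 1)
t_crit = 2.201             # t_{0.025, 11}, read from a t table as in the text

half_width = t_crit * s / n ** 0.5
lower, upper = xbar - half_width, xbar + half_width

print(f"xbar = {xbar:.4f}, s = {s:.4f}, half-width = {half_width:.4f}")
print(f"95% CI for mu: [{lower:.4f}, {upper:.4f}]")
```

Because the script carries unrounded intermediate values, its endpoints can differ from the hand calculation in the last printed digit.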


Checking normality with a normal probability plot:

[Figure: normal probability plot of the 12 observations. Horizontal axis: Sample Quantiles (2.75 to 3.05). Vertical axis: Theoretical Quantiles (−1.5 to 1.5).]

Looks pretty good. The points fall randomly around the diagonal line. So, we can believe that the data were generated from a normal distribution (i.e. the parent population was at least ‘approximately’ normal).
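The coordinates behind such a plot can be sketched as follows. The plotting positions (i − 0.5)/n are an assumed convention (statistical software varies on this choice), and the slides do not say which one was used for the figure.

```python
from statistics import NormalDist

enrichment = [2.94, 3.00, 2.90, 2.75, 3.00, 2.95,
              2.90, 2.75, 2.95, 2.82, 2.81, 3.05]

n = len(enrichment)
sample_q = sorted(enrichment)    # ordered observations

# Standard normal quantiles at the assumed plotting positions (i - 0.5)/n
theoretical_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for x, q in zip(sample_q, theoretical_q):
    print(f"sample quantile {x:.2f}   theoretical quantile {q:+.3f}")
```

Plotting the pairs and checking that they fall close to a straight line is the visual test described above.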


Population Proportion Parameter p

• Moving on to another potential parameter of interest...

A population proportion is denoted p.

A sample proportion is denoted p̂.

• A population proportion is based on a yes/no or 0/1 type variable.

– What proportion of the population favor Hillary Clinton?

yes/no

– What proportion of a manufactured good is defective?

defective/not defective

– What proportion of the U.S. is republican?

republican/not republican

– What proportion of students entering college successfully complete a degree?

succeed/fail

• To estimate a population proportion, we will use a sample proportion...

... and the bottom line for proportions is that you just plain need a large sample.

There’s sparse information in each observation (just yes/no); we don’t have a nice continuum like in a continuous random variable measurement (like weight).


Large-Sample Confidence Interval for a Population Proportion p (Section 8-4)

• The construction of the CI for p relies on the fact that we took a large sample (large n).

• I will introduce this concept using something we are familiar with, the binomial...

– We will let the category of interest be called the ‘success’ category (arbitrary).

– Let Xi = 1 if observation i falls into the ‘success’ category.

– Let Xi = 0 if observation i falls into the other or ‘fail’ category.

– Xi is called an indicator variable.

– Thus, $\sum_{i=1}^{n} X_i$ = a count of all individuals in the ‘success’ category.

– The sample proportion of individuals falling into the ‘success’ category is

$$\hat{P} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\#\text{ in sample who are in ‘success’ category}}{n}$$

– P̂ (upper case) is a random variable and is the point estimator for p.

– p̂ (lower case) is a realized point estimate from an observed sample.


• Note that p and n are actually the parameters for a binomial distribution (i.e. probability of a success and number of trials).

– There are n trials (i.e. n independent draws of individuals to form the sample).

– The probability of getting a ‘success’ remains constant as p (assuming we have a large population and n is not too large).

– Let Y = total number of successes.

– So Y ∼ Binomial(n, p).

– E(Y) = np and V(Y) = np(1 − p).

– In our notation, $Y = \sum_{i=1}^{n} X_i$, so

$$Y = \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p)$$

$$E\left(\sum_{i=1}^{n} X_i\right) = np \quad \text{and} \quad V\left(\sum_{i=1}^{n} X_i\right) = np(1-p)$$

• Thus, if n is large, we have

$$Z = \frac{\left(\sum_{i=1}^{n} X_i\right) - np}{\sqrt{np(1-p)}} = \frac{\frac{\sum_{i=1}^{n} X_i}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

where Z is approximately standard normal.

(NOTE: This is a normal approximation to the binomial. And for the approximation to be reasonable, we should have np ≥ 5 and n(1 − p) ≥ 5.)
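To get a feel for how good the approximation is, the sketch below compares an exact binomial probability with the value given by the Z statistic above. The numbers n = 100, p = 0.3, and k = 35 are hypothetical, chosen only so that np ≥ 5 and n(1 − p) ≥ 5 hold. The helper `binom_cdf` is illustrative, and no continuity correction is applied because the slides do not use one.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p, k = 100, 0.3, 35            # hypothetical values for illustration
assert n * p >= 5 and n * (1 - p) >= 5

exact = binom_cdf(k, n, p)

# Normal approximation: Z = (Y - np) / sqrt(np(1 - p))
z = (k - n * p) / sqrt(n * p * (1 - p))
approx = NormalDist().cdf(z)

print(f"exact  P(Y <= {k}) = {exact:.4f}")
print(f"approx P(Y <= {k}) = {approx:.4f}")
```

The two probabilities should be close but not identical. The gap shrinks as n grows, and grows as p moves toward 0 or 1.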


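• Normal approximation for a sample proportion P̂

If n is large, the distribution of

$$Z = \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}}$$

is approximately standard normal.

Or similarly, P̂ is approximately distributed as

$$\hat{P} \sim N\left(p,\ \frac{p(1-p)}{n}\right)$$

This means the sampling distribution for P̂ is approximately normal when n is large...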

See applet at:

http://www.rossmanchance.com/applets/OneProp/OneProp.htm?candy=1


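Thus, we can use a z-value to form a 100(1-α)% CI for p:

$$P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{\frac{p(1-p)}{n}}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$

rearranging...

$$P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1 - \alpha$$

and we have the lower and upper bounds...

Lower bound: $\hat{P} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

Upper bound: $\hat{P} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

BUT WE DON’T KNOW p, SO WE CAN’T GET ACTUAL VALUES FOR THE BOUNDS!

Solution ⇒ replace p with P̂ in the formulas.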


• Approximate 100(1-α)% CI for a population proportion p

If p̂ is the proportion of observations in a random sample of size n that belongs to a class of interest, an approximate 100(1-α)% CI on the proportion p of the population that belongs to this class is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.

• Things you need for the appropriate behavior of P̂:

– Population is large, and you don’t take too many individuals for your sample. Maybe no more than 10% of the total population.

– The sample is a simple random sample.

– np ≥ 5 and n(1 − p) ≥ 5.

This statement means that if you have a really rare event, you’re going to need a very large sample... just logical though.

• Example: Interpolation methods are used to estimate heights above sea level for locations where direct measurements are unavailable.

After verifying the estimates, it was found that the interpolation method made “large” errors at 26 of 74 random sample test locations.

Find a 90% CI for the overall proportion of locations at which this method will make “large” errors.


ANS:
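One way to work this example is to apply the large-sample formula for p directly. The sketch below is not the worked answer from the lecture. The function name `proportion_ci` is illustrative, and the z-value is computed with Python's standard library rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf):
    """Approximate large-sample CI for a population proportion p."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)      # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# 26 "large" errors out of 74 test locations, 90% confidence
lower, upper = proportion_ci(26, 74, 0.90)
print(f"p-hat = {26 / 74:.4f}")
print(f"90% CI for p: [{lower:.4f}, {upper:.4f}]")
```

With 26 successes and 48 failures, both counts are well above 5, so the normal approximation condition is satisfied when p̂ is used in place of p.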


• Choice of sample size for estimating p

For a specified error E = |p − P̂| in your estimate, the previously stated behavior of P̂ suggests you should choose a sample size as:

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \cdot p(1-p)$$

But since we don’t know p (that’s what we’re trying to estimate!), we can’t compute a sample size from this formula... unless we estimate p first.

Here, we choose to err on the conservative side. It turns out the largest variance for P̂ [where $V(\hat{P}) = \frac{p(1-p)}{n}$] occurs when p = 0.5 no matter what the n was:


[Figure: plot of the variance of P̂ vs. p. Horizontal axis: p (0.0 to 1.0). Vertical axis: p(1 − p)/n (0.000 to 0.008).]

So, if we don’t know p, we’ll plug in p = 0.5 to make sure we don’t underestimate the variance of our estimate (using the ‘worst case scenario’ idea).
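The claim that the variance peaks at p = 0.5 for any n can be verified with a short calculus argument, sketched here for a fixed n:

$$\frac{d}{dp}\left[\frac{p(1-p)}{n}\right] = \frac{1 - 2p}{n} = 0 \;\Rightarrow\; p = \frac{1}{2}, \qquad \frac{d^2}{dp^2}\left[\frac{p(1-p)}{n}\right] = -\frac{2}{n} < 0$$

So p = 1/2 is a maximum, and the largest possible value of p(1 − p) is 0.5 · 0.5 = 0.25.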


• Thus, the working sample size formula for estimating p with a 100(1 − α)% CI and an error of E is:

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$

We use this method because, before we collect data, we don’t have any information on p, so there’s no observed p̂ to plug in.

In contrast, when doing a CI for p, we DO have an observed estimate p̂ for p, so in that case we use our plug-in estimate.


• Example: In the interpolation example, how large of a sample would you need if you wanted to be at least 95% confident that the error in your estimate (i.e. |p − p̂|) is less than 0.08?

ANS:
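A sketch of the calculation follows, assuming the result is rounded up to the next whole observation. It is not the worked answer from the lecture. The function name `sample_size` is illustrative. Both the conservative choice p = 0.5 and the earlier estimate p̂ = 26/74 are shown, since the slides do not say which one is intended here.

```python
from math import ceil
from statistics import NormalDist

def sample_size(E, conf, p=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p(1 - p)/n) <= E."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / E) ** 2 * p * (1 - p))   # round up to a whole observation

n_conservative = sample_size(0.08, 0.95)           # worst case, p = 0.5
n_plugin = sample_size(0.08, 0.95, p=26 / 74)      # using the earlier p-hat

print(f"conservative n (p = 0.5):      {n_conservative}")
print(f"plug-in n (p-hat = 26/74):     {n_plugin}")
```

The conservative choice never gives a smaller n than a plug-in estimate, because p(1 − p) is largest at p = 0.5.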


• Example: Gallup Poll

Between February 9-11, 2007, adults were randomly sampled (by phone) and asked:

“Would you favor or oppose Congress taking action to set a time-table for withdrawing all U.S. troops from Iraq by the end of next year.”

Let p = the proportion of the population wanting a withdrawal time-table.

How large is their error if they sample 1000 people and form a 95% CI?


ANS:
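A sketch of the margin-of-error calculation is given below. It assumes the conservative value p = 0.5, since no observed p̂ is given in the text, and it is not the worked answer from the lecture.

```python
from math import sqrt
from statistics import NormalDist

n = 1000
conf = 0.95
p = 0.5      # conservative (worst-case) value; an assumption, no p-hat is given

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{0.025}
E = z * sqrt(p * (1 - p) / n)                  # half-width of the CI

print(f"margin of error: {E:.4f} (about {100 * E:.1f} percentage points)")
```

This is the familiar "plus or minus about 3 points" quoted for polls of roughly 1000 respondents.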

See USA Today, February 12, 2007.

http://www.usatoday.com/news/polls/tables/live/2007-02-12-poll.htm

What can go wrong in the estimation? Bias in selecting the sample, for one thing.

“Gallup identifies flaws in 2012 election polls”

http://www.usatoday.com/story/news/politics/2013/06/04/gallup-poll-election-obama-romney/2388921/
