
AP Statistics – Ch. 9 Notes

Significance Tests

A significance test (also called a hypothesis test) is a formal procedure for using observed data from a sample

to evaluate the truth of a claim about the population.

The logic behind significance tests:

1. Start with a claim about a population parameter. For example, suppose Fred is running for Student Body

President and he claims that 55% of students in the school support him. This claim is called the null

hypothesis, $H_0: p = 0.55$.

2. We suspect that Fred is exaggerating how much support he has and that in fact, less than 55% of

students in the school support Fred. This is called our alternative hypothesis, $H_a: p < 0.55$.

3. Take a random sample. Suppose that in our SRS, 36 out of 80 students support Fred ($\hat{p} = 36/80 = 0.45$).

4. Compare the sample results to the null value. In our sample, the proportion of students who support Fred

was lower than the proportion of students he claims support him, so there is some evidence in favor of

our alternative hypothesis (0.45 < 0.55). There are two possible reasons for this:

a. The difference is due to sampling variability (chance), and we just happened to get a sample with

an unusually low proportion.

b. Fred’s claim is false and the true proportion of students who support him is lower than 0.55.

5. Evaluate the evidence. Is there enough evidence to rule out the “by chance” explanation? Assuming that

the null hypothesis is true, determine the probability of getting a sample result as extreme or more

extreme than the one we observed. In this case, I want to know the probability that I get a sample with a

$\hat{p}$ as low as or lower than 0.45 if, in fact, the true proportion is $p = 0.55$.

Here’s a dotplot showing the results of 100 samples of size 80 from a population in which 55% favor

Fred:

[Dotplot of $\hat{p}$ for the 100 simulated samples; horizontal axis: proportion in sample who favor Fred, from 0.40 to 0.75]

The proportion in favor of Fred was 0.45 or less in only 2 of the 100 samples. Based on the simulation,

we estimate that $P(\hat{p} \le 0.45 \mid p = 0.55) \approx 0.02$. This means that it would be very unlikely for us to select

a sample of students in which the proportion who favor Fred was this low or lower if his claim is true. It

is much more likely that Fred’s claim is false. We reject the null hypothesis and conclude that there’s

convincing evidence that less than 55% of students actually favor Fred.
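The simulation in step 5 can be sketched in Python. This is only an illustration and not the procedure used to make the dotplot: the function name `simulate_p_hats`, the fixed seed, and the choice of 10,000 samples (instead of 100) are assumptions made here so that the estimate is stable from run to run.

```python
import random

def simulate_p_hats(p, n, num_samples, seed=1):
    """Draw num_samples random samples of size n from a population with
    true proportion p and return the sample proportion from each one."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each sampled student supports Fred with probability p
        supporters = sum(rng.random() < p for _ in range(n))
        p_hats.append(supporters / n)
    return p_hats

# Assume Fred's claim (the null hypothesis) is true: p = 0.55
p_hats = simulate_p_hats(p=0.55, n=80, num_samples=10_000)

# Fraction of simulated samples with p-hat as low as or lower than 0.45
observed = 36 / 80
estimate = sum(ph <= observed for ph in p_hats) / len(p_hats)
print(f"Estimated P(p-hat <= 0.45 | p = 0.55) = {estimate:.3f}")
```

The printed estimate will not match the 0.02 from the dotplot exactly, because 100 samples give only a rough estimate of this probability.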

Basic Idea: A sample result that would rarely happen by chance if a claim is true provides convincing evidence

that the claim is false. A sample result that wouldn’t be all that unusual if the claim is true doesn’t provide very

convincing evidence that the claim is false.

Null Hypothesis ($H_0$): A statement of “no difference from the claim/no difference from the status quo” or “nothing interesting/important/fishy is going on.” It usually takes the form $H_0$: parameter = hypothesized value. The most common parameters to test are $\mu$ and $p$.



Alternative Hypothesis ($H_a$): The claim we are trying to find evidence for (what we believe/suspect/hope to be true). It can take the form $H_a$: parameter > hypothesized value, $H_a$: parameter < hypothesized value, or $H_a$: parameter ≠ hypothesized value.

The hypotheses should express the suspicions or hopes we have before we see the data. It is cheating to look at

the data first and then come up with hypotheses that fit what the data show.

One-sided alternative hypothesis: An alternative hypothesis that states that the parameter is larger than the null

value or that the parameter is smaller than the null value ($H_a$ uses < or >).

Two-sided alternative hypothesis: An alternative hypothesis that states that the parameter is different from the

null value – we haven’t decided whether we think it’s larger or smaller ($H_a$ uses ≠).

Examples: Identify the parameter of interest in each setting and state appropriate hypotheses for performing a

significance test.

a) As part of its 2010 census marketing campaign, the U.S. Census Bureau advertised “10 questions, 10

minutes—that’s all it takes.” On the census form itself, we read, “The U.S. Census Bureau estimates

that, for the average household, this form will take about 10 minutes to complete, including the time for

reviewing the instructions and answers.” We suspect that the actual time it takes to complete the form

may be longer than advertised.

$H_0: \mu = 10$ min

$H_a: \mu > 10$ min

where μ is the true mean amount of time needed to complete the census form.

b) Mike is an avid golfer who would like to improve his play. A friend suggests getting new clubs and lets

Mike try out his 4-iron. Based on years of experience, Mike has established that the mean distance that

balls travel when hit with his old 4-iron is $\mu = 175$ yards with a standard deviation of $\sigma = 15$ yards. He is

hoping that this new club will make his shots with a 4-iron more consistent (less variable), so he goes to

the driving range and hits 50 shots with the new 4-iron.

$H_0: \sigma = 15$ yds

$H_a: \sigma < 15$ yds

where σ is the true standard deviation of yardage for all shots hit by Mike with the new 4-iron.

c) According to a website, 85% of teens are getting less than eight hours of sleep each night. Janie wonders

whether this result is different in her large high school. She asks an SRS of 100 students at the school

whether they get less than 8 hours of sleep on a typical night.

$H_0: p = 0.85$

$H_a: p \ne 0.85$

where p is the true proportion of students at Janie’s school who get less than 8 hours of sleep at night.


P-value: The probability of getting evidence for the alternative hypothesis $H_a$ as strong as or stronger than the observed evidence when the null hypothesis $H_0$ is true. In other words, the P-value is the probability that by random chance alone, the statistic (such as $\hat{p}$ or $\bar{x}$) would take a value as extreme as or more extreme than (in the direction specified by $H_a$) the one actually observed if $H_0$ is true. The smaller the P-value, the stronger the evidence against $H_0$.

In the example we started with (about the proportion of students who support Fred), the P-value was about 0.02

because there was only about a 2% probability of observing a sample proportion as low or lower than the one

we observed by chance alone if Fred actually had the support of 55% of the students at the school.

The P-value is not the probability that the null hypothesis is true. It’s the probability of having at

least as much evidence as the sample provides – the probability that something at least as

weird/interesting as what actually happened would happen by chance if the null hypothesis is true.

(In a trial, it isn’t the probability that the defendant is guilty. It’s the probability we would have at

least this much evidence against the defendant if he or she were innocent.)

Examples: Interpret each P-value in context and determine whether the data provides convincing evidence

against the null hypothesis. Explain.

a) To evaluate the U.S. Census Bureau’s claim that it takes about 10 minutes to complete the census form,

we tested the hypotheses

$H_0: \mu = 10$ minutes

$H_a: \mu > 10$ minutes

where $\mu$ = the mean amount of time that it takes to complete the census form. Suppose that in a sample of 100 families, the average amount of time needed to complete the form was $\bar{x} = 10.5$ minutes with a standard deviation of $s = 5.5$ minutes. A significance test using these data resulted in a P-value of 0.1828.

If the true mean amount of time needed to complete the census form really is 10 minutes, the probability

of selecting a sample with a mean time of 10.5 minutes or longer by chance alone is about 0.1828.

(Since it would not be that unusual to obtain a sample mean at least this high by chance if the true mean

is 10 minutes, we don’t have good evidence that the Census Bureau’s claim is false.)

b) When Mike was testing a new 4-iron, the hypotheses were

$H_0: \sigma = 15$ yards

$H_a: \sigma < 15$ yards

where $\sigma$ = the true standard deviation of the distances over which Mike hits golf balls using the new club. Based on 50 shots with the new 4-iron, the standard deviation was $s = 10.9$ yards. A hypothesis test using these data had a P-value of 0.002.

If the true standard deviation of yardage for Mike’s new club is really 15 yards, the probability of a

sample of 50 shots having a standard deviation of 10.9 yards or lower by chance alone is only about

0.002.

(It would be incredibly unusual to have a sample standard deviation at least as low as the one observed

by chance alone if the true standard deviation for the new clubs is 15 yards. This is very strong evidence

that the true standard deviation for the new club is actually less than 15 yards.)


c) In Janie’s hypothesis test about sleep deprivation, the hypotheses were

$H_0: p = 0.85$

$H_a: p \ne 0.85$

where p = the proportion of students at Janie’s school who get less than 8 hours of sleep on a typical

night. In Janie’s sample, 75 of the 100 students reported getting less than 8 hours of sleep on a typical

night. The hypothesis test resulted in a P-value of 0.005.

$\hat{p} = 75/100 = 0.75$, P-value = 0.005

If the true proportion of students at Janie’s school who get less than 8 hours of sleep on a typical night

really is 0.85, the probability of selecting a sample with a proportion at least as different from 0.85 as the

one observed (0.75) by chance alone is only about 0.005.

(A sample proportion this different from 0.85 is very unlikely to happen by chance alone if the true

proportion is 0.85. This is very strong evidence that the true proportion is different from 0.85.)

Statistical Significance

When we perform a hypothesis test, there are two possible conclusions: reject $H_0$ or fail to reject $H_0$. If our sample result is too unlikely to have happened by chance assuming $H_0$ is true, then we reject $H_0$. Otherwise, we conclude that there’s not enough evidence against $H_0$, so we fail to reject $H_0$.

When we perform a hypothesis test, we choose a significance level ($\alpha$), which is a cut-off for the P-value that we consider decisive. If the P-value is less than $\alpha$, we say that the data are statistically significant at level $\alpha$. In this case, we reject the null hypothesis $H_0$ and conclude that there is convincing evidence in favor of the alternative hypothesis $H_a$. The most commonly used significance level is $\alpha = 0.05$.

• If P-value < $\alpha$ → reject $H_0$ → convincing evidence for $H_a$ (in context)

“Since P-value < $\alpha$, we reject $H_0$. There is convincing evidence that [alternative hypothesis in context].”

• If P-value ≥ $\alpha$ → fail to reject $H_0$ → not convincing evidence for $H_a$ (in context)

“Since P-value > $\alpha$, we fail to reject $H_0$. There is not convincing evidence that [alternative hypothesis in context].”

NEVER EVER EVER EVER EVER ACCEPT A HYPOTHESIS—

EVER! If we are doing a hypothesis test, we have some evidence that the

null hypothesis is false. If the P-value is large, it simply means that the

evidence we have is not convincing enough to rule out random chance. It

does not mean that there is convincing evidence that the null hypothesis

is true.

Always state the official decision in terms of $H_0$, not $H_a$.

“Significant” does not mean “important.” It means “not likely to happen by chance.” Never use

the word significant on the AP test in a non-statistical sense.

[Comic: xkcd, http://xkcd.com/1478/. Caption: If all else fails, use “significant at an α > 0.05 level” and hope no one notices.]


Example: For his second semester project in AP Statistics, Zenon decided to investigate whether students at his

school prefer name-brand potato chips to generic potato chips. He randomly selected 50 students and had each

student try both types of chips, in random order. Overall, 34 of the 50 students preferred the name-brand chips.

Zenon performed a significance test using the hypotheses

$H_0: p = 0.5$

$H_a: p > 0.5$

where p = the true proportion of students at his school who prefer name-brand chips. The resulting P-value was

0.0055. What conclusion would you make at each of the following significance levels?

a) $\alpha = 0.01$

Since P-value = 0.0055 < $\alpha$ = 0.01, we reject $H_0$. There is convincing evidence that more than half of the students at Zenon’s school prefer name-brand chips.

b) $\alpha = 0.001$ (Obviously Zenon is very hard to convince!)

Since P-value = 0.0055 > $\alpha$ = 0.001, we fail to reject $H_0$. There is not convincing evidence that more than half of the students at Zenon’s school prefer name-brand chips. (Zenon is crazy!)
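The decision rule from the Statistical Significance section can be written as a tiny Python function. This is a sketch for illustration; the function name `decide` and its return strings are made up here, and only the comparison of the P-value with the significance level comes from the notes.

```python
def decide(p_value, alpha):
    """Return the formal decision about H0 at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    # P-value >= alpha: the evidence is not convincing, but H0 is never "accepted"
    return "fail to reject H0"

p_value = 0.0055  # Zenon's P-value from the example
for alpha in (0.01, 0.001):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same data lead to different decisions at the two levels, which is why the significance level should be chosen before looking at the data.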

Type I and Type II Errors

When we draw a conclusion from a significance test, we hope our conclusion will be correct, but sometimes it

will be wrong. There are two types of mistakes we can make.

Type I Error: Rejecting $H_0$ when $H_0$ is true. In other words, it’s finding convincing evidence for $H_a$ when $H_a$ is false. The probability of a Type I error equals the significance level $\alpha$.

Type II Error: Failing to reject $H_0$ when $H_0$ is false. In other words, it’s not finding convincing evidence for $H_a$ when in fact $H_a$ is true. The probability of a Type II error is abbreviated $\beta$.

Another way of thinking about Type I and Type II errors:

• A Type I error is a false positive: jumping to a false conclusion that something significant is going on

when it really isn’t.

o Examples: Finding an innocent person guilty, concluding there’s a bomb in the building when

there really isn’t, concluding that a new medication works when it really doesn’t.

• A Type II error is a false negative: not coming to a true conclusion that something significant is going

on when it really is.

o Examples: Not convicting a guilty person, not acting on a bomb threat when there’s actually a

bomb in the building, not concluding that a new medication works when it really does. Notice

how all of these were phrased with the word “not”!

Truth about the population (columns) versus decision based on sample (rows):

Decision based on sample | $H_0$ true | $H_0$ false, $H_a$ true
Reject $H_0$, conclude $H_a$ | Type I error, P(Type I) = $\alpha$ | Correct decision, power of test = $1 - \beta$
Fail to reject $H_0$, don’t conclude $H_a$ | Correct decision, P(correct) = $1 - \alpha$ | Type II error, P(Type II) = $\beta$
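The statement that the probability of a Type I error equals $\alpha$ can be checked with a rough simulation. The sketch below borrows the Fred setting ($p_0 = 0.55$, $n = 80$, lower-tailed test) as an assumption; the function name `type_i_error_rate`, the seed, and the number of simulated samples are also choices made here. Every sample is generated with $H_0$ true, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist

def type_i_error_rate(p0, n, alpha, num_samples, seed=2):
    """Simulate samples with H0 true (p = p0) and return the fraction in
    which a lower-tailed z test for a proportion rejects H0 at level alpha."""
    rng = random.Random(seed)
    sd = (p0 * (1 - p0) / n) ** 0.5    # SD of p-hat, computed under H0
    rejections = 0
    for _ in range(num_samples):
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / sd
        p_value = NormalDist().cdf(z)  # area to the left of z
        if p_value < alpha:
            rejections += 1            # H0 is true, so this is a Type I error
    return rejections / num_samples

rate = type_i_error_rate(p0=0.55, n=80, alpha=0.05, num_samples=10_000)
print(f"Simulated Type I error rate: {rate:.3f}")
```

The simulated rate should be close to 0.05 but not exactly equal to it, because the count of successes is discrete and the Normal curve is only an approximation.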


• Always phrase Type II errors in a negative sense – not finding convincing evidence for the alternative

hypothesis when it actually is true, rather than concluding that the alternative hypothesis is false when

it actually is true. When we fail to reject $H_0$, we are saying that we don’t have convincing enough evidence to reject the null hypothesis in favor of the alternative hypothesis, not that $H_0$ is definitely true, or equivalently, that $H_a$ is definitely false. NEVER ACCEPT THE NULL HYPOTHESIS!

A silly way to remember which error is which: ART is my BFF (A before B, I before II)

Alpha represents the probability of a Type I error:

Reject the null hypothesis when in fact the null hypothesis is

True.

Beta represents the probability of a Type II error:

Fail to reject the null hypothesis when in fact the null hypothesis is

False.

Example: Your company markets a computerized device for detecting high blood pressure. The device

measures an individual’s blood pressure once per hour at a randomly selected time throughout a 12-hour period.

Then it calculates the mean systolic (top number) pressure for the sample of measurements. Based on the

sample results, the device determines whether there is significant evidence that the individual’s actual mean

systolic pressure is greater than 130. If so, it recommends that the person seek medical attention.

a) State appropriate null and alternative hypotheses in this setting. Be sure to define your parameter.

$H_0: \mu = 130$ (the person doesn’t have high blood pressure)

$H_a: \mu > 130$ (the person has high blood pressure)

where μ is the individual’s true mean systolic blood pressure during the 12-hour period.

b) Describe a Type I and a Type II error, and explain the consequences of each.

Type I error (reject $H_0$ when $H_0$ is true): Finding convincing evidence that the individual has high

blood pressure when they actually don’t. Consequences could include an unnecessary visit to a doctor or

being told to take unnecessary medication.

Type II error (fail to reject $H_0$ when $H_0$ is false): Not finding convincing evidence that the individual

has high blood pressure when they actually do. Consequences could include serious health problems that

could have been avoided with treatment.

c) The blood pressure device can be adjusted to decrease one error probability at the cost of an increase in

the other error probability. Which error probability would you choose to make smaller, and why?

In this case, a Type II error is more serious because high blood pressure can lead to serious health

problems or even death if not treated, so it would be better to lower the Type II error probability, even if

that increases the probability of making a Type I error.


Tests about a Population Proportion

Remember that the sampling distribution of $\hat{p}$ tells us how the value of $\hat{p}$ varies from sample to sample for a population with proportion $p$ of successes. The mean of the sample proportions is $\mu_{\hat{p}} = p$ and the standard deviation of the sample proportions is $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.

We assume that the population proportion is $p_0$ (the value in the null hypothesis) and we want to know how many standard deviations away from this hypothesized value the proportion we saw in our sample ($\hat{p}$) is. Then we can use a Normal curve (as long as the Normal condition has been met) to determine the probability of getting a sample with a $\hat{p}$ at least as extreme as what we actually got if $H_0$ is really true.

Example: Let’s return to our example about Fred, who is running for Student Body President and is convinced

that 55% of the student body supports him. We are testing the hypotheses

$H_0: p = 0.55$

$H_a: p < 0.55$

where p = the true proportion of the student body who support Fred. Let’s use $\alpha = 0.05$. Assume that we have

confirmed that all conditions needed to conduct the hypothesis test are met.

In our sample, 36 out of 80 students supported Fred, so $\hat{p} = 36/80 = 0.45$.

We draw a Normal curve to show what the proportion of students who

support Fred in different samples should look like if Fred’s claim that

$p = 0.55$ is true.

$\mu_{\hat{p}} = 0.55$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{(0.55)(0.45)}{80}} \approx 0.0556$

We want to find $P(\hat{p} \le 0.45)$ given that $p = 0.55$.

We find that $z = \dfrac{0.45 - 0.55}{0.0556} \approx -1.798$. This value is called the test statistic. Using the normalcdf command on a calculator, $P(z \le -1.798) \approx 0.0361$. This is our P-value.

The probability we would get a sample where 45% or fewer of students support Fred if he actually has the

support of 55% of the students at the school is only about 3.6%. This seems unlikely, so either we just happened

to get a very unusual sample, or Fred is full of it and does not have the support he claims. Officially, we

conclude that since P-value < $\alpha$ = 0.05, we reject the null hypothesis and conclude that there is convincing

evidence that fewer than 55% of students at the school actually support Fred.
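The calculation for Fred can be reproduced with Python’s standard library. This is a sketch that stands in for the calculator’s normalcdf command; the variable names are chosen here for illustration, and `statistics.NormalDist` is used as the standard Normal curve.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.55        # hypothesized proportion from H0
n = 80           # sample size
p_hat = 36 / n   # observed sample proportion, 0.45

sd = sqrt(p0 * (1 - p0) / n)    # standard deviation of p-hat, assuming H0 is true
z = (p_hat - p0) / sd           # test statistic
p_value = NormalDist().cdf(z)   # lower-tail area, like normalcdf(-infinity, z)

print(f"sd = {sd:.4f}, z = {z:.3f}, P-value = {p_value:.4f}")
```

The standard deviation uses $p_0$ rather than $\hat{p}$ because the whole calculation assumes the null hypothesis is true.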

Test Statistic: The test statistic tells us how many standard deviations away from the hypothesized value the

value of the statistic from our sample is. That is,

$\text{test statistic} = \dfrac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$.

[Figure: Normal curve of the values of $\hat{p}$ from different samples, assuming p = 0.55 (σ = 0.0556), with the area to the left of $\hat{p}$ = 0.45 shaded; area = P-value = 0.03604]


One-Sample z Test for a Proportion

To test the hypothesis $H_0: p = p_0$

Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$

P-value: Find the probability of getting a z statistic this large or larger in the direction specified by the alternative hypothesis $H_a$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.

Examples: Find the P-value for the given hypotheses and test statistic.

a) $H_0: p = 0.3$, $H_a: p < 0.3$, $z = -2.37$

P-value = normalcdf(−∞, −2.37) = 0.0089 [Figure: area to the left of z = −2.37 shaded]

b) $H_0: p = 0.575$, $H_a: p > 0.575$, $z = 1.83$

P-value = normalcdf(1.83, ∞) = 0.0336 [Figure: area to the right of z = 1.83 shaded]

c) $H_0: p = 0.75$, $H_a: p \ne 0.75$, $z = -2.05$

P-value = 2 · normalcdf(−∞, −2.05) = 2(0.0202) = 0.0404 [Figure: area of 0.0202 shaded in each tail, beyond z = −2.05 and z = 2.05]
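The three normalcdf calculations follow one pattern, which can be sketched in Python. The function name `p_value_from_z` and the labels "less", "greater", and "two-sided" are assumptions chosen here for illustration; `statistics.NormalDist().cdf` plays the role of normalcdf with a lower bound of −∞.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """P-value for a z statistic. alternative is "less", "greater",
    or "two-sided" (labels made up for this sketch)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return std_normal.cdf(z)            # area to the left of z
    if alternative == "greater":
        return 1 - std_normal.cdf(z)        # area to the right of z
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))  # total area in both tails
    raise ValueError(f"unknown alternative: {alternative!r}")

print(round(p_value_from_z(-2.37, "less"), 4))       # example (a)
print(round(p_value_from_z(1.83, "greater"), 4))     # example (b)
print(round(p_value_from_z(-2.05, "two-sided"), 4))  # example (c)
```

Using `-abs(z)` in the two-sided case means the function gives the same answer whether the test statistic is −2.05 or 2.05.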

Things you must write on hypothesis test problems for a proportion:

1. Name the procedure. (“One-sample z test for p”).

2. Write the appropriate null and alternative hypotheses. Make sure to define the parameter (define

p in context).

3. Check conditions:

• Random: The data must come from a random sample from the population of interest.

• Normal/Large Counts: The sampling distribution of $\hat{p}$ must be approximately Normal. This will be true if $np_0 \ge 10$ and $nq_0 \ge 10$, where $p_0$ is the hypothesized value from the null hypothesis. (Make sure to use $p_0$ and $q_0$ instead of $\hat{p}$ and $\hat{q}$!) DO NOT ROUND TO WHOLE #S!

• Independent: Observations must be independent (coin flips, die rolls, or sampling with

replacement). If sampling without replacement (which is usually the case), observations are not

independent, but this will have very little effect on the accuracy of the calculations as long as the

10% condition is met – the population must be at least 10 times as large as the sample.

4. Compute the test statistic and P-value. It is not necessary to show work on this part.

5. Report the results in context. Use the wording below. Don’t improvise.

• “Since P-value < $\alpha$, we reject $H_0$. There is convincing evidence that [alternative hypothesis in context].”

• “Since P-value > $\alpha$, we fail to reject $H_0$. There is not convincing evidence that [alternative hypothesis in context].”
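The numerical parts of step 3 (Large Counts and the 10% condition) can be checked with a short Python function. This is a sketch; the name `check_conditions`, its parameters, and the dictionary it returns are assumptions made here. The Random condition depends on how the data were collected and cannot be checked from numbers.

```python
def check_conditions(n, p0, population_size=None):
    """Check the Large Counts and 10% conditions for a one-sample z test
    for p. The Random condition must be judged from the study design."""
    expected_successes = n * p0        # n * p0, deliberately not rounded
    expected_failures = n * (1 - p0)   # n * q0, deliberately not rounded
    large_counts = expected_successes >= 10 and expected_failures >= 10
    # 10% condition: the population must be at least 10 times the sample size
    if population_size is None:
        ten_percent = None             # unknown; must be argued in words
    else:
        ten_percent = population_size >= 10 * n
    return {
        "expected_successes": expected_successes,
        "expected_failures": expected_failures,
        "large_counts": large_counts,
        "ten_percent": ten_percent,
    }

# Hypothetical inputs: n = 250, p0 = 0.148, population of 2800
print(check_conditions(n=250, p0=0.148, population_size=2800))
```

Note that the counts use $p_0$ from the null hypothesis, not the sample proportion.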


Hypothesis Test for a Proportion on TI-83/TI-84 Calculators.

1. Choose “1-PropZTest” on the STAT → TESTS menu.

2. Enter the requested information:

p0: proportion in the hypotheses

x: number of successes in the sample (must be a whole number)

n: sample size

3. Specify whether the alternative hypothesis has a <, >, or ≠ sign.

4. Choose “Calculate” to see results, or “Draw” to see a shaded Normal curve.
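For readers without a TI calculator, the same inputs (p0, x, n, and the direction of the alternative) can be fed to a short Python function. This is a sketch and not the calculator’s actual implementation; the function name `one_prop_z_test` and the labels "less", "greater", and "two-sided" are assumptions chosen here.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(p0, x, n, alternative):
    """One-sample z test for a proportion.
    p0: proportion in the hypotheses; x: number of successes; n: sample size;
    alternative: "less", "greater", or "two-sided" (labels made up here).
    Returns (p_hat, z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard deviation uses p0
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":
        p_value = lower_tail
    elif alternative == "greater":
        p_value = 1 - lower_tail
    elif alternative == "two-sided":
        p_value = 2 * NormalDist().cdf(-abs(z))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return p_hat, z, p_value

# The Fred example: 36 of 80 students, H0: p = 0.55, Ha: p < 0.55
p_hat, z, p_value = one_prop_z_test(p0=0.55, x=36, n=80, alternative="less")
print(f"p-hat = {p_hat:.3f}, z = {z:.3f}, P-value = {p_value:.4f}")
```

The function only does the arithmetic of step 4; naming the procedure, stating hypotheses, checking conditions, and writing the conclusion in context still have to be done by hand.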

Example: On shows like American Idol, contestants often wonder if there is an advantage to performing last.

To investigate this, a random sample of 600 American Idol fans is selected to view the audition tapes of 12

never-before-seen contestants. For each fan, the order of the 12 videos is randomly determined. Thus, if the

order of performance doesn’t matter, we would expect approximately 1/12 of the fans to prefer the last

contestant they view. In this study, 59 of the 600 fans preferred the last contestant they viewed.

a) Do these data provide convincing evidence at the $\alpha = 0.05$ level that there is an advantage to going last?

One-sample z test for p

Hypotheses: $H_0: p = 1/12$

$H_a: p > 1/12$

where p is the true proportion of all American Idol fans who prefer the last contestant they see.

• Random? The problem states that the fans were selected randomly and that the order of the

videos was randomly determined.

• Normal? $np_0 = 600(1/12) = 50 \ge 10$ and $nq_0 = 600(11/12) = 550 \ge 10$

• 10% Condition: It is safe to assume there are more than 10(600) = 6000 American Idol fans.

$\hat{p} = \dfrac{59}{600} \approx 0.098$, $p_0 = \dfrac{1}{12} \approx 0.083$

Test statistic: $z = \dfrac{0.098 - 0.083}{\sqrt{\dfrac{(0.083)(0.917)}{600}}} \approx 1.33$

P-value: $P(z > 1.33)$ = normalcdf(lower = 1.33, upper = ∞, μ = 0, σ = 1) ≈ 0.092

Conclusion: Since P-value > $\alpha$ = 0.05, we fail to reject $H_0$. There is not convincing evidence that there is an advantage to going last on American Idol.

b) Given your conclusion in part (a), which kind of mistake—a Type I or a Type II error—could you have

made? Explain what this mistake means in this context.

Since we failed to reject $H_0$, it’s possible that we made a Type II error, which would mean not

concluding that there’s an advantage to going last on American Idol when there actually is.


Example: According to a new study published in The Journal of the American Medical Association in April

2018, about 14.8% of teens admit to sending sexts. The counselor at a large high school worries that the actual

figure might be higher at her school. To find out, she gives an anonymous survey to a random sample of 250 of

the school’s 2800 students. All 250 respond, and 51 admit to sending sexts. Carry out a significance test at the

$\alpha = 0.01$ significance level. What conclusion should the counselor draw?

One-sample z test for p

$H_0: p = 0.148$

$H_a: p > 0.148$

p is the proportion of students at the school who would admit to having sent sexts.

• Random? The problem states that the counselor selected a random sample of students from the school.

• Normal? $np_0 = (250)(0.148) = 37 \ge 10$ and $nq_0 = (250)(0.852) = 213 \ge 10$

• 10% Condition: 250 is less than 10% of the 2800 students at the school.

$\hat{p} = \dfrac{51}{250} = 0.204$

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{0.204 - 0.148}{\sqrt{\dfrac{(0.148)(0.852)}{250}}} \approx 2.4935$

P-value = $P(z \ge 2.4935) \approx 0.0063$

Since P-value < $\alpha$ = 0.01, we reject $H_0$. There is convincing evidence that more than 14.8% of the students at the school would admit to having sent sexts.

Example: When the accounting firm AJL and Associates audits a company’s financial records for fraud, they

often use a test based on Benford’s law. Benford’s law states that the distribution of first digits in many real-life

sources of data is not uniform. In fact, when there is no fraud, about 30.1% of the numbers in financial records

begin with the digit 1. However, if the proportion of numbers in the financial records that begin with the digit 1

is significantly different from 0.301 in a random sample of records, AJL and Associates does a much more

thorough investigation of the company. Suppose that a random sample of 300 expenses from a company’s

financial records results in only 68 expenses that begin with the digit 1. Should AJL and Associates do a more

thorough investigation of this company? Use $\alpha = 0.05$.

One-sample z test for p

$H_0: p = 0.301$

$H_a: p \ne 0.301$

where p is the proportion of the company’s reported expenses that begin with the digit 1.

• Random? AJL & Associates selected a random sample of expenses.

• Normal? $np_0 = (300)(0.301) = 90.3 \ge 10$ and $nq_0 = (300)(0.699) = 209.7 \ge 10$

• 10% Condition: It is reasonable to assume that the company’s financial records include more than

10(300) = 3000 expenses.

$\hat{p} = \dfrac{68}{300} \approx 0.2267$

$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{0.2267 - 0.301}{\sqrt{\dfrac{(0.301)(0.699)}{300}}} \approx -2.8069$

P-value = $2P(z \le -2.8069) = 2(0.0025) = 0.0050$

Since P-value < $\alpha$ = 0.05, we reject $H_0$. There is convincing evidence that the proportion of the company’s reported expenses that begin with the digit 1 differs from 0.301. It appears that the true proportion is less than 0.301.


Confidence Intervals and Two-Sided Hypothesis Tests

There is a link between confidence intervals and two-sided tests. The 95% confidence interval gives an approximate range of $p_0$’s that would not be rejected by a two-sided test at the $\alpha = 0.05$ significance level. With proportions, the link isn’t perfect because the standard error used for the confidence interval is based on $\hat{p}$, but the standard deviation used to find the test statistic is based on the value of $p_0$ from the null hypothesis.

One advantage of using a confidence interval rather than a hypothesis test is that the interval gives a range of plausible values for $p$. The hypothesis test only rules out or fails to rule out one specific value of $p$.

• If the hypothesized value is included in the C% confidence interval, it is a plausible value for the parameter at the $\alpha = 1 - C$ significance level (with C written as a decimal, such as 0.95). Fail to reject the null hypothesis for a two-sided test.

• If the hypothesized value is not included in the C% confidence interval, it is not a plausible value for the parameter at the $\alpha = 1 - C$ significance level. Reject the null hypothesis for a two-sided test.

Example: For the previous example, find and interpret a 95% confidence interval for the true proportion of the

company’s reported expenses that begin with the digit 1. Use your interval to decide whether this company

should be investigated for fraud.

One-sample z interval for p

• Random? AJL & Associates selected a random sample of expenses.

• Normal? $n\hat{p} = 300\left(\tfrac{68}{300}\right) = 68 \ge 10$ and $n\hat{q} = 300\left(\tfrac{232}{300}\right) = 232 \ge 10$

• 10% Condition: It is reasonable to assume that the company’s financial records include more than

10(300) = 3000 expenses.

$\hat{p} \pm z^{*}\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = 0.22667 \pm 1.960\sqrt{\dfrac{(0.22667)(0.77333)}{300}} = 0.22667 \pm 0.04738 \;\rightarrow\; (0.17929,\ 0.27404)$

We are 95% confident that the interval (0.17929, 0.27404) contains the true proportion of the company’s

reported expenses that begin with the digit 1. Since 0.301 is not in this interval, it is not a plausible value for the

true proportion. We should reject $H_0$ and the company should be investigated for fraud.
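Here is a short Python sketch of the same interval and the "is 0.301 inside?" decision. It uses only the standard library; the helper name `one_sample_z_interval` is an illustrative assumption, not a calculator command.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_interval(successes, n, confidence=0.95):
    """One-sample z interval for a proportion (illustrative helper)."""
    p_hat = successes / n
    # Critical value z* leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The standard error uses p-hat, unlike the test statistic, which uses p0.
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = one_sample_z_interval(68, 300)
p0 = 0.301
reject_h0 = not (low <= p0 <= high)  # reject when p0 is outside the interval
print(f"({low:.5f}, {high:.5f}); reject H0: {reject_h0}")
```

Because the interval uses $\hat{p}$ and the test uses $p_0$ in the standard deviation, the two methods can occasionally disagree for values of $p_0$ near an endpoint. Here they agree.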


[Figure for example (c) below: t curve with shaded tail areas of 0.02404 below t = −2.01 and 0.02404 above t = 2.01. P-value = total area = 0.04808.]

Tests about a Population Mean

One-Sample t Test for a Mean

To test the hypothesis $H_0: \mu = \mu_0$

Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s_x/\sqrt{n}}$ with $n - 1$ degrees of freedom

P-value: Find the probability of getting a t statistic this large or larger in the direction specified by the alternative hypothesis $H_a$ in a t distribution with $\text{df} = n - 1$. The P-value is the shaded area. For a two-tailed test, it is the total area in both tails.

Examples: Give the P-value for the given hypotheses, test statistic, and sample size.

a) $H_0: \mu = 10$ sec, $H_a: \mu < 10$ sec, $t = -1.56$, $n = 12$

P-value = tcdf(−∞, −1.56, df = 11) = 0.07352

b) $H_0: \mu = 160$ lbs, $H_a: \mu > 160$ lbs, $t = 0.72$, $n = 35$

P-value = tcdf(0.72, ∞, df = 34) = 0.2382

c) $H_0: \mu = -5$°F, $H_a: \mu \neq -5$°F, $t = -2.01$, $n = 75$

P-value = 2*tcdf(−∞, −2.01, df = 74) = 0.04808
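If you don't have a calculator handy, the three tcdf areas can be approximated in Python. The sketch below is not how the TI calculator does it. It simply integrates the t density numerically with Simpson's rule, and the names `t_pdf` and `t_cdf` are made up for this illustration.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


p_a = t_cdf(-1.56, 11)      # (a) lower tail
p_b = 1 - t_cdf(0.72, 34)   # (b) upper tail
p_c = 2 * t_cdf(-2.01, 74)  # (c) both tails
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

In each case the direction of the alternative hypothesis decides which tail (or tails) to add up.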

Things you must write on hypothesis test problems for a mean:

1. Name the procedure. (“One-sample t test for μ”)

2. Write the appropriate null and alternative hypotheses. Make sure to define the parameter (define

μ in context).

3. Check conditions:

• Random: The data must come from a random sample from the population of interest.

• Normal/Large Sample Size: The population distribution is Normal or the sample size is large ($n \ge 30$). If $n < 30$ and the population distribution has an unknown shape, DRAW A GRAPH of

the sample data. (Don’t just look at it on your calculator – you have to actually draw it!) As long as

the graph doesn’t show strong skewness or outliers, it’s okay to use t procedures. (No strong

skewness or outliers means it’s plausible that the population distribution is Normal, but don’t go

into that much detail when writing down your check!)

• Independent: Observations must be independent (coin flips, die rolls, sampling with replacement,

etc). If sampling without replacement (which is usually the case), observations are not

independent, but this will have very little effect on the accuracy of the calculations as long as the

10% condition is met – the population must be at least 10 times as large as the sample.

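[Figures for examples (a) and (b) above: (a) t curve with the area below t = −1.56 shaded, P-value = area = 0.07352; (b) t curve with the area above t = 0.72 shaded, P-value = area = 0.2382.]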


[Figure: dotplot of the sample speeds for the construction zone example below. Horizontal axis: Speed (mph), from 21 to 34.]

4. Compute the test statistic and P-value. Also state the number of degrees of freedom. It is not

necessary to show work on this part.

5. Report the results in context. Use the wording below. Don’t improvise.

• “Since $P$-value $< \alpha$, we reject $H_0$. There is convincing evidence that [alternative hypothesis in context].”

• “Since $P$-value $> \alpha$, we fail to reject $H_0$. There is not convincing evidence that [alternative hypothesis in context].”

One-Sample t Test on TI-83/TI-84 Calculators.

1. Choose “T-Test” on the STAT → TESTS menu.

2. Choose “Data” if you have a list of sample data. Choose “Stats” if you have values for $\bar{x}$ and $s_x$.

3. Enter the requested information:

µ0: value in the hypotheses

For “Data” option, input the sample values into a list and indicate which list they are in.

For “Stats” option,

$\bar{x}$: sample mean

Sx: sample standard deviation

n: sample size

4. Specify whether the alternative hypothesis has a <, >, or ≠ sign.

5. Choose “Calculate” to see results, or “Draw” to see a shaded t curve.

Example: Every road has one at some point—construction zones that have much lower speed limits. To see if

drivers obey these lower speed limits, a police officer used a radar gun to measure the speed (in mph) of a

random sample of 10 drivers in a 25 mph construction zone. Here are the results:

27 33 32 21 30 30 29 25 27 34

a) Can we conclude that the average speed of drivers in this construction zone is greater than the posted 25

mph speed limit? One-sample t test for µ

$H_0: \mu = 25$ mph

$H_a: \mu > 25$ mph

where $\mu$ is the mean speed of all drivers in this construction zone.

• Random? The problem states that the police officer selected a random sample of drivers.

• Normal? The dotplot does not show strong skewness or outliers, so

it’s safe to proceed.

• 10% condition: It’s reasonable to assume that more than 10(10) = 100 drivers use the construction zone.

$\bar{x} = 28.8$ mph, $s_x = 3.938$ mph, $n = 10$

$t = \dfrac{\bar{x} - \mu_0}{s_x/\sqrt{n}} = \dfrac{28.8 - 25}{3.938/\sqrt{10}} = 3.051$, $\text{df} = 9$

$P\text{-value} = P(t \ge 3.051) = 0.00688$

Since $P$-value $< \alpha = 0.05$, we reject $H_0$. There is convincing evidence that the mean speed of drivers in the construction zone exceeds 25 mph.

b) Given your conclusion in part (a), which kind of mistake—a Type I or a Type II error—could you have

made? Explain what this mistake means in this context. Since we rejected $H_0$, we might have made a Type I error. This would mean that we concluded that the mean speed of

drivers in the construction zone exceeds 25 mph when it actually doesn’t.
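The summary statistics and test statistic for the speed data can be reproduced from the raw list. This is a sketch with Python's standard library. It stops at the t statistic because the standard library has no built-in t distribution, so the P-value still comes from the calculator's T-Test or tcdf.

```python
from math import sqrt
from statistics import mean, stdev

# Speeds (mph) of the 10 randomly selected drivers.
speeds = [27, 33, 32, 21, 30, 30, 29, 25, 27, 34]
mu0 = 25  # value of mu in the null hypothesis

n = len(speeds)
x_bar = mean(speeds)
s_x = stdev(speeds)  # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s_x / sqrt(n))
df = n - 1

print(f"x-bar = {x_bar}, s_x = {s_x:.3f}, t = {t_stat:.3f}, df = {df}")
```

Note that `stdev` divides by n − 1, which matches Sx on the calculator; `pstdev` would give the population version and a slightly different t.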


Example: In the children’s game Don’t Break the Ice, small plastic ice cubes are

squeezed into a square frame. Each child takes turns tapping out a cube of “ice” with a

plastic hammer, hoping that the remaining cubes don’t collapse. For the game to work

correctly, the cubes must be big enough so that they hold each other in place in the

plastic frame but not so big that they are too difficult to tap out. The machine that

produces the plastic cubes is designed to make cubes that are 29.5 mm wide, but the

actual width varies a little. To ensure that the machine is working well, a supervisor

inspects a random sample of 50 cubes every hour and measures their width. The

Fathom output to the right summarizes the data from a sample taken during one hour.

a) Interpret the standard deviation and the standard error provided by the

computer output.

Standard deviation, $s_x = 0.0935$ mm: The widths of the individual blocks in the sample typically differ from the sample mean width by about 0.0935 mm.

Standard error, $SE_{\bar{x}} = 0.0132$ mm: The mean widths of samples of size 50 typically differ from the true mean width of the population (all blocks produced in this hour) by about 0.0132 mm.

b) Do these data give convincing evidence that the mean width of cubes produced this hour is not 29.5

mm? Use $\alpha = 0.05$.

One-sample t test for µ

$H_0: \mu = 29.5$ mm

$H_a: \mu \neq 29.5$ mm

where $\mu$ is the true mean width of cubes produced during this hour.

• Random? The supervisor inspects a random sample of 50 cubes each hour.

• Normal? $n = 50 \ge 30$

• 10% condition? It’s reasonable to assume that the machine produces more than 10(50) = 500 cubes each hour.

$t = \dfrac{\bar{x} - \mu_0}{s_x/\sqrt{n}} = \dfrac{29.4874 - 29.5}{0.0934676/\sqrt{50}} = -0.953$, $\text{df} = 49$

$P\text{-value} = 2 \cdot P(t \le -0.953) = 0.345$

Since $P$-value $> \alpha = 0.05$, we fail to reject $H_0$. There is not convincing evidence that the true mean width of cubes produced during this hour differs from 29.5 mm.

c) Here is Fathom output for a 95% confidence interval for the

true mean width of plastic ice cubes produced this hour.

Interpret the confidence interval. Would you make the same

conclusion with the confidence interval as you did with the

significance test in part (b)? Explain.

We are 95% confident that the interval from 29.4609 mm to 29.514 mm

captures the true mean width of blocks produced during this hour. Since

29.5 mm is included in the 95% confidence interval, it is a plausible value for the true mean width. We fail to reject $H_0$ at the $\alpha = 0.05$ significance level.

d) Interpret the confidence level.

If we were to repeat the sampling method many times and construct a 95% confidence interval each time, about 95% of

the resulting intervals would capture the true mean width of the cubes produced during this hour.
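For means, the test and the interval use the same standard error, so the two-sided test and the confidence interval always agree. The sketch below checks this for the ice cube data using only the numbers in the Fathom output. The critical value `t_star` is not looked up from a table here; it is backed out from the reported margin of error, which is an assumption of this illustration.

```python
from math import sqrt

# Summary statistics from the Fathom output.
n, x_bar, s_x = 50, 29.4874, 0.0934676
mu0 = 29.5            # hypothesized mean width (mm)
margin = 0.0265632    # margin of error reported for the 95% interval

se = s_x / sqrt(n)              # standard error of the mean
t_stat = (x_bar - mu0) / se     # test statistic from part (b)
t_star = margin / se            # critical value implied by the output
low, high = x_bar - margin, x_bar + margin

in_interval = low <= mu0 <= high
fail_to_reject = abs(t_stat) <= t_star
print(f"SE = {se:.7f}, t = {t_stat:.3f}, t* = {t_star:.4f}")
print(f"29.5 in interval: {in_interval}; fail to reject H0: {fail_to_reject}")
```

Both checks come out the same way because "29.5 is within one margin of error of the sample mean" and "|t| is no bigger than t*" are the same inequality divided by the standard error.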

Collection 1: Width

S1 = mean( ): 29.4874 mm
S2 = count( ): 50
S3 = stdDev( ): 0.0934676 mm
S4 = stdError( ): 0.0132183 mm
S5 = min( ): 29.2717 mm
S6 = Q1( ): 29.4225 mm
S7 = median( ): 29.4821 mm
S8 = Q3( ): 29.5544 mm
S9 = max( ): 29.7148 mm

Estimate of Collection 1 Estimate Mean

Attribute (numeric): Width

Interval estimate for population mean of Width

Count: 50

Mean: 29.4874 mm

Std dev: 0.0934676 mm

Std error: 0.0132183 mm

Confidence level: 95.0 %

Estimate: 29.4874 mm +/- 0.0265632 mm

Range: 29.4609 mm to 29.514 mm


Power: The power of a test is the probability that the test will find convincing evidence for $H_a$ (correctly reject $H_0$) when a specific alternative value of the parameter is true. Higher power gives a better chance of detecting

something interesting/fishy going on when it actually is going on.

Power and Type II Error: The power of a test to detect a specific alternative parameter value is related to the probability of making a Type II error ($\beta$) for that alternative: $\text{Power} = 1 - \beta$.

Some different ways to think about power:

• Power is the probability of correctly rejecting $H_0$ when that is the correct decision because the actual

value of the parameter is something different from the value in the hypotheses.

• Power is the probability of avoiding a Type II error.

• Power is the probability that a hypothesis test will detect a difference from the hypothesized value when

that difference is actually present.

These factors influence the power of a test:

• The larger the difference between the hypothesized value and the true value of the population parameter, the higher the power will be. True values of the parameter that are close to the hypothesized value are harder to detect (lower power) than true values that are far from the hypothesized value.

• The larger the significance level, $\alpha$, the higher the power of the test – a test at a 5% significance level

will have a greater chance of rejecting the null hypothesis than a 1% test because the strength of

evidence required for rejection is less.

• The larger the sample size, the higher the power of the test – more data will make the sample statistic a

more precise estimate of the true value of the parameter (there will be less variation in the statistic from

sample to sample), so it is easier to detect small differences from the hypothesized value.
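These three factors can be seen in numbers with a rough power calculation. The sketch below uses the Normal approximation for a lower-tailed one-sample z test for a proportion. The setting borrows Fred's claim from the start of the chapter ($p_0 = 0.55$, $n = 80$), while the alternative values 0.45 and 0.50, the other sample size, and the function name `power_lower_tail` are made-up assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def power_lower_tail(p0, p_true, n, alpha):
    """Approximate power for H0: p = p0 vs Ha: p < p0 when p = p_true."""
    # Reject H0 when p-hat falls at or below this cutoff (computed under H0).
    cutoff = p0 + Z.inv_cdf(alpha) * sqrt(p0 * (1 - p0) / n)
    # Probability that p-hat lands at or below the cutoff when p = p_true.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return Z.cdf((cutoff - p_true) / sd_true)


base = power_lower_tail(0.55, 0.45, 80, 0.05)
closer_alternative = power_lower_tail(0.55, 0.50, 80, 0.05)
smaller_alpha = power_lower_tail(0.55, 0.45, 80, 0.01)
larger_sample = power_lower_tail(0.55, 0.45, 200, 0.05)

print(f"baseline power:          {base:.3f}")
print(f"true p closer to 0.55:   {closer_alternative:.3f}")
print(f"alpha lowered to 0.01:   {smaller_alpha:.3f}")
print(f"sample size raised:      {larger_sample:.3f}")
```

Compared with the baseline, power should drop when the true value is closer to the hypothesized value or when $\alpha$ is smaller, and rise when the sample size is larger.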

Relationships between Type I and Type II Errors and Power

• As α increases, β decreases (power increases).

• As α decreases, β increases (power decreases).

Example:

• Situation 1: A teacher is bound and determined to catch cheaters.

o The teacher is unlikely to let a guilty student get away with cheating (low β ).

o There’s a good chance the teacher will catch a student who really is cheating (high power).

o There is also a good chance that the teacher will accuse an innocent student of cheating (high α ).

• Situation 2: A teacher is hesitant to accuse any student of cheating.

o There’s a good chance that a guilty student will get away with cheating (high β ).

o It is less likely that the teacher will catch a student who actually is cheating (low power).

o The teacher is unlikely to accuse an innocent student of cheating (low α).

Ways to Increase Power:

1. Increase the significance level $\alpha$.

2. Increase the sample size.

Sample Size, Errors, and Power:

The more information you have, the more precise and accurate your test is likely to be. With very large samples,

you can have high power and still use a lower value for $\alpha$.

In general, to increase power, choose as high an α level as you are willing to risk based on how serious the

consequences of a Type I error are and use the largest sample size you can afford!


Cautions:

Statistical significance is not the same thing as practical importance. When large samples are

available, even tiny deviations from the null hypothesis can be statistically significant, but that doesn’t

mean they are important. For example, if a weight loss medication results in an average weight loss of

0.1 lbs. over the course of a year, that may end up being significant, but it doesn’t mean anyone should

be running to the store to buy the medication.
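To see how sample size alone can turn a tiny effect into a "significant" one, here is a sketch based on the weight loss example. The standard deviation of 5 lbs and the sample sizes are invented for illustration, and a z test with a known standard deviation is used instead of a t test just to keep the code short.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(mean_loss, sigma, n):
    """P-value for H0: mu = 0 vs Ha: mu > 0 using a z test (sigma assumed known)."""
    z = mean_loss / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


# Same 0.1 lb average weight loss, hypothetical sigma = 5 lbs, growing samples.
p_values = {n: upper_tail_p_value(0.1, 5.0, n) for n in (100, 10_000, 1_000_000)}
for n, p in p_values.items():
    print(f"n = {n:>9,}: P-value = {p:.4f}")
```

The effect is the same 0.1 lbs in every row. Only the sample size changes, yet the P-value falls from well above 0.05 to essentially 0.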

The results of a hypothesis test don’t tell the whole story. Make sure to always plot your data and

look at a graph before attaching too much importance to your results. Are there outliers or other

deviations from a consistent pattern? A few outliers can produce highly significant results if you blindly

apply common significance tests. Outliers can also destroy the significance of otherwise-convincing

data.

Don’t ignore lack of significance. “Absence of evidence is not evidence of absence”. If a test fails to

reject the null hypothesis, that doesn’t mean that the null hypothesis is true. It may just mean that more

information is needed to prove it false.

Statistical inference is not valid for all sets of data. Badly designed surveys or experiments often

produce invalid results. Formal statistical inference cannot correct basic flaws in study design. Each test

is valid only in certain circumstances, with properly produced data being particularly important.

Beware of multiple analyses. If you perform the same study twenty times and only 1 of the 20 studies

yields a significant result, this shouldn’t overthrow the other 19 studies. At an $\alpha = 0.05$ significance

level, we would expect an average of 1 out of 20 tests to result in a Type I error when there is no

significant effect present.

Examples:

a) Suppose that you wanted to know the average GPA for students at your school who are enrolled in AP

Statistics. Since this isn’t a large population, you conduct a census and record the GPA for each student.

Is it appropriate to construct a one-sample t interval for the mean GPA of AP Statistics students at your

school? Why or why not?

Since we have data about the entire population, it’s not necessary to perform inference. We know the

actual value of µ , the mean GPA of AP Statistics students at our school.

b) Suppose that 20 significance tests were conducted and in each case the null hypothesis was true. What is

the probability that we make at least one Type I error in the 20 tests if we use an $\alpha = 0.05$ significance

level in each test?

Assuming the 20 tests are independent:

$P(\text{don't make a Type I error in 1 test}) = 0.95$

$P(\text{no Type I errors in 20 tests}) = (0.95)^{20} \approx 0.3585$

$P(\text{at least 1 Type I error}) = 1 - P(\text{no Type I errors}) \approx 1 - 0.3585 = 0.6415$

There is about a 64% chance of making at least one Type I error in 20 tests.
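The same complement rule works for any number of tests. A minimal sketch, assuming the tests are independent and every null hypothesis is true (the function name is made up for illustration):

```python
def prob_at_least_one_type_i(alpha, num_tests):
    """P(at least one Type I error) in independent tests where every H0 is true."""
    return 1 - (1 - alpha) ** num_tests


p_twenty = prob_at_least_one_type_i(0.05, 20)
print(f"20 tests at alpha = 0.05: {p_twenty:.4f}")

# How the chance of a false alarm grows with the number of tests.
for k in (1, 5, 10, 20, 50):
    print(k, round(prob_at_least_one_type_i(0.05, k), 4))
```

With a single test the chance is just $\alpha$, and it climbs quickly as more tests are run.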


“So, uh, we did the green study again and got no link. It was probably a – ” “RESEARCH CONFLICTED ON

GREEN JELLY BEAN/ACNE LINK; MORE STUDY RECOMMENDED!”

XKCD COMICS

Permanent link to this comic: http://xkcd.com/882/