Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 16 Statistical Tests.


16.1 Concepts of Statistical Tests

A manager is evaluating software to filter SPAM e-mails (cost $15,000). To make it profitable, the software must reduce SPAM to less than 20%. Should the manager buy the software?

Use a statistical test to answer this question.

Consider the plausibility of a specific claim (claims are called hypotheses).


16.1 Concepts of Statistical Tests

Null and Alternative Hypotheses

Statistical hypothesis: claim about a parameter of a population.

Null hypothesis (H0): specifies a default course of action, preserves the status quo.

Alternative hypothesis (Ha): contradicts the assertion of the null hypothesis.


16.1 Concepts of Statistical Tests

SPAM Software Example

Let p = proportion of e-mail that is SPAM slipping past the filter

H0: p ≥ 0.20

Ha: p < 0.20

These hypotheses lead to a one-sided test.


16.1 Concepts of Statistical Tests

One- and Two-Sided Tests

One-sided test: the null hypothesis allows any value of a parameter larger (or smaller) than a specified value.

Two-sided test: the null hypothesis asserts a specific value for the population parameter.


16.1 Concepts of Statistical Tests

Type I and II Errors

Type I error: reject H0 incorrectly (buying software that will not be cost effective)

Type II error: retain H0 incorrectly (not buying software that would have been cost effective)


16.1 Concepts of Statistical Tests

Type I and II Errors

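Decision table (✓ indicates a correct decision):

H0 is true, retain H0: ✓

H0 is true, reject H0: Type I error

Ha is true, retain H0: Type II error

Ha is true, reject H0: ✓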


16.1 Concepts of Statistical Tests

Other Tests

Visual inspection for association, normal quantile plots and control charts all use tests of hypotheses.

For example, the null hypothesis in a visual test for association is that there is no association between two variables shown in the scatterplot.


16.1 Concepts of Statistical Tests

For Example, in a Normal Quantile Plot

H0: Data are a sample from a normally distributed population

If H0 is true, there is only a 5% chance of any point lying outside the limits.

Data are close enough to the line; we do not reject H0.


16.1 Concepts of Statistical Tests

Test Statistic

Statistical tests rely on the sampling distribution of the test statistic that estimates the parameter specified in the null and alternative hypotheses.

Key question: What is the chance of getting a test statistic this far from H0 if H0 is true?


16.2 Testing the Proportion

SPAM Software Example

The apparent savings from licensing the software depend on the sample proportion.


16.2 Testing the Proportion

SPAM Software Example

The analysis of profitability indicates the manager should reject H0 and license the software only if p̂ is small enough (less than a threshold).


16.2 Testing the Proportion

SPAM Software Example: α Level

The threshold for rejecting H0 depends on the manager's willingness to take a chance on licensing software that won't be profitable.

The threshold is based on the probability of making a Type I error (designated α, the level of significance).


16.2 Testing the Proportion

SPAM Software Example

Sampling distributions (n=100) for different values of p.

Of the values of p allowed by H0, p = 0.2 produces the most small values of p̂; therefore, α is set at 5% for this value of p (which is p0).


16.2 Testing the Proportion

SPAM Software Example: z-Test

Assuming p = 0.2, find the threshold C such that the probability that a sample has p̂ below C is 0.05 (the shaded area is called the rejection region).


16.2 Testing the Proportion

SPAM Software Example: z-Test

P(Z < -1.645) = 0.05

Based on n = 100 and SE(p̂) = 0.04 (note that the hypothesized value p0 = 0.20 is used to calculate SE), C = 0.2 - 1.645(0.04) = 0.1342.
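The threshold arithmetic can be checked with a short sketch. The function name `rejection_threshold` is an illustrative assumption, not something from the slides; the normal quantile comes from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def rejection_threshold(p0: float, n: int, alpha: float) -> float:
    """Threshold C for a lower-tail test of H0: p >= p0 (hypothetical helper)."""
    se = sqrt(p0 * (1 - p0) / n)           # SE uses the hypothesized p0
    z_alpha = NormalDist().inv_cdf(alpha)  # about -1.645 when alpha = 0.05
    return p0 + z_alpha * se


C = rejection_threshold(p0=0.20, n=100, alpha=0.05)
print(round(C, 4))
```

Reject H0 when the sample proportion falls below the printed threshold.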


16.2 Testing the Proportion

z–Test for SPAM Software Example (review of 100 e-mails showed 12% spam)

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.12 - 0.20}{\sqrt{0.20(1 - 0.20)/100}} = -2 $$


16.2 Testing the Proportion

SPAM Software Example

z-Test: test of H0 based on a count of the standard errors separating H0 from the test statistic.

The observed sample proportion is 2 standard errors below p0. Since z < -1.645, the manager rejects H0; the result is statistically significant.
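As a rough illustration of counting standard errors, the sketch below recomputes z for the SPAM sample (12% of 100 e-mails) and compares it with the 5% cutoff. The helper name `z_statistic` is an assumption made for this example.

```python
from math import sqrt


def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """Number of standard errors separating p_hat from p0 (hypothetical helper)."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return (p_hat - p0) / se


z = z_statistic(p_hat=0.12, p0=0.20, n=100)
reject_h0 = z < -1.645  # lower-tail test at alpha = 0.05
print(round(z, 3), reject_h0)
```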


16.2 Testing the Proportion

SPAM Software Example

p-Value: the smallest α level at which H0 can be rejected.

Statistical software commonly reports the p-value of a test.


16.2 Testing the Proportion

SPAM Software Example

The p-value is the area to the left of the observed statistic p̂


16.2 Testing the Proportion

p–Value for SPAM Software Example

Interpret the p-value as a weight of evidence against H0; small values mean that H0 is not plausible.

$$ P(Z \le z) = P(Z \le -2) = 0.02275 $$


16.2 Testing the Proportion

p–Value for SPAM Software Example

Statistically significant: data contradict the null hypothesis and lead us to reject H0 (p-value < α).

The p-value in the SPAM example is less than the typical α of 0.05; the manager should buy the software.
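A minimal sketch of the p-value calculation for the SPAM example, assuming the normal model applies and taking z = -2 from the slides. The variable names are illustrative.

```python
from statistics import NormalDist

z = -2.0      # observed z-statistic from the SPAM example
alpha = 0.05  # level of significance chosen before looking at the data

# Lower-tail area because Ha: p < 0.20 points to small values of p-hat.
p_value = NormalDist().cdf(z)
significant = p_value < alpha
print(round(p_value, 5), significant)
```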


16.2 Testing the Proportion

Type II Error

Power: probability that a test can reject H0.

If a test has little power when H0 is false, it is likely to miss meaningful deviations from the null hypothesis and produce a Type II error.


16.2 Testing the Proportion

Type II Error

Probability of a Type II error if p = 0.15.
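The Type II error probability at p = 0.15 can be approximated as follows. This is a sketch under assumptions not stated on the slide: n = 100, α = 0.05, and a normal model for p̂ centered at the true p. The function name `type_ii_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(p_true: float, p0: float, n: int, alpha: float) -> float:
    """Chance of retaining H0: p >= p0 when the true proportion is p_true."""
    se0 = sqrt(p0 * (1 - p0) / n)
    c = p0 + NormalDist().inv_cdf(alpha) * se0  # rejection threshold under H0
    se_true = sqrt(p_true * (1 - p_true) / n)   # SE at the true proportion
    # H0 is retained whenever p-hat lands at or above the threshold c.
    return 1 - NormalDist(mu=p_true, sigma=se_true).cdf(c)


beta = type_ii_error(p_true=0.15, p0=0.20, n=100, alpha=0.05)
power = 1 - beta
print(round(beta, 3), round(power, 3))
```

Under these assumptions beta comes out near 0.67, so the test has limited power against p = 0.15 with only 100 e-mails.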


16.2 Testing the Proportion

Summary


16.2 Testing the Proportion

Checklist

SRS condition: the sample is a simple random sample from the relevant population.

Sample size condition (for proportion): both np0 and n(1 - p0 ) are larger than 10.
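The sample size condition is simple to verify in code. The helper `sample_size_ok` is a hypothetical name; the first two calls use values from this chapter's examples, and the third is a made-up case that fails the condition.

```python
def sample_size_ok(p0: float, n: int, minimum: float = 10) -> bool:
    """Check that both n*p0 and n*(1 - p0) exceed the minimum count."""
    return n * p0 > minimum and n * (1 - p0) > minimum


spam_ok = sample_size_ok(p0=0.20, n=100)   # counts 20 and 80
ad_ok = sample_size_ok(p0=0.05, n=2500)    # counts 125 and 2,375
small_ok = sample_size_ok(p0=0.05, n=100)  # hypothetical case: n*p0 is only 5
print(spam_ok, ad_ok, small_ok)
```

The check cannot confirm the SRS condition, which depends on how the data were collected.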


4M Example 16.1: DO ENOUGH HOUSEHOLDS WATCH?

Motivation

The Burger King ad featuring Coq Roq won critical acclaim. In a sample of 2,500 homes, MediaCheck found that only 6% saw the ad. An ad must be viewed by 5% or more of households to be effective. Based on these sample results, should the local sponsor run this ad?


4M Example 16.1: DO ENOUGH HOUSEHOLDS WATCH?

Method

Set up the null and alternative hypotheses.

H0: p ≤ 0.05

Ha: p > 0.05

Use α = 0.05. Note that p is the population proportion who watch this ad. Both SRS and sample size conditions are met.


4M Example 16.1: DO ENOUGH HOUSEHOLDS WATCH?

Mechanics

Perform a one-sided z-test for a proportion.

$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.06 - 0.05}{\sqrt{0.05(1 - 0.05)/2{,}500}} \approx 2.3 $$

z = 2.3 with p-value of 0.011

Reject H0.
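The mechanics for the ad example can be reproduced with the sketch below. Because Ha: p > 0.05, the p-value is the upper-tail area. Variable names are illustrative, and the normal model is assumed to hold as stated in the Method step.

```python
from math import sqrt
from statistics import NormalDist

p_hat, p0, n, alpha = 0.06, 0.05, 2500, 0.05

se = sqrt(p0 * (1 - p0) / n)       # standard error under H0
z = (p_hat - p0) / se              # about 2.3 standard errors above p0
p_value = 1 - NormalDist().cdf(z)  # upper-tail area for Ha: p > p0
reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 3), reject_h0)
```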


4M Example 16.1: DO ENOUGH HOUSEHOLDS WATCH?

Message

The results are statistically significant. We can conclude that more than 5% of households watch this ad. The Burger King Coq Roq ad is cost effective and should be run.


16.3 Testing the Mean

Similar to Tests of Proportions

The hypothesis test of µ replaces p̂ with X̄.

Unlike the test of proportions, σ is not specified. Use s from the sample as an estimate of σ to calculate the estimated standard error of X̄.


16.3 Testing the Mean

Example: San Francisco Rental Properties

A firm is considering expanding into an expensive area in downtown San Francisco. In order to cover costs, the firm needs rents in this area to average more than $1,500 per month. Are rents in San Francisco high enough to justify the expansion?


16.3 Testing the Mean

Null and Alternative Hypotheses

Let µ = mean monthly rent for all rental properties in the San Francisco area

Set up hypotheses as:

H0: µ ≤ µ0 = $1,500

Ha: µ > µ0 = $1,500


16.3 Testing the Mean

t - Statistic

Use the t-test for µ (since s estimates σ).

The t-statistic, with n-1 df, is

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$


16.3 Testing the Mean

Example: San Francisco Rental Properties

Rents obtained for a sample of size n = 115; the average rent was $1,657 with s = $581.


16.3 Testing the Mean

Example: San Francisco Rental Properties

Computing the t-statistic:

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1{,}657 - 1{,}500}{581/\sqrt{115}} \approx 2.898 $$

t = 2.898 with 114 df; p-value = 0.0023

Reject H0; mean rent exceeds break-even value.
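The rent calculation can be sketched with the standard library alone. Python's `statistics` module has no Student's t distribution, so this example swaps in a numerical integration (Simpson's rule) of the t density to get the upper-tail p-value. The helpers `t_pdf` and `t_upper_tail` are hypothetical names written for this illustration; statistical software would normally supply the p-value directly.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # area right of 0 is 0.5 by symmetry


x_bar, mu0, s, n = 1657, 1500, 581, 115
t_stat = (x_bar - mu0) / (s / sqrt(n))  # estimated standard errors above mu0
p_value = t_upper_tail(t_stat, df=n - 1)
print(round(t_stat, 3), round(p_value, 4))
```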


16.3 Testing the Mean

Finding the p-Value in the t-Table

Use df = 100 (closest to 114 without going over).

t = 2.898 falls between 2.626 and 3.174, so the one-sided p-value lies between 0.001 and 0.005.


16.3 Testing the Mean

Summary


16.3 Testing the Mean

Checklist

SRS condition: the sample is a simple random sample from the relevant population.

Sample size condition: unless it is known that the population is normally distributed, a normal model can be used to approximate the sampling distribution of X̄ if n is larger than 10 times the absolute value of the kurtosis, n > 10|K4|.


4M Example 16.2: COMPARING RETURNS ON INVESTMENTS

Motivation

Does stock in IBM return more, on average, than T-Bills? From 1990 through 2011, T-Bills returned 0.3% each month.


4M Example 16.2: COMPARING RETURNS ON INVESTMENTS

Method

Let µ = mean of all future monthly returns for IBM stock. Set up the hypotheses as

H0: µ ≤ 0.003

Ha: µ > 0.003

Sample consists of monthly returns on IBM for 264 months (January 1990 – December 2011)


4M Example 16.2: COMPARING RETURNS ON INVESTMENTS

Mechanics

Sample yields x̄ = 0.0126 with s = 0.0827.

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.0126 - 0.003}{0.0827/\sqrt{264}} \approx 1.886 $$

t = 1.886 with 263 df; p-value = 0.0302


4M Example 16.2: COMPARING RETURNS ON INVESTMENTS

Message

Monthly IBM returns from 1990 through 2011 earned statistically significantly higher gains than comparable investments in U.S. Treasury Bills during this period (about 1.3% versus 0.3%).


16.4 Significance vs Importance

Statistical significance does not mean that you have made an important or meaningful discovery.

The size of the sample affects the p-value of a test. With enough data, a trivial difference from H0 leads to a statistically significant outcome.
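The effect of sample size can be illustrated with a small numerical experiment. All values here are hypothetical: a sample proportion of 0.051 tested against p0 = 0.05, a difference of a tenth of a percentage point that is unlikely to matter in practice. The helper name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(p_hat: float, p0: float, n: int) -> float:
    """One-sided p-value for Ha: p > p0 under the normal model."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


p0, p_hat = 0.05, 0.051  # hypothetical values, chosen only for illustration
sample_sizes = (1_000, 10_000, 100_000, 1_000_000)
p_values = [upper_tail_p_value(p_hat, p0, n) for n in sample_sizes]

for n, pv in zip(sample_sizes, p_values):
    print(f"n = {n:>9,}: p-value = {pv:.4f}")
```

The same tiny difference is nowhere near significant at n = 1,000 but falls below 0.05 once n reaches a million, which is why the size of the difference should be reported alongside the p-value.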


16.5 Confidence Interval or Test?

A confidence interval provides a range of parameter values that are compatible with the observed data.

A test provides a precise analysis of a specific hypothesized value for a parameter.

Most people understand the implications of confidence intervals more readily than tests.
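For comparison with the test, here is a sketch of a confidence interval for the SPAM example (12% of 100 e-mails). It assumes the usual normal-approximation interval with the standard error estimated from p̂, and it is a two-sided 95% interval, so it is not an exact mirror of the one-sided test at α = 0.05. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat: float, n: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a proportion."""
    se = sqrt(p_hat * (1 - p_hat) / n)  # SE estimated from the sample
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return p_hat - z * se, p_hat + z * se


low, high = proportion_interval(p_hat=0.12, n=100)
print(round(low, 3), round(high, 3))
print("0.20 inside interval:", low <= 0.20 <= high)
```

The interval shows every value of p that is compatible with the data, and the hypothesized 0.20 lies outside it.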


Best Practices

Pick the hypotheses before looking at the data.

Choose the null hypothesis on the basis of profitability.

Pick the α-level first, taking into account both types of error.

Think about whether α = 0.05 is appropriate for each test.


Best Practices (Continued)

Make sure to have an SRS from the right population.

Use a one-sided test.

Report a p–value to summarize the outcome of a test.


Pitfalls

Do not confuse statistical significance with substantive importance.

Do not think that the p–value is the probability that the null hypothesis is true.

Avoid cluttering a test summary with jargon.