Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables...

Copyright ©2011 Brooks/Cole, Cengage Learning

More about Inference for Categorical Variables

Chapter 15

Principal Question:

Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?

15.1 Chi-Square Test for Two-Way Tables

• Data displayed in a contingency or two-way table.
• Each combination of row/column is a cell of the table.
• Two types of conditional percents: row and column.
• Row percents: percents across a row, based on total number in the row.
• Column percents: percents down a column, based on total number in the column.
• If one variable is explanatory, use it to define rows and use row percents.

Recall: Five steps for assessing statistical significance.

Step 1: Null and alternative hypotheses

H0: The two variables are not related.

Ha: The two variables are related.

Sometimes associated is used instead of related.

Example 15.1 Ear Infections and Xylitol

Experiment: n = 533 children randomized to 3 groups.

Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge

Response = Did child have an ear infection?

Only 16.2% of children in Xylitol Gum group had infection.
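The row percents behind a figure like 16.2% can be sketched in a few lines of Python. The cell counts below do not appear in this transcript; they are assumed values, chosen only because they are consistent with n = 533 and with the 16.2% infection rate quoted for the Xylitol Gum group. The function name `row_percents` is likewise just an illustration.

```python
# ASSUMED counts (not in the transcript): chosen to agree with n = 533 and 16.2%.
# Each row is a treatment group: (infection, no infection).
counts = {
    "Placebo Gum": (49, 129),
    "Xylitol Gum": (29, 150),
    "Xylitol Lozenge": (39, 137),
}

def row_percents(table):
    """Return each row's counts as percents of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row)
        result[group] = tuple(100 * count / total for count in row)
    return result

percents = row_percents(counts)
for group, (yes, no) in percents.items():
    print(f"{group}: {yes:.1f}% infection, {no:.1f}% no infection")
```

Because treatment group is the explanatory variable, it defines the rows, and each row sums to 100%.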

Example 15.1 Infections and Xylitol

H0: p1 = p2 = p3 (no relationship between treatment and outcome)

Ha: p1, p2, p3 are not all the same (there is a relationship)

Let
p1 = proportion who would get an ear infection in the population given placebo gum
p2 = proportion who would get an ear infection in the population given xylitol gum
p3 = proportion who would get an ear infection in the population given xylitol lozenges

Example 15.2 Making Friends

Q: With whom do you find it easiest to make friends – opposite sex or same sex or no difference?

H0: No difference in distribution of responses of men and women (no relationship between gender and response)

Ha: There is a difference in distribution of responses of men and women (there is a relationship between gender and response)

Tech Note: Homogeneity and Independence

There are two variations of the general hypothesis statements, depending on the method of sampling.

• If samples have been taken from separate populations, the null hypothesis statement is a statement of homogeneity (sameness) among the populations.

• If a sample has been taken from a single population, and two categorical variables measured for each individual, the statement of no relationship is a statement of independence between the two variables.

Step 2: Chi-square Statistic and Necessary Conditions

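Compute expected count for each cell:

$$\text{Expected count} = \frac{\text{Row total} \times \text{Column total}}{\text{Total } n}$$

Compute test statistic by totaling over all cells:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$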

Chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (i.e. if null hypothesis were true).
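As a minimal sketch of Step 2, the two computations (expected counts, then the sum over all cells) might look like this in Python. The function names are illustrative, and the counts are assumed values that do not appear in this transcript; they were chosen to be consistent with n = 533 and with the chi-square statistic of 6.69 reported for the xylitol example.

```python
def expected_counts(table):
    """Expected count per cell = row total * column total / total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Total (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# ASSUMED counts (not in the transcript). Rows: Placebo Gum, Xylitol Gum,
# Xylitol Lozenge. Columns: (infection, no infection).
xylitol = [[49, 129], [29, 150], [39, 137]]

expected = expected_counts(xylitol)
statistic = chi_square_statistic(xylitol)
print(round(expected[0][0], 2))  # expected count, "Placebo Gum, Yes Infection"
print(round(statistic, 2))       # chi-square statistic for the assumed table
```

Under these assumed counts the statistic rounds to 6.69, which agrees with the value quoted in the later xylitol examples.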

More on the Chi-square Statistic

A large difference between observed and expected counts is evidence of a relationship.

Guidelines for large sample:

1. All expected counts should be greater than 1.

2. At least 80% of the cells should have an expected count greater than 5.
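The two guidelines are easy to check mechanically once the expected counts are known. The sketch below is one possible way to do it; the function name and both tables of expected counts are hypothetical and serve only as illustrations.

```python
def meets_large_sample_guidelines(expected):
    """Check both guidelines on a table (list of rows) of expected counts."""
    cells = [value for row in expected for value in row]
    all_above_one = all(value > 1 for value in cells)
    share_above_five = sum(value > 5 for value in cells) / len(cells)
    return all_above_one and share_above_five >= 0.80

# Hypothetical expected counts, for illustration only.
ok_table = [[39.1, 138.9], [39.3, 139.7], [38.6, 137.4]]
sparse_table = [[0.8, 12.2], [6.0, 30.0]]  # one expected count is below 1

print(meets_large_sample_guidelines(ok_table))
print(meets_large_sample_guidelines(sparse_table))
```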

Example 15.3 Infections and Xylitol

Expected count for “Placebo Gum, Yes Infection” cell:

Expected Counts:

Example 15.3 Infections and Xylitol

Chi-square Test Statistic:

Step 3: p-value of Chi-square Test

p-value = probability the chi-square test statistic could have been as large or larger if the null hypothesis were true.

A large test statistic is evidence of a relationship. So how large is large enough to declare significance?

Chi-square probability distribution used to find p-value.

Degrees of freedom df = (Rows – 1)(Columns – 1) = (r – 1)(c – 1)

Chi-square Distributions

• Skewed to the right distributions.
• Minimum value is 0.
• Indexed by the degrees of freedom (df).

Example 15.4 Infections and Xylitol

Chi-square statistic was 6.69

df = (3 – 1)(2 – 1) = 2

p-value = 0.035
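The p-value of 0.035 can be reproduced without a table. The sketch below relies on a standard closed form that holds only when df is even: the upper-tail area of a chi-square distribution with df = 2m is exp(-x/2) multiplied by the sum of (x/2)^j / j! for j = 0, ..., m - 1. The function names are illustrative, and statistical software would normally be used instead.

```python
import math

def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for a two-way table."""
    return (rows - 1) * (cols - 1)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square >= x) for even df, using the closed-form (Poisson) sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * partial_sum

df = degrees_of_freedom(3, 2)                 # 3 treatment rows, 2 outcome columns
p_value = chi2_upper_tail_even_df(6.69, df)   # statistic quoted in the example
print(df, round(p_value, 3))
```

For df = 2 the sum has a single term, so the p-value is simply exp(-6.69/2).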

Finding the p-value from Table A.5:

Look in the corresponding “df” row of Table A.5. Scan across until you find where the statistic falls.

• If value of statistic falls between two table entries, p-value is between values of p (column headings) for these entries.

• If value of statistic is larger than entry in rightmost column (labeled p = 0.001), p-value is less than 0.001 (p < 0.001).

• If value of statistic is smaller than entry in leftmost column (labeled p = 0.50), p-value is greater than 0.50 (p > 0.50).
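The lookup procedure can be sketched as a small bracketing function. Table A.5 itself is not reproduced in this transcript, so the list of column headings below is an assumption about its layout (only 0.50, 0.10, 0.075, 0.05, 0.025, 0.01 and 0.001 are mentioned on these slides). The df = 2 row is generated from the fact that, for df = 2, the upper-tail area at x is exp(-x/2), so the critical value for heading p is -2 ln(p).

```python
import math

# ASSUMED column headings; the real Table A.5 may use a different set.
P_COLUMNS = [0.50, 0.25, 0.10, 0.075, 0.05, 0.025, 0.01, 0.005, 0.001]

def table_row_df2(p_columns=P_COLUMNS):
    """Critical values for df = 2, rounded to two decimals like a printed table."""
    return [round(-2 * math.log(p), 2) for p in p_columns]

def bracket_p_value(statistic, critical_values, p_columns=P_COLUMNS):
    """Return (lower, upper) bounds on the p-value; None marks an open side."""
    if statistic < critical_values[0]:
        return (p_columns[0], None)    # p-value > leftmost heading
    if statistic > critical_values[-1]:
        return (None, p_columns[-1])   # p-value < rightmost heading
    for i in range(len(critical_values) - 1):
        if critical_values[i] <= statistic <= critical_values[i + 1]:
            return (p_columns[i + 1], p_columns[i])

row = table_row_df2()
print(row)
print(bracket_p_value(6.69, row))   # statistic from the xylitol example
print(bracket_p_value(8.515, row))  # statistic from the making-friends example
```

A statistic that equals a table entry exactly is reported with that entry's heading as one of the two bounds.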

Example 15.5 Infections and Xylitol

Chi-square statistic was 6.69

df = (3 – 1)(2 – 1) = 2

0.025 < p-value < 0.05

There is a statistically significant relationship between the risk of an ear infection and the preventative treatment.

Example 15.6 A Moderate p-Value

Table has three rows and three columns. The computed chi-square statistic is 8.12. Degrees of freedom are df = (3 – 1)(3 – 1) = 4.

Finding the p-value: Scan the df = 4 row in Table A.5. The value of 8.12 is between the entries 7.78 (p = 0.10) and 8.50 (p = 0.075). Thus, the p-value is between 0.075 and 0.10.

0.075 < p-value < 0.10

Steps 4 and 5: Making a Decision and Reporting a Conclusion

Two equivalent rules: Reject H0 when …

• p-value ≤ 0.05

• Chi-square statistic is greater than the entry in the 0.05 column of Table A.5 (the critical value).

Large test statistic ⇒ small p-value ⇒ evidence that a real relationship exists in the population.

Note: For 2x2 tables (df = 1), a test statistic of 3.84 or larger is significant at the 0.05 level.
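The 3.84 cutoff for 2x2 tables can be checked numerically. With df = 1 the chi-square statistic behaves like a squared standard normal variable, so its upper-tail area at x is erfc(sqrt(x / 2)); 3.84 is the familiar 1.96 squared, rounded to two decimals. The sketch below uses only Python's standard library, and the function names are illustrative.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with df = 1 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

def reject_null(p_value, alpha=0.05):
    """Decision rule from the slide: reject H0 when p-value <= alpha."""
    return p_value <= alpha

p_at_cutoff = chi2_upper_tail_df1(3.84)
print(round(p_at_cutoff, 3))                   # close to 0.05
print(reject_null(chi2_upper_tail_df1(4.5)))   # statistic above the cutoff
print(reject_null(chi2_upper_tail_df1(3.0)))   # statistic below the cutoff
```

Because 3.84 is a rounded critical value, the tail area at exactly 3.84 agrees with 0.05 only to about three decimal places.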

Reporting a Conclusion

Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).

Ways to write “do not reject H0”

• The relationship between smoking and drinking alcohol is not statistically significant.

• The proportions of smokers who never drink, drink occasionally, and drink often are not significantly different from the proportions of non-smokers who do so.

• There is insufficient evidence to conclude that there is a relationship in the population between smoking and drinking alcohol.

Reporting a Conclusion

Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).

Ways to write “reject H0”

• There is a statistically significant relationship between smoking and drinking alcohol.

• The proportions of smokers who never drink, drink occasionally, and drink often are not the same as the proportions of non-smokers who do so.

• Smokers have significantly different drinking behavior than non-smokers.

Example 15.8 Making Friends

Q: With whom do you find it easiest to make friends – opposite sex or same sex or no difference?

df = (2 – 1)(3 – 1) = 2.

Table A.5: the value of 8.515 falls between the entries in the 0.025 column (7.38) and the 0.01 column (9.21).

0.01 < p-value < 0.025

There is a statistically significant relationship at the 0.05 level.

There appears to be a difference between the distributions of responses that men and women in the populations would give if asked this question.

Supporting Analyses

To learn about the specific nature of the relationship:

• Description of row (or column) percents.

• Bar chart of counts or percents.

• Examination of each cell’s “contribution to chi-square.” Cells with the largest values have contributed most to the significance of the relationship and deserve attention in any description of the relationship.

• Confidence intervals for important proportions or for differences between proportions.
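Each cell's contribution to chi-square is simply its own (observed – expected)² / expected term, so listing the contributions takes little more than the statistic itself. The sketch below uses assumed counts for the xylitol experiment; they are not in this transcript and were chosen to agree with n = 533 and a statistic of 6.69. The function name is illustrative.

```python
def cell_contributions(table):
    """Return (observed - expected)^2 / expected for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for row, r_total in zip(table, row_totals):
        contrib_row = []
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            contrib_row.append((observed - expected) ** 2 / expected)
        contributions.append(contrib_row)
    return contributions

# ASSUMED counts (not in the transcript). Rows: Placebo Gum, Xylitol Gum,
# Xylitol Lozenge. Columns: (infection, no infection).
xylitol = [[49, 129], [29, 150], [39, 137]]
contributions = cell_contributions(xylitol)

# Locate the cell with the largest contribution as (value, row index, column index).
largest = max(
    (value, i, j)
    for i, row in enumerate(contributions)
    for j, value in enumerate(row)
)
for row in contributions:
    print([round(value, 2) for value in row])
print(largest[1:], round(largest[0], 2))
```

With these assumed counts the two largest contributions come from the infection column of the Placebo Gum and Xylitol Gum rows, which is where a description of the relationship would focus.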

Chi-Square Test or Z-Test for Difference in Two Proportions?

Does it make a difference?

• If desired Ha has no specific direction (two-sided), the two tests give exactly the same p-value. The squared value of the z-statistic equals the chi-square statistic.

• If desired Ha has a direction (one-sided), the z-test should be used.
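The equivalence for the two-sided case can be checked on any 2x2 table. The sketch below uses hypothetical counts (30 of 100 versus 45 of 100), the pooled two-proportion z-statistic, and the chi-square statistic without a continuity correction. Both p-values come from `math.erfc`, since the upper-tail area for df = 1 at x equals the two-sided normal tail area at sqrt(x).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic (no continuity correction) for the same data."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )

# Hypothetical data, for illustration only: successes and sample sizes per group.
z = two_proportion_z(30, 100, 45, 100)
chi2 = chi_square_2x2(30, 100, 45, 100)

p_two_sided_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided z-test p-value
p_chi_square = math.erfc(math.sqrt(chi2 / 2))     # chi-square p-value, df = 1
print(round(z ** 2, 4), round(chi2, 4))
print(round(p_two_sided_z, 4), round(p_chi_square, 4))
```

A one-sided alternative has no chi-square counterpart because squaring z discards its sign, which is why the z-test is recommended in that case.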

15.3 Testing Hypotheses about One Categorical Variable: GOF

Goodness of Fit (GOF) Test

Step 1: Determine the null and alternative hypotheses.

H0: The probabilities for k categories are p1, p2, . . . , pk.

Ha: Not all probabilities specified in H0 are correct.

Note: Probabilities in the null hypothesis must sum to 1.

Goodness of Fit (GOF) Test

Step 2: Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic.

If at least 80% of the expected counts are greater than 5 and none are less than 1, compute

$$\chi^2 = \sum_{\text{all categories}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where the expected count for the ith category is computed as $np_i$.

Goodness of Fit (GOF) Test

Step 3: Assuming the null hypothesis is true, find the p-value. Use chi-square distribution with df = k – 1.

Step 4: Decide whether or not the result is statistically significant based on the p-value. The result is statistically significant if the p-value ≤ α (the level of significance, commonly 0.05).

Step 5: Report the conclusion in the context of the situation.
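Steps 2 through 4 of the goodness of fit test can be sketched end to end with Python's standard library. The observed counts below are hypothetical (ten categories, n = 500, uniform null probabilities) and are not data from any real source. The helper `chi2_upper_tail` is an illustrative implementation that evaluates the chi-square upper-tail area through the standard series for the regularized lower incomplete gamma function; a statistics package would normally supply this.

```python
import math

def chi2_upper_tail(x, df, max_terms=10_000):
    """P(chi-square with df >= x), via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(max_terms):
        denom += 1
        term *= half_x / denom
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def goodness_of_fit(observed, null_probs):
    """Return (statistic, df, p_value) for H0: category probabilities = null_probs."""
    if abs(sum(null_probs) - 1) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]
    # Data conditions: none below 1, at least 80% greater than 5.
    if min(expected) < 1 or sum(e > 5 for e in expected) < 0.8 * len(expected):
        raise ValueError("expected counts too small for the chi-square approximation")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, chi2_upper_tail(statistic, df)

# HYPOTHETICAL counts for ten equally likely categories (n = 500).
observed = [47, 50, 55, 46, 53, 39, 55, 55, 45, 55]
statistic, df, p_value = goodness_of_fit(observed, [0.1] * 10)
print(round(statistic, 2), df, round(p_value, 3))
print("significant at 0.05" if p_value <= 0.05 else "not significant at 0.05")
```

With these made-up counts every expected count is 50, the statistic is 5.6 on 9 degrees of freedom, and the p-value is well above 0.50, so the null hypothesis would not be rejected.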

Example 15.15 Pennsylvania Daily Number

State lottery game: Three-digit number made by drawing a digit between 0 and 9 from each of three different containers.

Focus = draws from the first container. If numbers randomly selected, each value would be equally likely to occur.

H0: p = 1/10 for each of the 10 possible digits

Ha: Not H0

Example 15.15 Daily Number

Data: n = 500 days between 7/19/99 and 11/29/00

Example 15.15 Daily Number

Chi-square goodness of fit statistic:

From Table A.5: df = k – 1 = 10 – 1 = 9

p-value > 0.50

Result is not statistically significant; the null hypothesis is not rejected.