© 2011 Pearson Education, Inc. Statistics for Business and Economics Chapter 9 Categorical Data...

Statistics for Business and Economics

Chapter 9

Categorical Data Analysis

Contents

9.1 Categorical Data and the Multinomial Experiment

9.2 Testing Category Probabilities: One-Way Table

9.3 Testing Category Probabilities: Two-Way Contingency Table

9.4 A Word of Caution about Chi-Square Tests

Learning Objectives

1. Discuss qualitative (i.e., categorical) data with more than two outcomes

2. Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis

3. Present a chi-square hypothesis test for relating two qualitative variables, called a two-way analysis

9.1

Categorical Data and the Multinomial Experiment

Qualitative Data

• Qualitative random variables yield responses that can be classified

– Example: gender (male, female)

• Qualitative data that fall in more than two categories often result from a multinomial experiment

Properties of the Multinomial Experiment

1. The experiment consists of n identical trials.

2. There are k possible outcomes to each trial. These outcomes are called classes, categories, or cells.

3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1.

4. The trials are independent.

5. The random variables of interest are the cell counts, n1, n2, …, nk, of the number of observations that fall in each of the k classes.
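The five properties above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_multinomial`, the seed, and the use of the example probabilities .2, .3, .5 are assumptions made for illustration, not part of the slides.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=None):
    """Run n independent, identical trials with k = len(probs) outcomes
    and return the cell counts n1, ..., nk."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    rng = random.Random(seed)
    # Each trial picks one of the k categories with the same fixed probabilities.
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally.get(i, 0) for i in range(len(probs))]

# Hypothetical experiment: n = 100 trials, k = 3 categories.
counts = simulate_multinomial(n=100, probs=[0.2, 0.3, 0.5], seed=1)
print(counts, sum(counts))  # the cell counts always add up to n
```

The cell counts vary from run to run, but their total is always n, which is why only k – 1 of them are free to vary.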

9.2

Testing Category Probabilities: One-Way Table

Multinomial Experiment

In this section, we consider a multinomial experiment with k outcomes that correspond to categories of a single qualitative variable. The results of such an experiment are summarized in a one-way table. The term one-way is used because only one variable is classified. Typically, we want to make inferences about the true proportions that occur in the k categories based on the sample information in the one-way table.

Chi-Square (χ²) Test for k Proportions

• Tests equality (=) of proportions only

– Example: p1 = .2, p2 = .3, p3 = .5

• One variable with several levels

• Uses one-way contingency table

One-Way Contingency Table

Shows number of observations in k independent groups (outcomes or variable levels)

Outcomes (k = 3)

                         Candidate
                         Tom    Bill    Mary    Total
Number of responses       35     20      45      100

A Test of a Hypothesis about Multinomial Probabilities: One-Way Table

H0: p1 = p1,0, p2 = p2,0, …, pk = pk,0

where p1,0, p2,0, …, pk,0 represent the hypothesized values of the multinomial probabilities.

Ha: At least one of the multinomial probabilities does not equal its hypothesized value.

Test statistic: $\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i}$

where $E_i = n p_{i,0}$ is the expected cell count, that is, the expected number of outcomes of type i assuming that H0 is true. The total sample size is n.

Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has (k – 1) df.
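The test statistic can be computed directly from the observed counts and the hypothesized probabilities. The sketch below is illustrative only: the function name `chi_square_one_way` is hypothetical, and applying it to the candidate counts (35, 20, 45) with equal hypothesized probabilities of 1/3 is an assumption made for the example, not a test carried out in the slides.

```python
def chi_square_one_way(observed, hypothesized_probs):
    """Return (chi-square statistic, degrees of freedom, expected counts)
    for a one-way table under H0: p_i = p_{i,0}."""
    n = sum(observed)
    # E_i = n * p_{i,0}
    expected = [n * p for p in hypothesized_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected

# Assumed H0 for illustration: each candidate is preferred by 1/3 of voters.
stat, df, expected = chi_square_one_way([35, 20, 45], [1 / 3, 1 / 3, 1 / 3])
print(round(stat, 2), df)  # about 9.5 with 2 df
```

Returning the expected counts alongside the statistic makes it easy to check that every Ei is at least 5 before interpreting the result.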

Conditions Required for a Valid Test: One-way Table

1. A multinomial experiment has been conducted. This is generally satisfied by taking a random sample from the population of interest.

2. The sample size n is large. This is satisfied if for every cell, the expected cell count Ei will be equal to 5 or more.

χ² Test Basic Idea

1. Compares observed count to expected count assuming null hypothesis is true

2. The closer the observed counts are to the expected counts, the more consistent the data are with H0

• Measured by squared difference relative to expected count

– Reject H0 for large values

Finding Critical Value Example

What is the critical χ² value if k = 3 and α = .05?

df = k – 1 = 2

χ² Table (Portion)

        Upper Tail Area
DF    .995     …    .95      …    .05
1     ...      …    0.004    …    3.841
2     0.010    …    0.103    …    5.991

[Figure: χ² distribution starting at 0, with the rejection region of area α = .05 to the right of the critical value 5.991. Do not reject H0 to the left of 5.991; reject H0 to the right.]

If ni = E(ni), χ² = 0.
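The two table entries used in this chapter (df = 2 and df = 1 at α = .05) can be checked without a table, because those two cases have closed forms. This is a sketch using only the Python standard library; the function names are hypothetical, and the shortcuts do not extend to other degrees of freedom.

```python
import math
from statistics import NormalDist

def chi2_critical_df2(alpha):
    # For df = 2 the upper-tail area beyond x is exp(-x / 2),
    # so the critical value is x = -2 ln(alpha).
    return -2.0 * math.log(alpha)

def chi2_critical_df1(alpha):
    # A chi-square variable with df = 1 is the square of a standard normal,
    # so the critical value is the squared two-sided z value.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

print(round(chi2_critical_df2(0.05), 3))  # 5.991
print(round(chi2_critical_df1(0.05), 3))  # 3.841
```

For other degrees of freedom a statistical table or a statistics library is needed.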

χ² Test for k Proportions Example

As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?

χ² Test for k Proportions Solution

• H0: p1 = p2 = p3 = 1/3

• Ha: At least 1 is different

• α = .05

• n1 = 63, n2 = 45, n3 = 72

• Critical Value(s): 5.991 (reject H0 if χ² > 5.991; upper-tail area α = .05)

χ² Test for k Proportions Solution

$E(n_i) = n p_{i,0}$

$E(n_1) = E(n_2) = E(n_3) = 180(1/3) = 60$

$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{[63 - 60]^2}{60} + \frac{[45 - 60]^2}{60} + \frac{[72 - 60]^2}{60} = 6.3$

χ² Test for k Proportions Solution

• H0: p1 = p2 = p3 = 1/3

• Ha: At least 1 is different

• α = .05

• n1 = 63, n2 = 45, n3 = 72

• Critical Value(s): 5.991 (reject H0 if χ² > 5.991; upper-tail area α = .05)

Test Statistic: χ² = 6.3

Decision: Reject at α = .05

Conclusion: There is evidence of a difference in proportions
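The arithmetic in this solution can be reproduced in a few lines. This is a sketch rather than part of the original slides; the p-value line is an addition and relies on the fact that, for df = 2 only, the upper-tail area of the χ² distribution is exp(–χ²/2).

```python
import math

observed = [63, 45, 72]            # employees rating Methods 1, 2, 3 as fair
n = sum(observed)                  # 180
expected = [n * (1 / 3)] * 3       # 60 in every cell under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 5.991                   # chi-square table, df = 2, alpha = .05
p_value = math.exp(-chi2 / 2)      # closed form valid only for df = 2

print(round(chi2, 2))              # 6.3
print(chi2 > critical)             # True, so H0 is rejected
print(round(p_value, 3))           # about 0.043
```

The p-value falls just under .05, which agrees with the statistic landing only slightly beyond the critical value.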

9.3

Testing Category Probabilities: Two-Way (Contingency) Table

χ² Test of Independence

• Shows if a relationship exists between two qualitative variables

– One sample is drawn

– Does not show causality

• Uses two-way contingency table

χ² Test of Independence Contingency Table

Shows number of observations from one sample jointly in two qualitative variables

                      House Location
House Style      Urban    Rural    Total
Split-Level        63       49      112
Ranch              15       33       48
Total              78       82      160

Rows: levels of variable 1 (House Style)

Columns: levels of variable 2 (House Location)

Finding Expected Cell Counts for a Two-Way Contingency Table

The estimate of the expected number of observations falling into the cell in row i and column j is given by

$E_{ij} = \frac{R_i C_j}{n}$

where Ri = total for row i, Cj = total for column j, and n = sample size.

General Form of a Contingency Table Analysis: χ²-Test for Independence

H0: The two classifications are independent.

Ha: The two classifications are dependent.

Test statistic: $\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}$, where $E_{ij} = \frac{R_i C_j}{n}$

Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has (r – 1)(c – 1) df.

Conditions Required for a Valid χ²-Test: Contingency Table

1. A multinomial experiment has been conducted. We may then consider this to be a multinomial experiment with r × c possible outcomes.

2. The sample size n is large. This is satisfied if for every cell, the expected count Eij will be equal to 5 or more.

χ² Test of Independence Expected Counts

1. Statistical independence means joint probability equals product of marginal probabilities

2. Compute marginal probabilities and multiply for joint probability

3. Expected count is sample size times joint probability

Expected Count Example

                    Location
House Style      Urban    Rural    Total
Split–Level        63       49      112
Ranch              15       33       48
Total              78       82      160

Marginal probability (Split–Level) = 112/160

Marginal probability (Urban) = 78/160

Joint probability = (112/160)(78/160)

Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation

$E_{ij} = \frac{R_i C_j}{n}$

                              House Location
                     Urban                          Rural
House Style    Obs.   Exp.                    Obs.   Exp.                    Total
Split-Level     63    112·78/160 = 54.6        49    112·82/160 = 57.4        112
Ranch           15    48·78/160 = 23.4         33    48·82/160 = 24.6          48
Total           78    78                       82    82                       160
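The expected counts in this table follow mechanically from the row and column totals. A minimal sketch, assuming the table is stored as a list of rows; the function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Return E_ij = R_i * C_j / n for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Split-Level, Ranch. Columns: Urban, Rural.
house = [[63, 49],
         [15, 33]]
house_expected = expected_counts(house)
for row in house_expected:
    print([round(e, 1) for e in row])
# [54.6, 57.4]
# [23.4, 24.6]
```

Each row and column of expected counts sums to the same total as the observed counts, which is a quick sanity check on the calculation.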

χ² Test of Independence Example

As a realtor you want to determine if house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

                      House Location
House Style      Urban    Rural    Total
Split-Level        63       49      112
Ranch              15       33       48
Total              78       82      160

χ² Test of Independence Solution

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area α = .05)

χ² Test of Independence Solution

                              House Location
                     Urban                Rural
House Style    Obs.   Exp.          Obs.   Exp.          Total
Split-Level     63    54.6           49    57.4           112
Ranch           15    23.4           33    24.6            48
Total           78    78             82    82             160

Expected counts: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6

Eij ≥ 5 in all cells

χ² Test of Independence Solution

$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$

$= \frac{[63 - 54.6]^2}{54.6} + \frac{[49 - 57.4]^2}{57.4} + \cdots + \frac{[33 - 24.6]^2}{24.6} = 8.41$

χ² Test of Independence Solution

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area α = .05)

Test Statistic: χ² = 8.41

Decision: Reject at α = .05

Conclusion: There is evidence of a relationship
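The whole test for the house data can be sketched as one function that works for any r × c table. The function name `chi_square_independence` is hypothetical, and the critical value 3.841 is taken from the χ² table rather than computed.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E_ij = R_i * C_j / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: Split-Level, Ranch. Columns: Urban, Rural.
house_stat, house_df = chi_square_independence([[63, 49], [15, 33]])
print(round(house_stat, 2), house_df)   # 8.41 1
print(house_stat > 3.841)               # True, so H0 is rejected at alpha = .05
```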

χ² Test of Independence Thinking Challenge

You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

                 Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32     116
Yes            48     122     170
Total         132     154     286

χ² Test of Independence Solution

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area α = .05)

χ² Test of Independence Solution*

                         Diet Pepsi
                   No                   Yes
Diet Coke     Obs.   Exp.          Obs.   Exp.          Total
No             84    53.5           32    62.5           116
Yes            48    78.5          122    91.5           170
Total         132    132           154    154            286

Expected counts: 116·132/286 = 53.5, 154(116)/286 = 62.5, 170·132/286 = 78.5, 170·154/286 = 91.5

Eij ≥ 5 in all cells

χ² Test of Independence Solution

$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{[n_{11} - E_{11}]^2}{E_{11}} + \frac{[n_{12} - E_{12}]^2}{E_{12}} + \cdots + \frac{[n_{22} - E_{22}]^2}{E_{22}}$

$= \frac{[84 - 53.5]^2}{53.5} + \frac{[32 - 62.5]^2}{62.5} + \cdots + \frac{[122 - 91.5]^2}{91.5} = 54.29$

χ² Test of Independence Solution

• H0: No Relationship

• Ha: Relationship

• α = .05

• df = (2 – 1)(2 – 1) = 1

• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area α = .05)

Test Statistic: χ² = 54.29

Decision: Reject at α = .05

Conclusion: There is evidence of a relationship
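For a 2 × 2 table the statistic can also be checked with the algebraically equivalent shortcut n(ad – bc)² / (R1·R2·C1·C2), which is a standard identity rather than something shown in the slides. The sketch below (hypothetical function name `chi_square_2x2`) suggests that the value 54.29 comes from expected counts rounded to one decimal place; with unrounded expected counts the statistic is about 54.15. The conclusion is unchanged either way.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]],
    using the shortcut n(ad - bc)^2 / (R1 * R2 * C1 * C2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: Diet Coke No/Yes. Columns: Diet Pepsi No/Yes.
exact = chi_square_2x2(84, 32, 48, 122)

# Same statistic computed from the rounded expected counts in the solution table.
observed = [84, 32, 48, 122]
rounded_expected = [53.5, 62.5, 78.5, 91.5]
from_rounded = sum((o - e) ** 2 / e for o, e in zip(observed, rounded_expected))

print(round(exact, 2))         # about 54.15
print(round(from_rounded, 2))  # about 54.29
```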

χ² Test of Independence Thinking Challenge 2

There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?

                 Diet Pepsi
Diet Coke      No     Yes    Total
No             84      32     116
Yes            48     122     170
Total         132     154     286

You Re-Analyze the Data

Low Income

                 Diet Pepsi
Diet Coke      No     Yes    Total
No              4      30      34
Yes            40       2      42
Total          44      32      76

High Income

                 Diet Pepsi
Diet Coke      No     Yes    Total
No             80       2      82
Yes             8     120     128
Total          88     122     210
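One way to see what the re-analysis reveals is to compare, within each income table and in the pooled table, the share of Diet Pepsi buyers among Diet Coke non-buyers and buyers. This is an illustrative sketch; the names `first_table`, `second_table` and `pepsi_rate_by_coke` are hypothetical, and the tables are entered in the order they appear on the slide.

```python
def pepsi_rate_by_coke(table):
    """table rows: Diet Coke No/Yes; columns: Diet Pepsi No/Yes.
    Returns the share buying Diet Pepsi among (Coke non-buyers, Coke buyers)."""
    (a, b), (c, d) = table
    return b / (a + b), d / (c + d)

first_table = [[4, 30], [40, 2]]     # n = 76
second_table = [[80, 2], [8, 120]]   # n = 210

# Adding the two income tables cell by cell recovers the original 286 consumers.
pooled = [[x + y for x, y in zip(r1, r2)]
          for r1, r2 in zip(first_table, second_table)]

for name, table in [("first", first_table), ("second", second_table), ("pooled", pooled)]:
    no_coke, yes_coke = pepsi_rate_by_coke(table)
    print(name, round(no_coke, 3), round(yes_coke, 3))
```

In the first table Diet Coke buyers are far less likely to buy Diet Pepsi, in the second they are far more likely, and the pooled table shows only the net effect of mixing the two groups.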

True Relationships

[Diagram: Diet Coke and Diet Pepsi are joined by an apparent relation. A control or intervening variable (the true cause) is joined to each of them by an underlying causal relation.]

Moral of the Story

Numbers don’t think - People do!

9.4

A Word of Caution about Chi-Square Tests

Caution about the χ² Test

The χ² test is one of the most widely applied statistical tools and also one of the most abused.

Be certain the experiment satisfies the assumptions.

Be certain the sample is drawn from the correct population.

Avoid using the test when the expected counts are very small.

Caution about the χ² Test

• If the χ² value does not exceed the established critical value of χ², do not accept the hypothesis of independence. You risk a Type II error. Avoid concluding that two classifications are independent, even when χ² is small.

• If a contingency table χ² value does exceed the critical value, we must be careful to avoid inferring that a causal relationship exists between the classifications. The existence of a causal relationship cannot be established by a contingency table analysis.

Key Ideas

Multinomial Data

Qualitative data that fall into more than two categories (or classes)

Properties of a Multinomial Experiment

1. n identical trials

2. k possible outcomes

3. probabilities of the k outcomes (p1, p2, …, pk) remain the same from trial to trial, where p1 + p2 + … + pk = 1

4. trials are independent

5. variables of interest: cell counts (i.e., number of observations falling into each outcome category), denoted n1, n2, …, nk

One-Way Table

Summary table for a single qualitative variable

Two-Way (Contingency) Table

Summary table for two qualitative variables

Chi-Square (χ²) Statistic

used to test category probabilities in one-way and two-way tables

Chi-Square tests for independence

should not be used to infer a causal relationship between two qualitative variables

Conditions Required for Valid χ²-Tests

1. multinomial experiment

2. sample size n is large (expected cell counts are all greater than or equal to 5)