Chapter 14: Preprocessing the Data, and Cross-Tabs © 2005 Thomson/South-Western


Page 1: Chapter 14: Preprocessing the Data, and Cross-Tabs

© 2005 Thomson/South-Western

Page 2: Figure 1: Histogram and Frequency Polygon of Incomes of Families in Car Ownership Study

[Figure not reproduced. Vertical axis ticks run from 0 to 25 in steps of 5; horizontal axis shows income from 0k to 105k, labeled 0k, 15k, 25k, ..., 105k.]

Page 3: Figure 2: Cumulative Distribution of Incomes of Families in Car Ownership Study

[Figure not reproduced. Vertical axis ticks run from 0 to 120 in steps of 20; horizontal axis shows income from 0k to 105k, labeled 0k, 15k, 25k, ..., 105k.]

Page 4: Family Income and Number of Cars Family Owns

                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500        48           6        54
More than $37,500        27          19        46
Total                    75          25       100

Page 5: Number of Cars by Family Income

                         Number of Cars
Income               1 or None   2 or More   Total   # of Cases
Less than $37,500       89%         11%      100%        54
More than $37,500       59%         41%      100%        46

Page 6: Family Income by Number of Cars

                         Number of Cars
Income               1 or None   2 or More
Less than $37,500       64%         24%
More than $37,500       36%         76%
Total                  100%        100%
(Number of Cases)      (75)        (25)
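The two percentage tables differ only in the base used for the percentages: row totals (cars by income) or column totals (income by cars). A minimal Python sketch, assuming the counts from the income and car ownership table; the names `counts`, `row_percentages`, and `column_percentages` are illustrative and not from the slides:

```python
# Counts from the income-by-cars table.
# Each row lists [1 or None, 2 or More] for one income group.
counts = {
    "Less than $37,500": [48, 6],
    "More than $37,500": [27, 19],
}

def row_percentages(table):
    """Percent of each row total that falls in each column."""
    return {row: [round(100 * x / sum(cells)) for x in cells]
            for row, cells in table.items()}

def column_percentages(table):
    """Percent of each column total that falls in each row."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {row: [round(100 * x / t) for x, t in zip(cells, col_totals)]
            for row, cells in table.items()}

row_pct = row_percentages(counts)     # number of cars by family income
col_pct = column_percentages(counts)  # family income by number of cars
print(row_pct)
print(col_pct)
```

Percentaging across rows answers "what share of each income group owns two or more cars?", while percentaging down columns answers "what share of each ownership group has a given income?".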

Page 7: Number of Cars and Size of Family

                         Number of Cars
Size of Family       1 or None   2 or More   Total
4 or Less                70           8        78
5 or More                 5          17        22
Total                    75          25       100

Page 8: Number of Cars by Size of Family

                         Number of Cars
Size of Family       1 or None   2 or More   Total   # of Cases
4 or Less               90%         10%      100%       (78)
5 or More               23%         77%      100%       (22)

Page 9: Number of Cars by Income and Size of Family

Four Members or Less:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500        44           2        46
More than $37,500        26           6        32
Total                    70           8        78

Five Members or More:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500         4           4         8
More than $37,500         1          13        14
Total                     5          17        22

Total:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500        48           6        54
More than $37,500        27          19        46
Total                    75          25       100

Page 10: Number of Cars by Income and Size of Family

Four Members or Less:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500       96%          4%      100% (46)
More than $37,500       81%         19%      100% (32)

Five Members or More:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500       50%         50%      100% (8)
More than $37,500        7%         93%      100% (14)

Total:
                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500       89%         11%      100% (54)
More than $37,500       59%         41%      100% (46)

Page 11: Car Ownership for Small, Below Average Income Families

                         Number of Cars
Income               1 or None   2 or More   Total
Less than $37,500       96%          4%      100% (46)

Page 12: Percentage of Families Owning Two or More Cars by Income

                         Size of Family
Income               4 or Less   5 or More   Total
Less than $37,500        4%         50%      11% (6)
More than $37,500       19%         93%      41% (19)
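The percentages of families owning two or more cars can be recomputed from the three-way counts (income by family size by number of cars). A short Python sketch, assuming those counts; the nested-dictionary layout and the helper name `pct_two_or_more` are illustrative choices, not part of the slides:

```python
# counts[size][income] = (families with 1 or none, families with 2 or more)
counts = {
    "4 or less": {"Less than $37,500": (44, 2), "More than $37,500": (26, 6)},
    "5 or more": {"Less than $37,500": (4, 4), "More than $37,500": (1, 13)},
}
incomes = ("Less than $37,500", "More than $37,500")

def pct_two_or_more(one_or_none, two_or_more):
    """Percent of families in a cell that own two or more cars."""
    return round(100 * two_or_more / (one_or_none + two_or_more))

# Percent owning 2+ cars within each (family size, income) cell.
by_size_and_income = {
    size: {income: pct_two_or_more(*cells[income]) for income in incomes}
    for size, cells in counts.items()
}

# Collapsing over family size gives the two-way income result.
by_income = {
    income: pct_two_or_more(
        sum(counts[size][income][0] for size in counts),
        sum(counts[size][income][1] for size in counts),
    )
    for income in incomes
}
print(by_size_and_income)
print(by_income)
```

Comparing `by_income` with `by_size_and_income` shows how much of the apparent income effect changes once family size is held constant.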

Page 13: Conditions That Can Arise with the Introduction of an Additional Variable into a Cross Tabulation

                       With the Additional Variable
Initial Situation      Change Conclusion                     Retain Conclusion
Some Relationship      I                                     II
                         A. Refine Explanation
                         B. Reveal Spurious Explanation
                         C. Provide Limiting Conditions
No Relationship        III                                   IV

Page 14: The Researcher’s Dilemma

                            True Situation
Researcher’s Conclusion     No Relationship         Some Relationship
No Relationship             Correct Decision        Spurious Noncorrelation
Some Relationship           Spurious Correlation    Correct Decision if Concluded Relationship is of Proper Form

Page 15: Chi-Square Tests

Source: Appendix 14A

Page 16: Measures of Association for Nominal Data

Measures Appropriate for Nominal Data

* Contingency Table (Chi-Square)

* Contingency Coefficient

* Index of Predictive Association
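The slides list these measures without formulas. As a hedged sketch, the Python below computes Pearson's contingency coefficient, C = sqrt(X²/(X² + n)), and Goodman and Kruskal's lambda for the family size by number of cars counts. Treating the "index of predictive association" as Goodman and Kruskal's lambda is an assumption here, and all function names are illustrative:

```python
import math

# Rows: family size (4 or less, 5 or more); columns: cars (0 or 1, 2+).
observed = [[70, 8], [5, 17]]

def chi_square(table):
    """Pearson chi-square, with expected counts taken from the margins."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return sum((o - rt * ct / n) ** 2 / (rt * ct / n)
               for r, rt in zip(table, row_totals)
               for o, ct in zip(r, col_totals))

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(X2 / (X2 + n))."""
    n = sum(sum(r) for r in table)
    x2 = chi_square(table)
    return math.sqrt(x2 / (x2 + n))

def lambda_columns_given_rows(table):
    """Goodman-Kruskal lambda: proportional reduction in errors when the
    column category is predicted with knowledge of the row category."""
    n = sum(sum(r) for r in table)
    best_without_rows = max(sum(c) for c in zip(*table))
    best_with_rows = sum(max(r) for r in table)
    return (best_with_rows - best_without_rows) / (n - best_without_rows)

c_coef = contingency_coefficient(observed)
lam = lambda_columns_given_rows(observed)
print(round(c_coef, 2), round(lam, 2))
```

Under these definitions, a lambda of 0.48 would mean that knowing family size removes 48% of the errors made when predicting car ownership from the column totals alone.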

Page 17: Cross Tabulations

Frequencies of Combinations of Row (i) and Column (j)

                     #Cars
Family Size      0 or 1     2+     Total
4 or less          70        8       78
5 or more           5       17       22
Total              75       25      100

Page 18: Cross-Tabs & Chi-Squares

H0: Row variable is independent of column variable; no association between family size & #cars

Analogous to: “no correlation”

                     #Cars
Family Size      0 or 1      2+        Total
4 or less                              78 (78%)
5 or more                              22 (22%)
Total            75 (75%)   25 (25%)   100

Page 19: If Family Size & #Cars are Independent:

We’d EXPECT frequencies to be distributed “randomly”; i.e., in proportion to the margins

                     #Cars
Family Size      0 or 1      2+        Total
4 or less         58.5      19.5       78 (78%)
5 or more         16.5       5.5       22 (22%)
Total            75 (75%)   25 (25%)   100

Page 20: Using the Statistical Definition of “Independence” to Calculate the Expected Frequencies

• If A & B are independent: $P(A_1 B_1) = P(A_1) P(B_1)$

• $e_{11} = n P(A_1 B_1) = 100 \cdot (78/100) \cdot (75/100) = (78 \times 75)/100 = 58.5$
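The same rule gives every expected cell count: row total times column total, divided by n. A minimal Python sketch, assuming the margins of the family size by number of cars table; the function name `expected_frequencies` is illustrative:

```python
def expected_frequencies(row_totals, col_totals):
    """Expected cell counts under independence: e_ij = (row_i * col_j) / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]

# Margins: family size (78, 22) and number of cars (75, 25), n = 100.
expected = expected_frequencies([78, 22], [75, 25])
print(expected)
```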

Page 21: Chi-Square Formula

Chi-square measures how much our data differ from what we’d expect (given the hypothesis of independence)

Are the row and column variables associated?

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

Page 22: Chi-Square for Our Data

$X^2 = \frac{(70-58.5)^2}{58.5} + \frac{(8-19.5)^2}{19.5} + \frac{(5-16.5)^2}{16.5} + \frac{(17-5.5)^2}{5.5}$

$= 2.261 + 6.782 + 8.015 + 24.045 = 41.103$

Is this large?

df = degrees of freedom = (r-1)(c-1)

For our 2x2 table, df = 1

Critical value for $X^2$ with 1 df = 3.84 (at the .05 level)

$X^2 = 41.103$ exceeds 3.84, so we reject the hypothesis of independence.
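The calculation and the comparison with the critical value can be checked with a few lines of Python. This is a sketch under stated assumptions: the observed and expected counts come from the family size by number of cars table, the critical value 3.84 is the one quoted on the slide, and the p-value line relies on the fact that a chi-square variable with 1 df is a squared standard normal, so it applies only when df = 1:

```python
import math

observed = [[70, 8], [5, 17]]
expected = [[58.5, 19.5], [16.5, 5.5]]

# Sum (o - e)^2 / e over all cells of the table.
x2 = sum((o - e) ** 2 / e
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row))

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 3.84  # 1 df at the .05 level, as quoted on the slide
reject_independence = x2 > critical_value

# Upper-tail probability; this shortcut is valid for df == 1 only.
p_value = math.erfc(math.sqrt(x2 / 2))

print(round(x2, 3), df, reject_independence, p_value)
```

The statistic comes out at about 41.1, far above 3.84, so the hypothesis of independence is rejected for these data.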

Page 23: Extension Beyond 2-Way Tables

Log Linear Models

Three-way table:

Example: Family size x #Cars x household income

Page 24: One-Way Chi-Square

Equation: $X^2 = \sum_{i=1}^{r} \frac{(o_i - e_i)^2}{e_i}$

Degrees of Freedom: (r-1)

When would you use this statistic? E.g., to compare sample to population characteristics, or to a previous study’s benchmark, or to investigate the great M&M caper:

Page 25: The Case of the Blue M&M’s:

            PLAIN              PEANUT
Color       ei’s    oi’s       ei’s    oi’s
blue
brown
green
orange
red
yellow

critical chi-square on 5 df = 11.07
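The one-way chi-square can be sketched in Python as follows. The slide's observed and expected counts were not preserved, so the numbers below are HYPOTHETICAL and serve only to show the mechanics; the function name `one_way_chi_square` is also illustrative. The critical value 11.07 for 5 df is the one quoted on the slide:

```python
def one_way_chi_square(observed, expected):
    """Return (X2, df) for a one-way table: X2 = sum((o_i - e_i)^2 / e_i)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, len(observed) - 1

colors = ["blue", "brown", "green", "orange", "red", "yellow"]

# HYPOTHETICAL counts for illustration only; not the data from the slide.
observed = [12, 8, 10, 10, 11, 9]
expected = [10, 10, 10, 10, 10, 10]

x2, df = one_way_chi_square(observed, expected)
critical_value = 11.07  # 5 df at the .05 level, as quoted on the slide
print(x2, df, x2 > critical_value)
```

With these made-up counts the statistic is well below 11.07, so the hypothetical sample would not differ significantly from the benchmark proportions.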