Lecture slides stats1.13.l19.air

Statistics One Lecture 19 Chi-square tests


Page 1: Lecture slides stats1.13.l19.air

Statistics One

Lecture 19 Chi-square tests

Page 2: Lecture slides stats1.13.l19.air

Two segments

• Chi-square goodness of fit
• Chi-square test of independence

2

Page 3: Lecture slides stats1.13.l19.air

Lecture 19 ~ Segment 1

Chi-square goodness of fit

Page 4: Lecture slides stats1.13.l19.air

Chi-square tests

• All of the analyses covered thus far in the course have assumed that the outcome variable is a normally distributed continuous variable
  • Interval variable
  • Ratio variable

4

Page 5: Lecture slides stats1.13.l19.air

Chi-square tests

• What if the outcome variable is categorical?
  – For example, nominal variables
    • Diagnosis (positive, negative)
    • Verdict (guilty, innocent)
    • Vote (candidate A, candidate B, candidate C)

5

Page 6: Lecture slides stats1.13.l19.air

Chi-square tests

• Chi-square goodness of fit statistic
• Chi-square test of independence

•  Both can be used in either experimental or correlational research

6

Page 7: Lecture slides stats1.13.l19.air

Chi-square tests

• Chi-square goodness of fit statistic
  – Determines how well a distribution of proportions “fits” an expected distribution
  – In election polls, is there a statistically significant difference in voter preference among candidates?

7

Page 8: Lecture slides stats1.13.l19.air

Chi-square tests

• Chi-square test of independence
  – Determines whether there is a relationship between two categorical variables
  – In election polls, is there a relationship between voter gender and preference among candidates?

8

Page 9: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

• New York City mayoral election
  – Assume a small poll was conducted (N = 60)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other

9

Page 10: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

Quinn  Lhota  Other
23     12     25

10

Page 11: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

• Null hypothesis
  – Equal proportions

• Alternative hypothesis
  – Unequal proportions

11

Page 12: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

$\chi^2 = \sum \frac{(O - E)^2}{E}$

O = Observed
E = Expected
df = # of categories - 1
p-value depends on $\chi^2$ and df

12

Page 13: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

13

Page 14: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

To estimate effect size: Cramér’s V (or Phi)

$\Phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$

N = sample size
k = # of categories

14

Page 15: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

               Quinn  Lhota  Other
Expected (E)   20     20     20
Observed (O)   23     12     25

15

Page 16: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

$\chi^2 = \sum \frac{(O - E)^2}{E}$

df = # of categories - 1

16

Page 17: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

        O    E    (O − E)   (O − E)²   (O − E)² / E
Quinn   23   20    3         9          0.45
Lhota   12   20   -8        64          3.20
Other   25   20    5        25          1.25
Total   60   60    0        98          4.90

17
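The table above can be reproduced with a few lines of base R. This is a minimal sketch; the variable names (`observed`, `expected`, `contribution`) are illustrative and do not come from the lecture.

```r
# Observed counts from the poll (N = 60)
observed <- c(Quinn = 23, Lhota = 12, Other = 25)

# Under the null hypothesis of equal proportions, each category expects N / k
expected <- rep(sum(observed) / length(observed), length(observed))

# Per-category contribution to chi-square: (O - E)^2 / E
contribution <- (observed - expected)^2 / expected

chi_sq <- sum(contribution)   # test statistic
df <- length(observed) - 1    # degrees of freedom

print(round(contribution, 2))
print(chi_sq)
print(df)
```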

Page 18: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = 4.90$, df = 2, p = .09

∴ Retain the null hypothesis and conclude that the slight preferences observed here are not statistically significant

18
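The p-value reported above can be checked from the upper tail of the chi-square distribution. A short sketch using base R's `pchisq()`; the variable names are illustrative.

```r
chi_sq <- 4.90   # test statistic from the worked table
df <- 2          # number of categories minus 1

# Probability of a chi-square value at least this large under the null
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(p_value, 2))
```

Because this p-value is above the conventional .05 cutoff, the null hypothesis is retained.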

Page 19: Lecture slides stats1.13.l19.air

Chi-square goodness of fit

To estimate effect size: Cramér’s V (or Phi)

$\Phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$

$\Phi_c = \sqrt{\frac{4.90}{60(3 - 1)}} = 0.20$

19
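The effect size calculation can be wrapped in a small helper. `cramers_v` is a hypothetical function written for this example; it is not part of base R.

```r
# Hypothetical helper: Cramér's V from a chi-square statistic,
# sample size n, and number of categories k
cramers_v <- function(chi_sq, n, k) {
  sqrt(chi_sq / (n * (k - 1)))
}

v <- cramers_v(chi_sq = 4.90, n = 60, k = 3)
print(round(v, 2))
```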

Page 20: Lecture slides stats1.13.l19.air

Dataframe in R (Election)

Voter.ID  Candidate  Gender
1         Quinn      M
2         Quinn      F
3         Other      F
4         Lhota      M
5         Other      M
…         …          …

20

Page 21: Lecture slides stats1.13.l19.air

Chi-square goodness of fit in R

> Observed <- table(Election$Candidate)
> chisq.test(Observed)

21
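The full `Election` data frame is not shown on the slides, so the sketch below builds a hypothetical stand-in with one row per voter, matching the counts from the earlier slide (23, 12, 25). With a one-way table and no `p` argument, `chisq.test()` tests against equal proportions.

```r
# Hypothetical stand-in for the Election data frame (assumption: one row
# per voter, counts taken from the goodness-of-fit slide)
Election <- data.frame(
  Candidate = rep(c("Quinn", "Lhota", "Other"), times = c(23, 12, 25))
)

# One-way frequency table of candidate preference
Observed <- table(Election$Candidate)

# Goodness-of-fit test against equal expected proportions (the default)
result <- chisq.test(Observed)
print(result)
```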

Page 22: Lecture slides stats1.13.l19.air

Chi-square goodness of fit in R

22

Page 23: Lecture slides stats1.13.l19.air

Segment summary

•  Chi-square tests are used when outcome and predictor variables are all categorical

• Chi-square goodness of fit is an NHST
• Cramér’s V estimates effect size

23

Page 24: Lecture slides stats1.13.l19.air

END SEGMENT

Page 25: Lecture slides stats1.13.l19.air

Lecture 19 ~ Segment 2

Chi-square test of independence

Page 26: Lecture slides stats1.13.l19.air

Chi-square test of independence

•  Determines whether there is a relationship between two categorical variables

–  In election polls, is there a relationship between voter gender and preference among candidates?

26

Page 27: Lecture slides stats1.13.l19.air

Chi-square test of independence

• New York City mayoral election
  – Assume a small poll was conducted (N = 200)
  – More males than females (n = 140 males, n = 60 females)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other

27

Page 28: Lecture slides stats1.13.l19.air

Chi-square test of independence

         Quinn  Lhota  Other
Female   40     10     10
Male     90     40     10

28

Page 29: Lecture slides stats1.13.l19.air

Chi-square test of independence

• Null hypothesis
  – There is no relationship between voter gender and voter preference
• Alternative hypothesis
  – There is a relationship between voter gender and voter preference

29

Page 30: Lecture slides stats1.13.l19.air

Chi-square test of independence

$\chi^2 = \sum \frac{(O - E)^2}{E}$

df = (# of rows - 1) * (# of columns - 1)

p-value depends on $\chi^2$ and df

30

Page 31: Lecture slides stats1.13.l19.air

Chi-square test of independence

31

Page 32: Lecture slides stats1.13.l19.air

Chi-square test of independence

To estimate effect size: Cramér’s V (or Phi)

$\Phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$

N = sample size
k = # of rows or # of columns (whichever is less)

32

Page 33: Lecture slides stats1.13.l19.air

Chi-square test of independence

• Compute the expected frequencies
  – Under the null hypothesis, the proportions of male and female voters preferring each candidate should be the same as the overall voter preference rates

33

Page 34: Lecture slides stats1.13.l19.air

Chi-square test of independence

• Compute the expected frequencies

E = (R/N)*C

E: Expected frequency
R: # of entries in the cell’s row
N: total # of entries
C: # of entries in the cell’s column

34

Page 35: Lecture slides stats1.13.l19.air

Chi-square test of independence

          Quinn  Lhota  Other  Sum (R)
Female    40     10     10     60
Male      90     40     10     140
Sum (C)   130    50     20     200

35

Page 36: Lecture slides stats1.13.l19.air

Chi-square test of independence

          Quinn                Lhota               Other               Sum (R)
Female    (60/200)*130 = 39    (60/200)*50 = 15    (60/200)*20 = 6     60
Male      (140/200)*130 = 91   (140/200)*50 = 35   (140/200)*20 = 14   140
Sum (C)   130                  50                  20                  200

36

E = (R/N)*C
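The formula E = (R/N)*C can be applied to every cell at once with an outer product of the row and column sums. A minimal sketch in base R; the matrix layout and names are assumptions made for illustration.

```r
# Observed counts: rows are gender, columns are candidate
observed <- matrix(
  c(40, 10, 10,
    90, 40, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Candidate = c("Quinn", "Lhota", "Other"))
)

n <- sum(observed)  # total number of voters (N)

# E = (R / N) * C for every cell: outer product of row sums and column sums
expected <- outer(rowSums(observed), colSums(observed)) / n

print(expected)
```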

Page 37: Lecture slides stats1.13.l19.air

Chi-square test of independence

            O     E     (O − E)   (O − E)²   (O − E)² / E
F / Quinn   40    39     1         1          0.03
F / Lhota   10    15    -5        25          1.67
F / Other   10    6      4        16          2.67
M / Quinn   90    91    -1         1          0.01
M / Lhota   40    35     5        25          0.71
M / Other   10    14    -4        16          1.14
Sum         200   200    0        84          6.23

37
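The worked table above can be reproduced in base R. This sketch also computes the degrees of freedom and the p-value from the same matrix; the variable names are illustrative.

```r
# Observed counts: rows are Female, Male; columns are Quinn, Lhota, Other
observed <- matrix(c(40, 10, 10,
                     90, 40, 10), nrow = 2, byrow = TRUE)

# Expected counts under independence: (row sum * column sum) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all six cells
chi_sq <- sum((observed - expected)^2 / expected)

# df = (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability under the null hypothesis
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(chi_sq, 2))
print(df)
print(round(p_value, 2))
```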

Page 38: Lecture slides stats1.13.l19.air

Chi-square test of independence

$\chi^2 = \sum \frac{(O - E)^2}{E}$

$\chi^2 = 6.23$, df = 2, p = .04

∴ Reject the null hypothesis and conclude that there is a significant relationship between voter gender and voter preference

38

Page 39: Lecture slides stats1.13.l19.air

Chi-square test of independence

To estimate effect size: Cramér’s V (or Phi)

$\Phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$

$\Phi_c = \sqrt{\frac{6.23}{200(2 - 1)}} = .18$

39
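For a two-way table, k is the smaller of the number of rows and the number of columns. A brief sketch that reads k from the table's dimensions; the variable names are illustrative.

```r
# Observed counts: 2 rows (gender) by 3 columns (candidate)
observed <- matrix(c(40, 10, 10,
                     90, 40, 10), nrow = 2, byrow = TRUE)

chi_sq <- 6.23            # statistic from the worked table
n <- sum(observed)        # sample size (N = 200)
k <- min(dim(observed))   # smaller of # of rows and # of columns

v <- sqrt(chi_sq / (n * (k - 1)))
print(round(v, 2))
```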

Page 40: Lecture slides stats1.13.l19.air

Dataframe in R (Election)

Voter.ID  Candidate  Gender
1         Quinn      M
2         Quinn      F
3         Other      F
4         Lhota      M
5         Other      M
…         …          …

40

Page 41: Lecture slides stats1.13.l19.air

Chi-square test in R

> Observed <- table(Election$Candidate, Election$Gender)
> chisq.test(Observed)

41
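Since the full `Election` data frame is not shown, the sketch below builds a hypothetical stand-in whose rows match the counts in the contingency table from the earlier slide. For tables larger than 2-by-2, `chisq.test()` applies no continuity correction, so the result should agree with the hand calculation.

```r
# Hypothetical stand-in for the Election data frame (assumption: one row
# per voter; the first 60 rows are female, the remaining 140 are male)
Election <- data.frame(
  Candidate = rep(c("Quinn", "Lhota", "Other", "Quinn", "Lhota", "Other"),
                  times = c(40, 10, 10, 90, 40, 10)),
  Gender = rep(c("F", "M"), times = c(60, 140))
)

# Two-way contingency table: candidate by gender
Observed <- table(Election$Candidate, Election$Gender)

# Chi-square test of independence
result <- chisq.test(Observed)
print(result)
```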

Page 42: Lecture slides stats1.13.l19.air

Chi-square test in R

42

Page 43: Lecture slides stats1.13.l19.air

Assumptions

• Adequate expected cell counts
  – A common rule is 5 or more in all cells of a 2-by-2 table, 5 or more in 80% of cells in larger tables, and no cells with zero.
  – When this assumption is not met, Fisher’s exact test, a non-parametric test, is recommended.

43
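The rule of thumb above can be checked against the expected counts that `chisq.test()` returns, and `fisher.test()` gives the exact alternative. This is a sketch under the assumption that the "5 or more" and "no zero" conditions refer to expected counts; the variable names are illustrative.

```r
# Observed counts from the election example (2 rows by 3 columns)
observed <- matrix(c(40, 10, 10,
                     90, 40, 10), nrow = 2, byrow = TRUE)

# Expected counts as computed by chisq.test()
expected <- chisq.test(observed)$expected

# Rule for tables larger than 2-by-2: at least 80% of cells with an
# expected count of 5 or more, and no cell with an expected count of zero
share_at_least_5 <- mean(expected >= 5)
rule_met <- all(expected > 0) && share_at_least_5 >= 0.8
print(rule_met)

# Fisher's exact test, the alternative when the rule is not met
fisher_result <- fisher.test(observed)
print(fisher_result$p.value)
```

In this example every expected count is at least 6, so the chi-square test is appropriate and Fisher's exact test is shown only for comparison.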

Page 44: Lecture slides stats1.13.l19.air

Assumptions

• Independence
  – The observations are assumed to be independent of each other.
  – This means the chi-square test cannot be used to test correlated data (like matched pairs or panel data).
  – In such cases McNemar’s test of dependent proportions is recommended.

44
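McNemar's test is available in base R as `mcnemar.test()`. The counts below are hypothetical and are not from the lecture: they imagine 100 voters polled twice, with each voter's answer cross-tabulated before and after.

```r
# Hypothetical paired data (illustration only, not from the lecture):
# the same 100 voters asked before and after a debate
paired <- matrix(
  c(45, 15,
     5, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(Before = c("Quinn", "Not Quinn"),
                  After  = c("Quinn", "Not Quinn"))
)

# McNemar's test uses only the off-diagonal cells (voters who switched);
# the continuity correction is applied by default
result <- mcnemar.test(paired)
print(result)
```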

Page 45: Lecture slides stats1.13.l19.air

Segment summary

•  Chi-square tests are used when outcome and predictor variables are all categorical

• Chi-square test of independence is an NHST
• Cramér’s V estimates effect size
• Assumptions
  – Adequate expected cell counts
  – Independence

45

Page 46: Lecture slides stats1.13.l19.air

END SEGMENT

Page 47: Lecture slides stats1.13.l19.air

END LECTURE 19