Lecture slides stats1.13.l19.air

Statistics One

Lecture 19 Chi-square tests

Two segments

•  Chi-square goodness of fit
•  Chi-square test of independence

2

Lecture 19 ~ Segment 1

Chi-square goodness of fit

Chi-square tests

•  All of the analyses covered thus far in the course have assumed that the outcome variable is a normally distributed continuous variable
    –  Interval variable
    –  Ratio variable

4

Chi-square tests

•  What if the outcome variable is categorical?
    –  For example, nominal variables
        •  Diagnosis (positive, negative)
        •  Verdict (guilty, innocent)
        •  Vote (candidate A, candidate B, candidate C)

5

Chi-square tests

•  Chi-square goodness of fit statistic
•  Chi-square test of independence

•  Both can be used in either experimental or correlational research

6

Chi-square tests

•  Chi-square goodness of fit statistic
    –  Determines how well a distribution of proportions “fits” an expected distribution
    –  In election polls, is there a statistically significant difference in voter preference among candidates?

7

Chi-square tests

•  Chi-square test of independence
    –  Determines whether there is a relationship between two categorical variables
    –  In election polls, is there a relationship between voter gender and preference among candidates?

8


•  New York City mayoral election
    –  Assume a small poll was conducted (N = 60)
    –  Do you intend to vote for:
        •  Christine Quinn
        •  Joseph Lhota
        •  Other

9


Quinn   Lhota   Other
23      12      25

10


•  Null hypothesis
    –  Equal proportions
•  Alternative hypothesis
    –  Unequal proportions

11


$\chi^2 = \sum \frac{(O - E)^2}{E}$
O = Observed
E = Expected
df = # of categories – 1
p-value depends on $\chi^2$ and df

12


13


To estimate effect size: Cramér’s V (or Phi)

$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$
N = sample size
k = # of categories

14


               Quinn   Lhota   Other
Expected (E)   20      20      20
Observed (O)   23      12      25

15


$\chi^2 = \sum \frac{(O - E)^2}{E}$
df = # of categories – 1

16


         O     E     (O - E)   (O - E)^2   (O - E)^2 / E
Quinn    23    20     3         9          0.45
Lhota    12    20    -8        64          3.20
Other    25    20     5        25          1.25
Total    60    60     0        98          4.90

17


$\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2 = 4.90$, df = 2, p = .09
∴ Retain the null hypothesis and conclude that the slight preferences observed here are not statistically significant
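The hand calculation on this slide can be checked with a short R sketch. The variable names below (`observed`, `expected`, `chi_sq`, `p_value`) are illustrative choices, not part of the lecture; the counts are the poll counts from the slides.

```r
# Observed poll counts from the slides (N = 60)
observed <- c(Quinn = 23, Lhota = 12, Other = 25)

# Under the null hypothesis of equal proportions, each expected count is N / k
k <- length(observed)
expected <- rep(sum(observed) / k, k)

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)
df <- k - 1

# p-value: upper-tail area of the chi-square distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(c(chi_sq = chi_sq, df = df, p_value = p_value), 2))
```

The rounded values should match the slide: a statistic of 4.90 on 2 degrees of freedom, with a p-value of about .09.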

18


To estimate effect size: Cramér’s V (or Phi)

$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$
$\phi_c = \sqrt{\frac{4.90}{60(3 - 1)}} = 0.20$
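As a quick check of the effect size arithmetic, the formula can be wrapped in a small helper. The function name `cramers_v` is a hypothetical name chosen for this sketch; base R does not ship a function by that name.

```r
# Cramér's V from a chi-square statistic, sample size n, and k categories
# (cramers_v is an illustrative helper, not a base R function)
cramers_v <- function(chi_sq, n, k) {
  sqrt(chi_sq / (n * (k - 1)))
}

# Values from the goodness of fit example: chi-square = 4.90, N = 60, k = 3
v <- cramers_v(chi_sq = 4.90, n = 60, k = 3)
print(round(v, 2))
```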

19

Dataframe in R (Election)

Voter.ID   Candidate   Gender
1          Quinn       M
2          Quinn       F
3          Other       F
4          Lhota       M
5          Other       M
….         …           …

20

Chi-square goodness of fit in R

> Observed <- table(Election$Candidate)
> chisq.test(Observed)
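The full Election data frame is not reproduced in the slides, so the sketch below rebuilds a stand-in from the candidate counts alone (23, 12, 25). The row order is an assumption made purely for illustration; only the counts come from the lecture.

```r
# Hypothetical stand-in for the Election data frame: only the candidate
# counts (23, 12, 25) are taken from the slides, the row order is made up
Election <- data.frame(
  Voter.ID = 1:60,
  Candidate = rep(c("Quinn", "Lhota", "Other"), times = c(23, 12, 25))
)

# Frequency table of the categorical outcome
Observed <- table(Election$Candidate)

# With a one-way table, chisq.test() runs a goodness of fit test and,
# by default, tests against equal proportions
result <- chisq.test(Observed)
print(result)
```

Unequal expected proportions could be supplied through the `p` argument of `chisq.test()`; the default matches the equal-proportions null hypothesis used in this segment.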

21

Chi-square goodness of fit in R

22

Segment summary

•  Chi-square tests are used when outcome and predictor variables are all categorical

•  Chi-square goodness of fit is an NHST (null hypothesis significance test)
•  Cramér’s V estimates effect size

23

END SEGMENT

Lecture 19 ~ Segment 2

Chi-square test of independence


•  Determines whether there is a relationship between two categorical variables

–  In election polls, is there a relationship between voter gender and preference among candidates?

26


•  New York City mayoral election
    –  Assume a small poll was conducted (N = 200)
    –  More males than females (n = 140 males, n = 60 females)
    –  Do you intend to vote for:
        •  Christine Quinn
        •  Joseph Lhota
        •  Other

27


         Quinn   Lhota   Other
Female   40      10      10
Male     90      40      10

28


•  Null hypothesis
    –  There is no relationship between voter gender and voter preference
•  Alternative hypothesis
    –  There is a relationship between voter gender and voter preference

29


$\chi^2 = \sum \frac{(O - E)^2}{E}$
df = (# of rows – 1) * (# of columns – 1)
p-value depends on $\chi^2$ and df

30


31



$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$
N = sample size
k = # of rows or # of columns (whichever is less)

32


•  Compute the expected frequencies
    –  Under the null hypothesis, the proportion of male voters and of female voters preferring each candidate should be the same as the overall voter preference rates

33


•  Compute the expected frequencies
    E = (R/N)*C
    E: Expected frequency
    R: # of entries in the cell’s row
    N: total # of entries
    C: # of entries in the cell’s column

34


          Quinn   Lhota   Other   Sum (R)
Female    40      10      10      60
Male      90      40      10      140
Sum (C)   130     50      20      200

35


          Quinn                 Lhota                Other                Sum (R)
Female    (60/200)*130 = 39     (60/200)*50 = 15     (60/200)*20 = 6      60
Male      (140/200)*130 = 91    (140/200)*50 = 35    (140/200)*20 = 14    140
Sum (C)   130                   50                   20                   200

36

E = (R/N)*C
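The expected frequencies in this table can be computed for all cells at once. The sketch below applies E = (R/N)*C through an outer product of the row and column totals; the object names are illustrative, and the counts are those from the slides.

```r
# Observed counts from the slides: rows are gender, columns are candidate
observed <- matrix(
  c(40, 10, 10,
    90, 40, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Candidate = c("Quinn", "Lhota", "Other"))
)

n <- sum(observed)  # N: total number of entries (200)

# E = (R / N) * C for every cell: the outer product pairs each row total (R)
# with each column total (C), then the result is divided by N
expected <- outer(rowSums(observed), colSums(observed)) / n
print(expected)
```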


            O     E     (O - E)   (O - E)^2   (O - E)^2 / E
F / Quinn   40    39     1         1          0.03
F / Lhota   10    15    -5        25          1.67
F / Other   10     6     4        16          2.67
M / Quinn   90    91    -1         1          0.01
M / Lhota   40    35     5        25          0.71
M / Other   10    14    -4        16          1.14
Sum         200   200    0        84          6.23

37

Chi-square test of independence

$\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2 = 6.23$, df = 2, p = .04
∴ Reject the null hypothesis and conclude that there is a significant relationship between voter gender and voter preference

38
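The statistic, degrees of freedom, and p-value for the test of independence can be reproduced by hand in R. This is a minimal sketch using the counts from the slides; the variable names are illustrative.

```r
# Observed counts from the slides: rows are gender, columns are candidate
observed <- matrix(
  c(40, 10, 10,
    90, 40, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Candidate = c("Quinn", "Lhota", "Other"))
)

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic summed over all six cells
chi_sq <- sum((observed - expected)^2 / expected)

# df = (# of rows - 1) * (# of columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(c(chi_sq = chi_sq, df = df, p_value = p_value), 2))
```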



$\phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}$
$\phi_c = \sqrt{\frac{6.23}{200(2 - 1)}} = .18$
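For a two-way table, the effect size can be computed directly from the table of counts. The helper `cramers_v_table` below is a hypothetical name introduced for this sketch (base R has no such function); it takes k as the smaller of the number of rows and the number of columns.

```r
# Cramér's V for a two-way contingency table
# (cramers_v_table is an illustrative helper, not a base R function)
cramers_v_table <- function(tab) {
  # correct = FALSE disables the continuity correction that chisq.test()
  # would otherwise apply to 2-by-2 tables
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  k <- min(dim(tab))  # whichever is less: # of rows or # of columns
  sqrt(chi_sq / (sum(tab) * (k - 1)))
}

# Counts from the slides: rows are gender, columns are candidate
observed <- matrix(c(40, 10, 10,
                     90, 40, 10), nrow = 2, byrow = TRUE)

v <- cramers_v_table(observed)
print(round(v, 2))
```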

39

Dataframe in R (Election)

Voter.ID   Candidate   Gender
1          Quinn       M
2          Quinn       F
3          Other       F
4          Lhota       M
5          Other       M
….         …           …

40

Chi-square test in R

> Observed <- table(Election$Candidate, Election$Gender)
> chisq.test(Observed)
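Since the full Election data frame is not shown, the sketch below rebuilds a stand-in with the same cell counts as the N = 200 poll. The row order is an assumption for illustration only; the counts per gender and candidate come from the slides.

```r
# Hypothetical stand-in for the Election data frame: the cell counts match
# the slides (F: 40/10/10, M: 90/40/10), the row order is made up
Election <- data.frame(
  Candidate = rep(c("Quinn", "Lhota", "Other", "Quinn", "Lhota", "Other"),
                  times = c(40, 10, 10, 90, 40, 10)),
  Gender = rep(c("F", "M"), times = c(60, 140))
)

# Two-way frequency table: candidates in rows, gender in columns
Observed <- table(Election$Candidate, Election$Gender)

# With a two-way table, chisq.test() runs the test of independence
result <- chisq.test(Observed)
print(result)

# Expected counts used by the test, useful for checking the assumptions
print(result$expected)
```

Swapping the two arguments to `table()` transposes the table but leaves the statistic, degrees of freedom, and p-value unchanged.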

41

Chi-square test in R

42

Assumptions

•  Adequate expected cell counts
    –  A common rule is an expected count of 5 or more in all cells of a 2-by-2 table; in larger tables, 5 or more in 80% of cells and no cells with an expected count of zero.
    –  When this assumption is not met, Fisher’s exact test, a non-parametric test, is recommended.
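One way to apply this rule in R is to inspect the expected counts first and fall back to `fisher.test()` when they are too small. The 2-by-2 counts below are invented purely for illustration and do not come from the lecture.

```r
# Hypothetical small 2-by-2 table (made-up counts, not from the lecture)
small <- matrix(
  c(3, 1,
    1, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("F", "M"), Vote = c("A", "B"))
)

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(small), colSums(small)) / sum(small)
print(expected)

# For a 2-by-2 table, the common rule asks for 5 or more in every cell
rule_met <- all(expected >= 5)

# When the rule is not met, use Fisher's exact test instead of chi-square
if (!rule_met) {
  fisher_result <- fisher.test(small)
  print(fisher_result$p.value)
}
```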

43

Assumptions

•  Independence
    –  The observations are assumed to be independent of each other.
    –  This means chi-square cannot be used to test correlated data (like matched pairs or panel data).
    –  In such cases McNemar’s test of dependent proportions is recommended.
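For paired data, base R provides `mcnemar.test()`. The sketch below assumes a hypothetical panel in which the same 100 voters are asked twice whether they support a candidate; the counts are invented for illustration and do not come from the lecture.

```r
# Hypothetical paired data (made-up counts): the same 100 voters polled twice
paired <- matrix(
  c(30, 15,
     5, 50),
  nrow = 2, byrow = TRUE,
  dimnames = list(First.Poll = c("Yes", "No"),
                  Second.Poll = c("Yes", "No"))
)

# McNemar's test uses only the discordant cells (15 and 5 here), that is,
# voters who changed their answer between the two polls
result <- mcnemar.test(paired)  # continuity correction is applied by default
print(result)
```

Running `chisq.test()` on this table would treat the two polls as independent samples, which is exactly the assumption the slide warns against.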

44

Segment summary

•  Chi-square tests are used when outcome and predictor variables are all categorical

•  Chi-square test of independence is an NHST
•  Cramér’s V estimates effect size
•  Assumptions
    –  Adequate expected cell counts
    –  Independence

45

END SEGMENT

END LECTURE 19
