Lecture slides stats1.13.l19.air
Statistics One
Lecture 19 Chi-square tests
Two segments
• Chi-square goodness of fit
• Chi-square test of independence
2
Lecture 19 ~ Segment 1
Chi-square goodness of fit
Chi-square tests
• All of the analyses covered thus far in the course have assumed that the outcome variable is a normally distributed continuous variable
  • Interval variable
  • Ratio variable
4
Chi-square tests
• What if the outcome variable is categorical?
  – For example, nominal variables
    • Diagnosis (positive, negative)
    • Verdict (guilty, innocent)
    • Vote (candidate A, candidate B, candidate C)
5
Chi-square tests
• Chi-square goodness of fit statistic
• Chi-square test of independence
• Both can be used in either experimental or correlational research
6
Chi-square tests
• Chi-square goodness of fit statistic
  – Determines how well a distribution of proportions “fits” an expected distribution
  – In election polls, is there a statistically significant difference in voter preference among candidates?
7
Chi-square tests
• Chi-square test of independence
  – Determines whether there is a relationship between two categorical variables
  – In election polls, is there a relationship between voter gender and preference among candidates?
8
Chi-square goodness of fit
• New York City mayoral election
  – Assume a small poll was conducted (N = 60)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other
9
Chi-square goodness of fit
Quinn Lhota Other
23 12 25
10
Chi-square goodness of fit
• Null hypothesis
  – Equal proportions
• Alternative hypothesis
  – Unequal proportions
11
Chi-square goodness of fit
χ² = Σ [(O - E)² / E]
O = Observed
E = Expected
df = # of categories - 1
p-value depends on χ² and df
12
Chi-square goodness of fit
13
Chi-square goodness of fit
To estimate effect size: Cramér’s V (or Phi)
Φc = SQRT(χ² / (N(k - 1)))
N = sample size
k = # of categories
14
Chi-square goodness of fit
               Quinn   Lhota   Other
Expected (E)    20      20      20
Observed (O)    23      12      25
15
Chi-square goodness of fit
χ² = Σ [(O - E)² / E]
df = # of categories - 1
16
Chi-square goodness of fit
        O    E    (O - E)   (O - E)²   (O - E)² / E
Quinn   23   20      3          9          0.45
Lhota   12   20     -8         64          3.20
Other   25   20      5         25          1.25
Total   60   60      0         98          4.90
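The table above can be reproduced in a few lines of R. This is a minimal sketch, assuming equal expected proportions as stated in the null hypothesis; the variable names (`observed`, `expected`, `gof_table`, `chi_sq`) are illustrative and not part of the lecture.

```r
# Observed counts from the poll (N = 60)
observed <- c(Quinn = 23, Lhota = 12, Other = 25)

# Under the null hypothesis of equal proportions, each expected count is N / k
expected <- rep(sum(observed) / length(observed), length(observed))

# Per-category contributions, (O - E)^2 / E
contribution <- (observed - expected)^2 / expected

# Same columns as the worked table
gof_table <- data.frame(O = observed,
                        E = expected,
                        diff = observed - expected,
                        diff_sq = (observed - expected)^2,
                        contribution = contribution)
print(gof_table)

# Chi-square statistic: the sum of the last column
chi_sq <- sum(contribution)
chi_sq
```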
17
Chi-square goodness of fit
χ² = Σ [(O - E)² / E]
χ² = 4.90, df = 2, p = .09
∴ Retain the null hypothesis and conclude that the slight preferences observed here are not statistically significant
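The p-value on this slide can be checked with R's `pchisq()`, which returns the upper-tail probability of the chi-square distribution when `lower.tail = FALSE`. A minimal sketch; the comparison against .05 assumes the conventional significance level, which the slide does not state explicitly.

```r
chi_sq <- 4.90        # statistic from the worked example
df <- 3 - 1           # number of categories minus 1

# Probability of a chi-square value at least this large under the null
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
round(p_value, 2)

# Assumed significance level (not stated on the slide)
alpha <- 0.05
p_value < alpha       # FALSE here, so the null hypothesis is retained
```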
18
Chi-square goodness of fit
To estimate effect size: Cramér’s V (or Phi)
Φc = SQRT(χ² / (N(k - 1)))
Φc = SQRT(4.90 / (60(3 - 1))) = 0.20
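The same arithmetic as a small R helper. The function name `cramers_v` and its argument names are hypothetical, introduced only for illustration; base R does not ship a function by this name.

```r
# Hypothetical helper (name and arguments are illustrative, not from the lecture)
#   chisq: chi-square statistic
#   n:     total sample size
#   k:     number of categories
cramers_v <- function(chisq, n, k) {
  sqrt(chisq / (n * (k - 1)))
}

# Values from the goodness-of-fit example
v <- cramers_v(chisq = 4.90, n = 60, k = 3)
round(v, 2)
```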
19
Dataframe in R (Election)
Voter.ID   Candidate   Gender
1          Quinn       M
2          Quinn       F
3          Other       F
4          Lhota       M
5          Other       M
….         …           …
20
Chi-square goodness of fit in R
> Observed <- table(Election$Candidate)
> chisq.test(Observed)
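Because the full `Election` data frame is not shown, the sketch below enters the poll counts directly as a stand-in for `table(Election$Candidate)`. With a one-dimensional input, `chisq.test()` runs a goodness-of-fit test against equal proportions by default.

```r
# Stand-in for table(Election$Candidate): the poll counts from the lecture.
# (Assumption: the data frame would tabulate to exactly these counts.)
Observed <- c(Quinn = 23, Lhota = 12, Other = 25)

# Goodness-of-fit test; the default null hypothesis is equal proportions
result <- chisq.test(Observed)

result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
```

Unequal expected proportions can be supplied through the `p` argument of `chisq.test()`.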
21
Chi-square goodness of fit in R
22
Segment summary
• Chi-square tests are used when outcome and predictor variables are all categorical
• Chi-square goodness of fit is an NHST
• Cramér’s V estimates effect size
23
END SEGMENT
Lecture 19 ~ Segment 2
Chi-square test of independence
Chi-square test of independence
• Determines whether there is a relationship between two categorical variables
– In election polls, is there a relationship between voter gender and preference among candidates?
26
Chi-square test of independence
• New York City mayoral election
  – Assume a small poll was conducted (N = 200)
  – More males than females (n = 140 males, n = 60 females)
  – Do you intend to vote for:
    • Christine Quinn
    • Joseph Lhota
    • Other
27
Chi-square test of independence
         Quinn   Lhota   Other
Female    40      10      10
Male      90      40      10
28
Chi-square test of independence
• Null hypothesis
  – There is no relationship between voter gender and voter preference
• Alternative hypothesis
  – There is a relationship between voter gender and voter preference
29
Chi-square test of independence
χ² = Σ [(O - E)² / E]
df = (# of rows - 1) * (# of columns - 1)
p-value depends on χ² and df
30
Chi-square test of independence
31
Chi-square test of independence
To estimate effect size: Cramér’s V (or Phi)
Φc = SQRT(χ² / (N(k - 1)))
N = sample size
k = # of rows or # of columns (whichever is less)
32
Chi-square test of independence
• Compute the expected frequencies
  – Under the null hypothesis, male and female voters should prefer each candidate at the same rates as voters overall
33
Chi-square test of independence
• Compute the expected frequencies
E = (R/N)*C
E: Expected frequency
R: # of entries in the cell’s row
N: total # of entries
C: # of entries in the cell’s column
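Applied to every cell at once, E = (R/N)*C is an outer product of the row proportions and the column totals. A minimal R sketch using the poll counts from this segment; the object names are illustrative.

```r
# Observed counts from the poll (N = 200)
observed <- matrix(c(40, 10, 10,
                     90, 40, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Candidate = c("Quinn", "Lhota", "Other")))

N <- sum(observed)        # total # of entries
R <- rowSums(observed)    # row totals
C <- colSums(observed)    # column totals

# E = (R / N) * C for every cell
expected <- outer(R / N, C)
expected
```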
34
Chi-square test of independence
          Quinn   Lhota   Other   Sum (R)
Female     40      10      10       60
Male       90      40      10      140
Sum (C)   130      50      20      200
35
Chi-square test of independence
          Quinn                Lhota               Other               Sum (R)
Female    (60/200)*130 = 39    (60/200)*50 = 15    (60/200)*20 = 6      60
Male      (140/200)*130 = 91   (140/200)*50 = 35   (140/200)*20 = 14   140
Sum (C)   130                  50                  20                  200
36
E = (R/N)*C
Chi-square test of independence
            O     E     (O - E)   (O - E)²   (O - E)² / E
F / Quinn   40    39       1          1          0.03
F / Lhota   10    15      -5         25          1.67
F / Other   10     6       4         16          2.67
M / Quinn   90    91      -1          1          0.01
M / Lhota   40    35       5         25          0.71
M / Other   10    14      -4         16          1.14
Sum        200   200       0         84          6.23
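The last column of this table can also be recovered from `chisq.test()`: its `residuals` component holds the Pearson residuals, (O - E) / sqrt(E), so squaring them gives each cell's contribution. A sketch under the assumption that the table is entered as a matrix of counts.

```r
observed <- matrix(c(40, 10, 10,
                     90, 40, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("F", "M"),
                                   Candidate = c("Quinn", "Lhota", "Other")))

fit <- chisq.test(observed)

fit$expected                        # expected counts, E
fit$residuals                       # Pearson residuals, (O - E) / sqrt(E)

# Squared Pearson residuals are the per-cell contributions (O - E)^2 / E
contributions <- fit$residuals^2
round(contributions, 2)

# Their sum is the chi-square statistic
sum(contributions)
```

The sign of each residual shows the direction of the discrepancy, which the squared column hides.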
37
Chi-square test of independence
χ² = Σ [(O - E)² / E]
χ² = 6.23, df = 2, p = .04
∴ Reject the null hypothesis and conclude that there is a significant relationship between voter gender and candidate preference
38
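Another way to read this decision is to compare the statistic with the critical value of the chi-square distribution. A minimal sketch, assuming the conventional significance level of .05 (the slides do not state one).

```r
alpha <- 0.05                      # assumed significance level
df <- (2 - 1) * (3 - 1)            # (# of rows - 1) * (# of columns - 1)

# Smallest chi-square value that is significant at alpha
critical <- qchisq(1 - alpha, df = df)
critical

chi_sq <- 6.23                     # statistic from the worked example
reject <- chi_sq > critical        # TRUE, so the null hypothesis is rejected

# Equivalent decision via the p-value
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
round(p_value, 2)
```

The goodness-of-fit statistic from Segment 1 (4.90, also df = 2) falls below the same critical value, which is why that null hypothesis was retained.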
Chi-square test of independence
To estimate effect size: Cramér’s V (or Phi)
Φc = SQRT(χ² / (N(k - 1)))
Φc = SQRT(6.23 / (200(2 - 1))) = .18
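For a two-way table, k is the smaller of the number of rows and columns, which `min(dim())` gives directly. The helper below is a hypothetical sketch (the name `cramers_v_table` is ours); it passes `correct = FALSE` so that the statistic matches the hand formula even for 2-by-2 tables, where `chisq.test()` would otherwise apply a continuity correction.

```r
# Hypothetical helper: Cramér's V computed from a table of counts
cramers_v_table <- function(tab) {
  fit <- chisq.test(tab, correct = FALSE)  # uncorrected Pearson chi-square
  n <- sum(tab)                            # sample size
  k <- min(dim(tab))                       # smaller of # rows and # columns
  unname(sqrt(fit$statistic / (n * (k - 1))))
}

observed <- matrix(c(40, 10, 10,
                     90, 40, 10),
                   nrow = 2, byrow = TRUE)

v <- cramers_v_table(observed)
round(v, 2)
```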
39
Dataframe in R (Election)
Voter.ID   Candidate   Gender
1          Quinn       M
2          Quinn       F
3          Other       F
4          Lhota       M
5          Other       M
….         …           …
40
Chi-square test in R
> Observed <- table(Election$Candidate, Election$Gender)
> chisq.test(Observed)
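To make these two commands runnable, the sketch below rebuilds a voter-level `Election` data frame from the cell counts given earlier. The row order and `Voter.ID` values are arbitrary assumptions, since the real data frame is only partly shown.

```r
# Cell counts from the lecture, one row per Gender x Candidate combination
counts <- data.frame(
  Gender    = c("F", "F", "F", "M", "M", "M"),
  Candidate = c("Quinn", "Lhota", "Other", "Quinn", "Lhota", "Other"),
  n         = c(40, 10, 10, 90, 40, 10)
)

# Expand to one row per voter (order and IDs are arbitrary, not the real data)
Election <- counts[rep(seq_len(nrow(counts)), counts$n), c("Candidate", "Gender")]
Election$Voter.ID <- seq_len(nrow(Election))

# Cross-tabulate and test, as on the slide
Observed <- table(Election$Candidate, Election$Gender)
result <- chisq.test(Observed)
result
```

Here candidates form the rows and gender the columns; swapping the two arguments to `table()` transposes the table but leaves the statistic, df, and p-value unchanged.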
41
Chi-square test in R
42
Assumptions
• Adequate expected cell counts
  – A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, and no cells with zero.
  – When this assumption is not met, Fisher’s exact test, a non-parametric test, is recommended.
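The rule of thumb above can be checked against the expected counts that `chisq.test()` returns. In this sketch the helper name `adequate_counts` is hypothetical, and "no cells with zero" is interpreted as no expected count of zero, which is an assumption about the slide's wording. `fisher.test()` is base R's implementation of Fisher's exact test.

```r
observed <- matrix(c(40, 10, 10,
                     90, 40, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Candidate = c("Quinn", "Lhota", "Other")))

expected <- chisq.test(observed)$expected

# Hypothetical helper implementing the rule of thumb from the slide
adequate_counts <- function(expected) {
  if (all(dim(expected) == 2)) {
    all(expected >= 5)                                 # 2-by-2: every cell >= 5
  } else {
    mean(expected >= 5) >= 0.80 && all(expected > 0)   # larger tables
  }
}

ok <- adequate_counts(expected)
ok

# Alternative when the rule is not met: Fisher's exact test
fisher_p <- fisher.test(observed)$p.value
fisher_p
```

For the poll data the smallest expected count is 6, so the chi-square test is appropriate; the Fisher call is shown only for comparison.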
43
Assumptions
• Independence
  – The observations are assumed to be independent of each other.
  – This means chi-squared cannot be used to test correlated data (like matched pairs or panel data).
  – In such cases McNemar’s test of dependent proportions is recommended.
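McNemar's test is available in base R as `mcnemar.test()`. The counts below are entirely hypothetical (they do not come from the lecture) and only illustrate the shape of paired data: the same 100 voters classified twice, with each voter counted once in the 2-by-2 table.

```r
# HYPOTHETICAL paired data, invented for illustration only:
# the same 100 voters asked about Quinn at two time points
paired <- matrix(c(30, 15,
                    5, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(First  = c("Quinn", "Not Quinn"),
                                 Second = c("Quinn", "Not Quinn")))

# McNemar's test uses only the off-diagonal cells (voters who changed answer)
result <- mcnemar.test(paired)
result$statistic   # chi-square statistic (continuity-corrected by default)
result$parameter   # degrees of freedom
result$p.value
```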
44
Segment summary
• Chi-square tests are used when outcome and predictor variables are all categorical
• Chi-square test of independence is an NHST
• Cramér’s V estimates effect size
• Assumptions
  – Adequate expected cell counts
  – Independence
45
END SEGMENT
END LECTURE 19