More Contingency Tables & Paired Categorical Data Lecture 8.
vivien-stokes
A Larger Contingency Table
A 4-by-2 contingency table.
(Made-up data filled into empty cells from last class.)
Exercise Level        Cold/Flu   No Cold/Flu   Total
No Exercise*                79           138     217
Light Exercise              96           126     222
Mod. Exercise               30            43      73
Heavy Exercise*            115           123     238
Totals (Marginal)          320           430     750
Estimated Distributions
The Conditional Distributions are the distributions of the response within each level of the predictor. For example:
No Exercise: 79/217 = .364 experienced cold/flu; 138/217 = .636 didn’t.
Light Exercise: 96/222 = .432 and 126/222 = .568.
Etc.
The Marginal Distribution is the distribution of the response if we ignore information about the predictor:
Cold/flu: 320/750 = .427; no cold/flu: 430/750 = .573.
To Summarize Distributions in a Table
Exercise Level        Cold/Flu           No Cold/Flu
No Exercise*          79/217 = .364      138/217 = .636
Light Exercise        96/222 = .432      126/222 = .568
Mod. Exercise         30/73 = .411       43/73 = .589
Heavy Exercise*       115/238 = .483     123/238 = .517
Totals (Marginal)     320/750 = .427     430/750 = .573
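As a check on the arithmetic, here is a minimal Python sketch (not part of the original lecture) that computes the conditional distributions (each row divided by its row total) and the marginal distribution (column totals divided by the grand total). The dictionary `counts` and the function names are hypothetical; the printed proportions should match the summary table (.364, .432, .411, .483 for cold/flu, and .427 marginally).

```python
# Counts from the 4-by-2 exercise table: (cold/flu, no cold/flu) for each level.
counts = {
    "No Exercise": (79, 138),
    "Light Exercise": (96, 126),
    "Mod. Exercise": (30, 43),
    "Heavy Exercise": (115, 123),
}

def conditional_distributions(table):
    """Proportion in each response column within each row (predictor level)."""
    return {
        level: tuple(c / sum(row) for c in row)
        for level, row in table.items()
    }

def marginal_distribution(table):
    """Proportion in each response column, ignoring the predictor."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    return tuple(t / n for t in col_totals)

cond = conditional_distributions(counts)
for level, (flu, no_flu) in cond.items():
    print(f"{level:15s} {flu:.3f} {no_flu:.3f}")
flu, no_flu = marginal_distribution(counts)
print(f"{'Marginal':15s} {flu:.3f} {no_flu:.3f}")
```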
Expected Values Under the Null
Exercise Level        Cold/Flu              No Cold/Flu            Total
No Exercise*          217*.427 ≈ 92.59      217*.573 ≈ 124.41      217
Light Exercise        222*.427 ≈ 94.72      222*.573 ≈ 127.28      222
Mod. Exercise         73*.427 ≈ 31.15       73*.573 ≈ 41.85        73
Heavy Exercise*       238*.427 ≈ 101.55     238*.573 ≈ 136.45      238
Totals (Marginal)     320                   430                    750
The ≈ signs reflect round-off error in the estimated probabilities: using the rounded proportion .427 gives 217*.427 = 92.66. To avoid this, each expected count was computed directly from the totals, e.g., 217*320/750 = 92.59.
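A short Python sketch (illustrative only; `observed` and `expected_counts` are hypothetical names) of the "directly from the totals" approach: each expected count under the null is row total × column total / grand total, so no rounded proportion is ever used. The last two lines contrast the rounded-proportion version of the first cell with the exact one.

```python
observed = [
    [79, 138],   # No Exercise: cold/flu, no cold/flu
    [96, 126],   # Light Exercise
    [30, 43],    # Mod. Exercise
    [115, 123],  # Heavy Exercise
]

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{e:7.2f}" for e in row))

# Round-off comparison for the first cell: rounded proportion vs. exact margins.
print(f"217 * .427    = {217 * 0.427:.2f}")
print(f"217 * 320/750 = {217 * 320 / 750:.2f}")
```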
Test Statistic and Sampling Distribution
A test of independence of the two variables (Exercise Level and Cold/Flu) will be carried out using a chi-square test statistic with
(r-1)(c-1)=(4-1)(2-1)=3 degrees of freedom.
The test statistic is calculated as
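$$\chi^2 = \frac{(79-92.59)^2}{92.59} + \frac{(138-124.41)^2}{124.41} + \frac{(96-94.72)^2}{94.72} + \frac{(126-127.28)^2}{127.28} + \frac{(30-31.15)^2}{31.15} + \frac{(43-41.85)^2}{41.85} + \frac{(115-101.55)^2}{101.55} + \frac{(123-136.45)^2}{136.45} \approx 6.69$$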
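The same sum can be sketched in Python (a hypothetical helper, not from the lecture): loop over every cell, compute its expected count from the margins, add (observed − expected)²/expected, and take (r−1)(c−1) as the degrees of freedom. Because the expected counts are not rounded here, the statistic may differ from the hand calculation only in the later decimal places.

```python
observed = [[79, 138], [96, 126], [30, 43], [115, 123]]

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r-by-c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)      # (r - 1)(c - 1)
    return stat, df

stat, df = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f} on {df} d.f.")
```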
Hypothesis Test
Assumptions: Random, independent sample; groups collected independently; “Large Sample”
Hypotheses:
H0: The conditional distributions are equal
HA: The conditional distributions are not all equal
Test Statistic: Chi-square = 6.69, compared to the chi-square dist’n with 3 d.f.
Hypothesis Test, cont.
P-value/Rejection Region: The critical values are 7.815 for .05 significance, 7.407 for .06, 7.060 for .07, and 6.251 for .10. Since 6.69 < 7.815, we fail to reject at the 0.05 level. Since 6.251 < 6.69 < 7.060, the p-value is between .07 and .10.
Conclusion: At the Type I error rate of .05, we fail to reject the null hypothesis. There is not enough evidence to say that the probability of getting a cold/flu depends on exercise level.
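Instead of bracketing the p-value between table critical values, it can be computed directly. The sketch below is an assumption-labeled illustration using only Python's standard library: it evaluates the chi-square upper-tail probability for integer degrees of freedom via the standard identity Q(a + 1, y) = Q(a, y) + yᵃe⁻ʸ/Γ(a + 1) for the regularized upper incomplete gamma function (with a = d.f./2, y = x/2). If SciPy is available, `scipy.stats.chi2.sf(x, df)` should give the same values. For 6.69 on 3 d.f. it gives a p-value of about 0.082, consistent with ".07 to .10", and each critical value above should map back to its stated tail area.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exp) and steps up
    2 d.f. at a time with Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    where a = df / 2 and y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y): df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y): df = 1
    while a < df / 2:
        # y**a * exp(-y) / Gamma(a + 1), computed on the log scale.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1
    return q

stat, df = 6.69, 3
print(f"p-value = {chi2_sf(stat, df):.3f}")
# Tail area at each critical value quoted on the slide.
for crit in (7.815, 7.407, 7.060, 6.251):
    print(f"P(X > {crit}) = {chi2_sf(crit, df):.3f}")
```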
Matched Categorical Data
Data may be matched/paired with respect to the risk factor or the response.
Matching on the risk factor (not directly discussed in the text): Differences of proportions, relative risks, and odds ratios are all appropriate, but the formulas and the set-up of the contingency table differ from those for unpaired data. We will focus on odds ratios, which are calculated in the same way as for the matched case-control study.
Matching on the response (matched case-control study): Only the odds ratio is an appropriate measure of the association between the risk factor and the response.
In both cases, inference focuses on the pair.
A Matched Case-Control Study on CAD
Each of 59 adults with Coronary Artery Disease (CAD) was matched with an adult who did not develop CAD but was of the same gender, age, ethnicity, and socio-economic status.
Of interest was whether drinking 2 or more glasses of red wine (on average) per week was associated with development of CAD.
Table for Matched Case-Control Data
Do NOT use the standard contingency table that summarizes information about the individual subjects.
Instead, use the following table to summarize information about the pairs.
                        Cases (CAD)
                        >= 2     < 2
Controls    >= 2          15      14
            < 2           10      20
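Why the pair table and not the usual subject table? Each pair contributes one case and one control, so the pair table can always be collapsed into the subject-level table (118 subjects), but not the other way around. In this Python sketch, `subject_level_table` is a hypothetical helper, and `other_pairs` is a made-up table (NOT study data) chosen only to show that a different split of the discordant pairs (13 vs. 9 instead of 14 vs. 10) collapses to exactly the same subject-level table. The subject table therefore cannot recover the discordant counts that matched inference is based on.

```python
# CAD pair table: rows = control's drinking, columns = case's drinking;
# index 0 = ">= 2 glasses/week", index 1 = "< 2 glasses/week".
cad_pairs = [
    [15, 14],  # control >= 2: case >= 2, case < 2
    [10, 20],  # control < 2:  case >= 2, case < 2
]

def subject_level_table(pair_table):
    """Collapse a 2x2 pair table into the usual subject-level table.

    Returns {"cases": (n >= 2, n < 2), "controls": (n >= 2, n < 2)}.
    """
    cases = tuple(pair_table[0][j] + pair_table[1][j] for j in range(2))  # column sums
    controls = tuple(sum(row) for row in pair_table)                      # row sums
    return {"cases": cases, "controls": controls}

# Hypothetical alternative pair table (made up, not study data).
other_pairs = [
    [16, 13],
    [9, 21],
]

print(subject_level_table(cad_pairs))
print(subject_level_table(other_pairs))
print(subject_level_table(cad_pairs) == subject_level_table(other_pairs))
```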
Physician Adherence Study: Matching on Predictor
Suppose that investigators were interested in whether a particular educational intervention had an effect on whether physicians prescribe a particular treatment plan for their asthma patients. Seventy-five physicians are each rated on whether they prescribe the treatment plan, both before and after the educational intervention.
                    Before
                    Yes     No
After     Yes        22     25
          No         12     16
Estimation and Inference for Matched Categorical Data
We CANNOT use the CI formulas for the odds ratio given before, because the two groups of subjects (whether “exposure” groups or case/control groups) are not chosen independently.
Inferences will be based on the discordant pairs, that is, the pairs whose members “disagree” on the predictor variable (for case-control studies) or on the response variable (when subjects are matched with respect to the predictor).
Labeling Cell (Pair) Counts & Estimation of Odds Ratio
The odds ratio is estimated as R/S.
Interpretation: The odds that a person in group 2 is “exposed” are R/S times the odds that a group 1 member is “exposed.”
Or: The odds that an “exposed” person is in group 2 are R/S times the odds that an “unexposed” person is in group 2.
                  Group 1
                  Yes     No
Group 2   Yes              R
          No      S
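A minimal Python sketch of the R/S estimate, assuming a pair table stored as a nested list laid out like the labeling table (rows = Group 2 Yes/No, columns = Group 1 Yes/No). The function names are hypothetical, and the mapping of the two examples onto Group 1/Group 2 is inferred from their tables: for CAD, Group 2 = controls and Group 1 = cases, with “Yes” meaning ≥ 2 glasses per week; for the physicians, Group 2 = after and Group 1 = before. The concordant cells never enter the estimate.

```python
def discordant_counts(pair_table):
    """Return (R, S) from a 2x2 pair table laid out as
    rows = Group 2 (Yes, No), columns = Group 1 (Yes, No).
    R = pairs with Group 2 "Yes" and Group 1 "No";
    S = pairs with Group 2 "No" and Group 1 "Yes".
    """
    r = pair_table[0][1]
    s = pair_table[1][0]
    return r, s

def matched_odds_ratio(pair_table):
    """Estimate the matched-pairs odds ratio as R/S (concordant cells are ignored)."""
    r, s = discordant_counts(pair_table)
    if s == 0:
        raise ValueError("S = 0, so R/S is undefined")
    return r / s

# CAD: Group 2 = controls, Group 1 = cases, "Yes" = >= 2 glasses/week.
cad = [[15, 14], [10, 20]]
# Physicians: Group 2 = after, Group 1 = before, "Yes" = prescribes the plan.
physicians = [[22, 25], [12, 16]]

print(f"CAD: R, S = {discordant_counts(cad)}, OR = {matched_odds_ratio(cad):.3f}")
print(f"Physicians: R, S = {discordant_counts(physicians)}, "
      f"OR = {matched_odds_ratio(physicians):.3f}")
```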
CI for Odds Ratio
The 95% confidence interval for the (natural) log of the odds ratio is
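$$\ln(\widehat{OR}) \pm 1.96\sqrt{\frac{1}{R} + \frac{1}{S}}, \qquad \widehat{OR} = R/S$$
Exponentiating both endpoints gives the 95% CI for the odds ratio itself.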
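A Python sketch of this interval (the function `matched_or_ci` is hypothetical): build the interval on the log scale from the discordant counts, exponentiate both endpoints, and check whether 1 lies inside, which is how the two examples that follow judge significance. One small note: exponentiating the unrounded upper log-limit for the physician data gives about 4.147; the slide’s 4.145 comes from exponentiating the rounded value 1.422.

```python
import math

def matched_or_ci(r, s, z=1.96):
    """Approximate CI for a matched-pairs odds ratio from discordant counts R and S.

    Computes ln(R/S) +/- z * sqrt(1/R + 1/S) on the log scale, then exponentiates
    both endpoints to get the interval for the odds ratio itself.
    """
    if r == 0 or s == 0:
        raise ValueError("R and S must both be positive")
    log_or = math.log(r / s)
    half_width = z * math.sqrt(1 / r + 1 / s)
    log_ci = (log_or - half_width, log_or + half_width)
    or_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))
    return log_ci, or_ci

for name, r, s in [("CAD", 14, 10), ("Physicians", 25, 12)]:
    (lo, hi), (or_lo, or_hi) = matched_or_ci(r, s)
    print(f"{name}: log-OR CI ({lo:.3f}, {hi:.3f}); OR CI ({or_lo:.3f}, {or_hi:.3f}); "
          f"includes 1: {or_lo <= 1 <= or_hi}")
```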
CAD Example – Odds Ratio
There are more pairs in which the case drinks fewer than 2 drinks per week and the control drinks at least 2 (14) than pairs in which the case drinks at least 2 and the control drinks fewer than 2 (10). Thus, in this sample, >= 2 has a “protective effect.”
The odds ratio is 14/10 = 1.4.
The odds of not developing CAD for someone who has at least two drinks per week are 1.4 times the odds for someone who has fewer than two drinks per week.
Equivalently, the odds of developing CAD for those who drink fewer than two drinks per week are 1.4 times the odds for those who drink at least two drinks per week.
CAD Example – CI for Odds Ratio
The 95% CI for the log of the OR is ln(1.4) +/- 1.96*sqrt(1/14 + 1/10) = (-.475, 1.148).
The 95% CI for the OR is (.622, 3.152).
With 95% confidence, the odds of developing CAD for those who drink fewer than two drinks per week are between .622 and 3.152 times the odds for those who drink at least two drinks per week.
This interval includes 1; therefore, the effect of drinking at least two drinks per week is not statistically significant at the 5% level! However, the interval is very wide, so…
Physician Intervention: Odds Ratio
Note that there are more pairs in which the physician prescribes the treatment plan after the intervention but not before than in which the physician prescribes the treatment plan before but not after.
The odds ratio is calculated as 25/12=2.083
The odds that a physician will prescribe the treatment plan after the intervention are 2.083 times the odds that a physician will prescribe it before the intervention.
Physicians – CI for Odds Ratio
The 95% CI for the log of the odds ratio is ln(2.083) +/- 1.96*sqrt(1/25 + 1/12) = (.046, 1.422).
The 95% CI for the odds ratio is (1.047, 4.145).
Since 1 is not included in the interval, the intervention has a significant effect at the 5% level.
Hypothesis Testing in Matched Designs
Again, the test involves comparing the discordant pairs.
In particular, if the predictor and response are independent, one would expect the population proportion of each type of discordant pairs to be equal. If there is inequality in the sample, is it possible that the inequality is just due to chance?
Hypothesis Test – The Steps
Assumptions: Random, independent selection of pairs; large sample (R + S > 10)
Hypotheses:
H0: Predictor and Response are independent variables
HA: Predictor and Response are associated
Test Statistic:
$$\chi^2 = \frac{(R-S)^2}{R+S}$$
With Yates’ continuity correction:
$$\chi^2 = \frac{(|R-S|-1)^2}{R+S}$$
P-value: Compare to the chi-square dist’n with 1 d.f.
Conclusion: per usual
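This comparison of discordant pairs is commonly known as McNemar’s test. The Python sketch below (the function `mcnemar` is a hypothetical helper) computes the statistic with and without Yates’ correction and a p-value from the chi-square distribution with 1 d.f., using the identity P(X > x) = erfc(√(x/2)) for 1 d.f. Clamping |R − S| − 1 at 0 when R = S is an assumption added here, not something stated on the slide. The loop applies it to the two examples: the uncorrected statistics should be 16/24 ≈ .667 and 169/37 ≈ 4.568, with p-values inside the table ranges quoted for each.

```python
import math

def mcnemar(r, s, yates=False):
    """Chi-square statistic (1 d.f.) and p-value comparing discordant counts R and S.

    Without correction: (R - S)**2 / (R + S).
    With Yates' continuity correction: (|R - S| - 1)**2 / (R + S), where
    |R - S| - 1 is clamped at 0 here (an assumption) so R == S gives 0.
    For 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    """
    if r + s == 0:
        raise ValueError("no discordant pairs: R + S = 0")
    diff = abs(r - s)
    if yates:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (r + s)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

for name, r, s in [("CAD", 14, 10), ("Physicians", 25, 12)]:
    large = r + s > 10  # the large-sample rule of thumb from the slide
    stat, p = mcnemar(r, s)
    stat_c, p_c = mcnemar(r, s, yates=True)
    print(f"{name}: R+S={r + s} (large: {large}), stat={stat:.3f}, p={p:.4f}; "
          f"with Yates stat={stat_c:.3f}, p={p_c:.4f}")
```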
CAD – Hypothesis Test
Assumptions: Random, independent selection of pairs; large sample (R + S = 24 > 10)
Hypotheses:
H0: Drinking and CAD are independent variables
HA: Drinking and CAD are associated
Test Statistic: (14-10)²/(14+10) = 16/24 = .667
P-value: From Table A5.7, the p-value is between .4028 and .4386.
Conclusion: There is insufficient evidence to reject the null hypothesis that drinking is not associated with CAD.
Physician – Hypothesis Test
Assumptions: Random, independent selection of pairs; large sample (R + S = 37 > 10)
Hypotheses:
H0: Participation in the intervention and prescription of the treatment plan are independent variables
HA: Participation in the intervention and prescription of the treatment plan are associated
Test Statistic: (25-12)²/(25+12) = 169/37 = 4.568
P-value: between .0320 and .0339.
Conclusion: At the 0.05 significance level, reject the null in favor of the alternative that the intervention does have an effect on whether physicians prescribe the treatment plan.
Homework
Textbook Reading:
Chapter 29, first two sections
Chapter 9, again (has info about the OR for paired case-control studies)
(Last time: Chapter 8, Chapter 26)
When doing calculations for this class, you may ignore the Yates’ continuity correction.
Homework Problems