More Contingency Tables & Paired Categorical Data Lecture 8.


Page 1: More Contingency Tables & Paired Categorical Data Lecture 8.

More Contingency Tables & Paired Categorical Data

Lecture 8

Page 2: More Contingency Tables & Paired Categorical Data Lecture 8.

A Larger Contingency Table

A 4-by-2 contingency table.

(Made-up data filled into empty cells from last class.)

Exercise Level       Cold/Flu   No Cold/Flu   Total
No Exercise*               79           138     217
Light Exercise             96           126     222
Mod. Exercise              30            43      73
Heavy Exercise*           115           123     238
Totals (Marginal)         320           430     750

Page 3: More Contingency Tables & Paired Categorical Data Lecture 8.

Estimated Distributions

The Conditional Distributions are the distributions of the response within each level of the predictor. For example:

  No Exercise: 79/217 = .364 experienced cold/flu, 138/217 = .636 didn’t
  Light Exercise: 96/222 = .432, 126/222 = .568
  Etc.

The Marginal Distribution is the distribution of the responses if we ignore information about the predictor.

  Cold/flu: 320/750 = .427
  No cold/flu: 430/750 = .573

Page 4: More Contingency Tables & Paired Categorical Data Lecture 8.

To Summarize Distributions in a Table

Exercise Level       Cold/Flu           No Cold/Flu
No Exercise*         79/217 = .364      138/217 = .636
Light Exercise       96/222 = .432      126/222 = .568
Mod. Exercise        30/73 = .411       43/73 = .589
Heavy Exercise*      115/238 = .483     123/238 = .517
Totals (Marginal)    320/750 = .427     430/750 = .573
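The proportions in this table can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the function names `conditional_distributions` and `marginal_distribution` are illustrative choices, not part of the lecture.

```python
# Observed counts from the 4-by-2 exercise table: (cold/flu, no cold/flu).
observed = {
    "No Exercise": (79, 138),
    "Light Exercise": (96, 126),
    "Mod. Exercise": (30, 43),
    "Heavy Exercise": (115, 123),
}


def conditional_distributions(table):
    """Proportion of each response within each row (predictor level)."""
    return {
        level: tuple(count / sum(counts) for count in counts)
        for level, counts in table.items()
    }


def marginal_distribution(table):
    """Proportion of each response when the predictor is ignored."""
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(column_totals)
    return tuple(total / grand_total for total in column_totals)


conditional = conditional_distributions(observed)
marginal = marginal_distribution(observed)

for level, (p_cold, p_no_cold) in conditional.items():
    print(f"{level:18s} {p_cold:.3f}  {p_no_cold:.3f}")
print(f"{'Marginal':18s} {marginal[0]:.3f}  {marginal[1]:.3f}")
```

Each conditional row sums to 1, as does the marginal row, which is a quick check that the table was entered correctly.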

Page 5: More Contingency Tables & Paired Categorical Data Lecture 8.

Expected Values Under the Null

Exercise Level       Cold/Flu               No Cold/Flu            Total
No Exercise*         217*.427 ≈ 92.59       217*.573 ≈ 124.41        217
Light Exercise       222*.427 ≈ 94.72       222*.573 ≈ 127.28        222
Mod. Exercise        73*.427 ≈ 31.15        73*.573 ≈ 41.85           73
Heavy Exercise*      238*.427 ≈ 101.55      238*.573 ≈ 136.45        238
Totals (Marginal)    320                    430                      750

The approximate values are due to round-off error in the estimated probabilities. Note that we avoided some round-off error by calculating 92.59 directly from the totals as 217*320/750.
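As a sketch of the round-off-free calculation, each expected count can be computed as (row total)*(column total)/(grand total). The function name `expected_counts` and the list-of-rows layout are assumptions made for this illustration.

```python
# Observed counts, one row per exercise level: [cold/flu, no cold/flu].
observed = [
    [79, 138],   # No Exercise
    [96, 126],   # Light Exercise
    [30, 43],    # Mod. Exercise
    [115, 123],  # Heavy Exercise
]


def expected_counts(table):
    """Expected cell counts under independence, computed from the totals."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]


expected = expected_counts(observed)
for row in expected:
    print("  ".join(f"{value:7.2f}" for value in row))
```

Working from the totals gives 217*320/750 for the first cell, whereas multiplying by the rounded probability .427 gives a slightly different number.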

Page 6: More Contingency Tables & Paired Categorical Data Lecture 8.

Test Statistic and Sampling Distribution

A test of independence of the two variables (Exercise Level and Cold/Flu) will be carried out using a chi-square test statistic with

(r-1)(c-1)=(4-1)(2-1)=3 degrees of freedom.

The test statistic is calculated as

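\[
\chi^2 = \frac{(79-92.59)^2}{92.59} + \frac{(138-124.41)^2}{124.41} + \frac{(96-94.72)^2}{94.72} + \frac{(126-127.28)^2}{127.28} + \frac{(30-31.15)^2}{31.15} + \frac{(43-41.85)^2}{41.85} + \frac{(115-101.55)^2}{101.55} + \frac{(123-136.45)^2}{136.45} = 6.69
\]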
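The sum above, one (observed - expected)^2/expected term per cell, can be checked with a short sketch. The function name `chi_square_statistic` is hypothetical, and the expected counts here are computed from the totals without rounding, so the result agrees with the slide to two decimals rather than digit for digit.

```python
# Observed counts, one row per exercise level: [cold/flu, no cold/flu].
observed = [
    [79, 138],
    [96, 126],
    [30, 43],
    [115, 123],
]


def chi_square_statistic(table):
    """Return the chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected_count = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed_count - expected_count) ** 2 / expected_count

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


statistic, degrees_of_freedom = chi_square_statistic(observed)
print(f"chi-square = {statistic:.2f} on {degrees_of_freedom} d.f.")
```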

Page 7: More Contingency Tables & Paired Categorical Data Lecture 8.

Hypothesis Test

Assumptions
  Random Independent Sample
  Groups collected independently
  “Large Sample”

Hypotheses
  H0: conditional distributions equal
  HA: conditional distributions not all equal

Test Statistic
  Chi-square = 6.69 compared to chi-square dist’n with 3 d.f.

Page 8: More Contingency Tables & Paired Categorical Data Lecture 8.

Hypothesis Test, cont.

P-value/Rejection Region
  Critical values are 7.815 for .05 significance, 7.407 for .06, 7.060 for .07, and 6.251 for .10. Since 6.69 < 7.815, we fail to reject at the 0.05 level. The p-value is between .07 and .10.

Conclusion
  At the type 1 error rate of .05, we fail to reject the null hypothesis. There is not enough evidence to say that the probability of getting a cold or the flu depends on exercise level.
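The table lookup can be cross-checked numerically. For 3 degrees of freedom the chi-square upper-tail probability has a closed form, P(X > x) = erfc(sqrt(x/2)) + 2*sqrt(x/(2*pi))*exp(-x/2), so only the standard library is needed. The function name `chi_square_sf_3df` is an assumption for this sketch, and it is valid for 3 d.f. only.

```python
import math


def chi_square_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed form of the regularized upper incomplete gamma function Q(3/2, x/2).
    return math.erfc(math.sqrt(half)) + 2.0 * math.sqrt(half / math.pi) * math.exp(-half)


p_value = chi_square_sf_3df(6.69)
print(f"p-value for chi-square = 6.69 on 3 d.f.: {p_value:.4f}")

# The critical values quoted on the slide should map back to their tail areas.
for critical_value in (7.815, 7.407, 7.060, 6.251):
    print(f"{critical_value:.3f} -> {chi_square_sf_3df(critical_value):.3f}")
```

The computed p-value falls between .07 and .10, which matches the bracketing obtained from the critical values.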

Page 9: More Contingency Tables & Paired Categorical Data Lecture 8.

Matched Categorical Data

Data may be matched/paired with respect to the risk factor or the response.

Matching on risk factor (not directly discussed in text)
  Differences of proportions, relative risks, and odds ratios are all appropriate.
  The formulas and the set-up of the contingency table will be different.
  We will focus on odds ratios, which will be calculated in the same way as for the matched case-control study.

Matching on response (Matched Case-Control Study)
  Only the odds ratio is an appropriate measure of the association between the risk factor and the response.

In both cases, inference focuses on the pair.

Page 10: More Contingency Tables & Paired Categorical Data Lecture 8.

A Matched Case-Control Study on CAD

Each of 59 adults with Coronary Artery Disease (CAD) was matched with an adult who did not develop CAD but was of the same gender, age, ethnicity, and socio-economic status.

Of interest was whether drinking 2 or more glasses of red wine (on average) per week was associated with development of CAD.

Page 11: More Contingency Tables & Paired Categorical Data Lecture 8.

Table for Matched Case-Control Data

Do NOT use the standard contingency table that summarizes information about the individual subjects.

Instead, use the following table to summarize information about the pairs.

                   Cases (CAD)
                   >= 2    < 2
Controls   >= 2      15     14
           < 2       10     20
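To make the "table of pairs" idea concrete, the sketch below tallies pair-level records into the four cells. The records are rebuilt from the slide's cell counts purely for illustration (the original pair-level data are not given), and the name `tally_pairs` is hypothetical.

```python
from collections import Counter

# Each record is one matched pair: (control_exposed, case_exposed), where
# "exposed" means drinking >= 2 glasses of red wine per week.
# Rebuilt from the cell counts on the slide; not the study's raw data.
pairs = (
    [(True, True)] * 15
    + [(True, False)] * 14
    + [(False, True)] * 10
    + [(False, False)] * 20
)


def tally_pairs(pair_records):
    """Count how many pairs fall in each (control, case) exposure combination."""
    return Counter(pair_records)


counts = tally_pairs(pairs)
labels = {True: ">= 2", False: "< 2"}

print("Controls \\ Cases   >= 2   < 2")
for control_exposed in (True, False):
    row = [counts[(control_exposed, case_exposed)] for case_exposed in (True, False)]
    print(f"{labels[control_exposed]:>16s}   {row[0]:4d}  {row[1]:4d}")
```

The unit being counted is the pair, so the cells sum to 59 pairs rather than 118 individuals.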

Page 12: More Contingency Tables & Paired Categorical Data Lecture 8.

Physician Adherence Study – Matching on Predictor

Suppose that investigators were interested in whether a particular educational intervention had an effect on whether physicians prescribe a particular treatment plan for their asthma patients. 75 physicians are rated on whether they prescribe the treatment plan both before and after the educational intervention.

                Before
                Yes    No
After   Yes      22    25
        No       12    16

Page 13: More Contingency Tables & Paired Categorical Data Lecture 8.

Estimation and Inference for Matched Categorical Data

CANNOT use formulas for CI of odds ratio given before because the two groups of subjects (whether “exposure” groups or case/control groups) are not chosen independently.

Inferences will be based on the discordant pairs, that is, the pairs in which the members “disagree”
  on the predictor variable, for case-control studies
  on the response variable, when subjects are matched with respect to the predictor

Page 14: More Contingency Tables & Paired Categorical Data Lecture 8.

Labeling Cell (Pair) Counts & Estimation of Odds Ratio

Odds ratio is estimated as R/S, where R and S are the counts of the two types of discordant pairs.

Interpretation: The odds that a person in group 2 is “exposed” are R/S times the odds that a group 1 member is “exposed.”

Or: The odds that an “exposed” person is in group 2 are R/S times the odds that an “unexposed” person is in group 2.

                  Group 1
                  Yes    No
Group 2   Yes             R
          No       S

Page 15: More Contingency Tables & Paired Categorical Data Lecture 8.

CI for Odds Ratio

The 95% confidence interval for the (natural) log of the odds ratio is

\[
\ln(OR) \pm 1.96\sqrt{\frac{1}{R} + \frac{1}{S}}
\]
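A minimal sketch of this interval in Python follows. The function name `matched_odds_ratio_ci` is an assumption for illustration, and the usage line takes R = 14 and S = 10 from the CAD table of pairs. Exponentiating the endpoints of the log-scale interval gives the interval for the odds ratio itself.

```python
import math


def matched_odds_ratio_ci(r, s, z=1.96):
    """Odds ratio R/S with a CI built on the log scale from discordant pairs.

    r, s: counts of the two types of discordant pairs (both must be > 0).
    z:    normal multiplier, 1.96 for a 95% interval.
    """
    odds_ratio = r / s
    standard_error = math.sqrt(1.0 / r + 1.0 / s)
    log_lower = math.log(odds_ratio) - z * standard_error
    log_upper = math.log(odds_ratio) + z * standard_error
    # Exponentiate the log-scale endpoints to get the CI for the odds ratio.
    return odds_ratio, (log_lower, log_upper), (math.exp(log_lower), math.exp(log_upper))


odds_ratio, log_ci, or_ci = matched_odds_ratio_ci(14, 10)
print(f"OR = {odds_ratio:.3f}")
print(f"95% CI for ln(OR): ({log_ci[0]:.3f}, {log_ci[1]:.3f})")
print(f"95% CI for OR:     ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
```

Only the discordant counts enter the calculation; the concordant pairs carry no information about the odds ratio in this design.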

Page 16: More Contingency Tables & Paired Categorical Data Lecture 8.

CAD Example – Odds Ratio

There are more pairs in which the case drinks fewer than 2 and the control drinks 2 or more (14 pairs) than pairs in which the case drinks 2 or more and the control drinks fewer than 2 (10 pairs). Thus, >= 2 has a “protective effect.”

The odds ratio is 14/10 = 1.4.

The odds of not developing CAD for someone who has at least two drinks per week are 1.4 times the odds for someone who has fewer than two.

The odds of developing CAD for those who drink fewer than two drinks per week are 1.4 times the odds for someone who drinks two or more drinks per week.

                   Cases (CAD)
                   >= 2    < 2
Controls   >= 2      15     14
           < 2       10     20

Page 17: More Contingency Tables & Paired Categorical Data Lecture 8.

CAD Eg. – CI for Odds Ratio

The 95% CI for the log of the OR is
ln(1.4) +/- 1.96*sqrt(1/14 + 1/10) = (-.475, 1.148)

95% CI for OR is (.622, 3.152)

With 95% confidence, the odds of developing CAD for those who drink fewer than two drinks per week are between .622 and 3.152 times the odds for someone who drinks two or more drinks per week.

This interval includes 1; therefore, the effect of drinking at least two drinks per week is not statistically significant.

However, the interval is very wide, so…

Page 18: More Contingency Tables & Paired Categorical Data Lecture 8.

Physician Intervention: Odds Ratio

Note that there are more pairs in which the physician prescribes the treatment plan after the intervention but not before than in which the physician prescribes the treatment plan before but not after.

The odds ratio is calculated as 25/12=2.083

The odds that a physician will prescribe the treatment plan after the intervention are 2.083 times the odds that a physician will prescribe it before the intervention.

                Before
                Yes    No
After   Yes      22    25
        No       12    16

Page 19: More Contingency Tables & Paired Categorical Data Lecture 8.

Physicians – CI for Odds Ratio

The 95% CI for log of the odds ratio is ln(2.083) +/- 1.96*sqrt(1/25 + 1/12)

= (.046, 1.422)

The 95% CI for the odds ratio is

(1.047, 4.145)

There is a significant effect of the intervention since 1 is not included in the interval.

Page 20: More Contingency Tables & Paired Categorical Data Lecture 8.

Hypothesis Testing in Matched Designs

Again, the test involves comparing the discordant pairs.

In particular, if the predictor and response are independent, one would expect the population proportions of the two types of discordant pairs to be equal. If there is inequality in the sample, is it possible that the inequality is just due to chance?

Page 21: More Contingency Tables & Paired Categorical Data Lecture 8.

Hypothesis Test – The Steps

Assumptions
  Random, independent selection of pairs
  Large Sample (R+S > 10)

Hypotheses
  H0: Predictor and Response are independent variables
  HA: Predictor and Response are associated

Test Statistic

\[
\chi^2 = \frac{(R-S)^2}{R+S}
\]

With Yates’ continuity correction,

\[
\chi^2 = \frac{(|R-S|-1)^2}{R+S}
\]

P-value: Compare to chi-square dist’n with 1 d.f.

Conclusion: per usual
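The steps above can be sketched in Python using only the standard library. For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)), which avoids a table lookup. The function name `matched_pairs_test` is hypothetical, and clamping the corrected difference at zero is a choice made in this sketch, not something stated in the lecture.

```python
import math


def matched_pairs_test(r, s, yates=False):
    """Chi-square test on the discordant pair counts r and s (1 d.f.).

    Returns the test statistic and its p-value.
    """
    difference = abs(r - s)
    if yates:
        # Continuity correction; clamped at 0 (an assumption of this sketch).
        difference = max(difference - 1, 0)
    statistic = difference ** 2 / (r + s)
    # Upper-tail probability of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Discordant pair counts from the CAD table of pairs: R = 14, S = 10.
statistic, p_value = matched_pairs_test(14, 10)
print(f"chi-square = {statistic:.3f}, p-value = {p_value:.4f}")

# Same data with Yates' continuity correction, for comparison.
corrected_statistic, corrected_p_value = matched_pairs_test(14, 10, yates=True)
print(f"with correction: chi-square = {corrected_statistic:.3f}, p-value = {corrected_p_value:.4f}")
```

The correction always shrinks the statistic, so it can only make the test more conservative.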

Page 22: More Contingency Tables & Paired Categorical Data Lecture 8.

CAD – Hypothesis Test

Assumptions
  Random, independent selection of pairs
  Large Sample (R+S = 24 > 10)

Hypotheses
  H0: Drinking and CAD are independent variables
  HA: Drinking and CAD are associated

Test Statistic: (14-10)^2/(14+10) = 16/24 = .667

P-value: Table A5.7: p-value is between .4028 and .4386.

Conclusion: Insufficient evidence to reject the null hypothesis that drinking is not associated with CAD.

Page 23: More Contingency Tables & Paired Categorical Data Lecture 8.

Physician – Hypothesis Test

Assumptions
  Random, independent selection of pairs
  Large Sample (R+S = 37 > 10)

Hypotheses
  H0: Participation in intervention and prescription of treatment plan are independent variables
  HA: Participation in intervention and prescription of treatment plan are associated

Test Statistic: (25-12)^2/(25+12) = 169/37 = 4.568

P-value: between .0339 and .0320.

Conclusion: At the 0.05 significance level, reject the null in favor of the alternative that the intervention does have an effect on whether physicians prescribe the treatment plan.

Page 24: More Contingency Tables & Paired Categorical Data Lecture 8.

Homework

Textbook Reading
  Chapter 29, first two sections
  Repeat Chapter 9 (has info about OR for paired case-control studies)
  (Last time: Chapter 8, Chapter 26)
  When doing calculations for this class, you may ignore the Yates’ continuity correction.

Homework Problems