Lecture 6 Comparison of logistic regression and stratified analyses


Page 1: Lecture 6 Comparison of logistic regression  and stratified analyses

Biost 536 Thompson Part 2 1

Lecture 6: Comparison of logistic regression and stratified analyses

                                  Smoking frequency - cigarettes per day
Blood pressure  Behaviour Type    0         1-20      21-30     >30
≥140mm          A                 29/184    21/97     7/52      12/55
≥140mm          B                 8/179     9/71      3/34      7/21
<140mm          A                 41/600    24/301    27/167    17/133
<140mm          B                 20/689    16/336    13/152    3/83

. cs chd type [freq=count], or by(smoke)

           smoke |        OR      [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
               0 |  2.941176     1.882394    4.594855    12.10169  (Cornfield)
               1 |  1.947875     1.174752    3.229047    10.96273  (Cornfield)
               2 |  1.952703     1.047788    3.636268    7.308642  (Cornfield)
               3 |  1.714465     .8097201      3.6227    5.445205  (Cornfield)
-----------------+-------------------------------------------------
           Crude |  2.372929     1.804034    3.121147
    M-H combined |  2.248977     1.705996    2.964777
-----------------+-------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =     2.354  Pr>chi2 = 0.5022

Test that combined OR = 1:
                Mantel-Haenszel chi2(1) =     34.70
                                Pr>chi2 =    0.0000
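The stratified odds ratios above can be reproduced by hand from the table. The following is a minimal sketch in Python (not part of the original Stata session). It assumes each cell is "CHD cases / subjects" and that Type A is the exposed group; that reading is an assumption, but it reproduces the `cs` output. Variable names such as `table` and `collapse` are illustrative only.

```python
# Cells are read as (CHD cases, subjects). This reading is an assumption,
# chosen because it reproduces the odds ratios printed by cs.
table = {
    (">=140", "A"): [(29, 184), (21, 97), (7, 52), (12, 55)],
    (">=140", "B"): [(8, 179), (9, 71), (3, 34), (7, 21)],
    ("<140", "A"): [(41, 600), (24, 301), (27, 167), (17, 133)],
    ("<140", "B"): [(20, 689), (16, 336), (13, 152), (3, 83)],
}
BP_LEVELS = (">=140", "<140")
SMOKE_LEVELS = ("0", "1-20", "21-30", ">30")


def collapse(behaviour, k):
    """Sum over blood pressure; return (cases, non-cases) for one stratum."""
    cases = sum(table[(bp, behaviour)][k][0] for bp in BP_LEVELS)
    total = sum(table[(bp, behaviour)][k][1] for bp in BP_LEVELS)
    return cases, total - cases


stratum_or, mh_weight = [], []
mh_num = 0.0
pooled = [0, 0, 0, 0]
for k in range(len(SMOKE_LEVELS)):
    a, b = collapse("A", k)  # Type A: cases, non-cases
    c, d = collapse("B", k)  # Type B: cases, non-cases
    n = a + b + c + d
    stratum_or.append(a * d / (b * c))
    mh_weight.append(b * c / n)      # Mantel-Haenszel weight
    mh_num += a * d / n
    pooled = [p + q for p, q in zip(pooled, (a, b, c, d))]

mh_or = mh_num / sum(mh_weight)
crude_or = pooled[0] * pooled[3] / (pooled[1] * pooled[2])

for level, o, w in zip(SMOKE_LEVELS, stratum_or, mh_weight):
    print(f"smoke {level:>5}: OR = {o:.6f}  M-H weight = {w:.5f}")
print(f"crude OR = {crude_or:.6f}, M-H combined OR = {mh_or:.6f}")
```

The Mantel-Haenszel estimate is a weighted average of the stratum odds ratios, with weights bc/n, which is why it sits between the stratum-specific values.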

. logistic chd type [freq=count]

Logit estimates                              Number of obs =    3154
                                             LR chi2(1)    =   40.90
                                             Prob > chi2   =  0.0000
Log likelihood = -870.17212                  Pseudo R2     =  0.0230

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
    type |   2.372929   .3326993    6.163   0.000   1.802773    3.123406
------------------------------------------------------------------------

. xi: logistic chd i.type*i.smoke [freq=count]
i.type            Itype_0-1    (naturally coded; Itype_0 omitted)
i.smoke           Ismoke_0-3   (naturally coded; Ismoke_0 omitted)
i.type*i.smoke    ItXs_#-#     (coded as above)

Logit estimates                              Number of obs =    3154
                                             LR chi2(7)    =   67.55
                                             Prob > chi2   =  0.0000
Log likelihood = -856.84711                  Pseudo R2     =  0.0379

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
 Itype_1 |   2.941176   .6744921    4.704   0.000   1.876365    4.610254
Ismoke_1 |   1.963351   .5536599    2.392   0.017   1.129695    3.412201
Ismoke_2 |   2.823529   .9161781    3.199   0.001    1.49484    5.333226
Ismoke_3 |   3.191489   1.225894    3.021   0.003   1.503265    6.775655
ItXs_1_1 |   .6622776   .2296718   -1.188   0.235   .3356226     1.30686
ItXs_1_2 |   .6639189   .2620509   -1.038   0.299   .3062973    1.439087
ItXs_1_3 |   .5829182   .2632838   -1.195   0.232   .2405191    1.412751
------------------------------------------------------------------------

. est store A

. lincom _Itype_1 + _ItypXsmo_1_1

 ( 1)  _Itype_1 + _ItypXsmo_1_1 = 0

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.947875   .5067205     2.56   0.010     1.169848    3.243343
------------------------------------------------------------------------------

. lincom _Itype_1 + _ItypXsmo_1_2

 ( 1)  _Itype_1 + _ItypXsmo_1_2 = 0

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.952703   .6272995     2.08   0.037     1.040376    3.665067
------------------------------------------------------------------------------

. lincom _Itype_1 + _ItypXsmo_1_3

 ( 1)  _Itype_1 + _ItypXsmo_1_3 = 0

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.714465   .6671239     1.39   0.166     .7996751    3.675732
------------------------------------------------------------------------------
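On the odds-ratio scale, each `lincom` sum of a main effect and an interaction coefficient is the product of the corresponding odds ratios. A short Python check (a sketch outside the Stata session, using only the odds ratios printed in the interaction model) recovers the stratum-specific odds ratios:

```python
# Odds ratios copied from the interaction model output.
or_type = 2.941176                      # Type A vs B among non-smokers
or_interaction = {                      # type x smoke interaction terms
    "1-20": 0.6622776,
    "21-30": 0.6639189,
    ">30": 0.5829182,
}

# exp(b_type + b_interaction) = OR_type * OR_interaction
stratum_or = {level: or_type * r for level, r in or_interaction.items()}

for level, value in stratum_or.items():
    print(f"smoke {level:>5}: OR for behaviour type = {value:.6f}")
```

These agree with the stratum-specific odds ratios from `cs ... , by(smoke)`, because the model with all interaction terms is saturated with respect to type and smoking.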

. xi: logistic chd type i.smoke [freq=count]
i.smoke           Ismoke_0-3   (naturally coded; Ismoke_0 omitted)

Logit estimates                              Number of obs =    3154
                                             LR chi2(4)    =   65.17
                                             Prob > chi2   =  0.0000
Log likelihood = -858.03461                  Pseudo R2     =  0.0366

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
    type |   2.259353   .3191108    5.771   0.000   1.713013    2.979942
Ismoke_1 |   1.492975   .2443402    2.449   0.014   1.083291    2.057597
Ismoke_2 |   2.140764   .3956325    4.119   0.000    1.49025    3.075237
Ismoke_3 |   2.170188    .440754    3.815   0.000   1.457548    3.231261
------------------------------------------------------------------------

. est store B

Compare models with and without interaction:

. lrtest A B

Likelihood-ratio test                        LR chi2(3)  =      2.38
(Assumption: B nested in A)                  Prob > chi2 =    0.4983

What null hypothesis is this LRT assessing?
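The likelihood-ratio statistic can be recomputed from the two reported log likelihoods. This is a minimal Python sketch (not Stata). The closed-form tail probability used here holds for 3 degrees of freedom only, which matches the three interaction terms dropped between the two models.

```python
import math

ll_with_interaction = -856.84711     # model A
ll_without_interaction = -858.03461  # model B

# LR statistic: twice the difference in log likelihoods.
lr_stat = 2 * (ll_with_interaction - ll_without_interaction)

# Upper-tail chi-square probability, closed form valid for df = 3 only:
# P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p_value = (math.erfc(math.sqrt(lr_stat / 2))
           + math.sqrt(2 * lr_stat / math.pi) * math.exp(-lr_stat / 2))

print(f"LR chi2(3) = {lr_stat:.3f}, p = {p_value:.4f}")
```

The value is close to the M-H test of homogeneity reported earlier (chi2(3) = 2.354), as expected, since both assess whether the type odds ratio varies across smoking strata.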

Some Stata language for recoding variables

Categorical variable "age" coded 1,2,3,4,5,6:

. generate agegp=recode(age,2,4,6)

. * All observations with age <= 2 have agegp=2, all with age > 2 and <= 4
. * have agegp=4, and all with age > 4 and <= 6 have agegp=6

. * Change the coding to 1,2,3

. recode agegp 2=1 4=2 6=3

. table age

----------+-----------
   Age in |
    years |      Freq.
----------+-----------
    25-34 |        116
    35-44 |        199
    45-54 |        213
    55-64 |        242
    65-74 |        161
      75+ |         44
----------+-----------

. table agegp

----------+-----------
    agegp |      Freq.
----------+-----------
        1 |        315
        2 |        455
        3 |        205
----------+-----------

. drop agegp

. gen agegp=recode(age,2,4)

. table agegp

----------+-----------
    agegp |      Freq.
----------+-----------
        2 |        315
        4 |        660
----------+-----------

. * All observations that are not <= a number in the list are given the last value in the list

. drop agegp

. gen agegp=1+(age>2)+(age>4)

. table agegp

----------+-----------
    agegp |      Freq.
----------+-----------
        1 |        315
        2 |        455
        3 |        205
----------+-----------
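To make the recoding rules concrete, here is a small Python emulation (a sketch, not Stata). The helper `stata_recode` is a hypothetical stand-in for Stata's `recode()` function that follows the rule described in the comments above and ignores missing values. The age frequencies are those in the `table age` output.

```python
from collections import Counter


def stata_recode(x, *cutpoints):
    """Return the first cutpoint that x does not exceed, else the last one."""
    for c in cutpoints:
        if x <= c:
            return c
    return cutpoints[-1]


# Frequencies of age codes 1..6 from the 'table age' output.
age_freq = {1: 116, 2: 199, 3: 213, 4: 242, 5: 161, 6: 44}


def tabulate(rule):
    """Apply a recoding rule to each age code and total the frequencies."""
    counts = Counter()
    for age, n in age_freq.items():
        counts[rule(age)] += n
    return dict(sorted(counts.items()))


three_groups = tabulate(lambda a: stata_recode(a, 2, 4, 6))
two_groups = tabulate(lambda a: stata_recode(a, 2, 4))
# Each comparison contributes 0 or 1, as in gen agegp=1+(age>2)+(age>4).
indicator_sum = tabulate(lambda a: 1 + (a > 2) + (a > 4))

print(three_groups)
print(two_groups)
print(indicator_sum)
```

The sum-of-indicators form gives the 1,2,3 coding in one step, without the separate `recode agegp 2=1 4=2 6=3` command.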

Effect of linear transformations of covariates

Consider the model:

    logit(p) = β0 + β1 X1 + β2 X2 + β3 X1 X2

Now, say we convert X1 to another scale (e.g. weight in lbs to kg):

    X1* = a X1 + b,  or  X1 = (X1* - b)/a

then

    logit(p) = β0 + β1 (X1* - b)/a + β2 X2 + β3 [(X1* - b)/a] X2
             = (β0 - β1 b/a) + (β1/a) X1* + (β2 - β3 b/a) X2 + (β3/a) X1* X2

The coefficients of the main effect for X1 and of the interaction of X1 and X2 change by a factor of 1/a.

The intercept and the coefficient for X2 change to accommodate the change in X1.

Overall model fit (deviance) does not change.

What would happen if the original model was logit(p) = β0 + β1 X1 + β2 X2 and X1 was converted to X1*?
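The reparameterisation can be checked numerically. The sketch below is in Python with purely hypothetical coefficient values and a hypothetical conversion (pounds to kilograms, then centred at 70 kg). It confirms that the transformed coefficients give the same logit for every covariate pattern, which is why the deviance is unchanged.

```python
# Hypothetical coefficients for logit(p) = b0 + b1*X1 + b2*X2 + b3*X1*X2.
b0, b1, b2, b3 = -4.0, 0.02, 0.5, 0.01

# Hypothetical rescaling X1* = a*X1 + b (lbs -> kg, centred at 70 kg).
a, b = 0.4536, -70.0

# Coefficients of the model written in terms of X1*.
c0 = b0 - b1 * b / a   # intercept absorbs the shift
c1 = b1 / a            # main effect of X1 scales by 1/a
c2 = b2 - b3 * b / a   # X2 coefficient absorbs the shift via the interaction
c3 = b3 / a            # interaction scales by 1/a


def logit_original(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def logit_transformed(x1_star, x2):
    return c0 + c1 * x1_star + c2 * x2 + c3 * x1_star * x2


max_gap = 0.0
for x1 in (100.0, 150.0, 200.0, 250.0):
    for x2 in (0.0, 1.0, 2.0):
        gap = abs(logit_original(x1, x2) - logit_transformed(a * x1 + b, x2))
        max_gap = max(max_gap, gap)

print(f"largest difference in logit across the grid: {max_gap:.2e}")
```

Setting `b3 = 0` in this sketch is one way to explore the question posed at the end of the slide.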

Dose response models

Consider the role of alcohol in the esophageal cancer study, with age as a potential confounder (alcohol consumption 0-39, 40-79, 80-119, 120+ g/day; age 25-34, 35-44, 45-54, 55-64, 65-74, 75+).

1. Dummy variable coding

          Alcohol consumption (g/day)
Age       0-39        40-79            80-119           120+
25-34     β0          β0 + β1          β0 + β2          β0 + β3
35-44     β0 + β4     β0 + β1 + β4     β0 + β2 + β4     β0 + β3 + β4
45-54     β0 + β5     β0 + β1 + β5     β0 + β2 + β5     β0 + β3 + β5
55-64     β0 + β6     β0 + β1 + β6     β0 + β2 + β6     β0 + β3 + β6
65-74     β0 + β7     β0 + β1 + β7     β0 + β2 + β7     β0 + β3 + β7
75+       β0 + β8     β0 + β1 + β8     β0 + β2 + β8     β0 + β3 + β8

logit(p) = β0 + β1 XAL2 + β2 XAL3 + β3 XAL4 + β4 XA2 + β5 XA3 + β6 XA4 + β7 XA5 + β8 XA6

What is the interpretation of β1, β2, β3?

How do we state the assumption of no association between alcohol consumption and disease risk in terms of model parameters?

What does H0 : β2 = 0 mean?

Dose response models

2. Grouped linear dose response

Alcohol coded as XALC = 0   0-39 g/day
                        1   40-79
                        2   80-119
                        3   120+

logit(p) = β0 + β1 XALC + β2 XA2 + β3 XA3 + β4 XA4 + β5 XA5 + β6 XA6

          Alcohol consumption (g/day)
Age       0-39        40-79            80-119            120+
25-34     β0          β0 + β1          β0 + 2β1          β0 + 3β1
35-44     β0 + β2     β0 + β1 + β2     β0 + 2β1 + β2     β0 + 3β1 + β2
45-54     β0 + β3     β0 + β1 + β3     β0 + 2β1 + β3     β0 + 3β1 + β3
55-64     β0 + β4     β0 + β1 + β4     β0 + 2β1 + β4     β0 + 3β1 + β4
65-74     β0 + β5     β0 + β1 + β5     β0 + 2β1 + β5     β0 + 3β1 + β5
75+       β0 + β6     β0 + β1 + β6     β0 + 2β1 + β6     β0 + 3β1 + β6

What is the interpretation of β1?

How would you put H0 : β1 = 0 into words?

Dose response models

Comparing dummy variable and grouped linear dose-response

The two models are nested. The dummy variable model is a reparameterization of a model that adds terms to the grouped linear model.

Hence we can test whether a grouped linear dose-response is adequate by performing the LRT which compares:

    logit(p) = ... + β1 XAL2 + β2 XAL3 + β3 XAL4 + ...   (dummy variable)

with

    logit(p) = ... + β1 XALC + ...                        (grouped linear)

Proof: Note that XALC = XAL2 + 2 XAL3 + 3 XAL4

Hence:

    β1 XAL2 + β2 XAL3 + β3 XAL4
      = β1 (XAL2 + 2 XAL3 + 3 XAL4) + (β2 - 2β1) XAL3 + (β3 - 3β1) XAL4
      = β1 XALC + β2* XAL3 + β3* XAL4

where β2* = β2 - 2β1 and β3* = β3 - 3β1.
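The identity in the proof can be verified category by category. The Python sketch below (not part of the Stata session) uses the dummy-variable log odds ratios reported for `Ialcoh_2` to `Ialcoh_4` in the esophageal cancer fit elsewhere in this lecture. Any other values would serve equally well for checking the algebra.

```python
# Log odds ratios for the alcohol dummies (Ialcoh_2, Ialcoh_3, Ialcoh_4).
b1, b2, b3 = 1.43431, 2.00711, 3.680012

# Departures from linearity in the reparameterised model.
b2_star = b2 - 2 * b1
b3_star = b3 - 3 * b1

max_gap = 0.0
for xalc in range(4):                      # 0: 0-39, 1: 40-79, 2: 80-119, 3: 120+
    xal2, xal3, xal4 = int(xalc == 1), int(xalc == 2), int(xalc == 3)
    assert xalc == xal2 + 2 * xal3 + 3 * xal4   # XALC identity from the proof
    dummy_form = b1 * xal2 + b2 * xal3 + b3 * xal4
    reparam_form = b1 * xalc + b2_star * xal3 + b3_star * xal4
    max_gap = max(max_gap, abs(dummy_form - reparam_form))

print(f"beta2* = {b2_star:.6f}, beta3* = {b3_star:.6f}")
print(f"largest difference between the two forms: {max_gap:.2e}")
```

The grouped linear model is the special case β2* = β3* = 0, so the LRT comparing the two models has 2 degrees of freedom.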

Dose response models

Consider the following coding in a model where smoking status (cigs/day) is a risk factor:

XSMOKE = 1   smoker             XDOSE = 0   ≤10 (including non-smokers)
         0   non-smoker                 1   11-20
                                        2   21-30
                                        3   31-40
                                        4   41+

logit(p) = β0 + β1 XSMOKE + β2 XDOSE

Smoking status    Logit(p)
Non-smoker        β0
1-10              β0 + β1
11-20             β0 + β1 + β2
21-30             β0 + β1 + 2β2
31-40             β0 + β1 + 3β2
41+               β0 + β1 + 4β2

What is the interpretation of H0 : β1 = 0?

What is the interpretation of H0 : β2 = 0?

Note: the grouped linear model for smoking is nested in this model. Comparing the two models provides a test of H0 : β1 = β2.
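A brief numerical illustration of the note about nesting, written in Python with hypothetical coefficient values: when β1 = β2, the six logits are equally spaced, which is exactly the grouped linear model with smoking scored 0 to 5.

```python
CATEGORIES = ["non-smoker", "1-10", "11-20", "21-30", "31-40", "41+"]
XSMOKE = [0, 1, 1, 1, 1, 1]
XDOSE = [0, 0, 1, 2, 3, 4]


def logits(b0, b1, b2):
    """logit(p) = b0 + b1*XSMOKE + b2*XDOSE for each smoking category."""
    return [b0 + b1 * s + b2 * d for s, d in zip(XSMOKE, XDOSE)]


# Hypothetical values: a jump of 0.6 for any smoking, then 0.25 per dose step.
general = logits(-3.0, 0.6, 0.25)

# Constrained case beta1 = beta2 = 0.4 versus grouped linear scores 0..5.
constrained = logits(-3.0, 0.4, 0.4)
grouped_linear = [-3.0 + 0.4 * score for score in range(6)]

for name, g, c in zip(CATEGORIES, general, constrained):
    print(f"{name:>10}: general = {g:6.2f}, beta1 = beta2 case = {c:6.2f}")
```

In the general model, the first step (non-smoker to 1-10) is β1 and every later step is β2, so the two coefficients separate the effect of smoking at all from the effect of dose among smokers.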

Stata analysis

In the esophageal cancer study, consider modeling the dose response for alcohol in two ways: (i) a grouped linear model; (ii) a dummy variable model.

. xi: logistic case i.age alcohol
i.age             Iage_1-6     (naturally coded; Iage_1 omitted)

Logit estimates                              Number of obs =     975
                                             LR chi2(6)    =  255.91
                                             Prob > chi2   =  0.0000
Log likelihood = -366.78769                  Pseudo R2     =  0.2586

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   4.834044   5.198578    1.465   0.143   .5873814    39.78332
  Iage_3 |   27.22215   28.08983    3.202   0.001    3.60238    205.7099
  Iage_4 |   45.89127   47.16723    3.723   0.000   6.121554    344.0317
  Iage_5 |   67.68976   69.97617    4.077   0.000   8.924207    513.4241
  Iage_6 |    74.3312   80.53048    3.977   0.000   8.891574    621.3891
 alcohol |   2.985376   .3083736   10.588   0.000   2.438228    3.655305
------------------------------------------------------------------------

. logit

Logit estimates                              Number of obs =     975
                                             LR chi2(6)    =  255.91
                                             Prob > chi2   =  0.0000
Log likelihood = -366.78769                  Pseudo R2     =  0.2586

------------------------------------------------------------------------
    case |      Coef.  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   1.575683    1.07541    1.465   0.143   -.532081    3.683448
  Iage_3 |   3.304031   1.031874    3.202   0.001   1.281595    5.326467
  Iage_4 |   3.826275   1.027804    3.723   0.000   1.811816    5.840734
  Iage_5 |   4.214935   1.033778    4.077   0.000   2.188767    6.241102
  Iage_6 |   4.308531   1.083401    3.977   0.000   2.185104    6.431957
 alcohol |   1.093726   .1032947   10.588   0.000   .8912717     1.29618
   _cons |  -6.983926   1.050337   -6.649   0.000  -9.042548   -4.925304
------------------------------------------------------------------------

. * ORs comparing each of the two upper alcohol exposure groups to the lowest exposure group:

. lincom 2*alcohol

 ( 1)  2.0 alcohol = 0.0

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
     (1) |   8.912469   1.841222   10.588   0.000   5.944957    13.36126
------------------------------------------------------------------------

. lincom 3*alcohol

 ( 1)  3.0 alcohol = 0.0

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
     (1) |   26.60707   8.245112   10.588   0.000   14.49516    48.83948
------------------------------------------------------------------------

. est store A

. * What would be the effect of changing the coding for alcohol?

. gen alc0=alcohol-1

. xi: logistic case i.age alc0
i.age             Iage_1-6     (naturally coded; Iage_1 omitted)

Logit estimates                              Number of obs =     975
                                             LR chi2(6)    =  255.91
                                             Prob > chi2   =  0.0000
Log likelihood = -366.78769                  Pseudo R2     =  0.2586

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   4.834044   5.198578    1.465   0.143   .5873814    39.78332
  Iage_3 |   27.22215   28.08983    3.202   0.001    3.60238    205.7099
  Iage_4 |   45.89127   47.16723    3.723   0.000   6.121554    344.0317
  Iage_5 |   67.68976   69.97617    4.077   0.000   8.924207    513.4241
  Iage_6 |    74.3312   80.53048    3.977   0.000   8.891574    621.3891
    alc0 |   2.985376   .3083736   10.588   0.000   2.438228    3.655305
------------------------------------------------------------------------

. logit

------------------------------------------------------------------------
    case |      Coef.  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   1.575683    1.07541    1.465   0.143   -.532081    3.683448
  Iage_3 |   3.304031   1.031874    3.202   0.001   1.281595    5.326467
  Iage_4 |   3.826275   1.027804    3.723   0.000   1.811816    5.840734
  Iage_5 |   4.214935   1.033778    4.077   0.000   2.188767    6.241102
  Iage_6 |   4.308531   1.083401    3.977   0.000   2.185104    6.431957
    alc0 |   1.093726   .1032947   10.588   0.000   .8912717     1.29618
   _cons |    -5.8902   1.028717   -5.726   0.000   -7.90645   -3.873951
------------------------------------------------------------------------
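Both the `lincom` results and the change in `_cons` follow from simple arithmetic on the grouped linear coefficient. The Python sketch below (outside the Stata session) assumes a normal-approximation interval with z = 1.959964. The helper name `or_for_steps` is illustrative only.

```python
import math

b_alcohol, se_alcohol = 1.093726, 0.1032947   # from the 'logit' output
cons_alcohol = -6.983926                      # intercept with alcohol coded 1-4
Z_975 = 1.959964                              # assumed 97.5th normal percentile


def or_for_steps(k):
    """Odds ratio and 95% CI for a difference of k alcohol categories."""
    est, se = k * b_alcohol, k * se_alcohol
    return (math.exp(est),
            math.exp(est - Z_975 * se),
            math.exp(est + Z_975 * se))


or2, lo2, hi2 = or_for_steps(2)
or3, lo3, hi3 = or_for_steps(3)
print(f"2 categories: OR = {or2:.4f} ({lo2:.4f}, {hi2:.4f})")
print(f"3 categories: OR = {or3:.4f} ({lo3:.4f}, {hi3:.4f})")

# alc0 = alcohol - 1, so alcohol = alc0 + 1 and the intercept absorbs one slope.
cons_alc0 = cons_alcohol + b_alcohol * 1
print(f"intercept after recoding: {cons_alc0:.4f}")
```

Subtracting 1 from the alcohol code is a linear transformation with a = 1, so the slope, its standard error and the log likelihood are unchanged and only the intercept moves.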

. xi: logistic case i.age i.alcohol
i.age             Iage_1-6     (naturally coded; Iage_1 omitted)
i.alcohol         Ialcoh_1-4   (naturally coded; Ialcoh_1 omitted)

Logit estimates                              Number of obs =     975
                                             LR chi2(8)    =  262.07
                                             Prob > chi2   =  0.0000
Log likelihood = -363.70808                  Pseudo R2     =  0.2649

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   5.109602   5.518316    1.510   0.131   .6153163    42.43026
  Iage_3 |   30.74859    31.9451    3.298   0.001   4.013298    235.5858
  Iage_4 |   51.59663   53.38175    3.812   0.000   6.791573    391.9876
  Iage_5 |   78.00528   81.22778    4.184   0.000   10.13347    600.4678
  Iage_6 |   83.44844   91.07367    4.054   0.000   9.827359    708.5975
Ialcoh_2 |   4.196747   1.027304    5.859   0.000   2.597472    6.780704
Ialcoh_3 |   7.441782   2.065952    7.230   0.000   4.318873    12.82282
Ialcoh_4 |   39.64689   14.92059    9.779   0.000    18.9614     82.8987
------------------------------------------------------------------------

. logit

Logit estimates                              Number of obs =     975
                                             LR chi2(8)    =  262.07
                                             Prob > chi2   =  0.0000
Log likelihood = -363.70808                  Pseudo R2     =  0.2649

------------------------------------------------------------------------
    case |      Coef.  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   1.631121   1.079989    1.510   0.131  -.4856189    3.747862
  Iage_3 |   3.425844   1.038912    3.298   0.001   1.389613    5.462075
  Iage_4 |   3.943456   1.034598    3.812   0.000   1.915683     5.97123
  Iage_5 |   4.356777   1.041311    4.184   0.000   2.315844    6.397709
  Iage_6 |   4.424229   1.091377    4.054   0.000    2.28517    6.563288
Ialcoh_2 |    1.43431   .2447857    5.859   0.000   .9545385    1.914081
Ialcoh_3 |    2.00711   .2776153    7.230   0.000   1.462994    2.551226
Ialcoh_4 |   3.680012    .376337    9.779   0.000   2.942405    4.417619
   _cons |  -6.147191   1.041853   -5.900   0.000  -8.189185   -4.105198
------------------------------------------------------------------------

. est store B

Fit a model without alcohol:

. xi: logistic case i.age
i.age             Iage_1-6     (naturally coded; Iage_1 omitted)

Logit estimates                              Number of obs =     975
                                             LR chi2(5)    =  121.04
                                             Prob > chi2   =  0.0000
Log likelihood = -434.22195                  Pseudo R2     =  0.1223

------------------------------------------------------------------------
    case | Odds Ratio  Std. Err.        z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  Iage_2 |   5.447368   5.777946    1.598   0.110   .6812858    43.55562
  Iage_3 |   31.67665   32.24812    3.394   0.001   4.307063    232.9685
  Iage_4 |    52.6506   53.37904    3.910   0.000   7.218137    384.0445
  Iage_5 |   59.66981   60.74305    4.017   0.000   8.114154    438.7995
  Iage_6 |   48.22581   50.98864    3.666   0.000   6.071737    383.0417
------------------------------------------------------------------------

. est store C

Test that grouped linear OR = 1:

. lrtest A C

Likelihood-ratio test                        LR chi2(1)  =    134.87
(Assumption: C nested in A)                  Prob > chi2 =    0.0000

Test significance of dummy variable model ORs:

. lrtest B C

Likelihood-ratio test                        LR chi2(3)  =    141.03
(Assumption: C nested in B)                  Prob > chi2 =    0.0000

Compare dummy variable and grouped linear models:

. lrtest B A

Likelihood-ratio test                        LR chi2(2)  =      6.16
(Assumption: A nested in B)                  Prob > chi2 =    0.0460

Create plots of the fitted values for grouped linear and dummy variable models:

. quietly xi: logistic case i.age alcohol

. predict lpg, xb

. twoway (scatter lpg alcohol, msymbol(point) msize(vtiny) mlabel(age) mlabposition(0)), ylabel(-6(1)2) xlabel(, valuelabel) scheme(s1mono) title("Grouped linear model")

. quietly xi: logistic case i.age i.alcohol

. predict lpd, xb

. twoway (scatter lpd alcohol, msymbol(point) msize(vtiny) mlabel(age) mlabposition(0)), ylabel(-6(1)2) xlabel(, valuelabel) scheme(s1mono) title("Dummy variable model")

[Figure: two scatter plots, titled "Grouped linear model" and "Dummy variable model". Each shows the linear prediction (y-axis, -6 to 2) against alcohol in g/day (x-axis: 0-39, 40-79, 80-119, 120+), with each point labelled by age group 1-6.]
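The plotted points are the linear predictors for each age-by-alcohol cell, and they can be rebuilt from the coefficient tables. The following is a Python sketch (not the `predict` command itself). It assumes alcohol is coded 1 to 4 and age 1 to 6, as the `xi` output indicates, and uses the coefficients printed by `logit` for the two models.

```python
# Grouped linear model: _cons, age dummies (age 1 is the reference), slope.
cons_g = -6.983926
age_g = {1: 0.0, 2: 1.575683, 3: 3.304031, 4: 3.826275, 5: 4.214935, 6: 4.308531}
slope_g = 1.093726

# Dummy variable model: _cons, age dummies, alcohol dummies (level 1 reference).
cons_d = -6.147191
age_d = {1: 0.0, 2: 1.631121, 3: 3.425844, 4: 3.943456, 5: 4.356777, 6: 4.424229}
alc_d = {1: 0.0, 2: 1.43431, 3: 2.00711, 4: 3.680012}

lp_grouped = {(age, alc): cons_g + age_g[age] + slope_g * alc
              for age in age_g for alc in alc_d}
lp_dummy = {(age, alc): cons_d + age_d[age] + alc_d[alc]
            for age in age_d for alc in alc_d}

print("age alc  grouped    dummy")
for age in (1, 6):
    for alc in (1, 2, 3, 4):
        print(f"{age:>3} {alc:>3}  {lp_grouped[(age, alc)]:7.3f}  "
              f"{lp_dummy[(age, alc)]:7.3f}")
```

Within an age group the grouped linear points rise by a constant 1.094 per alcohol category, while the dummy variable points rise by unequal steps. That difference is what the LRT comparing the two models assesses.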

Example from the Framingham study

Assume that cholesterol is the risk factor of interest for CHD and that age and sex are regarded as possible confounders.

Coding:
    Sex:   0 = Male            1 = Female
    Age:   0 = 30-49 yrs       1 = 50-62 yrs
    Chol:  0 = <190 mg/100ml   1 = 190-219 mg/100ml   2 = 220-249 mg/100ml   3 = 250+ mg/100ml

. infile sex age chol case count using "p:\536\framingham.txt"

. gen sa=sex*age

. logistic case sex age sa [freq=count]

Logistic regression                          Number of obs =    4856
                                             LR chi2(3)    =  223.78
                                             Prob > chi2   =  0.0000
Log likelihood = -1238.1973                  Pseudo R2     =  0.0829

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .2343622   .0436654    -7.79   0.000     .1626654    .3376601
         age |   2.708977   .3646438     7.40   0.000     2.080792    3.526809
          sa |   2.170123   .5172673     3.25   0.001      1.36017    3.462386
------------------------------------------------------------------------------

. est store A

. logistic case sex age [freq=count]

Logistic regression                          Number of obs =    4856
                                             LR chi2(2)    =  212.83
                                             Prob > chi2   =  0.0000
Log likelihood = -1243.6693                  Pseudo R2     =  0.0788

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .3673749    .041778    -8.81   0.000     .2939751    .4591012
         age |   3.516703   .3839374    11.52   0.000     2.839262     4.35578
------------------------------------------------------------------------------

. est store B

. lrtest A B

Likelihood-ratio test                        LR chi2(1)  =     10.94
(Assumption: B nested in A)                  Prob > chi2 =    0.0009

Now introduce cholesterol as a dummy variable, without and then with confounder adjustment.

. xi: logistic case i.chol [freq=count]
i.chol            _Ichol_0-3   (naturally coded; _Ichol_0 omitted)

Logistic regression                          Number of obs =    4856
                                             LR chi2(3)    =   85.86
                                             Prob > chi2   =  0.0000
Log likelihood = -1307.1541                  Pseudo R2     =  0.0318

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Ichol_1 |   1.408998   .2849726     1.70   0.090     .9478795    2.094438
    _Ichol_2 |   2.361255    .446123     4.55   0.000     1.630502    3.419514
    _Ichol_3 |   3.811035   .6825005     7.47   0.000     2.682905    5.413532
------------------------------------------------------------------------------

. xi: logistic case i.chol sex age sa [freq=count]
i.chol            _Ichol_0-3   (naturally coded; _Ichol_0 omitted)

Logistic regression                          Number of obs =    4856
                                             LR chi2(6)    =  278.82
                                             Prob > chi2   =  0.0000
Log likelihood = -1210.675                   Pseudo R2     =  0.1033

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Ichol_1 |   1.265023   .2604385     1.14   0.253     .8449991     1.89383
    _Ichol_2 |   1.959199   .3786927     3.48   0.001     1.341375    2.861587
    _Ichol_3 |   3.039625   .5653248     5.98   0.000     2.111102    4.376539
         sex |   .2504792   .0468737    -7.40   0.000     .1735726    .3614615
         age |   2.649839    .360558     7.16   0.000     2.029543    3.459718
          sa |     1.6341   .3962956     2.02   0.043     1.015894    2.628504
------------------------------------------------------------------------------

. est store B

. lrtest B A

Likelihood-ratio test                        LR chi2(3)  =     55.04
(Assumption: A nested in B)                  Prob > chi2 =    0.0000

Now explore the dose-response for cholesterol. Consider merging the two lower categories.

. gen chol2=(chol>1)+(chol>2)

. xi: logistic case age sex sa i.chol2 [freq=count]
i.chol2           _Ichol2_0-2  (naturally coded; _Ichol2_0 omitted)

Logistic regression                          Number of obs =    4856
                                             LR chi2(5)    =  277.50
                                             Prob > chi2   =  0.0000
Log likelihood = -1211.3366                  Pseudo R2     =  0.1028

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   2.657913   .3615256     7.19   0.000     2.035924    3.469924
         sex |   .2494536   .0466714    -7.42   0.000      .172876    .3599521
          sa |   1.646246   .3992246     2.06   0.040     1.023466     2.64799
   _Ichol2_1 |   1.702258   .2465811     3.67   0.000     1.281517    2.261134
   _Ichol2_2 |   2.638887   .3545856     7.22   0.000     2.027895    3.433968
------------------------------------------------------------------------------

. est store C

. lrtest B C

Likelihood-ratio test                        LR chi2(1)  =      1.32
(Assumption: C nested in B)                  Prob > chi2 =    0.2500

We might also consider a grouped linear model:

. logistic case sex age sa chol [freq=count]

Logistic regression                          Number of obs =    4856
                                             LR chi2(4)    =  278.11
                                             Prob > chi2   =  0.0000
Log likelihood = -1211.0318                  Pseudo R2     =  0.1030

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .2510722    .046962    -7.39   0.000     .1740143    .3622532
         age |   2.647062   .3599985     7.16   0.000     2.027689    3.455627
          sa |     1.6385   .3971248     2.04   0.042      1.01892    2.634833
        chol |   1.484784   .0821353     7.15   0.000     1.332222    1.654817
------------------------------------------------------------------------------

. est store D

. lrtest B D

Likelihood-ratio test                        LR chi2(2)  =      0.71
(Assumption: D nested in B)                  Prob > chi2 =    0.7000

Or a grouped linear model based on 3 categories:

. logistic case sex age sa chol2 [freq=count]

Logistic regression                          Number of obs =    4856
                                             LR chi2(4)    =  277.36
                                             Prob > chi2   =  0.0000
Log likelihood = -1211.4088                  Pseudo R2     =  0.1027

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .2488943   .0465438    -7.44   0.000     .1725196    .3590802
         age |   2.656132   .3612456     7.18   0.000     2.034617    3.467503
          sa |   1.648407   .3997793     2.06   0.039     1.024772    2.651562
       chol2 |   1.621059   .1080207     7.25   0.000     1.422585    1.847223
------------------------------------------------------------------------------

. est store E

. lrtest C E

Likelihood-ratio test                        LR chi2(1)  =      0.14
(Assumption: E nested in C)                  Prob > chi2 =    0.7040
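The sequence of cholesterol model comparisons can be summarised by recomputing each LR statistic from the reported log likelihoods. This Python sketch is not part of the Stata session. The helper `chi2_sf` is a hand-written closed-form tail probability for integer degrees of freedom, and the model labels are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed form)."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# Log likelihoods copied from the Stata output (labels are illustrative).
loglik = {
    "confounders only": -1238.1973,
    "4-level dummies": -1210.675,
    "3-level dummies": -1211.3366,
    "4-level linear": -1211.0318,
    "3-level linear": -1211.4088,
}

# (larger model, nested model, degrees of freedom)
comparisons = [
    ("4-level dummies", "confounders only", 3),
    ("4-level dummies", "3-level dummies", 1),
    ("4-level dummies", "4-level linear", 2),
    ("3-level dummies", "3-level linear", 1),
]

results = {}
for full, nested, df in comparisons:
    stat = 2 * (loglik[full] - loglik[nested])
    results[(full, nested)] = (stat, chi2_sf(stat, df))
    print(f"{full} vs {nested}: LR chi2({df}) = {stat:.2f}, "
          f"p = {results[(full, nested)][1]:.4f}")
```

Only the first comparison gives a small p-value. The simplifications (merging the two lowest categories, or imposing a linear trend) do not significantly worsen the fit.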

Using a grouped linear model with three cholesterol categories, we next proceed to explore possible interactions between the confounders and cholesterol.


. gen sc=sex*chol2

. gen ac=age*chol2

. logistic case sex age sa chol2 sc [freq=count]

Logistic regression                               Number of obs   =       4856
                                                  LR chi2(5)      =     279.08
                                                  Prob > chi2     =     0.0000
Log likelihood = -1210.5474                       Pseudo R2       =     0.1034

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |   .2962674   .0673835    -5.35   0.000     .1897079    .4626819
         age |   2.656331   .3622607     7.16   0.000     2.033285    3.470291
          sa |   1.773859   .4424029     2.30   0.022     1.087998    2.892078
       chol2 |   1.720257   .1388469     6.72   0.000     1.468555    2.015098
          sc |   .8292774   .1178365    -1.32   0.188      .627694    1.095599
------------------------------------------------------------------------------

. est store F

. lrtest E F

Likelihood-ratio test                             LR chi2(1)      =       1.72
(Assumption: E nested in F)                       Prob > chi2     =     0.1893

. logistic case sex age sa chol2 ac [freq=count]

Logistic regression                               Number of obs   =       4856
                                                  LR chi2(5)      =     291.83
                                                  Prob > chi2     =     0.0000
Log likelihood = -1204.1695                       Pseudo R2       =     0.1081

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |    .254191   .0477456    -7.29   0.000     .1759042    .3673198
         age |   4.490304   .8809846     7.66   0.000     3.056839    6.595975
          sa |   1.799092   .4374102     2.42   0.016     1.117126    2.897374
       chol2 |    2.11201   .2059725     7.67   0.000     1.744548    2.556871
          ac |   .6036774   .0801734    -3.80   0.000      .465327    .7831619
------------------------------------------------------------------------------

. est store G


. lrtest E G

Likelihood-ratio test                             LR chi2(1)      =      14.48
(Assumption: E nested in G)                       Prob > chi2     =     0.0001
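The likelihood-ratio statistics above can be re-derived by hand from the printed log likelihoods, since LR = 2 x (log likelihood of the larger model minus log likelihood of the smaller model). The following is a minimal Python sketch of that arithmetic, not part of the Stata session; the function name `lr_test` is made up for illustration, and the closed-form chi-square tail probabilities it uses cover 1 or 2 degrees of freedom only.

```python
import math

def lr_test(ll_reduced, ll_full, df):
    """Likelihood-ratio statistic and p-value for two nested models.

    Uses closed-form chi-square upper tails, valid for df = 1 or 2 only.
    """
    lr = 2.0 * (ll_full - ll_reduced)
    if df == 1:
        p = math.erfc(math.sqrt(lr / 2.0))   # P(chi2_1 > lr)
    elif df == 2:
        p = math.exp(-lr / 2.0)              # P(chi2_2 > lr)
    else:
        raise ValueError("this sketch handles df = 1 or 2 only")
    return lr, p

# Log likelihoods copied from the Stata output for models E, F and G
ll_E, ll_F, ll_G = -1211.4088, -1210.5474, -1204.1695

lr_EF, p_EF = lr_test(ll_E, ll_F, df=1)   # E plus the sex*chol2 term
lr_EG, p_EG = lr_test(ll_E, ll_G, df=1)   # E plus the age*chol2 term
print(f"E vs F: LR chi2(1) = {lr_EF:.2f}, p = {p_EF:.4f}")
print(f"E vs G: LR chi2(1) = {lr_EG:.2f}, p = {p_EG:.4f}")
```

Only the age by cholesterol interaction gives a large statistic, which is why the model with `ac` is the one carried forward.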

. lincom 2*chol2

 ( 1)  2 chol2 = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.460585   .8700318     7.67   0.000     3.043448    6.537589
------------------------------------------------------------------------------

. lincom chol2+ac

 ( 1)  chol2 + ac = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.274972   .1149389     2.69   0.007     1.068476    1.521376
------------------------------------------------------------------------------

. lincom 2*chol2+2*ac

 ( 1)  2 chol2 + 2 ac = 0

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.625555   .2930878     2.69   0.007     1.141642    2.314586
------------------------------------------------------------------------------


Logistic models

Logit(p) = β0 + β1 chol2 + β2 sex + β3 age + β4 sex*age + β5 chol2*age

Sex  Age  Cholesterol    Logit(p)
0    0    0 (<220)       β0
0    0    1 (220-249)    β0 + β1
0    0    2 (250+)       β0 + 2β1
1    0    0              β0 + β2
1    0    1              β0 + β1 + β2
1    0    2              β0 + 2β1 + β2
0    1    0              β0 + β3
0    1    1              β0 + β1 + β3 + β5
0    1    2              β0 + 2β1 + β3 + 2β5
1    1    0              β0 + β2 + β3 + β4
1    1    1              β0 + β1 + β2 + β3 + β4 + β5
1    1    2              β0 + 2β1 + β2 + β3 + β4 + 2β5


What is the estimated odds ratio for CHD comparing:

30-49 year old men with cholesterol 220-249 mg/100 ml to those with cholesterol below 220 mg/100 ml?
Answer: 2.11

30-49 year old men with cholesterol 250+ mg/100 ml to those with cholesterol below 220 mg/100 ml?
Answer: 4.46

30-49 year old women with cholesterol 250+ mg/100 ml to those with cholesterol 220-249 mg/100 ml?
Answer: 2.11

50-62 year old women with cholesterol 220-249 mg/100 ml to those with cholesterol below 220 mg/100 ml?
Answer: 1.27

50-62 year old women with cholesterol 250+ mg/100 ml to those with cholesterol below 220 mg/100 ml?
Answer: 1.63
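The five answers above follow from two numbers in the model with the age by cholesterol interaction: the odds ratio for chol2 and the odds ratio for ac. The sketch below redoes the lincom arithmetic in Python as a check. It assumes, as the answers imply, that age = 0 codes the 30-49 group and age = 1 the 50-62 group; the function name `cholesterol_or` is illustrative only. Sex drops out because every comparison is within one sex.

```python
import math

# Odds ratios copied from the "logistic case sex age sa chol2 ac" output
OR_CHOL2 = 2.11201     # exp(beta1): one cholesterol category higher, age = 0
OR_AC = 0.6036774      # exp(beta5): age*chol2 interaction

def cholesterol_or(age, steps):
    """Odds ratio for cholesterol groups `steps` categories apart.

    age is the 0/1 indicator (assumed 0 = 30-49, 1 = 50-62); both groups
    being compared share the same sex and age.
    """
    log_or = steps * (math.log(OR_CHOL2) + age * math.log(OR_AC))
    return math.exp(log_or)

answers = [
    cholesterol_or(age=0, steps=1),   # 30-49: 220-249 vs below 220
    cholesterol_or(age=0, steps=2),   # 30-49: 250+ vs below 220
    cholesterol_or(age=0, steps=1),   # 30-49: 250+ vs 220-249
    cholesterol_or(age=1, steps=1),   # 50-62: 220-249 vs below 220
    cholesterol_or(age=1, steps=2),   # 50-62: 250+ vs below 220
]
print([round(a, 2) for a in answers])
```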


Dose response models

3. Continuous X

Consider the low birthweight study: X= mother's prepregnancy weight in lbs.

(i) Model: logit(p)= β0 + β1 X

What is the interpretation of β1 ?

How does it depend on the measurement scale?

How is it related to interpretation in a grouped linear model?


. logistic low lwt

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       5.98
                                                  Prob > chi2     =     0.0145
Log likelihood = -114.34533                       Pseudo R2       =     0.0255

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |   .9860401   .0060834    -2.28   0.023     .9741886    .9980358
------------------------------------------------------------------------------

. lincom 10*lwt, or

 ( 1)  10 lwt = 0

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .8688519   .0536044    -2.28   0.023     .7698929    .9805307
------------------------------------------------------------------------------

. logit low lwt

Logistic regression                               Number of obs   =        189
                                                  LR chi2(1)      =       5.98
                                                  Prob > chi2     =     0.0145
Log likelihood = -114.34533                       Pseudo R2       =     0.0255

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0140583   .0061696    -2.28   0.023    -.0261504   -.0019661
       _cons |   .9983143   .7852889     1.27   0.204    -.5408235    2.537452
------------------------------------------------------------------------------
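The question of how β1 depends on the measurement scale can be checked numerically: a change of d units in X multiplies the odds by exp(d × β1), so rescaling X rescales the log odds ratio but leaves the fitted probabilities unchanged. The Python sketch below uses the coefficients printed above; the function names are illustrative, and the pounds-to-kilograms factor is the standard conversion rather than something taken from the lecture.

```python
import math

# Coefficients copied from the "logit low lwt" output
BETA_LWT = -0.0140583   # log odds ratio per 1 lb of pre-pregnancy weight
CONS = 0.9983143        # intercept

def odds_ratio(delta_lbs):
    """Odds ratio for a difference of delta_lbs pounds."""
    return math.exp(BETA_LWT * delta_lbs)

def prob_low(lwt):
    """Fitted probability of low birthweight at weight lwt (lbs)."""
    lp = CONS + BETA_LWT * lwt
    return 1.0 / (1.0 + math.exp(-lp))

LBS_PER_KG = 2.20462    # standard conversion, not from the lecture

print(f"OR per 1 lb : {odds_ratio(1):.7f}")
print(f"OR per 10 lb: {odds_ratio(10):.7f}")
print(f"OR per 1 kg : {odds_ratio(LBS_PER_KG):.7f}")
print(f"Fitted probability at 100 lb: {prob_low(100):.3f}")
print(f"Fitted probability at 200 lb: {prob_low(200):.3f}")
```

The first two values reproduce the `logistic` and `lincom 10*lwt` odds ratios shown above.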

. predict lp, p

. est store A


[Figure: scatter plot of Test score (y-axis, 0 to 80) against Exposure (x-axis, 0 to 150)]

Be careful how you interpret the results of the trend test. For instance, a difference between no exposure and some exposure can by itself cause a trend test to reject H0: β1 = 0, even if risk does not change with dose among the exposed.


(ii) Categorical dummy variable models

. centile lwt, c(20,40,60,80)

                                                       -- Binom. Interp. --
    Variable |     Obs  Percentile      Centile        [95% Conf. Interval]
-------------+-------------------------------------------------------------
         lwt |     189         20          107        102.8034         110
             |                 40          120             115         120
             |                 60          130             123         132
             |                 80          150             140         160

. gen lwtc=(lwt>107)+(lwt>120)+(lwt>130)+(lwt>150)

. xi: logistic low i.lwtc

i.lwtc Ilwtc_0-4 (naturally coded; Ilwtc_0 omitted)

Logit estimates                                   Number of obs   =        189
                                                  LR chi2(4)      =      11.04
                                                  Prob > chi2     =     0.0261
Log likelihood = -111.81371                       Pseudo R2       =     0.0471

------------------------------------------------------------------------------
     low | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Ilwtc_1 |   .3410256   .1523973     -2.407   0.016       .1420374     .818788
 Ilwtc_2 |      .5225   .2578529     -1.315   0.188       .1986185    1.374526
 Ilwtc_3 |   .2891304   .1554239     -2.308   0.021       .1008149    .8292065
 Ilwtc_4 |   .2293103   .1213334     -2.783   0.005       .0812893    .6468657
------------------------------------------------------------------------------
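The `gen lwtc=(lwt>107)+...` command builds the category code by counting how many cutpoints a weight exceeds, and `xi: ... i.lwtc` then expands that code into indicator variables with category 0 as the reference. A small Python sketch of the same coding rule, with illustrative function names, shows which weights fall in which category:

```python
CUTPOINTS = (107, 120, 130, 150)   # 20th, 40th, 60th, 80th centiles of lwt

def lwt_category(lwt, cutpoints=CUTPOINTS):
    """Number of cutpoints strictly below lwt: 0 (lightest) to 4 (heaviest)."""
    return sum(lwt > c for c in cutpoints)

def dummies(category, n_levels=5):
    """Indicators for levels 1..n_levels-1; level 0 is the omitted reference."""
    return [int(category == j) for j in range(1, n_levels)]

for weight in (95, 107, 108, 125, 150, 151):
    cat = lwt_category(weight)
    print(weight, cat, dummies(cat))
```

Because the comparisons are strict, a woman weighing exactly 107 lbs stays in the reference category, so each odds ratio in the table compares a heavier group with women of 107 lbs or less.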


(iii) Continuous non-linear functions

Greenland S. (1995) Dose-Response and Trend Analysis in Epidemiology: Alternatives to Categorical Analysis. Epidemiology 6: 356-365.

Non-parametric smoothing

Splines

Fractional polynomials

Generalized additive models (GAMs)


Smoothing

. lowess low lwt, gen(lows)

. twoway (scatter low lwt) (line lows lwt, sort lcol(red)) (line lp lwt, sort ), scheme(s1mono) legend(off) xtitle(Pre-pregnancy wt (lbs)) ytitle(Prob of low birthweight)

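[Figure: Prob of low birthweight (y-axis, 0 to 1) against Pre-pregnancy wt (lbs) (x-axis, 50 to 250), showing the observed outcomes, the lowess smooth (lows) and the fitted linear logistic curve (lp)]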


Splines

Instead of a simple continuous function or a step function, fit a function that is linear / quadratic / cubic within group categories, but constrained to join "nicely" at the boundaries.

Advantages:
- Individual data points have a strong influence on the shape of the curve only in the interval in which they lie
- More plausible than a step function
- Flexibility
- Relatively easy to fit

Disadvantages:
- As with a step function, interval choice is subjective
- Hard to report results succinctly any way but graphically
- More parameters to be fitted


Linear spline

Divide the observed values of X into k+1 categories using cutpoints c1 < c2 < … < ck. Define:

S1 = 0 if x < c1,   S1 = x - c1 if x ≥ c1
S2 = 0 if x < c2,   S2 = x - c2 if x ≥ c2
…
Sk = 0 if x < ck,   Sk = x - ck if x ≥ ck

Model: logit(p) = β0 + β1 X + β2 S1 + β3 S2 + … + βk+1 Sk

- Coefficient βi (i ≥ 2) measures the change in slope from interval (i-1) to interval i
- β1 + β2 + … + βi estimates the log odds ratio corresponding to a 1 unit change in X within interval i (i ≥ 1)
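The spline terms and the slope rule can be illustrated with a few lines of Python. This is a sketch only: the knots are the centile cutpoints used for lwt elsewhere in this lecture, while the coefficients are hypothetical values chosen to show that the slope within an interval is the running sum of coefficients.

```python
KNOTS = (107, 120, 130, 150)   # cutpoints c1..c4 (centiles of lwt)

def linear_spline_terms(x, knots=KNOTS):
    """S_j = 0 if x < c_j, and x - c_j otherwise, for each knot c_j."""
    return [max(x - c, 0) for c in knots]

def logit_p(x, b0, b1, spline_betas, knots=KNOTS):
    """beta0 + beta1*x + sum over j of beta_(j+1) * S_j."""
    terms = linear_spline_terms(x, knots)
    return b0 + b1 * x + sum(b * s for b, s in zip(spline_betas, terms))

# Hypothetical coefficients, for illustration only (not fitted values)
b0, b1 = 1.0, -0.02
spline_betas = [-0.05, 0.10, -0.04, 0.01]

print(linear_spline_terms(125))   # 125 lies in the third interval, 120 to 130

# Slope inside the third interval: should equal b1 + first two spline betas
slope_3 = logit_p(126, b0, b1, spline_betas) - logit_p(125, b0, b1, spline_betas)
print(round(slope_3, 6), round(b1 + spline_betas[0] + spline_betas[1], 6))
```

Writing the term as max(x - c, 0) gives the same values as the two-case definition, including at x = c where both are zero, which is what makes the fitted line continuous at each knot.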


Stata example

. gen s1=(lwt>107)*(lwt-107)
. gen s2=(lwt>120)*(lwt-120)
. gen s3=(lwt>130)*(lwt-130)
. gen s4=(lwt>150)*(lwt-150)
. logit low lwt s1-s4

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =       9.90
                                                  Prob > chi2     =     0.0781
Log likelihood = -112.38526                       Pseudo R2       =     0.0422

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0260585   .0409074    -0.64   0.524    -.1062356    .0541185
          s1 |   -.055212    .081114    -0.68   0.496    -.2141925    .1037685
          s2 |   .1424752   .1001686     1.42   0.155    -.0538516     .338802
          s3 |  -.0732678   .0852176    -0.86   0.390    -.2402912    .0937556
          s4 |  -.0004225   .0449774    -0.01   0.993    -.0885765    .0877316
       _cons |   2.493599   4.084743     0.61   0.542     -5.51235    10.49955
------------------------------------------------------------------------------

. predict lsp, p

. est store B

. lrtest B A

Likelihood-ratio test                             LR chi2(4)      =       3.92
(Assumption: A nested in B)                       Prob > chi2     =     0.4169
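Because the spline coefficients are changes in slope, the slope within each weight interval is the running sum of the coefficients on lwt, s1, s2, s3 and s4. The Python sketch below applies that rule to the coefficients printed in the linear spline output above; the interval labels and the choice of a 10 lb contrast are for illustration.

```python
import math
from itertools import accumulate

# Coefficients copied from the "logit low lwt s1-s4" output: lwt, s1, s2, s3, s4
coefs = [-0.0260585, -0.055212, 0.1424752, -0.0732678, -0.0004225]
intervals = ["below 107", "107-120", "120-130", "130-150", "above 150"]

# Running sums give the log odds ratio per 1 lb within each interval
slopes = list(accumulate(coefs))

for label, slope in zip(intervals, slopes):
    print(f"{label:>10}: log OR per lb = {slope: .7f}, "
          f"OR per 10 lb = {math.exp(10 * slope):.3f}")
```

The estimated slope changes sign between intervals, but the standard errors above are large and the likelihood-ratio test against the linear model is not significant, so these interval-specific slopes are imprecise.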


Alternative Stata code

. mkspline l1 107 l2 120 l3 130 l4 150 l5=lwt, marginal

. logit low l1-l5

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =       9.90
                                                  Prob > chi2     =     0.0781
Log likelihood = -112.38526                       Pseudo R2       =     0.0422

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          l1 |  -.0260585   .0409074    -0.64   0.524    -.1062356    .0541185
          l2 |   -.055212    .081114    -0.68   0.496    -.2141925    .1037685
          l3 |   .1424752   .1001686     1.42   0.155    -.0538516     .338802
          l4 |  -.0732678   .0852176    -0.86   0.390    -.2402912    .0937556
          l5 |  -.0004225   .0449774    -0.01   0.993    -.0885765    .0877316
       _cons |   2.493599   4.084743     0.61   0.542     -5.51235    10.49955
------------------------------------------------------------------------------

. est store C

. lrtest C A

Likelihood-ratio test                             LR chi2(4)      =       3.92
(Assumption: A nested in C)                       Prob > chi2     =     0.4169


Sensitivity to choice of intervals

. gen t1=(lwt>100)*(lwt-100)
. gen t2=(lwt>125)*(lwt-125)
. gen t3=(lwt>150)*(lwt-150)
. gen t4=(lwt>175)*(lwt-175)
. logistic low lwt t1 t2 t3 t4
. logit low lwt t1 t2 t3 t4

Logistic regression                               Number of obs   =        189
                                                  LR chi2(5)      =       8.70
                                                  Prob > chi2     =     0.1216
Log likelihood = -112.98503                       Pseudo R2       =     0.0371

------------------------------------------------------------------------------
         low |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lwt |  -.0209016    .062629    -0.33   0.739    -.1436521     .101849
          t1 |   -.023866   .0785323    -0.30   0.761    -.1777865    .1300544
          t2 |    .065341   .0469319     1.39   0.164    -.0266438    .1573259
          t3 |  -.0494432   .0615913    -0.80   0.422    -.1701599    .0712735
          t4 |   .0204834   .0587547     0.35   0.727    -.0946738    .1356405
       _cons |   1.985505   6.026281     0.33   0.742     -9.82579     13.7968
------------------------------------------------------------------------------

. predict lsp2, p

. twoway (scatter low lwt) (line lp lwt, sort lcol(red)) (line lsp lwt, sort clpat(dash) lcol(green) ) (line lsp2 lwt, sort clpat(dash_dot) lcol(blue) ), scheme(s1mono) legend(off) xtitle(Pre-pregnancy wt (lbs)) ytitle(Prob of low birthweight)


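[Figure: Prob of low birthweight (y-axis, 0 to 1) against Pre-pregnancy wt (lbs) (x-axis, 50 to 250), showing the observed outcomes, the linear logistic fit (lp) and the two linear spline fits (lsp with knots at 107, 120, 130, 150; lsp2 with knots at 100, 125, 150, 175)]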


. spline low lwt, knots(107,120,130,150) regress(logit) gen(csp)

. twoway (scatter low lwt) (line lp lwt, sort lcol(red)) (line lsp lwt, sort clpat(dash_dot) lcol(green)) (line csp lwt, sort clpat(dash)) , scheme(s1mono) legend(off) xtitle(Pre-pregnancy wt (lbs)) ytitle(Prob of low birthweight)

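[Figure: Cubic spline. Prob of low birthweight (y-axis, 0 to 1) against Pre-pregnancy wt (lbs) (x-axis, 50 to 250), showing the observed outcomes, the linear logistic fit (lp), the linear spline fit (lsp) and the cubic spline fit (csp)]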


. spline low lwt, n(3) regress(logistic) gen(csp2)

. twoway (scatter low lwt) (line lp lwt, sort lcol(red)) (line csp2 lwt, sort clpat(dash_dot) lcol(pink)) (line csp lwt, sort clpat(dash)) , scheme(s1mono) legend(off) xtitle(Pre-pregnancy wt (lbs)) ytitle(Prob of low birthweight)

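[Figure: Sensitivity to choice of intervals. Prob of low birthweight (y-axis, 0 to 1) against Pre-pregnancy wt (lbs) (x-axis, 50 to 250), showing the observed outcomes, the linear logistic fit (lp) and the two cubic spline fits (csp, csp2)]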


Fractional polynomials

A fractional polynomial of degree m in a variable X (e.g. pre-pregnancy weight) is a linear combination of m power transformations, of the form

β1 X^(p1) + β2 X^(p2) + … + βm X^(pm),

where X^(pk) = ln(X) if pk = 0. Typically FPs based on powers from the set P = {-2, -1, -0.5, 0, 0.5, 1, 2, 3} are considered.

Royston P, Ambler G, Sauerbrei W. The use of fractional polynomials to model continuous risk variables in epidemiology. Int J Epidemiol, 1999; 28: 964-974.

Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious modelling. Applied Statistics, 1994; 43: 429-467.

Sauerbrei W, Royston P. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. J R Statist Soc A, 1999; 162: 71-94.
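As a rough illustration of the definition, the Python sketch below builds the fractional polynomial terms for a given list of powers. The function name `fp_terms` is hypothetical. The handling of a repeated power (multiply the previous term by ln X) is not stated on this slide; it is the convention visible in the fracpoly output later in this lecture, where powers (-2, -2) generate X^-2 and X^-2*ln(X).

```python
import math

def fp_terms(x, powers):
    """Fractional polynomial terms for a positive x.

    A power of 0 means ln(x). A power equal to the previous one multiplies
    the previous term by ln(x) (assumed repeated-power convention).
    Powers are expected in sorted order so that repeats are adjacent.
    """
    if x <= 0:
        raise ValueError("fractional polynomials need x > 0")
    terms = []
    previous = None
    for p in powers:
        if p == previous:
            term = terms[-1] * math.log(x)
        else:
            term = math.log(x) if p == 0 else x ** p
        terms.append(term)
        previous = p
    return terms

x = 130 / 100                      # a weight of 130 lbs scaled as lwt/100
print(fp_terms(x, [1]))            # ordinary linear term
print(fp_terms(x, [0, 0.5]))       # ln(x) and sqrt(x)
print(fp_terms(x, [-2, -2]))       # x^-2 and x^-2 * ln(x)
```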


Fractional polynomials

. fracpoly logistic low lwt
........
-> gen double Ilwt__1 = X^-2-.5934053858 if e(sample)
-> gen double Ilwt__2 = X^-2*ln(X)-.1548424581 if e(sample)
   (where: X = lwt/100)

Logistic regression                               Number of obs   =        189
                                                  LR chi2(2)      =       7.51
                                                  Prob > chi2     =     0.0234
Log likelihood = -113.58167                       Pseudo R2       =     0.0320

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Ilwt__1 |   3.044759   5.104863     0.66   0.507     .1138735    81.41099
     Ilwt__2 |   .2034832   .9615213    -0.34   0.736     .0000193    2141.511
------------------------------------------------------------------------------
Deviance: 227.16. Best powers of lwt among 44 models fit: -2 -2.

. predict fp,p

. twoway (scatter low lwt, symbol(x)) (line lp lwt, sort lcol(red)) (line fp lwt, sort clpat(dash_dot) lcol(green)) (line csp lwt, sort clpat(dash)), scheme(s1mono) legend(off) xtitle(Pre-pregnancy wt (lbs)) ytitle(Prob of low birthweight)
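Two details of the fracpoly output can be checked with a short Python sketch. First, the "44 models fit" is what one gets by counting every choice of one power, or two powers with repetition allowed, from the eight-element set P. Second, the constants subtracted in the two `gen double` lines appear to be the two terms evaluated at one common value of X; that reading is an inference from the numbers, not something stated in the output.

```python
import math
from itertools import combinations_with_replacement

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def n_fp_models(max_degree):
    """Number of power combinations of degree 1..max_degree, repeats allowed."""
    return sum(
        len(list(combinations_with_replacement(POWERS, m)))
        for m in range(1, max_degree + 1)
    )

print(n_fp_models(2))   # degree 1 and degree 2 models together

# Constants copied from the two "gen double" lines
c1 = 0.5934053858       # subtracted from X^-2
c2 = 0.1548424581       # subtracted from X^-2*ln(X)

x0 = c1 ** -0.5                        # the X at which X^-2 equals c1
implied_c2 = x0 ** -2 * math.log(x0)   # X^-2*ln(X) at that same X
print(f"X = {x0:.4f} (lwt = {100 * x0:.1f} lbs), implied c2 = {implied_c2:.7f}")
```

The implied second constant agrees with the printed one, so both terms are centred at the same weight of roughly 130 lbs.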


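[Figure: Prob of low birthweight (y-axis, 0 to 1) against Pre-pregnancy wt (lbs) (x-axis, 50 to 250), showing the observed outcomes, the linear logistic fit (lp), the fractional polynomial fit (fp) and the cubic spline fit (csp)]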


. fracpoly logistic low lwt, degree(3) compare

............................................
-> gen double Ilwt__1 = X^3-2.187624479 if e(sample)
-> gen double Ilwt__2 = X^3*ln(X)-.5708359916 if e(sample)
-> gen double Ilwt__3 = X^3*ln(X)^2-.1489532287 if e(sample)
   (where: X = lwt/100)

Logistic regression                               Number of obs   =        189
                                                  LR chi2(3)      =       9.32
                                                  Prob > chi2     =     0.0253
Log likelihood = -112.67397                       Pseudo R2       =     0.0397

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Ilwt__1 |   .0065153   .0166058    -1.97   0.048     .0000441    .9625743
     Ilwt__2 |   46656.28   277007.2     1.81   0.070     .4122545    5.28e+09
     Ilwt__3 |   .0013849   .0055176    -1.65   0.099     5.63e-07    3.408812
------------------------------------------------------------------------------
Deviance: 225.35. Best powers of lwt among 164 models fit: 3 3 3.

Fractional polynomial model comparisons:
---------------------------------------------------------------
lwt               df     Deviance     Gain    P(term)  Powers
---------------------------------------------------------------
Not in model       0     234.672       --       --
Linear             1     228.691     0.000    0.014    1
m = 1              2     227.276     1.414    0.234    -2
m = 2              4     227.163     1.527    0.945    -2 -2
m = 3              6     225.348     3.343    0.403    3 3 3
---------------------------------------------------------------
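The comparison table can be read as a sequence of deviance-difference tests, each model against the one on the line above it, with Gain measured against the linear model. The Python sketch below reproduces the Gain and P(term) columns from the printed deviances under that reading; the dictionary keys are illustrative labels, and the closed-form chi-square tail probabilities cover 1 or 2 degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable, df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles df = 1 or 2 only")

# (df, deviance) copied from the comparison table
models = {
    "null":   (0, 234.672),
    "linear": (1, 228.691),
    "m1":     (2, 227.276),
    "m2":     (4, 227.163),
    "m3":     (6, 225.348),
}

order = ["null", "linear", "m1", "m2", "m3"]
p_term = {}
gain = {}
for smaller, larger in zip(order, order[1:]):
    df_s, dev_s = models[smaller]
    df_l, dev_l = models[larger]
    p_term[larger] = chi2_sf(dev_s - dev_l, df_l - df_s)
    gain[larger] = models["linear"][1] - dev_l
    print(f"{larger:>6}: gain = {gain[larger]:.3f}, P(term) = {p_term[larger]:.3f}")
```

Only the step from no lwt term to a linear term has a small p-value, so the more flexible fractional polynomials add little beyond the linear model for these data.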


Selection of variables

It is fairly straightforward to fit models and interpret them.

The choice of what variables to include in a model is less straightforward.

Why select variables rather than using all available?

The approach to model building may vary depending on the purpose of the analysis - whether one is focussed on estimating associations or accurate prediction of the outcome. In all cases, multivariate modeling should be preceded by careful univariate analyses.


Modeling associations

Distinguish between a clearly designed study and a "fishing expedition".

There has been much discussion in the statistical and epidemiologic literature as to the appropriate confounder selection strategy.

Breslow & Day (1980)

Chatfield C (1995). Model uncertainty, data mining and statistical inference. J R Statist Soc A 158: 419-466.

Day NE, Byar DP, Green SB (1980). Overadjustment in case-control studies. Am J Epidemiol 112: 696-706.

Greenland S (1989). Modeling and variable selection in epidemiologic analysis. Am J Public Health 79: 340-349.

Maldonado G, Greenland S (1993). Simulation study of confounder-selection strategies. Am J Epidemiol 138: 923-936.

Mickey RM, Greenland S (1989). The impact of confounder selection criteria on effect estimation. Am J Epidemiol 129: 125-137.

Schall R, Zucchini W (1990). Model selection and the estimation of odds ratios in the presence of extraneous factors. Statistics in Medicine 9: 1131-1141.


Strategies

Fit all known confounders

Significance testing

Change of estimate rule

A sensible strategy:

All variables that are logically confounders should be examined for their effects as confounders by controlling for them.

If the confounder alters the estimate of interest or its standard error to an important degree, include it in the model.

If the confounder does not appreciably alter the estimate, then include it in the model if it is:
- traditional (e.g. gender or age)
- statistically significant
- estimates are believable
- there are not too many other confounders in the model

Assess the role of exposure and possible effect modification by means of hypothesis testing.
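The change of estimate rule can be made concrete with the first example of this lecture, where the crude odds ratio for behaviour type was 2.372929 and the Mantel-Haenszel odds ratio adjusted for smoking was 2.248977. The Python sketch below computes the percent change. The 10% cut-off is an assumed value for illustration, since the lecture only says "to an important degree", and the function name is hypothetical.

```python
def percent_change(unadjusted, adjusted):
    """Percent by which the unadjusted estimate differs from the adjusted one."""
    return 100.0 * (unadjusted - adjusted) / adjusted

crude_or = 2.372929       # crude OR, behaviour type and CHD
adjusted_or = 2.248977    # M-H combined OR, stratified by smoking

THRESHOLD = 10.0          # assumed cut-off in percent, not from the lecture

change = percent_change(crude_or, adjusted_or)
important = abs(change) >= THRESHOLD
print(f"{change:.1f}% change; important at the assumed threshold: {important}")
```

By this assumed threshold smoking would not count as an important confounder of the behaviour type association, although under the strategy above it might still be kept in the model for other reasons.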