Using Logistic Regression In Case Control Studies
Department of Community Health Sciences: September 27, 2002
No statistics should stand in the way of an experimenter keeping his eyes open, his mind flexible, and on the lookout for surprises. (William Feller)
Background:
Quan H., Arboleda-Florez J., Fick G.H., Stuart H.L., Love E.J. (2002) Association Between Physical Illness and Suicide Among The Elderly. Social Psychiatry and Psychiatric Epidemiology, 37:190-197
David Adler, Nimira Kanji, Kiril Trpkov, Gordon Fick, Rhiannon M. Hughes. HPC2/ELAC2 Gene Variants Associated with Prostate Cancer (in submission)
The class of MDSC643.02 in the Winter Term 2002
Case Control Studies
Investigator selects cases and controls
Investigator determines exposure
Primary outcome measure: Odds of exposure (yes/no)
The ‘Magic’ Odds Ratio (OR)
Case Control Studies
Two by Two tables
Classical Stratified Analysis (SA)
Stratum specific odds ratios
Crude odds ratio
Mantel-Haenszel odds ratio
EASY…. Right?
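The three quantities named on this slide can be sketched in a few lines. The counts below are hypothetical and chosen only for illustration (they are not from the talk); each stratum is a two by two table of exposed/unexposed cases and controls, and the function names are illustrative as well.

```python
# Hypothetical stratified case-control counts (illustrative only).
# Each stratum is (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed controls, d = unexposed controls.
strata = [
    (10, 20, 5, 40),
    (30, 15, 20, 30),
]

def odds_ratio(a, b, c, d):
    """Odds ratio of a single two by two table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

def crude_odds_ratio(tables):
    """Collapse all strata into one table, then take its odds ratio."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return odds_ratio(a, b, c, d)

def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

stratum_ors = [odds_ratio(*t) for t in strata]   # one OR per stratum
crude_or = crude_odds_ratio(strata)              # ignores the stratification
mh_or = mantel_haenszel_odds_ratio(strata)       # weighted summary across strata

print(stratum_ors, crude_or, mh_or)
```

With these made-up counts the Mantel-Haenszel value falls between the two stratum specific odds ratios, as a weighted summary should.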
A Definition of the Chi-Square test:
A procedure any fool can carry out and frequently does.
(SJ Penn)
Logistic Regression
1) Model the log of the odds of exposure
OR
2) Model the log of the odds of disease
Does it matter? MOST of the time.
Standard likelihood theory gives us its blessing for option 1)
It does not matter -
IF the model is equivalent to a stratified analysis,
…. then some of the coefficients from LR will be the same as the log(OR) values from SA
…. not all the coefficients will be the same though…
Results will differ -
In ALL other situations (at least a little…)
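A minimal sketch of the "it does not matter" case, assuming a single two by two table with hypothetical counts. The small Newton-Raphson fitter `fit_logistic` is written here only for illustration and is not taken from the talk. Fitting option 1 (exposure as the dependent variable) and option 2 (disease as the dependent variable) gives the same slope, log(OR), but different intercepts.

```python
from math import exp, log

def fit_logistic(groups, iterations=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson on grouped binary data.

    groups: list of (x, successes, failures) with x in {0, 1}.
    Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, f in groups:
            n = y + f
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            r = y - n * p            # observed minus expected successes
            w = n * p * (1.0 - p)    # binomial variance weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical two by two table (illustrative only):
# a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
a, b, c, d = 40, 35, 25, 70

# Option 1: log odds of exposure, with case status as the covariate.
int_exposure, slope_exposure = fit_logistic([(1, a, b), (0, c, d)])
# Option 2: log odds of disease, with exposure as the covariate.
int_disease, slope_disease = fit_logistic([(1, a, c), (0, b, d)])

log_or = log((a * d) / (b * c))
print(slope_exposure, slope_disease, log_or)   # the two slopes agree with log(OR)
print(int_exposure, int_disease)               # the intercepts do not agree
```

This only covers the saturated model, which is equivalent to a stratified analysis with one stratum; it says nothing about the other situations, where the slide states that results will differ.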
BUT there are those solid papers in the literature that appear to say “it’s OKAY” to model the odds of disease
AND the textbooks and standard references appear to give a ‘green light’ as well
References
Prentice RL and Pyke R (1979) Biometrika, 66(3):403-411
[after some impenetrable mathematics]
“….is precisely the distributional statement that would arise if [a model for the odds of disease] were directly applied to the case-control data”
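For orientation, the usual textbook argument behind statements of this kind can be written out as follows. This is a sketch under stated assumptions, not the Prentice and Pyke derivation itself: the symbols $\alpha$, $\beta$, $\pi_1$, $\pi_0$ and the sampling indicator $S$ are introduced here for illustration, and selection into the study is assumed to depend on disease status only.

```latex
% Prospective model for disease D given exposure x:
\operatorname{logit} P(D = 1 \mid x) = \alpha + \beta x
% Cases are sampled with probability \pi_1, controls with probability \pi_0
% (S = 1 means "selected into the study"). By Bayes' theorem:
P(D = 1 \mid x, S = 1)
  = \frac{\pi_1 \, P(D = 1 \mid x)}
         {\pi_1 \, P(D = 1 \mid x) + \pi_0 \, P(D = 0 \mid x)}
% Taking the logit of both sides:
\operatorname{logit} P(D = 1 \mid x, S = 1)
  = \alpha + \log\frac{\pi_1}{\pi_0} + \beta x
```

Under these assumptions only the intercept is shifted by the sampling fractions; the slope $\beta$ is unchanged. The argument concerns the form of the model and does not by itself settle the questions about standard errors and P-values raised in this talk.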
BUT… BUT… Aren’t we all frequentists?
The books:
Kleinbaum, Kupper, Morgenstern
Rothman and Greenland
Rosner
Matthews and Farewell
They ALL note the estimates are OKAY
They are ALL silent on the sampling distribution.
BUT what about the standard errors? P-values?
It does not follow that if quantitative methods be indiscriminately applied to inexhaustible quantities of data, scientific understanding will necessarily emerge. (M.K. Hubbert)
An exercise that makes no provision for the definition and estimation of error cannot properly be called an experiment. (D.B. DeLury)
Exposures may not be dichotomous
If exposure is ‘measured’, then the model for exposure could be linear regression
There is no ‘obvious’ magical odds ratio now
BUT it is still SO SO tempting to just model the log of odds of disease with a *continuous* independent variable (exposure)
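A minimal sketch of the "model the exposure" route for a measured exposure, assuming hypothetical exposure values for five cases and five controls. The helper `ols_on_binary` is illustrative only. With case status as the single 0/1 covariate, the least-squares slope is simply the difference in mean exposure between cases and controls, which is why no odds ratio appears.

```python
from statistics import mean

# Hypothetical measured exposure values (illustrative only).
cases = [5.1, 6.3, 5.8, 7.0, 6.1]
controls = [4.2, 5.0, 4.8, 5.5, 4.5]

def ols_on_binary(y0, y1):
    """Least-squares fit of y = b0 + b1*z, with z = 0 for y0 and z = 1 for y1."""
    z = [0.0] * len(y0) + [1.0] * len(y1)
    y = list(y0) + list(y1)
    z_bar, y_bar = mean(z), mean(y)
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b1 = sxy / sxx              # slope: change in mean exposure, case vs control
    b0 = y_bar - b1 * z_bar     # intercept: mean exposure among controls
    return b0, b1

intercept, slope = ols_on_binary(controls, cases)
print(intercept, slope)
```

Here exposure is the dependent variable and case status the independent variable, which is the reverse of the tempting model for the log odds of disease.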
The modelling process -
Can lead us in very different ways to very different models and very different conclusions:
Quan Hude et al. and Rhiannon Hughes
What about the Gate Keepers?
Editors and Associate Editors
Epidemiologists
Biostatisticians
Conclusions
I am taking yet another poke at the much maligned case-control study
Epidemiological issues still dominate the challenges of designing and using case-control studies
It remains safe to model the exposure(s) individually as dependent variable(s) (if we trust the standard likelihood theory)
A definition of Power:
A probability of a possible outcome of a potential decision conditional upon an imaginable circumstance given a conceivable value of an algebraic embodiment of an abstract mathematical idea and the strict adherence to an extremely precise rule.
SJ Penn again