Using Logistic Regression In Case Control Studies Department of Community Health Sciences: September...

Using Logistic Regression In Case Control Studies

Department of Community Health Sciences: September 27, 2002

No statistics should stand in the way of an experimenter keeping his eyes open, his mind flexible, and on the lookout for surprises. (William Feller)

Background:

Quan H., Arboleda-Florez J., Fick G.H., Stuart H.L., Love E.J. (2002) Association Between Physical Illness and Suicide Among The Elderly. Social Psychiatry and Psychiatric Epidemiology, 37:190-197

David Adler, Nimira Kanji, Kiril Trpkov, Gordon Fick, Rhiannon M. Hughes. HPC2/ELAC2 Gene Variants Associated with Prostate Cancer (in submission)

The class of MDSC643.02 in the Winter Term 2002

Case Control Studies

Investigator selects cases and controls

Investigator determines exposure

Primary outcome measure: Odds of exposure (yes/no)

The ‘Magic’ Odds Ratio (OR)
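A minimal sketch of why the odds ratio looks like ‘magic’ in a two by two table: the odds ratio for exposure (cases versus controls) equals the odds ratio for disease (exposed versus unexposed), because both reduce to the same cross-product. The counts and the names `a`, `b`, `c`, `d` are hypothetical and chosen only for illustration.

```python
# Hypothetical 2x2 case-control table (illustrative counts only):
#   a = exposed cases,    b = unexposed cases
#   c = exposed controls, d = unexposed controls
a, b, c, d = 40, 60, 20, 80

# Odds of exposure among cases and among controls
odds_exposure_cases = a / b
odds_exposure_controls = c / d
or_exposure = odds_exposure_cases / odds_exposure_controls

# "Odds of disease" among exposed and among unexposed
# (these odds depend on how many cases and controls were sampled)
odds_disease_exposed = a / c
odds_disease_unexposed = b / d
or_disease = odds_disease_exposed / odds_disease_unexposed

# Both ratios reduce to the cross-product a*d / (b*c)
or_cross_product = (a * d) / (b * c)

print(or_exposure, or_disease, or_cross_product)
```

The individual odds of disease change if the investigator samples more controls; only the ratio is unaffected.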

Case Control Studies

Two by Two tables

Classical Stratified Analysis (SA)

Stratum specific odds ratios

Crude odds ratio

Mantel-Haenszel odds ratio

EASY…. Right?
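The three summaries in a classical stratified analysis can be sketched as follows. The two strata and their counts are hypothetical, and the function names are illustrative only. The Mantel-Haenszel estimate uses the usual weighted cross-product form, sum(a*d/n) / sum(b*c/n).

```python
# Each stratum is (a, b, c, d):
#   a = exposed cases,    b = unexposed cases
#   c = exposed controls, d = unexposed controls
# Counts are hypothetical, for illustration only.
strata = [
    (10, 40, 5, 45),
    (30, 20, 15, 35),
]

def cross_product_or(a, b, c, d):
    """Odds ratio for one 2x2 table."""
    return (a * d) / (b * c)

# Stratum specific odds ratios
stratum_ors = [cross_product_or(*s) for s in strata]

# Crude odds ratio: collapse the strata into one table first
crude_table = [sum(col) for col in zip(*strata)]
crude_or = cross_product_or(*crude_table)

# Mantel-Haenszel odds ratio: weighted cross-products across strata
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
mh_or = numerator / denominator

print(stratum_ors, crude_or, mh_or)
```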

A Definition of the Chi-Square test:

A procedure any fool can carry out and frequently does.

(SJ Penn)

Logistic Regression

1) Model the log of the odds of exposure

OR

2) Model the log of the odds of disease

Does it matter? MOST of the time.

Standard likelihood theory gives its blessing to option 1)

It does not matter -

IF the model is equivalent to a stratified analysis,

…. then some of the coefficients from LR will be the same as the log(OR) values from SA

…. not all the coefficients will be the same though…
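A small sketch of this point for the simplest case, a single two by two table, where the logistic model is equivalent to the stratified analysis. The counts are hypothetical, and `fit_logistic` is an illustrative Newton-Raphson fitter written for this example, not a library routine. Fitting the model both ways gives the same slope, log(OR), but different intercepts.

```python
from math import exp, log

def fit_logistic(groups, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    groups: list of (x, successes, total) for grouped binary data.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0              # score vector
        h00 = h01 = h11 = 0.0      # information matrix
        for x, y, n in groups:
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            resid = y - n * p
            w = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts: a = exposed cases, b = unexposed cases,
# c = exposed controls, d = unexposed controls
a, b, c, d = 40, 60, 20, 80

# Option 1: model the odds of exposure; x = case status
int_exposure, slope_exposure = fit_logistic([(1, a, a + b), (0, c, c + d)])

# Option 2: model the odds of disease; x = exposure status
int_disease, slope_disease = fit_logistic([(1, a, a + c), (0, b, b + d)])

log_or = log((a * d) / (b * c))
print(slope_exposure, slope_disease, log_or)   # slopes agree with log(OR)
print(int_exposure, int_disease)               # intercepts do not agree
```

The intercept under option 2 is log(b/d), which reflects the sampling ratio of cases to controls rather than anything about the population.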

Results will differ -

In ALL other situations (at least a little…)

BUT there are those solid papers in the literature that appear to say “it’s OKAY” to model the odds of disease

AND the textbooks and standard references appear to give a ‘green light’ as well

References

Prentice RL and Pyke R (1979) Biometrika 66:3 403-411

[after some impenetrable mathematics]

“….is precisely the distributional statement that would arise if [a model for the odds of disease] were directly applied to the case-control data”

BUT… BUT… Aren’t we all frequentists?

The books:

Kleinbaum, Kupper, Morgenstern

Rothman and Greenland

Rosner

Matthews and Farewell

They ALL note the estimates are OKAY

They are ALL silent on the sampling distribution.

BUT what about the standard errors? P-values?

It does not follow that if quantitative methods be indiscriminately applied to inexhaustible quantities of data, scientific understanding will necessarily emerge. (M.K. Hubbert)

An exercise that makes no provision for the definition and estimation of error cannot properly be called an experiment. (D.B. DeLury)

Exposures may not be dichotomous

If exposure is ‘measured’, then the model for exposure could be linear regression

There is no ‘obvious’ magical odds ratio now

BUT it is still SO SO tempting to just model the log of odds of disease with a *continuous* independent variable (exposure)
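The two competing models for a measured exposure can be written out as below. The notation is an assumption made for illustration: X is the measured exposure, D is case status (1 = case, 0 = control), and the Greek letters are generic coefficients.

```latex
% Option 1: exposure as the dependent variable (linear regression)
E[X \mid D] = \alpha_0 + \alpha_1 D

% Option 2: disease as the dependent variable (logistic regression)
\log \frac{P(D = 1 \mid X)}{P(D = 0 \mid X)} = \beta_0 + \beta_1 X
```

Under option 1, the coefficient \alpha_1 is a difference in mean exposure between cases and controls, not a log odds ratio.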

The modelling process -

Can lead us in very different ways to very different models and very different conclusions:

QUAN Hude et al. and Rhiannon Hughes

What about the Gate Keepers?

Editors and Associate Editors

Epidemiologists

Biostatisticians

Conclusions

I am taking yet another poke at the much maligned case-control study

Epidemiological issues still dominate the challenges of designing and using case-control studies

It remains safe to model the exposure(s) individually as dependent variable(s) (if we trust the standard likelihood theory)

A definition of Power:

A probability of a possible outcome of a potential decision conditional upon an imaginable circumstance given a conceivable value of an algebraic embodiment of an abstract mathematical idea and the strict adherence to an extremely precise rule.

SJ Penn again
