Applied Epidemiologic Analysis, Fall 2002
Applied Epidemiologic Analysis
Patricia Cohen, Ph.D.
Henian Chen, M.D., Ph.D.
Teaching Assistants
Julie Kranick, Sylvia Taylor, Chelsea Morroni, Judith Weissman
Lecture 7
Categorical analysis
Conditional logistic regression
Unconditional logistic regression
Introduction to stratifiers
Objectives
• To understand the basic assumptions of analyses of case-control and cohort data
• To see how assumptions about the predictor variables differ between categorical analyses and some regression models
• To see the connection between stratified analyses and analyses incorporating all stratifiers as predictors
Categorical analysis: Analyses of tables of frequencies
Assumptions / requirements
• Adequate sample size in each table cell and in total
• Independence of outcomes
  • no contagion effects
  • single event per person
• For rates, homogeneity: probability of outcome is uniform for all time units in a stratum
e.g., it doesn’t matter whether 6 people are observed for 10 years or 10 people are observed for 6 years (60 person-years either way)
Categorical analysis
Categorical analysis does not assume that the distributions of exposure and other predictors are fixed.
In contrast, ordinary regression analysis assumes that the distributions of the independent variables are fixed (selected or created by the researchers, rather than whatever distributions happen to characterize the sampled population).
Ordinary or “unconditional” logistic regression also assumes that independent variables are fixed.
Categorical analysis of incidence rates: A single group in comparison to some expected rate
$A/T$ = incidence per time unit in the exposed group
$E/T$ = incidence per time unit expected in the reference population for the same distribution of person-time (e.g., based on morbidity rates for equivalent age groups)
Categorical analysis: Single group in comparison to some expected rate
Since the person-time distribution is constant, $T$ cancels from the rate ratio:
$$\text{rate ratio} = \frac{A/T}{E/T} = \frac{A}{E} = \text{standardized morbidity ratio (SMR)}$$
Confidence limits on this rate ratio employ the Poisson distribution and maximum likelihood estimation.
Should be adequate when E > 5.
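As a rough illustration of the ratio A/E and its Poisson-based limits, the sketch below computes an SMR with approximate 95% confidence limits. It uses Byar's approximation to the exact Poisson limits, which is a substitute chosen here for simplicity and not necessarily the method intended on the slide. The function name and the counts (30 observed, 20 expected) are hypothetical.

```python
import math

def smr_with_byar_ci(observed, expected, z=1.96):
    """Return (SMR, lower, upper) for observed count A and expected count E.

    Limits use Byar's approximation to the exact Poisson limits
    (an assumption of this sketch, not taken from the lecture).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must be positive")
    smr = observed / expected
    # Approximate lower Poisson limit for the observed count
    lower_count = observed * (1 - 1 / (9 * observed)
                              - z / (3 * math.sqrt(observed))) ** 3
    # Approximate upper Poisson limit uses A + 1
    a1 = observed + 1
    upper_count = a1 * (1 - 1 / (9 * a1) + z / (3 * math.sqrt(a1))) ** 3
    return smr, lower_count / expected, upper_count / expected

# Hypothetical example: 30 cases observed where 20 were expected
smr, smr_lower, smr_upper = smr_with_byar_ci(observed=30, expected=20.0)
print(f"SMR = {smr:.2f}, approx. 95% CI ({smr_lower:.2f}, {smr_upper:.2f})")
```

With these made-up counts the lower limit sits just above 1.0, so the excess would be borderline at the 5% level.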
Categorical analysis of 2 groups, exposed and unexposed
Maximum likelihood estimates using the Poisson model are used to estimate rate ratios and rate differences, or risk ratios and risk differences.
Hand calculation of these estimates is rare, partly because of the inclusion of multiple confounders and/or exposures in the models.
Selecting an analytic model for case-control (or cohort) data
The ordinary least squares (OLS) method of analyzing dichotomous outcomes is problematic because formal assumptions of the model, notably homoscedasticity, are necessarily violated.
Nevertheless, for case-control data with similar sample sizes in the two groups, conclusions from OLS and logistic regression may well be similar.
Ordinary Least Squares
This model uses as a link function the “identity” function: a difference in the value of the predictor is (linearly) related to a difference in the value of the outcome.
When the outcome is disease or non-disease, this is equivalent to a difference in the proportion with the disease (incidence or prevalence).
For a binary exposure, the coefficient B equals the difference in proportions, i.e., the risk difference.
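The identity between the OLS coefficient and the risk difference can be checked numerically. The sketch below fits a one-predictor OLS slope by hand on a hypothetical cohort (40 exposed with 12 diseased, 60 unexposed with 9 diseased) and compares it with the difference in proportions. All names and counts are illustrative assumptions.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical cohort: exposure coded 1/0, disease coded 1/0
exposure = [1] * 40 + [0] * 60
disease = [1] * 12 + [0] * 28 + [1] * 9 + [0] * 51

b = ols_slope(exposure, disease)
risk_difference = 12 / 40 - 9 / 60  # risk in exposed minus risk in unexposed

print(f"OLS B = {b:.4f}, risk difference = {risk_difference:.4f}")
```

The two numbers agree because, with a 0/1 predictor, the least-squares slope is simply the difference between the two group means of the outcome.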
Logistic Regression Model
The link function estimated by the logistic regression model (using maximum likelihood methods) is the log odds or logit.
In this model, for a binary exposure the B = difference in the log odds of the outcome (disease).
It is equivalent to an exponential odds model, so taking the anti-log of B provides the odds ratio, which approximates the risk ratio when the outcome is rare.
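To make the link between B and the odds ratio concrete, the sketch below fits logit(p) = a + B·x by Newton-Raphson maximum likelihood for a single binary exposure, then compares exp(B) with the cross-product odds ratio. The data are the same hypothetical cohort (12 of 40 exposed diseased, 9 of 60 unexposed diseased). The function name and iteration count are assumptions of this sketch.

```python
import math

def fit_logistic_binary(cases_exp, n_exp, cases_unexp, n_unexp, iterations=25):
    """Newton-Raphson ML fit of logit(p) = a + B*x for grouped data, x in {0, 1}."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p0 = 1 / (1 + math.exp(-a))        # fitted risk, unexposed
        p1 = 1 / (1 + math.exp(-(a + b)))  # fitted risk, exposed
        # Score vector (gradient of the log-likelihood)
        g_a = (cases_unexp - n_unexp * p0) + (cases_exp - n_exp * p1)
        g_b = cases_exp - n_exp * p1
        # Information matrix entries
        w0 = n_unexp * p0 * (1 - p0)
        w1 = n_exp * p1 * (1 - p1)
        h_aa, h_ab, h_bb = w0 + w1, w1, w1
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: (a, b) += H^{-1} g
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (-h_ab * g_a + h_aa * g_b) / det
    return a, b

intercept, log_or = fit_logistic_binary(12, 40, 9, 60)
odds_ratio = math.exp(log_or)
cross_product_or = (12 * 51) / (28 * 9)
risk_ratio = (12 / 40) / (9 / 60)

print(f"exp(B) = {odds_ratio:.3f}, cross-product OR = {cross_product_or:.3f}, "
      f"risk ratio = {risk_ratio:.3f}")
```

In this example the outcome is not rare (15% to 30%), so the odds ratio is noticeably further from 1 than the risk ratio.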
Other models
• Exponential risk models – a log-linear risk model (requires an estimate of risk in the source population)
• Probit model – assumes a normal distribution underlying outcome; used in bioassay and economics
Note: These alternative models are designed to provide appropriate statistical tests, but do not necessarily match the actual biological mechanisms.
Stratification of case-control data
• A means of equating for stratifiers
• Most often on sex and age categories
• Note: If there is a non-trivial age difference there will be a remaining mean difference within categories.
Stratifying variables: standardization
Standardization of rates or risks with regard to a stratifying variable.
Example: Control group = 40% male
Case group = 60% male
The case group can be standardized to the control group by weighting every female case by 1.5 (= 60/40) and every male case by .67 (= 40/60), so that the sum of the weights still equals N in the case group.
Stratifying variables: standardization
Thus, for every 100 cases we have:
60 males * .67 (= 40)
40 females * 1.5 (=60)
Weighted N = 100.
Could, alternatively, weight both case and control groups to equal male and female sizes.
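The weights in this example are just the ratio of the target (control) proportion to the observed (case) proportion in each stratum. A minimal sketch of that arithmetic, using the 60%/40% figures from the slide and hypothetical function and variable names:

```python
def standardization_weights(observed_props, target_props):
    """Weight for each stratum = target proportion / observed proportion."""
    return {k: target_props[k] / observed_props[k] for k in observed_props}

case_props = {"male": 0.60, "female": 0.40}     # case group
control_props = {"male": 0.40, "female": 0.60}  # control group (the standard)

weights = standardization_weights(case_props, control_props)

# Apply the weights to 100 cases: 60 males and 40 females
case_counts = {"male": 60, "female": 40}
weighted_counts = {k: case_counts[k] * weights[k] for k in case_counts}
weighted_n = sum(weighted_counts.values())

print(weights)          # male weight about 0.67, female weight 1.5
print(weighted_counts)  # about 40 males and 60 females
print(weighted_n)       # about 100
```

Weighting both groups to equal male and female sizes would use the same function with a target of 0.5 for each sex.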
Weighting for rate or risk difference, or unconditional logistic regression
This weighting to produce equality on predictors can be done for hand calculation of rate or risk differences or for computer analyses of data by conditional or unconditional logistic regression.
Note: This is only one reason for weighting observations. Another common reason is to take into account sampling strategies with unequal probabilities for inclusion. Such strategies often over-sample certain strata in order to improve the statistical power for analyses of subgroups.
Weighting
It is useful to see this as analogous to what the analytic program does when inclusion of a predictor “equates” groups by removing the effects of confounders.
Simple standardization assumes a uniform effect of exposure across strata: each stratum provides an estimate of the same quantity.
Statistical tests of homogeneity are commonly used to decide whether this assumption is warranted.
Mantel–Haenszel Estimation
Mantel–Haenszel estimation of uniform rate differences (using weights as described above, applied to person-time)
Preferred when some strata have fewer than 10 cases
Unbiased, unlike maximum likelihood estimates, but with a larger SE (much larger for the rate difference, not much larger for the rate ratio)
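A minimal sketch of the Mantel–Haenszel summary estimates for stratified person-time data follows. Each stratum contributes with weight T1·T0/T, where T1 and T0 are the exposed and unexposed person-time and T is their sum. The two strata are invented for illustration, and the function name is hypothetical.

```python
def mantel_haenszel_rates(strata):
    """Mantel-Haenszel summary rate difference and rate ratio.

    strata: list of (a1, t1, a0, t0) tuples, where a1/a0 are exposed and
    unexposed case counts and t1/t0 the corresponding person-time.
    """
    num_rd = den_rd = num_rr = den_rr = 0.0
    for a1, t1, a0, t0 in strata:
        t = t1 + t0
        num_rd += (a1 * t0 - a0 * t1) / t  # weight * stratum rate difference
        den_rd += t1 * t0 / t              # Mantel-Haenszel weight
        num_rr += a1 * t0 / t
        den_rr += a0 * t1 / t
    return num_rd / den_rd, num_rr / den_rr

# Hypothetical strata: (exposed cases, exposed PT, unexposed cases, unexposed PT)
example_strata = [
    (8, 1000.0, 4, 2000.0),
    (20, 2000.0, 5, 1000.0),
]
mh_rd, mh_rr = mantel_haenszel_rates(example_strata)
print(f"MH rate difference = {mh_rd:.4f} per person-time unit")
print(f"MH rate ratio = {mh_rr:.3f}")
```

The summary rate difference falls between the two stratum-specific differences (0.006 and 0.005), as expected for a weighted average.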
First Study: Wine drinking and risk of non-Hodgkin’s lymphoma among men in the United States: a population-based case-control study
Reference:
Nathaniel C. Briggs, Robert S. Levine, Linda D. Bobo, William P. Haliburton, Edward A. Brann, and Charles H. Hennekens. American Journal of Epidemiology, 156, No. 5, 454-462.
The problem: Lymphoma study
Non-Hodgkin’s lymphoma (NHL) is the fifth most common cancer in the United States with etiology mostly unknown.
Can exploration of protective factors help move toward etiological understanding? Specifically, will this study strengthen prior weak evidence of lower NHL in wine drinkers?
Population studied, study design, and sample size: Lymphoma study
960 cases: men with NHL, born 1929–1953 and diagnosed 1984–1988 (without specific known risks such as HIV)
1,717 controls: men recruited through random digit dialing and matched geographically
Measurement issues: Lymphoma
Data collected by interviews regarding life-time habits
Selection and inclusion of predictors in the analysis:
* All odds ratios (OR) are adjusted for age, race/ethnicity, cancer registry, smoking history, and education.
Odds ratios for each alcohol beverage type are adjusted for the other types. All odds ratios are in reference to nondrinkers.
The effect being estimated: Lymphoma
Basic analysis to answer study questions:
Logistic regression analysis
Test for the significance of the trend (dose-response) in the OR as dose increases
Odds ratios of NHL associated with alcohol consumption by type and quantity over the life-time
TABLE 5. Adjusted odds ratios and 95% confidence intervals for risk of developing non-Hodgkin’s lymphoma by type, quantity, and age of onset of alcohol beverage consumption, Selected Cancers Study, 1984–1988

                         No. of cases   No. of controls   OR*    95% CI†     p for trend‡
Never drinkers                300             510         1.0
All drinkers§                 660           1,207         0.9    0.8, 1.1
  Current drinkers¶           490             930         0.9    0.8, 1.1
  Former drinkers             170             277         1.0    0.8, 1.3
Wine drinkers
  1–6 drinks/week             178             352         0.8    0.5, 1.3
  ≥1 drink/day                 46             121         0.4    0.2, 0.9        0.02
Beer drinkers
  1–6 drinks/week             271             555         0.8    0.6, 1.1
  1–2 drinks/day              168             242         1.2    0.8, 1.7
  ≥3 drinks/day                93             160         0.9    0.6, 1.4        0.58
Spirits drinkers
  1–6 drinks/week             237             454         0.8    0.6, 1.2
  1–2 drinks/day              109             178         1.1    0.7, 1.8
  ≥3 drinks/day                53              69         1.1    0.6, 2.1        0.38
Age at onset (years)
  ≤16                          52             130         0.7    0.4, 0.96
  17–18                       182             291         1.0    0.8, 1.3
  19–20                       103             200         0.9    0.6, 1.2
  ≥21                         319             572         0.9    0.8, 1.2        0.75
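For comparison with the adjusted estimates in Table 5, a crude (unadjusted) odds ratio can be computed directly from the case and control counts. The sketch below does this for wine drinkers at one drink or more per day versus never drinkers, with a Woolf (log-based) 95% interval. This is not the paper's analysis: the published OR of 0.4 comes from a logistic model adjusted for other beverage types and covariates, so the crude value is expected to differ. The function name is hypothetical.

```python
import math

def crude_odds_ratio(cases_exp, controls_exp, cases_ref, controls_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    odds_ratio = (cases_exp * controls_ref) / (controls_exp * cases_ref)
    se_log_or = math.sqrt(1 / cases_exp + 1 / controls_exp
                          + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from Table 5: wine at one drink/day or more (46 cases, 121 controls)
# versus never drinkers (300 cases, 510 controls)
crude_or, crude_lower, crude_upper = crude_odds_ratio(46, 121, 300, 510)
print(f"Crude OR = {crude_or:.2f}, 95% CI ({crude_lower:.2f}, {crude_upper:.2f})")
```

The gap between the crude value (about 0.65) and the adjusted 0.4 illustrates how much the included covariates can move an estimate.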
Conclusions: Lymphoma
“Among wine drinkers, there was a significant linear decrease in risk of NHL with increasing quantity of wine intake. A more than twofold decrease in risk was seen for consumption of one wine drink or more per day.”
Note that the p for trend tests the dose-response aspect.
TABLE 7. Adjusted odds ratios and 95% confidence intervals for risk of developing non-Hodgkin’s lymphoma among drinkers by type and quantity of alcohol beverage consumption stratified by age of onset of drinking, Selected Cancers Study, 1984–1988

                        Onset age ≤16 years                             Onset age ≥17 years
                        Cases  Controls  OR*   95% CI†    p for trend‡  Cases  Controls  OR*   95% CI†    p for trend‡
Nondrinkers              300     510     1.0                             300     510     1.0
All drinkers
  Current drinkers        37      91     0.7   0.4, 1.1                  450     826     0.9   0.8, 1.1
  Former drinkers         15      39     0.6   0.3, 1.1                  154     237     1.0   0.8, 1.3
Wine drinkers             12      55     0.4   0.2, 0.7                  211     412     0.9   0.7, 1.2
  1–6 drinks/week          8      31     0.4   0.2, 0.97                 169     315     1.0   0.8, 1.3
  ≥1 drink/day             4      24     0.3   0.1, 0.8      0.004        42      97     0.7   0.4, 1.04     0.05
Nonwine drinkers          40      75     1.0   0.8, 1.2                  393     651     1.0   0.9, 1.1
  1–6 drinks/week          9      24     0.7   0.3, 1.6                  140     268     0.8   0.6, 1.1
  ≥1 drink/day            31      51     1.0   0.8, 1.3      0.85        253     383     1.0   0.9, 1.2      0.88
Conclusions: Lymphoma
Early age of onset of drinking was associated with decreased risk of NHL specifically for wine drinkers.
The authors discussed biologic plausibility, probable effects of self-report, data limitations (biases would generally be expected to lower effects), and the age-sex limitations of the sample.
Second Study: Occupation and Adult Gliomas
Reference:
Susan E. Carozza, Margaret Wrensch, Rei Miike, Beth Newman, Andrew F. Olshan, David A. Savitz, Michael Yost, and Marion Lee. American Journal of Epidemiology, 152, No. 9, 838-846.
The problem: Gliomas
Gliomas are the most common form of primary malignant brain tumor in adults. The etiology is largely unknown but prior evidence implicates occupational exposures associated with certain chemically-exposed industrial, agricultural and blue-collar workers.
Population studied, study design, and sample size: Gliomas
492 incident cases in the San Francisco Bay Area, age over 20
462 controls recruited through random digit dialing, matched by:
• 5-year age group
• gender
• ethnicity
(Note: 1/3 declined to participate. Controls were more educated because of participation bias.)
Measurement issues: Gliomas
Because of rapid death of cases, many proxy informants needed to supply information. How might these interviews be biased?
Are the controls likely to be adequate?
Control variables: age (20–54 vs 55+), gender, years of education, race
Analyses
Exposure measures:
• All jobs held at least 6 months in lifetime
• All jobs up to 10 years previously (assuming a 10-year latency)
• Within each:
  • ever employed
  • < 10 years
  • ≥ 10 years
Logic of study: Gliomas
If real, the association should increase with longer exposure.
Also, if real, the effect should be more apparent when the latency period is excluded.
No latency period

                                              Ever employed       < 10 years          ≥ 10 years
Occupation                                    OR   95% CI         OR   95% CI         OR   95% CI
Managers, administrators                      0.9  0.7, 1.2       0.8  0.6, 1.2       1.1  0.7, 1.6
Engineers, architects, draughtsmen            0.9  0.6, 1.4       0.8  0.4, 1.4       1.1  0.6, 1.9
Mathematical, physical, computer scientists   1.0  0.7, 1.6       1.5  0.9, 2.6       0.6  0.3, 1.1
Biologic scientists                           1.0  0.4, 2.3       1.2  0.4, 3.2       0.6  0.1, 3.2
Chemists, pharmacists, chemical engineers     0.6  0.4, 2.3       1.0  0.3, 3.6       0.0  0.0,
Engineering, science technicians              0.6  0.3, 1.2       0.5  0.2, 1.1
Dentists, dental technicians                  1.0  0.4, 3.0       0.6  0.2, 2.0
Physicians, surgeons                          3.5  0.7, 17.6      2.2  0.2, 25.0      4.7  0.5, 42.7
Nurses, health technicians                    1.3  0.8, 2.1       1.3  0.7, 2.3       1.3  0.6, 2.9
Teachers, librarians                          0.8  0.5, 1.1       0.7  0.4, 1.1       1.0  0.6, 1.7
Legal and social service workers              1.1  0.6, 1.9       0.8  0.4, 1.6       1.8  0.7, 4.8
Entertainers, athletes                        0.6  0.3, 1.0       0.6  0.3, 1.1       0.7  0.2, 2.2
Writers, journalists                          0.8  0.3, 2.0       1.3  0.4, 3.8       0.2  0.0, 2.0
Artists                                       1.9  0.5, 6.5       4.2  0.5, 38.4      1.1  0.2, 5.4
Photographers, photo processors               0.5  0.1, 1.8       0.5  0.1, 2.7       0.4  0.0, 4.4
Printers                                      0.7  0.3, 1.9       1.0  0.3, 3.0       0.4  0.1, 2.2
Salesmen                                      0.7  0.6, 1.0       0.6  0.4, 0.8       1.2  0.8, 1.8
Clerks                                        0.6  0.5, 0.9       0.6  0.5, 0.9       0.8  0.5, 1.3
Shippers                                      1.1  0.7, 1.9       1.1  0.6, 1.8       1.6  0.5, 5.6
Messengers                                    1.0  0.6, 1.9       1.0  0.5, 1.9       1.1  0.2, 5.1
Electronic equipment operators                0.7  0.4, 1.0       0.6  0.4, 1.0       0.9  0.3, 2.8
Firemen                                       2.7  0.3, 26.1      0.0  0.0,
Policemen, guards                             0.7  0.3, 1.3       0.5  0.2, 1.2       1.1  0.3, 3.7
Armed forces                                  0.8  0.5, 1.2       0.7  0.5, 1.1       2.9  0.6, 14.9
Janitors                                      0.9  0.6, 1.5       0.6  0.4, 1.2       2.5  0.9, 7.2
Personal service workers                      0.6  0.3, 1.0       0.6  0.4, 1.2       0.4  0.14, 1.5
Textile workers                               1.2  0.5, 3.1       1.8  0.6, 5.4       0.3  0.0, 2.9
Food service workers                          0.9  0.6, 1.2       0.8  0.6, 1.1       1.3  0.6, 2.9
Food processors                               0.9  0.5, 1.6       1.1  0.6, 2.1       0.4  0.1, 1.5
Farm managers and workers                     0.5  0.4, 0.8       0.5  0.3, 0.8       0.7  0.4, 1.4

Note: Rows with fewer than three estimates are shown in source order; the column of the missing estimate is not recoverable, and upper limits missing in the source are left blank.
Gliomas: Odds Ratios
Virtually no odds ratios were statistically significantly different from 1.0. Nevertheless, several were discussed. Is this sensible?