Analysis and presentation of Case-control study data Chihaya Koriyama February 14 (Lecture 1)

Page 1

Analysis and presentation of Case-control study data

Chihaya Koriyama, February 14 (Lecture 1)

Page 2

Study design in epidemiology

Page 3

Why case-control study?

• In a cohort study, you need a large number of subjects to obtain a sufficient number of cases, especially if you are interested in a rare disease.
  – Gastric cancer incidence in Japanese males: 128.5 / 100,000 person-years

• A case-control study is more efficient in terms of study operation, time, and cost.
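To make the first point concrete, here is a rough back-of-the-envelope sketch (not part of the lecture): at the quoted incidence of 128.5 per 100,000 person-years, how much follow-up would a cohort need before it could expect a given number of cases? The target of 100 cases and the function name `person_years_needed` are illustrative assumptions.

```python
# Sketch: person-years of follow-up a cohort needs to EXPECT a given number of
# incident cases at a fixed incidence rate. The target of 100 cases is hypothetical.

def person_years_needed(target_cases: float, rate_per_100k: float) -> float:
    """Expected cases = rate x person-years, so person-years = cases / rate."""
    rate_per_person_year = rate_per_100k / 100_000
    return target_cases / rate_per_person_year


# Gastric cancer incidence in Japanese males (from the slide): 128.5 / 100,000 person-years
py = person_years_needed(target_cases=100, rate_per_100k=128.5)
print(f"About {py:,.0f} person-years are needed to expect 100 cases")
```

The answer is roughly 78,000 person-years (for example, about 7,800 men followed for 10 years), whereas a case-control study could recruit 100 incident cases directly.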

Page 4

Comparison of study designs

                       Case-control          Cohort
Rare diseases          suitable              not suitable
Number of diseases     1                     more than 1
Sample size            relatively small      needs to be large
Control selection      difficult             easier
Study period           relatively short      long
Recall bias            yes                   no
Risk difference        not available         available

Page 5

Case-control study - Sequence of determining exposure and outcome status

• Step 1: Determine and select the cases of interest to your research

• Step 2: Select appropriate controls

• Step 3: Determine exposure status in both cases and controls

Page 6

Case ascertainment

• What is the definition of a case?
  – Cancer (diagnosed clinically? pathologically?)
  – Virus carriers (asymptomatic patients) → you need to screen for the antibody
  – Are deceased cases included?

• You have to describe the following points:
  – the definition
  – when, where, and how the cases were selected

Page 7

Who should be the controls?

• Control ≠ non-case
  – Controls must also be at risk of developing the disease in the future.
  – Controls are expected to be a representative sample of the catchment population from which the cases arise.
  – In a case-control study of gastric cancer, a person who has undergone a gastrectomy cannot be a control, since he or she can never develop gastric cancer.

Page 8

Various types of case-control studies

1) A population-based case-control study
   Both cases and controls are recruited from the source population.

2) A case-control study nested in a cohort
   Both cases and controls are members of the cohort.

3) A hospital-based case-control study
   Both cases and controls are hospitalized patients or outpatients. Controls with diseases associated with the exposure of interest should be avoided.

Page 9

The following points should be recorded (and described in your paper):

• The list (number) of eligible cases whose medical records were unavailable

• The list (number) of subjects who refused to participate, with the reasons for refusal if possible

• The length of the interview

• The list (number) of subjects lacking measurement data, with the reasons

Page 10

Exploratory or Analytic

• Exploratory case-control studies
  – There is no specific a priori hypothesis about the relationship between exposure and outcome.

• Analytic case-control studies
  – Analytic studies are designed to test specific a priori hypotheses about exposure and outcome.

Page 11

Case-control study - information

• Sources of information on exposure and potential confounding factors
  – Existing records
  – Questionnaires
  – Face-to-face / telephone interviews
  – Biological specimens
  – Tissue banks
  – Databases of biochemical and environmental measurements

Page 12

Temporality is essential among Hill’s criteria

The study exposure is unlikely to be altered at this stage because of the disease.

The study exposure is more likely to be altered at this stage because of the symptoms.

Essential Epidemiology (WA Oleckno)

Page 13

Bias should be minimized

• Bias & confounding
  – Selection bias
  – Detection bias
  – Information bias (recall bias)
  – Confounding

Confounding can be controlled in the statistical analysis, but nothing can be done about bias once the data have been collected.

Page 14

Case-control studies...

• are prone to many potential sources of bias, and
• should be carefully designed, analyzed, and interpreted.

Page 15

How can we solve the problem of confounding in a case-control study?

“Prevention” at the study design stage
• Limitation (restriction)
• Matching: effective in a cohort study, but not in a case-control study

Page 16

Matching in a case-control study

• Cases and controls are matched on confounding factor(s) to increase the efficiency of the statistical analysis.

• Matching by itself cannot control confounding.
  – A conditional logistic regression analysis is required.

Page 17

Overmatching

• Matching on factor(s) strongly related to the exposure of main interest

  – You CANNOT see the difference in exposure status between cases and controls.

Page 18

How can we solve the problem of confounding?

“Treatment” at the statistical analysis stage
• Stratification by a confounder
• Multivariate analysis
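Stratification can be illustrated with the Mantel-Haenszel summary odds ratio, one common way of pooling the stratum-specific 2 × 2 tables (the slides do not name a particular estimator, so this choice is an assumption). The counts below are hypothetical and use the cell labels from the odds-ratio slide: a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.

```python
# Sketch: Mantel-Haenszel summary odds ratio across strata of a confounder.
# Cell labels: a = exposed cases, b = unexposed cases,
#              c = exposed controls, d = unexposed controls.
# All stratum counts are HYPOTHETICAL, for illustration only.

def mantel_haenszel_or(strata):
    """strata: iterable of (a, b, c, d) tuples, one per level of the confounder."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d          # total subjects in this stratum
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


strata = [
    (40, 20, 30, 40),  # hypothetical stratum 1 (e.g., men)
    (20, 20, 10, 30),  # hypothetical stratum 2 (e.g., women)
]
for a, b, c, d in strata:
    print("stratum-specific OR:", round(a * d / (b * c), 2))
print("Mantel-Haenszel OR:", round(mantel_haenszel_or(strata), 2))
```

The pooled estimate lies between the stratum-specific odds ratios; comparing it with the crude odds ratio from the collapsed table indicates how much the stratifying variable confounds the association.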

Page 19

What you should describe in the Materials and Methods section:

1. Study design

2. Definition of eligible cases and controls

– Inclusion / exclusion criteria of cases and controls

3. Number of respondents and the response rate

4. Main exposure and other factors including potential confounding factors

Page 20

5. Sources of information on exposure and other factors

6. Matched factors, if any

7. The number of subjects used in statistical analyses

8. Statistical test(s) and model(s)

9. Name and version of the statistical software


Page 21

Assuring adequate study power

• The following information is necessary:
  – The confidence level desired (usually 95%, corresponding to a two-sided significance level of 0.05)
  – The level of power desired (80-95%)
  – The ratio of controls to cases
  – The expected frequency of the exposure in the control group
  – The smallest odds ratio one would like to be able to detect (based on practical significance)
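As a rough sketch of how these inputs combine, the function below uses a standard normal-approximation formula for comparing exposure frequencies in an unmatched case-control study (the formula is not taken from this lecture). The function name `cases_needed` and the example inputs (30% exposure among controls, smallest odds ratio of interest 2.0, one control per case, 95% confidence, 80% power) are assumptions for illustration.

```python
# Sketch: number of cases needed in an UNMATCHED case-control study
# (normal approximation for two proportions). Example inputs are hypothetical.
import math
from statistics import NormalDist


def cases_needed(p0, odds_ratio, ratio=1.0, alpha=0.05, power=0.80):
    """p0: exposure frequency among controls; ratio: number of controls per case."""
    # Exposure frequency among cases implied by the odds ratio
    p1 = p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))
    p_bar = (p1 + ratio * p0) / (1 + ratio)          # pooled exposure frequency
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided confidence level
    z_beta = NormalDist().inv_cdf(power)
    term_null = z_alpha * math.sqrt(p_bar * (1 - p_bar) * (1 + 1 / ratio))
    term_alt = z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / ratio)
    return math.ceil((term_null + term_alt) ** 2 / (p1 - p0) ** 2)


n_cases = cases_needed(p0=0.30, odds_ratio=2.0, ratio=1.0)
print("cases needed:", n_cases, "| controls needed:", math.ceil(1.0 * n_cases))
```

With these inputs the formula asks for roughly 140 cases and as many controls. Increasing the number of controls per case lowers the number of cases required, and a larger smallest-detectable odds ratio lowers it further.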

Page 22

Statistical analysis: “matched” vs. “unmatched” studies

The procedures for analyzing the results of case-control studies differ depending on whether the cases and controls are matched or unmatched.

Matched
• McNemar’s test
• Conditional logistic regression analysis

Unmatched
• Chi-square test
• Unconditional logistic regression analysis

Page 23

Advantages of pair matching in case-control studies

• Assures comparability between cases and controls on the selected variables

• May simplify the selection of controls by eliminating the need to identify a random sample

• Useful in small studies where obtaining cases and controls that are similar on potentially confounding factors may otherwise be difficult

• Can assure adequate numbers of subjects with specified characteristics so as to permit statistical comparisons

Essential Epidemiology (WA Oleckno)

Page 24

Disadvantages of pair matching in case-control studies

• May be difficult or costly to find a sufficient number of controls

• Eliminates the possibility of examining the effects of the matched variables on the outcome

• Can increase the difficulty or complexity of controlling for confounding by the remaining unmatched variables

• Overmatching

• Can result in a greater loss of data, since a pair of subjects has to be eliminated even if only one subject does not respond

Essential Epidemiology (WA Oleckno)

Page 25

An example of an unmatched case-control study

Lung cancer cases (N=100) and controls (N=100); smokers (NOT recently started): 70 cases and 40 controls

               Cases   Controls
Smoker           70       40
Non-smoker       30       60

Odds ratio = (70 × 60) / (40 × 30) = 3.5
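The odds ratio and the chi-square test for this unmatched table can be computed directly from the four cell counts. In this sketch the Pearson chi-square is computed without a continuity correction and the 95% confidence interval uses Woolf's (logit) method; the lecture does not specify either choice, so both are assumptions, as is the helper name `analyze_2x2`.

```python
# Sketch: odds ratio, Pearson chi-square (no continuity correction),
# and a Woolf (logit-based) 95% CI for an unmatched 2 x 2 table.
import math


def analyze_2x2(a, b, c, d, z=1.959964):
    """a/b = exposed/unexposed cases, c/d = exposed/unexposed controls."""
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, chi2, p_value, (lower, upper)


# Lung cancer example: 70/30 smokers/non-smokers among cases, 40/60 among controls
or_, chi2, p, ci = analyze_2x2(a=70, b=30, c=40, d=60)
print(f"OR = {or_:.2f}, chi-square = {chi2:.2f}, p = {p:.1e}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```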

Page 26

Risk measure in a case-control study

Odds = prevalence / (1 - prevalence)

Odds ratio = odds in cases / odds in controls

                  Disease + (case)   Disease - (control)
Exposure +               a                    c
Exposure -               b                    d

Exposure odds in cases = a / b
Exposure odds in controls = c / d
Odds ratio = (a / b) / (c / d) = (a × d) / (b × c)

Page 27

An example of a matched case-control study

Lung cancer cases (N=100) and controls matched by sex and age (N=100); smokers (NOT recently started): 70 cases and 40 controls

                         Case: smoker   Case: non-smoker
Control: smoker               30               10
Control: non-smoker           40               20

Notice that this is the distribution of 100 matched pairs.

Page 28

McNemar’s test

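                         Case: smoker   Case: non-smoker
Control: smoker               30               10
Control: non-smoker           40               20

Only the discordant pairs carry information: 40 pairs with a smoking case and a non-smoking control, and 10 pairs with a non-smoking case and a smoking control.

Chi-square test statistic = (40 - 10)² / (40 + 10) = 18, with 1 degree of freedom.

Odds ratio = 40 / 10 = 4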
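McNemar's test and the matched-pair odds ratio use only the two discordant-pair counts. The sketch below follows the slide's formula (no continuity correction); the helper name `mcnemar` and its argument names are illustrative.

```python
# Sketch: McNemar's test and the matched-pair odds ratio.
#   case_only    = pairs where the case is exposed and the control is not
#   control_only = pairs where the control is exposed and the case is not
import math


def mcnemar(case_only, control_only):
    chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    odds_ratio = case_only / control_only
    return chi2, p_value, odds_ratio


# Matched lung cancer example: 40 (case smoker, control non-smoker) pairs
# and 10 (case non-smoker, control smoker) pairs
chi2, p, or_ = mcnemar(case_only=40, control_only=10)
print(f"chi-square = {chi2:.1f} (1 df), p = {p:.1e}, OR = {or_:.1f}")
```

The 30 pairs in which both members smoke and the 20 in which neither smokes do not enter either statistic.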

Page 29

Logistic regression analysis

• Logistic regression is used to model the probability of a binary response as a function of a set of variables thought to possibly affect the response (called covariates).

Y = 1: case (with the disease)
Y = 0: control (no disease)

Page 30

One could imagine fitting a linear model to the probabilities (since this is the simplest model!), but this often leads to problems:

In a linear model, fitted probabilities can fall outside the range 0 to 1. Because of this, linear models are seldom used to fit probabilities.

[Figure; y-axis: probability]
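A small numerical illustration of this problem, using hypothetical data (ten subjects with covariate x = 0, ..., 9 and a 0/1 outcome): a least-squares line fitted to the binary outcome gives "probabilities" below 0 and above 1 at the ends of the range, whereas the inverse logit of any linear predictor always stays strictly between 0 and 1.

```python
# Sketch: least-squares line fitted to a 0/1 outcome (HYPOTHETICAL data),
# compared with the bounded inverse-logit transformation.
import math

x = list(range(10))                  # covariate values 0, 1, ..., 9
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # hypothetical binary outcome

# Closed-form simple linear regression: slope = Sxy / Sxx
x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
sxx = sum((xi - x_mean) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_mean - slope * x_mean
linear_fit = [intercept + slope * xi for xi in x]
print("linear fit at x = 0 and x = 9:", round(linear_fit[0], 3), round(linear_fit[-1], 3))


def inverse_logit(eta):
    """Map any real linear predictor to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-eta))


for eta in (-10, -2, 0, 2, 10):
    print(f"inverse_logit({eta}) = {inverse_logit(eta):.5f}")
```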

Page 31

In a logistic regression analysis, the logit of the probability is modelled, rather than the probability itself.

$p$ = probability of getting the disease

$$\mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$$

As always, we use the natural log. The logit is therefore the log odds, since odds = p / (1 - p).

Page 32

Simple logistic regression (with a continuous covariate)

Suppose we give each of several beetles some dose of a potentially toxic agent (x = dose), and we observe whether the beetle dies (Y = 1) or lives (Y = 0). One of the simplest models we can consider assumes that the relationship between the logit of the probability of death and the dose is linear, i.e.,

$$\mathrm{logit}(p_x) = \log\left(\frac{p_x}{1-p_x}\right) = \alpha + \beta x$$

where $p_x$ is the probability of death for a given dose $x$, and $\alpha$ and $\beta$ are unknown parameters to be estimated from the data.
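How are $\alpha$ and $\beta$ estimated from the data? Logistic regression is usually fitted by maximum likelihood; the sketch below uses Newton-Raphson iterations for the two-parameter model, with the standard library only. The dose/outcome data and the function name `fit_simple_logistic` are hypothetical, and real analyses would use a statistical package such as Stata or SAS.

```python
# Sketch: maximum-likelihood fit of logit(p_x) = alpha + beta * x by Newton-Raphson.
# The dose/outcome data are HYPOTHETICAL.
import math


def fit_simple_logistic(x, y, max_iter=50, tol=1e-10):
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (alpha, beta) += inverse(information) x score
        step_alpha = (h11 * g0 - h01 * g1) / det
        step_beta = (h00 * g1 - h01 * g0) / det
        alpha += step_alpha
        beta += step_beta
        if abs(step_alpha) + abs(step_beta) < tol:
            break
    return alpha, beta


dose = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
died = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]
alpha_hat, beta_hat = fit_simple_logistic(dose, died)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}, OR per unit dose = {math.exp(beta_hat):.2f}")
```

At the maximum, the score equations are satisfied: the fitted probabilities sum to the observed number of deaths, and likewise when weighted by dose.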

Page 33

The values of $\alpha$ and $\beta$ determine whether and how steeply the dose-response curve rises (or falls), and where it is centered.

If $\beta = 0$, $p_x$ is constant over $x$.

If $\beta > 0$, $p_x$ increases with $x$.

If $\beta < 0$, $p_x$ decreases with $x$.

$H_0\colon \beta = 0$ is the null hypothesis in a “test of trend” when $x$ is a continuous variable. Knowledge of $\beta$ gives us insight into the direction and degree of the association between outcome and exposure.

$$p_x = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$
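The effect of the sign of $\beta$ can be checked numerically with the formula for $p_x$. The parameter values below ($\alpha = -3$ with $\beta = 1, 0, -1$) are arbitrary choices for illustration; with $\beta = 1$ the curve is centered at $x = -\alpha/\beta = 3$, where $p_x = 0.5$.

```python
# Sketch: p_x = exp(alpha + beta*x) / (1 + exp(alpha + beta*x))
# evaluated for a few ARBITRARY (alpha, beta) values.
import math


def p_of_x(x, alpha, beta):
    eta = alpha + beta * x
    return math.exp(eta) / (1 + math.exp(eta))


doses = [0, 1, 2, 3, 4, 5, 6]
curves = {}
for beta in (1.0, 0.0, -1.0):
    curves[beta] = [p_of_x(x, alpha=-3.0, beta=beta) for x in doses]
    print(f"beta = {beta:+.0f}:", [round(p, 3) for p in curves[beta]])
```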

Page 34

Simple logistic regression (with a dichotomous covariate)

Suppose we are considering a case-control study where the response variable is disease (case) / non-disease (control) and the predictor variable is exposed / non-exposed, which we “code” as an indicator variable, or dummy variable.

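$$Y = \begin{cases} 1 & \text{disease } (D_1) \\ 0 & \text{no disease } (D_0) \end{cases} \qquad x = \begin{cases} 1 & \text{exposed } (E_1) \\ 0 & \text{non-exposed } (E_0) \end{cases}$$

and $p_x = P(\text{disease} \mid \text{exposure } x) = P(Y = 1 \mid x)$, $x = 0, 1$.

Thus, $p_1$ = probability of disease among the exposed, and $p_0$ = probability of disease among the non-exposed.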

Page 35

In case of exposure ($X = 1$): $\mathrm{logit}(P_{E1}) = \alpha + \beta$, where $\alpha$ is the intercept.

In case of non-exposure ($X = 0$): $\mathrm{logit}(P_{E0}) = \alpha$.

To obtain the odds ratio for the exposed group, start from the definition of the odds ratio:

$$\mathrm{OR} = \frac{P_{E1}/(1-P_{E1})}{P_{E0}/(1-P_{E0})}$$

$$\begin{aligned}
\log(\mathrm{OR}) &= \log\left(\frac{P_{E1}}{1-P_{E1}}\right) - \log\left(\frac{P_{E0}}{1-P_{E0}}\right) \\
&= \mathrm{logit}(P_{E1}) - \mathrm{logit}(P_{E0}) \\
&= (\alpha + \beta) - \alpha = \beta
\end{aligned}$$

$$\mathrm{OR} = e^{\beta}$$
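With a single dichotomous covariate the model is saturated, so the maximum-likelihood estimates can be written down directly: $\hat\alpha$ is the observed log odds of being a case among the unexposed, and $\hat\alpha + \hat\beta$ is the observed log odds among the exposed. The sketch below (helper name hypothetical) applies this to the unmatched lung cancer example (70/30 smokers/non-smokers among cases, 40/60 among controls) and confirms that $e^{\hat\beta}$ equals the cross-product ratio a·d / (b·c).

```python
# Sketch: closed-form MLEs for logit(P) = alpha + beta*x with one binary x.
# a, b = exposed / unexposed cases; c, d = exposed / unexposed controls.
import math


def saturated_logit_estimates(a, b, c, d):
    alpha = math.log(b / d)          # log odds of being a case when x = 0
    beta = math.log(a / c) - alpha   # difference in log odds, x = 1 vs x = 0
    return alpha, beta


alpha, beta = saturated_logit_estimates(a=70, b=30, c=40, d=60)
print("alpha =", round(alpha, 4), " beta =", round(beta, 4))
print("exp(beta) =", round(math.exp(beta), 4), " a*d/(b*c) =", 70 * 60 / (40 * 30))
```

In a case-control study the intercept mainly reflects how many controls were sampled per case, so only $\beta$ (the log odds ratio) has a direct epidemiological interpretation.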

Page 36

Simple logistic regression (with a covariate having more than two categories)

Suppose we are considering a case-control study where the predictor variable is current smoker / ex-smoker / non-smoker, which we code using two dummy variables.

Original data              Dummy variables
Case   Smoking status      SMK1 (X1)   SMK2 (X2)
1      Current             1           0
0      Ex-smoker           0           1
1      Non-smoker          0           0
1      Ex-smoker           0           1
0      Non-smoker          0           0
0      Non-smoker          0           0
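Creating the dummy variables can be sketched as follows. The function name is hypothetical; as in the table above, non-smoker is the reference category (both dummies equal 0).

```python
# Sketch: indicator (dummy) coding of a three-category smoking variable,
# with "Non-smoker" as the reference category.
def smoking_dummies(status):
    """Return (SMK1, SMK2) = (current-smoker indicator, ex-smoker indicator)."""
    if status not in ("Current", "Ex-smoker", "Non-smoker"):
        raise ValueError(f"unknown smoking status: {status!r}")
    smk1 = 1 if status == "Current" else 0
    smk2 = 1 if status == "Ex-smoker" else 0
    return smk1, smk2


records = [  # (case, smoking status) as in the example table
    (1, "Current"),
    (0, "Ex-smoker"),
    (1, "Non-smoker"),
    (1, "Ex-smoker"),
    (0, "Non-smoker"),
    (0, "Non-smoker"),
]
for case, status in records:
    print(case, status, *smoking_dummies(status))
```

Two dummies suffice for three categories; adding a third indicator for non-smokers would make the three columns sum to the intercept column (perfect collinearity).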

Page 37

Logistic regression model of the previous example

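$$\mathrm{logit}(P) = \alpha + \beta_1 X_1 + \beta_2 X_2$$

Current smoker ($X_1 = 1, X_2 = 0$): $\mathrm{logit}(P_{\text{current}}) = \alpha + \beta_1$

Ex-smoker ($X_1 = 0, X_2 = 1$): $\mathrm{logit}(P_{\text{ex}}) = \alpha + \beta_2$

Non-smoker ($X_1 = 0, X_2 = 0$): $\mathrm{logit}(P_{\text{non}}) = \alpha$

$\mathrm{OR}_{\text{current}} = e^{\beta_1}$

$\mathrm{OR}_{\text{ex}} = e^{\beta_2}$

$\mathrm{OR}_{\text{non}} = 1$ (referent)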
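Because this model has one parameter per smoking category, it is saturated, and each $e^{\beta}$ equals the odds ratio computed directly from the case and control counts. As a check, the sketch below uses the gastric cancer counts tabulated with the Stata output later in these slides (never: 78 cases / 188 controls; ex-smokers: 89 / 145; current: 49 / 98). In that data set the coding is fumar = 1 for ex-smokers and fumar = 2 for current smokers, so the two results correspond to Stata's `_Ifumar_1` and `_Ifumar_2`.

```python
# Sketch: odds ratios for a categorical exposure (non-smokers as referent),
# computed from case/control counts; equal to exp(beta) in a saturated logistic model.
import math

counts = {  # category: (cases, controls)
    "Non-smoker": (78, 188),
    "Ex-smoker": (89, 145),
    "Current": (49, 98),
}


def odds_ratios(counts, referent):
    ref_cases, ref_controls = counts[referent]
    ref_odds = ref_cases / ref_controls
    return {
        cat: (cases / controls) / ref_odds
        for cat, (cases, controls) in counts.items()
    }


ors = odds_ratios(counts, referent="Non-smoker")
for cat, value in ors.items():
    print(f"{cat:<11} OR = {value:.6f}  beta = {math.log(value):.4f}")
```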

Page 38

Wald’s test for no association

The null hypothesis of no association between outcome and exposure corresponds to

$H_0\colon \mathrm{OR} = 1$, or equivalently $H_0\colon \beta = \log \mathrm{OR} = 0$.

Using logistic regression results, we can test this hypothesis with the standardized coefficient $z = \hat\beta / \mathrm{SE}(\hat\beta)$, which is Wald’s test.

Note: Stata and SAS present two-sided Wald’s test p-values.
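Wald’s test compares $z = \hat\beta / \mathrm{SE}(\hat\beta)$ with the standard normal distribution. Stata's `logistic` command reports the odds ratio and the standard error of the odds ratio, which is obtained by the delta method as OR × SE($\hat\beta$); so SE($\hat\beta$) can be recovered as SE(OR) / OR. The sketch below (helper name hypothetical) applies this to the ex-smoker row of the gastric cancer output shown later (OR = 1.479399, Std. Err. = .2817549) and reproduces Stata's z, p-value, and confidence interval.

```python
# Sketch: two-sided Wald test and 95% CI for a logistic regression coefficient.
import math
from statistics import NormalDist


def wald_test(beta, se, level=0.95):
    z = beta / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci_or = (math.exp(beta - z_crit * se), math.exp(beta + z_crit * se))
    return z, p_two_sided, ci_or


# Values reported by Stata for _Ifumar_1 (ex-smokers vs never smokers)
odds_ratio, se_odds_ratio = 1.479399, 0.2817549
beta = math.log(odds_ratio)
se_beta = se_odds_ratio / odds_ratio   # delta method: SE(OR) = OR * SE(beta)
z, p, (lower, upper) = wald_test(beta, se_beta)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI {lower:.4f} - {upper:.4f}")
```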

Page 39

Likelihood Ratio Test (LRT)

An alternative way of testing hypotheses in a logistic regression model is to use a likelihood ratio test. The likelihood ratio test is specifically designed to test between nested hypotheses, for example

$$H_0\colon \log\left(\frac{p_x}{1-p_x}\right) = \alpha \qquad \text{vs.} \qquad H_A\colon \log\left(\frac{p_x}{1-p_x}\right) = \alpha + \beta x,$$

and we say that $H_0$ is nested in $H_A$.

Page 40

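Likelihood Ratio Test (LRT)

To test $H_0$ vs. $H_A$, we compute the likelihood ratio test statistic:

$$G = -2 \log\left(\frac{L_{H_0}}{L_{H_A}}\right) = 2\left(\log L_{H_A} - \log L_{H_0}\right) = \left(-2 \log L_{H_0}\right) - \left(-2 \log L_{H_A}\right)$$

where $L_{H_A}$ is the maximized likelihood under the alternative hypothesis $H_A$, and $L_{H_0}$ is the maximized likelihood under the null hypothesis $H_0$.

If the null hypothesis $H_0$ were true, we would expect the likelihood ratio test statistic to be close to zero. Under $H_0$, $G$ approximately follows a $\chi^2$ distribution whose degrees of freedom equal the number of extra parameters in $H_A$ (1 in this example).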

Page 41

Wald’s test vs. LRT

• In general, the LRT often works a little better than the Wald test, in that its test statistic more closely follows a $\chi^2$ distribution under $H_0$. But the Wald test often works very well and usually gives similar results.

• More importantly, the LRT can more easily be extended to multivariate hypothesis tests, e.g.,

$H_0\colon \beta_1 = \beta_2 = 0$ vs. $H_A$: at least one of $\beta_1, \beta_2 \neq 0$
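As an illustration of a likelihood ratio test of $H_0\colon \beta_1 = \beta_2 = 0$, the sketch below computes the maximized log-likelihoods of the intercept-only model and of the smoking model directly from the gastric cancer counts tabulated with the Stata output later in these slides. Because the smoking model is saturated for a three-category exposure, its fitted probabilities are simply the observed proportions of cases in each category. For 2 degrees of freedom the $\chi^2$ upper-tail probability has the closed form exp(-G/2). The helper name `binomial_loglik` is hypothetical.

```python
# Sketch: likelihood ratio test of H0: beta1 = beta2 = 0 (2 df)
# for a three-category exposure, using grouped case/control counts.
import math

counts = [(78, 188), (89, 145), (49, 98)]  # (cases, controls): never, ex, current


def binomial_loglik(cases, controls, p):
    """Log-likelihood contribution of a group with case probability p."""
    return cases * math.log(p) + controls * math.log(1 - p)


# H_A (smoking model): each category has its own fitted probability = observed proportion
loglik_alt = sum(binomial_loglik(k, m, k / (k + m)) for k, m in counts)

# H_0 (intercept only): one common probability of being a case
total_cases = sum(k for k, _ in counts)
total = sum(k + m for k, m in counts)
p_common = total_cases / total
loglik_null = sum(binomial_loglik(k, m, p_common) for k, m in counts)

g = 2 * (loglik_alt - loglik_null)   # G = (-2 log L_H0) - (-2 log L_HA)
p_value = math.exp(-g / 2)            # upper tail of chi-square with 2 df
print(f"log L(H_A) = {loglik_alt:.5f}, G = {g:.2f}, p = {p_value:.4f}")
```

The results agree with the "Log likelihood", "LR chi2(2)", and "Prob > chi2" lines of the unconditional logistic regression output.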

Page 42

World J. Gastroenterology 2006

Page 43

Recruitment of cases

[Flowchart: 395 patients newly diagnosed as G.C. (Sep. 2000 ~ Dec. 2002); excluded for the following reasons: refused to participate in the study; lived in Valle del Cauca less than 5 years; recurrent cases; could not be contacted (counts shown: 7, 65, 91, 16); 216 cases; 173 formalin-fixed paraffin-embedded blocks; "81 cases were excluded".]

We could not obtain information on tumor location for 23 cases, and those cases were excluded from the tumor-location-specific analysis.

Page 44

Recruitment of controls

[Flowchart: 528 potential controls; excluded for the following reasons: lived in Valle del Cauca less than 5 years; refused to participate in the study; history of G.C. (counts shown: 67, 1, 29); 431 controls.]

Matched by sex, age (5-year), hospital, and date of administration

Case : control = 1 : 2

Major diseases of controls: cardiovascular diseases (208), trauma (117), infectious diseases (38), urological disorders (21)

Page 45

xi: logistic casocon i.fumar
i.fumar           _Ifumar_0-2         (naturally coded; _Ifumar_0 omitted)

Logistic regression                               Number of obs   =        647
                                                  LR chi2(2)      =       4.24
                                                  Prob > chi2     =     0.1198
Log likelihood = -409.93333                       Pseudo R2       =     0.0051

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.479399   .2817549     2.06   0.040     1.018526    2.148813
   _Ifumar_2 |   1.205128   .2660901     0.85   0.398     .7817889    1.857706
------------------------------------------------------------------------------

           |   gastric cancer
   Smoking |         0          1 |     Total
-----------+----------------------+----------
   Never 0 |       188         78 |       266
     Ex- 1 |       145         89 |       234
 Current 2 |        98         49 |       147
-----------+----------------------+----------
     Total |       431        216 |       647

The P>|z| column gives Wald’s test p-values.

Page 46

Results of conditional logistic regression analysis using the same data (fumar: 0 = never, 1 = ex-smoker, 2 = current smoker)

Stata command:
xi: clogit casocon i.fumar, group(identi) or

Conditional (fixed-effects) logistic regression   Number of obs   =        647
                                                  LR chi2(2)      =       4.64
                                                  Prob > chi2     =     0.0982
Log likelihood = -234.5745                        Pseudo R2       =     0.0098

------------------------------------------------------------------------------
     casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Ifumar_1 |   1.535023   .3061998     2.15   0.032     1.038295    2.269389
   _Ifumar_2 |   1.219851   .2784042     0.87   0.384        .7799    1.907985
------------------------------------------------------------------------------

The P>|z| column gives Wald’s test p-values.

Page 47

GC risk by smoking in Cali, Colombia: results of the tumor-location-specific analysis

                        OR (95% CI)
                        Lower (N=116)*     Middle (N=52)*     Upper (N=24)*
Cigarette smoking
  Never                 1.0 (referent)     1.0 (referent)     1.0 (referent)
  Ex-smoker             1.9 (1.1 - 3.4)    1.2 (0.6 - 2.5)    3.7 (1.1 - 12.5)
  Current               1.3 (0.7 - 2.3)    1.3 (0.5 - 3.4)    3.0 (0.6 - 13.9)
P for trend             0.257              0.597              0.083
P for heterogeneity     0.059              0.859              0.070

P = 0.51 (P value by LRT). This test examines the difference in the magnitude of the association between smoking and GC risk among the 3 tumor sites.