7 Case-control Analysis Chihaya Hundout

  • 7/28/2019 7 Case-control Analysis Chihaya Hundout

    Analysis and presentation of case-control study data

    Chihaya Koriyama

    February 14 (Lecture 1)

    Study design in epidemiology

    [Figure: study designs in epidemiology. Observational studies: case-control study and cohort study (individual level), ecological study (population level). Intervention studies.]

    Why case-control study?

    In a cohort study, you need a large number of subjects to obtain a sufficient number of cases, especially if you are interested in a rare disease. For example, the incidence of gastric cancer in Japanese men is 128.5 per 100,000 person-years.

    A case-control study is more efficient in terms of study operation, time, and cost.
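To make the efficiency argument concrete, the short Python sketch below converts the quoted incidence into the person-years of cohort follow-up expected to yield a given number of cases. The target of 100 cases is a hypothetical figure chosen for illustration, and the calculation ignores losses to follow-up.

```python
def person_years_needed(target_cases: int, rate_per_100k: float) -> float:
    """Person-years of follow-up expected to yield `target_cases` cases."""
    rate_per_person_year = rate_per_100k / 100_000
    return target_cases / rate_per_person_year


# Gastric cancer incidence in Japanese men, as quoted above
GASTRIC_CANCER_RATE = 128.5  # per 100,000 person-years

# Hypothetical target: 100 incident cases
needed = person_years_needed(100, GASTRIC_CANCER_RATE)
print(f"{needed:,.0f} person-years")  # prints "77,821 person-years"
```

That is roughly 7,800 men followed for 10 years, whereas a case-control study can recruit 100 existing cases directly.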

    Comparison of study designs

                                  Case-control   Cohort
    Rare diseases                 suitable       not suitable
    Number of diseases studied    1              ≥ 1

    Case-control study: sequence of determining exposure and outcome status

    Step 1: Define and select the cases of interest for your research.

    Step 2: Select appropriate controls.

    Step 3: Determine the exposure status of both cases and controls.

    Case ascertainment

    What is the definition of a case?

      Cancer: diagnosed clinically or pathologically?
      Virus carriers (asymptomatic patients): you need to screen for antibodies.
      Are deceased cases included?

    You have to describe the following points:

      the case definition
      when, where, and how cases are selected

    Who will be controls?

    A control is not simply a non-case.

    Controls must also be at risk of developing the disease in the future.

    Controls are expected to be a representative sample of the catchment population from which the cases arise.

    For example, in a case-control study of gastric cancer, a person who has undergone gastrectomy cannot be a control, since he or she can never develop gastric cancer.

    Various types of case-control studies

    A population-based case-control study: both cases and controls are recruited from the population.

    A case-control study nested in a cohort: both cases and controls are members of the cohort.

    A hospital-based case-control study: both cases and controls are patients (inpatients or outpatients). Controls with diseases associated with the exposure of interest should be avoided.

    The following points should be recorded (and described in your paper):

      The list (or number) of eligible cases whose medical records were unavailable
      The list (or number) of subjects who refused to participate, with the reasons for refusal if possible
      The length of the interview
      The list (or number) of subjects lacking measurement data, with the reasons

    Exploratory or analytic?

    Exploratory case-control studies: there is no specific a priori hypothesis about the relationship between exposure and outcome.

    Analytic case-control studies: designed to test specific a priori hypotheses about exposure and outcome.

    Case-control study: sources of information

    Sources of information on exposure and potential confounding factors:

      Existing records
      Questionnaires
      Face-to-face / telephone interviews
      Biological specimens
      Tissue banks
      Databases of biochemical and environmental measurements

    Temporality is essential in Hill's criteria

    [Figure: timeline from disease onset to initial symptoms to clinical diagnosis.]

    Before initial symptoms appear, the study exposure is unlikely to be altered because of the disease.

    After initial symptoms appear, the study exposure is more likely to be altered because of the symptoms.

    Essential Epidemiology (WA Oleckno)

    Bias and confounding

    Bias should be minimized:

      Selection bias
      Detection bias
      Information bias (recall bias)

    Confounding

    Confounding can be controlled by statistical analysis, but nothing can be done about bias after data collection.

    Case-control studies are potential sources of many biases and should be carefully designed, analyzed, and interpreted.

    How can we solve the problem of confounding in a case-control study?

    Prevention at the study design stage:

      Limitation (restriction)
      Matching: this prevents confounding in a cohort study, but not in a case-control study

    Matching in a case-control study

    Matching on confounding factor(s) increases the efficiency of the statistical analysis.

    Matching alone cannot control confounding in a case-control study; a conditional logistic regression analysis is required.

    Overmatching

    Matching on factor(s) strongly related to the exposure of main interest.

    As a result, you CANNOT see the difference in exposure status between cases and controls.

    How can we solve the problem of confounding?

    Treatment at the statistical analysis stage:

      Stratification by a confounder
      Multivariate analysis
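Stratification can be sketched with the Mantel-Haenszel summary odds ratio, a standard way to pool stratum-specific 2 × 2 tables (the lecture does not name a specific method, so this choice is an assumption). Cells follow the 2 × 2 notation used in this lecture: a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls. The counts are hypothetical: the confounder (say, an age group) is associated with both exposure and disease, and the odds ratio is 2.0 within each stratum.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    a: int  # exposed cases
    b: int  # unexposed cases
    c: int  # exposed controls
    d: int  # unexposed controls

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio across strata of a confounder."""
    numerator = sum(s.a * s.d / s.n for s in strata)
    denominator = sum(s.b * s.c / s.n for s in strata)
    return numerator / denominator


# Hypothetical data stratified by a confounder (e.g., older vs. younger)
strata = [Stratum(a=60, b=20, c=30, d=20), Stratum(a=10, b=20, c=20, d=80)]
# Crude table: ignore the confounder by summing the strata
crude = Stratum(
    a=sum(s.a for s in strata), b=sum(s.b for s in strata),
    c=sum(s.c for s in strata), d=sum(s.d for s in strata),
)

print([round(s.odds_ratio(), 2) for s in strata])  # [2.0, 2.0]
print(round(crude.odds_ratio(), 2))                # 3.5 (crude)
print(round(mantel_haenszel_or(strata), 2))        # 2.0 (adjusted)
```

Here the crude odds ratio (3.5) overstates the association, while the stratified estimate recovers the within-stratum value of 2.0.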

    What you should describe in the materials and methods:

    1. Study design
    2. Definition of eligible cases and controls (inclusion / exclusion criteria for cases and controls)
    3. Number of respondents and response rate
    4. Main exposure and other factors, including potential confounding factors

    5. Sources of information on exposure and other factors
    6. Matched factors, if any
    7. Number of subjects used in the statistical analyses
    8. Statistical test(s) and model(s)
    9. Name and version of the statistical software

    Assuring adequate study power

    The following information is necessary:

      The desired confidence level (usually 95%, corresponding to a significance level of 0.05)
      The desired level of power (80% to 95%)
      The ratio of controls to cases
      The expected frequency of the exposure in the control group
      The smallest odds ratio one would like to be able to detect (based on practical significance)
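As a rough sketch of how these inputs combine, the function below uses a common normal-approximation formula for comparing two proportions (exposure among cases vs. controls), without continuity correction; the formula is a textbook choice assumed here, not one given in the lecture. The expected exposure frequency among cases is derived from the control frequency and the target odds ratio. The example inputs are hypothetical, and dedicated software should be used for a real study.

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0: float, odds_ratio: float, r: float = 1.0,
                             alpha: float = 0.05, power: float = 0.80):
    """Approximate (cases, controls) for an unmatched case-control study.

    p0: expected exposure frequency in the control group
    odds_ratio: smallest odds ratio worth detecting
    r: ratio of controls to cases
    alpha: two-sided significance level (0.05 <-> 95% confidence)
    power: desired power
    """
    # Exposure frequency among cases implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p1 + r * p0) / (1 + r)

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (z_alpha * math.sqrt((1 + r) * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(r * p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    cases = math.ceil(numerator / (r * (p1 - p0) ** 2))
    controls = math.ceil(r * cases)
    return cases, controls


# Hypothetical inputs: 30% exposed among controls, OR = 2, one control per case
print(case_control_sample_size(p0=0.30, odds_ratio=2.0))
```

With these inputs (95% confidence, 80% power) the sketch gives roughly 140 cases and as many controls; increasing the number of controls per case (r > 1) reduces the number of cases needed.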

    Statistical analysis: matched vs. unmatched studies

    The procedures for analyzing the results of case-control studies differ depending on whether the cases and controls are matched or unmatched.

      Matched                                     Unmatched
      McNemar's test                              Chi-square test
      Conditional logistic regression analysis    Unconditional logistic regression analysis

    Advantages of pair matching in case-control studies

      Assures comparability between cases and controls on the selected variables
      May simplify the selection of controls by eliminating the need to identify a random sample
      Useful in small studies, where obtaining cases and controls that are similar on potentially confounding factors may otherwise be difficult
      Can assure adequate numbers of subjects with specified characteristics so as to permit statistical comparisons

    Essential Epidemiology (WA Oleckno)

    Disadvantages of pair matching in case-control studies

      May be difficult or costly to find a sufficient number of controls
      Eliminates the possibility of examining the effects of the matched variables on the outcome
      Can increase the difficulty or complexity of controlling for confounding by the remaining unmatched variables
      Overmatching
      Can result in a greater loss of data, since a pair of subjects has to be eliminated even if only one subject does not respond

    Essential Epidemiology (WA Oleckno)

    An example of an unmatched case-control study

    Lung cancer cases (N=100) and controls (N=100); smokers (NOT recently started): 70 cases and 40 controls.

                     Cases   Controls
      Smoker           70       40
      Non-smoker       30       60

    Odds ratio = (70 × 60) / (30 × 40) = 3.5

    Risk measure in a case-control study

    $\text{Odds} = \dfrac{\text{prevalence}}{1 - \text{prevalence}}$, and $\text{Odds ratio} = \dfrac{\text{odds in cases}}{\text{odds in controls}}$

                           Disease
                     + (case)   - (control)
      Exposure  +       a            c
                -       b            d

    Exposure odds in cases: $a / b$
    Exposure odds in controls: $c / d$
    $\text{Odds ratio} = \dfrac{a / b}{c / d} = \dfrac{a \times d}{b \times c}$
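Applying these formulas to the unmatched lung cancer example (70 of 100 cases and 40 of 100 controls were smokers), the sketch below computes the odds ratio and the Pearson chi-square test listed for unmatched data. The 1-degree-of-freedom p-value uses the identity P(chi-square > x) = erfc(sqrt(x / 2)); no continuity correction is applied, which is an assumption of this sketch.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a/b: exposed/unexposed cases; c/d: exposed/unexposed controls."""
    return (a * d) / (b * c)


def pearson_chi2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    cases, controls = a + b, c + d
    exposed, unexposed = a + c, b + d
    return n * (a * d - b * c) ** 2 / (cases * controls * exposed * unexposed)


def chi2_1df_p(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Unmatched example: smokers/non-smokers among 100 cases and 100 controls
a, b, c, d = 70, 30, 40, 60
print(odds_ratio(a, b, c, d))            # 3.5
chi2 = pearson_chi2(a, b, c, d)
print(round(chi2, 2), chi2_1df_p(chi2))  # 18.18, then a p-value below 0.001
```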

    An example of a matched case-control study

    Lung cancer cases (N=100) and controls matched by sex and age (N=100); smokers (NOT recently started): 70 cases and 40 controls.

                                   Case
                            Smoker   Non-smoker
      Control  Smoker         30         10
               Non-smoker     40         20

    Notice that this is the distribution of 100 matched pairs.

    McNemar's test

                                   Case
                            Smoker   Non-smoker
      Control  Smoker         30         10
               Non-smoker     40         20

    Chi-square test statistic: $\chi^2 = \dfrac{(40 - 10)^2}{40 + 10} = 18$, with 1 degree of freedom.

    Odds ratio $= 40 / 10 = 4$
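McNemar's test uses only the discordant pairs. A minimal sketch with the 100 matched pairs from the example: 40 pairs in which only the case smoked and 10 in which only the control smoked. As on the slide, no continuity correction is applied; the p-value uses the chi-square tail with 1 degree of freedom.

```python
import math


def mcnemar(case_only_exposed: int, control_only_exposed: int):
    """McNemar chi-square, its p-value (1 df), and the matched-pair odds ratio.

    Concordant pairs (both exposed or both unexposed) do not contribute.
    """
    diff = case_only_exposed - control_only_exposed
    chi2 = diff ** 2 / (case_only_exposed + control_only_exposed)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    matched_or = case_only_exposed / control_only_exposed
    return chi2, p_value, matched_or


chi2, p, or_matched = mcnemar(case_only_exposed=40, control_only_exposed=10)
print(chi2, or_matched)  # 18.0 4.0
```

The 30 concordant smoker pairs and 20 concordant non-smoker pairs enter neither the test statistic nor the odds ratio.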

    Logistic regression analysis

    Logistic regression is used to model the probability of a binary response as a function of a set of variables thought to possibly affect the response (called covariates).

    $Y = 1$: case (with the disease)
    $Y = 0$: control (no disease)

    One could imagine trying to fit a linear model (since this is the simplest model!) for the probabilities, but this often leads to problems: in a linear model, fitted probabilities can fall outside the range 0 to 1. Because of this, linear models are seldom used to fit probabilities.

    In a logistic regression analysis, the logit of the probability is modelled, rather than the probability itself.

    $p$ = probability of getting the disease

    $\operatorname{logit}(p) = \log \dfrac{p}{1 - p}$

    As always, we use the natural log. The logit is therefore the log odds, since $\text{odds} = p / (1 - p)$.
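The logit and its inverse follow directly from this definition. The small sketch below (function names are illustrative) shows that the logit is the log odds and that the inverse maps any real-valued linear predictor back into the interval (0, 1), which is why fitted probabilities from a logistic model cannot fall outside 0 to 1.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p, using the natural log."""
    return math.log(p / (1 - p))


def inv_logit(eta: float) -> float:
    """Inverse logit: maps any real number back to a probability."""
    return 1 / (1 + math.exp(-eta))


print(logit(0.5))               # 0.0, since the odds are 1
print(logit(0.8), math.log(4))  # both approximately 1.386: odds = 0.8 / 0.2 = 4
print([round(inv_logit(eta), 3) for eta in (-5, 0, 5)])  # [0.007, 0.5, 0.993]
```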

    $P_x = \dfrac{e^{a + bx}}{1 + e^{a + bx}}$

    The values of $a$ and $b$ determine whether and how steeply the dose-response curve rises (or falls) and where it is centered.

      If $b = 0$, $P_x$ is constant over $x$.
      If $b > 0$, $P_x$ increases with $x$.
      If $b < 0$, $P_x$ decreases with $x$.

    $H_0: b = 0$ is the null hypothesis in a test of trend when $x$ is a continuous variable. Knowledge of $b$ gives us insight into the direction and degree of the association between outcome and exposure.
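A quick numerical check of the three cases: the sketch evaluates Px on a grid of exposure values for b > 0, b = 0, and b < 0. The intercept a = -2 and the slopes of ±0.5 are arbitrary illustrative values, not estimates from any study.

```python
import math


def p_x(x: float, a: float, b: float) -> float:
    """P_x = e^(a + bx) / (1 + e^(a + bx))."""
    eta = a + b * x
    return math.exp(eta) / (1 + math.exp(eta))


xs = range(0, 9, 2)  # illustrative exposure levels 0, 2, 4, 6, 8
for b in (0.5, 0.0, -0.5):
    probs = [round(p_x(x, a=-2.0, b=b), 3) for x in xs]
    print(f"b = {b:+.1f}: {probs}")
```

For b = 0.5, Px passes 0.5 at x = -a/b = 4, which is where the curve is centered.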

    Simple logistic regression (with a dichotomous covariate)

    Suppose we are considering a case-control study where the response variable is disease (case) / non-disease (control) and the predictor variable is exposed / non-exposed, which we code as an indicator variable, or dummy variable:

    $Y = 1$ (D1, diseased) or $Y = 0$ (D0, not diseased);  $x = 1$ (E1, exposed) or $x = 0$ (E0, non-exposed)

    and $p_x = \Pr(\text{disease} \mid \text{exposure } x) = P(Y = 1 \mid x)$, $x = 0, 1$.

    Thus $p_1$ = probability of disease among the exposed, and $p_0$ = probability of disease among the non-exposed.

    In case of exposure ($X = 1$): $\operatorname{logit}(P_{E1}) = \text{intercept} + b$
    In case of non-exposure ($X = 0$): $\operatorname{logit}(P_{E0}) = \text{intercept}$

    Definition of the odds ratio for the exposed group:

    $\mathrm{OR} = \dfrac{P_{E1} / (1 - P_{E1})}{P_{E0} / (1 - P_{E0})}$

    $\log(\mathrm{OR}) = \log \dfrac{P_{E1}}{1 - P_{E1}} - \log \dfrac{P_{E0}}{1 - P_{E0}} = \operatorname{logit}(P_{E1}) - \operatorname{logit}(P_{E0}) = (\text{intercept} + b) - \text{intercept} = b$

    Therefore $\mathrm{OR} = e^{b}$.
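To check numerically that the exposure coefficient equals the log odds ratio, the sketch below fits logit(P) = a + b·x by Newton-Raphson to the individual records of the unmatched lung cancer example (70 of 100 cases and 40 of 100 controls smoked). Newton-Raphson is a standard way to maximize the logistic likelihood; statistical packages use their own, more robust routines, and the function name and fixed iteration count here are illustrative choices.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(P) = a + b*x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score (g_a, g_b) and information matrix [[h_aa, h_ab], [h_ab, h_bb]]
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p
            g_b += (y - p) * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: add (information matrix)^-1 times the score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Individual records of the unmatched example: (x = smoker, y = case)
records = ([(1, 1)] * 70 + [(0, 1)] * 30      # 100 lung cancer cases
           + [(1, 0)] * 40 + [(0, 0)] * 60)   # 100 controls
xs = [x for x, _ in records]
ys = [y for _, y in records]

a_hat, b_hat = fit_logistic(xs, ys)
print(round(math.exp(b_hat), 4))  # 3.5, the same as (70 * 60) / (30 * 40)
```

The fitted intercept is log(30 / 60), the log odds of being a case among non-smokers in the sample; in a case-control study it reflects the sampling of cases and controls rather than the disease risk in the population.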

    Simple logistic regression (with a covariate having more than two categories)

    Suppose we are considering a case-control study where the predictor variable is current smoker / ex-smoker / non-smoker, which we code as dummy variables.

      Original data                Dummy variables
      Case   Smoking status        SMK1 (X1)   SMK2 (X2)
      1      Current               1           0
      0      Ex-smoker             0           1
      1      Non-smoker            0           0
      1      Ex-smoker             0           1
      0      Non-smoker            0           0
      0      Non-smoker            0           0
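The dummy coding in the table can be reproduced with a small mapping in which non-smokers are the reference category (both dummies 0). The names SMK1 and SMK2 follow the table; the `dummy_code` function is a hypothetical helper for illustration.

```python
# Reference category (non-smoker) gets (0, 0)
DUMMIES = {
    "Current": (1, 0),     # SMK1 (X1) = 1
    "Ex-smoker": (0, 1),   # SMK2 (X2) = 1
    "Non-smoker": (0, 0),  # referent
}


def dummy_code(status: str) -> tuple[int, int]:
    """Return (SMK1, SMK2) for a smoking status string."""
    return DUMMIES[status]


original = [(1, "Current"), (0, "Ex-smoker"), (1, "Non-smoker"),
            (1, "Ex-smoker"), (0, "Non-smoker"), (0, "Non-smoker")]
for case, status in original:
    smk1, smk2 = dummy_code(status)
    print(case, status, smk1, smk2)
```

With k categories, k - 1 dummy variables are needed; the omitted category becomes the referent, whose odds ratio is 1.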

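    Logistic regression model of the previous example

    $\operatorname{logit}(P) = a + b_1 X_1 + b_2 X_2$

    Current smoker ($X_1 = 1$, $X_2 = 0$): $\operatorname{logit}(P_{\text{current}}) = a + b_1$
    Ex-smoker ($X_1 = 0$, $X_2 = 1$): $\operatorname{logit}(P_{\text{ex}}) = a + b_2$
    Non-smoker ($X_1 = 0$, $X_2 = 0$): $\operatorname{logit}(P_{\text{non}}) = a$

    $\mathrm{OR}_{\text{current}} = e^{b_1}$, $\mathrm{OR}_{\text{ex}} = e^{b_2}$, $\mathrm{OR}_{\text{non}} = 1$ (referent)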

    Wald's test for no association

    The null hypothesis of no association between outcome and exposure corresponds to $H_0: \mathrm{OR} = 1$, or equivalently $H_0: b = \log \mathrm{OR} = 0$.

    Using logistic regression results, we can test this hypothesis with the standardized coefficient $z = b / \mathrm{SE}(b)$, that is, Wald's test.

    Note: Stata and SAS present two-sided Wald's test p-values.
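For a single binary exposure, the standard error of b = log(OR) estimated by logistic regression equals the square root of the sum of the reciprocals of the four cell counts (Woolf's formula), so the Wald test can be sketched without fitting a model. The example below uses the unmatched lung cancer table; the two-sided p-value comes from the standard normal distribution. The function name is illustrative.

```python
import math
from statistics import NormalDist


def wald_test_2x2(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Wald z-test of H0: log(OR) = 0 for one binary exposure."""
    log_or = math.log((exp_cases * unexp_controls) / (unexp_cases * exp_controls))
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases
                   + 1 / exp_controls + 1 / unexp_controls)
    z = log_or / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return log_or, se, z, p_two_sided


log_or, se, z, p = wald_test_2x2(70, 30, 40, 60)
print(round(log_or, 3), round(se, 3), round(z, 2))  # 1.253 0.299 4.19
```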

    Likelihood ratio test (LRT)

    An alternative way of testing hypotheses in a logistic regression model is the likelihood ratio test, which is specifically designed to test between nested hypotheses:

    $H_0: \log \dfrac{P_x}{1 - P_x} = a$
    $H_A: \log \dfrac{P_x}{1 - P_x} = a + bx$

    and we say that $H_0$ is nested in $H_A$.

    Likelihood ratio test (LRT)

    To test $H_0$ vs. $H_A$, we compute the likelihood ratio test statistic:

    $G = -2 \log \dfrac{L_{H_0}}{L_{H_A}} = 2\,(\log L_{H_A} - \log L_{H_0}) = (-2 \log L_{H_0}) - (-2 \log L_{H_A})$

    where $L_{H_A}$ is the maximized likelihood under the alternative hypothesis $H_A$, and $L_{H_0}$ is the maximized likelihood under the null hypothesis $H_0$.

    If the null hypothesis $H_0$ were true, we would expect the likelihood ratio test statistic to be close to zero.
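For a single binary exposure both maximized likelihoods have closed forms: under H0 every subject has the same probability of being a case, and under HA that probability is estimated separately for exposed and unexposed subjects. The sketch below computes G for the unmatched lung cancer example and refers it to a chi-square distribution with 1 degree of freedom (the difference in the number of parameters). Function names are illustrative.

```python
import math


def binomial_loglik(events: int, total: int) -> float:
    """Maximized Bernoulli log-likelihood for `events` out of `total`."""
    p = events / total
    ll = 0.0
    if events > 0:
        ll += events * math.log(p)
    if events < total:
        ll += (total - events) * math.log(1 - p)
    return ll


def lrt_binary_exposure(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """G = -2 log(L_H0 / L_HA) for H0: logit(P) = a vs HA: logit(P) = a + bx."""
    n_cases = exp_cases + unexp_cases
    n_total = n_cases + exp_controls + unexp_controls
    ll_h0 = binomial_loglik(n_cases, n_total)  # one common probability
    ll_ha = (binomial_loglik(exp_cases, exp_cases + exp_controls)          # x = 1
             + binomial_loglik(unexp_cases, unexp_cases + unexp_controls))  # x = 0
    g = 2 * (ll_ha - ll_h0)
    p_value = math.erfc(math.sqrt(g / 2))  # chi-square upper tail, 1 df
    return g, p_value


g, p = lrt_binary_exposure(70, 30, 40, 60)
print(round(g, 2))  # 18.48
```

For the same table the Pearson chi-square is about 18.18 and the squared Wald statistic about 17.6, so the three tests lead to the same conclusion here.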

    Wald's test vs. LRT

    In general, the LRT often works a little better than the Wald test, in that its test statistic more closely follows a $\chi^2$ distribution under $H_0$. But the Wald test often works very well and usually gives similar results.

    More importantly, the LRT can more easily be extended to multivariate hypothesis tests, e.g., $H_0: b_1 = b_2 = 0$ vs. $H_A$: $b_1 \neq 0$ or $b_2 \neq 0$.

    World J. Gastroenterology 2006

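    Recruitment of cases (Sep. 2000 to Dec. 2002)

    [Figure: flowchart of case recruitment.] 395 patients newly diagnosed with gastric cancer (G.C.). Patients were excluded for the following reasons (the flowchart shows exclusion counts of 7, 65, 91, and 16):

      1. Refused to participate in the study
      2. Lived in Valle del Cauca less than 5 years
      3. Recurrent cases
      4. Could not be contacted

    Final: 216 cases; 173 formalin-fixed, paraffin-embedded blocks. The flowchart also states that 81 cases were excluded.

    We could not obtain the information on tumor location for 23 cases, and those cases were excluded from the tumor-location-specific analysis.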

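    Recruitment of controls

    [Figure: flowchart of control recruitment.] 528 potential controls. Potential controls were excluded for the following reasons (the flowchart shows exclusion counts of 67, 1, and 29):

      1. Lived in Valle del Cauca less than 5 years
      2. Refused to participate in the study
      3. History of G.C.

    Final: 431 controls.

    Matched by sex, age (5-year groups), hospital, and date of admission. Case : control = 1 : 2.

    Major diseases of controls: cardiovascular diseases (208), trauma (117), infectious diseases (38), urological disorders (21).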

    xi: logistic casocon i.fumar
    i.fumar           _Ifumar_0-2         (naturally coded; _Ifumar_0 omitted)

    Logistic regression                               Number of obs   =        647
                                                      LR chi2(2)      =       4.24
                                                      Prob > chi2     =     0.1198
    Log likelihood = -409.93333                       Pseudo R2       =     0.0051

    ------------------------------------------------------------------------------
         casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       _Ifumar_1 |   1.479399   .2817549     2.06   0.040     1.018526    2.148813
       _Ifumar_2 |   1.205128   .2660901     0.85   0.398     .7817889    1.857706
    ------------------------------------------------------------------------------

               |    gastric cancer
       Smoking |         0          1 |     Total
    -----------+----------------------+----------
       Never 0 |       188         78 |       266
         Ex- 1 |       145         89 |       234
     Current 2 |        98         49 |       147
    -----------+----------------------+----------
         Total |       431        216 |       647

    The P>|z| column gives Wald's test p-values.
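Because smoking enters this model only as one categorical covariate, the unconditional logistic regression is saturated, so its results can be reproduced from the cross-tabulation alone (_Ifumar_1 = ex-smokers, _Ifumar_2 = current smokers, never smokers as referent). The sketch below assumes Woolf standard errors, which coincide with the logistic-regression standard errors in this saturated case, and normal-based 95% intervals; it also recomputes the log likelihood and the LR chi2(2) test of H0: b1 = b2 = 0.

```python
import math
from statistics import NormalDist

# Cross-tabulation from the Stata output: (controls, cases)
table = {"never": (188, 78), "ex": (145, 89), "current": (98, 49)}


def loglik(cases: int, total: int) -> float:
    """Maximized Bernoulli log-likelihood of `cases` out of `total`."""
    p = cases / total
    return cases * math.log(p) + (total - cases) * math.log(1 - p)


z975 = NormalDist().inv_cdf(0.975)
ref_controls, ref_cases = table["never"]
results = {}
for level in ("ex", "current"):
    controls, cases = table[level]
    odds_ratio = (cases * ref_controls) / (ref_cases * controls)
    se_log_or = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z975 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z975 * se_log_or)
    results[level] = (odds_ratio, lower, upper)
    print(f"{level:8s} OR = {odds_ratio:.6f}  95% CI {lower:.6f} - {upper:.6f}")

# Full model: one probability of being a case per smoking category
ll_full = sum(loglik(cases, controls + cases) for controls, cases in table.values())
# Null model: one common probability for everyone
total_cases = sum(cases for _, cases in table.values())
total_n = sum(controls + cases for controls, cases in table.values())
ll_null = loglik(total_cases, total_n)

lr_chi2 = 2 * (ll_full - ll_null)
p_value = math.exp(-lr_chi2 / 2)  # exact upper tail of chi-square with 2 df
print(f"Log likelihood = {ll_full:.5f}  LR chi2(2) = {lr_chi2:.2f}  Prob > chi2 = {p_value:.4f}")
```

Running it should reproduce, to the displayed precision, the odds ratios, confidence limits, log likelihood, and LR chi2(2) = 4.24 in the Stata output above.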

    Results of conditional logistic regression analysis using the same data

    Stata command:

    xi: clogit casocon i.fumar, group(identi) or

    Conditional (fixed-effects) logistic regression   Number of obs   =        647
                                                      LR chi2(2)      =       4.64
                                                      Prob > chi2     =     0.0982
    Log likelihood = -234.5745                        Pseudo R2       =     0.0098

    ------------------------------------------------------------------------------
         casocon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       _Ifumar_1 |   1.535023   .3061998     2.15   0.032     1.038295    2.269389
       _Ifumar_2 |   1.219851   .2784042     0.87   0.384        .7799    1.907985
    ------------------------------------------------------------------------------

    The P>|z| column gives Wald's test p-values.

    [Table: numbers of cases and controls and OR (95% CI) for Fumar = 0, 1, 2.]
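Conditional logistic regression compares, within each matched set, the case with its own controls: the conditional likelihood contribution of a set is exp(beta * x_case) divided by the sum of exp(beta * x_j) over all members of the set. The one-covariate Newton-Raphson sketch below is an illustration, not Stata's clogit implementation. Because the individual records of the Cali study are not given here, it is applied to the 100 matched lung cancer pairs from the earlier example (30, 40, 10, and 20 pairs), where it reproduces the matched odds ratio 40 / 10 = 4. The function also accepts 1 : 2 sets written as (case_x, [control1_x, control2_x]).

```python
import math


def clogit_one_covariate(matched_sets, iterations=50):
    """Conditional logistic regression with one covariate (Newton-Raphson).

    matched_sets: list of (case_x, [control_x, ...]) tuples, one per matched set.
    Returns the estimated log odds ratio beta.
    """
    beta = 0.0
    for _ in range(iterations):
        score = info = 0.0
        for case_x, control_xs in matched_sets:
            xs = [case_x, *control_xs]
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean_x = sum(w * x for w, x in zip(weights, xs)) / total
            mean_x2 = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += case_x - mean_x       # concordant sets add 0
            info += mean_x2 - mean_x ** 2  # concordant sets add 0
        if info == 0:
            break  # no discordant sets: beta is not estimable
        beta += score / info
    return beta


# The 100 matched pairs of the lung cancer example (1 = smoker)
pairs = ([(1, [1])] * 30 + [(1, [0])] * 40     # case smoked
         + [(0, [1])] * 10 + [(0, [0])] * 20)  # case did not smoke
beta = clogit_one_covariate(pairs)
print(round(math.exp(beta), 4))  # 4.0, the discordant-pair ratio 40 / 10
```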

    GC risk by smoking in Cali, Colombia: results of tumor-location-specific analysis

                              OR (95% CI)
                              Lower (N=116)*      Middle (N=52)*     Upper (N=24)*
      Cigarette smoking
        never                 1.0 (referent)      1.0 (referent)     1.0 (referent)
        ex-smoker             1.9 (1.1 - 3.4)     1.2 (0.6 - 2.5)    3.7 (1.1 - 12.5)
        current               1.3 (0.7 - 2.3)     1.3 (0.5 - 3.4)    3.0 (0.6 - 13.9)
      P for trend             0.257               0.597              0.083
      P for heterogeneity     0.059               0.859              0.070

    P = 0.51 (P value by LRT). This test examines the difference in the magnitude of the association between smoking and GC risk among the 3 tumor sites.