Inferences on Two-Way Contingency Tables

  • 8/12/2019 Inferences on Two-Way Contingency Tables

    INFERENCES ON TWO-WAY CONTINGENCY TABLES

    DIFFERENCE OF PROPORTIONS

    Suppose $\pi_i$ denotes the (conditional) probability of success for row $i$.
    Then the difference of proportions $\pi_i - \pi_j$ compares the success
    probabilities in the two rows, $i$ and $j$.

    Note: $-1 \le \pi_i - \pi_j \le 1$

    DIFFERENCE OF PROPORTIONS

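    $p_i - p_j$ estimates the true difference $\pi_i - \pi_j$, where
    $p_i = n_{i1}/n_{i+}$ is the sample proportion of successes in row $i$.

    $$SE(p_i - p_j) = \sqrt{\frac{p_i(1-p_i)}{n_{i+}} + \frac{p_j(1-p_j)}{n_{j+}}}$$

    [Large-sample] $100(1-\alpha)\%$ CI [Wald]: $(p_i - p_j) \pm z_{\alpha/2}\,SE(p_i - p_j)$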

    EXAMPLE # 1

    The following table is from a report on the relationship between
    aspirin use and myocardial infarction (heart attacks) by the
    Physicians' Health Study Research Group at Harvard Medical School.
    The Physicians' Health Study was a five-year randomized study testing
    whether regular intake of aspirin reduces mortality from cardiovascular
    disease. Every other day, the male physicians participating in the study
    took either one aspirin tablet or a placebo. The study was blind: the
    physicians in the study did not know which type of pill they were taking.

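    EXAMPLE # 1

    [Table: group (placebo, aspirin) by myocardial infarction (yes, no); not recovered in this transcript]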

    EXAMPLE # 1

    (a) Estimate the probability of suffering myocardial infarction (MI)
    for both the placebo and aspirin groups.

    (b) Construct a 95% CI for the true difference of probabilities of
    heart attack between male physicians who took placebo and those who
    took aspirin. From this, determine whether aspirin is effective in
    reducing the risk of heart attack.
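
A minimal Python sketch of parts (a) and (b), using the Wald interval from the previous slide. The table did not survive in this transcript, so the counts below are an assumption: they are the figures commonly reported for this study (189 MI cases among 11,034 on placebo, 104 among 11,037 on aspirin) and should be checked against the original slide. The function name `wald_ci_diff` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_diff(x1, n1, x2, n2, level=0.95):
    """Wald CI for pi_1 - pi_2, given x successes out of n trials in each row."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return p1, p2, diff, (diff - z * se, diff + z * se)


# ASSUMED counts (the table is missing from this transcript):
# placebo: 189 MI cases among 11034; aspirin: 104 MI cases among 11037.
p_placebo, p_aspirin, diff, (lower, upper) = wald_ci_diff(189, 11034, 104, 11037)
print(f"p_placebo={p_placebo:.4f}  p_aspirin={p_aspirin:.4f}")
print(f"difference={diff:.4f}  95% CI=({lower:.4f}, {upper:.4f})")
```

Under these assumed counts the interval lies entirely above 0, which would point to a lower heart-attack risk in the aspirin group.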

    RELATIVE RISK

    For 2-by-2 tables, the relative risk (RR) is the ratio

    $$RR = \pi_1 / \pi_2$$

    It can be any non-negative number. RR = 1.0 iff $\pi_1 = \pi_2$.

    The sample ratio $p_1/p_2$ estimates the true ratio (RR) $\pi_1/\pi_2$.

    RELATIVE RISK

    RR matters because a difference of a fixed size carries more weight
    when the proportions of success (at all levels of $X$) are close to 0
    or 1. For example, the difference is 0.009 both for (a) 0.010 versus
    0.001 and for (b) 0.410 versus 0.401, but (a) is more striking: one
    proportion is 10 times the other (RR = 10), whereas in (b) RR is about
    1.02. RR may therefore give a better interpretation for public health
    implications than the difference of proportions alone, which may be
    misleading if $\pi_i \approx 0$ or 1.

    RELATIVE RISK

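    The sampling distribution of the sample RR is highly skewed unless the
    sample sizes are quite large. For large samples, an approximate [Wald]
    $100(1-\alpha)\%$ CI for the true log RR is given by:

    $$\log\left(\frac{p_1}{p_2}\right) \pm z_{\alpha/2}\sqrt{\frac{1-p_1}{n_{1+}p_1} + \frac{1-p_2}{n_{2+}p_2}}$$

    Exponentiating the endpoints gives a CI for RR itself.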

    EXAMPLE # 2

    Refer to the aspirin use and myocardial infarction (heart attacks)
    study by the Physicians' Health Study Research Group at Harvard
    Medical School.

    (a) Estimate and interpret the RR of heart attack between male
    physicians who took placebo and those who took aspirin.

    (b) Construct a 95% CI for the true RR of heart attack between male
    physicians who took placebo and those who took aspirin.
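
One way to work this example in Python is sketched below: the interval is built on the log scale and then exponentiated. The counts are again an assumption (189 of 11,034 on placebo, 104 of 11,037 on aspirin), since the table is missing from this transcript, and `relative_risk_ci` is an illustrative name.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, level=0.95):
    """Sample RR = p1/p2 and a Wald CI obtained by exponentiating the log-RR CI."""
    p1, p2 = x1 / n1, x2 / n2
    rr = p1 / p2
    # (1 - p)/(n * p) simplifies to (1 - p)/x because n * p = x.
    se_log = sqrt((1 - p1) / x1 + (1 - p2) / x2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return rr, (exp(log(rr) - z * se_log), exp(log(rr) + z * se_log))


# ASSUMED counts (placebo row first, aspirin row second).
rr, (lower, upper) = relative_risk_ci(189, 11034, 104, 11037)
print(f"RR={rr:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

With these counts the estimated risk of MI on placebo is roughly 1.8 times that on aspirin, and the interval excludes 1.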

    ODDS RATIO

    For a probability of success $\pi$, the odds (of success) are defined to be

    $$\text{odds} = \pi / (1 - \pi)$$

    from which we can get

    $$\pi = \text{odds} / (\text{odds} + 1)$$

    ODDS RATIO

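    For 2-by-2 tables, the odds ratio ($\theta$) is the ratio

    $$\theta = \frac{\text{odds}_1}{\text{odds}_2} = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$$

    where it can be any non-negative number.

    Sample odds ratio ($\hat\theta$) [through ML under the multinomial
    assumption, or the independent binomial assumption]:

    $$\hat\theta = \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}}$$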

    ODDS RATIO

    $X$ and $Y$ independent $\iff \theta = 1$.

    $\theta > 1.0$: higher success rate for row [$X$ level] 1

    $\theta < 1.0$: higher success rate for row [$X$ level] 2

    Values of $\theta$ farther from 1.0 in a given direction represent
    stronger association between $X$ and $Y$.

    $\theta$ is orientation invariant (unlike RR).

    $\theta$ may be viewed as a cross-product ratio of joint probabilities,
    $\theta = \pi_{11}\pi_{22}/(\pi_{12}\pi_{21})$, if interdependence is desired.

    ODDS RATIO

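    The sampling distribution of $\hat\theta$ is highly skewed unless the
    sample sizes are quite large. For large samples, an approximate
    $100(1-\alpha)\%$ CI for the true $\log\theta$ [which is symmetric
    about 0] is given by:

    $$\log\hat\theta \pm z_{\alpha/2}\,SE(\log\hat\theta), \qquad SE(\log\hat\theta) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$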

    ODDS RATIO

    If some cell counts ($n_{ij}$) are 0, then $\hat\theta$ can be either 0
    or $\infty$, or even undefined if both entries in a row or column are 0.
    To adjust for this, an amended estimator is given by

    $$\tilde\theta = \frac{(n_{11} + 0.5)(n_{22} + 0.5)}{(n_{12} + 0.5)(n_{21} + 0.5)}$$

    i.e., 0.5 is added to each cell count (the same adjustment applies to
    $SE(\log\tilde\theta)$ when estimating a $100(1-\alpha)\%$ CI).
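
The adjustment can be sketched in a few lines of Python. The 2-by-2 table used here is hypothetical and chosen only because it contains a zero cell; the function names are illustrative.

```python
from math import sqrt


def adjusted_odds_ratio(n11, n12, n21, n22, add=0.5):
    """Amended odds ratio: add 0.5 to every cell before taking the cross-product ratio."""
    return ((n11 + add) * (n22 + add)) / ((n12 + add) * (n21 + add))


def adjusted_se_log(n11, n12, n21, n22, add=0.5):
    """Standard error of the log of the amended odds ratio, with the same 0.5 adjustment."""
    return sqrt(sum(1 / (count + add) for count in (n11, n12, n21, n22)))


# HYPOTHETICAL table with a zero cell: the unadjusted n11*n22/(n12*n21) would divide by 0.
theta_tilde = adjusted_odds_ratio(5, 0, 3, 7)
se_tilde = adjusted_se_log(5, 0, 3, 7)
print(f"adjusted odds ratio={theta_tilde:.3f}  SE(log)={se_tilde:.3f}")
```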

    EXAMPLE # 3

    Refer to the aspirin use and myocardial infarction (heart attacks)
    study by the Physicians' Health Study Research Group at Harvard
    Medical School.

    (a) Estimate and interpret $\hat\theta$ of heart attack between male
    physicians who took placebo and those who took aspirin.

    (b) Construct a 95% CI for the true $\theta$ of heart attack between
    male physicians who took placebo and those who took aspirin.
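
A possible Python sketch for this example follows. It uses the large-sample interval for the log odds ratio and exponentiates the endpoints. The cell counts are assumed (MI yes/no: 189 and 10,845 on placebo, 104 and 10,933 on aspirin) because the table is not in this transcript; `odds_ratio_ci` is an illustrative name.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(n11, n12, n21, n22, level=0.95):
    """Sample odds ratio and a Wald CI built on the log scale."""
    theta_hat = (n11 * n22) / (n12 * n21)
    se_log = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    centre = log(theta_hat)
    return theta_hat, (exp(centre - z * se_log), exp(centre + z * se_log))


# ASSUMED cell counts: rows = placebo, aspirin; columns = MI yes, MI no.
theta_hat, (lower, upper) = odds_ratio_ci(189, 10845, 104, 10933)
print(f"odds ratio={theta_hat:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

Because MI is rare in both groups under these counts, the estimate lands very close to the relative risk.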

    RELATIONSHIP BETWEEN ODDS RATIO AND RELATIVE RISK

    $$\theta = \frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = RR \times \frac{1-\pi_2}{1-\pi_1}$$

    Hence, whenever direct estimation of RR is not possible, one can
    estimate $\theta$ instead, and use it to approximate RR, as long as
    $\pi_1 \approx 0$ and $\pi_2 \approx 0$.
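
A small numeric illustration of this relationship in Python. The probabilities are made up for the illustration: one rare-outcome pair and one common-outcome pair, both with RR = 2.

```python
def rr_and_odds_ratio(p1, p2):
    """Return (RR, odds ratio) for two success probabilities."""
    rr = p1 / p2
    theta = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return rr, theta


# HYPOTHETICAL probabilities, chosen only to contrast the two situations.
rare = rr_and_odds_ratio(0.02, 0.01)    # both probabilities near 0
common = rr_and_odds_ratio(0.60, 0.30)  # probabilities far from 0

for label, (rr, theta) in (("rare", rare), ("common", common)):
    print(f"{label}: RR={rr:.3f}  odds ratio={theta:.3f}")
```

In the rare case the odds ratio stays within a few percent of RR; in the common case it drifts far from it, which is why the approximation is reserved for rare outcomes.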

    ODDS RATIO AND CASE-CONTROL STUDIES

    In most case-control studies, the marginal distribution of the
    response variable is fixed by the sampling design. Because the design
    is retrospective, one can construct conditional distributions for the
    explanatory variable within levels of the response outcome of
    interest. In this case, only $\theta$ can be estimated, due to its
    symmetric orientation (invariance). Thus, for relatively rare
    successes [usually rare diseases], RR is usually approximated by
    $\hat\theta$.

    TESTS OF INDEPENDENCE

    Consider a null hypothesis $H_0$ that specifies the cell probabilities $\{\pi_{ij}\}$.

    For a sample of size $n$ with cell counts $\{n_{ij}\}$, the values
    $\{\mu_{ij} = n\pi_{ij}\}$ are expected frequencies, i.e. $E(n_{ij})$
    when $H_0$ is true.

    To arrive at a decision, $n_{ij}$ is compared with $\mu_{ij}$: if $H_0$
    is true, $n_{ij} - \mu_{ij}$ must be small, i.e. larger differences
    provide stronger evidence against $H_0$.

    Test statistics used to make such comparisons have large-sample
    $\chi^2$ distributions.

    TESTS OF INDEPENDENCE

    $\chi^2(df)$

    Mean: $df$

    Variance: $2\,df$

    $\chi^2(df) = \text{Gamma}(df/2,\ 2)$ [shape, scale]

    PEARSON STATISTIC

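    $$X^2 = \sum_{i,j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$$

    score statistic

    minimum at 0 if all $n_{ij} = \mu_{ij}$

    p-value: right-tail probability, $P(\chi^2_{df} \ge X^2_{obs})$

    $\mu_{ij} > 5$ for decent approximation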

    LIKELIHOOD-RATIO STATISTIC

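    $$G^2 = 2\sum_{i,j} n_{ij} \log\left(\frac{n_{ij}}{\mu_{ij}}\right)$$

    likelihood-ratio statistic [based on multinomial assumption]

    minimum at 0 if all $n_{ij} = \mu_{ij}$

    p-value: right-tail probability, $P(\chi^2_{df} \ge G^2_{obs})$

    $\mu_{ij} > 5$ for decent approximation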

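    TESTS OF INDEPENDENCE

    In two-way tables, the null hypothesis of statistical independence
    has the form

    $$H_0: \pi_{ij} = \pi_{i+}\pi_{+j} \quad \text{for all } i, j$$

    or equivalently

    $$H_0: \mu_{ij} = n\,\pi_{i+}\pi_{+j}$$

    Note: $\mu_{ij}$ is estimated by the estimated expected frequencies

    $$\hat\mu_{ij} = n\,p_{i+}p_{+j} = \frac{n_{i+}\,n_{+j}}{n}$$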

    TESTS OF INDEPENDENCE

    For testing independence in $I \times J$ contingency tables, the $X^2$
    and $G^2$ statistics are used, with both having a large-sample
    $\chi^2$ distribution with degrees of freedom $df = (I-1)(J-1)$.

    $X^2$ converges in distribution more quickly than $G^2$.

    TESTS OF INDEPENDENCE

    Recall:

    The degrees of freedom are obtained by taking the difference between
    the number of parameters [cell probabilities] under the alternative
    [for which there are $IJ - 1$ non-redundant parameters] and under the
    null [for which there are $(I-1) + (J-1)$ non-redundant parameters]
    hypotheses, i.e.,

    $$df = (IJ - 1) - [(I-1) + (J-1)] = (I-1)(J-1)$$

    EXAMPLE # 4

    The following table, from the 2000 General Social Survey, cross
    classifies gender and political party identification. Subjects
    indicated whether they identified more strongly with the Democratic
    or Republican party or as Independents. The table also contains the
    estimated expected frequencies for $H_0$: independence between gender
    and political party identification.

    Determine whether a significant association between gender and
    political party identification exists.
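
The test can be sketched in standard-library Python as below. The counts are an assumption: the table is missing from this transcript, and the values used are those commonly reported for this survey example (females 762, 327, 468; males 484, 239, 477 for Democrat, Independent, Republican). The helper `chi2_sf` is a hand-rolled right-tail probability for integer degrees of freedom, included only so the sketch needs no third-party package.

```python
from math import erfc, exp, gamma, log, sqrt


def chi2_sf(x, df):
    """Right-tail probability P(chi2_df >= x) for integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, tail = 1.0, exp(-y)        # regularized upper gamma Q(1, y)
    else:
        a, tail = 0.5, erfc(sqrt(y))  # Q(1/2, y)
    while a < df / 2:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        tail += y ** a * exp(-y) / gamma(a + 1)
        a += 1
    return tail


def independence_test(table):
    """Estimated expected frequencies, Pearson X^2, likelihood-ratio G^2 and df."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    expected = [[ri * cj / n for cj in col] for ri in row]
    pairs = [(o, e) for r_o, r_e in zip(table, expected) for o, e in zip(r_o, r_e)]
    x2 = sum((o - e) ** 2 / e for o, e in pairs)
    g2 = 2 * sum(o * log(o / e) for o, e in pairs if o > 0)
    df = (len(row) - 1) * (len(col) - 1)
    return expected, x2, g2, df


# ASSUMED counts: rows = females, males; columns = Democrat, Independent, Republican.
table = [[762, 327, 468], [484, 239, 477]]
expected, x2, g2, df = independence_test(table)
p_x2, p_g2 = chi2_sf(x2, df), chi2_sf(g2, df)
print(f"X2={x2:.2f}  G2={g2:.2f}  df={df}  p(X2)={p_x2:.2e}  p(G2)={p_g2:.2e}")
```

Under these assumed counts both statistics are near 30 on 2 degrees of freedom, so the p-values are tiny and independence would be rejected.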

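    EXAMPLE # 4

    [Table: gender by political party identification, with estimated expected frequencies; not recovered in this transcript]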

    RESIDUALS FOR CELLS

    A cell-by-cell comparison of observed and estimated expected
    frequencies helps us better understand the nature of the evidence.

    However, it is insufficient to rely simply on the raw cell
    differences $n_{ij} - \hat\mu_{ij}$ [due to the inherent magnitude of
    the counts].

    STANDARDIZED RESIDUAL

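    $$r_{ij} = \frac{n_{ij} - \hat\mu_{ij}}{\sqrt{\hat\mu_{ij}\,(1 - p_{i+})(1 - p_{+j})}}$$

    $r_{ij}$ follows a [large-sample] standard normal distribution under $H_0$.

    Large $|r_{ij}|$ (as compared to 0): evidence of lack of fit of $H_0$
    in that cell.

    Even under $H_0$, about 5% of the standardized residuals are expected
    to fall beyond $\pm 2$ by chance alone, so absolute values above about
    2 (or about 3, if there are many cells) indicate lack of fit.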

    EXAMPLE # 5

    Refer to the gender and political party identification example. The
    following table shows the standardized residuals for testing
    independence in that example. Interpret the standardized residuals in
    relation to the global result of the test of independence between
    gender and political party identification.
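
A hedged Python sketch of how such standardized residuals could be computed. The residual table itself is missing from this transcript, so the counts are the same assumed values as in the previous example (females 762, 327, 468; males 484, 239, 477), and `standardized_residuals` is an illustrative name.

```python
from math import sqrt


def standardized_residuals(table):
    """r_ij = (n_ij - mu_ij) / sqrt(mu_ij * (1 - p_i+) * (1 - p_+j)) for every cell."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    result = []
    for i, counts in enumerate(table):
        out_row = []
        for j, observed in enumerate(counts):
            mu = row[i] * col[j] / n  # estimated expected frequency
            denom = sqrt(mu * (1 - row[i] / n) * (1 - col[j] / n))
            out_row.append((observed - mu) / denom)
        result.append(out_row)
    return result


# ASSUMED counts: rows = females, males; columns = Democrat, Independent, Republican.
residuals = standardized_residuals([[762, 327, 468], [484, 239, 477]])
for label, values in zip(("females", "males"), residuals):
    print(label, [round(v, 2) for v in values])
```

With these counts the Democrat and Republican cells carry large residuals of opposite sign, while the Independent cells stay well inside $\pm 2$.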

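    EXAMPLE # 5

    [Table: standardized residuals by gender and political party identification; not recovered in this transcript]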

    STANDARDIZED RESIDUALS

    Notice that the residuals for the females are the negatives of those
    for the males. In general, the raw differences
    $n_{ij} - \hat\mu_{ij}$ in each column (and in each row) must sum to
    0, as the observed counts and the expected frequencies are
    constrained by the same row and column totals. In particular, for
    $2 \times J$ tables,

    $$n_{1j} - \hat\mu_{1j} = -(n_{2j} - \hat\mu_{2j})$$

    PARTITIONING

    Recall: Let $X_1^2$ and $X_2^2$ be independent $\chi^2$ RVs with
    degrees of freedom $df_1$ and $df_2$, respectively. Then

    $$X^2 = X_1^2 + X_2^2 \sim \chi^2_{df_1 + df_2}$$

    In essence, this enables one to separate/collapse rows or columns of
    $I \times J$ tables into several sub-tables, and obtain $X^2$ or $G^2$
    statistics for which the sum of the partitioned statistics is the
    global statistic.

    PARTITIONING

    Consider: For a test of independence in a $2 \times J$ table, a
    $\chi^2$ statistic with $df = J - 1$ can be broken down into $J - 1$
    components: [1] the first two columns; [2] the first two columns
    collapsed, then compared with the 3rd column; [3] the first three
    columns collapsed, then compared with the 4th column; etc., until the
    $J$th column is considered. In particular, this is true for $G^2$.
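
The column-by-column scheme above can be checked numerically with a short Python sketch. The 2-by-3 table is hypothetical, and the function names are illustrative. The sketch also computes the Pearson statistic on the same sub-tables for comparison.

```python
from math import log


def expected_cells(table):
    """Pairs (observed, estimated expected) for every cell under independence."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    return [(o, row[i] * col[j] / n) for i, r in enumerate(table) for j, o in enumerate(r)]


def g2(table):
    return 2 * sum(o * log(o / e) for o, e in expected_cells(table) if o > 0)


def x2(table):
    return sum((o - e) ** 2 / e for o, e in expected_cells(table))


def partition_2xJ(table):
    """Sub-tables: column 1 vs 2, then (1+2) vs 3, ..., (1+...+(J-1)) vs J."""
    top, bottom = table
    return [[[sum(top[:k]), top[k]], [sum(bottom[:k]), bottom[k]]]
            for k in range(1, len(top))]


# HYPOTHETICAL 2 x 3 table, used only to illustrate the identity.
full = [[20, 30, 50], [30, 30, 40]]
parts = partition_2xJ(full)
g2_full, g2_sum = g2(full), sum(g2(p) for p in parts)
x2_full, x2_sum = x2(full), sum(x2(p) for p in parts)
print(f"G2: global={g2_full:.4f}  sum of components={g2_sum:.4f}")
print(f"X2: global={x2_full:.4f}  sum of components={x2_sum:.4f}")
```

For this table the $G^2$ components add up to the global value exactly (up to rounding error), whereas the $X^2$ components only come close.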

    PARTITIONING

    While it might seem more natural to obtain a statistic for each
    $2 \times 2$ pairing, note that the sum of these individual statistics
    will not total the global statistic. [Issues due to non-independence]

    $G^2$ has exact partitionings; $X^2$ does not (at least, algebraically).

    Nevertheless, partitioning $\chi^2$ is valid for both statistics as
    long as the partitions are independent.

    SOME COMMENTS ON TESTS

    These tests likewise require a large sample size $n$ relative to $IJ$.
    Moreover, $G^2$ converges poorly as compared to $X^2$ for very small
    sample sizes. For large $I$ or $J$, $X^2$ still provides a decent
    approximation even if some expected frequencies are as small as 1.

    SOME COMMENTS ON TESTS

    $\chi^2$ tests merely indicate the degree of evidence for an
    association; they do not say anything about the strength and the
    nature of the association.

    Both $X^2$ and $G^2$ are orientation invariant, i.e. they do not
    change values with reorderings of rows or columns. They therefore
    treat both variables as nominal and are most appropriate when
    associations between nominal variables are of concern. For ordinal
    variables, more powerful tests exist.

    FISHER'S EXACT TEST

    Recall: For $2 \times 2$ tables, independence $\iff \theta = 1$.

    Consider the cell counts $\{n_{ij}\}$. A small-sample null probability
    distribution for the cell counts that does not depend on unknown
    parameters results from considering the set of tables having the same
    row and column totals. Under this condition, each $n_{ij}$ then has
    the hypergeometric distribution.

    FISHER'S EXACT TEST

    It is sufficient to know $n_{11}$ alone to determine all other cell
    counts. Under the null hypothesis of independence $H_0: \theta = 1$,
    $n_{11}$ is hypergeometric with

    $$P(n_{11} = t) = \frac{\binom{n_{1+}}{t}\binom{n_{2+}}{n_{+1} - t}}{\binom{n}{n_{+1}}}$$

    Hence, the p-value equals the sum of hypergeometric probabilities for
    outcomes at least as favorable to $H_a$ as the observed outcome.

    EXAMPLE # 6

    In his 1935 book, The Design of Experiments, Fisher described the
    following experiment: When drinking tea, a colleague of Fisher's at
    Rothamsted Experiment Station near London claimed she could
    distinguish whether milk or tea was added to the cup first. To test
    her claim, Fisher designed an experiment in which she tasted eight
    cups of tea. Four cups had milk added first, and the other four had
    tea added first.

    EXAMPLE # 6

    She was told there were four cups of each type and that she should
    try to select the four that had milk added first. The cups were
    presented to her in random order. The following table shows a
    potential result of the experiment. Perform a test to check whether
    there is evidence of a positive association between the true order of
    pouring and her guess. Compute the exact p-value of the test.
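
A minimal Python sketch of the one-sided exact test. The result table is not in this transcript, so the observed table is assumed to be the outcome in which she identifies three of the four milk-first cups, i.e. cell counts 3, 1, 1, 3. The function names are illustrative.

```python
from math import comb


def hypergeom_pmf(t, n1, n2, c1):
    """P(n11 = t) when the row totals (n1, n2) and first column total c1 are fixed."""
    return comb(n1, t) * comb(n2, c1 - t) / comb(n1 + n2, c1)


def fisher_one_sided(n11, n12, n21, n22):
    """P(n11 >= observed) under independence: evidence of positive association."""
    n1, n2, c1 = n11 + n12, n21 + n22, n11 + n21
    t_max = min(n1, c1)
    return sum(hypergeom_pmf(t, n1, n2, c1) for t in range(n11, t_max + 1))


# ASSUMED observed table: rows = milk/tea actually poured first,
# columns = guessed milk/tea first; 3 correct guesses in each row.
p_value = fisher_one_sided(3, 1, 1, 3)
print(f"one-sided exact p-value = {p_value:.4f}")
```

For this assumed table only two outcomes are at least as extreme as the observed one ($n_{11} = 3$ and $n_{11} = 4$), so the p-value is 17/70, which offers little evidence against independence.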

    EXAMPLE # 6

    [Table: true pouring order (milk first, tea first) by her guess; not recovered in this transcript]

    CONSERVATISM OF FISHER'S EXACT TEST

    Because the test is exact and the distribution of $n_{11}$ is
    discrete, the test is conservative, i.e. the actual error rate when
    the null hypothesis of independence is true is smaller than the
    intended one. This is essentially true for one-sided alternatives.
    Hence, the mid p-value (half the probability of the observed outcome
    plus the probability of more extreme outcomes) is preferred as an
    alternative to diminish the conservativeness.
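
The mid p-value can be sketched as follows in Python, taking it as half the probability of the observed table plus the probability of more extreme tables. The observed table is again an assumption (cell counts 3, 1, 1, 3 for the tea-tasting experiment), and the function names are illustrative.

```python
from math import comb


def hypergeom_pmf(t, n1, n2, c1):
    """P(n11 = t) when the row totals (n1, n2) and first column total c1 are fixed."""
    return comb(n1, t) * comb(n2, c1 - t) / comb(n1 + n2, c1)


def mid_p_one_sided(n11, n12, n21, n22):
    """Half the probability of the observed n11 plus P(n11 > observed)."""
    n1, n2, c1 = n11 + n12, n21 + n22, n11 + n21
    t_max = min(n1, c1)
    more_extreme = sum(hypergeom_pmf(t, n1, n2, c1) for t in range(n11 + 1, t_max + 1))
    return 0.5 * hypergeom_pmf(n11, n1, n2, c1) + more_extreme


# ASSUMED observed table for the tea-tasting experiment: 3, 1, 1, 3.
mid_p = mid_p_one_sided(3, 1, 1, 3)
print(f"one-sided mid p-value = {mid_p:.4f}")
```

For this table the mid p-value is 9/70, noticeably smaller than the ordinary exact p-value of 17/70.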

    SMALL-SAMPLE CONFIDENCE INTERVAL FOR $\theta$

    It is also possible to construct small-sample confidence intervals
    for the odds ratio. The procedure involved is a generalization of
    Fisher's exact test that tests an arbitrary value,
    $H_0: \theta = \theta_0$. Hence, a 95% CI would then contain all
    values of $\theta_0$ for which the exact p-value of
    $H_0: \theta = \theta_0$ is greater than 0.05. This can also be
    constructed using the mid p-value to reduce conservatism.

    SMALL-SAMPLE CONFIDENCE INTERVAL FOR $\theta$

    For the tea-tasting experiment, a 95% CI for $\theta$ can be computed
    to be as follows:

    Exact p-value: (0.21, 626.17)

    Mid p-value: (0.31, 308.55)
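
A rough Python sketch of how the exact interval might be obtained. It rests on several assumptions not stated in the slides: the observed table is 3, 1, 1, 3; the conditional distribution of $n_{11}$ for a given $\theta$ is the noncentral hypergeometric distribution; and the interval is formed by inverting two one-sided tests at 0.025 each. The bisection helper and all names are illustrative, and the code assumes the observed count is strictly inside its possible range.

```python
from math import comb, exp, log


def tail_probs(theta, n11, n1, n2, c1):
    """(P(n11 >= obs), P(n11 <= obs)) under the conditional distribution for odds ratio theta."""
    support = range(max(0, c1 - n2), min(n1, c1) + 1)
    weights = {t: comb(n1, t) * comb(n2, c1 - t) * theta ** t for t in support}
    total = sum(weights.values())
    upper = sum(w for t, w in weights.items() if t >= n11) / total
    lower = sum(w for t, w in weights.items() if t <= n11) / total
    return upper, lower


def bisect_log(f, target, increasing, lo=1e-8, hi=1e8, iters=200):
    """Solve f(theta) = target by bisection on log(theta); f must be monotone."""
    a, b = log(lo), log(hi)
    for _ in range(iters):
        m = (a + b) / 2
        if (f(exp(m)) < target) == increasing:
            a = m  # root lies to the right of m
        else:
            b = m  # root lies to the left of m
    return exp((a + b) / 2)


def exact_odds_ratio_ci(n11, n12, n21, n22, level=0.95):
    n1, n2, c1 = n11 + n12, n21 + n22, n11 + n21
    alpha = 1 - level
    # Lower limit: theta at which P(n11 >= obs) rises to alpha/2.
    lower = bisect_log(lambda th: tail_probs(th, n11, n1, n2, c1)[0], alpha / 2, True)
    # Upper limit: theta at which P(n11 <= obs) falls to alpha/2.
    upper = bisect_log(lambda th: tail_probs(th, n11, n1, n2, c1)[1], alpha / 2, False)
    return lower, upper


# ASSUMED observed table for the tea-tasting experiment: 3, 1, 1, 3.
lower, upper = exact_odds_ratio_ci(3, 1, 1, 3)
print(f"exact 95% CI for theta: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the result should land close to the exact interval quoted above; small differences in the upper limit are possible because the tail probability is very flat in that region.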