
  • Transformations. Non-parametric tests

    Jon Michael Gran, Department of Biostatistics, UiO

    MF9130 Introductory course in statistics, Monday 06.06.2011

  • Overview

    Transformations. Non-parametric methods (Aalen chapter 8.8, Kirkwood and Sterne chapters 13 and 30.2)

    Logarithmic transformation. Choice of transformation. Non-parametric methods based on ranks (Wilcoxon signed rank test and Wilcoxon rank sum test)

  • Transformations

    Motivation: Non-normality means that the standard methods cannot be used

    This includes linear regression, which is often a main tool of analysis, so this is a major problem

    For data that are skewed to the right, one can often use log-transformations

    This not only gives normally distributed data; it may also give equal variances in different groups

  • Logarithmic transformation

    So what do we mean by doing a log transformation? Take the logarithm of all your data values x_i, and do your analysis on the new dataset of values u_i = ln(x_i)

  • Example: Ln transformation

    Skew distribution. Example of observations: 0.40, 0.96, 11.0

    Ln transformed distribution: ln(0.4) = -0.92, ln(0.96) = -0.04, ln(11) = 2.40

    Do the analysis on the ln transformed data. In SPSS: Transform → Compute
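The transformation on this slide takes only a few lines outside SPSS; here is a minimal Python sketch (illustrative, not from the slides) using the same three observations:

```python
import math

# Right-skewed observations from the slide
x = [0.40, 0.96, 11.0]

# Natural-log transform: u_i = ln(x_i)
u = [math.log(v) for v in x]

print([round(v, 2) for v in u])  # [-0.92, -0.04, 2.4]
```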


  • Choice of transformations

    The logarithmic transformation is by far the most frequently applied

    It is appropriate for removing positive skewness (data skewed to the right)

    There are other types of transformation for data that are more strongly or more weakly skewed, or data skewed to the left

  • Other transformations

    Skewed to the right:
    - Lognormal: Logarithmic (u = ln x)
    - More skewed than lognormal: Reciprocal (u = 1/x)
    - Less skewed than lognormal: Square root (u = √x)

    Skewed to the left:
    - Moderately skewed: Square (u = x^2)
    - More skewed: Cubic (u = x^3)

    Non-linear relationship: Transform only one of the two variables
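As an illustrative sketch (not from the slides), the transformation ladder above can be compared on a small made-up right-skewed sample by checking which candidate brings the sample skewness closest to zero; the data values and the simple skewness helper are mine:

```python
import math

def skewness(data):
    # Population-style sample skewness: mean cubed standardized deviation
    n = len(data)
    m = sum(data) / n
    s = math.sqrt(sum((v - m) ** 2 for v in data) / n)
    return sum(((v - m) / s) ** 3 for v in data) / n

# Made-up right-skewed sample
x = [0.5, 0.8, 1.0, 1.3, 1.7, 2.4, 3.5, 6.0, 12.0]

# The ladder from the slide: log for lognormal-like data,
# reciprocal for stronger skew, square root for milder skew
candidates = {
    "log":        [math.log(v) for v in x],
    "reciprocal": [1 / v for v in x],
    "sqrt":       [math.sqrt(v) for v in x],
}

print(f"raw skewness = {skewness(x):+.2f}")
for name, u in candidates.items():
    print(f"{name:10s} skewness = {skewness(u):+.2f}")
```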

  • Non-parametric statistics

    Motivation: So what if transforming our data doesn't help and it is still not normally distributed?

    Use a non-parametric test

  • Non-parametric tests

    In the tests we have done so far, the null hypothesis has always been a stochastic model with a few parameters. Models based on, for example:

    - Normal distribution
    - T-distribution
    - Binomial distribution

    In non-parametric tests, the null hypothesis is not a parametric distribution, but rather a much larger class of possible distributions

  • Parametric methods we have seen

    Estimation:
    - Confidence interval for μ
    - Confidence interval for μ1 - μ2

    Testing:
    - One sample T-test
    - Two sample T-test

    The methods are based on the assumption of normally distributed data (or a normally distributed mean)

  • Typical assumptions for parametric methods

    1. Independence: All observations are independent. Achieved by taking random samples of individuals; for the paired t-test, independence is achieved by using the difference between measurements

    2. Normally distributed data (check: histograms, tests for normal distribution, Q-Q plots)

    3. Equal variances or standard deviations in the groups

  • How to test for normality

    Visual methods, like histograms and Q-Q plots, are very useful

    There are also several statistical tests for normality:
    - Kolmogorov-Smirnov test
    - Shapiro-Wilk test

    However, with enough data, visual methods are more useful!
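Outside SPSS these normality tests are available in scipy, for example. The sketch below runs Shapiro-Wilk on two synthetic samples (all data here are made up for illustration, not from the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: one normal sample, one clearly right-skewed sample
normal_data = rng.normal(loc=100, scale=15, size=200)
skewed_data = rng.lognormal(mean=0, sigma=1, size=200)

# Shapiro-Wilk: H0 is that the sample comes from a normal distribution
for name, data in [("normal", normal_data), ("lognormal", skewed_data)]:
    res = stats.shapiro(data)
    print(f"{name:10s} W = {res.statistic:.3f}, p = {res.pvalue:.4g}")
```

As the slide warns, with small samples these tests have low power, and with large samples they reject for trivial departures, so the visual checks remain important.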

  • Tests for normality

    Example of SPSS output from a Kolmogorov-Smirnov test (measurements of lung function, separated by gender)

    According to this test, if the p-value is less than 0.05 the data cannot be considered normally distributed

    Note though that these tests are a bit problematic. They're not very effective at discovering departures from normality. The power is low, so even if you don't get a significant result, it's a bit of a leap to assume normality


  • Q-Q plots

    A graphical way of comparing two distributions, plotting their quantiles against each other

    Q is for quantile. If the distributions are similar, the Q-Q plot should be close to a straight line

    [Example Q-Q plots: heavy-tailed population, skew population]
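As a sketch of the idea (not from the slides), `scipy.stats.probplot` computes the quantile pairs behind a normal Q-Q plot together with a least-squares line fit, so the "close to a straight line" check can also be read off numerically via the fit's correlation coefficient; the sample here is synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(size=300)

# probplot pairs the ordered sample with normal theoretical quantiles and
# fits a least-squares line; r near 1 means the Q-Q plot is nearly straight
(theoretical, ordered), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"Q-Q line: slope = {slope:.2f}, r = {r:.4f}")
```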

  • Histogram and Q-Q plot for Gender = 1

  • Histogram and Q-Q plot for Gender = 2

  • Non-parametric statistics

    The null hypothesis is, for example, that the median of the distribution is zero

    A test statistic can be formulated so that:
    - it has a known distribution under this hypothesis
    - it has more extreme values under alternative hypotheses

  • Non-parametric methods

    Estimation:
    - Confidence interval for the median

    Testing:
    - Paired data: Sign test and Wilcoxon signed rank test
    - Two independent samples: Mann-Whitney test / Wilcoxon rank-sum test

    These methods make (almost) no assumptions regarding the distribution

  • Example: Confidence interval for the median

    Beta-endorphin concentrations (pmol/l) in 11 individuals who collapsed during a half-marathon (in increasing order): 66.0, 71.2, 83.0, 83.6, 101.0, 107.6, 122.0, 143.0, 160.0, 177.0, 414.0

    We find that the median is 107.6. What is the 95% confidence interval?

  • Confidence interval for the median in SPSS

    Use ratio statistics, which is meant for the ratio between two variables. Make a variable that has value 1 for all data (call it unit)

    Analyze → Descriptive statistics → Ratios. Numerator: betae. Denominator: unit

    Click Statistics and, under Central tendency, check Median and Confidence Intervals

  • SPSS output

    95% confidence interval for the median is (71.2, 177.0)

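The SPSS interval can be reproduced with the standard order-statistic construction, sketched below in plain Python (a minimal illustration based on the usual binomial argument; this is not the slides' own code):

```python
from math import comb

# Beta-endorphin concentrations (pmol/l) from the slide, in increasing order
x = [66.0, 71.2, 83.0, 83.6, 101.0, 107.6, 122.0,
     143.0, 160.0, 177.0, 414.0]
n = len(x)

def cum_binom(k, n, p=0.5):
    # P(X <= k) for X ~ Bin(n, p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Under H0 the number of observations below the true median is Bin(n, 0.5).
# Take the largest rank r with P(X <= r - 1) <= 0.025; the ~95% CI for the
# median then runs from the r-th to the (n + 1 - r)-th order statistic.
r = 1
while cum_binom(r, n) <= 0.025:
    r += 1
lower, upper = x[r - 1], x[n - r]
print(f"median = {x[n // 2]}, ~95% CI = ({lower}, {upper})")  # (71.2, 177.0)
```

For n = 11 this gives r = 2, i.e. the 2nd and 10th ordered values, matching the SPSS interval (71.2, 177.0).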

  • Non-parametric tests

    Most tests are based on rank sums, not the observed values

    The rank sums are assumed approximately normally distributed, so we can use a normal approximation for the tests

    If two or more values are equal, the tests use the mean rank
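The mean-rank convention for ties can be seen with `scipy.stats.rankdata` (an illustrative sketch; the values are made up):

```python
from scipy.stats import rankdata

values = [3.1, 5.4, 5.4, 2.0, 7.7]

# Tied values share the mean of the ranks they would have occupied:
# the two 5.4s would take ranks 3 and 4, so both get 3.5
ranks = rankdata(values)  # method="average" is the default
print(ranks.tolist())  # [2.0, 3.5, 3.5, 1.0, 5.0]
```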

  • The sign test

    Assume the null hypothesis is that the median is zero. Given a sample from the distribution, there should be roughly the same number of positive and negative values

    More precisely, the number of positive values should follow a binomial distribution with probability 0.5

    When the sample is large, the binomial distribution can be approximated with a normal distribution

    The sign test assumes independent observations, but makes no assumptions about the distribution

    SPSS: Analyze → Nonparametric tests → Two related samples. Choose Sign under Test type

  • Example: Energy intake (kJ)

    We want to test if energy intake is different before and after menstruation.

    H0: Median difference = 0
    H1: Median difference ≠ 0

    All differences are positive

  • Using the sign test

    The number of positive differences should follow a binomial distribution with P = 0.5 under H0: the median difference between premenst and postmenst is 0

    What is the probability of observing 11 positive differences? Let X count the number of positive signs; X is Bin(n = 11, P = 0.5). The p-value of the test is the probability of observing 11 positives or something more extreme if H0 is true, hence P(X ≥ 11) = P(X = 11) = [11!/(11!·0!)] · 0.5^11 · (1 - 0.5)^0 ≈ 0.0005

    However, because the test is two-sided, the p-value is 0.0005 · 2 = 0.001

    Clear rejection of H0 at significance level 0.05
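The computation on this slide can be checked directly with the binomial formula (a minimal sketch in plain Python; the variable names are mine):

```python
from math import comb

n = 11  # paired differences on the slide, all positive
k = 11  # observed number of positive signs

# Under H0 the sign count is Bin(11, 0.5); P(X >= 11) = P(X = 11)
p_one_sided = comb(n, k) * 0.5**k * (1 - 0.5) ** (n - k)
p_two_sided = 2 * p_one_sided
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")  # 0.0005, 0.0010
```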
