Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations....

47
Transformations. Non-parametric tests Jon Michael Gran Department of Biostatistics, UiO MF9130 Introductory course in statistics Monday 06.06.2011 1 / 40

Transcript of Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations....

Page 1: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Transformations. Non-parametrictests

Jon Michael GranDepartment of Biostatistics, UiO

MF9130 Introductory course in statisticsMonday 06.06.2011

1 / 40

Page 2: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Overview

Transformations. Non-parametric methods(Aalen chapter 8.8, Kirkwood and Sterne chapter 13 and 30.2)

• Logaritmic transformation• Choice of transformation• Non-parametric methods based on ranks (Wilcoxon signedrank test and Wilcoxon rank sum test)

2 / 40

Page 3: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Transformations

Motivation• Non-normality means that the standard methods cannot beused

• This includes linear regression, which is often a main tool ofanalysis → major problem

• For data that are skewed to the right, one can often uselog-transformations

• This does not only give normally distributed data; it may alsogive equal variances in different groups

3 / 40

Page 4: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Logaritmic transformation

• So what do we mean by doing a log transformation?• Take the logaritm of all your data values xi for all i ’s, and doyour analysis on the new dataset of ui ’s, where ui = ln(xi)

4 / 40

Page 5: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example: Ln transformation

• Skew distribution. Example of observations: 0.40, 0.96, 11.0

• Ln transformed distribution: ln(0.4) = −0.91,ln(0.96) = −0.04, ln(11) = 2.40

• Do analysis on the ln transformed data• In SPSS: transform → compute

5 / 40

Page 6: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example: Ln transformation

• Skew distribution. Example of observations: 0.40, 0.96, 11.0• Ln transformed distribution: ln(0.4) = −0.91,ln(0.96) = −0.04, ln(11) = 2.40

• Do analysis on the ln transformed data• In SPSS: transform → compute

5 / 40

Page 7: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example: Ln transformation

• Skew distribution. Example of observations: 0.40, 0.96, 11.0• Ln transformed distribution: ln(0.4) = −0.91,ln(0.96) = −0.04, ln(11) = 2.40

• Do analysis on the ln transformed data• In SPSS: transform → compute

5 / 40

Page 8: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Choice of transformations• The logarthmic transformation is by far the most frequentlyapplied

• Appropriate for removing positive skewness (data skew to theright)

• There are other types of transformation for data that arestronger or weeker skewed, or data skew to the left

6 / 40

Page 9: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Other transformations• Skewed to the right:

I Lognormal: Logarithmic (u = ln x)I More skewed than lognormal: Reciprocal (u = 1/x)I Less skewed than lognormal: Square root (u =

√x)

• Skew to the left:I Moderatly skewed: Square (u = x2)I More skewed: Cubic (u = x3)

• Non-linear relationship: Transform only one of the twovariables

7 / 40

Page 10: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Non-parametric statistics

Motivation• So what if transforming our data doesn’t help and it is stillnot normally distributed?

• Use non-parametric test

8 / 40

Page 11: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Non-parametric tests• In tests we have done so far, the null hypothesis has alwaysbeen a stochastic model with a few parameters. Models basedon for example:

I Normal distributionI T-distributionI Binomial distribution

• In nonparametric tests, the null hypothesis is not a parametricdistribution, rather a much larger class of possible distributions

9 / 40

Page 12: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Parametric methods we have seen• Estimation

I Confidence interval for µI Confidence interval for µ1 − µ2

• TestingI One sample T-testI Two sample T-test

• The methods are based on the assumption of normallydistributed data (or normally distributed mean)

10 / 40

Page 13: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Typical assumptions for parametric methods

1 Independence: All observations are independent. Achieved bytaking random samples of individuals; for paired t-testindependence is achieved by using the difference betweenmeasurements

2 Normally distributed data (Check: histograms, tests fornormal distribution, Q-Q plots)

3 Equal variance or standard deviations in the groups

11 / 40

Page 14: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

How to test for normality

• Visual methods, like histograms and Q-Q plots, are veryuseful

• Also have several statistical tests for normality:I Kolmogorov-Smirnov testI Shapiro-Wilk

• However, with enough data, visual methods are more useful!

12 / 40

Page 15: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Tests for normality

• Example of SPSS output from a Kolmogorov-Smirnov test:

(measurements of lung function, separated by gender)• According to this test, if the p-value is less than 0.05 the datacannot be considered as Normally distributed

• Note though that these tests are a bit problematic.They’re not very effective at discovering departures fromnormality. The power is low, so even if you don’t get asignificant result it’s a bit of a leap to assume normality

13 / 40

Page 16: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Tests for normality

• Example of SPSS output from a Kolmogorov-Smirnov test:

(measurements of lung function, separated by gender)• According to this test, if the p-value is less than 0.05 the datacannot be considered as Normally distributed

• Note though that these tests are a bit problematic.They’re not very effective at discovering departures fromnormality. The power is low, so even if you don’t get asignificant result it’s a bit of a leap to assume normality

13 / 40

Page 17: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Q-Q plots

• Graphical way of comparing two distributions, plotting theirquantiles against each other

• Q for quantile• If the distributions are similar, they Q-Q plot should be closeto a straight line

Q-Q Q-Q

Heavy tailed population Skew population

14 / 40

Page 18: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Histogram and Q-Q plot for Gender = 1

15 / 40

Page 19: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Histogram and Q-Q plot for Gender = 2

16 / 40

Page 20: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Non-parametric statistics

• The null hypothesis is for example that the median of thedistribution is zero

• A test statistic can be formulated, so thatI it has a known distribution under this hypothesisI it has more extreme values under alternative hypotheses

17 / 40

Page 21: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Non-parametric methods• Estimation

I Confidence interval for the median• Testing

I Paired data: Sign test and Wilcoxon signed rank testI Two independent samples: Mann-Whitney test/Wilcoxon

rank-sum test• Make (almost) no assumptions regarding distribution

18 / 40

Page 22: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example: Confidence interval for the median

• Beta-endorphin concentrations (pmol/l) in 11 individuals whocollapsed during a half-marathon (in increasing order): 66.071.2 83.0 83.6 101.0 107.6 122.0 143.0 160.0 177.0 414.0

• We find that the median is 107.6• What is the 95% confidence interval?

19 / 40

Page 23: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Confidence interval for the median in SPSS• Use ratio statistics, which is meant for the ratio between twovariables. Make a variable that has value 1 for all data (call itunit)

• Analyze → Descriptive statistics → RatiosNumerator: betae Denominator: unit

• Click Statistics and under Central tendency check Median andConfidence Intervals

20 / 40

Page 24: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

SPSS output

• 95% confidence interval for the median is (71.2, 177.0)

21 / 40

Page 25: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Non-parametric tests

• Most tests are based on rank-sums, and not the observedvalues

• The sums of the rank are assumed approximately normallydistributed, so we can use a normal approximation for thetests

• If two or more values are equal, the tests use a mean rank

22 / 40

Page 26: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

The sign test

• Assume the null hypothesis is that the median is zero• Given a sample from the distribution, there should be roughlythe same number of positive and negative values

• More precisely, number of positive values should follow abinomial distribution with probability 0.5

• When the sample is large, the binomial distribution can beapproximated with a normal distribution

• Sign test assumes independent observations, no assumptionsabout the distribution

• SPSS: Analyze → nonparametric tests → two related samples.Choose sign under test type

23 / 40

Page 27: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example: Energy intake kJ

• Want to test if energy intake is different before and aftermenstruation.

• H0: Median difference = 0H1: Median difference 6= 0

• All differences are positive

24 / 40

Page 28: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Using the sign test

• The number of positive differences should follow a binomialdistribution with P = 0.5 if H0: median difference betweenpremenst and postmenst is 0

• What is the probability of observing 11 positive differences?• Let P(x) count the number of positive signs• P(x) is Bin(n = 11,P = 0.5)• The p-value of the test is the probability of observing 11positives or something more extreme if H0 is true, henceP(x ≥ 11) = P(x = 11) = 11!

0!(11−0)!0.50(1− 0.5)11 = 0.0005

• However, because of two-sided test, the p-value is 0.0005 · 2= 0.001

• Clear rejection of H0 on significance-level 0.05

25 / 40

Page 29: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Using the sign test

• The number of positive differences should follow a binomialdistribution with P = 0.5 if H0: median difference betweenpremenst and postmenst is 0

• What is the probability of observing 11 positive differences?

• Let P(x) count the number of positive signs• P(x) is Bin(n = 11,P = 0.5)• The p-value of the test is the probability of observing 11positives or something more extreme if H0 is true, henceP(x ≥ 11) = P(x = 11) = 11!

0!(11−0)!0.50(1− 0.5)11 = 0.0005

• However, because of two-sided test, the p-value is 0.0005 · 2= 0.001

• Clear rejection of H0 on significance-level 0.05

25 / 40

Page 30: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Using the sign test

• The number of positive differences should follow a binomialdistribution with P = 0.5 if H0: median difference betweenpremenst and postmenst is 0

• What is the probability of observing 11 positive differences?• Let P(x) count the number of positive signs• P(x) is Bin(n = 11,P = 0.5)

• The p-value of the test is the probability of observing 11positives or something more extreme if H0 is true, henceP(x ≥ 11) = P(x = 11) = 11!

0!(11−0)!0.50(1− 0.5)11 = 0.0005

• However, because of two-sided test, the p-value is 0.0005 · 2= 0.001

• Clear rejection of H0 on significance-level 0.05

25 / 40

Page 31: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Using the sign test

• The number of positive differences should follow a binomialdistribution with P = 0.5 if H0: median difference betweenpremenst and postmenst is 0

• What is the probability of observing 11 positive differences?• Let P(x) count the number of positive signs• P(x) is Bin(n = 11,P = 0.5)• The p-value of the test is the probability of observing 11positives or something more extreme if H0 is true, henceP(x ≥ 11) = P(x = 11) = 11!

0!(11−0)!0.50(1− 0.5)11 = 0.0005

• However, because of two-sided test, the p-value is 0.0005 · 2= 0.001

• Clear rejection of H0 on significance-level 0.05

25 / 40

Page 32: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Using the sign test

• The number of positive differences should follow a binomialdistribution with P = 0.5 if H0: median difference betweenpremenst and postmenst is 0

• What is the probability of observing 11 positive differences?• Let P(x) count the number of positive signs• P(x) is Bin(n = 11,P = 0.5)• The p-value of the test is the probability of observing 11positives or something more extreme if H0 is true, henceP(x ≥ 11) = P(x = 11) = 11!

0!(11−0)!0.50(1− 0.5)11 = 0.0005

• However, because of two-sided test, the p-value is 0.0005 · 2= 0.001

• Clear rejection of H0 on significance-level 0.05

25 / 40

Page 33: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

In SPSS

• Find p-value = 0.000 using t-test, so p-value from sign-test isa bit larger

• This will often be the case

26 / 40

Page 34: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Wilcoxon signed rank test

• Takes into account the strength of the differences, not just ifthey are positive or negative

• Here, the null hypothesis is a symmetric distribution withzero median. Do as follows:

I Rank all values by their absolute values, from smallest tolargest Let T+ be the sum of ranks of the positive values, andT-corresponding for negative values

I Let T be the minimum of T+ and T-I Under the null hypothesis, T has a known distribution

• For large samples, the distribution can be approximated with anormal distribution

• Assumes independent observations and symmetricaldistributions

• SPSS: Analyze → nonparametric tests → two related samples.Choose Wilcoxon under test type

27 / 40

Page 35: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Energy intake example

28 / 40

Page 36: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

In SPSS

29 / 40

Page 37: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Sign test and Wilcoxon signed rank test

• Main difference: The sign test assume nothing about thedistribution, while the Wilcoxon signed rank test assumessymmetry.

• In other words: It is a little bit harder to get significant resultsusing the Sign test, because of lesser assumptions

• Often used on paired data

30 / 40

Page 38: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Wilcoxon rank sum test (or the Mann-Whitney U test)

• Not for paired data, but rather n1 values from group 1 andn2 values from group 2

• Assumptions: Independence within and between groups, samedistributions in both groups

• We want to test whether the values in the groups are samplesfrom different distributions:

I Rank all values as if they were from the same groupI Let T be the sum of the ranks of the values from group 1I Under the assumption that the values come from the same

distribution, the distribution of T is knownI The expectation and variance under the null hypothesis are

simple functions of n1 and n2

31 / 40

Page 39: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

• For large samples, we can use a normal approximation for thedistribution of T

• The Mann-Whitney U test gives exactly the same results, butuses slightly different test statistic

• In SPSS:I Analyze → nonparametric tests → two independent samples

testsI Test type: Mann-Whitney U

32 / 40

Page 40: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example 1• We have observed values

I Group 1: 1.3, 2.1, 1.5, 4.3, 3.2I Group 2: 3.4, 4.9, 6.3, 7.1

• Are the groups different?• If we assume that the values in the groups are normallydistributed, we can solve this using the T-test

• Otherwise we can try the normal approximation to theWilcoxon rank sum test

• Test H0: Groups have equal medians vs. H1: Groups havedifferent medians on 5% significance level

33 / 40

Page 41: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example 1 (cont.)

34 / 40

Page 42: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example 1 (cont.)

35 / 40

Page 43: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example 2

36 / 40

Page 44: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Example 2 (cont.)

• Wilcoxon Rank Sum Test in SPSS:

• We would have gotten exactly the same result by using the Tdistribution!

37 / 40

Page 45: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Some useful general comments

• Always start by checking whether it is appropriate to uset-tests (normally distributed data)

• If in doubt, a useful tip is to compare p-values from t-testswith p-values from non-parametric tests

• If similar results, stay with the parametric tests

38 / 40

Page 46: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Exercises

• Data files can be found on www.med.uio.no/imb/stat/kursfiler• They are called Children, oppg-9-2 and oppg-9-4

39 / 40

Page 47: Transformations. Non-parametric testsfolk.uio.no/jonmic/Statkurs/14 - Transformations. Non-parametric... · Transformations. Non-parametric tests JonMichaelGran Department of Biostatistics,

Summary

Key words

• Skew data• Logaritmic transformations (and other alternatives)• Testing for normality• Non-parametric tests:

I Sign test and Wilcoxon (paired data)I Mann-Whitney/Wilcoxon rank-sum test (two independent

samples)

40 / 40