nonparametric lecture.ppt
Uploaded by jsembiring
![Page 2: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/2.jpg)
Parametric Statistics 1
Assume data are drawn from populations with a known distribution (usually normal)
Compute the likelihood that groups are related/unrelated or same/different under that model
t-test, Pearson’s correlation, ANOVA…
![Page 3: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/3.jpg)
Parametric Statistics 2
Assumptions of parametric statistics
1. Observations are independent
2. Your data are normally distributed
3. Variances are equal across groups
• Can be modified to cope with unequal σ²
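One way the equal-variance assumption is relaxed in practice is Welch's version of the t-test. A minimal sketch in R, using made-up data (the vectors `a` and `b` are hypothetical and not from the lecture):

```r
# Hypothetical data (not from the lecture): group b is far more variable than group a
a <- c(4.1, 5.0, 5.2, 4.8, 5.1, 4.9)
b <- c(2.0, 9.5, 6.1, 0.4, 8.8, 7.3)

# Student's t-test: pools the two variances, so it assumes they are equal
student <- t.test(a, b, var.equal = TRUE)

# Welch's t-test (R's default, var.equal = FALSE): no equal-variance assumption
welch <- t.test(a, b)

student$parameter  # df = n1 + n2 - 2 = 10
welch$parameter    # Welch-Satterthwaite df, below 10 because the variances differ
```

The independence and normality assumptions still apply to both versions.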
![Page 4: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/4.jpg)
Non-parametric Statistics?
Non-parametric statistics do not assume any underlying distribution
They estimate the distribution AND compute the probability that your groups are related/the same or unrelated/different
![Page 5: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/5.jpg)
Nonparametric ≠ No parameters
Model structure is not specified a priori but is instead determined from data.
The data are parameterised by the analysis
AKA: “distribution free”
![Page 6: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/6.jpg)
Non-parametric Statistics
Assumptions of non-parametric statistics
1. Observations are independent
![Page 7: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/7.jpg)
Non-parametric Statistics?
Non-parametric statistics do not assume any underlying distribution
Estimating or modeling this distribution reduces their power to detect effects…
So never use them unless you have to
![Page 8: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/8.jpg)
Why use a Non-parametric Statistic?
Very small samples (<20 replicates)
• High probability of violating the assumption of normality
• Leads to spurious Type-I (false alarm) errors
![Page 9: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/9.jpg)
Why use a Non-parametric Statistic?
Outliers more often lead to spurious Type-I (false alarm) errors in parametric statistics.
Nonparametric statistics reduce data to ordinal ranks, which reduces the impact or leverage of outliers.
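A small illustration of why ranking blunts outliers, sketched in R with made-up numbers (both vectors are hypothetical): one extreme value moves the mean a long way but leaves the ranks unchanged.

```r
# Hypothetical samples: identical except for the last value
x_outlier <- c(12, 15, 11, 14, 13, 980)  # last value is an extreme outlier
x_tame    <- c(12, 15, 11, 14, 13, 16)   # same sample with a modest largest value

# The mean is dragged far away by the single outlier
mean(x_outlier)
mean(x_tame)

# The ranks are the same: the largest value is rank 6 either way
rank(x_outlier)  # 2 5 1 4 3 6
rank(x_tame)     # 2 5 1 4 3 6
identical(rank(x_outlier), rank(x_tame))  # TRUE
```

Any rank-based test sees these two samples as the same data.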
![Page 10: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/10.jpg)
Error
• Type-I error: false alarm for a bogus effect (reject the null hypothesis when it is really true)
• Type-II error: miss a real effect (fail to reject the null hypothesis when it is really false)
• Type-III error: :-) laziness, incompetence, or willful ignorance of the truth
![Page 11: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/11.jpg)
Power
1 − β (β = probability of a Type-II error)
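Power is the probability of detecting an effect that is really there, i.e. 1 minus the Type-II error rate. A rough way to see it is to simulate many experiments with a known effect and count how often each test reaches p < 0.05. The sketch below is an illustration only; the sample size, effect size, number of simulations, and the helper `one_experiment` are all assumptions chosen for the example.

```r
set.seed(42)      # make the simulation reproducible
n_sim  <- 2000    # number of simulated experiments (assumed)
n      <- 10      # observations per group (assumed)
effect <- 1       # true difference in means, in SD units (assumed)

# Hypothetical helper: simulate one experiment on normal data and
# return the p-values of both tests on the same two samples
one_experiment <- function() {
  x <- rnorm(n)
  y <- rnorm(n, mean = effect)
  c(t_test   = t.test(x, y)$p.value,
    wilcoxon = wilcox.test(x, y)$p.value)
}

p_values <- replicate(n_sim, one_experiment())  # 2 x n_sim matrix

# Estimated power = proportion of experiments that detect the real effect
power <- rowMeans(p_values < 0.05)
power
```

With normally distributed data the t-test would typically be expected to come out somewhat ahead, which is the cost of the rank-based test not assuming a distribution.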
![Page 12: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/12.jpg)
Non-parametric Choices
Data type?
• discrete: χ²
• continuous: Question?
  • association: Spearman’s Rank
  • different central value: Number of groups?
    • two groups: Mann-Whitney U / Wilcoxon’s Rank Sums
    • more than 2: Kruskal-Wallis test
  • difference in σ²: Brown-Forsythe
![Page 13: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/13.jpg)
Non-parametric Choices
Data type?
• discrete: χ² (no parametric alternative)
• continuous: Question?
  • association: Spearman’s Rank (like a Pearson’s r)
  • different central value: Number of groups?
    • two groups: Mann-Whitney U / Wilcoxon’s Rank Sums (like Student’s t)
    • more than 2: Kruskal-Wallis test (like ANOVA)
  • difference in σ²: Brown-Forsythe (like F-test)
![Page 14: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/14.jpg)
Chi-Squared (χ²)
χ² tests the null hypothesis that observed events occur with an expected frequency
• In large samples, the test statistic is distributed as χ²
• e.g. H0: “This six-sided die is fair”: expect all 6 outcomes to occur equally often
Assumptions
• Observations are independent
• Outcomes are mutually exclusive
• Sample is not small
Small samples require an exact test, e.g., the binomial test
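The fair-die hypothesis can be tested the same way in R. This is a sketch with invented counts (the `rolls` vector is hypothetical, 60 rolls in total), so only the mechanics matter here:

```r
# Hypothetical counts of faces 1..6 observed in 60 rolls
rolls <- c(8, 12, 9, 11, 10, 10)

# H0: every face has probability 1/6
fit <- chisq.test(rolls, p = rep(1/6, 6))

fit$expected   # 10 per face under H0
fit$statistic  # X-squared = (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1
fit$parameter  # df = 6 - 1 = 5
fit$p.value    # large, so no evidence against a fair die
```

Each expected count is 10, comfortably above the usual rule of thumb of 5 per cell, so the large-sample approximation is reasonable for this example.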
![Page 15: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/15.jpg)
Chi-Squared χ² formula
χ² = the sum, over all outcomes, of the squared difference between the observed and expected frequencies divided by the expected frequency
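The verbal definition above corresponds to the standard Pearson statistic. Written out, with notation that is assumed here rather than taken from the slide (O_i for the observed frequency of outcome i, E_i for its expected frequency, k for the number of outcomes):

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

For a goodness-of-fit test with fixed expected proportions, the statistic is referred to a χ² distribution with k − 1 degrees of freedom.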
![Page 16: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/16.jpg)
χ² and contingency tables
χ² essentially tests if each cell in a contingency table has its expected value
In a 2-way table, the expected value of a cell is (row total × column total) / grand total, i.e. what you would see if rows and columns were independent
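For a 2-way table the expected counts under independence are conventionally computed from the margins as (row total × column total) / grand total. A minimal sketch in R, with an invented 2 × 2 table (the counts and the labels `group` and `outcome` are hypothetical):

```r
# Hypothetical 2 x 2 table of counts (filled column by column)
tab <- matrix(c(30, 20, 10, 40), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# Expected counts under independence: row total * column total / grand total
expected_by_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected_by_hand   # A: 20 20, B: 30 30

# chisq.test computes the same expectations internally
# (correct = FALSE turns off the continuity correction for 2 x 2 tables)
fit <- chisq.test(tab, correct = FALSE)
fit$expected
fit$statistic
```

The test then asks whether the observed cells differ from these expectations by more than chance would allow.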
![Page 17: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/17.jpg)
Example: coin toss
Random sample of 100 coin tosses, of a coin believed to be fair
We observed 45 heads and 55 tails
Is the coin fair?
![Page 18: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/18.jpg)
Coin toss
If H0 is true, our test statistic is drawn from a χ² distribution with df = 1
$$\chi^2 = \frac{(45-50)^2}{50} + \frac{(55-50)^2}{50} = 0.5 + 0.5 = 1$$
χ²(1) = 1, p > 0.3
![Page 19: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/19.jpg)
Coin toss χ² in R
chisq.test(c(45,55), p=c(.5,.5))
Chi-squared test for given probabilities
X-squared = 1, df = 1, p-value = 0.3173
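The same coin-toss result can be reproduced step by step, which makes the link between the formula and the `chisq.test` output explicit. A sketch (the variable names are illustrative; the exact binomial test at the end is added only for comparison):

```r
obs   <- c(heads = 45, tails = 55)   # observed counts from the example
expct <- sum(obs) * c(0.5, 0.5)      # 50 and 50 expected under a fair coin

# Sum of squared differences divided by expected frequencies
stat <- sum((obs - expct)^2 / expct)
stat  # 1

# Upper-tail probability of a chi-squared distribution with df = 1
p_value <- pchisq(stat, df = length(obs) - 1, lower.tail = FALSE)
p_value  # about 0.3173, matching chisq.test

# Exact alternative that does not rely on the large-sample approximation
binom.test(45, 100, p = 0.5)$p.value
```

With 100 tosses the approximate and exact tests lead to the same conclusion: no evidence that the coin is unfair.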
![Page 20: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/20.jpg)
Spearman Rank test (ρ, rho)
• Named after Charles Spearman
• Non-parametric measure of correlation
• Assesses how well an arbitrary monotonic function describes the relationship between two variables
• Does not require the relationship to be linear
• Does not require interval measurement
![Page 21: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/21.jpg)
Spearman Rank test (ρ, rho)
Mathematically, it is simply a Pearson’s r computed on ranked data
• d = difference in rank of a given pair
• n = number of pairs
Alternative test = Kendall's Tau (Kendall's τ)
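The quantities d and n are the ingredients of the usual shortcut formula for Spearman's ρ, ρ = 1 − 6Σd² / (n(n² − 1)), which holds when there are no tied ranks. The sketch below, on made-up data (`x` and `y` are hypothetical), checks that the shortcut, Pearson's r on the ranks, and R's built-in Spearman option all agree:

```r
# Hypothetical paired data: monotonic but clearly not linear, no ties
x <- c(2, 5, 9, 14, 30)
y <- c(1, 4, 3, 20, 400)

# 1. Built-in Spearman correlation
rho_builtin <- cor(x, y, method = "spearman")

# 2. Pearson's r computed on the ranks
rho_ranks <- cor(rank(x), rank(y))

# 3. Shortcut formula from rank differences (valid without ties)
d <- rank(x) - rank(y)   # 0 -1 1 0 0
n <- length(x)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

c(rho_builtin, rho_ranks, rho_formula)  # all 0.9
cor(x, y, method = "kendall")           # Kendall's tau on the same data
```

When ranks are tied, the shortcut formula is no longer exact and the rank-based Pearson computation is the safer route.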
![Page 22: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/22.jpg)
Mann-Whitney U
AKA: “Wilcoxon rank-sum test” (Mann & Whitney, 1947; Wilcoxon, 1945)
Non-parametric test for difference in the medians of two independent samples
Assumptions:
• Samples are independent
• Observations can be ranked (ordinal or better)
![Page 23: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/23.jpg)
Mann-Whitney U
U tests the difference in the medians of two independent samples
• n1 = number of obs in sample 1
• n2 = number of obs in sample 2
• R = sum of ranks of the lower-ranked sample
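The formula itself did not survive in this transcript. One common textbook form that uses exactly these quantities is given below; it is an assumption that the slide showed this version, with R_1 standing for the rank sum R of sample 1:

```latex
U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1
```

As a consistency check, the Aesop example later in the lecture has n1 = n2 = 6 and R1 = 32, which gives U = 36 + 21 − 32 = 25, the same value R reports as W.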
![Page 24: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/24.jpg)
Mann-Whitney U or t-test?
Should you use it over the t-test?
• Yes if you have a very small sample (<20) (central limit assumptions not met)
• Possibly if your data are inherently ordinal
• Otherwise, probably not
It is less prone to Type-I error (spurious significance) due to outliers.
But it does not in fact handle comparisons of samples whose variances differ very well (use an unequal-variance t-test on rank data).
![Page 25: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/25.jpg)
Aesop: Mann-Whitney U Example
Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one hare in a race.
He decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general…
![Page 26: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/26.jpg)
Aesop 2: Mann-Whitney U
He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post (their rank order) is as follows:
tort = c(1, 7, 8, 9, 10, 11)
hare = c(2, 3, 4, 5, 6, 12)
The original tortoise still goes at warp speed and the original hare is still lazy, but the others run truer to stereotype.
![Page 27: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/27.jpg)
Aesop 3: Mann-Whitney U
wilcox.test(tort, hare)
W = 25, p-value = 0.31
Tortoises are not faster (but neither are hares)
tort = c(1, 7, 8, 9, 10, 11) (n2 = 6)
hare = c(2, 3, 4, 5, 6, 12) (n1 = 6, R1 = 32)
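The reported W can be recovered by hand from the rank sums. The sketch below assumes the common form U = n1·n2 + n1(n1 + 1)/2 − R1 with the hares as sample 1, which is consistent with the n1 and R1 annotations on this slide:

```r
tort <- c(1, 7, 8, 9, 10, 11)
hare <- c(2, 3, 4, 5, 6, 12)

# Rank all 12 animals together (the data are already finishing positions)
r <- rank(c(tort, hare))
R_tort <- sum(r[seq_along(tort)])    # 46
R_hare <- sum(r[-seq_along(tort)])   # 32, the R1 on the slide

n1 <- length(hare)
n2 <- length(tort)

# U from the rank sum of sample 1 (assumed form of the formula)
U <- n1 * n2 + n1 * (n1 + 1) / 2 - R_hare
U  # 25

# R's W for wilcox.test(tort, hare) is the same number
W <- unname(wilcox.test(tort, hare)$statistic)
W  # 25
```

Swapping the roles of the two samples gives the complementary value 36 − 25 = 11; the two-sided p-value is the same either way.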
![Page 28: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/28.jpg)
Aesop 4: Mann-Whitney U
Wilcoxon W = 25, p-value = 0.31
Tortoises are not faster (but neither are hares).
Welch Two Sample t-test
t = 1.1355, df = 10, p-value = 0.28
• Alternative hypothesis: true difference in means is not equal to 0
• 95 percent confidence interval: -2.25 to 6.91
• Sample estimates: mean of x = 7.67, mean of y = 5.33
![Page 29: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/29.jpg)
Power comparison with continuous normal data
tort = c(1, 74, 79, 81, 100, 121)
hare = c(4, 9, 16, 17, 18, 144)
Wilcoxon: W = 25, p = 0.31
t-test: t.test(tort, hare, var.equal = TRUE)
t(10) = 1.5, p = 0.16
![Page 30: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/30.jpg)
Wilcoxon signed-rank test (related samples)
Same idea as MW U, generalized to matched samples
Non-parametric counterpart of the paired (non-independent samples) t-test
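In R the signed-rank test uses the same function as the Mann-Whitney test, with `paired = TRUE`. A minimal sketch on invented before/after scores (the data are hypothetical), including the statistic computed by hand:

```r
# Hypothetical matched measurements on the same 6 subjects
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(12, 15, 8, 19, 15, 19)

# Rank the absolute differences, then sum the ranks of the positive ones
d <- before - after                    # -2 -3 1 -5 -4 -6
V_by_hand <- sum(rank(abs(d))[d > 0])  # 1

fit <- wilcox.test(before, after, paired = TRUE)
fit$statistic  # V = 1
fit$p.value

# Parametric counterpart on the same pairs
t.test(before, after, paired = TRUE)$p.value
```

Only the sign and rank of each within-pair difference enter the test, so a single very large difference carries no more weight than the largest rank.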
![Page 31: nonparametric lecture.ppt](https://reader035.fdocuments.net/reader035/viewer/2022081511/563dbb23550346aa9aaa8f6c/html5/thumbnails/31.jpg)
Kruskal-Wallis
• Non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis)
• Tests equality of medians across groups
• It is an extension of the Mann-Whitney U test to 3 or more groups
• Does not assume a normal population
• Assumes population variances among groups are equal
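A short sketch of the test in R on three invented groups (the data are hypothetical), alongside the usual hand formula H = 12 / (N(N + 1)) · Σ(R_i² / n_i) − 3(N + 1), which applies when there are no ties:

```r
# Hypothetical measurements in three independent groups (no ties)
g1 <- c(2.1, 3.4, 1.9)
g2 <- c(4.8, 5.5, 6.0)
g3 <- c(7.2, 9.1, 8.4)

fit <- kruskal.test(list(g1, g2, g3))
fit$statistic  # Kruskal-Wallis chi-squared = 7.2
fit$parameter  # df = number of groups - 1 = 2
fit$p.value    # about 0.027

# Same statistic by hand from the rank sums of each group
r     <- rank(c(g1, g2, g3))
group <- rep(1:3, each = 3)
N     <- length(r)
R_i   <- tapply(r, group, sum)      # 6, 15, 24
n_i   <- tapply(r, group, length)   # 3, 3, 3
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
H  # 7.2
```

With only two groups this reduces to a test equivalent to the Mann-Whitney U, in the same way that one-way ANOVA reduces to the t-test.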