Chapter 5 Nonparametric Post Hoc Test -...
Chapter 5: Nonparametric Post Hoc Test
When data consist of one nominal variable and one measurement variable, a one way
ANOVA is usually used. When the measurement variable does not meet the normality
assumption of a one way ANOVA, or when the original data set actually consists of
one nominal variable and one ranked variable, the parametric method is not
applicable. The nonparametric techniques that have been developed for the k sample
problem require no assumptions beyond continuous populations and are therefore
applicable in a much wider range of circumstances.
One of the assumptions of parametric analysis is that the variability is
approximately the same across all groups. If this assumption does not hold, the
researcher should first try to transform the response variable, for example with a
log or square root transformation, in the hope of stabilizing the variance across
the groups. In certain situations, however, no transformation resolves the problem,
and the researcher should then consider using a nonparametric test (Newman, 1995).
Nonparametric tests are simple and easy to understand, and they make no
distributional assumptions about the parent population beyond continuity. If the
normality assumption is violated, or the sample sizes from each of the k populations
are too small to assess normality, the Kruskal Wallis (KW) test is used to compare
the distributions of the different populations.
The Kruskal Wallis (KW) test is the nonparametric equivalent of the omnibus F
test in a one way ANOVA (which is used with a metric dependent variable). The KW
test is used when the dependent variable consists of ranks. It tests the null
hypothesis that the location of each group is the same in the population. If the
null hypothesis is rejected, then at least one of the locations differs from the
others. When the KW test is significant, follow up pairwise tests are performed.
It is important to realize that the Kruskal-Wallis H test is an omnibus test: it
tests the general hypothesis that all population medians are equal, but it cannot
tell which specific groups of the independent variable differ significantly from
each other; it only indicates that at least two groups are different. The
researcher, however, is usually interested not just in this general hypothesis but
in comparisons among the individual groups. Since a study design may have three,
four, five or more groups, determining which of these groups differ from each other
is important, and a post hoc test is used for this purpose.
There are two ways to apply nonparametric post hoc procedures. The first is to
use Mann–Whitney tests. Running many unadjusted Mann–Whitney tests, however,
inflates the Type I error rate and is therefore not advisable. Mann–Whitney tests
can still be used to follow up a Kruskal–Wallis test if some adjustment is made to
ensure that the Type I errors do not build up to more than .05. The easiest method
is a Bonferroni correction, which in its simplest form means that instead of using
.05 as the critical value for significance for each test, one uses a critical value
of .05 divided by the number of tests conducted.
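The arithmetic behind the Bonferroni correction can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "unadjusted" familywise rate is computed under the simplifying assumption that the pairwise tests are independent, which pairwise tests sharing groups are not.

```python
def bonferroni_alpha(k, family_alpha=0.05):
    """Number of pairwise tests among k groups and the per-test critical value."""
    n_tests = k * (k - 1) // 2
    return n_tests, family_alpha / n_tests


def unadjusted_familywise_rate(n_tests, alpha=0.05):
    """Chance of at least one false rejection, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_tests, per_test = bonferroni_alpha(4)            # four groups, as an example
print(n_tests, round(per_test, 5))                  # 6 0.00833
print(round(unadjusted_familywise_rate(n_tests), 4))  # 0.2649
```

With four groups there are six pairwise tests, so each test would be judged against .05/6 rather than .05.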
For a long time investigators have tested comparisons after a KW test by
applying the Wilcoxon test or the Mann Whitney test. This procedure is equivalent
to performing simple t tests among treatment means following an ANOVA. Like
multiple t tests, multiple Wilcoxon tests probably err on the side of leniency,
because no protection is provided for the number of comparisons. This multiple
comparison situation has improved with the emergence of a number of procedures,
namely Nemenyi (1963), Dunn (1964), Dunn control (1964), Steel Dwass (1960) and
Steel control (1959). The second way is to apply nonparametric post hoc tests with
adjusted p values, such as the Bonferroni, Holm, Hochberg, Hommel, Holland and Rom
procedures.
This chapter is divided into two sections.
1. Nonparametric post hoc tests (discussed in section 5.1).
2. Nonparametric post hoc tests with adjusted p values (discussed in section 5.2).
Figure 4: Non Parametric Conservative Post Hoc Tests
SS: Bonferroni, Nemenyi, Dunn Pairwise, Dunn Control, Steel Dwass, Steel Control
SU: Hommel, Hochberg, Rom
SD: Holm, Holland & Copenhaver
5.1: Nonparametric Post Hoc Test
5.1.1 Introduction:
This chapter is devoted to multiple comparison procedures that are distribution
free (also referred to as nonparametric) in the sense that their Type I error rates
do not depend on the distributions that generate the sample data. In the context of
hypothesis testing, this means that the marginal, and also the joint, null
distributions of the test statistics do not depend on the underlying distributions
of the observations.
There are two broad groups of nonparametric UMCPs (unplanned multiple
comparison procedures) for pairwise comparisons, and they use two quite different
approaches. One group uses joint ranking, i.e. each pairwise comparison is based on
the ranks of all k treatments in the study. The result of the comparison of each
pair of treatments therefore depends on the data from the other k-2 treatments, a
situation not found in any of the commonly used parametric UMCPs.
The other group uses pairwise ranking, i.e. the data are re-ranked for each pair
of treatments being compared. The test for each pair of treatments does not depend
on the other treatments in the study, as is always the case in parametric
procedures. This group usually calculates the maximum or minimum rank sum and uses
the Wilcoxon or Mann Whitney U statistics.
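The contrast between the two ranking schemes can be illustrated with a small Python sketch. The data values and the helper names (`average_ranks`, `joint_rank_sums`, `pairwise_rank_sums`) are hypothetical and chosen only to show the effect: the joint rank sums of groups a and b change when a third group c is moved, whereas their pairwise rank sums do not.

```python
def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1  # mean of 1-based positions
        pos = end + 1
    return ranks


def joint_rank_sums(groups):
    """Rank all observations together, then sum the ranks within each group."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    sums, start = [], 0
    for g in groups:
        sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    return sums


def pairwise_rank_sums(group_i, group_j):
    """Re-rank using only the two groups being compared."""
    return joint_rank_sums([group_i, group_j])


a, b = [1, 2, 3], [4, 5, 6]
c_far, c_mid = [7, 8, 100], [2.5, 3.5, 4.5]   # hypothetical third group

print(joint_rank_sums([a, b, c_far]))   # [6.0, 15.0, 24.0]
print(joint_rank_sums([a, b, c_mid]))   # [7.0, 23.0, 15.0]
print(pairwise_rank_sums(a, b))         # [6.0, 15.0] whatever c is
```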
Difference between pair wise ranking and joint ranking
Test statistics computed from joint rankings do not yield testing families,
while those computed from separate rankings do. The lack of the testing family
property also implies that single step test procedures based on joint rankings do not
have corresponding confidence analogs. Lehmann (1975) has noted that joint rankings
may provide more information than separate rankings in location problems. He has
also noted a lack of transitivity that can arise with separate rankings in which, say,
treatment 1 can be declared better than treatment 2, and treatment 2 can be declared
better than treatment 3, but treatment 1 cannot be declared better than treatment 3.
Such intransitivity cannot arise with joint rankings. Despite these limitations separate
rankings are still generally preferred in practice (Hochberg & Tamhane, 1987).
Both methods of assigning ranks, pairwise and jointly, have well known
drawbacks (Lehmann 2006, Miller 1981). When observations are ranked in a pairwise
fashion, an inconsistency known as cycling can arise, in which treatment j is
declared superior to treatment i and treatment k superior to j, but k is not
declared superior to i. When observations are ranked jointly or within blocks, the
significance of a comparison between a pair of treatments depends upon the
observations from treatments not involved in the comparison. Thus, results may
change depending upon the number of treatments being considered. This type of
inconsistency is known as the problem of irrelevant alternatives.
This section is divided into two subsections. In the first subsection the
theoretical aspects are explained, and in the second a simulation study is carried out.
5.1.1.1 Nemenyi Test (1963):
This method is analogous to the Tukey test and is known as the Nemenyi joint
rank test. It is a nonparametric multiple comparison test of medians. Like the
Wilcoxon multiple comparison test, it is used to compare the sample groups when the
data are measured on at least an ordinal scale and when the sample size is the same
in each group. The test proposed by Nemenyi was originally based on rank sums. This
method controls inflation of the FWE (Israel, 2008).
Assumptions:
1. Measurement of variables should be at least an ordinal scale.
2. There should be an equal sample size for all groups.
Test statistics:
$$q_{cal} = \frac{|R_i - R_j|}{\sqrt{\dfrac{n(nk)(nk+1)}{12}}} \qquad (5.1.1.1.1)$$
Where,
Ri= Sum of the joint rank from the ith group.
Rj= Sum of the joint rank from the jth group.
n=number of observations in a group.
k= Total number of groups.
Critical Value:
$$q_{critical} = q_{\alpha,\infty,k} \qquad (5.1.1.1.2)$$
The critical value in this test is the studentized range, abbreviated q, and it
depends upon α (the significance level) and k (the total number of groups).
Decision procedure:
Reject the null hypothesis if qcal ≥ qcritical; do not reject H0 otherwise.
Advantages:
1. This test is protected.
2. There is no restriction on the number of groups to be compared with each other
(Israel, 2008).
Disadvantages:
1. This test becomes extremely conservative as the gap between groups increases,
because the joint ranking of a group of treatments with other very different
treatments reduces the relative differences between rank sums within a group.
2. It has very low power in sub groups.
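As a minimal sketch of equation (5.1.1.1.1), the following Python function computes the Nemenyi statistic from two joint rank sums. The rank sums used in the example are hypothetical, and the critical value 3.633 is an assumption taken from standard studentized range tables for α = 0.05, k = 4 and infinite degrees of freedom; it is not a value given in this chapter.

```python
import math


def nemenyi_q(rank_sum_i, rank_sum_j, n, k):
    """q = |R_i - R_j| / sqrt(n (nk)(nk + 1) / 12) for k groups of equal size n."""
    se = math.sqrt(n * (n * k) * (n * k + 1) / 12)
    return abs(rank_sum_i - rank_sum_j) / se


# Hypothetical joint rank sums for two of k = 4 groups with n = 10 each.
q_cal = nemenyi_q(250, 112, n=10, k=4)
q_critical = 3.633  # assumed table value of q(0.05, infinity, 4)

print(round(q_cal, 4), "reject H0" if q_cal >= q_critical else "do not reject H0")
```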
5.1.1.2. Dunn Test (1964):
Dunn (1964) proposed a single step procedure that is based on joint rankings
of the observations from all the treatments. When sample sizes are unequal, rank
sums can no longer be used; rank means must be used instead, since they are
adjusted for sample size. This procedure is called Dunn's test. It is used when the
researcher is interested in comparing the location parameters of k experimental
groups simultaneously while preserving the FWE.
It tests whether pairs of medians are equal using a rank test. The comparison
wise error rate is adjusted to give the family wise error rate, αFWE. Instead of
means, it uses average ranks, and it is used for all pairwise comparisons. In this
test we compare mean ranks, not rank sums, arranged in order of magnitude
(Dmitrienko, A. et al 2007).
The family wise error rate, which represents a conservative approach to
multiple comparisons, holds the probability of making only correct decisions at 1-α
when the null hypothesis of no difference among populations is true. This approach
protects well against error when H0 is true, but it makes it more difficult to
detect differences when the null hypothesis is false.
Assumptions:
1. It is a protected test.
2. Sample sizes of at least five (but preferably larger) for each treatment are
recommended.
Test Statistics:
$$Q_{cal} = \frac{|\bar{R}_i - \bar{R}_j|}{SE} \qquad (5.1.1.2.1)$$
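To compute it, first combine the data, rank them jointly, find the mean rank of each
group, and then take the standardized absolute differences of these mean ranks.
For equal sample sizes:
$$SE = \sqrt{\frac{k(N+1)}{6}}$$
For unequal sample sizes:
$$SE = \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
If tied ranks are present, for equal sample sizes:
$$SE = \sqrt{\frac{k\left[N(N^2-1) - \sum_{i=1}^{m}(t_i^3 - t_i)\right]}{6N(N-1)}}$$
If tied ranks are present, for unequal sample sizes:
$$SE = \sqrt{\frac{N(N^2-1) - \sum_{i=1}^{m}(t_i^3 - t_i)}{12(N-1)}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$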
Where,
$\bar{R}_i = R_i / n_i$ is the mean of the joint ranks for the ith group
(i = 1, 2, …, k; j = 1, 2, …, k; i ≠ j).
$\bar{R}_j$ is the mean of the joint ranks for the jth group.
ni = the number of observations for the ith treatment.
nj = the number of observations for the jth treatment.
N is the total number of observations in all groups, N = ∑ni.
k is the total number of groups.
ti is the number of tied observations in the ith group of ties.
m is the number of groups of tied ranks.
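Critical Value:
$$z^{*} = Z_{\alpha/(k(k-1))} \qquad (5.1.1.2.2)$$
where $Z_{\alpha/(k(k-1))}$ is the upper α/(k(k-1)) quantile of the standard normal
distribution.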
The quantity α is called the FWE or the overall significance level; it is the
probability of at least one erroneous rejection among the k(k-1)/2 pairwise
comparisons.
When making multiple comparisons with a FWE, a value of α larger than those
customarily used in single comparison inference is usually selected, for example
0.10, 0.15, 0.20 or perhaps 0.25, depending on the size of k, as recommended by
Dunn (Neave & Worthington, 1988).
Decision Procedure:
Reject the null hypothesis if Qcal ≥ z* ; do not reject H0 otherwise.
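The steps of Dunn's test described above (joint ranking, mean ranks, standard error with the tie correction, and the normal critical value) can be sketched in Python as follows. This is an illustrative sketch rather than a reference implementation: the function names and the data are hypothetical, and α = 0.20 is used only because the text recommends a high significance level.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def dunn_test(groups, alpha=0.20):
    """Return z* and, per pair (1-based), the tuple (Q_cal, reject?)."""
    k = len(groups)
    pooled = [x for g in groups for x in g]
    N = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    # Tie correction: sum of t^3 - t over groups of tied values (0 if no ties).
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    base = (N * (N * N - 1) - tie_term) / (12 * (N - 1))
    z_star = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            se = math.sqrt(base * (1 / len(groups[i]) + 1 / len(groups[j])))
            q_cal = abs(mean_ranks[i] - mean_ranks[j]) / se
            results[(i + 1, j + 1)] = (q_cal, q_cal >= z_star)
    return z_star, results


# Hypothetical data: three groups of five observations, no ties.
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
z_star, results = dunn_test(groups)
for pair, (q_cal, reject) in results.items():
    print(pair, round(q_cal, 4), "reject" if reject else "do not reject")
```

In this made-up example only the two extreme groups, (1,3), are declared different; the adjacent pairs fall just short of z*.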
Advantages:
1. This test is very flexible as it takes into account ties (Israel, 2008).
2. This test is useful for comparing groups with very small sample sizes (Israel,
2008). Relatively small total sample sizes may be analyzed with this technique,
i.e. three groups with five experimental units or more than 3 groups with 4 units
(Lehmann, 1975).
3. The symmetry assumption, which is often difficult to assess in drug discovery
settings with small sample sizes, may be relaxed or ignored (Dmitrienko, A et al
2007).
4. It is useful for evaluating a few a priori comparisons from a large set of
possible comparisons (Edward, 1971).
5. Equal sample sizes are not required.
6. This procedure is more powerful for detecting differences between extreme
treatments when there are intermediate treatments present.
Disadvantages:
1. It uses a Bonferroni like correction to the FWE, so it may be too conservative.
2. This method is a little conservative for pairwise testing (Edward, 1971). It is
overly conservative with respect to Type I error, so it has weak power.
3. This method employs joint ranking, thus the comparison of two groups is highly
influenced by the behavior of other groups in the experiment, as the data are
initially ranked over the entire experiment.
4. It makes it more difficult to detect differences that are significant when the
null hypothesis is false.
5. This test is meaningless if the main test of k Independent samples has not revealed
significant results.
6. The larger the k value, the more difficult it is to detect differences.
5.1.1.3 Dunn Control Test (1964):
This method is used when each group of data is to be tested against a control
group. Sometimes the research situation is such that one of the k treatments is a
control condition. In that case the investigator is frequently interested in
comparing each treatment with the control condition, without regard to whether the
overall test for a treatment effect is significant and irrespective of any
potentially significant differences between other pairs of treatments. When
interest focuses on comparing all treatments with a control condition, there are
k-1 comparisons to be made. A significant Kruskal Wallis test is not required for
this test.
Test statistics:
$$Q_{cal} = \frac{|\bar{R}_i - \bar{R}_c|}{SE} \qquad (5.1.1.3.1)$$
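To compute it, first combine the data, rank them jointly, find the mean rank of each
group, and then take the standardized absolute differences of these mean ranks.
For equal sample sizes:
$$SE = \sqrt{\frac{k(N+1)}{6}}$$
For unequal sample sizes:
$$SE = \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_c}\right)}$$
If tied ranks are present, for equal sample sizes:
$$SE = \sqrt{\frac{k\left[N(N^2-1) - \sum_{i=1}^{m}(t_i^3 - t_i)\right]}{6N(N-1)}}$$
If tied ranks are present, for unequal sample sizes:
$$SE = \sqrt{\frac{N(N^2-1) - \sum_{i=1}^{m}(t_i^3 - t_i)}{12(N-1)}\left(\frac{1}{n_i} + \frac{1}{n_c}\right)}$$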
Where,
$\bar{R}_i = R_i / n_i$ is the mean of the joint ranks for group i.
$\bar{R}_c$ is the mean of the joint ranks for the control group c.
ni and nc are the sample sizes for group i and the control group c respectively.
N is the total sample size, N = ∑ni.
k is the total number of groups.
ti is the number of tied observations in the ith group of ties.
m is the number of groups of tied ranks.
Critical Value:
$$z^{*} = Z_{\alpha/(2(k-1))} \qquad (5.1.1.3.2)$$
where $Z_{\alpha/(2(k-1))}$ is the upper α/(2(k-1)) quantile of the standard Gaussian
(normal) distribution.
Decision Procedure:
Reject the null hypothesis if Qcal ≥ z* ; do not reject H0 otherwise.
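The difference between the two Dunn critical values, equations (5.1.1.2.2) and (5.1.1.3.2), comes only from the number of comparisons: k(k-1)/2 for all pairs versus k-1 against a control. A small Python sketch, with a hypothetical function name and an illustrative choice of k = 4 and α = 0.20, makes this concrete.

```python
from statistics import NormalDist


def dunn_critical(k, alpha, control=False):
    """Two sided Bonferroni type critical value z* for Dunn's procedures."""
    n_comparisons = (k - 1) if control else k * (k - 1) // 2
    # alpha is split over the comparisons and over the two tails.
    return NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))


z_all_pairs = dunn_critical(4, 0.20)                # uses alpha / (k(k-1))
z_control = dunn_critical(4, 0.20, control=True)    # uses alpha / (2(k-1))
print(round(z_all_pairs, 3), round(z_control, 3))
```

Because fewer comparisons are made, the control version needs a smaller critical value for the same α.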
Advantages:
1. This test is very flexible as it takes into account ties (Israel, 2008).
2. This test is useful for comparing groups with very small sample sizes (Israel,
2008). Relatively small total sample sizes may be analyzed with this technique,
i.e. three groups with five experimental units or more than 3 groups with 4 units
(Lehmann, 1975).
3. The symmetry assumption, which is often difficult to assess in drug discovery
settings with small sample sizes, may be relaxed or ignored (Dmitrienko, A et al
2007).
4. This procedure can also be used for unequal sample sizes.
5. This procedure is more powerful for detecting differences between extreme
treatments when there are intermediate treatments present.
Disadvantages:
1. It uses a Bonferroni like correction to the FWE, so it may be too conservative.
2. This method is a little conservative for pairwise testing (Edward, 1971). It is
overly conservative with respect to Type I error, so it has weak power.
3. This method employs joint ranking, thus the comparison of two groups is highly
influenced by the behavior of other groups in the experiment as the data are
initially ranked over the entire experiment.
4. It makes it more difficult to detect differences that are significant when the
null hypothesis is false.
5. This test is meaningless if the main test of k independent samples has not revealed
significant results.
6. The larger the k value, the more difficult it is to detect differences.
5.1.1.4. Steel Dwass Test (1960):
This is a multiple comparison test that follows the KW test in a manner
analogous to the Tukey test. It is a nonparametric procedure for all pairwise
comparisons and uses rank sums rather than sample means. It is used as a planned
multiple comparison procedure. This method is used only for the balanced case
n1=n2=…=nk = n. It provides simultaneous nonparametric inference for all pairwise
comparisons. This method is recommended for making pairwise comparisons after a
significant overall H has been obtained.
If one is interested in comparing the location parameters of the k experimental
groups simultaneously while preserving the FWE, the approach suggested by Dwass
(Dwass, 1960) and Steel (Steel, 1960) is used. Steel (1960) and Dwass (1960)
independently proposed a single step procedure for this family that is based on
separate pairwise rankings of the observations.
Assumptions:
1. All the random samples are independent.
2. Equal sample size required.
Procedure:
The hypotheses are $H_{ii'}: F_i = F_{i'}$ (1 ≤ i < i' ≤ k).
Let $RS^{+}_{ii'}$ be the rank sum of the n observations from treatment i when the 2n
observations from treatments i and i' are ranked together, and let
$$RS^{-}_{ii'} = n(2n+1) - RS^{+}_{ii'} = RS^{+}_{i'i} \quad (1 \le i < i' \le k) \qquad (5.1.1.4.1)$$
Let
$$RS^{*}_{k} = \max_{1 \le i < i' \le k} \{\max(RS^{+}_{ii'}, RS^{-}_{ii'})\} \qquad (5.1.1.4.2)$$
and let $RS^{*(\alpha)}_{k}$ be the upper α point of the distribution of $RS^{*}_{k}$ under
the overall null hypothesis H0: F1=F2=…=Fk. (Note that the notation for pairwise
comparisons is different from that for comparisons with a control. Here the
subscript on RS* denotes the number of treatments and not the number of pairwise
comparisons among them.)
The Steel Dwass procedure rejects $H_{ii'}: F_i = F_{i'}$ in favor of the two sided
alternative if
$$\max(RS^{+}_{ii'}, RS^{-}_{ii'}) > RS^{*(\alpha)}_{k} \quad (1 \le i < i' \le k)$$
Note that
$$\max(RS^{+}_{ii'}, RS^{-}_{ii'}) = \frac{n(2n+1)}{2} + \left| RS^{+}_{ii'} - \frac{n(2n+1)}{2} \right|$$
which shows that $RS^{*(\alpha)}_{k}$ can be determined from the $\binom{k}{2}$ variate joint
distribution of the statistics $RS^{+}_{ii'}$. For large samples it is approximated by
$$RS^{*(\alpha)}_{k} \approx \frac{n(2n+1)}{2} + \frac{1}{2} + Q^{(\alpha)}_{k,\infty}\sqrt{\frac{n^{2}(2n+1)}{24}} \qquad (5.1.1.4.3)$$
Where,
$Q^{(\alpha)}_{k,\infty}$ is the upper α point of the $Q_{k,\infty}$ random variable.
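The separate pairwise ranking and the large sample critical value of (5.1.1.4.3) can be sketched in Python as below. The function names and data are hypothetical. The upper α point of the range distribution is passed in by the caller; the value 3.314 used in the example is an assumption taken from standard studentized range tables for α = 0.05, k = 3 and infinite degrees of freedom.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def steel_dwass_statistics(groups):
    """max(RS+, RS-) for every pair, ranking only the 2n observations of that pair."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("the Steel Dwass procedure assumes equal sample sizes")
    stats = {}
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            ranks = average_ranks(list(groups[i]) + list(groups[j]))
            rs_plus = sum(ranks[:n])              # rank sum of treatment i
            rs_minus = n * (2 * n + 1) - rs_plus  # rank sum of treatment i'
            stats[(i + 1, j + 1)] = max(rs_plus, rs_minus)
    return stats


def steel_dwass_critical(n, q_alpha):
    """Large sample approximation (5.1.1.4.3); q_alpha comes from a table."""
    return n * (2 * n + 1) / 2 + 0.5 + q_alpha * math.sqrt(n * n * (2 * n + 1) / 24)


# Hypothetical data: k = 3 groups of n = 5 observations each.
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [4.5, 5.5, 6.5, 7.5, 8.5]]
stats = steel_dwass_statistics(groups)
critical = steel_dwass_critical(5, q_alpha=3.314)  # assumed table value
for pair, value in stats.items():
    print(pair, value, "reject" if value > critical else "do not reject")
```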
Advantages:
1. It is a simultaneous test procedure so confidence interval can be obtained by this
method.
2. The Steel-Dwass procedure is not affected by the presence of other treatments
and hence has higher power for detecting differences between adjacent treatments.
3. As this test uses the data in each pair of treatments separately, it should perform
best when the sample size is large compared to the number of treatments.
Disadvantages:
1. It cannot be used for unequal sample size.
2. It tends to be very conservative i.e. having a type I error much less than the stated
α (Zar, 1999).
3. It has very limited exact tables and the large sample approximation can be very
conservative when there are many treatments.
5.1.1.5. Steel control Test (1959):
It is the nonparametric analogue of the Dunnett procedure: a nonparametric test
that compares treatments with a control. It compares the medians of all groups
against a control using the Steel pairwise ranking nonparametric method.
It controls the error rate simultaneously for all the k-1 comparisons. It is a
generalization of the Wilcoxon test. Steel presented a rank sum multiple comparison
test for comparing treatments with a control.
This procedure was developed to meet the needs of researchers whose
experiments generally include a recognized standard treatment for comparison with
each of k treatments; such inclusion is required where environmental conditions
may change from experiment to experiment.
Assumption:
1. Although equal variance is a formal requirement for this test, it is believed to
be relatively robust to variance heterogeneity (Newman, 1965).
2. It assumes a continuous distribution for the measured variable.
3. This method is applicable when there are equal numbers of observations for all
treatments.
4. This method is recommended for making pairwise comparisons after a significant
overall H has been obtained, i.e. it is a protected test.
Procedure:
The formal null hypothesis is that all observations come from the same
population regardless of treatment.
Consider a control treatment labeled k and test treatments labeled 1,2,…,k-1,
where k≥3. It is also assumed that n1=n2=…=nk-1=n (say), which may be different
from nk. Steel (1959) proposed a single step test procedure for the family of
hypotheses $H_{0i}: F_i = F_k$ (1 ≤ i ≤ k-1). In this procedure the n observations
from Fi and the nk observations from Fk are pooled and rank ordered from the
smallest to the largest. Because only the observations from the treatments being
compared are ranked, this is referred to as the method of separate rankings. Let
Rij be the rank of Yij in this ranking (1 ≤ j ≤ n) and let $RS^{+}_{ik}$ be the
Wilcoxon rank sum statistic:
$$RS^{+}_{ik} = \sum_{j=1}^{n} R_{ij} \quad (1 \le i \le k-1) \qquad (5.1.1.5.1)$$
Suppose that the alternatives to the H0i are the one sided hypotheses
$H_{1i}: F_i < F_k$ (1 ≤ i ≤ k-1). In this case Steel's procedure rejects H0i if
$$RS^{+}_{ik} > RS^{(\alpha)}_{k-1} \quad (1 \le i \le k-1) \qquad (5.1.1.5.2)$$
where $RS^{(\alpha)}_{k-1}$ is the upper α point of the distribution of
$$RS_{k-1} = \max_{1 \le i \le k-1} RS^{+}_{ik}$$
under the overall null hypothesis H0: F1=F2=…=Fk.
If the alternatives to the H0i are the one sided hypotheses
$H^{-}_{1i}: F_i > F_k$ (1 ≤ i ≤ k-1), Steel's procedure rejects H0i if
$$RS^{-}_{ik} = n(n + n_k + 1) - RS^{+}_{ik} > RS^{(\alpha)}_{k-1} \quad (1 \le i \le k-1) \qquad (5.1.1.5.3)$$
(Note that $RS^{-}_{ik}$ is the rank sum for sample i if all n+nk observations from
treatments i and k are assigned ranks in the reverse order. The same critical point
$RS^{(\alpha)}_{k-1}$ is used because the joint distribution of the $RS^{-}_{ik}$ is the
same as that of the $RS^{+}_{ik}$ under H0.)
For the two sided alternative, Steel's procedure rejects H0i if
$$RS_{ik} = \max(RS^{+}_{ik}, RS^{-}_{ik}) > RS^{*(\alpha)}_{k-1} \quad (1 \le i \le k-1) \qquad (5.1.1.5.4)$$
where $RS^{*(\alpha)}_{k-1}$ is the upper α point of the distribution of
$$RS^{*}_{k-1} = \max_{1 \le i \le k-1} RS_{ik} \qquad (5.1.1.5.5)$$
Steel (1959) computed exact upper tail probabilities of the null distribution of
RSk-1 for k=3,4; n=3,4,5, where n=ni (1≤i≤k). For n1=n2=…=nk-1=n (say), a large
sample approximation to $RS^{*(\alpha)}_{k-1}$ is given by
$$RS^{*(\alpha)}_{k-1} \approx \frac{n(n+n_k+1)}{2} + \frac{1}{2} + |Z|^{(\alpha)}_{k-1,\rho}\sqrt{\frac{n\,n_k\,(n+n_k+1)}{12}} \qquad (5.1.1.5.6)$$
Where,
$|Z|^{(\alpha)}_{k-1,\rho}$ is the corresponding two sided upper α equicoordinate
point, with correlations
$$\rho_{ij} = corr(RS^{+}_{ik}, RS^{+}_{jk}) = \sqrt{\frac{n_i n_j}{(n_i+n_k+1)(n_j+n_k+1)}} \quad (1 \le i \ne j \le k-1) \qquad (5.1.1.5.7)$$
Steel's procedure rejects H0i if
$$RS_{ik} = \max(RS^{+}_{ik}, RS^{-}_{ik}) > RS^{*(\alpha)}_{k-1}$$
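A minimal Python sketch of Steel's comparison with a control follows. It computes the separate ranking statistics of (5.1.1.5.1) and (5.1.1.5.3), the correlation of (5.1.1.5.7), and the large sample critical value of (5.1.1.5.6). All names and data are hypothetical, and the equicoordinate point must come from tables: the value passed in the example is a placeholder for illustration only, not a tabulated constant.

```python
import math


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def steel_control_statistics(treatments, control):
    """(RS+, RS-, max) for each treatment, ranked together with the control only."""
    n_k = len(control)
    out = []
    for obs in treatments:
        n = len(obs)
        ranks = average_ranks(list(obs) + list(control))
        rs_plus = sum(ranks[:n])                   # rank sum of the treatment
        rs_minus = n * (n + n_k + 1) - rs_plus     # rank sum under reversed ranks
        out.append((rs_plus, rs_minus, max(rs_plus, rs_minus)))
    return out


def steel_correlation(n_i, n_j, n_k):
    """Correlation between RS+_ik and RS+_jk under H0, as in (5.1.1.5.7)."""
    return math.sqrt(n_i * n_j / ((n_i + n_k + 1) * (n_j + n_k + 1)))


def steel_critical_large_sample(n, n_k, z_alpha):
    """Large sample approximation (5.1.1.5.6); z_alpha is supplied by the caller."""
    return n * (n + n_k + 1) / 2 + 0.5 + z_alpha * math.sqrt(n * n_k * (n + n_k + 1) / 12)


# Hypothetical data: two treatments and one control, five observations each.
control = [1, 2, 3, 4, 5]
treatments = [[6, 7, 8, 9, 10], [2.5, 3.5, 4.5, 5.5, 6.5]]
stats = steel_control_statistics(treatments, control)
rho = steel_correlation(5, 5, 5)
critical = steel_critical_large_sample(5, 5, z_alpha=2.2)  # placeholder, not a table value
print(stats, round(rho, 4), round(critical, 3))
```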
Advantage:
This method is relatively robust to variance heterogeneity.
Disadvantages:
1. These tests can only be used for one way designs, in contrast to the joint rank
tests.
2. Equal sample size required.
5.1.2 Comparisons of Tests:
The methods discussed above are compared with respect to conservatism and power,
followed by a simulation study.
Conservative:
The procedure developed by Steel and Dwass is somewhat more advantageous
than the tests of Nemenyi and Dunn, but it is less convenient to use and it tends
to be very conservative (Miller, 1981). The Dunn method appears to be the most
conservative, in that it requires a larger critical value at every value of k than
the Steel technique (Edward, 1971).
Power:
The Dunn method is more powerful than the Steel method. The Steel Dwass test is
slightly more robust than the Nemenyi joint rank test. However, both tests are less
robust when two or more variances are large, and unequal variances are expected to
have more effect when sample sizes differ.
Skillings (1983) provided some useful guidelines based on a simulation study.
He found that neither procedure is uniformly superior in terms of power for all non
null configurations. The Dunn procedure is more powerful for detecting differences
between extreme treatments when there are intermediate treatments present. On the
other hand, the Steel-Dwass procedure is not affected by the presence of other
treatments and hence has higher power for detecting differences between adjacent
treatments.
Other:
The Dunn method differs in that it computes ranks on all the data, not just
the pair being compared. The Dunn control test is similar to the Steel test with a
control. For both methods (i.e. the Dunn and Steel Dwass methods), the reported
p-value reflects a Bonferroni adjustment: it is the unadjusted p-value multiplied
by the number of comparisons. If the adjusted p-value exceeds 1, it is reported as 1.
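The Bonferroni adjustment of the reported p-values described above amounts to one line of arithmetic. A minimal Python sketch, with a hypothetical function name and made-up unadjusted p-values:

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


unadjusted = [0.004, 0.03, 0.5]   # hypothetical p-values from three comparisons
adjusted = bonferroni_adjust(unadjusted)
print([round(p, 3) for p in adjusted])
```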
Simulation Study 1:
The data was regarding the staff of a mental hospital is concerned with which
kind of treatment is most effective for a particular type of mental disorder (Table no.
41). A battery of tests administered to all patients delineated a group of 40 patients
who were similar as regards diagnosis and also personality, intelligence, projective
and physiological factors. These people were randomly divided into four different
groups of 10 each for treatments. For 6 months the respective groups received (1)
electroshock, (2) Psychotherapy, (3) electroshock plus Psychotherapy, and (4) no type
of treatment. At the end of this period the battery of tests were repeated on each
patient. The only type of measurement possible for these tests is a ranking of all 40
patients on the basis of their relative degree of improvement at the end of the
treatment period; rank 1 indicates the highest level of improvement, rank 2 the second
highest, and so forth. On the basis of these data, does there seem to be any difference
in effectiveness of the types of treatment?
H0: θ1 = θ2 = θ3 = θ4, i.e. the four groups have the same location parameter.
H1: θi ≠ θj for at least one pair (i, j), i.e. at least one group has a different location parameter.
Here we used the Kruskal-Wallis test (Table no. 43) to see whether the four groups have
the same location parameter. The p-value is reported as 0.000 (i.e. below 0.001), so we reject the null hypothesis
of equal medians for the four groups.
When the null hypothesis is rejected, as in the normal theory case, one can compare
any two groups, say i and j (with 1≤i<j≤k), by a multiple comparison procedure. This
can be done by the Nemenyi, Dunn and Steel-Dwass methods.
Dunn method:
Table 18: Multiple comparisons through Dunn Test
| Pair | $\lvert \bar{R}_i - \bar{R}_j \rvert$ | Standard Error (SE) | Test statistic Qcal = Difference/SE | Critical value $z^{*} = Z_{\alpha/[k(k-1)]}$ | Null hypothesis | Result |
|---|---|---|---|---|---|---|
| (1,2) | 13.8 | 5.228129 | 2.6395676 | 2.13 | Reject | Significant |
| (1,3) | 17 | 5.228129 | 3.2516412 | 2.13 | Reject | Significant |
| (1,4) | 8.8 | 5.228129 | 1.6832025 | 2.13 | Do not reject | Not significant |
| (2,3) | 3.2 | 5.228129 | 0.6120736 | 2.13 | Do not reject | Not significant |
| (2,4) | 22.6 | 5.228129 | 4.3227701 | 2.13 | Reject | Significant |
| (3,4) | 25.8 | 5.228129 | 4.9348438 | 2.13 | Reject | Significant |
The quantity α is called the family-wise error rate or the overall significance level,
which is the probability of at least one erroneous rejection among the k(k-1)/2 pairwise
comparisons. A higher significance level than the usual 5 per cent is therefore chosen,
and the larger the number of groups k, the higher the value of α. In this case, since
k = 4, we use α = 20 per cent (i.e. 4 × 5 per cent level of significance) and find the
value of Z corresponding to an upper-tail probability of α/[k(k – 1)]. This gives
0.2/[4(4 – 1)] = 0.2/12 = 0.01667.
Given this upper-tail probability, the table of standard normal cumulative
probabilities is searched for the value at which 0.01667 lies; it is found at
Z = 2.13, so the critical value is 2.13. Therefore, our null
hypothesis of no difference in effectiveness of the types of treatments will be rejected
if Qcal ≥ 2.13.
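The critical value and the entries of Table 18 can be checked with a short sketch. It assumes the usual Dunn standard error for mean ranks without ties, sqrt(N(N+1)/12 · (1/n_i + 1/n_j)), which reproduces the tabulated value 5.228129; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

k, n = 4, 10          # number of groups and patients per group (from the study)
N = k * n             # 40 patients ranked jointly
alpha = 0.20          # family-wise level chosen in the text

# Upper-tail probability per comparison: alpha / (k (k - 1)).
tail = alpha / (k * (k - 1))
z_crit = NormalDist().inv_cdf(1 - tail)

# Assumed standard error of a difference of two mean ranks (no ties).
se = sqrt(N * (N + 1) / 12 * (1 / n + 1 / n))

# Absolute differences of mean ranks, copied from Table 18.
mean_rank_diffs = {(1, 2): 13.8, (1, 3): 17.0, (1, 4): 8.8,
                   (2, 3): 3.2, (2, 4): 22.6, (3, 4): 25.8}
significant = {pair: diff / se >= z_crit for pair, diff in mean_rank_diffs.items()}

print(f"tail = {tail:.5f}, z_crit = {z_crit:.3f}, SE = {se:.6f}")
for pair, flag in significant.items():
    print(pair, "Reject" if flag else "Do not reject")
```

The unrounded critical value is slightly below 2.13; the decisions for the six pairs are unaffected by the rounding.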
Compare Qcal with the critical value of Z and make a decision. Comparing the value
of Qcal for each pair with the critical value of 2.13, we find significant differences in
the treatments between pair 1 and 2 (i.e. electroshock and psychotherapy), between
pair 1 and 3 (i.e. electroshock and electroshock plus psychotherapy), between pair 2
and 4 (i.e. psychotherapy and no type of treatment) and between pair 3 and 4 (i.e.
electroshock plus psychotherapy and no type of treatment).
Nemenyi method:
Table 19: Multiple comparisons through Nemenyi Test
| Pair | Difference in rank sums | Standard Error, $SE = \sqrt{n(nk)(nk+1)/12}$ | Q = Difference in rank sums/SE | Critical Q value at 0.05 level | Null hypothesis | Result |
|---|---|---|---|---|---|---|
| (1,2) | 138 | 23.3809 | 5.902253 | 3.633 | Reject | Significant |
| (1,3) | 170 | 23.3809 | 7.270891 | 3.633 | Reject | Significant |
| (1,4) | 88 | 23.3809 | 3.763755 | 3.633 | Reject | Significant |
| (2,3) | 32 | 23.3809 | 1.368638 | 3.633 | Do not reject | Not significant |
| (2,4) | 226 | 23.3809 | 9.666008 | 3.633 | Reject | Significant |
| (3,4) | 258 | 23.3809 | 11.03465 | 3.633 | Reject | Significant |
The result column of the table shows that, according to the Nemenyi multiple comparison,
treatments 2 and 3 do not differ significantly, whereas treatments 1 and 2, 1 and 3,
1 and 4, 2 and 4, and 3 and 4 differ. Looking at the rank sums, we find that treatment 3
is more effective than treatments 1, 2 and 4.
Steel Dwass method:
Table 20: Multiple comparisons through Steel Dwass Test
| Pair | $RS_{ij}$ | $RS_{ji}$ | $\max(RS_{ij}, RS_{ji})$ | Critical value $RS^{*}_{\alpha}(k)$ | Null hypothesis (compare column 4 with 5) | Result |
|---|---|---|---|---|---|---|
| (1,2) | 154 | 56 | 154 | 139.4555 | Reject | Significant |
| (1,3) | 154 | 56 | 154 | 139.4555 | Reject | Significant |
| (1,4) | 62 | 148 | 148 | 139.4555 | Reject | Significant |
| (2,3) | 121 | 89 | 121 | 139.4555 | Do not reject | Not significant |
| (2,4) | 55 | 155 | 155 | 139.4555 | Reject | Significant |
| (3,4) | 55 | 155 | 155 | 139.4555 | Reject | Significant |
The result column of the table shows that, according to the Steel-Dwass multiple
comparison, treatments 2 and 3 do not differ significantly, whereas treatments 1 and 2,
1 and 3, 1 and 4, 2 and 4, and 3 and 4 differ.
From the simulation study also, we can see that the Steel-Dwass and Nemenyi procedures reject more
hypotheses than the Dunn procedure. The Steel-Dwass procedure is more advantageous than
the Dunn test. We can also see that the Dunn method appears to be the most conservative, in
that it required a larger critical value.
Simulation Study 2:
Experimental group V/s Control group
A fertilizer manufacturer conducted an experiment to compare the effect of four types
of fertilizer on the yield of a certain grain (Table no. 44). Homogeneous experimental plots
of equal size were made available for the experiment. They were
randomly assigned to receive one of the four fertilizers, and plots receiving no
fertilizer served as controls. Nine plots were randomly selected from those previously
assigned to each of the fertilizers and to the control. The yields (in coded form) for
each plot are given in the appendix.
Dunn control:
Table 21: Ranks, rank totals and mean ranks
| Fertilizer | 1: None (0) | 2: A | 3: B | 4: C | 5: D |
|---|---|---|---|---|---|
| | 10.5 | 16 | 28.5 | 33 | 45 |
| | 1 | 15 | 23 | 37 | 42.5 |
| | 2.5 | 17 | 23 | 23 | 38 |
| | 5 | 10.5 | 26 | 35.5 | 40 |
| | 6 | 12.5 | 30 | 31.5 | 42.5 |
| | 2.5 | 7 | 21 | 25 | 34 |
| | 8.5 | 12.5 | 20 | 31.5 | 42.5 |
| | 8.5 | 19 | 28.5 | 42.5 | 39 |
| | 4 | 14 | 18 | 27 | 35.5 |
| Total (R) | 48.5 | 123.5 | 218 | 286 | 359 |
| Mean ($\bar{R}$) | 5.39 | 13.72 | 24.22 | 31.78 | 39.89 |
Table 22: Comparison of yields of plots receiving fertilizer to yields of plots receiving no fertilizer by Dunn Control Test
| Pair | $\lvert \bar{R}_i - \bar{R}_0 \rvert$ | SE | Test statistic Qcal = Difference/SE | Critical value $z^{*} = Z_{\alpha/[2(k-1)]}$ | Null hypothesis | Result |
|---|---|---|---|---|---|---|
| A(2) | 8.33 | 6.187108 | 1.34634796 | 1.96 | Do not reject | Not significant |
| B(3) | 18.83 | 6.187108 | 3.04342523 | 1.96 | Reject | Significant |
| C(4) | 26.39 | 6.187108 | 4.26532086 | 1.96 | Reject | Significant |
| D(5) | 34.50 | 6.187108 | 5.57611101 | 1.96 | Reject | Significant |
Since 1.34634796 is less than 1.96, we cannot conclude that fertilizer A is better than
no fertilizer. Since 3.04342523, 4.26532086 and 5.57611101 are all greater than 1.96,
we conclude that fertilizers B, C and D will all result in higher yields than if no
fertilizer at all is used.
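The standard error in Table 22 can be reproduced from the ranks in Table 21 with the sketch below. It assumes the tie-corrected Dunn standard error, sqrt([N(N+1)/12 − Σ(t³ − t)/(12(N − 1))] · (1/n_0 + 1/n_i)), where t runs over the sizes of the groups of tied ranks; this assumption is made here because it matches the tabulated 6.187108. The critical value 1.96 is taken from the table rather than recomputed.

```python
from collections import Counter
from math import sqrt

# Joint ranks from Table 21 (nine plots per treatment).
ranks = {
    "None": [10.5, 1, 2.5, 5, 6, 2.5, 8.5, 8.5, 4],
    "A": [16, 15, 17, 10.5, 12.5, 7, 12.5, 19, 14],
    "B": [28.5, 23, 23, 26, 30, 21, 20, 28.5, 18],
    "C": [33, 37, 23, 35.5, 31.5, 25, 31.5, 42.5, 27],
    "D": [45, 42.5, 38, 40, 42.5, 34, 42.5, 39, 35.5],
}

all_ranks = [r for group in ranks.values() for r in group]
N = len(all_ranks)

# Tie correction: sum of (t^3 - t) over every set of t tied observations.
tie_sum = sum(t**3 - t for t in Counter(all_ranks).values() if t > 1)
base_variance = N * (N + 1) / 12 - tie_sum / (12 * (N - 1))

mean_rank = {name: sum(values) / len(values) for name, values in ranks.items()}
n0 = len(ranks["None"])
z_crit = 1.96  # critical value as reported in Table 22

results = {}
for name in ("A", "B", "C", "D"):
    se = sqrt(base_variance * (1 / n0 + 1 / len(ranks[name])))
    q = (mean_rank[name] - mean_rank["None"]) / se
    results[name] = (se, q, q >= z_crit)
    print(f"{name}: SE = {se:.6f}, Q = {q:.4f}, reject = {q >= z_crit}")
```

The Q values differ from the table in the third or fourth decimal place because the table divides mean-rank differences that were first rounded to two decimals.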
Steel Control:
Table 23: Comparison of yields of plots receiving fertilizer to yields of plots receiving no fertilizer by Steel Control Method
| Pair | $RS_{ik}$ | $RS_{ki}$ | $\max(RS_{ik}, RS_{ki})$ | Critical value $RS^{*}_{\alpha}(k)$ | Null hypothesis (compare column 4 with 5) | Result |
|---|---|---|---|---|---|---|
| A(2) | 122.5 | 48.5 | 122.5 | 110.4615 | Reject | Significant |
| B(3) | 126 | 45 | 126 | 110.4615 | Reject | Significant |
| C(4) | 126 | 45 | 126 | 110.4615 | Reject | Significant |
| D(5) | 126 | 45 | 126 | 110.4615 | Reject | Significant |
From the table, we can see that 122.5 and 126 are both greater than 110.4615. We
conclude that fertilizers A, B, C and D will all result in higher yields than if no
fertilizer at all is used.
From the simulation study with a control group, we can see that the Steel control procedure is
more advantageous than the Dunn control test: Steel control rejects more hypotheses than
Dunn control. We can also see that the Dunn method
appears to be the most conservative, in that it required a larger critical value.
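Table 24: Comparison of Multiple Comparison Procedure for Non Parametric Test
| Test | Use | Test statistic | Critical value | Equal sample size | Joint ranking / Pairwise ranking | Rank sum / Mean rank | CI |
|---|---|---|---|---|---|---|---|
| Steel Dwass | Pairwise | $\max(RS_{ii'}, RS_{i'i})$ | $RS^{*}_{\alpha}(k)$ | Yes | Pairwise | Rank sum | Yes |
| Nemenyi joint rank | Pairwise | $\dfrac{R_i - R_j}{\sqrt{n(nk)(nk+1)/12}}$ | $q_{\alpha,\infty,k}$ | Yes | Joint | Rank sum | No |
| Dunn | Pairwise | $\dfrac{\bar{R}_i - \bar{R}_j}{SE}$ | $Z_{\alpha/[k(k-1)]}$ | No | Joint | Mean rank | No |
| Dunn control | Contrast of control group with each experimental group | $\dfrac{\bar{R}_i - \bar{R}_c}{SE}$ | $Z_{\alpha/[2(k-1)]}$ | No | Joint | Mean rank | No |
| Steel | Contrast of control group with each experimental group | $\max(RS_{ik}, RS_{ki})$ | $RS_{\alpha}(k-1)$ | Yes | Pairwise | Rank sum | Yes |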
5.2: Nonparametric Post Hoc Test with adjusted p value
An adjusted p-value is defined as the smallest significance level for which the
given hypothesis would be rejected when the entire family of tests is considered. The
decision rule is to reject the null hypothesis when the adjusted p-value is less than α;
in most cases, this procedure controls the FWE at or below the α level.
5.2.1. Introduction:
In this section we discuss tests based on adjusted p-values such that, if the
adjusted p-value for an individual hypothesis is less than the chosen significance level
α, then the hypothesis is rejected with FWE not more than α. These include the Bonferroni
procedure and modifications of that procedure by Holm, Holland & Copenhaver,
Hommel, Hochberg and Rom. Some of these methods are single-step
procedures and others are stepwise methods. Stepwise methods can in turn be
categorized in two ways, i.e. step-up methods and step-down methods.
Figure 5: Non parametric Adjusted P Value Methods
[Figure: tree diagram. Adjusted p-value methods are divided into single-step methods (Bonferroni) and stepwise methods; stepwise methods are divided into step-up (Hommel, Hochberg, Rom) and step-down (Holm, Holland).]
5.2.1.1 Bonferroni Test (1961):
Bonferroni-based procedures are recommended when the data are not continuous,
because these procedures make no distributional assumptions (Ludbrook, 1991). The
Bonferroni method applies to both continuous and discrete data. This method is
flexible because it controls the FWE for tests of joint hypotheses about any subset of
m separate hypotheses (including individual contrasts). The procedure will reject a
joint hypothesis H0 if any p-value for the individual hypotheses included in H0 is less
than α/c. The Bonferroni method, however, yields conservative bounds on Type I error
and hence has low power. This procedure controls the FWE at α without any further
assumption on the dependence structure of the p-values.
The test is discussed in section 3.1.5.
5.2.1.2 Holm Test (1979):
Holm proposed a modification of the Bonferroni procedure (discussed in 5.2.1.1)
that yields a more powerful test. The goal of the Holm method is to increase the power of
the statistical tests while keeping the FWE under control. It is a step-down procedure.
It is also called a sequential rejection method because it examines each hypothesis in
an ordered sequence, and the decision to accept or reject the null hypothesis depends
on the results of the previous hypothesis tests (Tamhane et al, 1998). Holm uniformly
improves the Bonferroni approach. He was the first to formally introduce a
sequentially rejective Bonferroni procedure. Because the Bonferroni method does not account for
the correlations between the test statistics, the Holm procedure can itself be improved.
The Holm method can be applied to almost any data because of its non-parametric
nature. This test can be applied in any pairwise comparison where the classical
Bonferroni test is usually applied. It is applicable when pairwise comparisons of
medians, or linear or non-linear combinations of medians, are used. It is
used to perform a priori comparisons. For several a priori contrasts, not necessarily
pairwise, it controls the FWE while at the same time maximizing power (Howell, 2007).
Assumptions:
There are no restrictions on the type of test; the only requirement is that it
should be possible to calculate the obtained level for each separate test. Further, there
is no problem in including only the a priori interesting hypotheses in the analysis,
whereas more specialized multiple tests usually include all hypotheses of a certain kind.
Holm's procedure may be used either as a protected test or as an unprotected
test, but the protected version is preferred due to the additional power gains. When
there exist logical implications among the hypotheses, however, problems arise which have
to be taken into consideration (Holm, 1979). So, Holm's procedure makes no
distributional assumptions and no logical assumptions about the hierarchy of the hypotheses
to be tested, and it does not assume independence of comparisons (Zweifel, 2014).
Procedure:
Order the p-values, p(1) ≤ … ≤ p(c), and denote the corresponding hypotheses H(1), …, H(c). Start with the smallest p-value, p(1). If p(1) > α/c, then stop testing and
accept all the hypotheses; otherwise reject H(1) and go to the next step. In general, if
testing has continued to the ith step (1 ≤ i ≤ c) and if p(i) > α/(c − i + 1), then
stop testing and accept all the remaining hypotheses, H(i), …, H(c); otherwise
reject H(i) and go to the next step.
In short, this procedure rejects the specific hypothesis H(i) for i = 1, 2, …, c, provided
both p(i) ≤ α/(c − i + 1) and H(1), …, H(i-1) have all been rejected.
Like the Bonferroni procedure, Holm's procedure can also adjust the p-values directly,
by multiplying the ith ordered p-value by c − i + 1, where i is the index of the step
associated with the p-value.
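The step-down rule can be sketched as follows. The function name `holm` and the example p-values are hypothetical. In addition to the multiplier c − i + 1 described in the text, the sketch caps adjusted p-values at 1 and keeps them non-decreasing along the ordered sequence; that refinement is an assumption added here so that "adjusted p ≤ α" agrees with the sequential decisions.

```python
def holm(p_values, alpha=0.05):
    """Return (reject flags, adjusted p-values), both in the original order."""
    c = len(p_values)
    order = sorted(range(c), key=lambda idx: p_values[idx])  # smallest p first
    reject = [False] * c
    adjusted = [0.0] * c
    running_max = 0.0
    stopped = False
    for step, idx in enumerate(order, start=1):
        multiplier = c - step + 1
        # Adjusted p-value: p * (c - i + 1), capped at 1 and kept monotone.
        running_max = max(running_max, min(1.0, p_values[idx] * multiplier))
        adjusted[idx] = running_max
        if not stopped and p_values[idx] <= alpha / multiplier:
            reject[idx] = True
        else:
            stopped = True  # first failure: retain this and all remaining hypotheses
    return reject, adjusted


# Hypothetical p-values for four contrasts.
p = [0.01, 0.04, 0.03, 0.005]
reject, adjusted = holm(p, alpha=0.05)
print(reject)
print([round(a, 4) for a in adjusted])
```

In this example the hypothesis with p = 0.04 is retained even though 0.04 ≤ α/1, because testing had already stopped at p = 0.03 > α/2.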
For an example, see Cohen (2007), Explaining Psychological Statistics, p. 411.
For unequal sample sizes, the test statistic is the same as for Bonferroni, given by (3.1.1.1).
For equal sample sizes, the test statistic is given as
$$t' = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\dfrac{2\,MS_{error}}{n}}} \qquad (5.2.1.2.1)$$
Calculate t' for all contrasts of interest and then arrange the t' values in
increasing order without regard to sign. This ordering can be represented as $|t'_1| \le |t'_2| \le |t'_3| \le \dots \le |t'_c|$, where c is the total number of contrasts to be tested.
The first significance test is carried out by evaluating $t'_c$ against the critical value in
Dunn's table corresponding to c contrasts. In other words, $t'_c$ is evaluated at α' = α/c.
If this largest t' is significant, then we test the next largest t' (i.e. $t'_{c-1}$) against the
critical value in Dunn's table corresponding to c-1 contrasts. Thus, $t'_{c-1}$ is evaluated
at α' = α/(c-1). The same procedure continues for $t'_{c-2}$, $t'_{c-3}$, $t'_{c-4}$, … until the test
returns a non-significant result. At that point we stop testing. Holm has shown that
such a procedure continues to keep FWE ≤ α, while offering a more powerful test.
The logic behind the test is that when we reject the null for $t'_c$, we declare that
null hypothesis to be false. If it is false, that only leaves c-1 possibly true null
hypotheses, and so we only need to protect against c-1 contrasts. A similar logic
applies as we carry out additional tests. This logic makes particular sense when even
before the experiment is conducted we know that some of the null hypotheses are
almost certain to be false. If they are false, there is no point in protecting from
erroneously rejecting them.
Critical Value:
α/(c − i + 1) (5.2.1.2.2)
Decision procedure:
Reject H(i) if H(1) to H(i-1) have all been rejected and
p(i) ≤ α/(c − i + 1) (5.2.1.2.3)
The threshold changes at every stage because of the step-down nature of the procedure.
The critical value of this method is based on the Bonferroni inequality.
Advantages:
1. This method is flexible and simple to implement.
2. It controls the FWE in the strong sense, i.e. it guarantees control of generalized
Type I error probability to be at most α (Hochberg, 1988; Schochet, 2008;
Ekenstierna, 2004; Hochberg & Benjamini, 1990; De Muth, 2006).
3. It does not require any assumptions regarding the population distribution (Holm,
1979; Qian et al, 2013). This method does not require strong assumptions such as
independence.
4. It is based on the Bonferroni inequality and valid regardless of the joint
distribution of the test statistics (Li, 2009).
5. This procedure maintains the Type I error rate below α for all combinations of
variance heterogeneity, non normality, sample size, effect size and pattern of
mean difference (Zweifel, 2014).
6. It achieves a lower Type II error rate while keeping the Type I error rate at or
below α (Hochberg & Benjamini, 1990).
7. It can be used for equal as well as for unequal sample size.
Disadvantages:
1. The gain in power of this method is small if all the hypotheses are almost true, but it may be
considerable if a number of hypotheses are completely wrong (Holm, 1979).
2. It indicates which comparisons are statistically significant but does not provide
confidence intervals.
3. It does not consider the logical interrelationships among the c hypotheses.
4. It becomes very conservative when the number of comparisons is large and
when tests are not independent (De Muth, 2006).
5. Holm’s procedure produces low power when conditions are not ideal, such as
when the sample or effects sizes are small (Zweifel, 2014).
5.2.1.3. Holland & Copenhaver Test (1987):
It uses the Sidak (1967) inequality to set the criterion for each hypothesis test. It is a
step-down procedure. This method is useful in situations where further analysis is needed
and there is no logical interrelationship among the hypotheses.
Assumptions:
Positive orthant dependence of the test statistics is required.
Procedure:
Let p(1), …, p(c) be the ordered p-values (smallest to largest) and H(1), …, H(c) be the
corresponding hypotheses. Suppose i is the smallest integer from 1 to c such that p(i)
> $1 - (1 - \alpha)^{1/(c-i+1)}$; the Holland-Copenhaver procedure rejects H(1) to H(i-1) and
retains H(i) to H(c) (Olejnik et al, 1997).
Test Statistics:
For unequal sample size, the test statistics is same as Bonferroni given by…(3.1.1.1)
For equal sample size, the test statistics is same as Holm given by…(5.2.1.2.1)
Critical Value:
$1 - (1 - \alpha)^{1/(c-i+1)}$ (5.2.1.3.1)
Decision procedure:
Reject H(1) to H(i-1), where i is the smallest index for which
p(i) > $1 - (1 - \alpha)^{1/(c-i+1)}$ (5.2.1.3.2)
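A minimal sketch of this step-down rule, using the Sidak-type criterion 1 − (1 − α)^(1/(c − i + 1)), is given below. The function names and the p-values are hypothetical; the values were chosen so that they fall between the Holm (Bonferroni-type) and Holland-Copenhaver criteria.

```python
def sidak_criterion(alpha, c, i):
    """Holland-Copenhaver criterion for the i-th smallest p-value out of c."""
    return 1 - (1 - alpha) ** (1 / (c - i + 1))


def holland_copenhaver(p_values, alpha=0.05):
    """Step-down test: reject in order of increasing p until a criterion is exceeded."""
    c = len(p_values)
    order = sorted(range(c), key=lambda idx: p_values[idx])
    reject = [False] * c
    for step, idx in enumerate(order, start=1):
        if p_values[idx] > sidak_criterion(alpha, c, step):
            break  # retain this hypothesis and all those with larger p-values
        reject[idx] = True
    return reject


# Hypothetical p-values for four contrasts.
p = [0.0127, 0.0168, 0.0252, 0.2]
hc_reject = holland_copenhaver(p, alpha=0.05)
criteria = [sidak_criterion(0.05, 4, i) for i in range(1, 5)]
print([round(x, 6) for x in criteria])
print(hc_reject)
```

With these values the Holm criterion α/(c − i + 1) would already stop at the first step (0.0127 > 0.0125), which illustrates the slightly larger criterion of the Holland-Copenhaver procedure.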
Advantages:
This method is conservative under the condition that the test statistics are positive
orthant dependent.
Disadvantages:
The applicability of this method is slightly narrower than that of the Holm procedure because of the
requirement of positive orthant dependence of the test statistics.
5.2.1.4 Hommel Test (1988):
Hommel (1988) employed the closure principle to extend the Simes test and
developed a stepwise multiple testing procedure controlling the FWE. It is based on the
Simes (1986) equality. This is a step-up method and it is a protected test. This procedure
is conservative only when the test statistics are independent, because it is based on the
Simes equality for independent p-values. It is not always necessary to test every
possible combination of hypotheses, i.e. it can also be used for a few comparisons.
Hommel generalized the Simes procedure so that it gives strong
control of the FWE whenever Simes' original procedure achieves weak control (e.g.
with independent tests).
Assumptions:
Test statistics are independent.
Procedure:
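Reject all hypotheses that have a p-value ≤ α/j', where j' is defined through the set
$$J = \left\{\, i' \in \{1,\dots,c\} : p_{(c-i'+k)} > \frac{k\alpha}{i'} \ \text{ for } k = 1,\dots,i' \,\right\}, \qquad j' = \max J,$$
with p(1) ≤ … ≤ p(c) denoting the ordered p-values.
If J is non-empty, reject Hi whenever Pi ≤ α/j' with j' = max J. If J is empty, reject all Hi
(i = 1, 2, …, c).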
This procedure includes two stages. The first stage uses the obtained p-values to
compute the number of members in J. The second stage obtains the significance level
of rejection using α'=α/j', where j’ is the largest number in J.
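The two stages can be sketched as below, following the definition of J given in the procedure: i belongs to J when p(c − i + k) > kα/i for every k = 1, …, i, with the p-values ordered from smallest to largest. The function name `hommel_reject` and the example p-values are hypothetical.

```python
def hommel_reject(p_values, alpha=0.05):
    """Return reject flags in the original order of the p-values."""
    c = len(p_values)
    p_sorted = sorted(p_values)  # p_sorted[r - 1] is the r-th smallest p-value
    # Stage 1: collect every i for which all i largest p-values exceed k*alpha/i.
    J = [
        i for i in range(1, c + 1)
        if all(p_sorted[c - i + k - 1] > k * alpha / i for k in range(1, i + 1))
    ]
    if not J:
        return [True] * c  # J empty: reject every hypothesis
    # Stage 2: reject each hypothesis whose p-value is at most alpha / j'.
    j_prime = max(J)
    return [p <= alpha / j_prime for p in p_values]


# Two hypothetical families of four p-values.
all_small = hommel_reject([0.01, 0.02, 0.03, 0.04], alpha=0.05)
mixed = hommel_reject([0.01, 0.04, 0.06, 0.3], alpha=0.05)
print(all_small)
print(mixed)
```

In the first family J is empty, so all four hypotheses are rejected; in the second, J = {1, 2, 3}, j' = 3 and only the hypothesis with p = 0.01 ≤ α/3 is rejected.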
Test Statistics:
Test statistics is same as Holm given in…(5.2.1.2.1)
Critical Value
α/j’ (5.2.1.4.1)
Advantages:
1. It protects the FWE only when test statistics are independent (Dmitrienko et al,
2009; Olejnik et al, 1997).
2. The uniqueness of this procedure is that it not only considers the order of the tests
but also takes the obtained p values into the calculation while computing the α'.
Disadvantages:
1. This method is relatively complicated.
2. When correlations between variables are negative, the test can sometimes allow
slightly more Type I errors than the stated maximum family wise error.
3. It controls overall type I error rate only when test statistics are independent
(Olejnik et al,1997).
5.2.1.5 Hochberg Test (1988):
It is a modification of Dunn procedure. This procedure uses critical values
identical to those used in the Holm procedure but provides a potential for increased power
by conducting the tests in a step-up rather than step-down sequence. It is a step-up
method and is based on the Simes (1986) equality.
Assumptions:
Tests are independent of one another.
Procedure:
Hochberg derived an even sharper procedure which uses the ordered p-values p(i), but in a
different way from Holm's procedure. This procedure starts by examining the largest
p-value p(c). If p(c) ≤ α, then H(c) and all other hypotheses are rejected. If not, H(c) is
not rejected and one proceeds to compare p(c-1) with α/2. If the former is smaller, then
H(c-1) and all hypotheses with smaller p-values are rejected. Generally, one proceeds
from the highest to lower p-values, retaining H(i) if its p-value satisfies p(i) > α/(c − i +
1). One stops the procedure at the first ordered hypothesis for which that inequality is
reversed. This hypothesis is rejected, together with all hypotheses with lower or equal p-values.
This is always a sharper procedure than Holm's.
Critical Value:
α/(c − i + 1) (5.2.1.5.1)
Decision procedure:
Reject H(1) to H(i) for any i = c, c-1, …, 1 if
p(i) ≤ α/(c − i + 1) (5.2.1.5.2)
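The step-up rule can be sketched as follows; the function name `hochberg_reject` and the p-values are hypothetical. The loop starts from the largest p-value and stops at the first i with p(i) ≤ α/(c − i + 1).

```python
def hochberg_reject(p_values, alpha=0.05):
    """Return reject flags in the original order of the p-values."""
    c = len(p_values)
    order = sorted(range(c), key=lambda idx: p_values[idx])  # ascending p
    reject = [False] * c
    # Step up: examine p(c), p(c-1), ... against alpha/1, alpha/2, ...
    for i in range(c, 0, -1):
        if p_values[order[i - 1]] <= alpha / (c - i + 1):
            # Reject H(i) and every hypothesis with a smaller p-value.
            for idx in order[:i]:
                reject[idx] = True
            break
    return reject


# Two hypothetical families of four p-values.
family_a = hochberg_reject([0.01, 0.02, 0.03, 0.04], alpha=0.05)
family_b = hochberg_reject([0.01, 0.04, 0.06, 0.3], alpha=0.05)
print(family_a)
print(family_b)
```

For the first family the largest p-value, 0.04, is already below α, so all four hypotheses are rejected, whereas the Holm step-down rule applied to the same values would stop after rejecting only the hypothesis with p = 0.01 (since 0.02 > α/3).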
Advantages:
1. This procedure has strong control over the FWE α even if the free combination
condition is not satisfied (Holm, 1979; Holland & Copenhaver, 1987; Olejnik et
al, 1997).
2. It controls the FWE under the same conditions for which the Simes global test
control the Type I error rate.
3. This method always achieves the same type I FWE control and lower type II error
rates (Hochberg & Benjamini, 1990).
4. It has nice characteristic that no adjusted p value can be larger than the largest of
the unadjusted P values (Wright, 1992).
5. This method is able to reject at least one individual hypothesis when the global
null hypothesis is rejected. This property of consonance makes Hochberg
procedure easy to interpret (Rom, 1990).
Disadvantages:
1. It lacks the stability under certain conditions, for example, when the test statistics
are dependent or correlated (Schochet, 2008).
2. It can only be applied to independent hypothesis tests (Olejnik, et al. 1997;
Schochet, 2008).
5.2.1.6 Rom Test (1990):
It is a modification of Hochberg procedure to increase the statistical power. It is a step
up procedure. Increased power is achieved by identifying the appropriate adjusted
significance levels that control the Type I error rate at exactly the nominal level when
test statistics are independent (Olejnik et al, 1997).
Assumptions:
Test statistics are independent.
Procedure:
The Rom procedure differs from the Hochberg procedure in how the adjusted
significance levels are obtained. Both procedures set α'(m) equal to α and α'(m - 1) equal to
α/2, but the remaining m - 2 adjusted significance levels differ. The adjusted
significance levels are determined recursively as
$$\alpha'_{(m-i+1)} = \frac{\displaystyle\sum_{j=1}^{i-1}\alpha^{j} - \sum_{j=1}^{i-2}\binom{i}{j}\,\alpha'^{\,i-j}_{(m-j)}}{i} \qquad (5.2.1.6.1)$$
i = 3, …, m
where α'(m) = α and α'(m - 1) = α/2.
It is a step-up procedure with critical values c1 = α, c2 = α/2, c3 = α/3 + α²/12, etc.
First, we denote H(1) as the hypothesis with the largest p-value and H(m) as the
hypothesis with the smallest p-value, so that, in this labelling, α'(1) = α and α'(2) = α/2.
The testing starts by comparing p(1) with α'(1) and stops at the first i for which p(i) < α'(i). Then H(1) to
H(i-1) are retained and H(i) to H(m) are rejected. The computing equation for solving for the α'(i) can be
divided into three parts. The first part is α + α² + … + α^(i-1) and the second part is
$$\binom{i}{1}\alpha'^{\,i-1}_{(2)} + \binom{i}{2}\alpha'^{\,i-2}_{(3)} + \dots + \binom{i}{i-2}\alpha'^{\,2}_{(i-1)}.$$
The third part is to solve for α'(i), which is done by
subtracting the second part from the first part and dividing the difference by i.
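The three-part computation can be sketched as below, using the labelling in which the first critical value is compared with the largest p-value, so that c1 = α, c2 = α/2 and c3 = α/3 + α²/12. The function name `rom_critical_values` is hypothetical, and the sketch only covers the calculation of the critical values, not the testing sequence.

```python
from math import comb


def rom_critical_values(m, alpha=0.05):
    """Return [c_1, ..., c_m]; c_1 is compared with the largest p-value."""
    c = []
    for i in range(1, m + 1):
        if i == 1:
            c.append(alpha)
            continue
        # First part: alpha + alpha^2 + ... + alpha^(i-1).
        first = sum(alpha**j for j in range(1, i))
        # Second part: sum over j = 1..i-2 of C(i, j) * c_{j+1}^(i-j);
        # c[j] holds c_{j+1} because the list is zero-based.
        second = sum(comb(i, j) * c[j] ** (i - j) for j in range(1, i - 1))
        # Third part: subtract and divide by i.
        c.append((first - second) / i)
    return c


values = rom_critical_values(4, alpha=0.05)
print([round(v, 6) for v in values])
```

For i = 2 the second part is an empty sum, which gives c2 = α/2, so the same recursion also covers the second starting value.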
Advantages:
1. It controls the FWE exactly at α for independent test statistics (Schochet, 2008).
2. It is motivated by the aim of lowering the Type II error.
3. The Rom procedure attains the desired FWE only for independent tests, including
complex comparisons.
Disadvantages:
1. The calculation of this method is complicated and iterative.
2. It provides adjusted critical values for up to 10 tests when the overall alpha equals
0.05 and 0.01. As the number of hypothesis tests increases, the calculations become
impractical even when a computer is used.
5.2.2. Comparisons of Tests:
The methods discussed above are compared with respect to different aspects like
Conservatism, Power and Confidence Interval estimation and simulation study.
Conservatism:
The Bonferroni method gives the largest adjusted p-values and is thus the most conservative
method, followed by the Holm (1979), Hochberg (1988), and Hommel (1988)
methods. The Bonferroni and Holm (1979) methods show the lowest Type I error,
whereas the Hochberg (1988) and Hommel (1988) methods allow more error but
are still conservative when ρ (correlation) exceeds 0.5.
Holm procedure is a closed testing procedure in which each intersection
hypothesis is tested using a global test based on the Bonferroni procedure. Holm
procedure rejects the global hypothesis if and only if the Bonferroni procedure does
and therefore the conclusions regarding the conservative nature of the Bonferroni
procedure also apply to the Holm procedure (Dmitrienko et al, 2009).
The Hochberg procedure uses the same criterion for each hypothesis as the
Holm procedure but tests the hypotheses with larger p values first. Consequently, this
procedure will test, and possibly reject, hypotheses not examined by the Holm
procedure while rejecting all the hypotheses that are rejected by the Holm
procedure (Dunnett & Tamhane, 1992; Hochberg, 1988; Olejnik et al., 1997). In
real-life cases, the conclusions from the two methods, i.e. Holm and Hochberg, will
rarely differ.
Power:
The Holm procedure is more powerful than the Bonferroni method because the bound for
the Holm method increases sequentially whereas the Bonferroni bound remains fixed;
statistical power is gained by sequentially increasing the criterion for statistical
significance. Because any hypothesis rejected by the original Bonferroni procedure
will also be rejected by the Holm procedure, the latter procedure cannot have lower
power for an individual hypothesis test, so the Holm procedure is at least as powerful
as Bonferroni. Holm further claims that in actual practice the gain in power with
his procedure as compared to Bonferroni is non-negligible because α/(c − i + 1) is
much larger than α/c for many values of i (Olejnik et al., 1997).
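A short worked comparison shows the size of this gain. It takes the Holm bound at step i to be α/(c − i + 1) and assumes c = 10 comparisons and α = 0.05, the setting of the ten-p-value example later in this section.

```latex
\[
\frac{\alpha/(c-i+1)}{\alpha/c} \;=\; \frac{c}{c-i+1}, \qquad i = 1, \dots, c .
\]
\[
i = 1:\ \frac{0.05}{10} = 0.005, \qquad
i = 6:\ \frac{0.05}{5} = 0.01, \qquad
i = 10:\ \frac{0.05}{1} = 0.05 ,
\]
```

whereas the Bonferroni bound stays at 0.05/10 = 0.005 for every i. The two bounds coincide at the first step, so any gain in power comes from the later steps.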
Any hypothesis rejected by Holm’s procedure will always be rejected by Hochberg’s
procedure (Dunnett & Tamhane, 1992; Hochberg, 1988). However, the power
differences tend to be negligible (Olejnik et al., 1997). The Hochberg procedure is
uniformly more powerful than the Holm procedure (Hochberg, 1988) but, on the other
hand, it is uniformly less powerful than the Hommel procedure (Hommel, 1989).
However, due to the independence assumption required by Hochberg, the Holm
procedure may be the best choice if independence of tests is not certain. The criterion
used by the Holland and Copenhaver procedure is slightly larger than that of the Holm
procedure, thus leading to slightly greater power for an individual hypothesis test
(Olejnik et al., 1997).
The Hommel method is uniformly more powerful than the Holm procedure because
the Simes test is uniformly more powerful than the global test based on the Bonferroni
procedure (Dmitrienko et al., 2009). For n > 2, there are situations where the Hommel
procedure rejects a hypothesis and the Hochberg procedure does not (Hommel, 1989).
The Hommel procedure rejects more hypotheses than either the Rom or the
Holland-Copenhaver procedure; however, the difference in the number of tests rejected
is very small. The Hochberg and Hommel procedures are more powerful, but they are
known to have the desired FWE only for independent tests (Hommel, 1989). Rom gives
slightly higher critical p values that can be used with Hochberg’s procedure, making
it somewhat more powerful.
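A situation with n = 3 in which the Hommel procedure rejects a hypothesis and the Hochberg procedure does not can be illustrated as follows. This is a sketch, assuming the usual statement of the Hommel (1988) rule: find the largest j such that p_(n−j+k) > kα/j for all k = 1, ..., j, then reject every hypothesis with p ≤ α/j (or all hypotheses if no such j exists). The three p values and the function names are hypothetical.

```python
def hochberg_rejections(p, alpha=0.05):
    """Step-up: find the largest rank i with p_(i) <= alpha / (n - i + 1)
    and reject the hypotheses with the i smallest p values."""
    n = len(p)
    order = sorted(range(n), key=lambda i: p[i])
    for i in range(n, 0, -1):                   # ranks n, n-1, ..., 1
        if p[order[i - 1]] <= alpha / (n - i + 1):
            return set(order[:i])
    return set()


def hommel_rejections(p, alpha=0.05):
    """Hommel rule as assumed in the lead-in paragraph."""
    n = len(p)
    s = sorted(p)
    candidates = [
        j for j in range(1, n + 1)
        if all(s[n - j + k - 1] > k * alpha / j for k in range(1, j + 1))
    ]
    if not candidates:                          # no such j: reject everything
        return set(range(n))
    cutoff = alpha / max(candidates)
    return {i for i in range(n) if p[i] <= cutoff}


p_values = [0.02, 0.03, 0.06]                   # hypothetical raw p values
print("Hochberg rejects:", sorted(hochberg_rejections(p_values)))
print("Hommel rejects:  ", sorted(hommel_rejections(p_values)))
```

Here the largest admissible j is 2, so the Hommel cutoff is 0.05/2 = 0.025 and the hypothesis with p = 0.02 is rejected, while none of the Hochberg comparisons (0.06 against 0.05, 0.03 against 0.025, 0.02 against 0.0167) succeeds.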
Among the modified procedures, Holm’s procedure is the least powerful method,
because it is based on the Bonferroni inequality. The Rom and Hommel procedures are
more powerful than Hochberg’s procedure because sharper inequalities (or equalities)
are used in both (i.e. Rom and Hommel) procedures; however, the power improvement is
negligible compared to their complexity.
The increase in power for individual hypothesis tests provided by the Hommel
and Rom procedures over the Hochberg approach is at best marginal, with the Rom
procedure having only a slight advantage over the Hommel procedure (Dunnett &
Tamhane, 1992; Olejnik et al., 1997).
The Holland-Copenhaver and Hochberg procedures provide power very close to
that obtained by the Hommel and Rom procedures, particularly when the total number
of hypotheses tested is not too large. If the number of false null hypotheses is large,
the Hochberg procedure might provide a better chance of detecting all of them than the
Holland-Copenhaver procedure.
As the sample size increases, statistical power increases, but when the
number of variables in a matrix increases, the probability of rejecting all of the
non-null hypotheses decreases. All five of the enhancements are more sensitive than the
original Bonferroni procedure in detecting all true nonzero relationships. The
difference between the original Bonferroni procedure and the enhancements increases
as the number of true nonzero relationships increases. Very small differences in
statistical power are found among the five enhancements to the original Bonferroni
procedure. The Holm procedure has the lowest sensitivity in detecting all true
nonzero relationships, whereas the Rom procedure has the greatest power. When all
the correlations are nonzero, the Hochberg, Hommel, and Rom procedures have the
same estimated power.
Because step-up sequential multiple comparisons are based on the Simes
equality, which assumes independence of comparisons, it is reasonable to suggest that
dependence or correlation between the means of groups should affect the Type I error
control and power (Zweifel, 2014).
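For reference, the Simes global test on which the step-up procedures rest can be sketched as follows. This is a minimal illustration, assuming the standard form of the test (reject the global null hypothesis if p_(i) ≤ iα/n for at least one i); the p values and function names are hypothetical. The sketch does not address dependence, which would need a simulation with correlated test statistics.

```python
def simes_rejects(p_values, alpha=0.05):
    """Simes global test: reject if p_(i) <= i * alpha / n for at least one i."""
    n = len(p_values)
    ordered = sorted(p_values)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))


def bonferroni_global_rejects(p_values, alpha=0.05):
    """Bonferroni global test: reject if the smallest p value is <= alpha / n."""
    return min(p_values) <= alpha / len(p_values)


p_values = [0.030, 0.040, 0.045]                # hypothetical raw p values
print("Simes rejects global null:     ", simes_rejects(p_values))
print("Bonferroni rejects global null:", bonferroni_global_rejects(p_values))
```

With these values the largest p value (0.045) falls below its Simes bound of 0.05, while the smallest (0.030) exceeds the Bonferroni bound of 0.05/3, so only the Simes test rejects.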
In summary, in the comparison of the six procedures (Bonferroni, Holm, Holland,
Hochberg, Hommel, Rom), the Bonferroni procedure has the lowest percentage of
rejections and the Hommel procedure has the highest percentage of rejections whenever
differences exist among the procedures. Overall, the step-up (SU) procedures are a
little more powerful than the step-down (SD) procedures. Within the SU procedures,
whenever differences occur, the Hommel procedure has a slightly higher percentage of
rejections than the Hochberg procedure. Within the SD procedures, whenever
differences occur, the Holland procedure
has a slightly higher percentage of rejections than the Holm procedure (Olejnik et al.,
1997).
Confidence Interval:
All the methods except Bonferroni are stepwise methods, and confidence intervals
cannot be obtained from the stepwise methods, so a comparison with respect to
confidence intervals is not possible.
Simulation Study:
This section discusses test results reported as adjusted p values such that, if the
adjusted p value for an individual hypothesis is less than the chosen significance
level α, then the hypothesis is rejected with FWE not more than α. It includes the
Bonferroni procedure and the modifications of that procedure by Holm, Holland
& Copenhaver, Hommel, Hochberg and Rom.
As a concrete example, imagine that we have ten p values, and they are (in order
from smallest to largest) as follows: 0.002, 0.0054, 0.007, 0.008, 0.009, 0.0094, 0.012,
0.015, 0.028 and 0.067.
We will compare each probability with the critical value based on the Bonferroni
method and on the modifications of that procedure by Holm, Holland & Copenhaver,
Hommel, Hochberg and Rom.
Table 25: Rejection criteria according to different available tests (for adjusted p value)

| No. | Prob. | Bonferroni | Holm | Holland & Copenhaver | Hommel | Hochberg | Rom |
|---|---|---|---|---|---|---|---|
| 1 | 0.002 | 0.005 | 0.005 | 0.005116197 | 0.025 | 0.005 | 0.005115 |
| 2 | 0.0054 | 0.005 | 0.005556 | 0.005683045 | 0.025 | 0.005556 | 0.005681 |
| 3 | 0.007 | 0.005 | 0.00625 | 0.006391151 | 0.025 | 0.00625 | 0.006388 |
| 4 | 0.008 | 0.005 | 0.007143 | 0.007300832 | 0.025 | 0.007143 | 0.0073 |
| 5 | 0.009 | 0.005 | 0.008333 | 0.008512445 | 0.025 | 0.008333 | 0.008505 |
| 6 | 0.0094 | 0.005 | 0.01 | 0.010206218 | 0.025 | 0.01 | 0.0102 |
| 7 | 0.012 | 0.005 | 0.0125 | 0.012741455 | 0.025 | 0.0125 | 0.0127 |
| 8 | 0.015 | 0.005 | 0.016667 | 0.016952428 | 0.025 | 0.016667 | 0.016875 |
| 9 | 0.028 | 0.005 | 0.025 | 0.025320566 | 0.025 | 0.025 | 0.025 |
| 10 | 0.067 | 0.005 | 0.05 | 0.05 | 0.025 | 0.05 | 0.05 |
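Three of the criteria in Table 25 follow from closed-form expressions and can be recomputed directly. The sketch below assumes α = 0.05 and c = 10 (inferred from the Bonferroni value 0.005, not stated explicitly in the text), the Holm and Hochberg criterion α/(c − i + 1), and the Holland & Copenhaver criterion 1 − (1 − α)^(1/(c − i + 1)). The function names are hypothetical.

```python
def bonferroni_criterion(i, c, alpha=0.05):
    """Same bound for every rank i."""
    return alpha / c


def holm_criterion(i, c, alpha=0.05):
    """Bound for the i-th smallest p value; Hochberg uses the same bound."""
    return alpha / (c - i + 1)


def holland_copenhaver_criterion(i, c, alpha=0.05):
    """Sidak-type bound for the i-th smallest p value."""
    return 1 - (1 - alpha) ** (1 / (c - i + 1))


c = 10                                          # assumed number of tests
print("i   Bonferroni  Holm/Hochberg  Holland & Copenhaver")
for i in range(1, c + 1):
    print(f"{i:<3} {bonferroni_criterion(i, c):<11.6f} "
          f"{holm_criterion(i, c):<14.6f} "
          f"{holland_copenhaver_criterion(i, c):.9f}")
```

The Holland & Copenhaver bound is slightly larger than the Holm bound at every rank except the last, where both equal α.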
Table 26: Hypotheses rejection by all these multiple comparison procedures with adjusted p value

| No. | Bonferroni | Holm | Holland & Copenhaver | Hommel | Hochberg | Rom |
|---|---|---|---|---|---|---|
| 1 | Reject | Reject | Reject | Reject | Reject | Reject |
| 2 | Accept | Reject | Reject | Reject | Reject | Reject |
| 3 | Accept | Accept | Accept | Reject | Reject | Reject |
| 4 | Accept | Accept | Accept | Reject | Reject | Reject |
| 5 | Accept | Accept | Accept | Reject | Reject | Reject |
| 6 | Accept | Accept | Accept | Reject | Reject | Reject |
| 7 | Accept | Accept | Accept | Reject | Reject | Reject |
| 8 | Accept | Accept | Accept | Reject | Reject | Reject |
| 9 | Accept | Accept | Accept | Accept | Accept | Accept |
| 10 | Accept | Accept | Accept | Accept | Accept | Accept |
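The decisions in Table 26 can be retraced by applying a single-step, step-down or step-up rule to the ten ordered p values. The sketch below assumes α = 0.05; the Rom criteria are copied from Table 25 rather than recomputed, and the Hommel procedure is omitted because it uses a different decision rule. The helper names `step_down` and `step_up` are hypothetical.

```python
P = [0.002, 0.0054, 0.007, 0.008, 0.009, 0.0094, 0.012, 0.015, 0.028, 0.067]
ALPHA = 0.05                                    # assumed significance level
C = len(P)


def step_down(p_sorted, criteria):
    """Start at the smallest p value and stop at the first one that exceeds
    its criterion; return how many hypotheses were rejected."""
    rejected = 0
    for p, crit in zip(p_sorted, criteria):
        if p > crit:
            break
        rejected += 1
    return rejected


def step_up(p_sorted, criteria):
    """Start at the largest p value; the first one at or below its criterion
    is rejected together with all smaller ones."""
    for i in range(len(p_sorted) - 1, -1, -1):
        if p_sorted[i] <= criteria[i]:
            return i + 1
    return 0


holm_crit = [ALPHA / (C - i + 1) for i in range(1, C + 1)]
holland_crit = [1 - (1 - ALPHA) ** (1 / (C - i + 1)) for i in range(1, C + 1)]
rom_crit = [0.005115, 0.005681, 0.006388, 0.0073, 0.008505,
            0.0102, 0.0127, 0.016875, 0.025, 0.05]   # Rom column of Table 25

results = {
    "Bonferroni": sum(p <= ALPHA / C for p in P),    # single-step rule
    "Holm": step_down(P, holm_crit),
    "Holland & Copenhaver": step_down(P, holland_crit),
    "Hochberg": step_up(P, holm_crit),               # same criteria as Holm
    "Rom": step_up(P, rom_crit),
}
for name, count in results.items():
    print(f"{name}: rejects the {count} smallest p value(s)")
```

Holm and Hochberg share the same criteria and differ only in the direction of the scan: the step-down scan stops at the third p value (0.007 > 0.00625), whereas the step-up scan reaches the eighth (0.015 ≤ 0.016667) before any smaller p value is examined.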
From the simulation study also, we can see that the Holm procedure is more powerful
than the Bonferroni method because the bound for the Holm method increases
sequentially whereas the Bonferroni bound remains fixed. In Table 26, Bonferroni
rejects one hypothesis, Holm and Holland & Copenhaver reject two, and Hommel,
Hochberg and Rom reject eight. Any hypothesis rejected by the original
Bonferroni procedure will also be rejected by the Holm procedure, so the latter
procedure cannot have lower power for an individual hypothesis test. Any hypothesis
rejected by Holm’s procedure will always be rejected by Hochberg’s procedure. The
Hochberg procedure is uniformly more powerful than the Holm procedure but, on the
other hand, it is uniformly less powerful than the Hommel procedure. The criterion
used by the Holland and Copenhaver procedure is slightly larger than that of the Holm
procedure, thus leading to slightly greater power for an individual hypothesis test.
The Hommel procedure rejects more hypotheses than either the Rom or the
Holland-Copenhaver procedure; however, the difference in the number of tests rejected
is very small. Rom gives slightly larger critical p values that can be used with
Hochberg’s procedure, making it somewhat more powerful.
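Table 27: Comparison of multiple comparison procedures with adjusted p value

| Test | SS/SW | Based on | Modification of | Critical value | Remarks |
|---|---|---|---|---|---|
| Bonferroni (1961) | SS | Bonferroni inequality | ____ | $\alpha/c$ | Planned contrasts, both simple and complex. |
| Holm (1979) | SD | Bonferroni inequality | Bonferroni | $\alpha/(c-i+1)$ | Comparisons are not independent |
| Holland (1987) | SD | Sidak inequality | Bonferroni | $1-(1-\alpha)^{1/(c-i+1)}$ | Positive orthant dependence |
| Hommel (1988) | SU | Simes inequality | Holm | $\alpha/j'$ | When comparisons are independent |
| Hochberg (1988) | SU | Simes inequality | Holm | $\alpha/(c-i+1)$ | When comparisons are independent |
| Rom (1990) | SU | Simes inequality | Hochberg | $\alpha'_{m-i+1}$ (recursion given below) | When comparisons are independent |

Critical value for the Rom procedure:

$$\alpha'_{m-i+1} = \frac{1}{i}\left[\sum_{j=1}^{i-1}\alpha^{j} - \sum_{j=1}^{i-2}\binom{i}{j}\left(\alpha'_{m-j}\right)^{i-j}\right]$$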
Figure 6: Non Parametric Post Hoc Tests
[Figure: diagram titled "NON PARAMETRIC POST HOC TEST" with three groups.
By adjusting p value (for equal and unequal sample size): Bonferroni, Holm, Holland & Copenhaver, Hommel, Hochberg, Rom.
Equal sample size: Nemenyi, Dunn Control, Steel Dwass, Steel Control.
Unequal sample size: Dunn Pairwise, Dunn Control.]