A Note on Approximating the p-values of the k-sample Modified Baumgartener Statistic
www.srl-journal.org Statistics Research Letters (SRL) Volume 2 Issue 4, November 2013
Augustine Wong
Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3
Abstract
Murakami et al. (2009) proposed a k-sample modified Baumgartner statistic ($V_k$) to test the equality of k independent distributions. In this paper, we propose using the Barndorff-Nielsen formula to approximate p-values from the limiting distribution of $V_k$. The main advantages of the proposed method are its computational efficiency and the ease with which it can be implemented in standard statistical software.
Keywords
Anderson-Darling Test; Barndorff-Nielsen Formula; Lugannani and Rice Formula; Saddlepoint Approximation; Singularity
Introduction
Let $x_{jh}$ be the $h$th observation from the $j$th population, with cumulative distribution function $F_j$, where $h = 1, \ldots, n_j$ and $j = 1, \ldots, k$. Moreover, let $R_{jh}$ denote the rank of $x_{jh}$ in the combined sample of all k random samples. Assume that the k samples are independent.
For testing $H_0: F_1 = \cdots = F_k$ versus $H_a$: not all $F_j$ are equal,
Murakami et al. (2009) proposed the k-sample modified Baumgartner statistic
where
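and . They showed that the limiting distribution of $V_k$ is a weighted chi-square distribution with $k-1$ degrees of freedom, and that the corresponding characteristic function of $V_k$ is
$$\varphi_k(t) = \{c_1(t)\}^{k-1}, \qquad c_1(t) = \prod_{j=1}^{\infty}\left(1 - \frac{2it}{j(j+1)}\right)^{-1/2}, \quad (1)$$
or, equivalently,
$$\varphi_k(t) = \{c_2(t)\}^{k-1}, \qquad c_2(t) = \left(\frac{-2\pi i t}{\cos\left(\frac{\pi}{2}\sqrt{1+8it}\right)}\right)^{1/2}. \quad (2)$$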
Note that c2(t) is the asymptotic characteristic function of the Anderson-Darling test statistic given in Anderson and Darling (1954).
Following the derivation of the limiting distribution of the Anderson-Darling test statistic in Anderson and Darling (1952), Murakami et al. (2009) derived the limiting distribution of $V_3$. Moreover, they pointed out the problems associated with this methodology and recommended the saddlepoint method used in Giles (2001), with the characteristic function (1), to approximate the limiting distribution of $V_k$ for $k \ge 4$. More specifically, the moment generating function of the limiting distribution of $V_k$ is written as
$$M(t) = \prod_{j=1}^{\infty}\left(1 - \frac{2t}{j(j+1)}\right)^{-(k-1)/2}, \qquad t < 1, \quad (3)$$
and the corresponding cumulant generating function is
$$K(t) = \log M(t) = -\frac{k-1}{2}\sum_{j=1}^{\infty}\log\left(1 - \frac{2t}{j(j+1)}\right). \quad (4)$$
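To make the cost of the infinite summation concrete, the following R sketch evaluates the cumulant generating function derived from the characteristic function (1), truncated after J terms, together with its first two derivatives, which are indeed easy to write down. The helper names (`ad_weights`, `cgf_series`, `dcgf_series`, `d2cgf_series`) and the default J are illustrative assumptions, not code from Giles (2001) or Murakami et al. (2009). Writing the weights as $\lambda_j = 1/(j(j+1))$, the truncated mean is $(k-1)J/(J+1)$ rather than $k-1$, so the truncation error in $K'$ decays only like $1/J$.

```r
# Sketch (hypothetical helper names): cgf of the limiting distribution of V_k
# from the product form (1), truncated after J terms.
# Weights of the weighted chi-square: lambda_j = 1 / (j (j + 1)).
ad_weights <- function(J) {
  j <- seq_len(J)
  1 / (j * (j + 1))
}

# K(t) = -(k - 1)/2 * sum_j log(1 - 2 t lambda_j), valid for t < 1
cgf_series <- function(t, k, J = 1e4) {
  lam <- ad_weights(J)
  -(k - 1) / 2 * sum(log(1 - 2 * t * lam))
}

# K'(t) = (k - 1) * sum_j lambda_j / (1 - 2 t lambda_j)
dcgf_series <- function(t, k, J = 1e4) {
  lam <- ad_weights(J)
  (k - 1) * sum(lam / (1 - 2 * t * lam))
}

# K''(t) = 2 (k - 1) * sum_j lambda_j^2 / (1 - 2 t lambda_j)^2
d2cgf_series <- function(t, k, J = 1e4) {
  lam <- ad_weights(J)
  2 * (k - 1) * sum(lam^2 / (1 - 2 * t * lam)^2)
}

# The exact limiting mean is K'(0) = k - 1; truncation gives (k - 1) J / (J + 1).
k <- 3
for (J in c(10, 100, 1000, 10000)) {
  cat(sprintf("J = %5d   K'(0) = %.6f   (exact: %d)\n", J, dcgf_series(0, k, J), k - 1))
}
```

Every evaluation of $K$, $K'$ or $K''$ costs $O(J)$ operations, and a saddlepoint search repeats that cost at each iteration.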
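By applying the Lugannani and Rice (1980) formula, we have
$$F(v) = P(V_k \le v) \approx \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{u}\right), \quad (5)$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and cumulative distribution functions of the standard normal distribution, respectively, $K(\cdot)$ is the cumulant generating function,
$$r = \operatorname{sgn}(\hat{t})\sqrt{2\{\hat{t}v - K(\hat{t})\}}, \quad (6)$$
$$u = \hat{t}\sqrt{K''(\hat{t})}, \quad (7)$$
and $\hat{t}$ is the saddlepoint satisfying
$$K'(\hat{t}) = v.$$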
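The Lugannani and Rice steps above can be checked on a distribution with a known answer. The R sketch below (hypothetical function name `lr_upper_tail`) solves $K'(\hat t) = v$ with `uniroot`, forms $r$ and $u$ as in (6) and (7), and returns the approximation to $P(V > v)$. Purely as a test case, it is applied to a chi-square distribution with $\nu = 4$ degrees of freedom, whose cumulant generating function is $K(t) = -(\nu/2)\log(1 - 2t)$, rather than to $V_k$; the root-finding brackets are assumptions that must straddle the saddlepoint.

```r
# Lugannani-Rice approximation to the upper tail P(V > v), given the cgf K and
# its first two derivatives K1, K2. The bracket [lower, upper] must contain the
# saddlepoint, i.e. K1(t) - v must change sign on it.
lr_upper_tail <- function(v, K, K1, K2, lower, upper) {
  # saddlepoint: solve K'(t) = v
  that <- uniroot(function(t) K1(t) - v, lower = lower, upper = upper,
                  tol = 1e-12)$root
  r <- sign(that) * sqrt(2 * (that * v - K(that)))  # signed root, as in (6)
  u <- that * sqrt(K2(that))                        # standardized saddlepoint, as in (7)
  cdf <- pnorm(r) + dnorm(r) * (1 / r - 1 / u)      # Lugannani-Rice cdf
  1 - cdf
}

# Test case with a known answer: chi-square with nu degrees of freedom.
nu <- 4
K  <- function(t) -nu / 2 * log(1 - 2 * t)
K1 <- function(t) nu / (1 - 2 * t)
K2 <- function(t) 2 * nu / (1 - 2 * t)^2

# Upper 5% point: the saddlepoint is positive, so bracket to the right of 0.
v_hi <- qchisq(0.95, df = nu)
p_hi <- lr_upper_tail(v_hi, K, K1, K2, lower = 1e-6, upper = 0.5 - 1e-9)

# Lower 20% point: the saddlepoint is negative, so bracket to the left of 0.
v_lo <- qchisq(0.20, df = nu)
p_lo <- lr_upper_tail(v_lo, K, K1, K2, lower = -1e3, upper = -1e-6)

print(c(p_hi = p_hi, exact_hi = 0.05, p_lo = p_lo, exact_lo = 0.80))
```

The sign of the saddlepoint decides the bracket: it is positive when $v$ exceeds the mean and negative otherwise.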
It is well known that the Lugannani and Rice (1980) method has third-order accuracy. It is also well known that the Lugannani and Rice formula has a singularity at $\hat{t} = 0$, where $r = u = 0$; this corresponds to $v = K'(0) = k - 1$, the mean of the limiting distribution. Daniels (1987) provided a formula for $F(v)$ at the singularity, which requires the third derivative of $K(t)$. Numerically, the calculations are unstable in the neighborhood of the singularity, which may result in an approximate $F(v)$ outside the interval [0, 1]. Fraser et al. (2003) give a bridging formula to smooth the approximate cumulative distribution function obtained by the Lugannani and Rice formula.
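For the limiting distribution of $V_k$, the quantities needed at the singularity have closed forms. Letting $\hat t \to 0$ in the Lugannani and Rice formula gives the familiar limit $F(k-1) \approx 1/2 + K'''(0)/\{6\sqrt{2\pi}\,K''(0)^{3/2}\}$, the form usually associated with Daniels (1987). With $\lambda_j = 1/(j(j+1))$, the series form of the cumulant generating function gives $K''(0) = 2(k-1)\sum_j \lambda_j^2 = 2(k-1)(\pi^2/3 - 3)$ and $K'''(0) = 8(k-1)\sum_j \lambda_j^3 = 8(k-1)(10 - \pi^2)$; these closed-form sums follow from the telescoping identity $\lambda_j = 1/j - 1/(j+1)$ and are derived here, not taken from the paper. The R sketch below (hypothetical function name `cdf_at_singularity`) evaluates the limit and checks both sums against long truncated sums.

```r
# Value of F(v) = P(V_k <= v) at the singularity v = k - 1 (t-hat = 0),
# using the limit of the Lugannani-Rice formula:
#   F(k - 1) ~ 1/2 + K'''(0) / (6 sqrt(2 pi) K''(0)^(3/2)).
cdf_at_singularity <- function(k) {
  kappa2 <- 2 * (k - 1) * (pi^2 / 3 - 3)   # K''(0), the limiting variance
  kappa3 <- 8 * (k - 1) * (10 - pi^2)      # K'''(0), the third cumulant
  0.5 + kappa3 / (6 * sqrt(2 * pi) * kappa2^1.5)
}

# Check the closed-form sums of lambda_j^2 and lambda_j^3 against long truncated sums.
j <- as.numeric(seq_len(1e5))
lam <- 1 / (j * (j + 1))
print(c(sum(lam^2), pi^2 / 3 - 3))
print(c(sum(lam^3), 10 - pi^2))

# The limit approaches 1/2 as k grows, since the skewness shrinks like (k - 1)^(-1/2).
print(sapply(2:6, cdf_at_singularity))
```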
Murakami et al. (2009) also considered using the characteristic function (2) and applied the Lugannani and Rice formula to approximate $F(v) = P(V_k \le v)$. In Section 2, we point out that Murakami et al. (2009) did not use the complete moment generating function of $V_k$. After correcting this mistake, we propose an alternative third-order method to approximate $F(v)$. The proposed approximation still has a singularity at $\hat{t} = 0$, but for all other values of $v$ it always lies within [0, 1]. Numerical results are presented in Section 3, and some concluding remarks are given in Section 4.
Alternate Method of Obtaining the Asymptotic Cumulative Distribution Function of Vk
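Since $\cos(ix) = \cosh(x)$ for real $x$, we have, for $t < -1/8$,
$$\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right) = \cos\left(\frac{\pi}{2}\, i\sqrt{-1-8t}\right) = \cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right).$$
Hence the asymptotic mgf for $V_k$ based on (2) can be written as
$$M(t) = \begin{cases} \left(\dfrac{-2\pi t}{\cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right)}\right)^{(k-1)/2}, & t < -1/8, \\[2ex] \left(\dfrac{-2\pi t}{\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right)}\right)^{(k-1)/2}, & -1/8 \le t < 1, \end{cases} \quad (8)$$
with $M(0) = 1$ by continuity, and the corresponding cumulant generating function is
$$K(t) = \begin{cases} \dfrac{k-1}{2}\log\left(\dfrac{-2\pi t}{\cosh\left(\frac{\pi}{2}\sqrt{-1-8t}\right)}\right), & t < -1/8, \\[2ex] \dfrac{k-1}{2}\log\left(\dfrac{-2\pi t}{\cos\left(\frac{\pi}{2}\sqrt{1+8t}\right)}\right), & -1/8 \le t < 1. \end{cases}$$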
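Note that, for $-1/8 < t < 1$, the moment generating function of $V_k$ given above is equation (2.2) of Murakami et al. (2009). In other words, Murakami et al. (2009) used only the branch valid for $-1/8 < t < 1$, so the complete moment generating function, including the branch for $t < -1/8$, was excluded from consideration.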
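The closed form given above can be coded directly. In the R sketch below, the first and second derivatives are written out by hand (the Appendix instead obtains them with R's `D()`); these hand-derived expressions, the helper names, and the special handling of $t = 0$ (a removable singularity, with $K(0) = 0$, $K'(0) = k-1$, $K''(0) = 2(k-1)(\pi^2/3 - 3)$) and of $t = -1/8$ (where the two branches meet) are assumptions of this sketch. The last lines compare $K'$ with a truncated sum of the series form from (1), which needs $10^6$ terms to agree to roughly $10^{-6}$.

```r
# Closed-form cgf of the limiting distribution of V_k (valid for t < 1), with the
# cosh branch for t < -1/8 and the cos branch for -1/8 <= t < 1.
cgf_closed <- function(t, k) {
  if (t == 0) return(0)  # removable singularity
  if (t < -1/8) {
    (k - 1) / 2 * log(-2 * pi * t / cosh(pi / 2 * sqrt(-1 - 8 * t)))
  } else {
    (k - 1) / 2 * log(-2 * pi * t / cos(pi / 2 * sqrt(1 + 8 * t)))
  }
}

# K'(t) = (k - 1)/2 * (1/t + g(t)), g = 2 pi tanh(pi w / 2) / w or 2 pi tan(pi w / 2) / w
dcgf_closed <- function(t, k) {
  if (t == 0) return(k - 1)
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t)
    g <- 2 * pi * tanh(pi * w / 2) / w
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t)
    g <- 2 * pi * tan(pi * w / 2) / w
  } else {
    g <- pi^2  # common limit of both branches at t = -1/8
  }
  (k - 1) / 2 * (1 / t + g)
}

# K''(t) = (k - 1)/2 * (-1/t^2 + h(t))
d2cgf_closed <- function(t, k) {
  if (t == 0) return(2 * (k - 1) * (pi^2 / 3 - 3))
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); y <- pi * w / 2
    h <- -4 * pi^2 / (w * cosh(y))^2 + 8 * pi * tanh(y) / w^3
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); x <- pi * w / 2
    h <- 4 * pi^2 / (w * cos(x))^2 - 8 * pi * tan(x) / w^3
  } else {
    h <- 2 * pi^4 / 3  # common limit at t = -1/8
  }
  (k - 1) / 2 * (-1 / t^2 + h)
}

# Compare with the truncated series form derived from (1).
k <- 3
j <- as.numeric(seq_len(1e6))
lam <- 1 / (j * (j + 1))
for (t in c(-2, -0.5, 0.3, 0.9)) {
  series <- (k - 1) * sum(lam / (1 - 2 * t * lam))
  cat(sprintf("t = %5.2f   closed K' = %.6f   series K' = %.6f\n",
              t, dcgf_closed(t, k), series))
}
```

Each closed-form evaluation costs a handful of elementary functions, whereas the series needs a million terms for comparable accuracy.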
The Lugannani and Rice formula can then be applied with this cumulant generating function to approximate $F(v)$. Computationally, using (1) has the advantage that the derivatives of the cumulant generating function are easy to obtain. However, it involves an infinite summation, which leads to two problems: the computation is slower because a large upper limit of summation must be used, and solving for the saddlepoint becomes more difficult. Although the derivatives of the cumulant generating function obtained from (2) are more complicated, the saddlepoint computation is more straightforward and can be handled by simple numerical methods. The Appendix gives sample R code that illustrates the simplicity of the proposed calculations; the explicit forms of the derivatives of the cumulant generating function do not have to be entered by hand.
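The saddlepoint step can be made robust with a bracketing root finder. Because $K'$ is increasing with $K'(0) = k - 1$, the saddlepoint is negative when $v < k - 1$ and positive when $v > k - 1$, so each case gets its own bracket. The R sketch below uses `uniroot` on the closed-form $K'$; the helper names, the bracket endpoints ($-10^4$, $\pm 10^{-4}$, $1 - 10^{-10}$) and the tolerance are illustrative choices, and values of $v$ extremely close to $k - 1$ (the singularity) are deliberately rejected.

```r
# First derivative of the closed-form cgf for the limiting distribution of V_k.
dcgf_closed <- function(t, k) {
  if (t == 0) return(k - 1)
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); g <- 2 * pi * tanh(pi * w / 2) / w
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); g <- 2 * pi * tan(pi * w / 2) / w
  } else {
    g <- pi^2  # limit at t = -1/8
  }
  (k - 1) / 2 * (1 / t + g)
}

# Solve K'(t) = v. K' is increasing on (-Inf, 1) with K'(0) = k - 1, so the root
# lies left of 0 when v < k - 1 and right of 0 when v > k - 1.
saddlepoint <- function(v, k, eps = 1e-4) {
  f <- function(t) dcgf_closed(t, k) - v
  interval <- if (v < k - 1) c(-1e4, -eps) else c(eps, 1 - 1e-10)
  if (f(interval[1]) * f(interval[2]) > 0)
    stop("v is too close to k - 1 (the singularity) or outside the supported range")
  uniroot(f, interval, tol = 1e-12)$root
}

k <- 3
for (v in c(0.5, 1.5, 3.39934, 8)) {
  th <- saddlepoint(v, k)
  cat(sprintf("v = %7.5f   t-hat = % .6f   K'(t-hat) - v = %.1e\n",
              v, th, dcgf_closed(th, k) - v))
}
```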
Since the Lugannani and Rice formula may give an approximate $F(v)$ outside [0, 1], an alternative approximation is preferable. Among the many other methods in the literature with third-order accuracy, we propose the method developed in Barndorff-Nielsen (1981, 1986) because it requires the same inputs as the Lugannani and Rice formula. More specifically, the Barndorff-Nielsen formula takes the form
$$F(v) \approx \Phi\left(r + \frac{1}{r}\log\frac{u}{r}\right),$$
where r and u are defined in (6) and (7), respectively.
Jensen (1992) showed that the Lugannani and Rice formula and the Barndorff-Nielsen formula are asymptotically equivalent to third order. Moreover, because the Barndorff-Nielsen approximation is a value of the standard normal distribution function $\Phi$, the approximate $F(v)$ it gives always lies in [0, 1].
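Putting the pieces together, the R sketch below computes both the Lugannani and Rice and the Barndorff-Nielsen approximations to the upper tail $P(V_k > v)$ from the closed-form cumulant generating function, with hand-derived derivatives and a bracketing root finder. Function names, brackets and tolerances are illustrative assumptions, and this is not the Appendix code. Because the Barndorff-Nielsen value is computed as a normal tail probability, it lies in [0, 1] by construction; the Lugannani and Rice value carries no such guarantee, and the two should agree closely away from $v = k - 1$.

```r
# Closed-form cgf K(t) of the limiting distribution of V_k and its derivatives.
K0 <- function(t, k) {
  if (t == 0) return(0)
  if (t < -1/8) {
    (k - 1) / 2 * log(-2 * pi * t / cosh(pi / 2 * sqrt(-1 - 8 * t)))
  } else {
    (k - 1) / 2 * log(-2 * pi * t / cos(pi / 2 * sqrt(1 + 8 * t)))
  }
}
K1 <- function(t, k) {
  if (t == 0) return(k - 1)
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); g <- 2 * pi * tanh(pi * w / 2) / w
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); g <- 2 * pi * tan(pi * w / 2) / w
  } else {
    g <- pi^2
  }
  (k - 1) / 2 * (1 / t + g)
}
K2 <- function(t, k) {
  if (t == 0) return(2 * (k - 1) * (pi^2 / 3 - 3))
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); y <- pi * w / 2
    h <- -4 * pi^2 / (w * cosh(y))^2 + 8 * pi * tanh(y) / w^3
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); x <- pi * w / 2
    h <- 4 * pi^2 / (w * cos(x))^2 - 8 * pi * tan(x) / w^3
  } else {
    h <- 2 * pi^4 / 3
  }
  (k - 1) / 2 * (-1 / t^2 + h)
}

# Upper-tail approximations P(V_k > v) by Lugannani-Rice (LR) and Barndorff-Nielsen (BN).
vk_tails <- function(v, k, eps = 1e-4) {
  f <- function(t) K1(t, k) - v
  interval <- if (v < k - 1) c(-1e4, -eps) else c(eps, 1 - 1e-10)
  th <- uniroot(f, interval, tol = 1e-12)$root
  r <- sign(th) * sqrt(2 * (th * v - K0(th, k)))
  u <- th * sqrt(K2(th, k))
  c(LR = 1 - pnorm(r) - dnorm(r) * (1 / r - 1 / u),
    BN = pnorm(r + log(u / r) / r, lower.tail = FALSE))
}

for (k in 2:4) {
  for (v in c(k - 1.5, k + 1, k + 3)) {
    p <- vk_tails(v, k)
    cat(sprintf("k = %d  v = %4.1f   LR = %.5f   BN = %.5f\n", k, v, p[["LR"]], p[["BN"]]))
  }
}
```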
Numerical Study
For k = 2, Lewis (1961) used extensive Monte Carlo simulation to provide critical points for the Anderson-Darling test. His results are generally treated as the benchmark for all other approximations. Table 1 records the values obtained from the following methods:
1. Lewis: the method of Lewis (1961);
2. Giles: the saddlepoint approximation proposed in Giles (2001), which is the same as that obtained in Murakami et al. (2009) with k = 2;
3. LR: the Lugannani and Rice formula using (8) as the mgf for $V_2$;
4. BN: the Barndorff-Nielsen formula using (8) as the mgf for $V_2$.
The three approximations (Giles, LR, and BN) are almost identical to the results obtained by Lewis (1961).
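As an independent sanity check on such comparisons (not the procedure of Lewis, 1961), the limiting distribution can also be simulated directly from its weighted chi-square representation $V = \sum_j \lambda_j \chi^2_{k-1,j}$ with $\lambda_j = 1/(j(j+1))$, truncated after J terms and with the neglected terms replaced by their mean $(k-1)/(J+1)$. The R sketch below (hypothetical name `mc_upper_tail`; J, the number of draws and the seed are arbitrary choices) estimates $P(V_2 > 2.492)$. Since 2.492 is the commonly tabulated asymptotic 5% point of the Anderson-Darling statistic, the estimate should be close to 0.05.

```r
# Monte Carlo estimate of P(V > v) for the limiting distribution of V_k,
# V = sum_j lambda_j * chi^2_{k-1, j}, lambda_j = 1 / (j (j + 1)).
mc_upper_tail <- function(v, k, n = 1e5, J = 100) {
  j <- as.numeric(seq_len(J))
  lam <- 1 / (j * (j + 1))
  x <- numeric(n)
  for (i in seq_len(J)) {
    x <- x + lam[i] * rchisq(n, df = k - 1)
  }
  # replace the neglected terms j > J by their mean, (k - 1) * sum_{j > J} lambda_j
  x <- x + (k - 1) / (J + 1)
  mean(x > v)
}

set.seed(2013)
p_hat <- mc_upper_tail(2.492, k = 2)
se <- sqrt(p_hat * (1 - p_hat) / 1e5)
print(c(estimate = p_hat, std.error = se))
```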
As in Murakami et al. (2009), we compare the values for k = 3 obtained from the following methods:
1. Limiting distribution: the method derived in Murakami et al. (2009);
2. Murakami-saddlepoint: the saddlepoint approximation proposed in Murakami et al. (2009);
3. LR: the Lugannani and Rice formula using (8) as the mgf for $V_3$;
4. BN: the Barndorff-Nielsen formula using (8) as the mgf for $V_3$.
Results are recorded in Table 2. As expected, all the methods give very similar results.
Table 3 records the critical values v obtained from the methods discussed in this paper for various k. Again, the results are almost indistinguishable.
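Critical values of this kind can be obtained by inverting an upper-tail approximation numerically. The R sketch below repeats the closed-form pipeline and solves "Barndorff-Nielsen tail = $\alpha$" with `uniroot`. The names, the search interval $(k - 1 + 0.05,\ 10k + 20)$ and the tolerances are illustrative assumptions; the interval presumes $\alpha$ is well below the tail probability near the mean (say $\alpha \le 0.25$), and no values from Table 3 are reproduced here.

```r
# Closed-form cgf of the limiting distribution of V_k and its first two derivatives.
K0 <- function(t, k) {
  if (t == 0) return(0)
  if (t < -1/8) {
    (k - 1) / 2 * log(-2 * pi * t / cosh(pi / 2 * sqrt(-1 - 8 * t)))
  } else {
    (k - 1) / 2 * log(-2 * pi * t / cos(pi / 2 * sqrt(1 + 8 * t)))
  }
}
K1 <- function(t, k) {
  if (t == 0) return(k - 1)
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); g <- 2 * pi * tanh(pi * w / 2) / w
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); g <- 2 * pi * tan(pi * w / 2) / w
  } else {
    g <- pi^2
  }
  (k - 1) / 2 * (1 / t + g)
}
K2 <- function(t, k) {
  if (t == 0) return(2 * (k - 1) * (pi^2 / 3 - 3))
  if (t < -1/8) {
    w <- sqrt(-1 - 8 * t); y <- pi * w / 2
    h <- -4 * pi^2 / (w * cosh(y))^2 + 8 * pi * tanh(y) / w^3
  } else if (t > -1/8) {
    w <- sqrt(1 + 8 * t); x <- pi * w / 2
    h <- 4 * pi^2 / (w * cos(x))^2 - 8 * pi * tan(x) / w^3
  } else {
    h <- 2 * pi^4 / 3
  }
  (k - 1) / 2 * (-1 / t^2 + h)
}

# Barndorff-Nielsen approximation to P(V_k > v).
vk_bn_tail <- function(v, k, eps = 1e-4) {
  f <- function(t) K1(t, k) - v
  interval <- if (v < k - 1) c(-1e4, -eps) else c(eps, 1 - 1e-10)
  th <- uniroot(f, interval, tol = 1e-12)$root
  r <- sign(th) * sqrt(2 * (th * v - K0(th, k)))
  u <- th * sqrt(K2(th, k))
  pnorm(r + log(u / r) / r, lower.tail = FALSE)
}

# Critical value v such that the approximate P(V_k > v) equals alpha.
vk_critical <- function(alpha, k) {
  uniroot(function(v) vk_bn_tail(v, k) - alpha,
          c(k - 1 + 0.05, 10 * k + 20), tol = 1e-8)$root
}

crit <- sapply(2:6, function(k) vk_critical(0.05, k))
names(crit) <- paste0("k=", 2:6)
print(round(crit, 4))
```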
Although the numerical results obtained by the proposed method are almost identical to those obtained by Murakami et al. (2009), the proposed method has two advantages: it is computationally efficient because it does not require an infinite summation, and it is simple to implement in standard statistical software such as R. Moreover, the Barndorff-Nielsen formula is preferable to the Lugannani and Rice formula because the approximate $F(v)$ it produces always lies in [0, 1], whereas the Lugannani and Rice formula may produce values outside this range. Theoretically, both methods have a singularity at $\hat{t} = 0$, but the Daniels (1987) result can be applied to obtain $F(v)$ at the singularity.
Conclusion
In this paper, we corrected the mistake in Murakami et al. (2009) and proposed an alternative way to approximate the limiting distribution of the k-sample modified Baumgartner statistic for testing the equality of k independent distributions. The proposed method is computationally more efficient and can easily be implemented in commonly used statistical software such as R.
Appendix
The following R code approximates P(V_k > v) for k = 3 and v = 3.39934.

# specify k and v
k <- 3
v <- 3.39934

# closed-form cumulant generating function and its first two derivatives;
# the derivatives are obtained symbolically with D() and evaluated at s
cgf <- function(s) {
  if (s < -0.125) (k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))
  else (k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))
}
dcgf <- function(s) {
  if (s < -0.125) eval(D(expression((k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))), "s"))
  else eval(D(expression((k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))), "s"))
}
d2cgf <- function(s) {
  if (s < -0.125) eval(D(D(expression((k-1)/2*log(-2*pi*s/cosh(pi/2*sqrt(-1-8*s)))), "s"), "s"))
  else eval(D(D(expression((k-1)/2*log(-2*pi*s/cos(pi/2*sqrt(1+8*s)))), "s"), "s"))
}

# solving for the saddlepoint K'(t) = v: K' is increasing with K'(0) = k - 1,
# so the root is negative for v < k - 1 and positive for v > k - 1
if (v < k - 1) {
  that <- uniroot(function(t0) dcgf(t0) - v, lower = -1000, upper = -1e-4, tol = 1e-10)$root
} else {
  that <- uniroot(function(t0) dcgf(t0) - v, lower = 1e-4, upper = 1 - 1e-10, tol = 1e-10)$root
}

# proposed methods: Lugannani-Rice (cdflr) and Barndorff-Nielsen (cdfbn)
r <- sign(that)*sqrt(2*(that*v - cgf(that)))
u <- that*sqrt(d2cgf(that))
cdflr <- pnorm(r) - dnorm(r)*(1/u - 1/r)
cdfbn <- pnorm(r + log(u/r)/r)
print(cbind(that, v, 1-cdflr, 1-cdfbn))
REFERENCES
Anderson, T.W. and D.A. Darling. “Asymptotic Theory of Certain Goodness-of-Fit Criteria Based on Stochastic Processes.” Annals of Mathematical Statistics 23 (1952): 193-212.
Anderson, T.W. and D.A. Darling. “A Test of Goodness of Fit.” Journal of the American Statistical Association 49 (1954): 765-769.
Barndorff-Nielsen, O.E. “Inference on Full and Partial Parameters, Based on the Standardized Signed Log-Likelihood Ratio.” Biometrika 73 (1986): 307-322.
Barndorff-Nielsen, O.E. “Modified Signed Log-Likelihood Ratio.” Biometrika 78 (1991): 557-563.
Daniels, H.E. “Tail Probability Approximations.” International Statistical Review 55 (1987): 37-48.
Fraser, D.A.S., N. Reid, R. Li and A. Wong. “P-Value Formulas from Likelihood Asymptotics: Bridging the Singularities.” Journal of Statistical Research 37 (2003): 1-15.
Giles, D.A.E. “A Saddlepoint Approximation to the Distribution Function of the Anderson-Darling Test Statistic.” Communications in Statistics: Simulation and Computation 30 (2001): 899-905.
Jensen, J.L. “The Modified Signed Log Likelihood Statistic and Saddlepoint Approximations.” Biometrika 79 (1992): 693-704.
Lewis, P.A.W. “Distribution of the Anderson-Darling Statistic.” Annals of Mathematical Statistics 32 (1961): 1118-1124.
Lugannani, R. and S.O. Rice. “Saddlepoint Approximation for the Distribution of the Sum of Independent Random Variables.” Advances in Applied Probability 12 (1980): 475-490.
Murakami, H., T. Kamakura and M. Taniguchi. “A Saddlepoint Approximation to the Limiting Distribution of a k-Sample Baumgartner Statistic.” Journal of the Japan Statistical Society 39 (2009): 133-141.

Augustine Wong is currently a professor in the Department of Mathematics and Statistics of York University, Toronto, Ontario, Canada.