UNIVERSITI TEKNOLOGI MARA TEST OF NORMALITY: A POWER ...
Transcript of UNIVERSITI TEKNOLOGI MARA TEST OF NORMALITY: A POWER ...
UNIVERSITI TEKNOLOGI MARA
TEST OF NORMALITY: A POWER COMPARISON OF KOLMOGOROV-SMIRNOV, ANDERSON-DARLING, SHAPIRO-
WILK AND LILLIEFORS TESTS
NORNADIAH BINTI MOHD RAZALI
Project submitted in fulfillment of the requirements for the degree of
Master of Science (Applied Statistics)
Faculty of Computer and Mathematical Sciences
November 2009
CANDIDATE'S DECLARATION
I declare that the work in this dissertation was carried out in accordance with the
regulations of Universiti Teknologi MARA. It is original and is the result of my own
work, unless otherwise indicated or acknowledged as referenced work. This
dissertation has not been submitted to any other academic institution or non-
academic institution for any other degree or qualification.
In the event that my dissertation be found to violate the conditions mentioned above,
I voluntarily waive the right of conferment of my degree and agree to be subjected to
the disciplinary rules and regulations of Universiti Teknologi MARA.
Name of candidate
Candidate's ID No.
Programme
Faculty
Dissertation Title
Signature of Candidate Date .^rr^^. p . ^_^ g ^ 3p.°l
Nornadiah Binti Mohd Razali
2007131509
Master of Science (Applied Statistics)
Faculty of Computer and Mathematical Sciences
Test of Normality: A Power Comparison of
Kolmogorov-Smirnov, Anderson-Darling, Shapiro-
Wilk and Lilliefors Tests
COPYRIGHT © UiTM
APPROVED BY:
ASSOC. PROF. DR. YAP BEE WAH Supervisor
Faculty of Computer and Mathematical Sciences Universiti Teknologi MARA
COPYRIGHT © UiTM
ABSTRACT
The importance of normal distribution is undeniable since it is an underlying assumption of many statistical procedures such as t-tests, linear regression analysis, discriminant analysis and Analysis of Variance (ANOVA). When the normality assumption is violated, interpretation and inference may not be reliable or valid. The three common procedures in assessing whether a random sample of independent observations of size n come from a population with a normal N(^,<r2) distribution are: graphical methods (histograms, box-plots, Q-Q plots), numerical methods (skewness and kurtosis indices) and formal normality tests. This study compares the power of four tests of normality: Shapiro-Wilk (SW) test, Kolmogorov-Smirnov (KS) test, Lilliefors (LF) test and Anderson-darling (AD) test. Power comparisons of these four tests were obtained via Monte Carlo simulation of sample data generated from alternative distributions that follow symmetric and asymmetric distributions. The significance levels considered are 5% and 10%. First, critical values for power comparisons were obtained based on 50000 simulated samples from a standard normal distribution. As the SW test is a left-tailed test, the critical values are the 100(a)111 percentiles of the empirical distributions of the SW test statistic. The AD, KS, and LF tests are right-tailed tests and the critical values are the lOOtl-a)411
percentiles of the empirical distribution of the respective test statistics. Then, 10000 samples each of size n = 10, 15, 20, 25, 30, 40, 50, 100, 200, 300, 400, 500, 1000, 1500 and 2000 were generated from each of the given alternative symmetric and symmetric distributions. The power of each test was then obtained by comparing the test of normality statistics with the respective critical values. Simulation results show that SW test is the most powerful test followed by AD and LF tests in detecting departures from the normality assumption while KS test is the least powerful test. This study also shows that LF test performs better than the KS test. For sample sizes « > 50, the performance of SW and AD tests are quite similar. Results also show that KS and LF tests require large sample size (at least 2000 or more) to achieve similar power with SW and AD tests.
i
COPYRIGHT © UiTM
ACKNOWLEDGEMENTS
In The Name of Allah, The Most Gracious, The Most Merciful
Praise to Allah for His blessing until I managed to complete this report successfully.
My highest gratitude and appreciation goes to my supervisor, Assoc. Prof. Dr. Yap Bee Wah for her guidance throughout the period of study. She is always there to assist me whenever I need help. She helped me to go through many obstacles that I have faced while completing this study. I feel honored to be given an opportunity to work with her.
I would also like to extend my appreciation to all Msc. (Applied Statistics) lecturers who have been very supportive and encouraging. I would also like to thank them for their opinions and suggestions during the proposal defense to further improve my dissertation.
To my dearest fianc6 and family, thanks for being so understanding that I have to sacrifice the valuable time which is supposed to be spent with them to complete this study. Thanks also for their prayers to see me succeed in this study.
Finally, I would like to thank my friends for their assistance, support and comments on my dissertation. I appreciate every single help from them whether directly or indirectly.
ii
COPYRIGHT © UiTM
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF ABBREVIATIONS
1
ii
iii
v
vii
x
xi
CHAPTER 1 INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Research Questions
1.4 Objectives of Study
1.5 Significance of Study
1.6 Scope and Limitations of Study
1.7 Layout of Report
1
1
5
5
6
6
7
7
CHAPTER 2 LITERATURE REVDZW 8
2.1 Introduction 8
2.2 A Review of Various Tests of Normality 11
2.2.1 Skewness and Kurtosis Coefficients 12
2.2.2 Empirical Distribution Function (EDF) Tests 18
2.2.3 Regression and Correlation Tests 24
2.2.4 Moment Tests 27
2.2.5 Other Normality Tests 32
2.3 Test of Normality in Statistical Software 35
2.4 Some Previous Studies on the Power Comparison of 38
Various Tests of Normality
iii
COPYRIGHT © UiTM
2.5 A Review of Properties of Alternative Distributions 44
2.5.1 Normal Distribution 44
2.5.2 Chi-square Distribution 47
2.5.3 Student's t Distribution 49
2.5.4 Uniform Distribution 51
2.5.5 Beta Distribution 55
2.5.6 Gamma Distribution 51
2.5.7 Laplace Distribution 57
CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Simulation Methodology
3.3 List of Alternative Distributions
61
61
61
64
CHAPTER 4 RESULTS AND ANALYSIS 69
4.1 Introduction 69
4.2 Comparison of Type I Error against the Standard 71
Normal Distribution
4.3 Comparison of Power against the Symmetric Non- 73
normal Distributions
4.4 Comparison of Power against the Asymmetric 83
Distributions
CHAPTER 5
REFERENCES
APPENDICES
CONCLUSIONS AND RECOMMENDATIONS
5.1 Introduction
5.2 Conclusions
5.3 Guidelines for Practitioners
5.4 Recommendations
96
96
96
98
101
103
108
iv
COPYRIGHT © UiTM
LIST OF TABLES
Table 2.1 Descriptive Statistics for Variable Previous Experience of 16 Employee Data
Table 2.2 Summary of Methods to Estimate Skewness and Kurtosis 17
Table 2.3 Availability of Normality Test in Different Statistical Software 35
Table 2.4 Commands for Performing Test of Normality in Different 37 Software
Table 2.5 Summary of Results from Previous Studies 41
Table 2.6 Summary of the Properties of Alternative Distributions 59
Table 3.1 Steps involved in the Simulation Study 63
Table 3.2 Characteristics of the Standard Normal Distribution 64
Table 3.3 List of Symmetric Non-normal Distributions with their 65 Corresponding Characteristics
Table 3.4 List of Asymmetric Distributions with their Corresponding 67 Characteristics
Table 4.1 Empirical Critical Values (a = 0.05) 70
Table 4.2 Empirical Critical Values (a = 0.10) 70
Table 4.3 Comparison of Type I Error for Different Normality Tests 72 against the Standard Normal Distribution
v
COPYRIGHT © UiTM
Table 4.4 Comparison of Power for Different Normality Tests against the 74 Symmetric Non-normal Distributions
Table 4.5 Comparison of Power for Different Normality Tests against the 84 Asymmetric Distributions
Table 4.6 Rank of Power Based on Type of Alternative Distribution 94
Table 4.7 Rank of Power Based on Sample Size for All Alternative 94 Distribution
vi
COPYRIGHT © UiTM
LIST OF FIGURES
Figure 1.1 The Shape of the Standard Normal Distribution 2
Figure 2.1 The Development of Tests of Normality in Chronological 10 Order
Figure 2.2(a) Negatively Skewed Distribution 13
Figure 2.2(b) Positively Skewed Distribution 13
Figure 2.3 Various Forms of Kurtosis - (a)Platykurtic (b)Mesokurtic 15 (c)Leptokurtic
Figure 2.4 Classification of Tests of Normality Based on the Statistical 34 Literature
Figure 2.5 A Plot of Standard Normal Distribution 46
Figure 2.6 A Plot of Chi-square Distribution with Different Skewness 48 and Kurtosis
Figure 2.7 A Plot oft Distribution with Various Kurtosis Values 50
Figure 2.8 A Plot of Uniform Distribution 52
Figure 2.9 Plot of Beta Distribution with Different Skewness and 54 Kurtosis
Figure 2.10 A Plot of Gamma Distribution with Different Skewness and 56 Kurtosis
Figure 2.11 A Plot of Laplace Distribution with Different Parameters 58
vii
COPYRIGHT © UiTM
Figure 4.1 Comparison of Power for Different Normality Tests against 73 Standard Normal Distribution (a = 0.05)
Figure 4.2 Comparison of Power for Different Normality Tests against 80 Uniform Distribution (a = 0.05)
Figure 4.3 Comparison of Power for Different Normality Tests against 80 Beta (2,2) Distribution (a = 0.05)
Figure 4.4 Comparison of Power for Different Normality Tests against 81 t (300) Distribution (a = 0.05)
Figure 4.5 Comparison of Power for Different Normality Tests against 81 t (10) Distribution (a = 0.05)
Figure 4.6 Comparison of Power for Different Normality Tests against 82 t (7) Distribution (a = 0.05)
Figure 4.7 Comparison of Power for Different Normality Tests against 82 Laplace (0,1) Distribution (a = 0.05)
Figure 4.8 Comparison of Power for Different Normality Tests against 83 t (5) Distribution (a = 0.05)
Figure 4.9 Comparison of Power for Different Normality Tests against 90 Beta (6, 2) Distribution (a = 0.05)
Figure 4.10 Comparison of Power for Different Normality Tests against 90 Beta (2, 1) Distribution (a - 0.05)
Figure 4.11 Comparison of Power for Different Normality Tests against 91 Beta (3, 2) Distribution (a = 0.05)
Figure 4.12 Comparison of Power for Different Normality Tests against 91 Chi-square (20) Distribution (a = 0.05)
Figure 4.13 Comparison of Power for Different Normality Tests against 92 Gamma (4,5) Distribution (a = 0.05)
viii
COPYRIGHT © UiTM
Figure 4.14 Comparison of Power for Different Normality Tests against 92 Chi-square (4) Distribution (a = 0.05)
Figure 4.15 Comparison of Power for Different Normality Tests against 93 Gamma (1,5) Distribution (a = 0.05)
Figure 4.16 Illustration of the Ranking 93
Figure 5.1 Guidelines for Choosing Test of Normality Based on Type 99 of Distribution
Figure 5.2 Guidelines for Choosing Test of Normality Based on Sample 100 Size
COPYRIGHT © UiTM
LIST OF APPENDICES
APPENDIX Al Plot of Power for a = 0.10 (Standard Normal Distribution) 108
APPENDIX A2 Plot of Power for a = 0.10 (Symmetric Distributions) 109
APPENDIX A3 Plot of Power for a = 0.10 (Asymmetric Distributions) 113
APPENDIX Bl Rank of Power for the Different Tests of Normality against 117 Symmetric Distributions
APPENDIX B2 Rank of Power for the Different Tests of Normality against 121 Asymmetric Distributions
APPENDIX CI Sample FORTRAN Code 1: To Obtain the Critical Values 125
APPENDIX C2 Extra Notes: Required Modifications for Other 137 Significance Level
APPENDIX C3 Sample FORTRAN Code 2: To Obtain the Power of Tests 140
APPENDIX C4 Extra Notes: Required Modifications for Other 153 Distributions
APPENDLX D SPSS Output 161
x
COPYRIGHT © UiTM
LIST OF ABBREVIATIONS
sw KS
AD
LF
JB
EDF
CDF
SES
SEK
Shapiro-Wilk
Kolmogorov-Smirnov
Anderson-Darling
Lilliefors
Jarque-Bera
Empirical distribution function
Cumulative distribution function
Probability density function
Standard error of skewness
Standard error of kurtosis
XI
COPYRIGHT © UiTM
CHAPTER 1
INTRODUCTION
1.1 Background of Study
When carrying out statistical analysis using parametric methods, the assumption of
normality is a fundamental concern for the analyst. As a statistician, we often
conclude that 'the data are normal' or 'not normal' based on some test of normality
results. For those without statistical background, this statement might be
questionable. By definition, normal data are data that come from a population that
has a normal distribution. The normal distribution is also referred to as Gaussian or
bell-shaped distribution.
If X is a random variable which comes from a population with normal
distribution, or in notation, X~N(ji, <r2), then the probability distribution of X is
(Hogg & Tanis, 2006),
f(x) = ^ e - ! ^ 2 ; -oo < x < ao (1.1)
where JU and a are the mean and the standard deviation of the distribution,
respectively. This well-known distribution takes the form of a symmetric bell-shaped
curve. The illustration of this distribution can be seen in Figure 1.1. As can be seen
from the figure, the mean of the distribution is the point at the centre of the curve
1
COPYRIGHT © UiTM
whereas the standard deviation describes the spread of the curve or the variation of
the data points around the mean.
0.4
0.35
0.3
I" i:
0.1
0.05
0
Standard Normal PDF
' ' / ,
/ /
/
/
/ "
.
\ 1
\
s~^v V > -4 -3 -2 -1 1 2 3 4 J
U
Figure 1.1: The Shape of the Standard Normal Distribution
The importance of normal distribution is undeniable since it is an underlying
assumption assumed of many statistical procedures. It is also the most frequently
used distribution in statistical theory and applications. Though it is important for
certain statistical procedures to assume that data should come from a normal
distribution, but in real life it is indeed impossible for the data to be perfectly normal.
Geary (1947) suggested that in front of all statistical texts should be printed,
"Normality is a myth. There never was and will never be a normal distribution."
Hart and Hart (2002) agreed with Geary and emphasized that the normal distribution
never reflects real life data since it yields values that range from minus infinity to
-
COPYRIGHT © UiTM
plus infinity. They also added that the data should only necessary to be
approximately normal instead of perfectly normal.
Assessing the assumption of normality is required by most statistical procedures.
Parametric statistical analysis is one of the best examples to show the importance of
assessing the normality assumption. Parametric statistics are those which assume a
certain distribution of the data, usually the normal distribution. If the assumption of
normality is violated, interpretation and inference may not be reliable or valid. Some
of the most well-known statistical tests such as t-test, F-test and Analysis of Variance
(ANOVA) are parametric tests. Parametric statistical analysis is more powerful
compared to non-parametric statistical analysis.
Since the assumption of normality is fundamental to the use of many statistical
tests and inferences, therefore it is important to check for this assumption before
proceeding with any relevant statistical procedures. Testing the normality of the data
should be the first step of any analysis which requires the assumption of normality.
Basically, there are three common ways to check the normality assumption. The
easiest way is by using the graphical method. The normal quantile-quantile plot (Q-
Q plot) is the most commonly used and effective diagnostic tool for checking
normality of the data. Other common graphical methods that can be used to assess
the normality assumption include histogram, box-plot and stem-and-leaf plot. Even
though the graphical methods can serve as a useful tool in checking normality for
sample of n independent observations, they are still not sufficient to provide
conclusive evidence that the normal assumption holds. This method is very
3
COPYRIGHT © UiTM
subjective as it lies in the eyes of the analyst. What seems 'normal' to one analyst
may not necessarily be so for others. In fact, experience and good statistical
knowledge are needed in order to interpret the graph.
Therefore, to support the graphical methods, a more formal method which is the
numerical methods and formal normality tests should be performed before making
any conclusion about the normality of the data. Our judgement on the normality of
the data will be much improved by combining the graphical methods, numerical
methods and normality tests. The numerical methods include the skewness and
kurtosis coefficients whereas for the normality tests, there are various procedures
available for testing the assumption of normality. However, the most common
normality test procedures available in most statistical software are the Shapiro-Wilk
test, Kolmogorov-Smirnov test, Lilliefors test and Anderson-Darling test. The focus
of this study is to compare the power of several normality tests via Monte Carlo
simulation. Further discussions on these normality tests are available in the following
chapter. The problem statement is given in Section 1.2 while the research questions
and objectives are stated in Section 1.3 and Section 1.4, respectively. Section 1.5
states the significance of the study while the scope and limitations are explained in
Section 1.6.
4
COPYRIGHT © UiTM
1.2 Problem Statement
There are significant amount of tests of normality available in the literature. Some of
these tests can only be applied under a certain condition or assumption. Moreover,
different test of normality often produce different results i.e. some test reject while
others fail to reject the null hypothesis of normality. The contradicting results are
misleading and often confuse practitioners. Therefore, the choice of test of normality
to be used should indisputably be given tremendous attention. The preparation of
some guidelines will be very helpful to solve this problem. However, the guidelines
provided in the literature especially in terms of the power of the test are still
ambiguous and contradictory. Therefore, the purpose of this study is to provide
knowledge and guidelines on the choice of normality tests which will be useful for
practitioners.
1.3 Research Questions
This study seeks answers to the following questions:
1. What are the characteristics of various tests of normality available in the
statistical literature?
2. How does the performance of Kolmogorov-Smirnov test, Anderson-Darling
test, Shapiro-Wilk test and Lilliefors test varies among different sample sizes
and alternative distributions?
3. What are the guidelines that should be followed in choosing the test of
normality? 5
COPYRIGHT © UiTM
1.4 Objectives of Study
The main objectives of the study include:
1. To investigate the characteristics of the tests of normality identified in
statistical literature.
2. To perform a simulation study to compare the performance of the
Kolmogorov-Smirnov test, Anderson-Darling test, Shapiro-Wilk test and
Lilliefors test of normality.
3. To provide guidelines to practitioners on the choice of normality test.
1.5 Significance of Study
The various tests of normality revealed through this study may increase the
awareness of the practitioners on more recent and perhaps better test of normality
than the ones most commonly used. At the end of this study, it is hoped that the
results will be able to provide some guidelines to practitioners on the choice of test of
normality. This study is also expected to provide a clearer idea on the test of
normality that should be used under different sample size conditions. Finally, this
study should be able to provide some idea to software developers such as the SAS
Institute and SPSS Inc. on new tests of normality that should be included in their
software.
6
COPYRIGHT © UiTM
1.6 Scope and Limitations of Study
Due to the time constraints, the scope of this study has been narrowed down. For the
first objective of the study, only some normality tests that were found to be most
well-known in the statistical literature were explained in the report. For the
simulation study, only four most commonly available tests of normality in statistical
software packages were considered: Kolmogorov-Smirnov test, Anderson-Darling
test, Shapiro-Wilk test and Lilliefors test. The Monte Carlo simulation was carried
out using FORTRAN programming language since it was the only software which
had the subroutines for all the four tests.
1.7 Layout of Report
This chapter provides the background of the study and defines the problem or main
issues which are the main concern of this study. In addition, the research questions
and objectives are also stated. The significance as well as the scope and limitations of
the study are also discussed. Chapter 2 presents a review of relevant literature,
including a summarization of previous studies on normality test and the development
of normality test statistics. Chapter 3 discusses in detail the Monte Carlo simulation
methodology for power comparisons of the normality tests. The algorithms involved
in this simulation study are described here. The simulation results are presented in
Chapter 4. Some analyses to rank the power of the tests are also conducted and
presented in this chapter. Finally, Chapter 5 includes a discussion of the results and a
conclusion based on the findings obtained. The strategy for assessing normality and
recommendations for future research are also proposed in this final chapter.
7
COPYRIGHT © UiTM
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
There are nearly 40 tests of normality available in the statistical literature (Dufour et
al., 1998). The effort of developing techniques to detect departures from normality
has begun as early as the late 19th century. This effort was initiated by Pearson
(1895) who worked on the skewness and kurtosis coefficients (Althouse et al, 1998).
In 1900, Pearson extended his work and introduced the chi-square test of normality.
Kolmogorov and Smirnov then introduced the Kolmogorov-Smirnov test of
normality in 1933. Conover (1999) stated that the Cramer-von Mises test was
developed based on the contributions by Cramer (1928), von Mises (1931) and
Smirnov (1936). In 1954, Anderson and Darling proposed their test which was the
modification of the Cramer-von Mises test (Farrel & Stewart, 2006).
According to Yazici and Yolacan (2007), another test of normality was proposed
in 1962 which was called the Kuiper test. This was followed by the introduction of
the Shapiro-Wilk test in 1965. In 1967, the Kolmogorov-Smirnov test was modified
by Lilliefors. Lilliefors test of normality differs from that of the Kolmogorov-
Smirnov whereby the parameters of the hypothesized distribution are estimated
rather than initially specified (Abdi & Molin, 2007).
8
COPYRIGHT © UiTM
Then, in 1971, D'Agostino proposed an omnibus test for moderate and large size
samples, or widely known as D'Agostino test. This test is similar to the Shapiro-Wilk
test which is based on regression (Coin and Corradetti, 2006). However, it requires
no table of weights and it can be used for sample size of more than 50 (D'Agostino,
1971). Shapiro and Francia (1972) then developed another test of normality which
was a modification of the Shapiro-Wilk test. The D'Agostino-Pearson (1973) test
that took into consideration both the skewness and kurtosis values was proposed in
the following year. Another test of normality which was known as Vasicek test
emerged in 1976. This test was constructed based on the sample entropy of the data.
Jarque-Bera test which was also based on skewness and kurtosis was introduced in
1987.
Royston (1982a) modified the Shapiro-Wilk test to broaden the restriction of the
sample size to 2000 as the original test was only limited to sample size only up to 50.
Royston (1982b, c) provided algorithm AS 181 in FORTRAN 66 for computing the
SW test statistic and p-value for sample sizes 3 to 2000. Later, Royston (1992)
observed mat Shapiro-Wilk's (1965) approximation for the weights a used in the
algorithms was inadequate for n > 50. He then gave an improved approximation to
the weights and provided algorithm AS R94 (Royston, 1995) which can be used for
any n in the range 3<n<5000. Rahman and Govindarajulu (1997) proposed a
modification of the Shapiro-Wilk test and claimed that the computation of their
statistic was much simpler than computing Royston's (1982a) approximation.
9
COPYRIGHT © UiTM
In the same year, Declercq and Duvaut introduced another test which was based
on the Hermite polynomial. The history of the test of normality discussed above is
based on the articles acquired during the period of this study. It should be noted that
there may be more tests of normality that emerged in between or after the period
mentioned above. Figure 2.1 shows the development of tests of normality in
chronological order.
<1930's
1930's
1950's
1960's
1970's
I980's
1990's
Skewness and kurtosis coefficients (1895) Chi-square test (1900)
• Kolmogorov-Smirnov test (1933) • Cramer-von Mises test (1928,1931,1936)
Anderson-Darling test (1954)
Kuiper test (1962) Shapiro-Wilk test (1965) Lilliefors test (1967)
• D'Agostino test (1971) • Shapiro-Francia test (1972) • D'Agostino-Pearson test (1973) • Vasicek test (1976)
• Jarque-Bera test (1987)
Modified Shapiro-Wilk - Royston (1995) Modified Shapiro-Wilk - Rahman and Govindarajulu (1997) Hermite test (1997)
Figure 2.1: The Development of Tests of Normality in Chronological Order
ID
COPYRIGHT © UiTM