
CHAPTER 6

TESTS FOR OUTLIERS

"Actually, the rejection of "outlying" observations may be just as much a practical (or common sense) problem as a statistical one and sometimes the practical or experimental viewpoint may naturally outweigh any statistical contributions."

F.E. Grubbs, 1950

There is perhaps as much or more interest in identifying outliers in a sample than there is for determining whether a sample is normal. Outlier tests are seldom used as tests for normality per se. Their usual function is the identification of an observation(s) which may have undue influence on an estimation or testing procedure; what to do with them once we find them is sometimes open to debate.

6.1 On the Need to Identify Outliers

Usually when performing a statistical test or estimation procedure, we assume that the data are all observations of iid random variables, often from a normal distribution. Sometimes, however, we notice in a sample one or more observations that stand out from the crowd. These observations are commonly called outliers.

Copyright © 2002 by Marcel Dekker, Inc. All Rights Reserved.


Where do outliers come from, why do we care about them, and what can we do if we have them?

Outliers can stem from a variety of sources. Some of the most common sources are:

1. Purely by chance. In normal samples, about one out of every 20 observations can be expected to be greater than 2 standard deviations from the mean, and about one in a hundred can be 2.5 standard deviations from the mean. Presence of an outlier does not necessarily mean that there is a problem.

2. Failure of the data generating process. In this instance, outlier identification may be of extreme importance in isolating a breakdown in an experimental process.

3. A subject being measured may not be homogeneous with the other subjects. In this case, we would say the sample comes from a contaminated distribution, i.e., the sample we are observing has been contaminated with observations which do not belong.

4. Failure of the measuring instrument, whether mechanical or human.

5. Error in recording the measurement, for example, on the data sheet or log, or in computer entry.

David and Paulson (1965) summarized why outliers should be identified:

(1) to screen the data for further analysis;
(2) to flag problems in the data generation process (see (2) above);
(3) the extreme points may be the only points of real interest.

What do we do if we have outliers? If possible, we should try to identify the reason an observation is so extreme. When an observation is identified as an outlier, the first thing to be done is to check the data for accuracy in recording and data entry. This is usually, but not always, the easiest data verification that can be made. For example, Gibbons and McDonald (1980a, 1980b) performed some diagnostics on the regression of mortality rates on socioeconomic and air pollution data for some Standard Metropolitan Statistical Areas (SMSA's), as previously reported by Lave and Seskin (1970, 1977). SMSA's which had high influence on one or more coefficients in the resulting regression were identified. A check of the socioeconomic data from an easily obtainable source (U.S. Bureau of the Census, 1973) showed that some of the data for one of the outlier SMSA's (Providence, Rhode Island) had had the decimal point transposed one place to the left.

If an identified outlier(s) has been caused by data recording error, the value should be corrected and analysis can proceed. If it is determined that the data have been recorded correctly, investigation into other reasons why an observation is extreme should be done if at all possible. If the data have been investigated thoroughly and other sources of error have been identified, it may not be possible to recover the true data value (e.g., a scale was found to be incorrectly calibrated). In this instance, the only alternative is to throw out the contaminated data. However, all data which have been subject to the same source of error should also be eliminated, not just the extreme values.

If no source of error is discovered, several alternatives are available: for example, robust methods of analysis can be used (see Chapter 12). Other ways of accommodating outliers include (Tietjen, 1986): (1) removing the outliers and proceeding with analysis; (2) removing the outliers and treating the reduced sample as a censored sample; (3) Winsorizing the outliers, that is, replacing them with the nearest non-outlier value; (4) replacing the outliers with new observations; (5) using standard methods for analyzing the data, both including and excluding the outliers, and reporting both results. Interpretation of results becomes difficult if the two results differ drastically.

Subjective or ad hoc methods of outlier identification are often used, such as graphical analysis or standardized observations. For exploratory or descriptive analyses, these are usually sufficient. Histograms, stem-and-leaf plots or box plots can be used to pinpoint extreme values. Using standardized values and flagging those values of 2 or greater as possible outliers is common.

Outlier tests are more formal procedures which have been developed for detecting outliers when a sample comes from a normal distribution and there is reason to believe a "contaminated" observation or "slippage" has occurred. Outlier tests are also useful as general tests of normality in the detection of skewed distributions with small samples (David, 1981). In addition, it has been shown (Thode, Smith and Finch, 1983) that certain outlier tests are nearly as powerful or more powerful than general tests of normality when sampling from scale mixtures of normal distributions. However, comparison of outlier tests (except the range test) with tests of normality for general alternatives and comparison of outlier tests with each other for outlier testing has been relatively rare (e.g., Ferguson, 1961; Johnson and Hunt, 1979; Thode, Smith and Finch, 1983; Thode, 1985).

Since an outlier test is often used as an "objective" method of identifying an outlier after the data themselves have already been examined, or is sometimes used sequentially without regard to the true α level of the sequential test, Grubbs recommended that tests for outliers always be done at a more conservative α level than the common 0.05 level.

Barnett and Lewis (1994) have presented a summary of tests for outliers and their critical values, many of which are specific to the detection of outliers in normal samples. They include tables of critical values for many variations of the same test, including those where one or both of the parameters are known. Here we will only consider those tests where both mean and variance are unknown.

Several outlier tests also turn out to be optimal tests in some way for detecting a non-normal alternative other than those with slippage. For example, outliers may indicate a contaminated distribution with small contaminating fraction p (relative to n), or may be indicative of a heavy-tailed or skewed distribution in a small sample. Therefore, outlier tests can also be used as tests for alternative distributions rather than simply as tests for aberrant observations.

6.2 Tests for k Specified Outliers

For certain outlier tests, it is known (or assumed) that the number of possible outliers in a sample is known a priori; in addition, some of the tests also assume knowledge of the direction of the outlying observation(s). In this section we describe outlier tests of these two types, which consist mainly of Grubbs' and Dixon's tests and extensions thereof.

When two or more outliers occur in a data sample, a test for a single outlier may not detect one outlier since the other outlier(s) inflates the variance of the sample (in the case of tests like Grubbs' test or the range test) or the numerator (in the case of Dixon's test), thereby "masking" the detection of the largest outlier. For this reason, it is also necessary to consider testing for more than one outlier. On the other hand, if k > 1 outliers are tested for and there are fewer than k outliers, one of two things may occur:

(1) The test may reject the null hypothesis of no outliers because of a large influence the true outliers have on the test statistic, thereby identifying more outliers than there really are. This is an effect known as swamping.

(2) The true extreme values may not have enough influence to attain a significant test statistic, so no outliers are identified. This can be thought of as reverse masking.

6.2.1 Grubbs' Test for a Single Outlier

Grubbs' (1950) test is perhaps the most widely known and intuitively obvious test for identifying a single outlier in a sample of n observations. It also seems to have high potential as a test for normality against other types of alternatives. As a test for an outlying observation, regardless of direction, Grubbs' test is given by the maximum deviation from the sample mean,

T = max(T_1, T_n)

where

T_1 = (x̄ − x_(1)) / s,        T_n = (x_(n) − x̄) / s

where x̄ and s are the sample mean and standard deviation, respectively. In the event that the direction of the possible outlier is known, T_1 would be used for identifying an extreme low observation, and T_n would be used to identify an extreme high observation (Grubbs, 1950; 1969). In each case the test statistic is compared to the upper α percentile of the distribution, with large values of the test statistic indicating the presence of an outlier.
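As a sketch, the three statistics can be computed directly from the definitions above; the function name and data here are illustrative, not from the text:

```python
import statistics

def grubbs_statistics(x):
    """Grubbs' T1, Tn and T = max(T1, Tn) for a sample x (a sketch).

    s is the usual sample standard deviation (n - 1 divisor)."""
    xbar = statistics.mean(x)
    s = statistics.stdev(x)
    t1 = (xbar - min(x)) / s   # tests an extreme low observation
    tn = (max(x) - xbar) / s   # tests an extreme high observation
    return t1, tn, max(t1, tn)
```

The resulting statistic would then be compared to the appropriate upper percentile (Table B8).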

Grubbs (1950) showed that T_n was identical to the ratio of the squared sum of differences with and without the suspected outlier,

S_n²/S² = Σ_{i=1}^{n−1} (x_i − x̄_n)² / Σ_{i=1}^{n} (x_i − x̄)²

where x̄_n is the mean of the sample excluding x_(n). The analogous test statistic for testing for a lower-tail outlier,

S_1²/S² = Σ_{i=2}^{n} (x_i − x̄_1)² / Σ_{i=1}^{n} (x_i − x̄)²

has a similar relation to T_1.

Example 6.1. Thirteen average annual erosion rates (m/year) were estimated for the East Coast states (Data Set 11). For these data, negative values indicate erosion and positive values indicate accretion. Virginia seems to have a very high erosion rate compared to the other states (Figure 6.1). For these data,



Figure 6.1 Q-Q plot of average annual erosion rates (m/yr) for 13 East Coast states.

so that T = 2.74, which is just significant at the 0.01 level.

Uthoff (1970) showed that T_1 and T_n were the MPLSI goodness of fit tests for left and right exponential alternatives to the normal distribution, respectively (Section 4.2.4). Surprisingly, Box (1953) suggested that T is "like" the LRT for a uniform alternative to a normal distribution, implying that it is also a powerful test against short-tailed distributions. Whereas the test statistic is compared to the upper critical value of the null distribution (either T_1, T_n or T, as appropriate) when testing for an outlier, testing for a short-tailed or skewed distribution requires the use of the lower tail of the null distribution of the test statistic.

Critical values for T, T_1 and T_n (Grubbs and Beck, 1972) are given in Table B8.

6.2.2 Dixon Tests for a Single Outlier

Dixon's test statistic for a single upper-tail outlier (Dixon, 1950; 1951) is the ratio of the distance between the two largest observations to the range,

r_10 = (x_(n) − x_(n−1)) / (x_(n) − x_(1))

Similarly, to test for a single lower-tail outlier,

r'_10 = (x_(2) − x_(1)) / (x_(n) − x_(1))


is used. A test for a single outlier in an unknown direction is obtained by using r = max(r_10, r'_10). This is the simplest of the outlier tests computationally.

Example 6.2. For the 15 height differences between cross- and self-fertilized Zea mays plants (Data Set 2), two lower values may be outliers. As a test for a single lower-tail outlier,

r'_10 = (−48 − (−67)) / (75 − (−67)) = 0.134

which does not exceed the 5% critical value of 0.339.

Upon inspection, it would not be expected that r'_10 would identify an outlier in Example 6.2 because of the masking effect of x_(2). To avoid masking, Dixon also proposed alternative tests of a type similar to r_10 which would bypass masking effects of other observations. The general form of these test statistics (for detecting an upper-tail outlier) is defined by

r_jk = (x_(n) − x_(n−j)) / (x_(n) − x_(k+1))

which eliminates the influence of j − 1 other large observations and k extreme small observations. Dixon suggested tests using j = 1, 2 and k = 0, 1, 2. To avoid a masking effect due to two upper outliers one should use j = 2; k = 1 or 2 should be considered for robustness against lower-tail outliers or long tails. Similarly, test statistics r'_jk are used for single lower-tail outliers. In general, Dixon suggested using r_10 for very small samples, r_21 for sample sizes 8 to 13, and r_22 for sample sizes greater than 15. Critical values for r_jk are given in Table B17.
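A sketch of the general r_jk statistic (the function names are our own; the lower-tail analog r'_jk is obtained by reflecting the data):

```python
def dixon_r(x, j=1, k=0):
    """Dixon r_jk for an upper-tail outlier (sketch; x is any sequence).

    j = 1, k = 0 gives the basic r_10; larger j and k guard against
    masking by other large values and by extreme small values."""
    xs = sorted(x)
    n = len(xs)
    # (x_(n) - x_(n-j)) / (x_(n) - x_(k+1)) in order-statistic notation
    return (xs[n - 1] - xs[n - 1 - j]) / (xs[n - 1] - xs[k])

def dixon_r_lower(x, j=1, k=0):
    """Lower-tail analog r'_jk, computed by negating the sample."""
    return dixon_r([-v for v in x], j, k)
```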

Example 6.3. To test for a lower outlier while guarding against the masking effect of x_(2) for the plant height differences,

r'_20 = (14 − (−67)) / (75 − (−67)) = 0.570

which well exceeds the 1% critical value of 0.523; therefore, we have identified x = −67 as a lower-tail outlier.


The sequential use of r'_10 and r'_20 in Examples 6.2 and 6.3 is indicative of a masking effect, and hence there is the possibility of two outliers in this sample. However, more formal sequential procedures for identifying multiple outliers are given below (Section 6.4).

6.2.3 Range Test

The range test has been discussed earlier (Sections 4.1.2, 4.2.2) as the LR and MPLSI test for a uniform alternative. The range test statistic is

u = (x_(n) − x_(1)) / s

One of the earlier suggested uses for u was as a test for outliers and as an alternative to b_2 (David, Hartley and Pearson, 1954), where for some examples they showed that u was identifying samples with a single outlier (in either direction) where b_2 was not rejecting normality. Alternatively, Barnett and Lewis (1994) suggested the use of the range test as a test "of a lower and upper outlier-pair x_(1), x_(n) in a normal sample".

While the use of u for detecting short-tailed alternatives requires comparison to the lower percentiles of the test statistic distribution, testing for an outlier or upper/lower outlier pair requires comparison of u to the upper α percentile of the distribution (Table B10).
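In code, u is simply the sample range divided by the sample standard deviation (a sketch with made-up data):

```python
import statistics

def range_u(x):
    """Range test statistic u = (x_(n) - x_(1)) / s (a sketch)."""
    return (max(x) - min(x)) / statistics.stdev(x)
```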

6.2.4 Grubbs Test for k Outliers in a Specified Tail

Grubbs (1950) presented extensions of his test for a single outlier to a test for exactly two outliers in a specified tail (S²_{n−1,n}/S² and S²_{1,2}/S² for two upper and two lower outliers, respectively); this was subsequently extended to a test for a specified number k, 1 < k < n, of outliers in a sample (Tietjen and Moore, 1972). Here it is assumed that we know (or suspect we know) how many observations are outliers as well as which of the tails contains the outlying observations. The test, denoted L_k (and its opposite tail analog L*_k), is the ratio of the sum of squared deviations from the sample mean of the non-outliers to the sum of squared deviations for the entire sample,

L_k = Σ_{i=1}^{n−k} (x_(i) − x̄_{n−k})² / Σ_{i=1}^{n} (x_(i) − x̄)²

where the denominator is the full sample sum of squares (n − 1)s² and x̄_{n−k} is the mean of the lowest n − k order statistics. This is the test for k upper-tail outliers. Similarly,

L*_k = Σ_{i=k+1}^{n} (x_(i) − x̄*_{n−k})² / Σ_{i=1}^{n} (x_(i) − x̄)²



Figure 6.2 Q-Q plot of leukemia latency period in months for 20 patients following chemotherapy.

is the test statistic for k lower-tail outliers, where x̄*_{n−k} is the mean of the highest n − k order statistics. Obviously, L_k and L*_k are always less than 1; if the variance of the subsample is sufficiently smaller than that of the full sample, then there is an indication that the suspected observations are indeed outliers. Therefore, the test statistic is compared to the lower tail of the null test distribution, and the null hypothesis of no outliers is rejected if L_k (L*_k) is smaller than the appropriate critical value (Table B18).
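The computation of L_k (and, by reflection, L*_k) can be sketched as follows; the function name and data are illustrative only:

```python
def grubbs_Lk(x, k):
    """L_k for k suspected upper-tail outliers (a sketch).

    Small values of the ratio indicate that dropping the k largest
    order statistics sharply reduces the sum of squares."""
    xs = sorted(x)
    full_mean = sum(xs) / len(xs)
    ss_full = sum((v - full_mean) ** 2 for v in xs)
    sub = xs[: len(xs) - k]              # drop the k largest
    sub_mean = sum(sub) / len(sub)
    ss_sub = sum((v - sub_mean) ** 2 for v in sub)
    return ss_sub / ss_full
```

The lower-tail analog L*_k would be `grubbs_Lk([-v for v in x], k)`.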

Example 6.4. Data on the latency period in months of leukemia for 20 patients following chemotherapy (Data Set 9) may contain 3 large values (over 100 months), whereas the remaining cases have no latency period greater than 72 (Figure 6.2). For this data set,

L_3 = S²_{18,19,20}/S² = 6724/30335 = 0.222

which is well below the 0.01 significance level of 0.300 for k = 3 and n = 20.

6.2.5 Grubbs Test for One Outlier in Each Tail

Grubbs (1950) also presented a test for identifying one outlier in each tail, using

S²_{1,n}/S² = Σ_{i=2}^{n−1} (x_(i) − x̄_{1,n})² / Σ_{i=1}^{n} (x_(i) − x̄)²


This is essentially L_2 where the numerator is based on the n − 2 observations x_(2) through x_(n−1). When k outliers are specified, with m < k in the lower tail and k − m in the upper tail, a similar test can be defined using the appropriately trimmed numerator sum of squares. Critical values for this test are given in Table B19.

6.2.6 Tietjen-Moore Test for k Outliers in One or Both Tails

Tietjen and Moore (1972) presented an outlier test statistic, denoted E_k, which is similar to L_k but which examines the sample for the k largest deviations from the sample mean, without regard to which tail they come from, i.e., the outliers can be from either or both tails. To compute E_k, first it is necessary to obtain r_i, the absolute deviation from the sample mean of each of the observations,

r_i = | x_i − x̄ |, i = 1, . . . , n

These residuals are sorted to obtain the order statistics of the r_i, and the ordered observations z_(i) are defined as the signed values of the r_(i). The test statistic for k outliers is then computed similarly to L_k, using the z_(i),

E_k = Σ_{i=1}^{n−k} (z_(i) − z̄_{n−k})² / Σ_{i=1}^{n} (z_(i) − z̄)²

where the z_(i) are the deviations from the mean, ordered irrespective of sign, z̄ is the mean of all of the z_(i) and z̄_{n−k} is the mean of z_(1), z_(2), . . . , z_(n−k).

Since this test statistic is the ratio of a reduced sum of squares to a full sum of squares, the comparison to determine significance is whether E_k is less than the appropriate critical value (Table B20).
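A sketch of E_k; note that since the z_(i) are deviations from the full-sample mean, z̄ = 0 exactly and the denominator reduces to Σ z_(i)² (function name and data are ours):

```python
def tietjen_moore_Ek(x, k):
    """Tietjen-Moore E_k for the k largest absolute deviations (sketch)."""
    n = len(x)
    xbar = sum(x) / n
    z = sorted((v - xbar for v in x), key=abs)  # signed, ordered by |z|
    keep = z[: n - k]                           # drop the k most extreme
    keep_mean = sum(keep) / len(keep)
    ss_keep = sum((v - keep_mean) ** 2 for v in keep)
    ss_full = sum(v * v for v in z)             # mean of all z is zero
    return ss_keep / ss_full
```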

Example 6.5. For the 31 observations in the July wind data (Data Set 8) there is one observation below 5 mph (3.8) and one above 14 mph (17.1). These two observations happen to be the largest deviations (in absolute value) from the sample mean of 9.01, with z_(31) = 8.09 and z_(30) = −5.21. The full sample sum of squares of z is 217.05, while the sum of squares based on the reduced sample of 29 observations is 124.19. This results in the test statistic

E_2 = 124.19/217.05 = 0.572.


Therefore, the reduction in the sum of squares is not quite sufficient, at the 0.05 level, to identify the two largest observations as simultaneous outliers, in comparison to the stated critical value of 0.568.

6.3 Tests for an Unspecified Number of Outliers

In practice it is relatively rare that the number or the direction of the outliers is known a priori. Suppose that there is more than one outlier in a sample; then testing for a single outlier may give misleading results because of a masking effect. Tietjen and Moore's tests require that the number of outliers be specified in advance. These tests give misleading results if there are more (or fewer) outliers than the number specified.

Certain tests have been suggested for outlier detection when the number of outliers is unknown. Use of outlier labeling based on box plots can identify a number of outliers, although this is an exploratory data analysis technique and therefore cannot be used when a significance level for the test is desired. Skewness and kurtosis have been suggested as outlier tests, but other types of departure from normality can also lead to significant results; further, the number of outliers is not identified when using those tests.

6.3.1 EDA - Outlier Labeling

Resistant outlier labeling is an exploratory data analysis technique (Tukey, 1977) related to the description of data using box plots (Section 2.1). Hoaglin, Iglewicz and Tukey (1986) discussed the use of the inner fences as a method of labeling possible outliers in an otherwise normal sample.

They used the definition of the lower and upper fourths (approximate quartiles, see Section 12.3.2) of the data to be those order statistics of a sample defined by h_L = x_(f) and h_U = x_(n+1−f), where f = [(n + 3)/2]/2, brackets denoting the greatest integer less than or equal to the value. These fourths are the medians of the half-samples defined by splitting the data on the median, including the median in each half-sample if n is odd. In the case where f is noninteger, the fourths are taken as the average of the two adjacent order statistics. The f-spread is then defined as d_f = h_U − h_L.

The identification of possible outliers is accomplished by labeling all observations outside of the inner fences

(h_L − 1.5d_f, h_U + 1.5d_f)



Figure 6.3 Box plot of average water well alkalinity for 58 wells.

as "outside", or possible outliers. Observations outside of the interval

(h_L − 3.0d_f, h_U + 3.0d_f)

are denoted "far outside". In a normal population, Hoaglin, Iglewicz and Tukey (1986) showed that for sample sizes between 20 and 75, 1-2% of the observations would be expected to be outside, purely by chance; for samples of size 100 and over, less than 1% of the observations would be outside, with the expected proportion under an infinite population being 0.7%.

The proportion of far outside values in normal samples was shown to be less than 1% for sample sizes between 6 and 19, and less than 0.1% for sample sizes 20 and greater (Table B21). The population proportion of far outside values is 0.00023%.
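The labeling rule can be sketched as follows, using the fourths and the 1.5/3.0 f-spread fences described above (the function name and the handling of noninteger depths as averages of adjacent order statistics follow the definitions in the text; the data are made up):

```python
def label_outliers(x):
    """Outlier labeling via fourths and f-spread fences (a sketch).

    Returns (outside, far_outside) lists of labeled observations."""
    xs = sorted(x)
    n = len(xs)
    f = ((n + 3) // 2) / 2                 # depth of the fourths

    def order_stat(depth):
        # x_(depth); average adjacent order statistics if depth is x.5
        if depth == int(depth):
            return xs[int(depth) - 1]
        lo = int(depth)
        return (xs[lo - 1] + xs[lo]) / 2

    h_l = order_stat(f)
    h_u = order_stat(n + 1 - f)
    d_f = h_u - h_l                        # the f-spread
    outside = [v for v in xs if v < h_l - 1.5 * d_f or v > h_u + 1.5 * d_f]
    far = [v for v in xs if v < h_l - 3.0 * d_f or v > h_u + 3.0 * d_f]
    return outside, far
```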

Example 6.6. A box plot of the 58 well average alkalinity values (Data Set 10) is shown in Figure 6.3. The fourths of these data are 31 and 41, so the inner fences are 31 − 15 = 16 and 41 + 15 = 56 (truncated to the nearest inner observations, 21 and 50). The body of the data, both within the fourths and within the inner fences, is symmetric. There are three outside observations (5.2%) including one (1.7%) far outside observation. Under an assumed normal population, then, there is an overabundance of outside and far outside values compared to what would be expected purely by chance.

Outlier labeling is resistant to masking, since the cutoff values arebased on centrally located order statistics, rather than on the variance.


Unlike the outlier detection methods discussed in the previous section or in the sequential tests discussed below, neither the number nor the maximum number of outlying observations need be specified in advance of conducting the procedure. One other advantage of outlier labeling is that the number of outliers identified is not based on a preliminary inspection of the data.

Other fence lengths are briefly discussed in Hoaglin, Iglewicz and Tukey (1986), e.g., those based on 1.0d_f and 2.0d_f.

6.3.2 Skewness Test

Ferguson (1961) showed that when a normal sample with unknown mean contains some observations which have a shift in mean (also unknown), the locally best invariant single-sided test for outliers is the skewness test (Section 3.2.1). It is not necessary that all of the shifted observations have the same mean, although it is required that they are all shifted in the same direction. Note that for this test the number of outliers is not specified beforehand, although the direction of the outliers must be known.

6.3.3 Kurtosis Test

Ferguson (1961) also showed that if less than 21% of the observations in a normal sample have a shift in mean, regardless of direction, then an upper-tailed kurtosis test (Section 3.2.2) is the locally best invariant test for detecting outliers. Here also it is not necessary that the shifted observations have the same mean.

In the case of a normal sample where all observations have the same mean but some have a shift in variance, the upper-tailed kurtosis test is the locally best invariant test for identifying outliers regardless of how many spurious observations there are.

6.3.4 Normal Mixtures

The likelihood ratio test for mixtures of two normal components has been used to detect a small number of outliers (Aitkin and Wilson, 1980), i.e., when the mixing proportion p is small relative to the sample size. The test does not require a pre-specification of the number of outliers, and classification methods can be used to separate the outlier group from the remainder of the sample; furthermore, the overall significance level is based on the single test.


Similar to sequential tests, however, tests for normal mixtures are not as sensitive to outliers as tests with the correct number of outliers specified. Normal mixtures and tests are described more fully in Chapter 11.

6.4 Sequential Procedures for Detecting Outliers

Sequential or "many outlier" procedures can be used when the number of outliers in a sample is unknown, but the maximum number of outliers can be specified beforehand. If there are more outliers than the maximum specified, then the techniques may not work due to masking.

The sequential procedures described below are all performed in the same manner: after specifying k, the maximum number of outliers expected, the test is performed (step 1). Then, after deleting the observation farthest from the sample mean, the test is repeated for the resulting subsample. This process is repeated k times. The appropriate critical values are compared to determine if any of the steps gave a significant result.

Because it is usually desired that an overall rather than individual α level be attained, critical values are obtained by using the marginal distributions of the test statistics, say t_1, . . . , t_k, to determine critical values λ_i(β) such that

P[t_i > λ_i(β)] = β,    i = 1, . . . , k

so that the overall significance level over the k steps is α. If all t_i ≤ λ_i(β), then no outliers are declared; if any of the t_i > λ_i(β), then m outliers are declared, where m is the largest value of i such that t_i > λ_i(β).

Since sequential methods are defined to give an overall α significance level over every step in the procedure, sequential tests will not be as sensitive at picking out m outliers as a test designed for detecting exactly m outliers. Prescott (1979) suggested that a sequential procedure for no more than three outliers is sufficient, since if "there are more than a few outliers present the problem ceases to be one of outlier detection and perhaps the underlying structure of the data should be examined in some other way".

6.4.1 Extreme Studentized Deviate Procedure

The extreme studentized deviate (ESD) procedure is essentially a stepwise Grubbs T test for a single outlier in either tail. At each step, the ESD is calculated as

ESD_i = max_{j=1,...,n−i+1} | x_j − x̄_i | / s_i,    i = 1, . . . , k

where x̄_i and s_i² are the mean and variance, respectively, of the subsample remaining after the first i − 1 steps and observation deletions. Jain (1981) gave critical values for the ESD procedure for k up to 5 (Table B21).
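The delete-and-recompute loop of the ESD procedure can be sketched as follows (function name and data are illustrative):

```python
import statistics

def esd_sequence(x, k):
    """ESD_1, ..., ESD_k: at each step remove the observation farthest
    from the current subsample mean and recompute (a sketch)."""
    xs = list(x)
    out = []
    for _ in range(k):
        m = statistics.mean(xs)
        s = statistics.stdev(xs)
        far = max(xs, key=lambda v: abs(v - m))
        out.append(abs(far - m) / s)
        xs.remove(far)
    return out
```

Each ESD_i would then be compared to the corresponding step-i critical value.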

6.4.2 Studentized Range Procedure

The studentized range (STR) procedure is a stepwise test-and-deletion method using the range statistic,

STR_i = (x_(n−i+1) − x_(1)) / s_i,    i = 1, . . . , k.

Here the x_(j) are the order statistics of the subsample at step i, and s_i² is the variance of the subsample. This method has been found to have poor power properties (Jain, 1981).

6.4.3 Sequential Kurtosis

A sequential kurtosis (KUR) procedure was discussed by Rosner (1975) and Jain (1981); critical values were given for k up to 5 by Jain (Table B23). This procedure consists of obtaining the kurtosis of the subsample at step i after the previous i − 1 deletions,

b_2^(i) = m_4^(i) / (m_2^(i))²

where the m_j^(i) are the sample moments for the subsample at step i.

6.4.4 Prescott's Sequential Procedure

Prescott (1979) suggested the sequential use of Grubbs-type sum of squares ratios, using the ratios of sums of squares obtained by deleting the k observations one at a time. The steps in Prescott's procedure are conducted by using the criteria

D_j = S²_(j) / S²_(j−1),    j = 1, . . . , k

where S²_(j) is the sum of squares of the sample obtained by deleting the j observations farthest from the original sample mean (in either direction), with S²_(0) = S² the full sample sum of squares. The number of outliers identified by this test is m, where m is the maximum j such that D_j < λ_j.

Prescott also gave the formula for calculating S²_(j) from the full sample sum of squares, using only functions of the residuals. By letting

r'_i = x_i − x̄,

i.e., the signed residual from the full sample mean, then

S²_(j) = S² − Σ r'_i² − (Σ r'_i)² / (n − j)

where the summations are over the j most extreme deviations from the sample mean (note that the sign of r'_i is retained in this calculation).

Example 6.7. In the leukemia latency data (Figure 6.2), as many as 8 outliers in the sample of 20 cases can be identified. The three most extreme observations from the sample mean of 60.4 are all in the upper tail, with values of 168, 132 and 120 (Data Set 9). The sum of squares for the full sample is S² = 30334.8 and for the reduced samples

S²_(1) = 18147.7
S²_(2) = 11846.4
S²_(3) = 6723.9

These result in the sequential test statistics

D_1 = 18147.7/30334.8 = 0.598
D_2 = 11846.4/18147.7 = 0.653
D_3 = 6723.9/11846.4 = 0.568

From the table of critical values for a sample size of 20 with a maximum of 3 outliers, the critical values for comparison to D_1, D_2 and D_3 are 0.545, 0.605 and 0.630, respectively, for an overall 0.05 significance level. Although D_1 and D_2 are not less than their respective critical values, D_3 is less than 0.630, indicating that


these three observations are outliers relative to the remainder of the sample. Note that these outliers would not have been detected had a significance level of 0.01 been used, while using L_3 resulted in a significance level well below 0.01 (Example 6.4).

In keeping with his suggestion, Prescott gave critical values for hisprocedure only for k = 2 and 3 (Table B24).
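Prescott's shortcut formula can be sketched in code and checked against the sums of squares quoted in Example 6.7; the function name is ours, and only the published summary values (x̄ = 60.4, S² = 30334.8, extremes 168, 132, 120, n = 20) are used, not the full data set:

```python
def reduced_ss(ss_full, extreme_residuals, n):
    """S2_(j) from the full sum of squares and the signed residuals
    r'_i of the j most extreme observations (Prescott's formula)."""
    j = len(extreme_residuals)
    s = sum(extreme_residuals)
    return ss_full - sum(r * r for r in extreme_residuals) - s * s / (n - j)

# Summary values quoted in Example 6.7 (Data Set 9)
residuals = [168 - 60.4, 132 - 60.4, 120 - 60.4]
s1 = reduced_ss(30334.8, residuals[:1], 20)
s2 = reduced_ss(30334.8, residuals[:2], 20)
s3 = reduced_ss(30334.8, residuals, 20)
```

These reproduce S²_(1) = 18147.7, S²_(2) = 11846.4 and S²_(3) = 6723.9 to rounding.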

6.4.5 Rosner's RST Procedure

Rosner (1975, 1977) suggested a modified sequence of Grubbs-type T statistics, R_i, for use as a sequential procedure. The method he suggested was different from the previously discussed sequential procedures in that he used a trimmed mean and variance which did not have to be recalculated after each step.

As with all sequential procedures, to perform the RST (R-statistic) procedure the maximum number of outliers, k, is determined. Then a trimmed mean and variance, based on the original sample trimmed on each side by k observations, are obtained:

a = Σ x_(i) / (n − 2k)

b² = Σ (x_(i) − a)² / (n − 2k)

where both summations run over the middle order statistics, i = k+1, ..., n−k.

Then, if I_0 is the full sample,

R_1 = max_{I_0} | x_i − a | / b = | x^(0) − a | / b

and, with the most extreme observation x^(0) removed to give the reduced sample I_1,

R_2 = max_{I_1} | x_i − a | / b = | x^(1) − a | / b, ...

so that the R_i are sequentially the largest standardized residuals from the trimmed mean. These test statistics are then compared to the critical values corresponding to the specified (α, n, k) to determine whether any outliers exist in the original sample.
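A sketch of the RST computation, assuming the divisor n − 2k in b² (the function name and example data are ours; critical values must come from Table B25). Because a and b stay fixed, the R_i reduce to the k largest standardized residuals taken in decreasing order:

```python
import numpy as np

def rst(x, k):
    """Rosner's RST statistics R_1..R_k.  The trimmed mean a and the
    trimmed standard deviation b come from the middle n - 2k order
    statistics and are NOT recomputed between steps."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    core = x[k:n - k]                                   # x_(k+1), ..., x_(n-k)
    a = core.mean()
    b = np.sqrt(np.sum((core - a) ** 2) / (n - 2 * k))  # divisor n - 2k (assumed)
    resid = sorted(np.abs(x - a) / b, reverse=True)
    return resid[:k], a, b                              # R_1 >= R_2 >= ... >= R_k

# Illustrative call on a small symmetric sample:
R, a, b = rst(range(1, 11), k=2)
print(R, a, b)
```

Because the extremes are standardized by the trimmed scale b, which is unaffected by the outliers themselves, the R_i avoid the masking that can plague statistics standardized by the full-sample standard deviation.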


Example 6.8. For the July wind data (Data Set 8), two outliers will again be specified, as in Example 6.5. The trimmed mean and standard deviation based on the 27 middle observations are

a = 8.89

b = 1.872

which result in test statistic values of

R_1 = (17.1 − 8.89)/1.872 = 4.386

R_2 = (8.89 − 3.8)/1.872 = 2.719.

From the table of critical values for a sample size of 31, using α = 0.05 and k = 2 (Table B25), the critical values for comparison to R_1 and R_2 are about 4.60 and 8.55, respectively, indicating that neither of the observations should be considered outliers. However, note that Grubbs' T = 3.01 would have just identified a single outlier at the 0.05 level, had that been the model specified.

Rosner (1977) gave critical values for selected sample sizes up to 100 for α = 0.10, 0.05 and 0.01 and k = 2, 3 and 4; Jain (1981) gave similar values for k up to 5 (Table B25).

6.5 Further Reading

The most comprehensive (and recent) discourse on the subject of outliers is that of Barnett and Lewis (1994). In addition to the subject matter covered in this chapter, they discuss robust methods of estimation and testing in the presence of outliers; multivariate outliers (see Chapter 10); outlier detection in samples when one or more parameters are known; outliers when the null distribution is other than normal; and outliers in regression, time series and contingency tables. Hawkins (1980) and Tietjen (1986) are also useful, but more limited, references on this topic.

References

Aitkin, M., and Wilson, G.T. (1980). Mixture models, outliers and the EM algorithm. Technometrics 22, 325-331.

Barnett, V., and Lewis, T. (1994). Outliers in Statistical Data, 3rd ed. John Wiley and Sons, New York.

Box, G.E.P. (1953). A note on regions for tests of kurtosis. Biometrika 40, 465-468.

David, H.A. (1981). Order Statistics, 2nd ed. John Wiley and Sons, New York.

David, H.A., Hartley, H.O. and Pearson, E.S. (1954). The distribution of the ratio, in a single normal sample, of the range to the standard deviation. Biometrika 41, 482-493.

David, H.A. and Paulson, A.S. (1965). The performance of several tests for outliers. Biometrika 52, 429-436.

Dixon, W. (1950). Analysis of extreme values. Annals of Mathematical Statistics 21, 488-505.

Dixon, W. (1951). Ratios involving extreme values. Annals of Mathematical Statistics 22, 68-78.

Ferguson, T.S. (1961). On the rejection of outliers. Proceedings, Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 253-287.

Gibbons, D.I., and McDonald, G.C. (1980a). Examining regression relationships between air pollution and mortality. GMR-3278, General Motors Research Laboratories, Warren, Michigan.

Gibbons, D.I., and McDonald, G.C. (1980b). Identification of influential geographical regions in an air pollution and mortality analysis. GMR-3455, General Motors Research Laboratories, Warren, Michigan.

Grubbs, F. (1950). Sample criteria for testing outlying observations. Annals of Mathematical Statistics 21, 27-58.

Grubbs, F. (1969). Procedures for detecting outlying observations in samples. Technometrics 11, 1-19.

Grubbs, F., and Beck, G. (1972). Extension of sample sizes and percentage points for significance tests of outlying observations. Technometrics 14, 847-859.

Hawkins, D.M. (1980). Identification of Outliers. Chapman and Hall, New York.

Hoaglin, D.C., Iglewicz, B., and Tukey, J.W. (1986). Performance of some resistant rules for outlier labeling. Journal of the American Statistical Association 81, 991-999.

Jain, R.B. (1981). Percentage points of many-outlier detection procedures. Technometrics 23, 71-76.

Johnson, B.A., and Hunt, H.H. (1979). Performance characteristics for certain tests to detect outliers. Proceedings of the Statistical Computing Section, Annual Meeting of the American Statistical Association, Washington, D.C.

Lave, L.B. and Seskin, E.P. (1970). Air pollution and human health. Science 169, 723-733.

Lave, L.B. and Seskin, E.P. (1977). Air Pollution and Human Health. Johns Hopkins University Press, Baltimore.

Prescott, P. (1979). Critical values for a sequential test for many outliers. Applied Statistics 28, 36-39.

Rosner, B. (1975). On the detection of many outliers. Technometrics 17, 221-227.

Rosner, B. (1977). Percentage points for the RST many outlier procedure. Technometrics 19, 307-312.

Thode, Jr., H.C. (1985). Power of absolute moment tests against symmetric non-normal alternatives. Ph.D. dissertation, University Microfilms, Ann Arbor, Michigan.

Thode, Jr., H.C., Smith, L.A. and Finch, S.J. (1983). Power of tests of normality for detecting scale contaminated normal samples. Communications in Statistics - Simulation and Computation 12, 675-695.

Tietjen, G.L. (1986). The analysis and detection of outliers. In D'Agostino, R.B., and Stephens, M.A., eds., Goodness-of-Fit Techniques, Marcel Dekker, New York.

Tietjen, G.L., and Moore, R.H. (1972). Some Grubbs-type statistics for the detection of outliers. Technometrics 14, 583-597.

Tukey, J.W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.

U.S. Bureau of the Census (1972). City and County Data Book, 1972. U.S. Government Printing Office, Washington, D.C.

Uthoff, V.A. (1970). An optimum test property of two well-known statistics. Journal of the American Statistical Association 65, 1597-1600.
