Summer School in Statistics for Astronomers VI June 7-11, 2010 Robustness, Nonparametrics and Some Inconvenient Truths Tom Hettmansperger Dept. of Statistics Penn State University

Page 1:

Summer School in Statistics for Astronomers VI

June 7-11, 2010

Robustness, Nonparametrics and Some Inconvenient Truths

Tom Hettmansperger

Dept. of Statistics

Penn State University

Page 2:

Robust methods

t-tests and F-test

rank tests

Least squares

Nonparametrics

Page 3:

Some ideas we will explore:

Robustness

Nonparametric Bootstrap

Nonparametric Density Estimation

Nonparametric Rank Tests

Tests for (non-)Normality

The goal: To make you worry or at least think critically about statistical analyses.

Page 4:

Abstract Population Distribution, Model

Real World Data

Probability and Expectation

Statistical Inference

Page 5:

Statistical Model, Population Distribution

Research Hypothesis or Question in English

Measurement, Exp. Design, Data Collection

Translate Res. Hyp. or Quest. into a statement in terms of the model parameters

Select a relevant statistic

Carry out statistical inference Graphical displays Model criticism

Sampling Distributions P-values Significance levels Confidence coefficients

State Conclusions and Recommendations in English

Page 6:

Parameters in a population or model

Typical Values: mean, median, mode

Spread: variance (standard deviation), interquartile range (IQR)

Outliers

Shape: probability density function (pdf), cumulative distribution function (cdf)

Page 7:

NGC 4382 (n = 59) 26.215 26.506 26.542 26.551 26.553 26.607 26.612 26.674 26.687 26.699 26.703 26.727 26.740 26.747 26.765 26.779 26.790 26.800 26.807 ... 27.161 27.169 27.179

Research Question: How large are the luminosities in NGC 4382?

Measure of luminosity (data below)

Traditional model: normal distribution of luminosity

Translate Res. Q.: What is the mean luminosity of the population? (Here we use the mean to represent the typical value.)

The relevant statistic is the sample mean.

Statistical Inference: 95% confidence interval for the mean using a normal approximation to the sampling distribution of the mean:

$$\bar{X} \pm 2\,\frac{S}{\sqrt{n}}$$

orig (original data): 26.905 ± .0524
no (smallest value removed): 26.917 ± .0474
24.000 (smallest value set to 24.000): 26.867 ± .1094
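To make the effect of a single outlier on this interval concrete, here is a minimal Python sketch of the mean ± 2 S/√n computation. The ten magnitudes are hypothetical stand-ins, not the NGC 4382 measurements (the full data set is not reproduced in these slides), and the helper name `mean_ci` is an illustrative choice.

```python
import math
import statistics

def mean_ci(data):
    """Return (mean, margin) for the approximate 95% interval mean +/- 2*S/sqrt(n)."""
    n = len(data)
    mean = statistics.fmean(data)
    margin = 2 * statistics.stdev(data) / math.sqrt(n)
    return mean, margin

# Hypothetical magnitudes (assumption), not the NGC 4382 data.
clean = [26.7, 26.8, 26.8, 26.9, 26.9, 27.0, 27.0, 27.1, 27.1, 27.2]
# Same sample with the smallest value replaced by an outlier at 24.0.
contaminated = [24.0] + clean[1:]

for label, sample in [("clean", clean), ("with outlier", contaminated)]:
    mean, margin = mean_ci(sample)
    print(f"{label:>12}: {mean:.3f} +/- {margin:.3f}")
```

In this sketch the single altered value both pulls the mean down and widens the margin of error, which is the pattern the three intervals on this slide show.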

Page 8:

Summary for NGC 4382

[Figure: histogram and boxplot of the data, with 95% confidence intervals for the mean and the median.]

Anderson-Darling Normality Test: A-Squared 1.54, P-Value < 0.005

Mean 26.905, StDev 0.201, Variance 0.040, Skewness -1.06045, Kurtosis 1.08094, N 59

Minimum 26.215, 1st Quartile 26.765, Median 26.974, 3rd Quartile 27.042, Maximum 27.179

95% Confidence Interval for Mean: (26.853, 26.957)

95% Confidence Interval for Median: (26.915, 27.010)

95% Confidence Interval for StDev: (0.170, 0.246)

Page 9:

[Figure: Boxplot of NGC 4382_no, NGC 4382_orig, NGC 4382_26, NGC 4382_25, NGC 4382_24. Vertical axis: Data, 24.0 to 27.5.]

Page 10:

Variable        N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3
NGC 4382_no     58  0   26.917  0.0237   0.181  26.506   26.776  26.974  27.046
NGC 4382_orig   59  0   26.905  0.0262   0.201  26.215   26.765  26.974  27.042
NGC 4382_26     59  0   26.901  0.0280   0.215  26.000   26.765  26.974  27.042
NGC 4382_25     59  0   26.884  0.0400   0.307  25.000   26.765  26.974  27.042
NGC 4382_24     59  0   26.867  0.0547   0.420  24.000   26.765  26.974  27.042

Page 11:

First Inconvenient Truth:

Outliers can have arbitrarily large impact on the sample mean, sample standard deviation, and sample variance.

Second Inconvenient Truth:

A single outlier can increase the width of the t-confidence interval and inflate the margin of error for the sample mean. Inference can be adversely affected.

It is bad for a small portion of the data to dictate the results of a statistical analysis.

Page 12:

Third Very Inconvenient Truth:

The construction of a 95% confidence interval for the population variance is very sensitive to the shape of the underlying model distribution.

The standard interval computed in most statistical packages assumes the model distribution is normal.

If this assumption is wrong, the true confidence coefficient can differ substantially from the nominal 95%.

I am not aware of a stable 95% confidence interval for the population variance.

Page 13:

The ever hopeful statisticians

Page 14:

Robustness: structural and distributional

Structural: We would like to have an estimator and a test statistic that are not overly sensitive to small portions of the data.

Influence or sensitivity curves: The rate of change in a statistic as an outlier is varied.

Breakdown: The smallest fraction of the data that must be altered to carry the statistic beyond any preset bound.

We want bounded influence and high breakdown.

Page 15:

Distributional robustness:

We want a sampling distribution for the test statistic that is not sensitive to changes or misspecifications in the model or population distribution.

This type of robustness provides stable p-values for testing and stable confidence coefficients for confidence intervals.

Page 16:

[Figure: 95% conf int for pop mean, x denotes the sample median. One interval for each of NGC 4382_no, NGC 4382_orig, NGC 4382_26, NGC 4382_25, NGC 4382_24. Vertical axis: Data, 26.75 to 27.00. The mean and the median are marked on each interval.]

Page 17:

[Figure: 95% conf int for pop median, + denotes sample mean. One interval for each of NGC 4382_no, NGC 4382_orig, NGC 4382_26, NGC 4382_25, NGC 4382_24. Vertical axis: Data, 26.850 to 27.025.]

Page 18:

Message: The sample mean is not structurally robust, whereas the median is structurally robust.

It takes only one observation to move the sample mean anywhere. It takes roughly 50% of the data to move the median. (Breakdown)

Sensitivity Curve:

$$SC(x) = (n+1)\,[\hat{\theta}_{n+1} - \hat{\theta}_n]$$

where $\hat{\theta}_n$ is the estimate from the original n observations and $\hat{\theta}_{n+1}$ is the estimate after the point x is added. The formulas below take the sample to be centered so that $\hat{\theta}_n = 0$.

$$SC_{mean}(x) = x$$

$$SC_{median}(x) = \begin{cases} (n+1)\,x_{(r)} & \text{if } x < x_{(r)} \\ (n+1)\,x & \text{if } x_{(r)} < x < x_{(r+1)} \\ (n+1)\,x_{(r+1)} & \text{if } x_{(r+1)} < x \end{cases} \qquad \text{when } n = 2r$$
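The sensitivity curve can be evaluated directly from its definition. The sketch below is illustrative only: the four-point sample is hypothetical and is chosen so that both its mean and its median are 0, matching the centering under which SC for the mean equals x.

```python
import statistics

def sensitivity_curve(estimator, sample, x):
    """SC(x) = (n+1) * [estimate with x added - estimate on the original sample]."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [x]) - estimator(sample))

# Hypothetical sample (assumption), centered so mean = median = 0; n = 2r with r = 2.
sample = [-2.0, -1.0, 1.0, 2.0]

for x in [0.5, 5.0, 50.0, 500.0]:
    sc_mean = sensitivity_curve(statistics.fmean, sample, x)
    sc_median = sensitivity_curve(statistics.median, sample, x)
    print(f"x = {x:6.1f}   SC_mean = {sc_mean:7.1f}   SC_median = {sc_median:4.1f}")
```

As x grows, the curve for the mean keeps growing with it, while the curve for the median stops at (n+1) times the upper middle order statistic.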

Page 19:

[Figure: influence curves of the mean and the median. Horizontal axis: x; vertical axis: Influence.]

Mean has linear, unbounded influence. Median has bounded influence.

Page 20:

Some good news: The sampling distribution of the sample mean depends only mildly on the population or model distribution. (A Central Limit Theorem effect)

Provided our data come from a model with finite variance, for large sample size

$$\frac{\sqrt{n}\,[\bar{X} - \mu]}{S}$$

has an approximate standard normal distribution (mean 0 and variance 1).

This means that the sample mean enjoys distributional robustness, at least approximately. We say that the sample mean is asymptotically nonparametric.

Page 21:

More inconvenient truth: the sample variance is not structurally robust (unbounded sensitivity and breakdown tending to 0), and it also lacks distributional robustness.

Again, from the Central Limit Theorem:

Provided our data come from a model with finite fourth moment, for large sample size

$$\sqrt{n}\,[S^2 - \sigma^2]$$

has an approximate normal distribution with mean 0 and variance:

$$\sigma^4(\kappa - 1), \quad \text{where} \quad \kappa = \frac{E(X - \mu)^4}{\sigma^4}$$

is called the kurtosis.

Page 22:

The kurtosis $\kappa$ is a measure of the tail weight of a model distribution. It is independent of location and scale and has value 3 for any normal model.

Assuming 95% confidence:

Kurtosis   Approx true Conf Coeff
3          .948
4.2        .877
5          .834
9          .674
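The slides do not say how the approximate true confidence coefficients were obtained. One plausible route, offered here as an assumption, follows from the large-sample result for the sample variance: the normal-theory interval behaves as if the variance of √n[S² − σ²] were 2σ⁴, while the true value is (κ − 1)σ⁴, so the coverage is roughly 2Φ(z·√(2/(κ − 1))) − 1 with z the nominal normal quantile. The function name below is illustrative.

```python
import math
from statistics import NormalDist

def approx_true_coverage(kurtosis, nominal=0.95):
    """Large-sample coverage of the normal-theory variance interval when the
    model kurtosis is `kurtosis` (3 for a normal model). Sketch under the
    asymptotic argument described in the lead-in."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + nominal / 2)        # about 1.96 for 95%
    shrink = math.sqrt(2.0 / (kurtosis - 1.0))       # assumed SD / true SD of S^2
    return 2.0 * std_normal.cdf(z * shrink) - 1.0

for kappa in [3, 4.2, 5, 9]:
    print(f"kurtosis = {kappa:>3}: approx true coverage = {approx_true_coverage(kappa):.3f}")
```

Under this assumption the computed values land within a few thousandths of the tabulated ones, which suggests the table reflects the same large-sample reasoning.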

Page 23:

[Figure: Probability Plot of t5, Kurtosis = 9. Normal - 95% CI. Horizontal axis: t5; vertical axis: Percent.]

Mean 0.08663, StDev 1.105, N 50, AD 0.211, P-Value 0.850

A very inconvenient truth: A test for normality will also mislead you!!

Page 24:

Some questions:

1. If statistical methodology based on sample means and sample variances is non robust, what can we do?

Are you concerned about the last least squares analysis you carried out? (t-tests and F-tests) If not, you should be!

2. What if we want to simply replace the mean by the median as the typical value? The sample median is robust, at least structurally. What about the distribution?

3. The mean and the t-test go together. What test goes with the median?

Page 25:

We know that:

$$SE(\text{mean}) = \frac{\sigma}{\sqrt{n}}, \quad \text{estimated by} \quad \frac{S}{\sqrt{n}}$$

How do we find SE(median) and estimate it?

Two ways:

1. Nonparametric Bootstrap (computational)

2. Estimate the standard deviation of the approximating normal distribution. (theoretical)

Page 26:

Nonparametric Bootstrap:

1. Draw a sample of size 59 from the original NGC4382 data. Sample with replacement.

2. Compute and store the sample median.

3. Repeat B times. (I generally take B = 4999)

4. The histogram of the B medians is an estimate of the sampling distribution of the sample median.

5. Compute the standard deviation of the B medians. This is the approximate SE of the sample median.

Result for NGC4382: SE(median) = .028

(.027 w/o the outlier, .028 with outlier = 24)
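The five bootstrap steps translate almost line for line into code. This is a minimal sketch: the 59 values are simulated stand-ins (an assumption, since the full NGC 4382 data are not listed in the slides), and `bootstrap_se_median` is an illustrative name.

```python
import random
import statistics

def bootstrap_se_median(data, n_boot=4999, seed=0):
    """Bootstrap estimate of the standard error of the sample median."""
    rng = random.Random(seed)
    n = len(data)
    medians = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)            # step 1: sample with replacement
        medians.append(statistics.median(resample))  # step 2: store the median
    return statistics.stdev(medians)                 # step 5: SD of the B medians

# Hypothetical data (assumption): 59 simulated values, sorted for convenience.
rng = random.Random(1)
data = sorted(rng.gauss(26.9, 0.2) for _ in range(59))

# Same data with the smallest value moved out to 24.0.
data_outlier = [24.0] + data[1:]

se_clean = bootstrap_se_median(data)
se_outlier = bootstrap_se_median(data_outlier)
print(f"SE(median), simulated data:     {se_clean:.4f}")
print(f"SE(median), with outlier at 24: {se_outlier:.4f}")
```

Because the median barely reacts to one extreme value, the two bootstrap standard errors in this sketch should be nearly identical, in line with the .027 versus .028 comparison on the slide.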

Page 27:

Theoretical (Mathematical Statistics) Moderately Difficult

Let M denote the sample median. Provided the density (pdf) of the model distribution is not 0 at the model median,

$$\sqrt{n}\,[M - \theta]$$

has an approximate normal distribution with mean 0 and variance $1/[4f^2(\theta)]$, where f(x) is the density and $\theta$ is the model median.

In other words,

$$SE(\text{median}) = \frac{1}{2\sqrt{n}\,f(\theta)}$$

and we must estimate the value of the density at the population median.

Page 28:

Nonparametric density estimation:

Let f(x) denote a pdf. Based on a sample of size n we wish to estimate $f(x_0)$ where $x_0$ is given.

Define:

$$\hat{f}(x_0) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\,K\!\left(\frac{x_0 - X_i}{h}\right)$$

where K(t) is called the kernel and

$$\int K(t)\,dt = 1, \quad \int t\,K(t)\,dt = 0, \quad \int t^2 K(t)\,dt = \sigma_K^2$$
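The kernel estimator defined on this slide is short enough to write out in full. The sketch below uses a Gaussian kernel and a small hypothetical data set (both assumptions for illustration), and evaluates the estimate at one point for three bandwidths to show how much the answer depends on h.

```python
import math

def gaussian_kernel(t):
    """Standard normal density: integrates to 1, has mean 0 and variance 1."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def kde(x0, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x0: (1/n) * sum of (1/h) * K((x0 - X_i)/h)."""
    n = len(data)
    return sum(kernel((x0 - xi) / h) for xi in data) / (n * h)

# Hypothetical observations (assumption), not taken from the slides.
data = [26.6, 26.8, 26.9, 27.0, 27.0, 27.1]

for h in (0.02, 0.1, 0.5):
    print(f"h = {h:4.2f}   f_hat(26.95) = {kde(26.95, data, h):.3f}")
```

A small bandwidth gives a spiky estimate driven by individual points, while a large one oversmooths; this is the trade-off between variance and bias in the estimator.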

Page 29:

Then a bit of calculation yields:

$$E(\hat{f}(x_0)) = f(x_0) + \frac{1}{2}\,h^2 f''(x_0)\,\sigma_K^2 + \ldots$$

And a bit more:

$$V(\hat{f}(x_0)) = \frac{1}{nh}\,f(x_0) \int K^2(t)\,dt + \ldots$$

And so we want: $h \to 0$ and $nh \to \infty$

Page 30:

The density estimate does not depend much on K(t), the kernel. But it does depend strongly on h, the bandwidth.

We often choose a Gaussian (normal) kernel:

$$K(t) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^2}$$

Next we differentiate the integrated mean squared error and set it equal to 0 to find the optimal bandwidth (independent of $x_0$).

$$h_{opt} = \left[\frac{\int K^2(t)\,dt}{\sigma_K^4 \int [f''(x)]^2\,dx}\right]^{1/5} n^{-1/5}$$

If we choose the Gaussian kernel and if f is normal then:

$$\hat{h} = 1.06\,\hat{\sigma}\,n^{-1/5} \quad \text{where} \quad \hat{\sigma} = \min\{S,\ .75\,IQR\}$$

Page 31:

Recall,

$$SE(\text{median}) = \frac{1}{2\sqrt{n}\,f(\theta)}$$

For NGC4382: n = 59, M = 26.974

$$SE(M) \approx \frac{1}{2\sqrt{59}\,\hat{f}(M)} = .031 \quad \text{(asymptotic approximation)}$$

Bootstrap result for NGC4382: SE(median) = .028 (finite sample approximation)

Final note: both bootstrap and density estimate are robust.
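The asymptotic recipe combines three ingredients from the preceding slides: the rule-of-thumb bandwidth 1.06·σ̂·n^(-1/5) with σ̂ = min{S, .75 IQR}, a Gaussian kernel density estimate at the sample median, and the formula 1/(2√n f̂(M)). The sketch below strings them together on simulated data (an assumption; the NGC 4382 values are not listed in full). The quartiles come from Python's default `statistics.quantiles` convention, which may differ slightly from the one used in the slides.

```python
import math
import random
import statistics

def rule_of_thumb_bandwidth(data):
    """h = 1.06 * sigma_hat * n^(-1/5), with sigma_hat = min(S, 0.75 * IQR)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartile convention: an assumption
    sigma_hat = min(statistics.stdev(data), 0.75 * (q3 - q1))
    return 1.06 * sigma_hat * n ** (-0.2)

def kde_gaussian(x0, data, h):
    """Gaussian kernel density estimate at x0 with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in data)

def se_median_asymptotic(data):
    """SE(median) = 1 / (2 * sqrt(n) * f_hat(M)), M the sample median."""
    n = len(data)
    m = statistics.median(data)
    h = rule_of_thumb_bandwidth(data)
    return 1.0 / (2.0 * math.sqrt(n) * kde_gaussian(m, data, h))

# Hypothetical data (assumption): 59 simulated values.
rng = random.Random(1)
data = [rng.gauss(26.9, 0.2) for _ in range(59)]
se = se_median_asymptotic(data)
print(f"asymptotic SE(median) = {se:.4f}")
```

Running a bootstrap on the same sample and comparing the two numbers is a cheap consistency check, in the spirit of the .031 versus .028 comparison above.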

Page 32:

The median and the sign test (for testing $H_0: \theta = 0$) are related through the L1 norm. The sample median minimizes $D(\theta)$, and the sign statistic is its derivative.

$$D(\theta) = \sum_i |X_i - \theta| = \sum_i [\operatorname{sgn}(X_i - \theta)]\,(X_i - \theta)$$

$$-\frac{d}{d\theta} D(\theta) = S(\theta) = \sum_i \operatorname{sgn}(X_i - \theta) = 2\,[\#X_i > \theta] - n$$

$$S^+(\theta) = \#[X_i > \theta] \quad \text{with} \quad E\,S^+(\theta) = \frac{n}{2}$$

To test $H_0: \theta = 0$ we use $S^+(0) = \#[X_i > 0]$, which has a null binomial sampling distribution with parameters n and .5.

This test is nonparametric and very robust.
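Since the null distribution is Binomial(n, .5), the sign test needs nothing beyond counting and binomial coefficients. A minimal sketch follows; dropping observations equal to the hypothesized value and doubling the smaller tail for a two-sided p-value are common conventions adopted here as assumptions, and the ten data values are hypothetical.

```python
import math

def sign_test(data, theta0=0.0):
    """Two-sided sign test of H0: median = theta0.

    Returns (s_plus, n, p_value), where s_plus counts observations above theta0
    and n is the number of observations not equal to theta0.
    """
    diffs = [x - theta0 for x in data if x != theta0]   # drop ties (assumption)
    n = len(diffs)
    s_plus = sum(1 for d in diffs if d > 0)
    tail = min(s_plus, n - s_plus)
    # P(Binomial(n, 0.5) <= tail), doubled for a two-sided test and capped at 1.
    p_lower = sum(math.comb(n, k) for k in range(tail + 1)) / 2 ** n
    return s_plus, n, min(1.0, 2.0 * p_lower)

# Hypothetical sample (assumption): nine positive values and one negative.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, -1.0]
s_plus, n, p_value = sign_test(sample)
print(f"S+ = {s_plus} out of n = {n}, two-sided p-value = {p_value:.4f}")
```

Only the signs enter the calculation, so replacing 9.0 by 9000.0 in the sample would leave the result unchanged.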

Page 33:

Research Hypothesis: NGC4494 and NGC4382 differ in luminosity.

Luminosity measurements (data)

NGC 4494 (m = 101) 26.146 26.167 26.173…26.632 26.641 26.643
NGC 4382 (n = 59) 26.215 26.506 26.542…27.161 27.169 27.179

Statistical Model: Two normal populations with possibly different means but with the same variance.

Translation: $H_0: \mu_{4494} = \mu_{4382}$ vs. $H_A: \mu_{4494} \ne \mu_{4382}$

Page 34:

Select a statistic: The two sample t statistic
NGC 4494 (m = 101), NGC 4382 (n = 59)

$$t = \frac{\bar{X}_{4494} - \bar{X}_{4382}}{\sqrt{\dfrac{(m-1)S^2_{4494} + (n-1)S^2_{4382}}{m+n-2}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}} \approx \frac{\bar{X}_{4494} - \bar{X}_{4382}}{\sqrt{\dfrac{S^2_{4494}}{n} + \dfrac{S^2_{4382}}{m}}}$$

VERY STRANGE! In the large-sample form, each sample variance is divided by the other sample's size.

The two sided t-test with significance level .05 rejects the null hypothesis when |t| > 2.

Recall that means and variances are not robust.

Page 35:

Table of true values of the significance level when the assumed level is .05.

                          Ratio of variances
Ratio of sample sizes     1/4    1/1    3/1
1/4                       .01    .05    .15
1/1                       .05    .05    .05
4/1                       .18    .05    .01

Another inconvenient truth: the true significance level can differ from .05 when some model assumptions fail.

Page 36:

An even more inconvenient truth:

These problems extend to analysis of variance and regression.

Seek alternative tests and estimates.

We already have alternatives to the mean and t-test: the robust median and sign test.

Page 37:

We next consider nonparametric rank tests and estimates for comparing two samples. (Competes with the two sample t-test and difference in sample means.)

$$X_1, \ldots, X_m \ \text{a sample from } F(x)$$

$$Y_1, \ldots, Y_n \ \text{a sample from } G(y)$$

Generally suppose: $G(y) = F(y - \Delta)$

To test $H_0: \Delta = 0$ or to estimate $\Delta$ we introduce

$$S^+(0) = \#[Y_j - X_i > 0] = \sum_j R(Y_j) - \frac{n(n+1)}{2}$$

where $R(Y_j)$ is the rank of $Y_j$ in the combined data.

Page 38:

The robust estimate of $\Delta$ is

$$\hat{\Delta} = \operatorname{median}_{i,j}\,(Y_j - X_i)$$

The median provides the robustness; the differences $Y_j - X_i$ provide the comparison.

As opposed to

$$\bar{Y} - \bar{X} = \frac{1}{mn} \sum_{i,j} (Y_j - X_i)$$

which is not robust.
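Both the Mann-Whitney count and the shift estimate are built from the mn pairwise differences, so a brute-force sketch is only a few lines. The two small samples are hypothetical; counting ties as 1/2 and using midranks are assumptions made here so that the count and the rank-sum form agree.

```python
import statistics

def mann_whitney_count(xs, ys):
    """S+(0) = number of pairs with Y_j - X_i > 0 (ties count 1/2, an assumption)."""
    return sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in xs for y in ys)

def rank_sum(xs, ys):
    """Sum of the midranks of the Ys in the combined sample."""
    combined = list(xs) + list(ys)
    def midrank(v):
        below = sum(1 for c in combined if c < v)
        equal = sum(1 for c in combined if c == v)
        return below + (equal + 1) / 2
    return sum(midrank(y) for y in ys)

def shift_estimate(xs, ys):
    """Median of all pairwise differences Y_j - X_i."""
    return statistics.median(y - x for x in xs for y in ys)

# Hypothetical samples (assumption); the last Y is an outlier.
xs = [1.0, 2.0, 3.0]
ys = [2.5, 3.5, 4.5, 10.0]
n = len(ys)

print("S+(0) by counting pairs:", mann_whitney_count(xs, ys))
print("S+(0) from the rank sum:", rank_sum(xs, ys) - n * (n + 1) / 2)
print("median of differences:  ", shift_estimate(xs, ys))
print("difference of means:    ", statistics.fmean(ys) - statistics.fmean(xs))
```

In this toy example the outlying Y pulls the difference in means well above the median of the pairwise differences, which is the contrast the slide draws.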

Page 39:

Research Hypothesis: NGC4494 and NGC4382 differ in luminosity.

Luminosity measurements (data)

X: NGC 4494 (m = 101) 26.146 26.167 26.173…26.632 26.641 26.643
Y: NGC 4382 (n = 59) 26.215 26.506 26.542…27.161 27.169 27.179

Statistical Model: Two normal populations with possibly different medians but with the same scale.

Translation: $H_0: \Delta = 0$ vs. $H_A: \Delta \ne 0$

Page 40:

Mann-Whitney Test and CI: NGC 4494, NGC 4382

                N    Median
X: NGC 4494   101    26.659
Y: NGC 4382    59    26.974

Point estimate for Delta is 0.253

95.0 Percent CI for Delta is (0.182, 0.328)

Mann-Whitney test:
Test of Delta = 0 vs Delta not equal 0 is significant at 0.0000 (P-Value)

Page 41:

[Figure: 85% CI-Boxplot of NGC 4494, NGC 4382. Vertical axis: Data, 26.0 to 27.2.]

Page 42:

What to do about truncation.

1. See a statistician

2. Read the Johnson, Morrell, and Schick reference, and then see a statistician.

Here is the problem: Suppose we want to estimate the difference in locations between two populations: F(x) and G(y) = F(y – d).

But (with right truncation at a) the observations come from

$$F_a(x) = \frac{F(x)}{F(a)} \ \text{for } x \le a \ \text{and } 1 \ \text{for } x > a$$

$$G_a(y) = \frac{F(y - d)}{F(a - d)} \ \text{for } y \le a \ \text{and } 1 \ \text{for } y > a$$

Suppose d > 0 and so we want to shift the X-sample to the right toward the truncation point. As we shift the Xs, some will pass the truncation point and will be eliminated from the data set. This changes the sample sizes and requires adjustment when computing the corresponding MWW to see if it is equal to its expectation. See the reference for details.

Page 43:

Computation of shift estimate with truncation

d      m    n    (label lost)   S+       E(S+)
.253   88   59   .510           4750.5   4366.0
.283   84   59   .360           4533.5   4248.0
.303   83   59   .210           4372.0   4218.5
.323   81   59   .080           4224.5   4159.5
.333   81   59   -.020          4144.5   4159.5
.331   81   59   -.000          4161.5   4159.5

Comparison of NGC4382 and NGC 4494

Point estimate for d is .253

W = 6595.5 (sum of ranks of Ys)

S+ = 4825.5

m = 101 and n = 59

Page 44:

Recall the two sample t-test is sensitive to the assumption of equal variances.

The Mann-Whitney test is less sensitive to the assumption of equal scale parameters.

The null distribution is nonparametric. It does not depend on the common underlying model distribution.

It depends on the permutation principle: Under the null hypothesis, all (m+n)! permutations of the data are equally likely. This can be used to estimate the p-value of the test: sample the permutations, compute and store the MW statistics, then find the proportion greater than the observed MW.
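The permutation recipe can be sketched as follows. Two details are choices made for this illustration rather than taken from the slides: the p-value is two-sided (distance of the statistic from its null expectation mn/2), and the observed arrangement is counted once so the estimate is never exactly zero. The samples are hypothetical.

```python
import random

def mw_statistic(xs, ys):
    """Number of pairs with Y_j > X_i, counting ties as 1/2."""
    return sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in xs for y in ys)

def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Estimate a two-sided permutation p-value for the Mann-Whitney statistic."""
    rng = random.Random(seed)
    m = len(xs)
    center = m * len(ys) / 2                  # null expectation of the statistic
    observed = abs(mw_statistic(xs, ys) - center)
    pooled = list(xs) + list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # a random relabeling of the data
        stat = mw_statistic(pooled[:m], pooled[m:])
        if abs(stat - center) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)       # +1 counts the observed arrangement

# Hypothetical samples (assumption): clearly separated, then interleaved.
p_shifted = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8], [11, 12, 13, 14, 15, 16, 17, 18])
p_similar = permutation_p_value([1, 3, 5, 7], [2, 4, 6, 8])
print(f"separated samples:   p = {p_shifted:.4f}")
print(f"interleaved samples: p = {p_similar:.4f}")
```

No distributional assumption enters beyond exchangeability under the null hypothesis, which is why the null distribution is called nonparametric.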

Page 45:

Here’s a bad idea:

Test the data for normality using, perhaps, the Kolmogorov-Smirnov test.

If the test ‘accepts’ normality then use a t-test, and if it rejects normality then use a rank test.

You can use the K-S test to reject normality.

The inconvenient truth is that it may accept many possible models, some of which can be very disruptive to the t-test and sample means.

Page 46:

Absolute Magnitude
Planetary Nebulae

Milky Way
Abs Mag (n = 81) 17.537 15.845 15.449 12.710 15.499 16.450 14.695 14.878 15.350 12.909 12.873 13.278 15.591 14.550 16.078 15.438 14.741 …

[Figure: Dotplot of Abs Mag. Horizontal axis: Abs Mag, -14.4 to -6.0.]

Page 47:

[Figure: Probability Plot of Abs Mag. Normal - 95% CI. Horizontal axis: Abs Mag; vertical axis: Percent.]

Mean -10.32, StDev 1.804, N 81, AD 0.303, P-Value 0.567

Page 48:

But don’t be too quick to “accept” normality:

[Figure: Probability Plot of Abs Mag. 3-Parameter Weibull - 95% CI. Horizontal axis: Abs Mag - Threshold; vertical axis: Percent.]

Shape 2.680, Scale 5.027, Thresh -14.79, N 81, AD 0.224, P-Value >0.500

Page 49:

Weibull Distribution:

$$f(x) = \frac{c}{b}\left(\frac{x - t}{b}\right)^{c-1} \exp\left\{-\left(\frac{x - t}{b}\right)^{c}\right\} \ \text{for } x \ge t \ \text{and } 0 \ \text{otherwise}$$

t = threshold

b = scale

c = shape (2.68 in the example)

Page 50:

Null Hyp: Pop distribution, F(x) is normal

The Kolmogorov-Smirnov Statistic

$$D = \max_x |F_n(x) - F(x)|$$

The Anderson-Darling Statistic

$$AD = n \int (F_n(x) - F(x))^2\,[F(x)(1 - F(x))]^{-1}\,dF(x)$$

where $F_n$ is the empirical cdf of the sample.
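Both statistics reduce to sums over the ordered sample. The sketch below uses the standard computing formulas, with the normal cdf fitted by the sample mean and standard deviation (an assumption about how the null distribution is specified). It returns the statistics only; turning them into p-values requires tables or simulation, which is not attempted here. The two test samples are idealized quantiles, not real data.

```python
import math
from statistics import NormalDist, fmean, stdev

def ks_and_ad(data):
    """Kolmogorov-Smirnov D and Anderson-Darling A^2 against a fitted normal."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    u = [fitted.cdf(x) for x in xs]           # fitted cdf at the order statistics
    # D: largest gap between the empirical cdf (just before and at each jump) and F.
    d = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))
    # A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln u_(i) + ln(1 - u_(n+1-i))]
    a2 = -n - sum(
        (2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    ) / n
    return d, a2

# Idealized samples (assumption): evenly spaced normal and exponential quantiles.
normal_scores = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
exponential_scores = [-math.log(1.0 - (i + 0.5) / 50) for i in range(50)]

for label, sample in [("normal", normal_scores), ("exponential", exponential_scores)]:
    d, a2 = ks_and_ad(sample)
    print(f"{label:>12}: D = {d:.3f}, A^2 = {a2:.3f}")
```

The weight 1/[F(1 − F)] in the Anderson-Darling integral emphasizes the tails, which is why it tends to react more strongly than D to skewed or heavy-tailed samples.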

Page 51:

A Strategy:

Use robust statistical methods whenever possible.

If you must use traditional methods (sample means, t and F tests) then carry out a parallel analysis using robust methods and compare the results. Start to worry if they differ substantially.

Always explore your data with graphical displays. Attach probability error statements whenever possible.

Page 52:

What more can we do robustly?

1. Multiple regression

2. Analysis of designed experiments (AOV)

3. Analysis of covariance

4. Multivariate analysis

Page 53:

There’s more:

The rank based methods are 95% efficient relative to the least squares methods when the underlying model is normal.

They may be much more efficient when the underlying model has heavier tails than a normal distribution.

But time is up.

Page 54:

References:

1. Hollander and Wolfe (1999) Nonpar Stat Methods

2. Sprent and Smeeton (2007) Applied Nonpar Stat Methods

3. Kvam and Vidakovic (2007) Nonpar Stat with Applications to Science and Engineering

4. Johnson, Morrell, and Schick (1992) Two-Sample Nonparametric Estimation and Confidence Intervals Under Truncation, Biometrics, 48, 1043-1056.

5. Hettmansperger and McKean (2010) Robust Nonparametric Statistics, 2nd Ed.

6. Efron and Tibshirani (1993) An Introduction to the bootstrap

7. Arnold Notes, Bendre Notes

Page 55:

Thank you for listening!