Barbara Mascialino, INFN Genova Comparison of data distributions: the power of Goodness-of-Fit Tests...

Barbara Mascialino, INFN Genova

Comparison of data distributions: the power of Goodness-of-Fit Tests

B. Mascialino (1), A. Pfeiffer (2), M.G. Pia (1), A. Ribon (2), P. Viarengo (3)

(1) INFN Genova, Italy
(2) CERN, Geneva, Switzerland
(3) IST – National Institute for Cancer Research, Genova, Italy

IEEE – NSS 2006, San Diego, October 29-November 5, 2006


Goodness of Fit testing

Use cases in experimental physics

Regression testing: throughout the software life-cycle

Online DAQ: monitoring detector behaviour w.r.t. a reference

Simulation validation: comparison with experimental data

Reconstruction: comparison of reconstructed vs. expected distributions

Physics analysis: comparisons of experimental distributions; comparison with theoretical distributions

Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions.

One-sample problem: a sample is compared with a theoretical distribution.

Two-sample problem: sample 1 is compared with sample 2.
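The difference between the one-sample and the two-sample problem can be illustrated with the Kolmogorov-Smirnov distance, one of the tests listed in this talk. The following is a minimal sketch in plain Python, not the Statistical Toolkit's implementation; the function names and the two small samples are hypothetical.

```python
import math
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def ks_one_sample(sample, cdf):
    """One-sample problem: largest distance between the empirical CDF
    of the sample and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def ks_two_sample(a, b):
    """Two-sample problem: largest distance between the two empirical CDFs."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in list(a) + list(b))

def normal_cdf(x):
    """CDF of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical data, for illustration only
sample_1 = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
sample_2 = [0.2, 0.8, 1.1, 1.9, 2.4, 3.0]
print("one-sample distance:", ks_one_sample(sample_1, normal_cdf))
print("two-sample distance:", ks_two_sample(sample_1, sample_2))
```

In both cases the statistic is a distance between cumulative distributions; only the reference changes (a theoretical CDF or a second empirical CDF).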


G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2004), 51 (5): 2056-2063.

B. Mascialino, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “New developments of the Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science (2006), 53 (6), to be published.

http://www.ge.infn.it/statisticaltoolkit/


GoF algorithms in the Statistical Toolkit

Two-sample problem

Unbinned distributions:
  Anderson-Darling test
  Anderson-Darling approximated test
  Cramer-von Mises test
  Generalised Girone test
  Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
  Kolmogorov-Smirnov test
  Kuiper test
  Tiku test (Cramer-von Mises test in chi-squared approximation)
  Weighted Kolmogorov-Smirnov test
  Weighted Cramer-von Mises test

Binned distributions:
  Anderson-Darling test
  Chi-squared test
  Fisz-Cramer-von Mises test
  Tiku test (Cramer-von Mises test in chi-squared approximation)
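As an illustration of a test for binned distributions, the sketch below computes the chi-squared statistic for two histograms that share the same binning, in the standard homogeneity form that also handles unequal sample sizes. It is an assumption-laden example in plain Python, not the toolkit's code; the function name and the bin contents are hypothetical.

```python
import math

def chi2_two_histograms(counts_1, counts_2):
    """Chi-squared statistic for two binned samples with identical binning.

    Returns (chi2, ndf). Bins that are empty in both histograms are skipped;
    ndf is the number of bins actually used minus one.
    """
    if len(counts_1) != len(counts_2):
        raise ValueError("histograms must share the same binning")
    n1, n2 = sum(counts_1), sum(counts_2)
    # scale factors that account for different total numbers of entries
    k1, k2 = math.sqrt(n2 / n1), math.sqrt(n1 / n2)
    chi2, used_bins = 0.0, 0
    for a, b in zip(counts_1, counts_2):
        if a == 0 and b == 0:
            continue
        chi2 += (k1 * a - k2 * b) ** 2 / (a + b)
        used_bins += 1
    return chi2, used_bins - 1

# Hypothetical bin contents, for illustration only
hist_1 = [12, 30, 41, 28, 9]
hist_2 = [15, 25, 38, 31, 11]
chi2, ndf = chi2_two_histograms(hist_1, hist_2)
print(f"chi2 = {chi2:.3f} with {ndf} degrees of freedom")
```

Binned tests of this kind depend on the choice of binning, whereas the unbinned tests above work directly on the individual data values.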


Power of GoF tests

Do we really need such a wide collection of GoF tests? Why?

Which is the most appropriate test to compare two distributions?

How “good” is a test at recognizing truly equivalent distributions and rejecting fake ones?

Which test to use?

No comprehensive study of the relative power of GoF tests exists in the literature: this is novel research in statistics (not only in physics data analysis!)

A systematic study of all existing GoF tests is in progress, made possible by the extensive collection of tests in the Statistical Toolkit


Method for the evaluation of power

Pseudo-experiment: a random drawing of two samples from two parent distributions

Parent distribution 1: sample 1 (size n)

Parent distribution 2: sample 2 (size n)

The two samples are compared with a GoF test

N = 10000 Monte Carlo replicas

Significance level = 0.05 (confidence level CL = 0.95)

Power = (# pseudoexperiments with p-value < (1 - CL)) / (# pseudoexperiments)

The power of a test is the probability of correctly rejecting the null hypothesis
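The procedure above can be sketched as a short Monte Carlo loop. The example below is only an illustration under stated assumptions: it uses the two-sample Kolmogorov-Smirnov test with a common asymptotic approximation of its p-value, rejects when the p-value is below 0.05, and uses far fewer replicas than the 10000 quoted in the talk to keep the run short. All function names and the choice of parent distributions are hypothetical.

```python
import math
import random
from bisect import bisect_right

def ks_two_sample_pvalue(a, b):
    """Two-sample KS distance and an asymptotic approximation of its p-value."""
    xa, xb = sorted(a), sorted(b)
    na, nb = len(xa), len(xb)
    d = max(abs(bisect_right(xa, x) / na - bisect_right(xb, x) / nb)
            for x in xa + xb)
    ne = math.sqrt(na * nb / (na + nb))
    lam = (ne + 0.12 + 0.11 / ne) * d
    if lam < 0.2:
        # the Kolmogorov series converges slowly here and its value is ~1
        return d, 1.0
    # Kolmogorov series: Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)
    q = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(q, 0.0), 1.0)

def estimate_power(draw_1, draw_2, n, replicas=10000, alpha=0.05, seed=1):
    """Fraction of pseudo-experiments in which the test rejects (p < alpha)."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicas):
        sample_1 = [draw_1(rng) for _ in range(n)]
        sample_2 = [draw_2(rng) for _ in range(n)]
        _, p_value = ks_two_sample_pvalue(sample_1, sample_2)
        if p_value < alpha:
            rejected += 1
    return rejected / replicas

# Hypothetical parent distributions for the demonstration
gauss = lambda rng: rng.gauss(0.0, 1.0)
expo = lambda rng: rng.expovariate(1.0)

print("different parents:", estimate_power(gauss, expo, n=20, replicas=500))
print("same parent:      ", estimate_power(gauss, gauss, n=20, replicas=500))
```

When both samples come from the same parent, the rejection fraction estimates the false-rejection rate rather than the power, which is a useful sanity check on the test and on the p-value approximation.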


Analysis cases

Power analysis on a set of reference mathematical distributions

Data samples drawn from different parent distributions

Data samples drawn from the same parent distribution

Applying a scale factor

Applying a shift

Power analysis on some typical physics applications

Use cases in experimental physics: signal over background, “hot channel”, dead channel etc.

Is there any recipe to identify the best test to use?


Parent reference distributions

Uniform: $f_1(x) = 1$

Gaussian: $f_2(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$

Double Exponential: $f_3(x) = \frac{1}{2} e^{-|x|}$

Cauchy: $f_4(x) = \frac{1}{\pi} \frac{1}{1 + x^2}$

Exponential: $f_5(x) = e^{-x}$

Contaminated Normal Distribution 1: $f_6(x) = 0.9\,N(0,1) + 0.1\,N(0,9)$

Contaminated Normal Distribution 2: $f_7(x) = 0.5\,N(1,4) + 0.5\,N(-1,1)$

Exponential Left Tailed: $f_8(x) = e^{x}$

Pareto: $f_{9,\dots,12}(x) \propto x^{-(1+\alpha)}$, with α = 1.0, 2.0, 3.0, 4.0
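Pseudo-random samples from several of these parent distributions can be drawn with the Python standard library alone, as in the sketch below. It rests on a few assumptions that are not stated on the slide: the uniform density is taken on the unit interval, the second argument of N(·,·) is read as a variance, and the Pareto and left-tailed exponential cases are left out. All names are hypothetical.

```python
import math
import random

def uniform(rng):
    # f1: flat density, assumed here to live on [0, 1)
    return rng.random()

def gaussian(rng):
    # f2: standard Gaussian
    return rng.gauss(0.0, 1.0)

def double_exponential(rng):
    # f3: an exponential magnitude with a random sign
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude

def cauchy(rng):
    # f4: inverse transform of the Cauchy CDF
    return math.tan(math.pi * (rng.random() - 0.5))

def exponential(rng):
    # f5: exponential with unit rate
    return rng.expovariate(1.0)

def contaminated_normal_1(rng):
    # f6: 0.9 N(0,1) + 0.1 N(0,9), second argument read as a variance
    sigma = 1.0 if rng.random() < 0.9 else 3.0
    return rng.gauss(0.0, sigma)

def contaminated_normal_2(rng):
    # f7: 0.5 N(1,4) + 0.5 N(-1,1), second argument read as a variance
    if rng.random() < 0.5:
        return rng.gauss(1.0, 2.0)
    return rng.gauss(-1.0, 1.0)

PARENTS = {
    "Uniform": uniform,
    "Gaussian": gaussian,
    "Double Exponential": double_exponential,
    "Cauchy": cauchy,
    "Exponential": exponential,
    "Contaminated Normal 1": contaminated_normal_1,
    "Contaminated Normal 2": contaminated_normal_2,
}

def draw_sample(name, n, seed=None):
    """Draw n values from the named parent distribution."""
    rng = random.Random(seed)
    return [PARENTS[name](rng) for _ in range(n)]

print(draw_sample("Cauchy", 5, seed=42))
```

Passing a seed makes a pseudo-experiment reproducible, which helps when comparing the behaviour of different tests on exactly the same samples.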


Skewness: $S = \frac{x_{0.975} - x_{0.5}}{x_{0.5} - x_{0.025}}$

Tailweight: $T = \frac{x_{0.975} - x_{0.025}}{x_{0.875} - x_{0.125}}$

where $x_p$ is the p-quantile of the distribution.

Parent distribution               Skewness S   Tailweight T
f12(x) Pareto 4                   0.037        1.647
f11(x) Pareto 3                   0.076        1.488
f10(x) Pareto 2                   0.151        1.351
f9(x) Pareto 1                    0.294        1.245
f1(x) Uniform                     1.000        1.267
f2(x) Gaussian                    1.000        1.704
f6(x) Contaminated Normal 1       1.000        1.991
f3(x) Double Exponential          1.000        2.161
f4(x) Cauchy                      1.000        5.263
f7(x) Contaminated Normal 2       1.769        1.693
f5(x) Exponential                 4.486        1.883
f8(x) Exponential left tailed     6.050        1.501
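The skewness and tailweight indicators are ratios of quantile differences: S = (x_0.975 - x_0.5) / (x_0.5 - x_0.025) and T = (x_0.975 - x_0.025) / (x_0.875 - x_0.125). The sketch below evaluates them from closed-form quantile functions for three of the parent distributions; the function names are hypothetical, and the unit-interval uniform and unit-rate exponential are assumptions.

```python
import math

def skewness_S(quantile):
    """S = (x_0.975 - x_0.5) / (x_0.5 - x_0.025)."""
    return ((quantile(0.975) - quantile(0.5))
            / (quantile(0.5) - quantile(0.025)))

def tailweight_T(quantile):
    """T = (x_0.975 - x_0.025) / (x_0.875 - x_0.125)."""
    return ((quantile(0.975) - quantile(0.025))
            / (quantile(0.875) - quantile(0.125)))

# Closed-form quantile functions (inverse CDFs)
QUANTILES = {
    "Uniform": lambda p: p,
    "Exponential": lambda p: -math.log(1.0 - p),
    "Cauchy": lambda p: math.tan(math.pi * (p - 0.5)),
}

for name, q in QUANTILES.items():
    print(f"{name:12s} S = {skewness_S(q):.3f}  T = {tailweight_T(q):.3f}")
```

For these three cases the computed values agree with the tabulated ones (Uniform 1.000 and 1.267, Exponential 4.486 and 1.883, Cauchy 1.000 and 5.263). Because both indicators use quantiles only, they remain defined for distributions such as the Cauchy, whose moments do not exist.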


Compare different distributions: Parent1 ≠ Parent2

Unbinned distributions


The power increases as a function of the sample size

[Figure: four panels showing the empirical power (%) as a function of the sample size.
Panel 1: CN1 vs CN2 (symmetric vs skewed, medium tailed vs medium tailed); curves K, W.
Panel 2: Gaussian vs CN2 (symmetric vs skewed, medium tailed vs medium tailed); curves K, W.
Panel 3: Pareto1 vs Pareto2 (short tailed vs short tailed, skewed vs skewed); curves AD, CvM, WCvM.
Panel 4: Exponential left tailed vs Pareto1 (short tailed vs medium tailed, skewed vs skewed); curves KS, WKSB, WKSAD.
Legend: KS, WKSB, WKSAD, CvM, WCvM, AD, W, K.]

The power varies as a function of the parent distributions’ characteristics

Power correlation coefficients

               S1 – S2     T1 – T2     N
Correlation    0.409       0.091       0.181
p-value        p<0.0001    p<0.0001    p<0.0001

[Figure: empirical power (%) vs S1 – S2, Exponential vs Pareto, samples size = 50]

[Figure: empirical power (%) vs T1 – T2, flat vs other distributions, samples size = 15]

General recipe

$\mathrm{Power}_{ijk} = \alpha_i + \beta_i (S_1 - S_2)_{jk} + \gamma_i (T_1 - T_2)_{jk} + \delta_i N_{jk}$

p<0.0001


Quantitative evaluation of GoF tests power

We propose an alternative quantitative method to evaluate the power of various GoF tests: a linear multiple regression that includes both parent distributions’ characterisation and the samples size.

$\mathrm{Power}_{ijk} = \alpha_i + \beta_i (S_1 - S_2)_{jk} + \gamma_i (T_1 - T_2)_{jk} + \delta_i N_{jk}$

p<0.0001

Standardised coefficients analysis: the standardised coefficient of the samples size, $z(N_{jk})$, lies between those of the other two terms.
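A linear multiple regression of the power on the skewness difference, the tailweight difference and the sample size can be fitted by ordinary least squares, one GoF test at a time. The sketch below is a plain-Python illustration of that idea and of standardised coefficients (slope times the regressor's standard deviation divided by the response's standard deviation). The data are synthetic, generated from made-up coefficients with noise; they are not results from this study, and all names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear(rows, y):
    """Least-squares fit y = b0 + b1*x1 + ... using the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def std(values):
    """Sample standard deviation."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

def standardised(coeffs, rows, y):
    """Standardised slopes: b_j * sd(x_j) / sd(y), intercept excluded."""
    sd_y = std(y)
    return [b * std([r[j] for r in rows]) / sd_y
            for j, b in enumerate(coeffs[1:])]

# Synthetic, illustrative data only: (S1 - S2, T1 - T2, N) -> power
rng = random.Random(0)
rows, power = [], []
for _ in range(200):
    s_diff = rng.uniform(0.0, 5.0)
    t_diff = rng.uniform(0.0, 4.0)
    n = rng.choice([5, 10, 15, 20, 50])
    rows.append((s_diff, t_diff, n))
    power.append(20.0 + 10.0 * s_diff + 2.0 * t_diff + 0.5 * n
                 + rng.gauss(0.0, 3.0))

coeffs = fit_linear(rows, power)
print("intercept and slopes:", [round(c, 3) for c in coeffs])
print("standardised slopes: ", [round(z, 3) for z in standardised(coeffs, rows, power)])
```

Standardised slopes are dimensionless, so they can be compared directly to rank the influence of each regressor on the power, even though the regressors have different scales.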


Binned distributions

Compare different distributions: Parent1 ≠ Parent2

Preliminary results

Sample size = 1000, number of bins = 20

Power (%) of the χ2 and CvM tests for pairs of parent distributions. Column labels: Gaussian, Double Exponential, Cauchy, CN1. Row labels: Double Exponential, Cauchy, CN1, CN2. Entries are listed in the order in which they appear on the slide:

χ2               CvM
38.91 ± 0.49     92.9 ± 0.26
98.67 ± 0.12     100.0 ± 0.0
50.32 ± 0.50     99.79 ± 0.05
100.0 ± 0.0      100.0 ± 0.0
77.72 ± 0.42     99.98 ± 0.02
65.04 ± 0.48     79.55 ± 0.40
100.0 ± 0.0      100.0 ± 0.0
33.23 ± 0.47     88.57 ± 0.32
92.83 ± 0.26     99.97 ± 0.02
99.95 ± 0.02     100.0 ± 0.0
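The uncertainties quoted with these values are consistent with a binomial standard error on a rejection fraction estimated from 10000 pseudo-experiments. The sketch below shows that calculation; the function name is hypothetical, and the count of 3891 rejections is an assumption chosen to match the first χ2 entry.

```python
import math

def power_with_error(n_rejected, n_replicas):
    """Empirical power and its binomial standard error, both in percent."""
    p = n_rejected / n_replicas
    err = math.sqrt(p * (1.0 - p) / n_replicas)
    return 100.0 * p, 100.0 * err

# Assumed counts: 3891 rejections out of 10000 pseudo-experiments
power, err = power_with_error(3891, 10000)
print(f"{power:.2f} +/- {err:.2f}")  # prints 38.91 +/- 0.49
```

The error shrinks as the power approaches 0% or 100%, which is why entries at 100.0 carry an uncertainty of 0.0.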


Physics use case


Conclusions

No clear winner for all the considered distributions: in general, the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared

Practical recommendations

1) first classify the type of the distributions in terms of skewness and tailweight

2) choose the most appropriate test for that type of distributions, identifying the best test by means of the proposed quantitative model

Systematic study of the power in progress for both binned and unbinned distributions

Topic still subject to research activity in the domain of statistics
