Barbara Mascialino, INFN Genova
Comparison of data distributions: the power of Goodness-of-Fit Tests
B. Mascialino¹, A. Pfeiffer², M.G. Pia¹, A. Ribon², P. Viarengo³
¹ INFN Genova, Italy; ² CERN, Geneva, Switzerland; ³ IST, National Institute for Cancer Research, Genova, Italy
IEEE NSS 2006, San Diego, October 29 - November 5, 2006
Goodness-of-Fit testing
Use cases in experimental physics:
- Regression testing: throughout the software life-cycle
- Online DAQ: monitoring detector behaviour w.r.t. a reference
- Simulation validation: comparison with experimental data
- Reconstruction: comparison of reconstructed vs. expected distributions
- Physics analysis: comparisons of experimental distributions; comparison with theoretical distributions
Goodness-of-fit testing is the mathematical foundation for the comparison of data distributions:
- One-sample problem: a sample is compared with a theoretical distribution
- Two-sample problem: sample 1 is compared with sample 2
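To make the one-sample/two-sample distinction concrete, here is a minimal Python sketch that uses the Kolmogorov-Smirnov distance as the example statistic. The choice of statistic and all function names are illustrative assumptions, not the Statistical Toolkit's interface. In the one-sample problem the empirical distribution function (ECDF) of the data is compared with a theoretical CDF; in the two-sample problem two ECDFs are compared with each other.

```python
import bisect
from statistics import NormalDist


def ecdf(sample):
    """Return the (right-continuous) empirical CDF of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_one_sample(sample, cdf):
    """One-sample problem: largest distance between the ECDF and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # the ECDF jumps from i/n to (i+1)/n at x: check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_two_sample(sample1, sample2):
    """Two-sample problem: largest distance between the two ECDFs."""
    f1, f2 = ecdf(sample1), ecdf(sample2)
    # both ECDFs are constant between pooled data points, so checking them suffices
    return max(abs(f1(x) - f2(x)) for x in list(sample1) + list(sample2))


if __name__ == "__main__":
    data = [-0.3, 0.1, 0.8, 1.5, -1.2]
    print(ks_one_sample(data, NormalDist().cdf))  # sample vs theoretical Gaussian
    print(ks_two_sample(data, [0.2, 0.4, 2.0]))   # sample 1 vs sample 2
```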
G.A.P. Cirrone, S. Donadio, S. Guatelli, A. Mantero, B. Mascialino, S. Parlati, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “A Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science 51 (5), 2056-2063 (2004).
B. Mascialino, M.G. Pia, A. Pfeiffer, A. Ribon, P. Viarengo, “New developments of the Goodness-of-Fit Statistical Toolkit”, IEEE Transactions on Nuclear Science 53 (6) (2006), to be published.
http://www.ge.infn.it/statisticaltoolkit/
GoF algorithms in the Statistical Toolkit: two-sample problem
Unbinned distributions:
- Anderson-Darling test
- Anderson-Darling approximated test
- Cramer-von Mises test
- Generalised Girone test
- Goodman test (Kolmogorov-Smirnov test in chi-squared approximation)
- Kolmogorov-Smirnov test
- Kuiper test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
- Weighted Kolmogorov-Smirnov test
- Weighted Cramer-von Mises test
Binned distributions:
- Anderson-Darling test
- Chi-squared test
- Fisz-Cramer-von Mises test
- Tiku test (Cramer-von Mises test in chi-squared approximation)
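The unbinned tests listed above differ mainly in how they measure the distance between the two empirical distribution functions. As one example, the sketch below computes the two-sample Cramer-von Mises statistic T in the rank form given by T. W. Anderson (1962). It is an illustrative stand-alone Python function (hypothetical name), not the Statistical Toolkit's implementation, and it assumes continuous data with no ties.

```python
def cramer_von_mises_2samp(x, y):
    """Two-sample Cramer-von Mises statistic T (Anderson's rank form).

    Assumes continuous data: no tied values within or between samples.
    """
    n, m = len(x), len(y)
    # label each value with its sample of origin, then sort the pooled data
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # 1-based ranks, in the pooled sample, of each sample's sorted values
    ranks_x = [k + 1 for k, (_, label) in enumerate(pooled) if label == 0]
    ranks_y = [k + 1 for k, (_, label) in enumerate(pooled) if label == 1]
    u = n * sum((r - i) ** 2 for i, r in enumerate(ranks_x, start=1))
    u += m * sum((s - j) ** 2 for j, s in enumerate(ranks_y, start=1))
    return u / (n * m * (n + m)) - (4 * m * n - 1) / (6 * (m + n))


if __name__ == "__main__":
    print(cramer_von_mises_2samp([1, 3], [2, 4]))  # interleaved samples
    print(cramer_von_mises_2samp([1, 2], [3, 4]))  # fully separated samples
```

Interleaved samples give a smaller statistic than fully separated ones: the two calls evaluate to 0.125 and 0.375 respectively.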
Power of GoF tests
Do we really need such a wide collection of GoF tests? Why?
- Which is the most appropriate test to compare two distributions?
- How “good” is a test at recognizing truly equivalent distributions and rejecting different ones?
Which test to use?
- No comprehensive study of the relative power of GoF tests exists in the literature: this is novel research in statistics (not only in physics data analysis!)
- A systematic study of all existing GoF tests is in progress, made possible by the extensive collection of tests in the Statistical Toolkit
Method for the evaluation of power
Pseudo-experiment: a random drawing of two samples, each of size n, from two parent distributions (parent distribution 1 → sample 1, parent distribution 2 → sample 2); the two samples are then compared with the GoF test.
N = 10000 Monte Carlo replicas
Confidence level CL = 0.95 (significance level 1 − CL = 0.05)
$$\text{Power} = \frac{\#\ \text{pseudo-experiments with } p\text{-value} < 1 - \text{CL}}{\#\ \text{pseudo-experiments}}$$
The power of a test is the probability of correctly rejecting the null hypothesis, i.e. of rejecting it when it is false.
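The power estimation procedure can be sketched in a few lines of Python. Assumptions, none of them taken from the slides: the GoF test is the two-sample Kolmogorov-Smirnov test with an asymptotic p-value (using the common small-sample correction to the Kolmogorov distribution argument found in Numerical Recipes), the significance level is 0.05, and the default number of replicas is reduced to 1000 to keep the example fast. All function names are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between the ECDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    for x in sorted(a + b):
        # count points <= x in each sample (right-continuous ECDFs)
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m):
    """Asymptotic KS p-value Q_KS(lambda) with a small-sample correction."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # Q_KS(lambda) differs from 1 by less than 1e-12 here
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, 101))
    return min(max(q, 0.0), 1.0)


def estimate_power(draw1, draw2, n, replicas=1000, alpha=0.05, seed=1):
    """Fraction of pseudo-experiments in which the test rejects H0 at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(replicas):
        sample1 = [draw1(rng) for _ in range(n)]  # drawn from parent distribution 1
        sample2 = [draw2(rng) for _ in range(n)]  # drawn from parent distribution 2
        if ks_p_value(ks_statistic(sample1, sample2), n, n) < alpha:
            rejected += 1
    return rejected / replicas


if __name__ == "__main__":
    gauss = lambda rng: rng.gauss(0.0, 1.0)
    shifted = lambda rng: rng.gauss(0.5, 1.0)
    for n in (15, 50, 100):
        print(n, estimate_power(gauss, shifted, n))
```

Running the script prints the estimated power for increasing sample sizes, for a Gaussian against a Gaussian shifted by half a standard deviation. Another test can be plugged in by replacing the line that computes the p-value.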
Analysis cases
- Data samples drawn from different parent distributions
- Data samples drawn from the same parent distribution, then:
  - applying a scale factor
  - applying a shift
- Use cases in experimental physics: signal over background, “hot channel”, dead channel, etc.
Is there any recipe to identify the best test to use?
- Power analysis on a set of reference mathematical distributions
- Power analysis on some typical physics applications
Parent reference distributions
Uniform: $f_1(x) = 1$
Gaussian: $f_2(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
Double exponential: $f_3(x) = \frac{1}{2}\, e^{-|x|}$
Cauchy: $f_4(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}$
Exponential: $f_5(x) = e^{-x}, \; x \ge 0$
Contaminated normal distribution 1: $f_6(x) = 0.9\, N(0,1) + 0.1\, N(0,9)$
Contaminated normal distribution 2: $f_7(x) = 0.5\, N(1,4) + 0.5\, N(-1,1)$
Exponential, left tailed: $f_8(x) = e^{x}, \; x \le 0$
Pareto: $f_9(x), \dots, f_{12}(x)$, with $\alpha = 1.0, 2.0, 3.0, 4.0$ respectively
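Monte Carlo studies of this kind need samples drawn from each parent distribution. A possible stdlib-only sketch follows. It assumes that the second argument of N(·,·) is a variance (so N(0,9) has standard deviation 3) and that the second component of CN2 is centred at −1, consistent with the skewness S ≠ 1 listed for CN2. The Pareto family is left out of this sketch, and all names are hypothetical.

```python
import math
import random


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


def double_exponential(rng):
    # Laplace distribution: exponential magnitude with a random sign
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)


def cauchy(rng):
    # inverse-CDF method for the standard Cauchy distribution
    return math.tan(math.pi * (rng.random() - 0.5))


def exponential(rng):
    return rng.expovariate(1.0)


def exponential_left_tailed(rng):
    return -rng.expovariate(1.0)


def contaminated_normal_1(rng):
    # 0.9 N(0,1) + 0.1 N(0,9); assumption: 9 is a variance, i.e. sigma = 3
    return rng.gauss(0.0, 3.0) if rng.random() < 0.1 else rng.gauss(0.0, 1.0)


def contaminated_normal_2(rng):
    # 0.5 N(1,4) + 0.5 N(-1,1); assumptions: variances, second mean = -1
    return rng.gauss(1.0, 2.0) if rng.random() < 0.5 else rng.gauss(-1.0, 1.0)


SAMPLERS = {
    "uniform": lambda rng: rng.random(),
    "gaussian": gaussian,
    "double_exponential": double_exponential,
    "cauchy": cauchy,
    "exponential": exponential,
    "exponential_left_tailed": exponential_left_tailed,
    "contaminated_normal_1": contaminated_normal_1,
    "contaminated_normal_2": contaminated_normal_2,
}


def draw_sample(name, n, seed=None):
    """Draw n values from the named parent distribution."""
    rng = random.Random(seed)
    sampler = SAMPLERS[name]
    return [sampler(rng) for _ in range(n)]
```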
Skewness S and tailweight T of the parent distributions

Parent distribution                 S (skewness)   T (tailweight)
f12(x)  Pareto 4                    0.037          1.647
f11(x)  Pareto 3                    0.076          1.488
f10(x)  Pareto 2                    0.151          1.351
f9(x)   Pareto 1                    0.294          1.245
f1(x)   Uniform                     1.000          1.267
f2(x)   Gaussian                    1.000          1.704
f6(x)   Contaminated Normal 1       1.000          1.991
f3(x)   Double Exponential          1.000          2.161
f4(x)   Cauchy                      1.000          5.263
f7(x)   Contaminated Normal 2       1.769          1.693
f5(x)   Exponential                 4.486          1.883
f8(x)   Exponential left tailed     6.050          1.501

where $x_p$ denotes the $p$-quantile of the distribution:
$$S = \frac{x_{0.975} - x_{0.5}}{x_{0.5} - x_{0.025}}, \qquad T = \frac{x_{0.975} - x_{0.025}}{x_{0.875} - x_{0.125}}$$
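The quantile-based definitions of S and T are easy to check numerically. The sketch below evaluates them from exact quantile functions (Python's `statistics.NormalDist().inv_cdf` for the Gaussian, closed forms for the uniform and exponential). Rounded to three decimals, it reproduces the tabulated values for these three parents. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def skewness_tailweight(quantile):
    """Quantile-based skewness S and tailweight T, given a quantile function."""
    q = quantile
    s = (q(0.975) - q(0.5)) / (q(0.5) - q(0.025))
    t = (q(0.975) - q(0.025)) / (q(0.875) - q(0.125))
    return s, t


gaussian_q = NormalDist().inv_cdf            # standard normal quantiles
exponential_q = lambda p: -math.log(1.0 - p)  # quantiles of e^{-x}, x >= 0
uniform_q = lambda p: p                       # quantiles of U(0, 1)

if __name__ == "__main__":
    for name, q in [("uniform", uniform_q), ("gaussian", gaussian_q),
                    ("exponential", exponential_q)]:
        s, t = skewness_tailweight(q)
        print(f"{name:12s} S = {s:.3f}  T = {t:.3f}")
```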
Compare different distributions: Parent1 ≠ Parent2
Unbinned distributions
The power increases as a function of the sample size
[Figure: empirical power (%) of several unbinned tests (legend labels include KS, WKS, AD, CvM, WCvM, K and W) as a function of sample size, for four pairs of parent distributions: CN1 vs CN2 (symmetric vs skewed, medium tailed vs medium tailed); Gaussian vs CN2 (symmetric vs skewed, medium tailed vs medium tailed); Pareto1 vs Pareto2 (skewed vs skewed, short tailed vs short tailed); exponential left tailed vs Pareto1 (skewed vs skewed, short tailed vs medium tailed).]
The power varies as a function of the parent distributions’ characteristics
Correlation coefficients between the power and:
- S1 − S2 (skewness difference): 0.409, p < 0.0001
- T1 − T2 (tailweight difference): 0.091, p < 0.0001
- N (sample size): 0.181, p < 0.0001
[Figure: empirical power (%) vs S1 − S2 for exponential vs Pareto, sample size = 50; empirical power (%) vs T1 − T2 for flat vs other distributions, sample size = 15.]
General recipe: a linear model of the power as a function of S1 − S2, T1 − T2 and N (p < 0.0001)
Quantitative evaluation of GoF tests power
We propose an alternative, quantitative method to evaluate the power of the various GoF tests: a linear multiple regression of the power that includes the characterisation of both parent distributions (skewness difference S1 − S2 and tailweight difference T1 − T2) and the sample size N:
$$\text{Power} = \beta_0 + \beta_1 (S_1 - S_2) + \beta_2 (T_1 - T_2) + \beta_3 N + \varepsilon \qquad (p < 0.0001)$$
Standardised coefficients analysis: the standardised coefficient z of the sample size N lies between those of the two distribution-shape terms.
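A minimal sketch of the regression step, assuming ordinary least squares solved through the normal equations, with standardised coefficients defined as $\beta_k \, \mathrm{sd}(x_k) / \mathrm{sd}(\text{Power})$. The coefficient names, function names and the synthetic check data are illustrative assumptions; no results from the study are reproduced here.

```python
import statistics


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_power_model(delta_s, delta_t, sizes, power):
    """Least-squares fit of power = b0 + b1*(S1-S2) + b2*(T1-T2) + b3*N.

    Returns the raw coefficients [b0, b1, b2, b3] and the standardised
    coefficients of the three predictors.
    """
    predictors = [delta_s, delta_t, sizes]
    rows = [[1.0, s, t, n] for s, t, n in zip(delta_s, delta_t, sizes)]
    # normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(4)]
    coef = solve(xtx, xty)
    sd_y = statistics.pstdev(power)
    standardised = [coef[k + 1] * statistics.pstdev(x) / sd_y
                    for k, x in enumerate(predictors)]
    return coef, standardised
```

Comparing the absolute standardised coefficients ranks the three predictors by their influence on the power, independently of the units in which S, T and N are measured.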
Binned distributions
Compare different distributions: Parent1 ≠ Parent2
Preliminary results
Sample size = 1000, number of bins = 20
Power (%) of the chi-squared (χ²) and Cramer-von Mises (CvM) tests for the pairwise comparisons of the Gaussian, double exponential, Cauchy, CN1 and CN2 distributions (one entry per pair):
- χ² = 38.91 ± 0.49, CvM = 92.9 ± 0.26
- χ² = 98.67 ± 0.12, CvM = 100.0 ± 0.0
- χ² = 50.32 ± 0.50, CvM = 99.79 ± 0.05
- χ² = 100.0 ± 0.0, CvM = 100.0 ± 0.0
- χ² = 77.72 ± 0.42, CvM = 99.98 ± 0.02
- χ² = 65.04 ± 0.48, CvM = 79.55 ± 0.40
- χ² = 100.0 ± 0.0, CvM = 100.0 ± 0.0
- χ² = 33.23 ± 0.47, CvM = 88.57 ± 0.32
- χ² = 92.83 ± 0.26, CvM = 99.97 ± 0.02
- χ² = 99.95 ± 0.02, CvM = 100.0 ± 0.0
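For the binned case, a two-histogram chi-squared comparison can be sketched as follows. It uses one common form of the statistic for histograms with different totals, takes (number of non-empty bins − 1) degrees of freedom, and computes the p-value from the regularised incomplete gamma function (series for small arguments, continued fraction for large ones, following the standard Numerical Recipes scheme). This illustrates the technique under these assumptions; it is not the Statistical Toolkit's binned chi-squared test, and the names are hypothetical.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail chi-squared probability Q(dof/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    log_prefix = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # series for the regularised lower incomplete gamma P(a, z)
        term = total = 1.0 / a
        k = 1
        while abs(term) > 1e-15 * abs(total) and k < 10000:
            term *= z / (a + k)
            total += term
            k += 1
        return max(0.0, 1.0 - total * math.exp(log_prefix))
    # continued fraction for Q(a, z) (modified Lentz method)
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(log_prefix) * h


def chi2_two_histograms(h1, h2):
    """Chi-squared comparison of two histograms of unweighted counts."""
    n1, n2 = sum(h1), sum(h2)
    k1, k2 = math.sqrt(n2 / n1), math.sqrt(n1 / n2)
    chi2, used_bins = 0.0, 0
    for a, b in zip(h1, h2):
        if a + b == 0:
            continue  # bin empty in both histograms: carries no information
        chi2 += (k1 * a - k2 * b) ** 2 / (a + b)
        used_bins += 1
    dof = used_bins - 1
    return chi2, dof, chi2_sf(chi2, dof)
```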
Physics use case
Conclusions
No clear winner for all the considered distributions: in general, the performance of a test depends on its intrinsic features as well as on the features of the distributions to be compared.
Practical recommendations:
1) first classify the distributions in terms of skewness and tailweight;
2) then choose the most appropriate test for that type of distributions, identifying the best test by means of the proposed quantitative model.
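The two-step recipe can be sketched as follows, under stated assumptions: S and T are estimated from sample quantiles (Python's `statistics.quantiles` with 40 cut points, so that the 0.025, 0.125, 0.5, 0.875 and 0.975 quantiles are available), and each candidate test is described by coefficients of the linear power model, fitted beforehand. Any coefficients passed in are placeholders supplied by the user; no fitted values from this study are implied, and all names are hypothetical.

```python
import random
import statistics


def empirical_skewness_tailweight(sample):
    """Step 1: estimate S and T from sample quantiles."""
    cuts = statistics.quantiles(sample, n=40)  # cuts[i - 1] estimates the i/40 quantile
    q025, q125, q500, q875, q975 = cuts[0], cuts[4], cuts[19], cuts[34], cuts[38]
    s = (q975 - q500) / (q500 - q025)
    t = (q975 - q025) / (q875 - q125)
    return s, t


def shape_differences(sample1, sample2):
    """Characterise both samples and return (S1 - S2, T1 - T2)."""
    s1, t1 = empirical_skewness_tailweight(sample1)
    s2, t2 = empirical_skewness_tailweight(sample2)
    return s1 - s2, t1 - t2


def pick_test(models, delta_s, delta_t, n):
    """Step 2: return the test whose power model predicts the highest power.

    `models` maps a test name to (b0, b1, b2, b3) of
    power = b0 + b1*(S1 - S2) + b2*(T1 - T2) + b3*N.
    """
    def predicted(coef):
        b0, b1, b2, b3 = coef
        return b0 + b1 * delta_s + b2 * delta_t + b3 * n
    return max(models, key=lambda name: predicted(models[name]))


if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    b = [rng.expovariate(1.0) for _ in range(5000)]
    print(shape_differences(a, b))
```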
A systematic study of the power is in progress for both binned and unbinned distributions.
The topic is still the subject of active research in the domain of statistics.