Influence function of the error rate of classification …...Influence function of the error rate of...
Transcript of Influence function of the error rate of classification …...Influence function of the error rate of...
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Influence function of the error rate of
classification based on clustering
Joint work with G. Haesbroeck
Ch. Ruwet
Department of Mathematics - University of Liège
19 May 2009
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Classification setting
Suppose
X ∼ F arises from G1 and G2 with πi (F ) = IPF [X ∈ Gi ]
then
F is a mixture of two distributions
F = π1(F )F1 + π2(F )F2
with density f = π1(F )f1 + π2(F )f2.
Additional assumption : one dimension !
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
The generalized 2-means clustering method
Aim of clustering : Find estimations C1(F ) and C2(F )(called clusters) of the two underlying groups.
The clusters’ centers (T1(F ), T2(F )) are solutions of
mint1,t2⊂R
∫
Ω
(
inf1≤j≤2
‖x − tj‖
)
dF (x)
for a suitable nondecreasing penalty functionΩ : R
+ → R+.
Classical penalty functions :
Ω(x) = x2 → 2-means method
Ω(x) = x → 2-medoids method
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Classification rule
The classification rule is
RF (x) = Cj(F ) ⇔ Ω(‖x−Tj(F )‖) = min1≤i≤2
Ω(‖x−Ti (F )‖)
In one dimension, the clusters are simply :
C1(F ) =] −∞, C (F )[
C2(F ) =]C (F ), +∞[
where C (F ) =T1(F ) + T2(F )
2is the cut-off point.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example (from Hand et al., 1991)
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-means
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-medoids
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example with the 2-means
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example with the 2-means
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Example with the 2-means
1988 Olympic games in long jump
Length of jump
5.0 5.5 6.0 6.5 7.0 7.5
MenWomen
ER=0.1
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-means
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-means
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
MenWomen
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-means
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
MenWomen
ER=0.45
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Contaminated example with the 2-medoids
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
MenWomen
ER=0.12
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Error rate
Training sample according to F : estimation of the rule
Test sample according to Fm : evaluation of the rule
In ideal circumstances : F = Fm
ER(F , Fm) =
2∑
j=1
πj(Fm)IPFm[RF (X ) 6= Cj(F )|Gj ]
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Optimality in classification
A classification rule is optimal if the corresponding errorrate is minimal
The optimal classification rule is the Bayes rule :
x ∈ C1(F ) ⇔ π1(F )f1(x) > π2(F )f2(x)
(Anderson, 1958)
The 2-means procedure is optimal under the model
FN = 0.5 N(µ1, σ2) + 0.5 N(µ2, σ
2) with µ1 < µ2
(Qiu and Tamhane, 2007)
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Definition and properties of the first order influence
function
Hampel et al (1986) : For any statistical functional T and anydistribution F ,
IF(x ; T, F ) = limε→0
T(Fε) − T(F )
ε=
∂
∂εT(Fε)
∣
∣
∣
∣
ε=0
where
Fε = (1 − ε)F + ε∆x
(under condition of existence);
EF [IF(X ; T, F )] = 0;
T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough (Firstorder von Mises expansion of T at F ).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Definition and properties of the first order influence
function
Hampel et al (1986) : For any statistical functional T and anydistribution F ,
IF(x ; T, F ) = limε→0
T(Fε) − T(F )
ε=
∂
∂εT(Fε)
∣
∣
∣
∣
ε=0
where
Fε = (1 − ε)F + ε∆x
(under condition of existence);
EF [IF(X ; T, F )] = 0;
T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough (Firstorder von Mises expansion of T at F ).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
First order influence function of the error rate
Now, the training sample is distributed as Fε which is acontaminated mixture.
ER(Fε, Fm) =2
∑
j=1
πj(Fm)IPFm[RFε
(X ) 6= Cj(Fε)|Gj ]
= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
First order influence function of the error rate
Now, the training sample is distributed as Fε which is acontaminated mixture.
ER(Fε, Fm) =2
∑
j=1
πj(Fm)IPFm[RFε
(X ) 6= Cj(Fε)|Gj ]
= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))
ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)
ER(Fε, FN) ≥ ER(FN , FN)
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
First order influence function of the error rate
Now, the training sample is distributed as Fε which is acontaminated mixture.
ER(Fε, Fm) =2
∑
j=1
πj(Fm)IPFm[RFε
(X ) 6= Cj(Fε)|Gj ]
= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))
ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)
ER(Fε, FN) ≥ ER(FN , FN)
⇒ IF(x ; ER, FN) ≡ 0
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
First order influence function of the error rate
Proposition
The influence function of the error rate of the generalized
2-means classification procedure is given by
IF(x ; ER, F ) =1
2IF(x ; T1, F ) + IF(x ; T2, F )
π2(F )f2(C (F )) − π1(F )f1(C (F ))
for all x 6= C (F ).
Expressions of IF(x ; T1, F ) and IF(x ; T2, F ) were computed byGarcía-Escudero and Gordaliza (1999).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
New contaminated example with the 2-medoids
1988 Olympic games in long jump
Length of jump
0 2 4 6 8
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Penalty functions
2-means : Ω(x) = x2
2-medoids : Ω(x) = x
2-Tukey’s : Ω(x) = b2
6
1 −[
1 −(
xb
)2]3
if |x | ≤ b
1 if |x | > b
with b = 2.795
0 1 2 3 4
01
23
4
2−means2−medoids2−Tukey’s
Ω(x
)
x
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Model distributions
F = π1 N(−∆/2, 1) + (1 − π1) N(∆/2, 1)
with
π1 = 0.4 and ∆ = 3
π1 is varying and ∆ = 3
π1 = 0.4 and ∆ is varying
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Graph of IF(x ; ER, F )
−4 −2 0 2 4
−0.
2−
0.1
0.0
0.1
0.2
x
IF E
R
2−means2−medoids2−Tukey’s
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Graph of IF(x ; ER, F )
−4 −2 0 2 4
−2.
0−
1.0
0.0
0.5
2−means
x
IF E
R
π1=0.4
π1=0.6
π1=0.2
π1=0.8
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Graph of IF(x ; ER, F )
−4 −2 0 2 4
−0.
2−
0.1
0.0
0.1
0.2
2−medoids
x
IF E
R∆=1
∆=3
∆=5
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Definition of the second order influence function
Under the model FN , IF(x ; ER, FN) ≡ 0
One needs to go a step further !
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Definition of the second order influence function
Under the model FN , IF(x ; ER, FN) ≡ 0
One needs to go a step further !
For any statistical functional T and any distribution F ,
IF2(x ; T, F ) =∂2
∂ε2T(Fε)
∣
∣
∣
∣
ε=0
where Fε = (1 − ε)F + ε∆x (under condition of existence).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Definition of the second order influence function
Under the model FN , IF(x ; ER, FN) ≡ 0
One needs to go a step further !
For any statistical functional T and any distribution F ,
IF2(x ; T, F ) =∂2
∂ε2T(Fε)
∣
∣
∣
∣
ε=0
where Fε = (1 − ε)F + ε∆x (under condition of existence).Second order von Mises expansion of ER at FN :
ER(Fε, FN) ≈ ER(FN , FN) +ε2
2IF2(x ; T, FN)
for ε small enough.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Second order influence function of the error rate
under optimality
Proposition
Under the optimal model FN , the second order influence
function of the error rate of the generalized 2-means
classification procedure is given by
IF2(x ; ER, FN) = −1
4(fN1)
′
(
µ1 + µ2
2
)
IF(x ; T1, FN) + IF(x ; T2, FN)2
for all x 6= µ1+µ2
2. This expression is always positive.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Asymptotic loss
A measure of the expected increase in error rate whenestimating the optimal clustering rule from a finite sample withempirical cdf Fn :
A-Loss = limn→+∞
n EFN[ER(Fn, FN) − ER(FN , FN)].
As in Croux et al. (2008) :
Proposition
Under some regularity conditions of the clusters’centers
estimators,
A-Loss =1
2EFN
[IF2(X ; ER, FN)]
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Asymptotic relative classification efficiencies
A measure of the price one needs to pay in error rate forprotection against the outliers when using a robust procedureinstead of the classical one :
ARCE(Robust,Classical) =A-Loss(Classical)
A-Loss(Robust).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
First order IF
Second orderIF
ARCE
Conclusion
Graph of ARCE
FN = 0.5 N(−∆, 1) + 0.5 N(∆, 1)
0 2 4 6 8
0.55
0.60
0.65
0.70
0.75
0.80
0.85
AR
CE
2−medoids2−Tukey’s
∆
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Conclusion
The generalized 2-means procedure can give a more robustestimator of the error rate with a good choice of thepenalty function;
The price to pay is a loss in efficiency (depending also onthe penalty function).
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Outline
1 Introduction
2 Error rate
3 Influence functions of the error rate
First order influence function
Second order influence function
Asymptotic relative classification efficiencies
4 Conclusions
Conclusions
Future research
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Future research
Generalized trimmed 2-means procedure : for α ∈ [0, 1],(T1(F ), T2(F )) are solutions of
minA:F (A)=1−α
mint1,t2⊂R
∫
A
Ω
(
inf1≤j≤2
‖x − tj‖
)
dF (x)
(Cuesta-Albertos, Gordaliza, and Matrán, 1997);
Other robustness properties of the generalized 2-meansmethod defined with nondecreasing penalty functioninstead of strictly increasing;
More than 1 dimension and/or more than 2 groups.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Thank you for your attention!
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Bibliography
Anderson T.W., An Introduction to Multivariate Statistical
Analysis, Wiley, New-York, 1958, pp. 126-133.
Croux C., Filzmoser P., and Joossens K. (2008),Classification efficiences for robust linear discriminantanalysis, Statistica Sinica 18, pp. 581-599.
Croux C., Haesbroeck G., and Joossens K. (2008), Logisticdiscrimination using robust estimators : an influencefunction approach, The Canadian Journal of Statistics, 36,pp. 157-174.
Fernholz L. T., On multivariate higher order von Misesexpansions in Metrika 2001, vol. 53, pp. 123-140.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Bibliography
García-Escudero L. A., and Gordaliza A., RobustnessProperties of k Means and Trimmed k Means, Journal of
the American Statistical Association, September 1999, Vol.94, n 447, pp. 956-969.
Hampel F.R., Ronchetti E.M., Rousseeuw P.J., and StahelW.A., Robust Statistics : The Approach Based on InfluenceFunctions, John Wiley and Sons, New-York, 1986.
Hand D.J., Daly F., Lunn A.D., McConway K.J., andOstrowski E.,A Handbook of Small Data Sets, Chapmanand Hall, London, 1991.
Influencefunction of
the error rateof
classificationbased onclustering
Ch. Ruwet
Introduction
Error rate
IF of the
error rate
Conclusion
Conclusion
Futureresearch
Bibliography
Pollard D., Strong Consistency of k-Means Clustering, The
Annals of Probability, 1981, Vol.9, nř4, pp.919-926.
Pollard D., A Central Limit Theorem for k-MeansClustering, The Annals of Probability, 1982, Vol.10, nř1,pp.135-140.
Qiu D. and Tamhane A. C. (2007), A comparative study ofthe k-means algorithm and the normal mixture model forclustering : Univariate case, Journal of Statistical Planning
and Inference, 137, pp. 3722-3740.