Influence function of the error rate of classification …...Influence function of the error rate of...

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Influence function of the error rate of

classification based on clustering

Joint work with G. Haesbroeck

Ch. Ruwet

Department of Mathematics - University of Liège

19 May 2009

[email protected]


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Classification setting

Suppose

X ∼ F arises from G1 and G2 with πi (F ) = IPF [X ∈ Gi ]

then

F is a mixture of two distributions

F = π1(F )F1 + π2(F )F2

with density f = π1(F )f1 + π2(F )f2.

Additional assumption : one dimension !


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

The generalized 2-means clustering method

Aim of clustering : Find estimations C1(F ) and C2(F )(called clusters) of the two underlying groups.

The clusters’ centers (T1(F ), T2(F )) are solutions of

mint1,t2⊂R

∫

Ω

(

inf1≤j≤2

‖x − tj‖

)

dF (x)

for a suitable nondecreasing penalty functionΩ : R

+ → R+.

Classical penalty functions :

Ω(x) = x2 → 2-means method

Ω(x) = x → 2-medoids method


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Classification rule

The classification rule is

RF (x) = Cj(F ) ⇔ Ω(‖x−Tj(F )‖) = min1≤i≤2

Ω(‖x−Ti (F )‖)

In one dimension, the clusters are simply :

C1(F ) =] −∞, C (F )[

C2(F ) =]C (F ), +∞[

where C (F ) =T1(F ) + T2(F )

2is the cut-off point.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example (from Hand et al., 1991)

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example


Length of jump

5.0 5.5 6.0 6.5 7.0 7.5


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example


Length of jump

0 2 4 6 8


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-means


Length of jump

0 2 4 6 8


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-medoids


Length of jump

0 2 4 6 8


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example


Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example with the 2-means


Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example with the 2-means


Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen

ER=0.1


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example


Length of jump

0 2 4 6 8

MenWomen


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion



Length of jump

0 2 4 6 8

MenWomen


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion



Length of jump

0 2 4 6 8

MenWomen

ER=0.45


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-medoids


Length of jump

0 2 4 6 8

MenWomen

ER=0.12


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Error rate

Training sample according to F : estimation of the rule

Test sample according to Fm : evaluation of the rule

In ideal circumstances : F = Fm

ER(F , Fm) =

2∑

j=1

πj(Fm)IPFm[RF (X ) 6= Cj(F )|Gj ]


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Optimality in classification

A classification rule is optimal if the corresponding errorrate is minimal

The optimal classification rule is the Bayes rule :

x ∈ C1(F ) ⇔ π1(F )f1(x) > π2(F )f2(x)

(Anderson, 1958)

The 2-means procedure is optimal under the model

FN = 0.5 N(µ1, σ2) + 0.5 N(µ2, σ

2) with µ1 < µ2

(Qiu and Tamhane, 2007)


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition and properties of the first order influence

function

Hampel et al (1986) : For any statistical functional T and anydistribution F ,

IF(x ; T, F ) = limε→0

T(Fε) − T(F )

ε=

∂

∂εT(Fε)

∣

∣

∣

∣

ε=0

where

Fε = (1 − ε)F + ε∆x

(under condition of existence);

EF [IF(X ; T, F )] = 0;

T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough (Firstorder von Mises expansion of T at F ).


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

First order influence function of the error rate

Now, the training sample is distributed as Fε which is acontaminated mixture.

ER(Fε, Fm) =2

∑

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]

= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion



ER(Fε, Fm) =2

∑

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]


ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)

ER(Fε, FN) ≥ ER(FN , FN)


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion



ER(Fε, Fm) =2

∑

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]


ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)

ER(Fε, FN) ≥ ER(FN , FN)

⇒ IF(x ; ER, FN) ≡ 0


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion


Proposition

The influence function of the error rate of the generalized

2-means classification procedure is given by

IF(x ; ER, F ) =1

2IF(x ; T1, F ) + IF(x ; T2, F )

π2(F )f2(C (F )) − π1(F )f1(C (F ))

for all x 6= C (F ).

Expressions of IF(x ; T1, F ) and IF(x ; T2, F ) were computed byGarcía-Escudero and Gordaliza (1999).


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

New contaminated example with the 2-medoids


Length of jump

0 2 4 6 8


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Penalty functions

2-means : Ω(x) = x2

2-medoids : Ω(x) = x

2-Tukey’s : Ω(x) = b2

6

1 −[

1 −(

xb

)2]3

if |x | ≤ b

1 if |x | > b

with b = 2.795

0 1 2 3 4

01

23

4

2−means2−medoids2−Tukey’s

Ω(x

)

x


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Model distributions

F = π1 N(−∆/2, 1) + (1 − π1) N(∆/2, 1)

with

π1 = 0.4 and ∆ = 3

π1 is varying and ∆ = 3

π1 = 0.4 and ∆ is varying


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of IF(x ; ER, F )

−4 −2 0 2 4

−0.

2−

0.1

0.0

0.1

0.2

x

IF E

R

2−means2−medoids2−Tukey’s


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion


−4 −2 0 2 4

−2.

0−

1.0

0.0

0.5

2−means

x

IF E

R

π1=0.4

π1=0.6

π1=0.2

π1=0.8


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion


−4 −2 0 2 4

−0.

2−

0.1

0.0

0.1

0.2

2−medoids

x

IF E

R∆=1

∆=3

∆=5


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition of the second order influence function

Under the model FN , IF(x ; ER, FN) ≡ 0

One needs to go a step further !


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion




For any statistical functional T and any distribution F ,

IF2(x ; T, F ) =∂2

∂ε2T(Fε)

∣

∣

∣

∣

ε=0

where Fε = (1 − ε)F + ε∆x (under condition of existence).


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion




For any statistical functional T and any distribution F ,

IF2(x ; T, F ) =∂2

∂ε2T(Fε)

∣

∣

∣

∣

ε=0

where Fε = (1 − ε)F + ε∆x (under condition of existence).Second order von Mises expansion of ER at FN :

ER(Fε, FN) ≈ ER(FN , FN) +ε2

2IF2(x ; T, FN)

for ε small enough.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Second order influence function of the error rate

under optimality

Proposition

Under the optimal model FN , the second order influence

function of the error rate of the generalized 2-means

classification procedure is given by

IF2(x ; ER, FN) = −1

4(fN1)

′

(

µ1 + µ2

2

)

IF(x ; T1, FN) + IF(x ; T2, FN)2

for all x 6= µ1+µ2

2. This expression is always positive.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Asymptotic loss

A measure of the expected increase in error rate whenestimating the optimal clustering rule from a finite sample withempirical cdf Fn :

A-Loss = limn→+∞

n EFN[ER(Fn, FN) − ER(FN , FN)].

As in Croux et al. (2008) :

Proposition

Under some regularity conditions of the clusters’centers

estimators,

A-Loss =1

2EFN

[IF2(X ; ER, FN)]


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion


A measure of the price one needs to pay in error rate forprotection against the outliers when using a robust procedureinstead of the classical one :

ARCE(Robust,Classical) =A-Loss(Classical)

A-Loss(Robust).


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of ARCE

FN = 0.5 N(−∆, 1) + 0.5 N(∆, 1)

0 2 4 6 8

0.55

0.60

0.65

0.70

0.75

0.80

0.85

AR

CE

2−medoids2−Tukey’s

∆


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Conclusion

The generalized 2-means procedure can give a more robustestimator of the error rate with a good choice of thepenalty function;

The price to pay is a loss in efficiency (depending also onthe penalty function).


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Outline

1 Introduction

2 Error rate





4 Conclusions

Conclusions

Future research


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Future research

Generalized trimmed 2-means procedure : for α ∈ [0, 1],(T1(F ), T2(F )) are solutions of

minA:F (A)=1−α

mint1,t2⊂R

∫

A

Ω

(

inf1≤j≤2

‖x − tj‖

)

dF (x)

(Cuesta-Albertos, Gordaliza, and Matrán, 1997);

Other robustness properties of the generalized 2-meansmethod defined with nondecreasing penalty functioninstead of strictly increasing;

More than 1 dimension and/or more than 2 groups.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Thank you for your attention!


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

Anderson T.W., An Introduction to Multivariate Statistical

Analysis, Wiley, New-York, 1958, pp. 126-133.

Croux C., Filzmoser P., and Joossens K. (2008),Classification efficiences for robust linear discriminantanalysis, Statistica Sinica 18, pp. 581-599.

Croux C., Haesbroeck G., and Joossens K. (2008), Logisticdiscrimination using robust estimators : an influencefunction approach, The Canadian Journal of Statistics, 36,pp. 157-174.

Fernholz L. T., On multivariate higher order von Misesexpansions in Metrika 2001, vol. 53, pp. 123-140.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

García-Escudero L. A., and Gordaliza A., RobustnessProperties of k Means and Trimmed k Means, Journal of

the American Statistical Association, September 1999, Vol.94, n 447, pp. 956-969.

Hampel F.R., Ronchetti E.M., Rousseeuw P.J., and StahelW.A., Robust Statistics : The Approach Based on InfluenceFunctions, John Wiley and Sons, New-York, 1986.

Hand D.J., Daly F., Lunn A.D., McConway K.J., andOstrowski E.,A Handbook of Small Data Sets, Chapmanand Hall, London, 1991.


the error rateof


Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

Pollard D., Strong Consistency of k-Means Clustering, The

Annals of Probability, 1981, Vol.9, nř4, pp.919-926.

Pollard D., A Central Limit Theorem for k-MeansClustering, The Annals of Probability, 1982, Vol.10, nř1,pp.135-140.

Qiu D. and Tamhane A. C. (2007), A comparative study ofthe k-means algorithm and the normal mixture model forclustering : Univariate case, Journal of Statistical Planning

and Inference, 137, pp. 3722-3740.

Influence function of the error rate of classification …...Influence function of the error rate of...

Documents

Transcript of Influence function of the error rate of classification …...Influence function of the error rate of...