Influence function of the error rate of classification …...Influence function of the error rate of...

55
Influence function of the error rate of classification based on clustering Ch. Ruwet Introduction Error rate IF of the error rate Conclusion Influence function of the error rate of classification based on clustering Joint work with G. Haesbroeck Ch. Ruwet Department of Mathematics - University of Liège 19 May 2009 [email protected]

Transcript of Influence function of the error rate of classification …...Influence function of the error rate of...

Page 1: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Influence function of the error rate of

classification based on clustering

Joint work with G. Haesbroeck

Ch. Ruwet

Department of Mathematics - University of Liège

19 May 2009

[email protected]

Page 2: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 3: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 4: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Classification setting

Suppose

X ∼ F arises from G1 and G2 with πi (F ) = IPF [X ∈ Gi ]

then

F is a mixture of two distributions

F = π1(F )F1 + π2(F )F2

with density f = π1(F )f1 + π2(F )f2.

Additional assumption : one dimension !

Page 5: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

The generalized 2-means clustering method

Aim of clustering : Find estimations C1(F ) and C2(F )(called clusters) of the two underlying groups.

The clusters’ centers (T1(F ), T2(F )) are solutions of

mint1,t2⊂R

Ω

(

inf1≤j≤2

‖x − tj‖

)

dF (x)

for a suitable nondecreasing penalty functionΩ : R

+ → R+.

Classical penalty functions :

Ω(x) = x2 → 2-means method

Ω(x) = x → 2-medoids method

Page 6: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Classification rule

The classification rule is

RF (x) = Cj(F ) ⇔ Ω(‖x−Tj(F )‖) = min1≤i≤2

Ω(‖x−Ti (F )‖)

In one dimension, the clusters are simply :

C1(F ) =] −∞, C (F )[

C2(F ) =]C (F ), +∞[

where C (F ) =T1(F ) + T2(F )

2is the cut-off point.

Page 7: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example (from Hand et al., 1991)

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

Page 8: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

Page 9: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

Page 10: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-means

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

Page 11: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-medoids

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

Page 12: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 13: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen

Page 14: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example with the 2-means

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen

Page 15: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example with the 2-means

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen

Page 16: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Example with the 2-means

1988 Olympic games in long jump

Length of jump

5.0 5.5 6.0 6.5 7.0 7.5

MenWomen

ER=0.1

Page 17: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

MenWomen

Page 18: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-means

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

MenWomen

Page 19: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-means

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

MenWomen

Page 20: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-means

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

MenWomen

ER=0.45

Page 21: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Contaminated example with the 2-medoids

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

MenWomen

ER=0.12

Page 22: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Error rate

Training sample according to F : estimation of the rule

Test sample according to Fm : evaluation of the rule

In ideal circumstances : F = Fm

ER(F , Fm) =

2∑

j=1

πj(Fm)IPFm[RF (X ) 6= Cj(F )|Gj ]

Page 23: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Optimality in classification

A classification rule is optimal if the corresponding errorrate is minimal

The optimal classification rule is the Bayes rule :

x ∈ C1(F ) ⇔ π1(F )f1(x) > π2(F )f2(x)

(Anderson, 1958)

The 2-means procedure is optimal under the model

FN = 0.5 N(µ1, σ2) + 0.5 N(µ2, σ

2) with µ1 < µ2

(Qiu and Tamhane, 2007)

Page 24: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 25: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 26: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition and properties of the first order influence

function

Hampel et al (1986) : For any statistical functional T and anydistribution F ,

IF(x ; T, F ) = limε→0

T(Fε) − T(F )

ε=

∂εT(Fε)

ε=0

where

Fε = (1 − ε)F + ε∆x

(under condition of existence);

EF [IF(X ; T, F )] = 0;

T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough (Firstorder von Mises expansion of T at F ).

Page 27: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition and properties of the first order influence

function

Hampel et al (1986) : For any statistical functional T and anydistribution F ,

IF(x ; T, F ) = limε→0

T(Fε) − T(F )

ε=

∂εT(Fε)

ε=0

where

Fε = (1 − ε)F + ε∆x

(under condition of existence);

EF [IF(X ; T, F )] = 0;

T(Fε) ≈ T(F ) + εIF(x ; T, F ) for ε small enough (Firstorder von Mises expansion of T at F ).

Page 28: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

First order influence function of the error rate

Now, the training sample is distributed as Fε which is acontaminated mixture.

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]

= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))

Page 29: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

First order influence function of the error rate

Now, the training sample is distributed as Fε which is acontaminated mixture.

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]

= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))

ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)

ER(Fε, FN) ≥ ER(FN , FN)

Page 30: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

First order influence function of the error rate

Now, the training sample is distributed as Fε which is acontaminated mixture.

ER(Fε, Fm) =2

j=1

πj(Fm)IPFm[RFε

(X ) 6= Cj(Fε)|Gj ]

= π1(Fm)1 − Fm,1(C (Fε)) + π2(Fm)Fm,2(C (Fε))

ER(Fε, FN) ≈ ER(FN , FN) + εIF(x ; ER, FN)

ER(Fε, FN) ≥ ER(FN , FN)

⇒ IF(x ; ER, FN) ≡ 0

Page 31: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

First order influence function of the error rate

Proposition

The influence function of the error rate of the generalized

2-means classification procedure is given by

IF(x ; ER, F ) =1

2IF(x ; T1, F ) + IF(x ; T2, F )

π2(F )f2(C (F )) − π1(F )f1(C (F ))

for all x 6= C (F ).

Expressions of IF(x ; T1, F ) and IF(x ; T2, F ) were computed byGarcía-Escudero and Gordaliza (1999).

Page 32: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

New contaminated example with the 2-medoids

1988 Olympic games in long jump

Length of jump

0 2 4 6 8

Page 33: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Penalty functions

2-means : Ω(x) = x2

2-medoids : Ω(x) = x

2-Tukey’s : Ω(x) = b2

6

1 −[

1 −(

xb

)2]3

if |x | ≤ b

1 if |x | > b

with b = 2.795

0 1 2 3 4

01

23

4

2−means2−medoids2−Tukey’s

Ω(x

)

x

Page 34: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Model distributions

F = π1 N(−∆/2, 1) + (1 − π1) N(∆/2, 1)

with

π1 = 0.4 and ∆ = 3

π1 is varying and ∆ = 3

π1 = 0.4 and ∆ is varying

Page 35: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of IF(x ; ER, F )

−4 −2 0 2 4

−0.

2−

0.1

0.0

0.1

0.2

x

IF E

R

2−means2−medoids2−Tukey’s

Page 36: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of IF(x ; ER, F )

−4 −2 0 2 4

−2.

0−

1.0

0.0

0.5

2−means

x

IF E

R

π1=0.4

π1=0.6

π1=0.2

π1=0.8

Page 37: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of IF(x ; ER, F )

−4 −2 0 2 4

−0.

2−

0.1

0.0

0.1

0.2

2−medoids

x

IF E

R∆=1

∆=3

∆=5

Page 38: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 39: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition of the second order influence function

Under the model FN , IF(x ; ER, FN) ≡ 0

One needs to go a step further !

Page 40: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition of the second order influence function

Under the model FN , IF(x ; ER, FN) ≡ 0

One needs to go a step further !

For any statistical functional T and any distribution F ,

IF2(x ; T, F ) =∂2

∂ε2T(Fε)

ε=0

where Fε = (1 − ε)F + ε∆x (under condition of existence).

Page 41: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Definition of the second order influence function

Under the model FN , IF(x ; ER, FN) ≡ 0

One needs to go a step further !

For any statistical functional T and any distribution F ,

IF2(x ; T, F ) =∂2

∂ε2T(Fε)

ε=0

where Fε = (1 − ε)F + ε∆x (under condition of existence).Second order von Mises expansion of ER at FN :

ER(Fε, FN) ≈ ER(FN , FN) +ε2

2IF2(x ; T, FN)

for ε small enough.

Page 42: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Second order influence function of the error rate

under optimality

Proposition

Under the optimal model FN , the second order influence

function of the error rate of the generalized 2-means

classification procedure is given by

IF2(x ; ER, FN) = −1

4(fN1)

(

µ1 + µ2

2

)

IF(x ; T1, FN) + IF(x ; T2, FN)2

for all x 6= µ1+µ2

2. This expression is always positive.

Page 43: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 44: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Asymptotic loss

A measure of the expected increase in error rate whenestimating the optimal clustering rule from a finite sample withempirical cdf Fn :

A-Loss = limn→+∞

n EFN[ER(Fn, FN) − ER(FN , FN)].

As in Croux et al. (2008) :

Proposition

Under some regularity conditions of the clusters’centers

estimators,

A-Loss =1

2EFN

[IF2(X ; ER, FN)]

Page 45: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Asymptotic relative classification efficiencies

A measure of the price one needs to pay in error rate forprotection against the outliers when using a robust procedureinstead of the classical one :

ARCE(Robust,Classical) =A-Loss(Classical)

A-Loss(Robust).

Page 46: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

First order IF

Second orderIF

ARCE

Conclusion

Graph of ARCE

FN = 0.5 N(−∆, 1) + 0.5 N(∆, 1)

0 2 4 6 8

0.55

0.60

0.65

0.70

0.75

0.80

0.85

AR

CE

2−medoids2−Tukey’s

Page 47: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 48: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 49: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Conclusion

The generalized 2-means procedure can give a more robustestimator of the error rate with a good choice of thepenalty function;

The price to pay is a loss in efficiency (depending also onthe penalty function).

Page 50: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Outline

1 Introduction

2 Error rate

3 Influence functions of the error rate

First order influence function

Second order influence function

Asymptotic relative classification efficiencies

4 Conclusions

Conclusions

Future research

Page 51: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Future research

Generalized trimmed 2-means procedure : for α ∈ [0, 1],(T1(F ), T2(F )) are solutions of

minA:F (A)=1−α

mint1,t2⊂R

A

Ω

(

inf1≤j≤2

‖x − tj‖

)

dF (x)

(Cuesta-Albertos, Gordaliza, and Matrán, 1997);

Other robustness properties of the generalized 2-meansmethod defined with nondecreasing penalty functioninstead of strictly increasing;

More than 1 dimension and/or more than 2 groups.

Page 52: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Thank you for your attention!

Page 53: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

Anderson T.W., An Introduction to Multivariate Statistical

Analysis, Wiley, New-York, 1958, pp. 126-133.

Croux C., Filzmoser P., and Joossens K. (2008),Classification efficiences for robust linear discriminantanalysis, Statistica Sinica 18, pp. 581-599.

Croux C., Haesbroeck G., and Joossens K. (2008), Logisticdiscrimination using robust estimators : an influencefunction approach, The Canadian Journal of Statistics, 36,pp. 157-174.

Fernholz L. T., On multivariate higher order von Misesexpansions in Metrika 2001, vol. 53, pp. 123-140.

Page 54: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

García-Escudero L. A., and Gordaliza A., RobustnessProperties of k Means and Trimmed k Means, Journal of

the American Statistical Association, September 1999, Vol.94, n 447, pp. 956-969.

Hampel F.R., Ronchetti E.M., Rousseeuw P.J., and StahelW.A., Robust Statistics : The Approach Based on InfluenceFunctions, John Wiley and Sons, New-York, 1986.

Hand D.J., Daly F., Lunn A.D., McConway K.J., andOstrowski E.,A Handbook of Small Data Sets, Chapmanand Hall, London, 1991.

Page 55: Influence function of the error rate of classification …...Influence function of the error rate of classification based on ... ... A

Influencefunction of

the error rateof

classificationbased onclustering

Ch. Ruwet

Introduction

Error rate

IF of the

error rate

Conclusion

Conclusion

Futureresearch

Bibliography

Pollard D., Strong Consistency of k-Means Clustering, The

Annals of Probability, 1981, Vol.9, nř4, pp.919-926.

Pollard D., A Central Limit Theorem for k-MeansClustering, The Annals of Probability, 1982, Vol.10, nř1,pp.135-140.

Qiu D. and Tamhane A. C. (2007), A comparative study ofthe k-means algorithm and the normal mixture model forclustering : Univariate case, Journal of Statistical Planning

and Inference, 137, pp. 3722-3740.