Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We...

4
Genetic Approach and HOTELLING Criterion for Selecting the Optimal Attribute Vector: Application to a Supervised Classification of Texture Images M. Nasri and M. EL Hitmy University of Mohamed I, EST, Labo MATSI, B.P 473, OUJDA, MOROCCO [email protected] AbstractSelecting the parameters for the classification is a delicate procedure. We present in this work a new method for selecting the parameters based on the genetic approach which optimizes the choice of parameters by minimizing a cost function. This function is defined by a new criterion that we have proposed. The proposed approach is validated on some texture images. The experimental results show the good performance of the proposed method. I. INTRODUCTION texture is spatial repetition of a basic pattern in different directions [1]. The classification [2] of texture images requires a robust selection of the specific parameters [3][4]. Selecting the parameters for the classification is a delicate procedure [5][6][7][8][9][10][11]. There exist various methods for selecting the parameters [1]. We present in this work a new method for selecting the parameters [12][13]. II. TEXTURE PARAMETERS Various methods of texture characterization of images has been proposed. HARALICK has proposed 14 descriptive texture parameters all extracted from the cooccurrence matrix. Most frequently used are [14]: Hom, HomL, Ent, Uni, Dir, Cont and Cor. GALLOWAY has defined 5 parameters for characterizing textures from the matrix of run length, which are [15]: SRE, LRE, GLN, RLN and RP. III. GENETIC ALGORITHMS Genetic algorithms are techniques for optimizing functions. GA are based on the evolution of a population of solutions which under the action of some precise rules optimize a given behavior, which initially has been formulated by a given specified function fitness function [16]. A GA manipulates a population of constant size. This population is formed by c hromosomes. Each chromosome represents the coding of a potential solution to the problem to be solved, it is formed by a set of genes belonging to an alphabet [17]. At each iteration is created a new population by applying the genetic operators: se lection, crossover and mutation [18]. The algorithm chooses in selection the most pertinent candidates. Crossover consists in building 2 new chromosomes from 2 old ones referred to as the parents. Mutation realizes the inversion of one or several genes in a chromosome [18]. Random generation of the initial population Fitness evaluation of each chromosome Repeat Selection Crossover Mutation Fitness evaluation of each chromosome Until Satisfying the stop criterion Fig. 1. Basic genetic algorithm. IV. PARAMETERS SELECTION PROBLEM FORMULATION A. Descriptive elements Let consider a set of M texture images {I i } 1£i £M characterized by N texture parameters written in a line vector V = ( a j ). Let R i = ( a ij ) 1j N be a line vector of R N where a ij is the value of a j for I i . Let E V = {a j } 1£j £N . Let mat_va : N j M i ij a va mat = 1 1 ) ( _ (1) V is the attribute vector , R i is the realization of V for I i , R N is the parameter space [2][1][10], E V is the set associated with V and mat_va is the observation matrix associated with V. The i th line of mat_va is R i . Each R i belongs to the class CL s , s=1, …, C. Let j a col a vector of M dimension associated with a j . Let j a m be the mean value and let j a s be the standard deviation of a j : ( M i ij a a col j = 1 (2) = = M i ij a a M m j 1 1 (3) 2 / 1 1 ² 1 = - = M i m ij a M j a j a s (4) The linear correlation coefficient between a j and a l is : l j l j a a a a l j col col a a r s s ) , cov( ) , ( = (5) Let ) , ( s j CL a m be the mean value and let ) , ( s j CL a s be the standard deviation of a j of the CL s class realizations: = s i s j CL R i ij CL a a s CL card m ) ( 1 ) , ( (6) 2 / 1 ² ) , ( ) ( 1 ) , ( - = s CL i R i CL a m ij a s CL card s j s j CL a s (7) A

Transcript of Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We...

Page 1: Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We have improved the HOTELLING criterion by introducing the correlation as an additional

Genetic Approach and HOTELLING Criterion for Selecting the Optimal Attribute

Vector: Application to a Supervised Classification of Texture Images

M. Nasri and M. EL Hitmy University of Mohamed I, EST, Labo MATSI, B.P 473, OUJDA, MOROCCO

[email protected]

Abstract—Selecting the parameters for the classification is a delicate procedure. We present in this work a new method for selecting the parameters based on the genetic approach which optimizes the choice of parameters by minimizing a cost function. This function is defined by a new criterion that we have proposed. The proposed approach is validated on some texture images. The experimental results show the good performance of the proposed method.

I. INTRODUCTION

texture is spatial repetition of a basic pattern in different directions [1]. The classification [2] of texture

images requires a robust selection of the specific parameters [3][4]. Selecting the parameters for the classification is a delicate procedure [5][6][7][8][9][10][11]. There exist various methods for selecting the parameters [1]. We present in this work a new method for selecting the parameters [12][13].

II. TEXTURE PARAMETERS

Various methods of texture characterization of images has been proposed. HARALICK has proposed 14 descriptive texture parameters all extracted from the cooccurrence matrix. Most frequently used are [14]: Hom, HomL, Ent, Uni, Dir, Cont and Cor. GALLOWAY has defined 5 parameters for characterizing textures from the matrix of run length, which are [15]: SRE, LRE, GLN, RLN and RP.

III. GENETIC ALGORITHMS

Genetic algorithms are techniques for optimizing functions. GA are based on the evolution of a population of solutions which under the action of some precise rules optimize a given behavior, which initially has been formulated by a given specified function fitness function [16]. A GA manipulates a population of constant size. This population is formed by chromosomes. Each chromosome represents the coding of a potential solution to the problem to be solved, it is formed by a set of genes belonging to an alphabet [17]. At each iteration is created a new population by applying the genetic operators: selection, crossover and mutation [18]. The algorithm chooses in selection the most pertinent candidates. Crossover consists in building 2 new chromosomes from 2 old ones referred to as the parents. Mutation realizes the inversion of one or several genes in a chromosome [18].

Random generation of the initial population Fitness evaluation of each chromosome Repeat Selection Crossover

Mutation Fitness evaluation of each chromosome Until Satisfying the stop criterion

Fig. 1. Basic genetic algorithm.

IV. PARAMETERS SELECTION PROBLEM FORMULATION

A. Descriptive elements Let consider a set of M texture images {Ii}1≤i≤M characterized by N texture parameters written in a line vector V = (aj). Let Ri = (aij) 1≤j≤N be a line vector of RN where aij is the value of aj for Ii. Let EV = {aj}1≤j≤N. Let mat_va :

NjMiijavamat

≤≤≤≤=

11)(_ (1)

V is the attribute vector, Ri is the realization of V for Ii, RN

is the parameter space [2][1][10], EV is the set associated with V and mat_va is the observation matrix associated with V. The ith line of mat_va is Ri. Each Ri belongs to the class

CLs, s=1, …, C. Let jacol a vector of M dimension

associated with aj. Let jam be the mean value and let

jaσ be

the standard deviation of aj :

( )Mi

ija acol j≤≤

=1

(2)

∑=

=M

iija aMm j

1

1

(3)

2/1

1

²1

∑=

−=

M

imijaM jajaσ

(4)

The linear correlation coefficient between aj and al is :

lj

lj

aa

aalj

colcolaar

σσ),cov(

),( =

(5)

Let ),( sj CLam be the mean value and let ),( sj CLaσ be the

standard deviation of aj of the CLs class realizations:

∑∈

=

si

sj

CLRi

ijCLa asCLcardm )(

1),(

(6)

2/1

²

),()(1

),(

−=

sCLiRi CLamija

sCLcard sjsj CLaσ

(7)

A

Page 2: Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We have improved the HOTELLING criterion by introducing the correlation as an additional

Let V(m) be the vector formed by the mean value of each aj

and let ),( sCLmV be the vector formed by the mean value of

each aj of the CLs class realizations:

( )Njam

jmV

≤≤=

1)(

(8)

( )NjCLaCLm sjs

mV≤≤

=1),(),(

(9)

The total variance matrix TV associated with V is [1] [12]:

( ) ( )t)(

1)( 1

mti

M

im

tiV

VRVRMT −−= ∑=

(10)

The intra-class matrix WV associated with V is [1] [12]:

( )( )∑ ∑=

−−=C

sCLRi

t

CLmCLmV

si

ssVt

iRVtiRMW

1),(),(

1

(11)

The inter-class matrix associated with V is BV = TV - WV. B. Formulation of the optimization problem We start with Vinit = (a1 a2 ... aj ... aN), and with M realizations Ri of this vector. Each Ri, associated with Ii, belongs to CLs, s=1,…, C. The CLs classes are :

( ) 11

ssbisb

i CLR ∈≤≤−+

(12)

where b0 = 0, bC =M b1,b2, ..., bC-1 : integers delimiting the C classes.

The objective is to extract, among the N parameters (aj)1≤j≤N of Vinit, the q (where q<<N) most pertinent parameters in the sense of the classification performances. The q parameters retained constitute Vopt.

V. THE HOTELLING CRITERION HOTELLING criterion was proposed by HOTELLING in 1951 [13][7] [12]. This criterion estimates the discriminate power of the set of parameters based on a measurement of the classes separability and compacity:

)(),( 1−=kk VVk WBtraceVqJHot

(13)

with Vk of q dimension extracted from Vinit.JHot is to be maximized [12], it was used in [6].

VI. THE PROPOSED CRITERION

We have improved the HOTELLING criterion by introducing the correlation as an additional factor for selecting the parameters. We have thus proposed the following criterion J:

=

ljkV

Elajalajar

kVW

kVBtrace

VqJ k

2),(),(

)1(

),(

(14)

J is to be maximized.

VII. GENETIC ALGORITHM STRATEGY FOR THE ATTRIBUTE VECTOR OPTIMIZATION CASE

A. The proposed coding Vk may have one to N components, then for each Vk is associated a chromosome chrk of N binary genes gkj:

Njkjk gchr ≤≤= 1)(

(15)

where

= otherwise 0

if 1kVEja

kjg (16)

Vk may not be a possible solution unless if:

qgVDimN

jkjk ==∑

=1)(

(17)

B. The proposed fitness functions We define the following fitness functions corresponding to two criteria JHot and J :

)1(1)(

−=

kk VWVBtracechrF kHot

(18)

)1(

2),(

),(

)(−

=

kk VWVBtrace

ljkV

Elajalajar

chrF k

(19)

Vk is the optimal solution according to the J (respectively JHot) criterion if F(chrk) (respectively FHot(chrk)) is minimal.

VIII. EXPERIMENTAL RESULTS AND EVALUATIONS For both experiments, the value of q retained is 2 and Vinit is : ( )RPRLNGLNLRESREContDirUniEntHomLHominitV = (20) A. First experiment Each class contains 40 texture images.

I1

1≤i≤40, Ii ∈CL1

I41

41≤i≤80, Ii ∈CL2

I81 81≤i≤120, Ii ∈ CL3

Fig. 2. Three test images representing the three classes of textures.

0 2 4 6 8 10 12 14 16 18 200.03

0.04

0.05

0.06

0.07

0.08

0.09

Generation

F Hot

Fig. 3. Fitness evolution for the JHot criterion.

Page 3: Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We have improved the HOTELLING criterion by introducing the correlation as an additional

0 2 4 6 8 10 12 14 16 18 200.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

Generation

F

Fig. 4. Fitness evolution for the J criterion.

Criterion JHot J Vopt (Uni Cont) (Uni SRE)

r 0.924 0.532 Tab. 1. Parameters space and correlation coefficient.

2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7

x 10-5

0.004

0.006

0.008

0.01

0.012

0.014

0.016

0.018

0.02

Uni

Class CL1Class CL2Class CL3

Cont

Fig. 5. Representation of the observations in (Uni Cont).

1.9 2.4 2.9 3.4 3.9 4.4 4.9 5.4 5.9 6.4 6.9

x 10-5

0.955

0.96

0.965

0.97

0.975

0.98

0.985

0.99

Uni

Class CL1Class CL2Class CL3

SRE

Fig. 6. Representation of the observations in (Uni SRE).

2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7

x 10-5

0.004

0.006

0.008

0.01

0.012

0.014

0.016

0.018

0.02

Uni

Center Class CL1Class CL2Class CL3

Cont

Fig. 7. Classes obtained in the (Uni Cont) space.

1.9 2.4 2.9 3.4 3.9 4.4 4.9 5.4 5.9 6.4 6.9

x 10-5

0.955

0.96

0.965

0.97

0.975

0.98

0.985

0.99

Uni

Center Class CL1Class CL2Class CL3

SRE

Fig. 8. Classes obtained in the (Uni SRE) space.

Criterion JHot J Number of misclassified observations 0 1

Error rate τ (%) 0 0.083 Tab. 2. Error rate of classification in each space.

The discrimination of the 3 classes for J is realized without information redundancy.

B. Second experiment Each class contains 40 texture images.

I1

1≤i≤40 , Ii ∈CL1

I41

41≤i≤80, Ii ∈CL2

I81 81≤i≤120, Ii ∈CL3

Fig. 9. Three test images representing the three classes of textures.

0 2 4 6 8 10 12 14 16 18 200.015

0.02

0.025

0.03

0.035

0.04

Generation

F Hot

Fig. 10. Fitness evolution for the JHot criterion.

0 2 4 6 8 10 12 14 16 18 200.012

0.014

0.016

0.018

0.02

0.022

0.024

0.026

0.028

0.03

0.032

Generation

F

Fig. 11. Fitness evolution for the J criterion.

Criterion JHot J Vopt (Uni GLN) (Uni Cont)

r 0.894 0.757

Page 4: Genetic Approach and HOTELLING Criterion for Selecting the ... · VI. THE PROPOSED CRITERION We have improved the HOTELLING criterion by introducing the correlation as an additional

Tab. 3. Parameters space and correlation coefficient.

0 1 2 3 4 5 6 7

x 10-5

10

15

20

25

30

35

40

45

50

55

Uni

Class CL1Class CL2Class CL3

GLN

Fig. 12. Representation of the observations in (Uni GLN).

0 2 4 6 8

x 10-5

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

Uni

Class CL1Class CL2Class CL3

Cont

Fig. 13. Representation of the observations in (Uni Cont).

0 1 2 3 4 5 6 7

x 10-5

10

15

20

25

30

35

40

45

50

55

Uni

Center Class CL1Class CL2Class CL3

GLN

Fig. 14. Classes obtained in the (Uni GLN) space.

0 2 4 6 8

x 10-5

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

Uni

Center Class CL1Class CL2Class CL3

Cont

Fig. 15. Classes obtained in the (Uni Cont) space.

Criterion JHot J Number of misclassified observations 0 0

Error rate τ (%) 0 0 Tab. 4: Error rate of classification in each space.

With respect to the separability, J and JHot behave the same way. With respect to the compacity J behaves in a better way than JHot. J has obtained a space for which the parameters are less correlated than those obtained by JHot.

IX. CONCLUSION

We have presented a new selection criterion for the parameters inspired from the HOTELLING criterion and the correlation approach. We have used the genetic technique for optimizing the two criteria. We have performed two experiments on some texture images. The experimental results confirm the good performance of our approach with respect to the HOTELLING criterion.

REFERENCES

[1] T. P. Cocquerez and S. Phillip, Analyse d'images : filtrage et segmentation, Editions MASSON, Paris, 1995.

[2] D. Hamad, S. El Saad and J.G. Postaire, "Algorithmes d'apprentissage compétitif pour la classification automatique", TISVA'98, Oujda, Maroc, 27-28 Avril 1998. pp. 159-164.

[3] M. Dash and H. Liu, "Feature selection for classification", Intelligent Data Analysis, vol. 1, N°3, east.elsevier.com/ida/browse/0103/ida0103/article.htm, 1997.

[4] S. De Backer, A. Naud and P. Scheunders, "Non linear dimensionality reduction techniques for unsupervised feature extraction", Pattern Recognition Letters, Vol. 19, N°8, pp. 711-720, 1998.

[5] I. Ludimla et al., "Selection of cluster prototypes from data by a genetic algorithm", EUFIT'97, Aachaen, Germany, pp. 1683-1688, September 1997.

[6] C. Firmin, D. Hamad, J. G. Postaire and R. D. Zhang, " Feature extraction and selection for fault detection in production of glass bottles", Machine Graphics and Vision International Journal, Vol. 5, N°1, pp. 77-86, 1996.

[7] R. Tomassone, M. Danzart, J. J. Daudin and J. P. Masson, Discrimination et classement, Techniques stochastiques, Editions MASSON, Paris, 1988.

[8] J. Kittler, Feature selection and extraction, Handbook of Pattern Recognition and Image Processing, T.Y. Young et K. S. Fu ed., pp. 59-83, Academic Press, Orlando, FL, 1986.

[9] J. M. Romeder, Méthodes et programmes d’analyse discriminante, Editions DUNOD, Paris, 1973.

[10] J. G. Postaire, De l'image à la décision : analyse des images numériques et théorie de la décision, Editions DUNOD, 1987.

[11] R.O. Duda and P.E. Hart, Pattern classification and scene analysis, John Wiley and Sons, New York, 1973.

[12] N . Vandenbroucke, "Segmentation d’images couleur par classification de pixels dans des espaces d’attributs colorimétriques adaptés : Application à l’analyse d’images de football", Thèse de Doctorat, Université de Lille 1, France, 14 Décembre 2000.

[13] E. Diday, J. Lemaire, J. Pouget and F. Testu, Eléments d’analyse de données, Editions DUNOD, Paris, 1982.

[14]

R. M. Haralick, "Statistical and structural approaches to texture", Proceedings of the IEEE, vol. 67, N° 5, pp. 786-804, mai 1979.

[15] M. M. Galloway, "Texture analysis using gray level run lengths", Computer Graphics and Image Processing (CGIP), Vol. 4, pp. 172-179, 1975.

[16] E. Lutton, "Algorithmes génétiques et fractals", Dossier d'Habilitation à diriger des recherches, Université Paris XI Orsay, 11 Février 1999.

[17] M. Ludovic, "Audit de sécurité par algorithmes génétiques", Thèse de Doctorat, Université de Rennes 1, France, 7 Juillet 1994.

[18] J. M. Renders, Algorithmes génétiques et réseaux de neurons, Editions HERMES, 1995.