Classification. Adriano Joaquim de O Cruz, ©2002 NCE/UFRJ, adriano@nce.ufrj.br

Page 1: Classification

Adriano Joaquim de O Cruz ©2002

NCE/UFRJ

adriano@nce.ufrj.br

Page 2: Classification

©2001 Adriano Cruz, NCE and IM - UFRJ

• A technique that assigns samples to previously known classes.
• May be crisp or fuzzy.
• Supervised and trained: MLP (multilayer perceptron).
• Supervised but not trained: K-NN and fuzzy K-NN, which compare each sample directly with the labelled patterns.

Page 3: K-NN and Fuzzy K-NN

• Classification methods.
• Classes are identified by patterns.
• A sample is classified by its k nearest neighbours.
• Require previous knowledge about the classes of the problem.
• Not restricted to a specific distribution of the samples.

Page 4: Classification

Crisp K-NN

Page 5: Crisp K-NN

• Supervised clustering method (a classification method).
• Classes are defined beforehand.
• Classes are characterized by sets of elements.
• The number of elements may differ among classes.
• The main idea is to assign the sample to the class that contains most of its neighbours.

Page 6: Crisp K-NN

[Figure: patterns w1 to w14 grouped into Classes 1 to 5, with an unclassified sample s.]

With 3 nearest neighbours, sample s is closest to pattern w6 in class 5.

Page 7: Crisp K-NN

• Consider W = {w_1, w_2, ..., w_t}, a set of t labelled data.
• Each object w_i is defined by l characteristics: w_i = (w_i1, w_i2, ..., w_il).
• y is the unclassified input element.
• k is the number of nearest neighbours of y.
• E is the set of the k nearest neighbours (NN).

Page 8: Crisp K-NN

• Let t be the number of elements that identify the classes.
• Let c be the number of classes.
• Let W be the set that contains the t elements.
• Each cluster is represented by a subset of elements from W.

Page 9: Crisp K-NN algorithm

set k
{Calculating the NN}
for i = 1 to t
    calculate the distance from y to x_i
    if i <= k
        then add x_i to E
    else if x_i is closer to y than the farthest neighbour in E
        then delete the farthest neighbour and include x_i in the set E

Here x_i denotes the i-th labelled element of W.

Page 10: Crisp K-NN algorithm (cont.)

Determine the majority class represented in the set E and include y in this class.
if there is a draw
    then calculate the sum of distances from y to all neighbours in each class in the draw
        if the sums are different
            then add y to the class with the smallest sum
            else add y to the class where the last minimum was found
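The crisp K-NN rule and its tie-break can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `crisp_knn` and its arguments are hypothetical, the neighbours are found by sorting all distances instead of the incremental replacement used on the slides, and a draw that survives the distance-sum test is resolved by taking the first tied class met among the nearest neighbours, which is an assumption.

```python
import math
from collections import defaultdict


def crisp_knn(y, samples, labels, k):
    """Classify y by majority vote among its k nearest labelled samples."""
    # Sort the labelled samples by Euclidean distance to y and keep the k nearest.
    nearest = sorted((math.dist(y, x), lab) for x, lab in zip(samples, labels))[:k]

    votes = defaultdict(int)       # number of neighbours per class
    dist_sum = defaultdict(float)  # summed distance per class, used only for draws
    for d, lab in nearest:
        votes[lab] += 1
        dist_sum[lab] += d

    best = max(votes.values())
    tied = [lab for lab in votes if votes[lab] == best]
    # Draw: prefer the class with the smallest summed distance. If the sums are
    # also equal, min() keeps the first tied class (assumption, see lead-in).
    return min(tied, key=lambda lab: dist_sum[lab])
```

With `k = 2` and one neighbour from each of two classes, the vote is a draw and the class of the closer neighbour wins, because its summed distance is smaller.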

Page 11: Classification

Fuzzy K-NN

Page 12: Fuzzy K-NN

• The basis of the algorithm is to assign membership as a function of the object's distance from its k nearest neighbours and of those neighbours' memberships in the possible classes.
• J. Keller, M. Gray, J. Givens. A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-15, no. 4, July/August 1985.

Page 13: Fuzzy K-NN

[Figure: patterns w1 to w14 grouped into Classes 1 to 5.]

Page 14: Fuzzy K-NN

• Consider W = {w_1, w_2, ..., w_t}, a set of t labelled data.
• Each object w_i is defined by l characteristics: w_i = (w_i1, w_i2, ..., w_il).
• y is the unclassified input element.
• k is the number of nearest neighbours of y.
• E is the set of the k nearest neighbours (NN).
• μ_i(y) is the membership of y in class i.
• μ_ij is the membership in the i-th class of the j-th vector of the labelled set (labelled w_j in class i).

Page 15: Fuzzy K-NN

• Let t be the number of elements that identify the classes.
• Let c be the number of classes.
• Let W be the set that contains the t elements.
• Each cluster is represented by a subset of elements from W.

Page 16: Fuzzy K-NN algorithm

set k
{Calculating the NN}
for i = 1 to t
    calculate the distance from y to x_i
    if i <= k
        then add x_i to E
    else if x_i is closer to y than the farthest neighbour in E
        then delete the farthest neighbour and include x_i in the set E

Page 17: Fuzzy K-NN algorithm

Calculate μ_i(y) using

for i = 1 to c   // number of classes

$$\mu_i(y) = \frac{\sum_{j=1}^{k} \mu_{ij} \left( 1 / \lVert y - x_j \rVert^{2/(m-1)} \right)}{\sum_{j=1}^{k} \left( 1 / \lVert y - x_j \rVert^{2/(m-1)} \right)}$$

where x_j is the j-th nearest neighbour in E and m > 1 is the fuzzy constant.
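The membership formula weights each neighbour's class memberships by an inverse power of its distance to y. The sketch below is one way to evaluate it, assuming Euclidean distance. The function name `fuzzy_knn` and the layout of `memberships` are hypothetical, and the shortcut taken when y coincides with a labelled sample is an assumption added to avoid a division by zero.

```python
import math


def fuzzy_knn(y, samples, memberships, k, m=2.0):
    """Return the membership of y in each class.

    memberships[j][i] is the membership of labelled sample j in class i.
    """
    # k nearest labelled samples, as (distance, index) pairs.
    nearest = sorted((math.dist(y, x), j) for j, x in enumerate(samples))[:k]

    # Assumption: if y lies exactly on a labelled sample, copy that sample's memberships.
    for d, j in nearest:
        if d == 0.0:
            return list(memberships[j])

    exponent = 2.0 / (m - 1.0)
    weights = [1.0 / d ** exponent for d, _ in nearest]
    total = sum(weights)
    n_classes = len(memberships[0])
    return [
        sum(w * memberships[j][i] for w, (_, j) in zip(weights, nearest)) / total
        for i in range(n_classes)
    ]
```

Values of m close to 1 make the nearest neighbour dominate, while larger values spread the weight more evenly over the k neighbours.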

Page 18: Computing μ_ij

• The labelled samples can be assigned class memberships μ_ij in several ways.
• They can be given complete membership in their known class and no membership in all others.
• Assign membership based on the distance from their class mean.
• Assign membership based on the distance from labelled samples of their own class and from those of other classes.
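Two of these options can be sketched as follows. The first function gives crisp (one-hot) memberships. The second follows the neighbour-counting rule from the Keller, Gray and Givens paper, where a labelled sample keeps 0.51 of its membership in its own class and shares the remaining 0.49 according to the classes of its k nearest labelled neighbours. The function names are hypothetical, and the class-mean option is not shown.

```python
import math


def crisp_memberships(labels, classes):
    """Complete membership in the known class, none in the others."""
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]


def neighbour_memberships(samples, labels, classes, k):
    """Soften memberships using the classes of each sample's k nearest labelled neighbours."""
    result = []
    for j, x in enumerate(samples):
        # k nearest labelled samples other than x itself, as (distance, label) pairs.
        others = sorted(
            (math.dist(x, z), labels[q]) for q, z in enumerate(samples) if q != j
        )[:k]
        row = []
        for c in classes:
            n_c = sum(1 for _, lab in others if lab == c)
            share = 0.49 * n_c / k
            row.append(0.51 + share if labels[j] == c else share)
        result.append(row)
    return result
```

A sample deep inside its own class keeps a membership close to 1, while a sample near a class boundary gives part of its membership to the neighbouring class.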

Page 19: Classification

ICC-KNN System

Page 20: ICC-KNN System

• Non-parametric statistical pattern recognition system.
• Combines FCM (fuzzy c-means), fuzzy K-NN and ICC.
• Evaluates data arranged in classes of several shapes.

Page 21: ICC-KNN System

• Divided into two modules.
• First module (training):
    • chooses the best patterns to use with K-NN;
    • chooses the best fuzzy constant and the best number of neighbours (K).
• Second module (classification):
    • uses fuzzy K-NN to classify.

Page 22: ICC-KNN First Module

• Classification module.
• Finds structure in the data sample.
• Divided into two phases.
• First phase of training: finds the best patterns for fuzzy K-NN.
    • FCM: applied to each class using several numbers of categories.
    • ICC: finds the best number of categories to represent each class.

Page 23: ICC-KNN First Phase

• Results of applying FCM and ICC:
    • Patterns for K-NN: the centres of the chosen run of FCM.
    • Number of centres: all the centres of the number of categories selected after applying ICC to all FCM runs.

Page 24: ICC-KNN Second Phase

• Second phase of training.
• Evaluates the best fuzzy constant and the best number of neighbours so as to achieve the best performance of the K-NN:
    • tests several m and k values;
    • finds the m and k that give the maximum rate of crisp hits.

Page 25: ICC-KNN

• Pattern recognition module.
• Assigns each data item to its class.
• Uses the chosen patterns, m and k to classify the data.

Page 26: ICC-KNN block diagram

[Block diagram. Classification Module: Class 1 ... Class s → FCM → (U_1cmin ... U_1cmax; U_scmin ... U_scmax) → ICC → w_1 ... w_s → fuzzy K-NN, with m, k, W and U_w. Pattern Recognition Module: fuzzy K-NN applied to the not yet classified data, using W and U_w.]

Page 27: ICC-KNN

• Let R = {r_1, r_2, ..., r_n} be the set of samples.
• Each sample r_i belongs to one of s known classes.
• Let U_ic be the inclusion matrix for class i with c categories.
• Let V_ic be the centre matrix for class i with c categories.
• Let w_i be equal to the best V_ic of each class.
• Let W be the set of the sets of centres w_i.

Page 28: ICC-KNN algorithm

Classification Module
First phase of training
Step 1. Set m
Step 2. Set cmin and cmax
Step 3. For each known class s
            Generate the set Rs with the points from R belonging to class s
            For each number of categories c in the interval [cmin, cmax]
                Run FCM for c and the set Rs, generating Usc and Vsc
                Calculate ICC for Rs and Usc
            End
            Define the patterns ws of class s as the matrix Vsc that maximizes ICC
Step 4. Generate the set W = {w1, ..., ws}
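Steps 1 to 4 can be sketched as the loop below. The slides do not define how ICC is computed, so the validity measure is passed in as a hypothetical callable `validity(Rs, U)`. The `partition_coefficient` function is only a stand-in used to make the example run and is not ICC. The `fcm` helper is a plain fuzzy c-means with randomly chosen starting centres, written here as an assumption about the FCM step.

```python
import math
import random


def fcm(points, c, m=1.25, iters=30, seed=0):
    """Plain fuzzy c-means. Returns (U, V); U[i][e] is the inclusion of point e in category i."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, c)]
    power = 2.0 / (m - 1.0)
    U = []
    for _ in range(iters):
        U = [[0.0] * len(points) for _ in range(c)]
        for e, p in enumerate(points):
            d = [math.dist(p, v) for v in centres]
            if 0.0 in d:  # the point sits exactly on a centre
                U[d.index(0.0)][e] = 1.0
                continue
            for i in range(c):
                U[i][e] = 1.0 / sum((d[i] / d[l]) ** power for l in range(c))
        for i in range(c):
            w = [u ** m for u in U[i]]
            total = sum(w)
            if total > 0.0:  # keep the old centre if the category is empty
                centres[i] = [sum(wi * p[a] for wi, p in zip(w, points)) / total
                              for a in range(len(points[0]))]
    return U, centres


def partition_coefficient(points, U):
    """Stand-in validity measure (NOT the ICC of the slides)."""
    return sum(u * u for row in U for u in row) / len(points)


def first_phase(points, labels, c_min, c_max, validity, m=1.25):
    """For each class, keep the FCM centres of the run that maximizes `validity`."""
    patterns = {}
    for s in sorted(set(labels)):
        Rs = [p for p, lab in zip(points, labels) if lab == s]
        best = None
        for c in range(c_min, c_max + 1):
            U, V = fcm(Rs, c, m)
            score = validity(Rs, U)
            if best is None or score > best[0]:
                best = (score, V)
        patterns[s] = best[1]
    return patterns
```

The returned dictionary plays the role of W: one set of centres per class, whose size may differ from class to class.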

Page 29: ICC-KNN algorithm

Second phase of training
Step 5. Set mmin and mmax
Step 6. Set kmin and kmax
        For each m from [mmin, mmax]
            For each k from [kmin, kmax]
                Run fuzzy K-NN for the patterns from the set W, generating Umk
                Calculate the number of crisp hits for Umk
Step 7. Choose the m and k that yield the best crisp hit figures
Step 8. If there is a draw
            If the k's are different
                Choose the smaller k
            else
                Choose the smaller m
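The parameter search of steps 5 to 8 amounts to a grid search with a deterministic tie-break. The sketch below assumes a hypothetical callable `crisp_hits(m, k)` that runs fuzzy K-NN on the training data and returns the number of crisp hits. The function name `choose_m_and_k` is also an assumption.

```python
def choose_m_and_k(m_values, k_values, crisp_hits):
    """Pick the (m, k) with the most crisp hits; draws go to the smaller k, then the smaller m."""
    best = None
    for m in m_values:
        for k in k_values:
            hits = crisp_hits(m, k)
            # Tuples compare element by element: more hits first, then smaller k, then smaller m.
            key = (-hits, k, m)
            if best is None or key < best[0]:
                best = (key, m, k)
    return best[1], best[2]
```

Encoding the three criteria in one sort key keeps the tie-break rules of step 8 in a single place.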

Page 30: ICC-KNN algorithm

Pattern Recognition Module
Step 9. Apply fuzzy K-NN, using the patterns from the set W and the chosen parameters m and k, to the data to be classified.

Page 31: ICC-KNN results

Page 32: ICC-KNN results

• 2000 samples, 4 classes, 500 samples in each class.
• Classes 1 and 4: concave classes.
• Classes 2 and 3: convex classes, elliptic shape.

Page 33: ICC-KNN results

• First phase of training.
• FCM applied to each class:
    • training data: 80%, i.e. 400 samples from each class;
    • c = 3..7 and m = 1.25.
• ICC applied to the results:
    • Classes 1 and 4: 4 categories;
    • Classes 2 and 3: 3 categories.

Page 34: ICC-KNN results

• Second phase of training.
• Running fuzzy K-NN with:
    • patterns from the first phase;
    • random patterns;
    • k = 3 to 7 neighbours;
    • m = {1.1, 1.25, 1.5, 2}.

Page 35: ICC-KNN results

Conclusion: K-NN is more stable with respect to the value of m for the patterns from the first phase of training (PFT).

Page 36: ICC-KNN results

• Training data.
• Rows: classes. Columns: classification.
• Patterns from the first phase of training (PFT), m = 1.5 and k = 3: 96.25%.
• Random patterns, m = 1.1 and k = 3: 79.13%.

Training data

         PFT patterns            Random patterns
Class    1     2     3     4     1     2     3     4
1        388   10    0     2     213   66    0     121
2        14    379   0     7     19    380   0     1
3        0     0     376   24    3     0     324   73
4        0     1     2     397   46    4     1     349
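The hit rates quoted for the training data (96.25% and 79.13%) are the share of samples on the diagonal of each confusion matrix. The short check below recomputes them. The matrices are the values read from the slide's table (rows are true classes, columns are assigned classes), and the function name `hit_rate` is hypothetical.

```python
def hit_rate(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Training data, 400 samples per class.
pft_train = [[388, 10, 0, 2], [14, 379, 0, 7], [0, 0, 376, 24], [0, 1, 2, 397]]
random_train = [[213, 66, 0, 121], [19, 380, 0, 1], [3, 0, 324, 73], [46, 4, 1, 349]]

print(f"PFT patterns:    {100 * hit_rate(pft_train):.2f}%")
print(f"Random patterns: {100 * hit_rate(random_train):.3f}%")
```

Each row sums to 400, which matches the 400 training samples taken from each class.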

Page 37: ICC-KNN results

• Test data.
• Rows: classes. Columns: classification.
• PFT patterns: 94.75%. Random patterns: 79%.

Test data

         PFT patterns            Random patterns
Class    1     2     3     4     1     2     3     4
1        97    2     0     1     53    27    0     20
2        4     93    0     3     4     96    0     0
3        0     0     90    10    0     0     82    18
4        0     0     1     99    0     15    0     85

Page 38: ICC-KNN x Others

• FCM, FKCN, GG and GK.
• Training phase (FTr):
    • training data;
    • c = 4 and m = {1.1, 1.25, 1.5, 2};
    • associate the categories with the classes.
• Criterion: the sum of the inclusion degrees.
    • Compute the sum of the inclusion degrees of the points of each class in each category.
    • A class may be represented by more than one category.

Page 39: ICC-KNN x Others

• Test phase:
    • test data;
    • the methods are initialized with the centres from the training phase (FTr);
    • compute the inclusion degree of the points in each category.
• Class represented by more than one category:
    • inclusion degree = sum of the inclusion degrees of the points in the categories that represent the class.
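The association between categories and classes can be sketched as below. The sketch assumes that each category is assigned to the class whose points have the largest summed inclusion degree in it, which is the natural reading of the criterion but is not spelled out on the slides. The function names and the layout of `U` are hypothetical.

```python
def assign_categories(U, labels, classes):
    """Map each category to a class by the sum of inclusion degrees.

    U[i][e] is the inclusion degree of training point e in category i.
    """
    mapping = []
    for row in U:
        sums = {s: 0.0 for s in classes}
        for u, lab in zip(row, labels):
            sums[lab] += u
        # Assumption: the class with the largest sum owns the category.
        mapping.append(max(classes, key=lambda s: sums[s]))
    return mapping


def class_degrees(u_point, mapping, classes):
    """Inclusion degree of one point in each class: sum over the categories that represent it."""
    return {s: sum(u for u, owner in zip(u_point, mapping) if owner == s) for s in classes}
```

A class owned by two categories therefore receives the sum of the point's inclusion degrees in both.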

Page 40: ICC-KNN x Others

• GK for m = 2: 84%.
• FCM and FKCN: 66% for m = 1.1 and m = 1.25.
• GG-FCM: 69% for m = 1.1 and 1.25.
• Random GG: 57.75% for m = 1.1 and 25% for m = 1.5.

     ICC-KNN   KNN (random)   FCM      FKCN     GG       GK
R    94.75%    79%            66%      66%      69%      84%
N    95.75%    83%            70.75%   70.75%   69%      89.5%
T    36.5s     23.11s         2.91s    2.59s    22.66s   18.14s

Page 41: GK

Page 42: GK

         GK classification
Class    1     2     3     4
1        77    6     0     17
2        6     94    0     0
3        0     0     97    3
4        0     0     32    68

Page 43: Classification

KNN + Fuzzy C-Means System

Page 44: KNN + Fuzzy C-Means algorithm

• The idea is a two-layer clustering algorithm.
• First, an unsupervised tracking of the cluster centres is made using K-NN rules.
• The second layer involves one iteration of fuzzy c-means to compute the membership degrees and the new fuzzy centres.
• Ref.: N. Zahid et al., Fuzzy Sets and Systems 120 (2001) 239-247.

Page 45: First Layer (K-NN)

• Let X = {x_1, ..., x_n} be a set of n unlabelled objects.
• c is the number of clusters.
• The first layer consists of partitioning X into c cells using the first part of K-NN.
• Each cell i (1 <= i <= c) is represented as E_i (y_i, K-NN of y_i, G_i).
• G_i is the centre of cell E_i and is defined as

$$G_i = \frac{1}{k} \sum_{x_k \in E_i} x_k$$

Page 46: KNN-1FCMA settings

• Let X = {x_1, ..., x_n} be a set of n unlabelled objects.
• Fix c, the number of clusters.
• Choose m > 1 (fuzziness factor).
• Set k = Integer(n/c - 1).
• Let I = {1, 2, ..., n} be the set of all indexes of X.

Page 47: KNN-1FCMA algorithm step 1

Calculate G_0
for i = 1 to c
    Search in I for the index of the object y_i farthest from G_(i-1)
    for j = 1 to n
        Calculate the distance from y_i to x_j
        if j <= k
            then add x_j to E_i
        else if x_j is closer to y_i than the farthest neighbour in E_i
            then delete the farthest neighbour and include x_j in the set E_i

Page 48: KNN-1FCMA algorithm (cont.)

    Include y_i in the set E_i.
    Calculate G_i.
    Delete the index of y_i and the indexes of the K-NN of y_i from I.
if I is not empty
    then for each remaining object x
        determine the minimum distance to any centre G_i of E_i.
        classify x to the nearest centre.
    update all centres.
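The first layer can be sketched as follows. Several details are assumptions, because the slides leave them open: G_0 is taken as the mean of all objects, the neighbours of y_i are searched only among the objects still in I, and each centre is the plain mean of the members of its cell. The function names `mean` and `first_layer` are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(p[a] for p in points) / len(points) for a in range(len(points[0]))]


def first_layer(X, c):
    """Partition X into c cells. Returns (cells, centres); cells hold indexes into X."""
    n = len(X)
    k = int(n / c - 1)
    I = set(range(n))
    G_prev = mean(X)  # G_0: assumed to be the centre of all objects
    cells, centres = [], []
    for _ in range(c):
        # y_i: the remaining object farthest from the previous centre.
        y = max(I, key=lambda j: math.dist(X[j], G_prev))
        # Assumption: the k nearest neighbours are searched among the objects still in I.
        rest = sorted(I - {y}, key=lambda j: math.dist(X[j], X[y]))
        cell = [y] + rest[:k]
        G_prev = mean([X[j] for j in cell])
        cells.append(cell)
        centres.append(G_prev)
        I -= set(cell)
    # Leftover objects go to the nearest centre, then all centres are updated.
    for j in sorted(I):
        nearest = min(range(c), key=lambda i: math.dist(X[j], centres[i]))
        cells[nearest].append(j)
    centres = [mean([X[j] for j in cell]) for cell in cells]
    return cells, centres
```

Because each cell takes k + 1 = Integer(n/c) objects, the c cells can never need more objects than X contains. Objects are left over only when n is not a multiple of c.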

Page 49: KNN-1FCMA algorithm step 2

Compute the matrix U according to

$$\mu_{ik} = \frac{1}{\sum_{l=1}^{c} \left( \frac{d_{ik}}{d_{lk}} \right)^{\frac{2}{m-1}}}$$

where d_ik is the distance between object x_k and the centre of cluster i.

Calculate all fuzzy centres using

$$v_{ij} = \frac{\sum_{e=1}^{n} (\mu_{ie})^{m} \, x_{ej}}{\sum_{e=1}^{n} (\mu_{ie})^{m}}$$
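The second layer is a single fuzzy c-means pass starting from the centres found by the first layer. The sketch below evaluates the two formulas in order, assuming Euclidean distance. The function name `one_fcm_iteration` is hypothetical, and giving full membership to an object that lies exactly on a centre is an assumption made to avoid a division by zero.

```python
import math


def one_fcm_iteration(X, centres, m=2.0):
    """One fuzzy c-means pass: memberships U from the given centres, then new fuzzy centres V."""
    c, n = len(centres), len(X)
    power = 2.0 / (m - 1.0)
    U = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        d = [math.dist(x, g) for g in centres]
        if 0.0 in d:
            # Assumption: an object lying exactly on a centre belongs fully to it.
            U[d.index(0.0)][k] = 1.0
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((d[i] / d[l]) ** power for l in range(c))
    V = []
    for i in range(c):
        w = [u ** m for u in U[i]]  # memberships raised to the power m
        total = sum(w)
        V.append([sum(wi * x[j] for wi, x in zip(w, X)) / total for j in range(len(X[0]))])
    return U, V
```

For every object the memberships over the c clusters sum to 1, so U is a fuzzy partition of X.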

Page 50: Results KNN-1FCMA

                        Misclassification rate     Number of iterations (avg)
Data     Elem   c       FCMA      KNN-1FCMA        FCMA
S1       20     2       0         0                11
S2       60     3       1         0                8
S3       80     4       2         0                10
S4       120    6       13        13               19
IRIS23   150    2       14        13               10
IRIS     100    3       16        17               12