
ENG 8801/9881 - Special Topics in Computer Engineering: Pattern Recognition

Memorial University of Newfoundland

Pattern Recognition
Lecture 15, June 29, 2006

http://www.engr.mun.ca/~charlesr

Office Hours: Tuesdays & Thursdays 8:30 - 9:30 PM

EN-3026

July 2006 - Course Schedule

Tue. July 4    Lecture 16
Thu. July 6    Lecture 17
Sat. July 8    Assignment 4 Due
Tue. July 11   Lecture 18
Thu. July 13   Lecture 19
Tue. July 18   Presentations
Thu. July 20   Presentations
Sat. July 22   Assignment 5 Due
Tue. July 25   Lecture 21
Thu. July 27   Lecture 22
Sat. July 29   Final Reports


Last Week


Clustering (Unsupervised Classification)

[Figure: grey-level histogram, # occurrences vs. pixel value, showing distribution modes and minima]

Pattern Grouping

- Group similar patterns using distance metrics.
- Merge or split clusters based on cluster similarity measurements.
- Measures of cluster 'goodness'.


Recap: Simple Grouping using Threshold


1. k = 1 (number of clusters)
2. z1 = x1 (set first sample as class prototype)
3. For all other samples xi:
   a. Find the zj for which d(xi, zj) is minimum
   b. If d(xi, zj) ≤ T, assign xi to Cj
   c. Else k = k+1, zk = xi
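As a concrete rendering, here is a minimal Python sketch of this threshold-grouping procedure (NumPy assumed; the function and variable names are illustrative, not from the handout):

```python
import numpy as np

def threshold_grouping(samples, T):
    """Sequentially assign each sample to its nearest prototype,
    starting a new cluster when no prototype is within distance T."""
    prototypes = [samples[0]]                      # z1 = x1
    labels = [0]
    for x in samples[1:]:
        # a. Find the prototype zj with minimum d(xi, zj)
        dists = [np.linalg.norm(x - z) for z in prototypes]
        j = int(np.argmin(dists))
        if dists[j] <= T:                          # b. assign xi to Cj
            labels.append(j)
        else:                                      # c. k = k+1, zk = xi
            prototypes.append(x)
            labels.append(len(prototypes) - 1)
    return np.array(labels), np.array(prototypes)
```

Note that, as with the sequential scheme itself, the resulting clusters depend on the order in which the samples are presented.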


Recap: Hierarchical Grouping Algorithms


Successively merge based on some measure of within- or between-cluster similarity.

For n samples {x1, ..., xn}:

1. k = n, Ci = {xi} for i = 1...n
2. Merge the Ci, Cj which are most similar, k = k-1
3. Continue until some stopping condition is met.

For binary patterns, one suitable metric is the Hamming distance:

$$d_H(x_i, x_j) = \text{number of positions in which } x_i \text{ and } x_j \text{ differ}$$

Hierarchical scheme: merge the most similar pair at each step,

$$\min_{i,j} d(C_i, C_j) \;\Rightarrow\; C_i' = C_i \cup C_j$$

Intercluster distance measures:

Nearest neighbour: $d_1(C_i, C_j) = \min_{x \in C_i,\, x' \in C_j} d(x, x')$

Furthest neighbour: $d_2(C_i, C_j) = \max_{x \in C_i,\, x' \in C_j} d(x, x')$

Average neighbour: $d_3(C_i, C_j) = \frac{1}{N_i N_j} \sum_{x \in C_i} \sum_{x' \in C_j} d(x, x')$

Mean distance: $d_4(C_i, C_j) = d(m_i, m_j)$

Stopping conditions: stop when $\min_{i,j} d(C_i, C_j) \geq T$, or, writing $d(k)$ for the minimum intercluster distance when k clusters remain, stop when $d(k) > \alpha\, d(k-1)$.
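A minimal Python sketch of the agglomerative scheme, shown with the nearest-neighbour distance d1 and the threshold stopping condition (names and structure are illustrative):

```python
import numpy as np

def agglomerative(samples, T):
    """Start with one cluster per sample, then repeatedly merge the most
    similar pair until the minimum intercluster distance reaches T."""
    clusters = [[i] for i in range(len(samples))]       # k = n, Ci = {xi}
    def d1(Ca, Cb):                                     # nearest-neighbour distance
        return min(np.linalg.norm(samples[a] - samples[b])
                   for a in Ca for b in Cb)
    while len(clusters) > 1:
        # Find the most similar pair (Ci, Cj)
        dmin, i, j = min((d1(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dmin >= T:                                   # stopping condition
            break
        clusters[i] = clusters[i] + clusters[j]         # Ci' = Ci U Cj, k = k-1
        del clusters[j]
    return clusters
```

Swapping d1 for the furthest-neighbour or average distance only changes the inner function.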


Goodness of Partitioning


• Several of the stopping conditions suggest that we can use a measure of the scatter of each cluster to gauge how good the overall clustering is.

• In general, we would like compact clusters with a lot of space between them.

• We can use the measure of goodness to iteratively move samples from one cluster to another to optimize the groupings.


Clustering Criterion


Global measurements of the goodness of the clusters.

1. Representation error = summed scatter within clusters

The representation error of a clustering is the error from representing the N samples by the k cluster prototypes:

$$J_e = \sum_{i=1}^{k} \sum_{x \in C_i} |x - z_i|^2$$

We can choose each zi to minimize its cluster's contribution

$$J_i = \sum_{x \in C_i} |x - z_i|^2$$

Setting the derivative to zero,

$$\frac{\partial J_i}{\partial z_i} = \sum_{x \in C_i} -2(x - z_i) = 0 \quad\Rightarrow\quad z_i = m_i$$

So

$$J_e = \sum_{i=1}^{k} \sum_{x \in C_i} |x - m_i|^2$$

Now define the scatter matrix

$$S_{W_i} = \sum_{x \in C_i} (x - m_i)(x - m_i)^T$$

Thus

$$J_e = \sum_{i=1}^{k} \operatorname{tr} S_{W_i}$$

Define the summed scatter to be

$$S_W = \sum_{i=1}^{k} S_{W_i}$$

so that Je = tr SW. This is cluster criterion 1.

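As a quick numerical check of the identity Je = tr SW, a small NumPy sketch (illustrative names, not from the handout):

```python
import numpy as np

def representation_error(samples, labels):
    """Compute Je directly and via the trace of the summed scatter SW."""
    d = samples.shape[1]
    Je = 0.0
    SW = np.zeros((d, d))
    for i in np.unique(labels):
        Ci = samples[labels == i]
        mi = Ci.mean(axis=0)                  # zi = mi minimizes Ji
        Je += ((Ci - mi) ** 2).sum()          # sum over Ci of |x - mi|^2
        SW += (Ci - mi).T @ (Ci - mi)         # scatter matrix SWi
    return Je, np.trace(SW)                   # the two values agree
```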


Clustering Criterion


2. Use the volume of the summed scatter:

$$|S_W|$$

This is cluster criterion 2.


Clustering Criterion


3. Could use the ratio of between-cluster to within-cluster scatter. Define the between-cluster scatter matrix

$$S_B = \sum_{i=1}^{k} N_i (m_i - m)(m_i - m)^T$$

where m is the mean of all the samples. So we could use

$$\operatorname{tr}\left(S_W^{-1} S_B\right)$$

as cluster criterion 3.
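A small NumPy sketch of criterion 3 (illustrative; np.linalg.solve is used instead of forming the inverse of SW explicitly, and SW is assumed nonsingular):

```python
import numpy as np

def criterion3(samples, labels):
    """tr(SW^-1 SB): large when clusters are compact and well separated."""
    d = samples.shape[1]
    m = samples.mean(axis=0)                          # overall mean
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for i in np.unique(labels):
        Ci = samples[labels == i]
        mi = Ci.mean(axis=0)
        SW += (Ci - mi).T @ (Ci - mi)                 # within-cluster scatter
        SB += len(Ci) * np.outer(mi - m, mi - m)      # between-cluster scatter
    return np.trace(np.linalg.solve(SW, SB))          # tr(SW^-1 SB)
```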

Note: any within-cluster criterion is minimized by k = N, so we need an independent criterion for choosing k.


K-Means


Once we have a criterion, we can create an iterative clustering scheme.

K-Means is the classic iterative clustering scheme:

1. Choose k prototypes {z1, ..., zk}
2. Assign all samples to clusters: $x \in C_i$ if $d(x, z_i) < d(x, z_j)\ \forall j \neq i$
3. Update {zi} to minimize Ji, i = 1, ..., k: $z_i = \frac{1}{N_i} \sum_{x \in C_i} x = m_i$
4. Reassign the samples using the new prototypes
5. Continue until no prototypes change.
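A minimal NumPy sketch of this batch K-means loop (the slide leaves the choice of initial prototypes open; random sampling here is an assumption):

```python
import numpy as np

def kmeans(samples, k, seed=0):
    """Classic batch K-means: assign each sample to its nearest prototype,
    move each prototype to its cluster mean, repeat until nothing changes."""
    rng = np.random.default_rng(seed)
    z = samples[rng.choice(len(samples), size=k, replace=False)]  # 1. initial prototypes
    while True:
        # 2./4. Assign every sample to the cluster of its nearest prototype
        dists = np.linalg.norm(samples[:, None, :] - z[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update each zi to the cluster mean mi (which minimizes Ji)
        z_new = np.array([samples[labels == i].mean(axis=0)
                          if np.any(labels == i) else z[i]
                          for i in range(k)])
        if np.allclose(z_new, z):                     # 5. prototypes stable: done
            return labels, z_new
        z = z_new
```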

From the "KM Demo Algorithm" Java Applet: http://web.sfc.keio.ac.jp/~osada/KM/index.html


K-Means


Good Points:
- Simple conceptually
- Successful if k is accurate and the clusters are well-separated

Problems:
- If k is incorrect, then the clusters can't be right
- Efficiency depends on sample order
- Non-spherical clusters cause problems


Extensions to K-Means


There are several ways to extend the basic k-means algorithm:

1. Global minimization of Je, as an alternative to simply assigning samples to the closest cluster prototype.

2. Allow a variable k.


Global Minimization of Je


$$J_e = \sum_{i=1}^{k} J_i$$

where Ji is the representation error for the ith cluster.

Basic Plan: Move a sample x' from Ci to Cj if the magnitude of the increment to the representation error Jj is less than the decrement to the representation error Ji.

Basic idea: if x' is added to Cj, the new representation error is

$$J_j' = \sum_{x \in C_j} |x - m_j'|^2 + |x' - m_j'|^2$$

But

$$m_j' = \frac{N_j m_j + x'}{N_j + 1} = m_j + \frac{x' - m_j}{N_j + 1}$$


Substituting for m_j',

$$J_j' = \sum_{x \in C_j} \left| x - m_j - \frac{x' - m_j}{N_j + 1} \right|^2 + \left| x' - m_j - \frac{x' - m_j}{N_j + 1} \right|^2$$

$$= \sum_{x \in C_j} |x - m_j|^2 + 0 + \frac{N_j}{(N_j + 1)^2} |x' - m_j|^2 + \left| \frac{(N_j + 1)(x' - m_j) - (x' - m_j)}{N_j + 1} \right|^2$$

$$= \sum_{x \in C_j} |x - m_j|^2 + \frac{N_j}{(N_j + 1)^2} |x' - m_j|^2 + \frac{N_j^2}{(N_j + 1)^2} |x' - m_j|^2$$

$$= \sum_{x \in C_j} |x - m_j|^2 + \frac{N_j}{N_j + 1} |x' - m_j|^2 = J_j + \rho_j$$

Thus the increment is

$$\rho_j = \frac{N_j}{N_j + 1} |x' - m_j|^2$$

Similarly, we can show that the decrement to Ji is

$$\rho_i = \frac{N_i}{N_i - 1} |x' - m_i|^2$$

So the reassignment rule (step 2 of K-means) is: move x' from Ci to Cj if

$$\frac{N_j}{N_j + 1} |x' - m_j|^2 < \frac{N_i}{N_i - 1} |x' - m_i|^2$$
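One way to render this rule in code: a single reassignment pass that applies the rule sample-by-sample and updates the affected means and counts incrementally (an illustrative sketch, not the handout's implementation):

```python
import numpy as np

def reassignment_pass(samples, labels, means, counts):
    """Move x' from Ci to Cj whenever the increment rho_j to Jj is less
    than the decrement rho_i to Ji; returns True if anything moved."""
    moved = False
    for idx, x in enumerate(samples):
        i = labels[idx]
        if counts[i] <= 1:                    # Ni - 1 = 0: cannot empty a singleton
            continue
        rho = np.array([counts[j] / (counts[j] + 1) * np.sum((x - means[j]) ** 2)
                        for j in range(len(means))])
        rho_i = counts[i] / (counts[i] - 1) * np.sum((x - means[i]) ** 2)
        rho[i] = np.inf                       # only consider clusters j != i
        j = int(rho.argmin())
        if rho[j] < rho_i:                    # increment < decrement: move x'
            means[i] = (counts[i] * means[i] - x) / (counts[i] - 1)
            means[j] = (counts[j] * means[j] + x) / (counts[j] + 1)
            counts[i] -= 1
            counts[j] += 1
            labels[idx] = j
            moved = True
    return moved                              # iterate passes until False
```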


Notes on Global Minimization


1. The rule has little impact when Ni and Nj are very large, since both ratios then approach 1.

2. A point nearly on the MED boundary will be reassigned, since

$$\frac{N_j}{N_j + 1} < 1 \quad \text{while} \quad \frac{N_i}{N_i - 1} > 1$$

no matter what Ni and Nj are.

3. If x' is an unassigned sample, we would get the minimum increase to Je by assigning x' to the cluster Ci which minimizes

$$\frac{N_i}{N_i + 1} |x' - m_i|^2$$

This modifies the initial K-Means assignment by taking cluster size into account: if x' is equidistant from mi and mj, then assign it to the smallest cluster.


Example


Consider the following set of samples:

(0,2) (1,0) (1,1) (1,2) (1,3) (1,4) (2,2) (3,1) (4,1) (5,0) (5,1) (5,2) (6,1) (7,1)

Cluster them using basic K-means and using K-means with the global minimization method. Use k = 2.

What happens if we start with k ≠ 2 (e.g. k = 3)?
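The exercise is meant to be worked by hand, but the samples can also be fed to a K-means routine such as the kmeans sketch given earlier (illustrative):

```python
import numpy as np

points = np.array([(0, 2), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 2),
                   (3, 1), (4, 1), (5, 0), (5, 1), (5, 2), (6, 1), (7, 1)],
                  dtype=float)

labels, prototypes = kmeans(points, k=2)   # kmeans as sketched earlier
for i, z in enumerate(prototypes):
    print(f"cluster {i}: prototype {z}, members {points[labels == i].tolist()}")
```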


Dealing with K


Need a way of varying k in accordance with the goodness of the partitioning.

Strategies for dealing with k:

1. Delete clusters with too few samples.
   If Ni < T1, drop Ci and zi, and reassign the samples from Ci.

2. Merge clusters which are close together.
   If

   $$(m_i - m_j)^T \left( \frac{S_i + S_j}{2} \right) (m_i - m_j) < T_2$$

   then replace Ci with the union of Ci and Cj, and drop zj.


3. Split clusters which are spread out.
   If the maximum eigenvalue of Si is greater than T3, split Ci with a plane through mi perpendicular to the maximum eigenvector, and add a new cluster.
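A small NumPy sketch of the split test in strategy 3 (T3 and all names are illustrative):

```python
import numpy as np

def maybe_split(Ci, T3):
    """Split Ci with a plane through mi perpendicular to the principal
    eigenvector when the largest eigenvalue of its scatter Si exceeds T3."""
    mi = Ci.mean(axis=0)
    Si = (Ci - mi).T @ (Ci - mi)              # scatter matrix of Ci
    eigvals, eigvecs = np.linalg.eigh(Si)     # ascending; Si is symmetric
    if eigvals[-1] <= T3:
        return [Ci]                           # compact enough: no split
    e = eigvecs[:, -1]                        # maximum eigenvector
    side = (Ci - mi) @ e > 0                  # which side of the plane
    return [Ci[side], Ci[~side]]
```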


There are many possible clustering algorithms. See "Statistical Pattern Recognition: A Review" by Jain, Duin, and Mao (IEEE Trans. Pattern Analysis and Machine Intelligence, 2000) for more possibilities.