CvR 1 Dynamic Clustering (some unfinished business) Keith van Rijsbergen Glasgow October, 2002.

CvR 1

Dynamic Clustering (some unfinished business)

Keith van Rijsbergen

Glasgow

October, 2002

CvR 2

OUTLINE

• Introduction

• Scales

• Dissimilarity/Similarity

• Information-theoretic approach

• Static Clustering

• Dynamic Clustering

• Application

CvR 3

Introduction

• Theory ... where did it come from (Sneath and Sokal)

• Implementation ... not yet

• Experimentation ... none

Sneath and Sokal, Numerical Taxonomy (1973)

Jardine and Sibson, Mathematical Taxonomy (1971)

Van Rijsbergen, Information Retrieval (1979)

CvR 4

Scales

scale      operation                     group                     statistics
nominal    equality                      permutation, 1:1          mode
ordinal    greater/less                  isotonic, monotone        median
interval   equality/diff of intervals    linear, x’ = ax + b       mean
ratio      equality of ratios            similarity, x’ = ax       coeff of variation
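The link between a scale's group and its permissible statistic can be checked numerically. The sketch below uses hypothetical data and hypothetical constants (a = 3, b = 7) and only the Python standard library; it illustrates the invariance idea and is not taken from the slides.

```python
import statistics

def coeff_of_variation(xs):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(xs) / statistics.fmean(xs)

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical measurements

# Ratio scale: x' = a*x leaves the coefficient of variation unchanged.
scaled = [3.0 * x for x in data]
cv_same = abs(coeff_of_variation(data) - coeff_of_variation(scaled)) < 1e-12

# Interval scale: x' = a*x + b changes the coefficient of variation,
# but the mean is carried along by the same linear map.
shifted = [3.0 * x + 7.0 for x in data]
cv_changed = abs(coeff_of_variation(data) - coeff_of_variation(shifted)) > 1e-6
mean_commutes = abs(
    statistics.fmean(shifted) - (3.0 * statistics.fmean(data) + 7.0)
) < 1e-12

# Ordinal scale: an increasing (monotone) map commutes with the median.
cubed = [x ** 3 for x in data]
median_commutes = statistics.median(cubed) == statistics.median(data) ** 3
```

All four flags come out true for this data, which is the sense in which each statistic is meaningful on its scale.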

CvR 5

Dissimilarity/Similarity

d(x,y) ≥ 0 for all x,y

d(x,x) = 0 for all x

d(x,y) = d(y,x)

d(x,y) ≤ d(x,z) + d(z,y)

{d(x,y) ≤ max [d(x,z), d(z,y)]}
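A brute-force check of these conditions on a finite set of points might look like the following sketch. The function name `check_axioms`, the tolerance, and the example points are assumptions made for illustration.

```python
from itertools import product

def check_axioms(d, points, tol=1e-12):
    """Report which dissimilarity conditions hold for d on a finite point set."""
    pairs = list(product(points, repeat=2))
    triples = list(product(points, repeat=3))
    return {
        "non_negative": all(d(x, y) >= 0 for x, y in pairs),
        "zero_on_diagonal": all(d(x, x) == 0 for x in points),
        "symmetric": all(d(x, y) == d(y, x) for x, y in pairs),
        # Triangle inequality: d(x,y) <= d(x,z) + d(z,y)
        "triangle": all(
            d(x, y) <= d(x, z) + d(z, y) + tol for x, y, z in triples
        ),
        # Ultrametric inequality: d(x,y) <= max[d(x,z), d(z,y)]
        "ultrametric": all(
            d(x, y) <= max(d(x, z), d(z, y)) + tol for x, y, z in triples
        ),
    }

# Hypothetical example: absolute difference on three numbers.
points = [0.0, 1.0, 3.0]
result = check_axioms(lambda x, y: abs(x - y), points)
```

Absolute difference passes the first four checks but fails the bracketed ultrametric condition, since d(0,3) = 3 exceeds max[d(0,1), d(1,3)] = 2.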

CvR 6

Information-theoretic approach I

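$$\log\frac{f_1(x)}{f_2(x)} = \log\frac{P(H_1 \mid x)}{P(H_2 \mid x)} - \log\frac{P(H_1)}{P(H_2)}$$

$$I(1:2) = \int f_1(x)\,\log\frac{f_1(x)}{f_2(x)}\,dx$$

$$J(1,2) = I(1:2) + I(2:1)$$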
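For discrete distributions the integral becomes a sum, and the two quantities can be sketched as below. This assumes I(1:2) is the usual discrimination information, the sum over x of f1(x) log[f1(x)/f2(x)], and J(1,2) = I(1:2) + I(2:1) the symmetric divergence; the distributions `f1` and `f2` are hypothetical, and f2 is assumed positive wherever f1 is.

```python
import math

def discrimination(p, q):
    """I(1:2) = sum_x p(x) * log(p(x) / q(x)); terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def divergence(p, q):
    """J(1,2) = I(1:2) + I(2:1), symmetric in its two arguments."""
    return discrimination(p, q) + discrimination(q, p)

f1 = [0.5, 0.3, 0.2]  # hypothetical distribution under H1
f2 = [0.2, 0.3, 0.5]  # hypothetical distribution under H2

i12 = discrimination(f1, f2)
j12 = divergence(f1, f2)
j21 = divergence(f2, f1)
```

I(1:2) is not symmetric in general, which is why J is formed by adding both directions.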

CvR 7

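Information-theoretic approach II

$$K(1,2;\,w_1,w_2) = \frac{1}{w_1+w_2}\int\left[\,w_1 f_1 \log\frac{(w_1+w_2)\,f_1}{w_1 f_1 + w_2 f_2} + w_2 f_2 \log\frac{(w_1+w_2)\,f_2}{w_1 f_1 + w_2 f_2}\,\right]dx$$

$$K(1,2) = 0 \ \text{if}\ w_1 = 0 \ \text{or}\ w_2 = 0$$

$$K(1,2) \ge 0$$

$$K(1,2) = K(2,1)$$

$$K(1,2;\,w_1,w_2) = K(1,2;\,cw_1,cw_2)$$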
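A discrete sketch of the information radius K follows, assuming the weighted form in which each distribution is compared with the mixture (w1 f1 + w2 f2)/(w1 + w2). The function name, the default weights, and the example distributions are hypothetical.

```python
import math

def information_radius(f1, f2, w1=1.0, w2=1.0):
    """Weighted average discrimination of f1 and f2 from their mixture."""
    total = w1 + w2
    k = 0.0
    for p, q in zip(f1, f2):
        mix = (w1 * p + w2 * q) / total
        # A term is skipped when its weight or probability is zero,
        # which also guarantees mix > 0 whenever a log is taken.
        if w1 > 0 and p > 0:
            k += (w1 / total) * p * math.log(p / mix)
        if w2 > 0 and q > 0:
            k += (w2 / total) * q * math.log(q / mix)
    return k

f1 = [0.5, 0.3, 0.2]  # hypothetical distributions
f2 = [0.2, 0.3, 0.5]

k = information_radius(f1, f2, 1.0, 2.0)
k_swapped = information_radius(f2, f1, 2.0, 1.0)   # swap both roles
k_rescaled = information_radius(f1, f2, 5.0, 10.0)  # weights times c = 5
k_degenerate = information_radius(f1, f2, 0.0, 1.0)
```

Unlike the divergence J, this quantity stays finite when one distribution is zero where the other is not, because each is only ever compared with the mixture.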

CvR 8

Navigation - Browsing

T-space

D-space

Duality is the key.

Class definition!

CvR 9

Static Clustering

1. dependence on rank-ordering of dissimilarity

2. insensitive to small errors in DC

3. preservation of well marked clusters

4. stable under growth

5. labelling independence

6. invariance of ultrametric

7. subject to 3 minimises distortion

     A    B    C    D
B   .1
C   .4   .2
D   .3   .3   .3
E   .4   .4   .4   .1

T

     A    B    C    D
B   .1
C   .3   .3
D   .3   .3   .2
E   .3   .3   .2   .2

CvR 10

Dendrogram

[Figure: dendrogram with levels at .1, .2 and .3]

Spanning tree?
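One possible reading of the spanning-tree question is the known connection between single-link clustering and minimum spanning trees: the level at which two objects first join in a single-link dendrogram is the smallest possible "largest step" on any path between them. The sketch below assumes single-link is the method in view, which the slide does not state, and uses a hypothetical 4-object dissimilarity matrix.

```python
def single_link_ultrametric(d):
    """Minimax-path closure of a symmetric dissimilarity matrix.

    u[i][j] becomes the smallest value, over all paths from i to j,
    of the largest dissimilarity met along the path.
    """
    n = len(d)
    u = [row[:] for row in d]
    # Floyd-Warshall style update with (min, max) in place of (min, +).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u

# Hypothetical dissimilarities between four objects.
d = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
u = single_link_ultrametric(d)
```

For this matrix the distinct levels in `u` are 1, 2 and 3, which are also the edge weights of its minimum spanning tree (the chain 0-1, 1-2, 2-3).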

CvR 11

Dynamic Clustering

Hilbert-Schmidt: (A,B) = trace(A’B)
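For real matrices, and reading A’ as the transpose of A, the Hilbert-Schmidt inner product equals the sum of elementwise products of A and B. A minimal pure-Python sketch, with hypothetical helper names and example matrices:

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def hilbert_schmidt(a, b):
    """(A,B) = trace(A'B), with A' taken as the transpose (real case)."""
    return trace(matmul(transpose(a), b))

def elementwise_sum(a, b):
    """Sum over all entries of A[i][j] * B[i][j]."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[1, 2], [3, 4]]  # hypothetical matrices
B = [[0, 1], [1, 0]]

hs = hilbert_schmidt(A, B)
ew = elementwise_sum(A, B)
```

Both routes give 5 for these matrices, so the inner product treats a matrix as one long vector of its entries.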

CvR 12

Applications

• Image Retrieval

• Web Retrieval

CvR 13

Ostension

CvR 14

Conclusions

?