Evaluating Hierarchical Clustering of Search Results
Departamento de Lenguajes y
Sistemas Informáticos
UNED, Spain
Juan Cigarrán
Anselmo Peñas
Julio Gonzalo
Felisa Verdejo
nlp.uned.es
SPIRE 2005, Buenos Aires
Overview
Scenario
Assumptions
Features of a Good Hierarchical Clustering
Evaluation Measures
– Minimal Browsing Area (MBA)
– Distillation Factor (DF)
– Hierarchy Quality (HQ)
Conclusion
Scenario
Complex information needs
– Compile information from different sources
– Inspect the whole list of documents
• More than 100 documents
Help to
– Find the relevant topics
– Discriminate relevant from irrelevant documents
Approach
– Hierarchical Clustering
– Formal Concept Analysis
Problem
How to define and measure the quality of a hierarchical clustering?
How to compare different clustering approaches?
Previous assumptions
Each cluster contains only those documents fully described by its descriptors
                   d1   d2   d3   d4
Physics            X    X    X    X
Nuclear physics         X    X
Astrophysics                      X
[Figure: two hierarchies for this table. In the first, each document hangs only from its most specific node: Physics {d1}, with Nuclear physics {d2, d3} and Astrophysics {d4} below it. In the second, the Physics node instead gathers all documents indexed with Physics: Physics {d1, d2, d3, d4}, again with Nuclear physics {d2, d3} and Astrophysics {d4} below.]
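As a small illustration of this assumption (a sketch only; the dictionaries and names below are ours, not from the paper), a document is attached to exactly the cluster whose full description, its own descriptor plus the inherited ones, matches the document's descriptors:

```python
# Document descriptions, taken from the table above
docs = {
    "d1": {"Physics"},
    "d2": {"Physics", "Nuclear physics"},
    "d3": {"Physics", "Nuclear physics"},
    "d4": {"Physics", "Astrophysics"},
}

# Full description of each cluster: its own descriptor plus those inherited from its parents
clusters = {
    "Physics":         {"Physics"},
    "Nuclear physics": {"Physics", "Nuclear physics"},
    "Astrophysics":    {"Physics", "Astrophysics"},
}

# Under the assumption, a document belongs only to the cluster(s) that fully describe it
assignment = {
    name: sorted(d for d, descr in docs.items() if descr == intent)
    for name, intent in clusters.items()
}
print(assignment)
# {'Physics': ['d1'], 'Nuclear physics': ['d2', 'd3'], 'Astrophysics': ['d4']}
```

This reproduces the first hierarchy in the figure above, where d2, d3 and d4 do not reappear in the Physics node.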
Previous assumptions
‘Open world’ perspective
                      d1   d2   d3
Physics               X         X
Jokes                      X    X
Jokes about physics             X
[Figure: two lattices for this table, with d1 attached to Physics, d2 to Jokes and d3 to Jokes about physics, which lies below both Physics and Jokes.]
Good Hierarchical Clustering
The content of the clusters
– Clusters should not mix relevant with non-relevant information

[Figure: clusters of relevant (+) and non-relevant (-) documents; good clusterings keep + and - documents in separate clusters instead of mixing them.]
Good Hierarchical Clustering
The hierarchical arrangement of the clusters
– Relevant information should be in the same path

[Figure: two hierarchies over the same documents; in one the relevant (+) documents lie along the same path, in the other they are scattered across branches.]
Good Hierarchical Clustering
The number of clusters
– Number of clusters substantially lower than the number of documents
How clusters are described
– Cognitive load of reading a cluster description
– Ability to predict the relevance of the information that it contains (not addressed here)
Evaluation Measures
Criterion
– Minimize the browsing effort for finding ALL relevant information
Baseline
– The original document list returned by a search engine

Quality(lattice) = cognitive load(ranked list) / cognitive load(lattice)
Evaluation Measures
Consider
– Content of clusters
– Hierarchical arrangement of clusters
– Size of the hierarchy
– Cognitive load of reading a document (in the baseline): k_d
– Cognitive load of reading a node descriptor (in the hierarchy): k_n
Requirement
– Relevance assessments are available
Minimal Browsing Area (MBA)
The minimal set of nodes the user has to traverse to find ALL the relevant documents while minimising the number of irrelevant ones

[Figure: lattice with relevant (+) and non-relevant (-) documents; the MBA is the smallest set of nodes that reaches every + document.]
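A minimal sketch of how such an area could be computed, assuming a tree-shaped hierarchy in which each document hangs from exactly one node and the user browses top-down from the root; in the general lattice case choosing among alternative access paths becomes an optimisation problem. Class and function names below are illustrative, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so nodes can be stored in a set
class Node:
    name: str
    docs: list = field(default_factory=list)      # (doc_id, is_relevant) pairs
    children: list = field(default_factory=list)  # child Node objects

def minimal_browsing_area(root):
    """Nodes that must be traversed, top-down, to reach every node holding
    at least one relevant document; the whole access path is kept, since a
    node can only be reached through its ancestors."""
    mba = set()

    def visit(node, path):
        path = path + [node]
        if any(rel for _, rel in node.docs):
            mba.update(path)
        for child in node.children:
            visit(child, path)

    visit(root, [])
    return mba
```

The documents read while browsing are then those attached to the nodes of this area, which is what the DF and HQ measures below count.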
Distillation Factor (DF)
Ability to isolate relevant information compared with the original document list (a gain factor: DF > 1 means the hierarchy helps)
DF(L) = cognitive load(ranked list) / cognitive load(lattice)
      = (k_d · |D_Ranked List|) / (k_d · |D_MBA|)

Considers only the cognitive load of reading documents

Equivalent to:

DF(L) = precision(MBA) / precision(Ranked List)
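A minimal sketch of this computation, assuming the documents of the ranked list and those read inside the MBA are given as boolean relevance lists (function and parameter names are ours, not from the paper):

```python
def precision(judgements):
    """Fraction of relevant (True) items among a list of relevance judgements."""
    return sum(judgements) / len(judgements)

def distillation_factor(ranked_list, mba_docs):
    """DF(L) = precision(MBA) / precision(ranked list).

    ranked_list: relevance judgements of the documents in the original list.
    mba_docs:    relevance judgements of the documents read inside the MBA.
    Only the cost of reading documents (k_d) is counted, so it cancels out.
    """
    return precision(mba_docs) / precision(ranked_list)
```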
Distillation Factor (DF)
Example

Document list:
Doc 1 +
Doc 2 -
Doc 3 +
Doc 4 +
Doc 5 -
Doc 6 -
Doc 7 +

Precision = 4/7
Precision(MBA) = 4/5
DF(L) = 7/5 = 1.4

[Figure: lattice built over the same result set; the MBA contains 5 documents, 4 of them relevant.]
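Using the distillation_factor sketch above on this example:

```python
ranked = [True, False, True, True, False, False, True]  # Doc 1 .. Doc 7
mba    = [True, True, True, True, False]                # 5 documents read inside the MBA
print(distillation_factor(ranked, mba))                 # (4/5) / (4/7) ≈ 1.4
```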
Distillation Factor (DF)
Counterexample

[Figure: clustering of 8 documents (+ + - - + - - +) in which the MBA reads only the 4 relevant documents.]

Precision = 4/8
Precision(MBA) = 4/4
DF = 8/4 = 2

A bad clustering can still obtain a good DF. To fix this, extend the DF measure with the cognitive cost of taking browsing decisions: the Hierarchy Quality (HQ) measure.
Hierarchy Quality (HQ)
Assumption:
– When a node (in the MBA) is explored, all its lower neighbours have to be considered: some will in turn be explored, some will be discarded
– N_view: subset of lower neighbours of each node belonging to the MBA

[Figure: lattice with relevant (+) and non-relevant (-) documents; the nodes in the MBA have 8 lower neighbours that must be considered, so |N_view| = 8.]
Hierarchy Quality (HQ)

k_n and k_d are directly related to the retrieval scenario in which the experiments take place
The researcher must tune K = k_n / k_d before conducting the experiment
HQ > 1 indicates an improvement of the clustering over the original list

HQ(L) = cognitive load(ranked list) / cognitive load(lattice)
      = (k_d · |D_Ranked List|) / (k_d · |D_MBA| + k_n · |N_view|)
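A minimal sketch of this formula (parameter names are ours; the counts would come from the MBA computation sketched earlier):

```python
def hierarchy_quality(n_list_docs, n_mba_docs, n_view, k_d=1.0, k_n=1.0):
    """HQ(L) = (k_d * |D_Ranked List|) / (k_d * |D_MBA| + k_n * |N_view|).

    n_list_docs: documents in the baseline ranked list
    n_mba_docs:  documents read inside the minimal browsing area
    n_view:      lower neighbours considered while traversing the MBA
    k_d, k_n:    cognitive cost of reading a document / a node descriptor
    """
    return (k_d * n_list_docs) / (k_d * n_mba_docs + k_n * n_view)
```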
Hierarchy Quality (HQ)
Example

Hierarchy from the previous slide (10 documents in the ranked list, 5 read in the MBA, |N_view| = 8):
HQ(L) = (10 · k_d) / (5 · k_d + 8 · k_n)
Cutting value: k_d / k_n = 1.6 (HQ > 1 only above this ratio)

Counterexample (8 documents in the list, 4 read in the MBA, |N_view| = 12):
HQ(L) = (8 · k_d) / (4 · k_d + 12 · k_n)
Cutting value: k_d / k_n = 3
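For instance, applying the hierarchy_quality sketch above to the counterexample:

```python
print(hierarchy_quality(8, 4, 12, k_d=1.0, k_n=1.0))  # 0.5   -> worse than the plain list
print(hierarchy_quality(8, 4, 12, k_d=4.0, k_n=1.0))  # ~1.14 -> k_d/k_n exceeds the cutting value 3
```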
Conclusions and Future Work
Framework for comparing different clustering approaches, taking into account:
– Content of clusters
– Hierarchical arrangement of clusters
– Cognitive load to read document and node descriptions
Adaptable to the retrieval scenario in which experiments take place
Future work
– Conduct user studies to compare their results with the automatic evaluation
• Results will reflect the quality of the descriptors
• Will be used to fine-tune the k_d and k_n parameters
Thank you!