
Hierarchical clustering

Hongning Wang, CS@UVa

Today’s lecture

• Hierarchical clustering algorithm
– Bottom-up: agglomerative
– Distance between clusters
– Complexity analysis


Hierarchical clustering

• Build a tree-based hierarchical taxonomy from a set of instances
– Dendrogram – a useful tool to summarize similarities


After cutting the dendrogram at a chosen level, each connected component becomes a cluster.
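A minimal sketch of building and cutting a dendrogram with SciPy's hierarchical clustering module (the toy data and the cut threshold are made up for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D instances: two tight groups plus one outlier (hypothetical data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2],
              [9.0, 9.0]])

# Build the full merge tree (average link; linkage choices are covered below)
Z = linkage(X, method='average')

# "Cut" the tree at distance 2.0: merges above the threshold are undone,
# and each remaining connected component becomes one cluster
labels = fcluster(Z, t=2.0, criterion='distance')
print(labels)  # e.g., [1 1 1 2 2 3]
```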

Agglomerative hierarchical clustering

• Pairwise distance metric between instances


Example pairwise distance matrix for seven instances (upper triangle shown; the matrix is symmetric with a zero diagonal):

     x1  x2  x3  x4  x5  x6  x7
x1    0   1   2   2   3   3   4
x2        0   2   2   3   3   4
x3            0   2   3   3   4
x4                0   3   4   4
x5                    0   2   3
x6                        0   2
x7                            0
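A sketch of computing such a matrix with SciPy (the seven random instances here are hypothetical stand-ins):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.rand(7, 2)                      # 7 hypothetical instances
D = squareform(pdist(X, metric='euclidean'))  # 7x7 symmetric distance matrix
print(np.round(D, 2))                         # diagonal is all zeros
```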

Agglomerative hierarchical clustering

1. Every instance is in its own cluster when initialized
2. Repeat until one cluster is left:
   1. Find the best pair of clusters to merge, breaking ties arbitrarily


Enumerate all the possibilities!

How do we measure the distance between an instance and a cluster of instances? The linkage functions below answer this; a naïve implementation sketch follows.
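A naïve sketch of the loop above, using a precomputed distance matrix and single link as the cluster distance (all names are illustrative; the linkage choices are defined on the next slides):

```python
import numpy as np

def naive_hac(D):
    """Naive agglomerative clustering on a precomputed distance matrix D.
    Returns the sequence of merges, using single link for cluster distance."""
    n = D.shape[0]
    clusters = [[i] for i in range(n)]   # every instance starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # enumerate all pairs of clusters, keep the closest pair
        best = (np.inf, None, None)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(D[a, b] for a in clusters[i] for b in clusters[j])
                if d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return merges
```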

Distance measure between clusters

• Single link
– Cluster distance = distance of the two closest members between the clusters
– $d(c_i, c_j) = \min_{x_n \in c_i,\, x_m \in c_j} d(x_n, x_m)$


Tends to generate scattered clusters.
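The min in the formula maps directly to code; a sketch, assuming D is a precomputed distance matrix and ci, cj are lists of instance indices:

```python
def single_link(D, ci, cj):
    # distance between the two closest members across the clusters
    return min(D[n, m] for n in ci for m in cj)
```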

Distance measure between clusters

• Complete link
– Cluster distance = distance of the two farthest members between the clusters
– $d(c_i, c_j) = \max_{x_n \in c_i,\, x_m \in c_j} d(x_n, x_m)$


Tends to generate tight clusters.
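The same sketch with max gives complete link (same assumptions as the single-link snippet above):

```python
def complete_link(D, ci, cj):
    # distance between the two farthest members across the clusters
    return max(D[n, m] for n in ci for m in cj)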

Distance measure between clusters

• Average link
– Cluster distance = average distance of all pairs of members between the clusters
– $d(c_i, c_j) = \frac{\sum_{x_n \in c_i,\, x_m \in c_j} d(x_n, x_m)}{|c_i| \times |c_j|}$


The most popularly used measure; robust against noise.
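And averaging over all cross-cluster pairs gives average link (same assumptions as the snippets above). In SciPy's linkage these three correspond to method='single', 'complete', and 'average'.

```python
def average_link(D, ci, cj):
    # average distance over all |ci| x |cj| cross-cluster pairs
    return sum(D[n, m] for n in ci for m in cj) / (len(ci) * len(cj))
```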


Complexity analysis

• In step one, compute similarity between all pairs of $n$ individual instances: $O(n^2)$

• In the following $n-2$ steps
– It could be $O(n^2 \log n)$ (with priority queues) or even $O(n^3)$ (naïve implementation)


In k-means, we have $O(knl)$ for $l$ iterations, a much faster algorithm.
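A quick, illustrative way to see the gap empirically (SciPy and scikit-learn are assumed available; the dataset size is arbitrary):

```python
import time
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import KMeans

X = np.random.rand(2000, 10)   # hypothetical dataset

t0 = time.time()
linkage(X, method='average')   # hierarchical: needs all pairwise distances
print('HAC:     %.2fs' % (time.time() - t0))

t0 = time.time()
KMeans(n_clusters=10, n_init=10).fit(X)   # k-means: O(knl)
print('k-means: %.2fs' % (time.time() - t0))
```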

Comparisons

• Hierarchical clustering
– Efficiency: $O(n^3)$, slow
– Assumptions: none; only need a distance metric
– Output: dendrogram, a tree

• k-means clustering
– Efficiency: $O(knl)$, fast
– Assumptions: strong – centroid, latent cluster membership; need to specify $k$
– Output: $k$ clusters
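A side-by-side sketch in scikit-learn. AgglomerativeClustering can run from a precomputed distance matrix alone, matching the "only need a distance metric" point, while KMeans needs the vectors themselves (the data here is made up; in older scikit-learn versions the metric parameter is named affinity):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import pairwise_distances

X = np.random.rand(100, 5)

# Hierarchical: only a distance metric is required
D = pairwise_distances(X, metric='cosine')
hac = AgglomerativeClustering(n_clusters=3, metric='precomputed',
                              linkage='average')
print(hac.fit_predict(D)[:10])

# k-means: assumes centroids in the vector space and a given k
km = KMeans(n_clusters=3, n_init=10)
print(km.fit_predict(X)[:10])
```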


How to get final clusters?

• If $k$ is specified, find a cut that generates $k$ clusters
– Since each step merges exactly two clusters, such a cut must exist
• If $k$ is not specified, use the same strategy as in k-means
– Cross validation with internal or external validation
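A sketch of both strategies, using SciPy's fcluster to cut the tree and silhouette score as one stand-in for internal validation (the data and the candidate range of $k$ are hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

X = np.random.rand(60, 4)
Z = linkage(X, method='average')

# k specified: cut the dendrogram into exactly k clusters
labels = fcluster(Z, t=3, criterion='maxclust')

# k not specified: pick the k that scores best under internal validation
best_k = max(range(2, 10),
             key=lambda k: silhouette_score(X, fcluster(Z, t=k, criterion='maxclust')))
print(best_k)
```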


What you should know

• Agglomerative hierarchical clustering
– Three types of linkage function
• Single link, complete link and average link
– Comparison with k-means


Today’s reading

• Introduction to Information Retrieval
– Chapter 17: Hierarchical clustering
• 17.1 Hierarchical agglomerative clustering
• 17.2 Single-link and complete-link clustering
• 17.3 Group-average agglomerative clustering
• 17.5 Optimality of HAC
