UVA CS 4501: Machine Learning Lecture 22: Unsupervised ...€¦ · Machine Learning Lecture 22:...
Transcript of UVA CS 4501: Machine Learning Lecture 22: Unsupervised ...€¦ · Machine Learning Lecture 22:...
UVACS4501:MachineLearning
Lecture22:UnsupervisedClustering(I)
Dr.YanjunQi
UniversityofVirginia
DepartmentofComputerScience
4/24/18
Dr.YanjunQi/UVACS
1
Wherearewe?èmajorsecJonsofthiscourse
q Regression(supervised)q ClassificaJon(supervised)
q FeatureselecJonq Unsupervisedmodels
q DimensionReducJon(PCA)q Clustering(K-means,GMM/EM,Hierarchical)
q Learningtheoryq Graphicalmodels
4/24/18
Dr.YanjunQi/UVACS
2
AnunlabeledDatasetX
• Data/points/instances/examples/samples/records:[rows]• Features/a0ributes/dimensions/independentvariables/covariates/predictors/regressors:[columns]
4/24/18
Dr.YanjunQi/UVACS
a data matrix of n observations on p variables x1,x2,…xp
Unsupervisedlearning=learningfromraw(unlabeled,unannotated,etc)data,asopposedtosuperviseddatawherelabelofexamplesisgiven
3
4/24/18
Dr.YanjunQi/UVACS
Today:Whatisclustering?
• Arethereany“groups”?• Whatiseachgroup?• Howmany?• HowtoidenJfythem?
4
• Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups
Whatisclustering?
Inter-cluster distances are maximized
Intra-cluster distances are
minimized
4/24/18
Dr.YanjunQi/UVACS
5
4/24/18
Dr.YanjunQi/UVACS
Whatisclustering?• Clustering:theprocessofgroupingasetofobjectsintoclassesofsimilarobjects– highintra-classsimilarity– lowinter-classsimilarity– Itisthecommonestformofunsupervisedlearning
• AcommonandimportanttaskthatfindsmanyapplicaJonsinScience,Engineering,informaJonScience,andotherplaces,e.g.
• GroupgenesthatperformthesamefuncJon• GroupindividualsthathassimilarpoliJcalview• Categorizedocumentsofsimilartopics• Idealitysimilarobjectsfrompictures
6
4/24/18
Dr.YanjunQi/UVACS
Whatisclustering?• Clustering:theprocessofgroupingasetofobjectsintoclassesofsimilarobjects– highintra-classsimilarity– lowinter-classsimilarity– Itisthecommonestformofunsupervisedlearning
• AcommonandimportanttaskthatfindsmanyapplicaJonsinScience,Engineering,informaJonScience,andotherplaces,e.g.
• GroupgenesthatperformthesamefuncJon• GroupindividualsthathassimilarpoliJcalview• Categorizedocumentsofsimilartopics• Idealitysimilarobjectsfrompictures
7
4/24/18
Dr.YanjunQi/UVACS
ToyExamples• People
• Images
• Language
• species
8
Application (I): Search
Result Clustering
4/24/18 9
Dr.YanjunQi/UVACS
Application (II): Navigation
4/24/18 10
Dr.YanjunQi/UVACS
4/24/18
Dr.YanjunQi/UVACS
Issuesforclustering• Whatisanaturalgroupingamongtheseobjects?
– DefiniJonof"groupness"• Whatmakesobjects“related”?
– DefiniJonof"similarity/distance"• RepresentaJonforobjects
– Vectorspace?NormalizaJon?• Howmanyclusters?
– Fixedapriori?– Completelydatadriven?
• Avoid“trivial”clusters-toolargeorsmall• ClusteringAlgorithms
– ParJJonalalgorithms– Hierarchicalalgorithms
• FormalfoundaJonandconvergence11
4/24/18
Dr.YanjunQi/UVACS
TodayRoadmap:clustering
§ DefiniJonof"groupness”§ DefiniJonof"similarity/distance"§ RepresentaJonforobjects§ Howmanyclusters?§ ClusteringAlgorithms
§ ParJJonalalgorithms§ Hierarchicalalgorithms
§ FormalfoundaJonandconvergence12
4/24/18
Dr.YanjunQi/UVACS
Whatisanaturalgroupingamongtheseobjects?
13
Anotherexample:clusteringissubjecJve
A
B
A
B
A
B
A
B A
B
A
B
TwopossibleSoluJons…
4/24/18 Dr.YanjunQi/UVACS 14
4/24/18
Dr.YanjunQi/UVACS
TodayRoadmap:clustering
§ DefiniJonof"groupness”§ DefiniJonof"similarity/distance"§ RepresentaJonforobjects§ Howmanyclusters?§ ClusteringAlgorithms
§ ParJJonalalgorithms§ Hierarchicalalgorithms
§ FormalfoundaJonandconvergence15
4/24/18
Dr.YanjunQi/UVACS
WhatisSimilarity?
• TherealmeaningofsimilarityisaphilosophicalquesJon.WewilltakeamorepragmaJcapproach
• DependsonrepresentaJonandalgorithm.Formanyrep./alg.,easiertothinkintermsofadistance(ratherthansimilarity)betweenvectors.
Hardtodefine!Butweknowitwhenweseeit
16
4/24/18
Dr.YanjunQi/UVACS
WhatproperJesshouldadistancemeasurehave?
• D(A,B)=D(B,A) Symmetry
• D(A,A)=0 ConstancyofSelf-Similarity
• D(A,B)=0IIfA=B Posi=vitySepara=on
• D(A,B)<=D(A,C)+D(B,C) TriangularInequality
17
4/24/18
Dr.YanjunQi/UVACS
• D(A,B)=D(B,A) Symmetry– Otherwiseyoucouldclaim"AlexlookslikeBob,butBoblooksnothing
likeAlex"
• D(A,A)=0 ConstancyofSelf-Similarity– Otherwiseyoucouldclaim"AlexlooksmorelikeBob,thanBobdoes"
• D(A,B)=0IIfA=B Posi=vitySepara=on– Otherwisethereareobjectsinyourworldthataredifferent,butyou
cannottellapart.
• D(A,B)<=D(A,C)+D(B,C) TriangularInequality– Otherwiseyoucouldclaim"AlexisverylikeBob,andAlexisverylike
Carl,butBobisveryunlikeCarl"
IntuiJonsbehinddesirableproperJesofdistancemeasure
18
4/24/18
Dr.YanjunQi/UVACS
DistanceMeasures:MinkowskiMetric
• Supposetwoobjectxandybothhavepfeatures
• TheMinkowskimetricisdefinedby• MostCommonMinkowskiMetrics
!!d(x , y)= |xi− yi
i=1
p
∑ |rr
!!
x = (x1 ,x2 ,!,xp)y = ( y1 , y2 ,!, yp)
1,r =2(Euclideandistance)d(x , y)= |xi− yii=1
p
∑ |22
2,r =1(Manhattandistance)d(x , y)= |xi− yii=1
p
∑ |
3,r = +∞("sup"distance)d(x , y)=max1≤i≤p
|xi− yi |19
4/24/18
Dr.YanjunQi/UVACS
.},{max :distance sup"" :3. :distanceManhattan :2
. :distanceEuclidean :1
434734
5342 22
==+
=+
AnExample
4
3
x
y
20
4/24/18
Dr.YanjunQi/UVACS
.},{max :distance sup"" :3. :distanceManhattan :2
. :distanceEuclidean :1
434734
5342 22
==+
=+
AnExample
4
3
x
y
21
4/24/18
Dr.YanjunQi/UVACS
11011111100001110100111001001001101716151413121110987654321
GeneBGeneA
. :Distance Hamming 5141001 =+=+ )#()#(
• ManhanandistanceiscalledHammingdistancewhenallfeaturesarebinaryordiscrete.
– E.g.,GeneExpressionLevelsUnder17CondiJons(1-High,0-Low)
Hammingdistance:discretefeatures
!!d(x , y)= |xi− yi
i=1
p
∑ |
22
4/24/18
Dr.YanjunQi/UVACS
EditDistance:Agenerictechniqueformeasuringsimilarity
• Tomeasurethesimilaritybetweentwoobjects,transformoneoftheobjectsintotheother,andmeasurehowmucheffortittook.Themeasureofeffortbecomesthedistancemeasure.
ThedistancebetweenPanyandSelma.
Changedresscolor,1pointChangeearringshape,1pointChangehairpart,1point
D(Pany,Selma)=3
ThedistancebetweenMargeandSelma.
Changedresscolor,1pointAddearrings,1pointDecreaseheight,1pointTakeupsmoking,1pointLoseweight,1point
D(Marge,Selma)=5
ThisiscalledtheEditdistanceortheTransformaJondistance23
SelmaPanyMarge
• PearsoncorrelaJoncoefficient
• Specialcase:cosinedistance4/24/18
Dr.YanjunQi/UVACS
. and where
)()(
))((),(
∑∑
∑ ∑
∑
==
= =
=
==
−×−
−−=
p
iip
p
iip
p
i
p
iii
p
iii
yyxx
yyxx
yyxxyxs
1
1
1
1
1 1
22
1
1≤),( yxs
SimilarityMeasures:CorrelaJonCoefficient
yxyxyxs !!!!
⋅⋅=),(
• MeasuringthelinearcorrelaLonbetweentwosequences,xandy,
• givingavaluebetween+1and−1inclusive,where1istotalposiJvecorrelaLon,0isnocorrelaLon,and−1istotalnegaJvecorrelaLon.
CorrelaJonisunitindependent
24
4/24/18
Dr.YanjunQi/UVACS
SimilarityMeasures:e.g.,CorrelaJonCoefficientonJmeseriessamples
Time
Gene A
Gene B
Gene A Time
Gene B
Expression Level Expression Level
Expression Level
Time
Gene A Gene B
25
CorrelaJonisunitindependent;IfyouscaleoneoftheobjectstenJmes,youwillgetdifferenteuclideandistancesandsamecorrelaJondistances.
4/24/18
Dr.YanjunQi/UVACS
TodayRoadmap:clustering
§ DefiniJonof"groupness”§ DefiniJonof"similarity/distance"§ RepresentaJonforobjects§ Howmanyclusters?§ ClusteringAlgorithms
§ ParJJonalalgorithms§ Hierarchicalalgorithms
§ FormalfoundaJonandconvergence26
4/24/18
Dr.YanjunQi/UVACS
ClusteringAlgorithms
• ParJJonalalgorithms– Usuallystartwitharandom(parJal)parJJoning
– RefineititeraJvely• Kmeansclustering• Mixture-Modelbasedclustering
• Hierarchicalalgorithms– Bonom-up,agglomeraJve– Top-down,divisive
27
4/24/18
Dr.YanjunQi/UVACS
ClusteringAlgorithms
• ParJJonalalgorithms– Usuallystartwitharandom(parJal)parJJoning
– RefineititeraJvely• Kmeansclustering• Mixture-Modelbasedclustering
• Hierarchicalalgorithms– Bonom-up,agglomeraJve– Top-down,divisive
28
4/24/18
Dr.YanjunQi/UVACS
TodayRoadmap:clustering
§ DefiniJonof"groupness”§ DefiniJonof"similarity/distance"§ RepresentaJonforobjects§ Howmanyclusters?§ ClusteringAlgorithms
§ ParJJonalalgorithms§ Hierarchicalalgorithms
§ FormalfoundaJonandconvergence29
4/24/18
Dr.YanjunQi/UVACS
HierarchicalClustering• Buildatree-basedhierarchicaltaxonomy(dendrogram)fromasetofobjects,e.g.organisms,documents.
• NotethathierarchiesarecommonlyusedtoorganizeinformaJon,forexampleinawebportal.– Yahoo!hierarchyismanuallycreated,wewillfocusonautomaJccreaJonofhierarchies
Withbackbone Withoutbackbone
30
(How-to) Hierarchical Clustering The number of dendrograms with n leafs
= (2n -3)!/[(2(n -2)) (n -2)!]
Number Number of Possibleof Leafs Dendrograms 2 13 34 155 105... …10 34,459,425
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Clustering:theprocessofgroupingasetofobjectsintoclassesofsimilarobjectsè
highintra-classsimilaritylowinter-classsimilarity
4/24/18
Dr.YanjunQi/UVACS
31
(How-to) Hierarchical Clustering The number of dendrograms with n leafs
= (2n -3)!/[(2(n -2)) (n -2)!]
Number Number of Possibleof Leafs Dendrograms 2 13 34 155 105... …10 34,459,425
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Clustering:theprocessofgroupingasetofobjectsintoclassesofsimilarobjectsè
highintra-classsimilaritylowinter-classsimilarity
Agreedylocal
opJmalsoluJon
4/24/18
Dr.YanjunQi/UVACS
32
4/24/18
Dr.YanjunQi/UVACS
33
(How-to) Hierarchical Clustering The number of dendrograms with n leafs
= (2n -3)!/[(2(n -2)) (n -2)!]
Number Number of Possibleof Leafs Dendrograms 2 13 34 155 105... …10 34,459,425
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Clustering:theprocessofgroupingasetofobjectsintoclassesofsimilarobjectsè
highintra-classsimilaritylowinter-classsimilarity
Agreedylocal
opJmalsoluJon
0 8 8 7 7
0 2 4 4
0 3 3
0 1
0
D( , ) = 8 D( , ) = 1
We begin with a distance matrix which contains the distances between every pair of objects in our database.
4/24/18
Dr.YanjunQi/UVACS
34
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
… Consider all possible merges…
Choose the best
4/24/18
Dr.YanjunQi/UVACS
35
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
… Consider all possible merges…
Choose the best
Consider all possible merges… …
Choose the best
4/24/18
Dr.YanjunQi/UVACS
36
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
… Consider all possible merges…
Choose the best
Consider all possible merges… …
Choose the best
Consider all possible merges…
Choose the best …
4/24/18
Dr.YanjunQi/UVACS
37
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
… Consider all possible merges…
Choose the best
Consider all possible merges… …
Choose the best
Consider all possible merges…
Choose the best … But how do we compute distances
between clusters rather than objects?
4/24/18
Dr.YanjunQi/UVACS
38
4/24/18
Dr.YanjunQi/UVACS
Howtodecidethedistancesbetweenclusters?
• Single-Link
– NearestNeighbor:theirclosestmembers.
• Complete-Link– FurthestNeighbor:theirfurthestmembers.
• Average:– averageofallcross-clusterpairs.
39
Computing distance between clusters: Single Link
• cluster distance = distance of two closest members in each class
- Potentially long and skinny clusters
4/24/18
Dr.YanjunQi/UVACS
40
Computing distance between clusters: : Complete Link
• cluster distance = distance of two farthest members
+ tight clusters
4/24/18
Dr.YanjunQi/UVACS
41
Computing distance between clusters: Average Link
• cluster distance = average distance of all pairs
the most widely used measure
Robust against noise
4/24/18
Dr.YanjunQi/UVACS
42
Example: single link
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
12 3 4
5
4/24/18
Dr.YanjunQi/UVACS
43
Example: single link
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
12 3 4
5
4/24/18
Dr.YanjunQi/UVACS
44
Example: single link
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
12 3 4
5
4/24/18
Dr.YanjunQi/UVACS
45
Example: single link
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
0458079
030
543)2,1(
543)2,1(
12 3 4
5
8}8,9min{},min{9}9,10min{},min{3}3,6min{},min{
5,25,15),2,1(
4,24,14),2,1(
3,23,13),2,1(
======
===
ddddddddd
4/24/18
Dr.YanjunQi/UVACS
46
Example: single link
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
04507
0
54)3,2,1(
54)3,2,1(
12 3 4
5
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
0458079
030
543)2,1(
543)2,1(
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
5}5,8min{},min{7}7,9min{},min{
5,35),2,1(5),3,2,1(
4,34),2,1(4),3,2,1(
======
dddddd
4/24/18
Dr.YanjunQi/UVACS
47
Example: single link
⎥⎥⎥
⎦
⎤
⎢⎢⎢
⎣
⎡
04507
0
54)3,2,1(
54)3,2,1(
12 3 4
5
⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢
⎣
⎡
0458079
030
543)2,1(
543)2,1(
⎥⎥⎥⎥⎥⎥
⎦
⎤
⎢⎢⎢⎢⎢⎢
⎣
⎡
0458907910
03602
0
54321
54321
5},min{ 5),3,2,1(4),3,2,1()5,4(),3,2,1( == ddd
4/24/18
Dr.YanjunQi/UVACS
48
29 2 6 11 9 17 10 13 24 25 26 20 22 30 27 1 3 8 4 12 5 14 23 15 16 18 19 21 28 7
1
2
3
4
5
6
7
Average linkage
Single linkage
Height represents distance between objects / clusters
ParJJonsbycutngthedendrogramatadesiredlevel:eachconnectedcomponentformsacluster.
4/24/18
Dr.YanjunQi/UVACS
49
4/24/18
Dr.YanjunQi/UVACS
HierarchicalClustering• Bonom-UpAgglomeraJveClustering
– Startswitheachobjectinaseparatecluster– thenrepeatedlyjoinstheclosestpairofclusters,– unJlthereisonlyonecluster.
Thehistoryofmergingformsabinarytreeorhierarchy(dendrogram)
• Top-Downdivisive– StarJngwithallthedatainasinglecluster,– Considereverypossiblewaytodividetheclusterintotwo.Choosethebestdivision
– Andrecursivelyoperateonbothsides.
50
4/24/18
Dr.YanjunQi/UVACS
ComputaJonalComplexity
• InthefirstiteraJon,allHACmethodsneedtocomputesimilarityofallpairsofnindividualinstanceswhichisO(n2p).
• Ineachofthesubsequentn−2mergingiteraJons,computethedistancebetweenthemostrecentlycreatedclusterandallotherexisJngclusters.
• Forthesubsequentsteps,inordertomaintainanoverallO(n2)performance,compuJngsimilaritytoeachotherclustermustbedoneinconstantJme.ElseO(n2logn)orO(n3)ifdonenaively
51
SummaryofHierarchalClusteringMethods
• Noneedtospecifythenumberofclustersinadvance.
• HierarchicalstructuremapsnicelyontohumanintuiJonforsomedomains
• Theydonotscalewell:JmecomplexityofatleastO(n2),wherenisthenumberoftotalobjects.
• LikeanyheurisJcsearchalgorithms,localopJmaareaproblem.
• InterpretaJonofresultsis(very)subjecJve.4/24/18
Dr.YanjunQi/UVACS
52
Hierarchical Clustering
Clustering
n/a
No clearly defined loss
greedy bottom-up (or top-down)
Dendrogram (tree)
Task
Representation
Score Function
Search/Optimization
Models, Parameters
4/24/18 53
Dr.YanjunQi/UVACS
References
q HasJe,Trevor,etal.Theelementsofsta=s=callearning.Vol.2.No.1.NewYork:Springer,2009.
q BigthankstoProf.EricXing@CMUforallowingmetoreusesomeofhisslides
q BigthankstoProf.ZivBar-Joseph@CMUforallowingmetoreusesomeofhisslides
4/24/18
Dr.YanjunQi/UVACS
54