Introduction to Machine Learning
Lecture 17: Clustering
Albert Orriols i Puig
http://www.albertorriols.net
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-16
Data classification
  Labeled data
  Build a model that covers all the space
Association rule analysis
  Unlabeled data
  Get the most frequent/important associations
Today’s Agenda
  What’s clustering?
  What’s a good clustering solution?
  Components of a clustering task
  Types of clustering
  Hierarchical clustering
What’s Clustering?
The goal of clustering is to separate a finite unlabeled data set into a finite and discrete set of “natural,” hidden data structures.
As a data mining task, data clustering aims at the identification of clusters, or densely populated regions, according to some measurement or similarity function.
Studied and applied in many fields
  Statistics
  Spatial databases
  Machine learning (unsupervised learning)
  Data mining
What’s a Good Clustering Sol.?
In cluster analysis, a group of objects is split up into a number of more or less homogeneous subgroups on the basis of an often subjectively chosen measure of similarity (i.e., chosen subjectively based on its ability to create “interesting” clusters), such that the similarity between objects within a subgroup is larger than the similarity between objects belonging to different subgroups.
What’s a Good Clustering Sol.?
Do you think this is good?
What’s a Good Clustering Sol.?
Do you think this is better?
What’s a Good Clustering Sol.?
Do you think this is better?
Good Clustering Sols.
So, we got the point visually. Can we express more formally when a clustering solution is good?
Homogeneity and separation principles
  Homogeneity: Elements within a cluster are close to each other
  Separation: Elements in different clusters are further apart from each other
...clustering is not an easy task!
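The two principles can be turned into numbers in several ways. The sketch below is one possible reading, not the definition used in the lecture: it assumes Euclidean distance, takes homogeneity as the mean distance between points sharing a cluster, and separation as the mean distance between points in different clusters. The function names and the toy data are illustrative.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+); an assumed metric


def homogeneity(points, labels):
    """Mean distance over pairs in the same cluster (lower means tighter clusters).

    Assumes at least one pair of points shares a cluster.
    """
    same = [dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)
            if labels[i] == labels[j]]
    return sum(same) / len(same)


def separation(points, labels):
    """Mean distance over pairs in different clusters (higher means better separated).

    Assumes at least two clusters are present.
    """
    diff = [dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)
            if labels[i] != labels[j]]
    return sum(diff) / len(diff)


# Toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # groups the distant points together

print(homogeneity(points, good), separation(points, good))
print(homogeneity(points, bad), separation(points, bad))
```

On this toy data the labeling that groups nearby points has the smaller within-cluster distance and the larger between-cluster distance, which matches the visual intuition.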
Components of a Clustering Task
Types of Clustering
Hard partitional clustering
  Organize elements into disjoint groups
Hierarchical clustering
  Organize elements into a tree; leaves represent genes and the length of the paths between leaves represents the distances between genes. Similar genes lie within the same subtrees
  Also classified as
    Agglomerative: Start with every element in its own cluster, and iteratively join clusters together
    Divisive: Start with one cluster and iteratively divide it into smaller clusters
Types of Clustering
HIERARCHICAL CLUSTERING
Example of Hierarchical Clust.
Hierarchical clustering is sometimes used to reveal evolutionary history.
It provides very informative descriptions and visualization for the potential data clustering structures, especially when real hierarchical relations exist in the data.
Pseudocode
Hierarchical Clustering (d, n)
1. Form n clusters each with one element
2. Construct a graph T by assigning one vertex to each cluster
3. while there is more than one cluster
   1. Find the two closest clusters C1 and C2
   2. Merge C1 and C2 into new cluster C with |C1| + |C2| elements
   3. Compute distance from C to all other clusters
   4. if they are close
      1. Add a new vertex C to T and connect to vertices C1 and C2
      2. Remove rows and columns of d corresponding to C1 and C2
      3. Add a row & column to d corresponding to the new cluster C
4. return T
The algorithm takes an n×n distance matrix d of pairwise distances between points as input.
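A minimal Python sketch of the agglomerative procedure above, under stated assumptions: the distance from the merged cluster C to every other cluster is taken as the minimum of the two old distances (single linkage), the tree T is returned as a dictionary from each new vertex to its two children, and the "if they are close" test is not modelled because the listing does not define it. The function name, the `linkage` parameter, and the vertex numbering are illustrative choices.

```python
def hierarchical_clustering(d, linkage=min):
    """Agglomerative clustering on an n x n distance matrix d (list of lists).

    Leaves are vertices 0..n-1; each merge creates vertex n, n+1, ...
    Returns T as {new_vertex: (child_1, child_2)}.
    `linkage` combines the distances from the two merged clusters to a
    third cluster; min (single linkage) is an assumption of this sketch.
    """
    n = len(d)
    active = set(range(n))  # step 1: n singleton clusters
    # Pairwise distances between active clusters, keyed by unordered pair.
    dist = {frozenset((i, j)): d[i][j]
            for i in range(n) for j in range(i + 1, n)}
    tree = {}               # step 2: leaf vertices are implicit
    next_id = n

    while len(active) > 1:  # step 3
        # Find the two closest clusters C1 and C2.
        pair = min(dist, key=dist.get)
        c1, c2 = sorted(pair)

        # Merge them into a new cluster C and add C to the tree.
        c = next_id
        next_id += 1
        tree[c] = (c1, c2)
        active -= {c1, c2}

        # Compute the distance from C to every remaining cluster while
        # dropping the entries ("rows and columns") of C1 and C2.
        for k in active:
            d1 = dist.pop(frozenset((c1, k)))
            d2 = dist.pop(frozenset((c2, k)))
            dist[frozenset((c, k))] = linkage(d1, d2)
        del dist[pair]
        active.add(c)

    return tree             # step 4


# Four points on a line at positions 0, 1, 5, 6.
d = [[0, 1, 5, 6],
     [1, 0, 4, 5],
     [5, 4, 0, 1],
     [6, 5, 1, 0]]
T = hierarchical_clustering(d)
print(T)
```

For the four points on a line, the two tight pairs are merged first and then joined at the root. Passing `linkage=max` would give complete linkage; average linkage needs cluster sizes and is left out to keep the sketch short.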
Are They Similar?
Distance Functions
How close?
Distance between two clusters is the smallest distance between any pair of their elements
$d_{avg}(C, C^*) = \frac{1}{|C^*||C|} \sum d(x, y)$
for all elements x in C and y in C*
Distance between two clusters is the average distance between all pairs of their elements
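Both cluster distances can be written in a few lines. This is a sketch assuming clusters are lists of coordinate tuples and that the point distance d(x, y) is Euclidean; the name `d_min` is a label chosen here for the smallest-pair distance, while `d_avg` follows the formula above.

```python
from math import dist  # Euclidean point distance (Python 3.8+); an assumed choice of d(x, y)


def d_min(c1, c2, d=dist):
    """Smallest distance between any pair (x in c1, y in c2)."""
    return min(d(x, y) for x in c1 for y in c2)


def d_avg(c1, c2, d=dist):
    """Average distance over all pairs (x in c1, y in c2)."""
    return sum(d(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


# Two clusters of one-dimensional points.
a = [(0.0,), (1.0,)]
b = [(4.0,), (6.0,)]

print(d_min(a, b))  # closest pair is 1 and 4
print(d_avg(a, b))  # mean of the pair distances 4, 6, 3, 5
```

The smallest-pair distance only looks at the closest pair, so a single stray point can pull two clusters together; the average uses every pair and is less affected by one point.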
Distance Functions
Some remarks
The common criticism
  HC algorithms lack robustness, since they are sensitive to noise and outliers
  Once an object is assigned to a cluster, it is never reconsidered
  Computational complexity is, at least, O(N²)
Recently
  New improvements to deal with large data sets
  E.g.: CURE, ROCK, Chameleon and BIRCH
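One way to see the quadratic lower bound: the algorithm starts from all pairwise distances, and a symmetric distance matrix with a zero diagonal already holds the following number of distinct entries. The value N = 10,000 is only an illustrative size.

```latex
\binom{N}{2} = \frac{N(N-1)}{2} = \Theta(N^2),
\qquad
N = 10\,000 \;\Rightarrow\; \frac{10\,000 \cdot 9\,999}{2} = 49\,995\,000 \text{ distances}
```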
Next Class
More topics in clustering: K-means