01.04 Clustering Examples
Embed Size (px)
Transcript of 01.04 Clustering Examples
-
8/13/2019 01.04 Clustering Examples
1/14
Cluster Analysis
-
8/13/2019 01.04 Clustering Examples
2/14
Cluster Analysis
-
8/13/2019 01.04 Clustering Examples
3/14
Cluster Analysis (data segmentation)
is an exploratory method for identifying
homogenous groups (clusters) of records
Similarrecords should belong to the same cluster
Dissimilarrecords should belong to different
clusters
-
8/13/2019 01.04 Clustering Examples
4/14
case sex glasses moustache smile hat
1
m
y
n
y
n
2 f n n y n3 m y n n n4 m n n n n5 m n n y n6 m n y n y7 m y n y n8 m n n y n9 m y y y n
10 f n n n n11 m n y n n12 f n n n n
From http://obelia.jde.aca.mmu.ac.uk/multivar/ca.htm(no longer)
http://obelia.jde.aca.mmu.ac.uk/multivar/ca.htmhttp://obelia.jde.aca.mmu.ac.uk/multivar/ca.htmhttp://obelia.jde.aca.mmu.ac.uk/multivar/ca.htm -
8/13/2019 01.04 Clustering Examples
5/14
Example 1: Exploring Stars(from Data Mining Techniquesby Berry & Linoff)
Astronomers study the relationship between
brightness of stars and their temperatures
Lum
inosity
Temperature
-
8/13/2019 01.04 Clustering Examples
6/14
Example 2: Fitting the Troops(from Data Mining Techniquesby Berry & Linoff)
The US army recently commissioned a study on how toredesign the uniforms of female soldiers. The armys goal
is to reduce the number of different uniform sizes that have
to be kept in inventory while still providing each soldier with
well-fitting khakis.
Researchers Ashdown and Paal @ Cornell University
designed a new set of sizes based on the actual shapes of
womenin the army. Unlike traditional clothing size systems,
the new sizes are not an ordered set of graduated sizes
where all dimensions increase together.
Instead, they came up with sizes that fit particular body
types(e.g., short-legged, small-waisted, large-busted
women with long torsos, average arms, broad shoulders,
and skinny necks).
-
8/13/2019 01.04 Clustering Examples
7/14
Market Segmentation
Segment customers based on demographic info and transaction
history in order to tailor marketing strategy for each segment
-
8/13/2019 01.04 Clustering Examples
8/14
Investment:
Cluster securities based on financial performance info(return, volatility, beta) and other info (industry andmarket capitalization) to create a balanced portfolio
Industry Analysis:For a given industry, cluster firms based on {growth rate,profitability, market size, product range, presence invarious international markets}, to understand industry
structure (determine competitors)
Lots more @ http://www.clustan.com/clustering_examples.html
http://www.clustan.com/clustering_examples.htmlhttp://www.clustan.com/clustering_examples.html -
8/13/2019 01.04 Clustering Examples
9/14
UG Business ProgramsUniversities Clustering.xls
Data for 25 undergraduateprograms at business schools inUS universities in 1995.
Univ SAT Top10 Accept SFRatio Expenses GradRate
Brown 1310 89 22 13 22,704 94
CalTech 1415 100 25 6 63,575 81
CMU 1260 62 59 9 25,026 72
Columbia 1310 76 24 12 31,510 88
Cornell 1280 83 33 13 21,864 90
Dartmouth 1340 89 23 10 32,162 95
Duke 1315 90 30 12 31,585 95
Georgetown 1255 74 24 12 20,126 92
Harvard 1400 91 14 11 39,525 97
JohnsHopkins 1305 75 44 7 58,691 87
MIT 1380 94 30 10 34,870 91
Northwestern 1260 85 39 11 28,052 89
NotreDame 1255 81 42 13 15,122 94PennState 1081 38 54 18 10,185 80
Princeton 1375 91 14 8 30,220 95
Purdue 1005 28 90 19 9,066 69
Stanford 1360 90 20 12 36,450 93
TexasA&M 1075 49 67 25 8,704 67
UCBerkeley 1240 95 40 17 15,140 78
UChicago 1290 75 50 13 38,380 87
UMichigan 1180 65 68 16 15,470 85
UPenn 1285 80 36 11 27,553 90
UVA 1225 77 44 14 13,349 92UWisconsin 1085 40 69 15 11,857 71
Yale 1375 95 19 11 43,514 96
This dataset excludes image variables
(student satisfaction, employersatisfaction, deans opinions)
Student
quality ProgramPlacement
http://localhost/var/www/apps/conversion/Desktop/Universities.xlshttp://localhost/var/www/apps/conversion/Desktop/Universities.xls -
8/13/2019 01.04 Clustering Examples
10/14
Why cluster?
How can clustering help a prospective applicant?
How can clustering help a business school dean?
-
8/13/2019 01.04 Clustering Examples
11/14
Cluster Analysis:
Backstage
-
8/13/2019 01.04 Clustering Examples
12/14
Simple Clustering: 1-2 variables
Visual inspection of data (histogram, scatterplot)
Lum
inosity
Temperature
-
8/13/2019 01.04 Clustering Examples
13/14
Clustering with >2 variables
Two approaches:
Distance: compute multivariate distance
between records, and group close records
Homogeneity: group records to increase within-
group homogeneity
-
8/13/2019 01.04 Clustering Examples
14/14
Two Types of Clustering Algorithms
Hierarchical methods- agglomerative:Begin with nclusters; sequentially mergesimilar clusters until 1 cluster is left.
Useful when goal is to arrange the
clusters into a natural hierarchy. Requires specifying distance measure
Clustering music on Last.fm (from anthony.liekens.net)
Non-Hierarchical methods:
Pre-specified number of clusters,assign records to each of the clusters. Requires specifying # clusters Computationally cheap