01.04 Clustering Examples

8/13/2019 01.04 Clustering Examples http://slidepdf.com/reader/full/0104-clustering-examples 1/14 Cluster Analysis


  • 8/13/2019 01.04 Clustering Examples

    1/14

    Cluster Analysis


    Cluster Analysis (data segmentation) is an exploratory method for identifying homogeneous groups (clusters) of records.

    Similar records should belong to the same cluster.

    Dissimilar records should belong to different clusters.


    case  sex  glasses  moustache  smile  hat
    1     m    y        n          y      n
    2     f    n        n          y      n
    3     m    y        n          n      n
    4     m    n        n          n      n
    5     m    n        n          y      n
    6     m    n        y          n      y
    7     m    y        n          y      n
    8     m    n        n          y      n
    9     m    y        y          y      n
    10    f    n        n          n      n
    11    m    n        y          n      n
    12    f    n        n          n      n

    From http://obelia.jde.aca.mmu.ac.uk/multivar/ca.htm (no longer available)
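One way to make "similar" and "dissimilar" concrete for the twelve records above is to count the attributes on which two cases disagree. The sketch below is only an illustration: the `mismatches` helper and the five-letter profile encoding (sex, glasses, moustache, smile, hat) are assumptions introduced here, not something the slides prescribe.

```python
# Each profile encodes sex, glasses, moustache, smile, hat for one case,
# copied from the table above (encoding as a string is an assumption).
cases = {
    1: "mynyn", 2: "fnnyn", 3: "mynnn", 4: "mnnnn",
    5: "mnnyn", 6: "mnyny", 7: "mynyn", 8: "mnnyn",
    9: "myyyn", 10: "fnnnn", 11: "mnynn", 12: "fnnnn",
}

def mismatches(a, b):
    """Count the attributes on which two records disagree (hypothetical helper)."""
    return sum(x != y for x, y in zip(a, b))

# Collect cases that share exactly the same profile (0 mismatches).
groups = {}
for case, profile in cases.items():
    groups.setdefault(profile, []).append(case)

identical = [members for members in groups.values() if len(members) > 1]
print(identical)
print(mismatches(cases[1], cases[10]))
```

Cases with zero mismatches are the most obvious candidates for sharing a cluster; records with small but nonzero counts are where the choice of clustering method starts to matter.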

    Example 1: Exploring Stars (from Data Mining Techniques by Berry & Linoff)

    Astronomers study the relationship between the brightness of stars and their temperatures

    [Figure: Luminosity plotted against Temperature]


    Example 2: Fitting the Troops (from Data Mining Techniques by Berry & Linoff)

    The US Army recently commissioned a study on how to redesign the uniforms of female soldiers. The Army's goal is to reduce the number of different uniform sizes that have to be kept in inventory while still providing each soldier with well-fitting khakis.

    Researchers Ashdown and Paal at Cornell University designed a new set of sizes based on the actual shapes of women in the army. Unlike traditional clothing size systems, the new sizes are not an ordered set of graduated sizes where all dimensions increase together.

    Instead, they came up with sizes that fit particular body types (e.g., short-legged, small-waisted, large-busted women with long torsos, average arms, broad shoulders, and skinny necks).


    Market Segmentation

    Segment customers based on demographic info and transaction history in order to tailor the marketing strategy for each segment


    Investment:

    Cluster securities based on financial performance info (return, volatility, beta) and other info (industry and market capitalization) to create a balanced portfolio

    Industry Analysis:

    For a given industry, cluster firms based on {growth rate, profitability, market size, product range, presence in various international markets} to understand industry structure (determine competitors)

    Lots more at http://www.clustan.com/clustering_examples.html

    UG Business Programs (Universities Clustering.xls)

    Data for 25 undergraduate programs at business schools in US universities in 1995.

    Univ          SAT   Top10  Accept  SFRatio  Expenses  GradRate
    Brown         1310  89     22      13       22,704    94
    CalTech       1415  100    25      6        63,575    81
    CMU           1260  62     59      9        25,026    72
    Columbia      1310  76     24      12       31,510    88
    Cornell       1280  83     33      13       21,864    90
    Dartmouth     1340  89     23      10       32,162    95
    Duke          1315  90     30      12       31,585    95
    Georgetown    1255  74     24      12       20,126    92
    Harvard       1400  91     14      11       39,525    97
    JohnsHopkins  1305  75     44      7        58,691    87
    MIT           1380  94     30      10       34,870    91
    Northwestern  1260  85     39      11       28,052    89
    NotreDame     1255  81     42      13       15,122    94
    PennState     1081  38     54      18       10,185    80
    Princeton     1375  91     14      8        30,220    95
    Purdue        1005  28     90      19       9,066     69
    Stanford      1360  90     20      12       36,450    93
    TexasA&M      1075  49     67      25       8,704     67
    UCBerkeley    1240  95     40      17       15,140    78
    UChicago      1290  75     50      13       38,380    87
    UMichigan     1180  65     68      16       15,470    85
    UPenn         1285  80     36      11       27,553    90
    UVA           1225  77     44      14       13,349    92
    UWisconsin    1085  40     69      15       11,857    71
    Yale          1375  95     19      11       43,514    96

    This dataset excludes image variables (student satisfaction, employer satisfaction, deans' opinions).

    [Slide annotations: Student quality, Program, Placement]

    Why cluster?

    How can clustering help a prospective applicant?

    How can clustering help a business school dean?


    Cluster Analysis: Backstage


    Simple Clustering: 1-2 variables

    Visual inspection of data (histogram, scatterplot)

    [Figure: Luminosity plotted against Temperature]
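With a single variable, a histogram is often enough to spot candidate groups. As a rough sketch, the snippet below bins the SAT column of the universities table and prints a text histogram; the bin width of 100 is an arbitrary choice made here for illustration, not a value from the slides.

```python
from collections import Counter

# SAT column of the universities table (all 25 programs, in table order).
sat = [1310, 1415, 1260, 1310, 1280, 1340, 1315, 1255, 1400, 1305,
       1380, 1260, 1255, 1081, 1375, 1005, 1360, 1075, 1240, 1290,
       1180, 1285, 1225, 1085, 1375]

BIN_WIDTH = 100  # hypothetical bin width, chosen only for illustration

# Map each score to the lower edge of its bin, e.g. 1310 -> 1300.
bins = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in sat)

# Print one row of stars per bin, including any empty bins in between.
for lower in range(min(bins), max(bins) + BIN_WIDTH, BIN_WIDTH):
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'*' * bins[lower]}")
```

A sparsely populated bin between two fuller ones is the kind of visual gap that may suggest separate groups, although the picture depends on the bin width chosen.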


    Clustering with >2 variables

    Two approaches:

    Distance: compute the multivariate distance between records, and group records that are close together

    Homogeneity: group records so as to increase within-group homogeneity
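The distance approach can be sketched with a few rows of the universities table. The slides do not say which distance to use, so the following assumes Euclidean distance on standardized values (z-scores); standardizing is an assumption added here because Expenses is on a much larger scale than the other variables, and the mean and standard deviation are computed from this five-row subset only.

```python
import math
from statistics import mean, pstdev

# A few rows copied from the universities table:
# (SAT, Top10, Accept, SFRatio, Expenses, GradRate)
records = {
    "Harvard":   (1400, 91, 14, 11, 39525, 97),
    "Princeton": (1375, 91, 14, 8, 30220, 95),
    "Stanford":  (1360, 90, 20, 12, 36450, 93),
    "Purdue":    (1005, 28, 90, 19, 9066, 69),
    "TexasA&M":  (1075, 49, 67, 25, 8704, 67),
}

# Standardize each variable so that no single scale dominates the distance.
# Assumption: statistics come from this subset, not from all 25 programs.
columns = list(zip(*records.values()))
means = [mean(col) for col in columns]
stdevs = [pstdev(col) for col in columns]

z = {
    name: [(v - m) / s for v, m, s in zip(row, means, stdevs)]
    for name, row in records.items()
}

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for other in ("Princeton", "Purdue"):
    print(f"Harvard - {other}: {euclidean(z['Harvard'], z[other]):.2f}")
```

Records separated by a small distance are candidates for the same cluster; a different distance measure or scaling could change which records count as close.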


    Two Types of Clustering Algorithms

    Hierarchical methods (agglomerative): Begin with n clusters (one per record); sequentially merge similar clusters until 1 cluster is left.

    Useful when the goal is to arrange the clusters into a natural hierarchy.

    Requires specifying a distance measure.

    Clustering music on Last.fm (from anthony.liekens.net)
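The agglomerative procedure can be sketched in a few lines. This is a minimal illustration on one variable (SAT for six programs from the universities table), and it assumes single linkage, where the distance between two clusters is the smallest gap between any pair of their members. The slides only say that a distance measure must be specified, so the linkage rule and the `single_link` helper are choices made here.

```python
# SAT values for six programs from the universities table.
sat = {
    "Harvard": 1400, "Princeton": 1375, "Yale": 1375,
    "Purdue": 1005, "TexasA&M": 1075, "PennState": 1081,
}

def single_link(c1, c2):
    """Cluster distance = smallest gap between any two members (assumed rule)."""
    return min(abs(sat[a] - sat[b]) for a in c1 for b in c2)

# Begin with n clusters, one per record.
clusters = [[name] for name in sat]
merges = []

# Sequentially merge the two closest clusters until 1 cluster is left.
while len(clusters) > 1:
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda pair: single_link(clusters[pair[0]], clusters[pair[1]]),
    )
    merges.append((clusters[i], clusters[j], single_link(clusters[i], clusters[j])))
    merged = clusters[i] + clusters[j]
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist}")
```

The recorded merge distances are what a dendrogram would plot; a large jump between consecutive merges is one informal hint about where to cut the hierarchy.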

    Non-hierarchical methods: Pre-specified number of clusters; assign records to each of the clusters.

    Requires specifying the number of clusters.

    Computationally cheap.
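The slides do not name a specific non-hierarchical algorithm. As one common example, the sketch below uses k-means on a single variable: the number of clusters `K` is fixed in advance, records are assigned to the nearest center, and centers are recomputed until nothing changes. The starting centers (smallest and largest value) are an assumption made here to keep the run deterministic.

```python
from statistics import mean

# SAT values for six programs from the universities table.
sat = {
    "Harvard": 1400, "Princeton": 1375, "Yale": 1375,
    "Purdue": 1005, "TexasA&M": 1075, "PennState": 1081,
}

K = 2  # the number of clusters must be specified in advance
# Assumption: start from the smallest and largest values as initial centers.
centers = [min(sat.values()), max(sat.values())]

def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))

while True:
    # Assignment step: each record joins its nearest center.
    assignment = {name: nearest(v, centers) for name, v in sat.items()}
    # Update step: each center moves to the mean of its members.
    # (This sketch does not handle a cluster that ends up empty.)
    new_centers = [
        mean(v for name, v in sat.items() if assignment[name] == k)
        for k in range(K)
    ]
    if new_centers == centers:
        break
    centers = new_centers

print(assignment)
print(centers)
```

Each pass only compares every record with `K` centers, which is one reason such methods are cheap compared with computing all pairwise distances; the result can depend on `K` and on the starting centers.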