MultiDimensionalScaling

  • 8/8/2019 MultiDimensionalScaling

    MDS

    Multi-Dimensional Scaling


    Feature Matrices

    #   Name   Age   Income   Education
    1   Sean   19    23,000   H.S.
    2   Joe    25    90,000   MBA
    n   Bill   32    28,000   Ph.D.

    Formalized as a set of observations, each containing a set of variables

    Feature Matrices

    Standard fodder of regression analysis

    Assumption:

    For all observations, dependencies between variables are linear, log-linear, or at least monotonic

    Generally not true in the real world

    Outliers may be just as important

    Monotonic Relationship

    [Figure: scatter plot with axes labeled "Education" and "$$"; the points are labeled "Normal People"]

    What then?

    Network methods allow detection of similarity clusters in feature data

    Relationship between clusters can be discontinuous

    [Figure: scatter plot with axes labeled "Education" and "$$"; clusters labeled "Ph.Ds", "Lotto Winners", and "Normal People"]

    What if we have more columns?

    Multi-Dimensional Scaling

    Hi-clustering is a discrete model

    It partitions nodes into exhaustive, non-overlapping subsets

    The world is not so black-and-white

    MDS

    The purpose of multidimensional scaling (MDS) is to provide a spatial representation of the pattern of similarities

    More similar nodes will appear closer together

    Finds non-intuitive equivalences in networks

    Input to MDS

    Measure of pairwise similarity among nodes:

    Attribute-based

    Euclidean distances

    Graph distances

    CONCOR similarities

    Output:

    A set of coordinates in 2D or 3D space such that similar nodes are closer together than dissimilar nodes
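
One way to build the attribute-based input is sketched below: a pairwise Euclidean distance matrix computed from attribute vectors. The function name, the sample rows, and the assumption that attributes were already rescaled to comparable units are illustrative and not taken from the slides.

```python
import math


def euclidean_distance_matrix(rows):
    """Return the symmetric matrix of Euclidean distances between rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical attribute vectors, assumed to be rescaled to comparable units.
people = {
    "A": (0.0, 0.0, 0.0),
    "B": (3.0, 4.0, 0.0),
    "C": (3.0, 4.0, 12.0),
}
names = list(people)
D = euclidean_distance_matrix([people[name] for name in names])
```

The resulting matrix holds dissimilarities (larger means less alike), the form a distance-based MDS run takes as input.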

    Algorithm

    MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to a function of the input matrix, according to a fitness function called stress.

    1. Assign points to arbitrary coordinates in p-dimensional space.

    2. Compute Euclidean distances among all pairs of points, to form the D matrix of map distances.

    3. Compare the D matrix with the input matrix by evaluating the stress function. The smaller the value, the greater the correspondence between the two.

    4. Adjust the coordinates of each point in the direction that most reduces stress.

    5. Repeat steps 2 through 4 until stress stops decreasing.
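
The steps above can be sketched in plain Python as a gradient descent. This is a minimal illustration under stated assumptions, not the method of any particular package: it covers the metric case only (input distances compared directly to map distances), uses the unscaled sum of squared mismatches as the quantity to reduce, and the names `mds`, `raw_stress`, `rate`, `steps`, `start`, and `seed` are all hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """Step 2: Euclidean distances among all pairs of points."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
            for p in points]


def raw_stress(points, target):
    """Step 3: sum of squared mismatches between map and input distances."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds(target, dims=2, steps=500, rate=0.05, start=None, seed=0):
    n = len(target)
    rng = random.Random(seed)
    # Step 1: arbitrary starting coordinates unless the caller supplies some.
    points = ([list(p) for p in start] if start is not None else
              [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)])
    for _ in range(steps):
        d = pairwise_distances(points)
        moved = []
        for i in range(n):
            grad = [0.0] * len(points[i])
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue  # coincident points give no usable direction
                pull = 2.0 * (d[i][j] - target[i][j]) / d[i][j]
                for k in range(len(grad)):
                    grad[k] += pull * (points[i][k] - points[j][k])
            # Step 4: move against the gradient, which lowers the mismatch.
            moved.append([x - rate * g for x, g in zip(points[i], grad)])
        points = moved
    return points


# Distances among the corners of a unit square, which a 2D map can reproduce.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = pairwise_distances(square)
layout = mds(target)
print("raw stress after fitting:", raw_stress(layout, target))
```

A random start can settle in a local minimum, so in practice several starts are usually tried and the lowest-stress map kept.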

    Dimensionality

    Normally, MDS is used in 2D space for optimal visual impact

    A 2D map may be a very poor, highly distorted representation of your data; this shows up as a high stress value

    Remedy: increase the number of dimensions

    Difficulties:

    High-dimensional spaces are difficult to represent visually

    With increasing dimensions, you must estimate an increasing number of parameters to obtain a decreasing improvement in stress

    Stress function

    The degree of correspondence between the distances among points on the MDS map and the input matrix

    dij = Euclidean distance, across all dimensions, between points i and j on the map

    f(xij) = some function of the input data

    scale = a constant scaling factor, used to keep stress values between 0 and 1

    When the MDS map perfectly reproduces the input data, f(xij) = dij for all i and j, so stress is zero.

    Thus, the smaller the stress, the better the representation.
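
A form of the stress function consistent with these definitions is the Kruskal-style expression below. It is offered as an assumption about the intended formula, not a quotation from the slides:

```latex
\text{stress} = \sqrt{\frac{\sum_{i}\sum_{j}\bigl(f(x_{ij}) - d_{ij}\bigr)^{2}}{\text{scale}}}
```

A common choice for the scaling factor (again an assumption here) is $\text{scale} = \sum_{i}\sum_{j} d_{ij}^{2}$. With $f(x_{ij}) = d_{ij}$ for every pair, the numerator vanishes and stress is zero.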

    Stress Function, cont.

    The transformation of the input values, f(xij), depends on whether metric or non-metric scaling is used.

    Metric scaling: f(xij) = xij

    Raw input data is compared directly to the map distances

    For similarities, the inverse of the map distances is used

    Non-metric scaling: f(xij) is a weakly monotonic transformation of the input data that minimizes the stress function

    Computed using a monotonic (isotonic) regression method
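
The non-metric transformation can be sketched with the pool-adjacent-violators algorithm, a standard way to compute a least-squares monotonic fit. The function names are hypothetical, and ties among input values get no special treatment in this sketch.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit to `values` (pool adjacent violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Pool backwards while a block's mean exceeds the mean of the next one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def disparities(inputs, map_distances):
    """f(xij): weakly monotonic in the inputs, as close as possible to the map distances."""
    order = sorted(range(len(inputs)), key=lambda k: inputs[k])
    fitted = isotonic_fit([map_distances[k] for k in order])
    result = [0.0] * len(inputs)
    for rank, k in enumerate(order):
        result[k] = fitted[rank]
    return result


# Hypothetical input dissimilarities and current map distances for four pairs.
f_values = disparities([10, 30, 20, 40], [1.0, 2.0, 3.0, 4.0])
```

Only the rank order of the inputs matters here, which is why non-metric scaling suits ordinal data.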

    Non-zero stress

    Caused by measurement error or insufficient dimensionality

    Stress levels of

    Increasing dimensionality

    As the number of dimensions increases, stress decreases:

    Interpretation of MDS Map

    Axes are meaningless

    We are looking at cohesiveness and proximity of clusters, not their locations

    Infinite number of equivalent orientations (any rotation or reflection of the map fits equally well)

    If stress > 0, there is distortion

    Larger distances are less distorted than smaller ones
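
Why the axes carry no meaning can be checked directly: rotating or mirroring a map leaves every pairwise distance unchanged, so stress is unchanged too. A small sketch, with hypothetical coordinates and helper names:

```python
import math


def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def distances(points):
    """Distances for every unordered pair, in a fixed order."""
    return [math.hypot(ax - bx, ay - by)
            for i, (ax, ay) in enumerate(points)
            for (bx, by) in points[i + 1:]]


layout = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # hypothetical MDS output
turned = rotate(layout, math.radians(37.0))
mirrored = [(-x, y) for x, y in layout]

same_after_turn = all(abs(a - b) < 1e-9
                      for a, b in zip(distances(layout), distances(turned)))
same_after_flip = all(abs(a - b) < 1e-9
                      for a, b in zip(distances(layout), distances(mirrored)))
```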

    What to look for

    Clusters: groups of items that are closer to each other than to other items.

    When really tight, highly separated clusters occur in perceptual data, it may suggest that each cluster is a domain or subdomain which should be analyzed individually.

    Extract clusters and re-run MDS on them for further separation

    What to look for

    Dimensions: item attributes that seem to order the items in the map along a continuum.

    For example, an MDS of perceived similarities among breeds of dogs may show a distinct ordering of dogs by size. At the same time, an independent ordering of dogs according to viciousness might be observed.

    Orderings may not follow the axes or be orthogonal to each other

    The underlying dimensions are thought to "explain" the perceived similarity between items.

    Implicit similarity function is a weighted sum of attributes

    May discover non-obvious continuums

    High-dimensionality MDS

    Difficult to interpret visually; a mathematical technique is needed

    Feed MDS coordinates into another discriminator function

    HiClus may work well

    May be easier to tease apart than the original attribute vectors
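
Feeding MDS coordinates into a clustering step might look like the sketch below. Single-linkage agglomerative clustering is used here only as a simple stand-in for a hierarchical method; the coordinates, the function names, and the choice of stopping at `k` clusters are assumptions made for illustration.

```python
import math


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def single_link_clusters(points, k):
    """Merge the two closest clusters until only k remain (single linkage)."""
    clusters = [[i] for i in range(len(points))]

    def gap(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(distance(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = [(gap(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 3D MDS coordinates for five nodes.
coords = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0),
          (5.1, 5.0, 5.0), (0.0, 0.1, 0.0)]
groups = single_link_clusters(coords, k=2)
```

The clustering works on distances in the MDS space, so it applies unchanged to any number of dimensions.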