MultiDimensionalScaling
Uploaded by maksim-tsvetovat
8/8/2019 MultiDimensionalScaling
MDS
Multi-Dimensional Scaling
-
Feature Matrices
     Name   Age   Income   Education
1    Sean   19    23,000   H.S.
2    Joe    25    90,000   MBA
n    Bill   32    28,000   Ph.D.
Formalized as a set of observations, each containing a set of variables
-
Feature Matrices
Standard fodder of regression analysis
Assumption:
For all observations, dependencies between variables are linear, log-linear, or at least monotonic
Generally not true in the real world
Outliers may be just as important
-
Monotonic Relationship
[Figure: scatter plot with axes Education and $$; points labeled "Normal People"]
-
What then?
Network methods allow detection of similarity clusters in feature data
Relationships between clusters can be discontinuous
[Figure: scatter plot with axes Education and $$; clusters labeled "Ph.Ds", "Lotto Winners", and "Normal People"]
-
What if we have more columns?
-
Multi-Dimensional Scaling
Hierarchical clustering is a discrete model
It partitions nodes into exhaustive, non-overlapping subsets
The world is not so black-and-white
-
MDS
The purpose of multidimensional scaling (MDS) is to provide a spatial representation of the pattern of similarities
More similar nodes will appear closer together
Finds non-intuitive equivalences in networks
-
Input to MDS
Measure of pairwise similarity among nodes
Attribute-based
Euclidean distances
Graph distances
CONCOR similarities
Output:
A set of coordinates in 2D or 3D space such that similar nodes are closer together than dissimilar nodes
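Two of the input types listed above, Euclidean distances over attributes and graph distances, can be sketched in plain Python. This is a minimal illustration, not the deck's own code: the helper names `euclidean_matrix` and `graph_distance_matrix` are hypothetical, the graph is assumed to be undirected and unweighted, and the sample values are made up.

```python
import math
from collections import deque

def euclidean_matrix(rows):
    """Pairwise Euclidean distances between attribute vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def graph_distance_matrix(adjacency):
    """Pairwise hop counts via breadth-first search.

    adjacency maps each node to its neighbours (assumed undirected,
    unweighted). Unreachable pairs get math.inf.
    """
    nodes = sorted(adjacency)
    result = []
    for source in nodes:
        hops = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    queue.append(neighbour)
        result.append([hops.get(target, math.inf) for target in nodes])
    return result

# Hypothetical attribute vectors and a three-node path graph a - b - c.
attribute_distances = euclidean_matrix([(0, 0), (3, 4)])
path_distances = graph_distance_matrix({"a": ["b"], "b": ["a", "c"], "c": ["b"]})
```

With real feature matrices, attributes on very different scales (age versus income, for instance) would usually be standardized first so that one column does not dominate the distances.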
-
Algorithm
MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to a function of the input matrix, according to a fitness function called stress.
1. Assign points to arbitrary coordinates in p-dimensional space.
2. Compute Euclidean distances among all pairs of points to form the D matrix.
3. Compare the D matrix with the input matrix by evaluating the stress function. The smaller the value, the greater the correspondence between the two.
4. Adjust the coordinates of each point in the direction that reduces stress.
5. Repeat steps 2 through 4 until stress stops decreasing.
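The steps above can be sketched as a small gradient-descent loop. This is a minimal sketch under stated assumptions, not a reference implementation: it minimizes raw (unscaled) squared error rather than a normalized stress, uses plain gradient descent as the adjustment rule, and the names `mds`, `raw_stress`, `rate`, `steps`, and `seed` are all hypothetical.

```python
import math
import random

def pairwise_distances(points):
    """Step 2: Euclidean distances among all pairs of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def raw_stress(target, points):
    """Step 3 (assumed form): sum of squared differences over pairs i < j."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def mds(target, p=2, steps=500, rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Step 1: arbitrary starting coordinates in p-dimensional space.
    points = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
    for _ in range(steps):
        d = pairwise_distances(points)
        # Gradient of raw stress with respect to each coordinate.
        grads = [[0.0] * p for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or d[i][j] == 0.0:
                    continue
                coeff = 2.0 * (d[i][j] - target[i][j]) / d[i][j]
                for k in range(p):
                    grads[i][k] += coeff * (points[i][k] - points[j][k])
        # Step 4: move each point against the gradient, which lowers stress.
        for i in range(n):
            for k in range(p):
                points[i][k] -= rate * grads[i][k]
    return points

# Input distances of a 3-4-5 triangle, which fits exactly in 2D.
target = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords = mds(target)
final_stress = raw_stress(target, coords)
```

A fixed step count stands in for the "repeat until stress stops decreasing" rule; a production routine would check convergence and typically restart from several random configurations to avoid poor local minima.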
-
Dimensionality
Normally, MDS is used in 2D space for optimal visual impact
A 2D map may be a very poor, highly distorted representation of your data, which shows up as a high stress value
Remedy: increase the number of dimensions
Difficulties:
High-dimensional spaces are difficult to represent visually
With increasing dimensions, you must estimate an increasing number of parameters to obtain a decreasing improvement in stress
-
Stress function
The degree of correspondence between the distances among points on the MDS map and the input matrix
dij = Euclidean distance, across all dimensions, between points i and j on the map
f(xij) = some function of the input data
scale = a constant scaling factor, used to keep stress values between 0 and 1
When the MDS map perfectly reproduces the input data, f(xij) = dij for all i and j, so stress is zero
Thus, the smaller the stress, the better the representation
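One common form consistent with these definitions is Kruskal's stress, sqrt(sum over pairs of (f(xij) - dij)^2 / scale). The sketch below assumes scale = sum of squared map distances, which is one conventional choice and not necessarily the one used on the slide; the function name `stress` and the flat list layout (one entry per pair i < j) are hypothetical.

```python
import math

def stress(transformed_inputs, map_distances):
    """Kruskal-style stress over matching lists of f(xij) and dij values.

    Assumption: scale is the sum of squared map distances.
    """
    numerator = sum((f - d) ** 2
                    for f, d in zip(transformed_inputs, map_distances))
    scale = sum(d ** 2 for d in map_distances)
    return math.sqrt(numerator / scale)

perfect = stress([3.0, 4.0, 5.0], [3.0, 4.0, 5.0])   # map reproduces the input
distorted = stress([1.0], [2.0])                     # one pair, off by 1
```

When the map reproduces the input exactly, every term in the numerator vanishes and the result is zero, matching the statement above.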
-
Stress Function, cont.
The transformation f(xij) applied to the input values depends on whether the scaling is metric or non-metric.
Metric scaling: f(xij) = xij
Raw input data is compared directly to the map distances
For similarities, the inverse of the map distances is used
Non-metric scaling: f(xij) is a weakly monotonic transformation of the input data that minimizes the stress function
Computed using a regression method
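For the non-metric case, the regression is usually a monotonic (isotonic) one. The sketch below uses the pool-adjacent-violators procedure as one way to get a weakly increasing least-squares fit; treating that as the method behind the slide is an assumption, and the names `isotonic_fit` and `disparities` are hypothetical.

```python
def isotonic_fit(values):
    """Weakly increasing least-squares fit (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean drops below its predecessor's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def disparities(dissimilarities, map_distances):
    """f(xij) for each pair: map distances made monotonic in the input order."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    fitted = isotonic_fit([map_distances[k] for k in order])
    result = [0.0] * len(order)
    for rank, k in enumerate(order):
        result[k] = fitted[rank]
    return result

# Pairs ranked 1 < 2 < 3 by input dissimilarity, but the map has the last
# two distances in the wrong order, so they are pooled to their mean.
f_values = disparities([1, 2, 3], [1.0, 3.0, 2.0])
```

Only the rank order of the input matters here, which is why non-metric scaling suits ordinal or perceptual data.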
-
Non-zero stress
Caused by measurement error or insufficient dimensionality
Stress levels of
-
Increasing dimensionality
As the number of dimensions increases, stress decreases
-
Interpretation of MDS Map
Axes are meaningless
We are looking at cohesiveness and proximity of clusters, not their locations
Infinite number of equivalent maps: any rotation, reflection, or translation preserves the distances
If stress > 0, there is distortion
Larger distances are less distorted than smaller ones
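A short check makes the "axes are meaningless" point concrete: rotating and mirroring a map leaves every pairwise distance unchanged, so all such maps fit the input equally well. The layout, angle, and helper names below are hypothetical.

```python
import math

def rotate(points, angle):
    """Rotate 2D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def distances(points):
    """Distances for all pairs i < j, in a fixed order."""
    return [math.dist(a, b)
            for i, a in enumerate(points) for b in points[i + 1:]]

layout = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
turned = rotate(layout, math.radians(37))
mirrored = [(-x, y) for x, y in turned]   # reflect across the vertical axis

same_distances = all(
    math.isclose(a, b)
    for a, b in zip(distances(layout), distances(mirrored))
)
```

Because of this, two runs of MDS on the same data may produce maps that look different at first glance yet are the same configuration turned or flipped.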
-
What to look for
Clusters
Groups of items that are closer to each other than to other items
When really tight, highly separated clusters occur in perceptual data, it may suggest that each cluster is a domain or subdomain which should be analyzed individually
Extract clusters and re-run MDS on them for further separation
-
What to look for
Dimensions
Item attributes that seem to order the items in the map along a continuum
For example, an MDS of perceived similarities among breeds of dogs may show a distinct ordering of dogs by size. At the same time, an independent ordering of dogs according to viciousness might be observed.
Orderings may not follow the axes or be orthogonal to each other
The underlying dimensions are thought to "explain" the perceived similarity between items
Implicit similarity function is a weighted sum of attributes
May discover non-obvious continuums
-
High-dimensionality MDS
Difficult to interpret visually, need a mathematical technique
Feed MDS coordinates into another discriminator function
HiClus may work well
May be easier to tease apart than the original attribute vectors
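Feeding MDS coordinates into a hierarchical clustering step might look like the sketch below. It uses single-linkage agglomerative clustering as a stand-in for HiClus; the linkage rule, the function name `single_linkage`, and the sample 3D coordinates are all assumptions made for illustration.

```python
import math

def single_linkage(points, k):
    """Merge the two closest clusters until only k remain.

    Cluster-to-cluster distance is the smallest point-to-point distance
    (single linkage). Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(math.dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return [sorted(c) for c in clusters]

# Hypothetical 3D MDS coordinates: two tight, well-separated groups.
coords = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
groups = single_linkage(coords, k=2)
```

The same routine works for any number of MDS dimensions, which is the point of the slide: the clustering step does not need the map to be drawable.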