Principal Component Analysis. 20 food products 16 European Countries Country Gr_coffee Inst_coffee...
Principal Component Analysis
PCA Example: FOODS
• 20 food products, 16 European countries
Country  Gr_coffee  Inst_coffee  Tea  Sweetener  Biscuit  Pe_Soup  Ti_soup  In_Portat  Fro_Fish
Germany  90         49           88   19         57       51       19       21         27
Italy
France
PCA Example: Red Sox
• Dataset: 110 years of Red Sox performance data
• Question: do pitchers' and batters' ages matter for performance?
Redundancy
• Arbitrary observations by r1 and r2
• Low to high redundancy from (a) to (c)
• (c) can be represented by a single variable
• Spread across the best-fit line: covariance between two variables
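Transform
• Linear transformation
• A point (x, y) in Cartesian coordinates
• The same point becomes (a, b) in another coordinate system
• Assuming a linear transformation:
• a = f(x, y) = x*c11 + y*c12
• b = g(x, y) = x*c21 + y*c22
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
For a review of matrices, see www.cs.uml.edu/~kim/580/review_matrix.pdf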
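The two equations for a and b are the rows of a single matrix product. A minimal sketch, assuming NumPy and a hypothetical coefficient matrix (a 90-degree rotation chosen only for illustration):

```python
import numpy as np

# Hypothetical coefficients c11, c12, c21, c22 (a 90-degree rotation).
C = np.array([[0.0, -1.0],
              [1.0,  0.0]])

point_xy = np.array([2.0, 1.0])   # (x, y) in the original coordinates

# a = x*c11 + y*c12 and b = x*c21 + y*c22, written as one matrix product.
point_ab = C @ point_xy

# The same result computed term by term.
a = point_xy[0] * C[0, 0] + point_xy[1] * C[0, 1]
b = point_xy[0] * C[1, 0] + point_xy[1] * C[1, 1]
```

Both routes give the same (a, b); the matrix form is the one that generalizes to more than two coordinates.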
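Eigenvector
• Example: $Mv = 4v$, so $v$ is an eigenvector of $M$ with eigenvalue 4
• Eigenvector: a vector that the transformation maps onto its own direction; only its length is scaled
• Eigenvectors of a symmetric matrix are orthogonal; this does not hold for every square matrix
• Each eigenvector is associated with one eigenvalue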
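A small numerical check can make the eigenvector idea concrete. The sketch below uses NumPy and an illustrative matrix that is an assumption (it is not taken from the slides); it was picked so that one eigenvalue is 4. It also shows that orthogonality of eigenvectors is guaranteed for a symmetric matrix but not for a general square one.

```python
import numpy as np

# Illustrative matrix (an assumption, not from the slides).
M = np.array([[2.0, 3.0],
              [2.0, 1.0]])
v = np.array([3.0, 2.0])

# M @ v points in the same direction as v, so v is an eigenvector.
Mv = M @ v                      # [12, 8] = 4 * [3, 2]
eigenvalue = Mv[0] / v[0]

# NumPy finds all eigenpairs; columns of `vectors` are unit eigenvectors.
values, vectors = np.linalg.eig(M)

# M is not symmetric, so its eigenvectors need not be orthogonal.
dot_nonsym = vectors[:, 0] @ vectors[:, 1]

# For a symmetric matrix the eigenvectors are orthogonal.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
_, s_vectors = np.linalg.eigh(S)
dot_sym = s_vectors[:, 0] @ s_vectors[:, 1]
```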
Transform
• http://www.ams.org/samplings/feature-column/fcarc-svd
Transform
(Symmetric)
• http://www.ams.org/samplings/feature-column/fcarc-svd
Eigenvectors
• $M v_i = \lambda_i v_i$
• $\lambda_i$ is a scalar; the $v_i$ are orthogonal vectors when $M$ is symmetric
• Non-symmetric
• For orthogonal unit vectors u1 and u2
• A general vector x has coefficients given by its projections onto the unit vectors
• $x = (u_1 \cdot x)\,u_1 + (u_2 \cdot x)\,u_2$
• Vector product is of the form: [equation not recovered]
• => SVD (Singular Value Decomposition)
• $M = U \Sigma V^T$
  M: m×n; U: m×m; Σ: m×n; V: n×n
• Columns of U are eigenvectors of $M M^T$
• Columns of V are eigenvectors of $M^T M$
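The shapes and the eigenvector relations can be checked numerically. This is a minimal sketch with NumPy on a hypothetical 3×2 matrix; the entries of `M` are made up for illustration.

```python
import numpy as np

# Hypothetical 3x2 matrix M (m = 3, n = 2).
M = np.array([[3.0, 0.0],
              [1.0, 2.0],
              [0.0, 1.0]])

# full_matrices=True gives U: m x m and Vt: n x n; s holds the singular values.
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the m x n matrix Sigma with the singular values on its diagonal.
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
M_rebuilt = U @ Sigma @ Vt

# Columns of V are eigenvectors of M^T M with eigenvalues s**2.
V = Vt.T
lhs_v = (M.T @ M) @ V
rhs_v = V * s**2              # scales column j of V by s[j]**2

# The first n columns of U are eigenvectors of M M^T with the same eigenvalues.
lhs_u = (M @ M.T) @ U[:, :len(s)]
rhs_u = U[:, :len(s)] * s**2
```

The remaining column of U (there are m − n = 1 of them here) belongs to eigenvalue 0 of $M M^T$.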
Transform = New Coordinate
• What would be a good transform matrix for PCA?
• Covariance
Mean, Variance, Covariance
• $X = (x_1, x_2, \ldots, x_n)$, $Y = (y_1, y_2, \ldots, y_n)$
• $E[X] = \sum_i x_i / n$, $E[Y] = \sum_i y_i / n$
• Variance = (st. dev.)²: $V[X] = \sum_i (x_i - E[X])^2 / (n-1)$
• Covariance: $\mathrm{cov}[X,Y] = \sum_i (x_i - E[X])(y_i - E[Y]) / (n-1)$
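Covariance Matrix
• Three variables X, Y, Z
$$\begin{pmatrix} \mathrm{cov}[X,X] & \mathrm{cov}[X,Y] & \mathrm{cov}[X,Z] \\ \mathrm{cov}[Y,X] & \mathrm{cov}[Y,Y] & \mathrm{cov}[Y,Z] \\ \mathrm{cov}[Z,X] & \mathrm{cov}[Z,Y] & \mathrm{cov}[Z,Z] \end{pmatrix}$$
• cov[X,X] = V[X]
• cov[X,Y] = cov[Y,X]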
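The covariance matrix can be built entry by entry from the covariance formula with the (n − 1) denominator. The sketch below assumes NumPy and uses made-up observations for X, Y and Z; it compares the hand-built matrix with `np.cov`.

```python
import numpy as np

# Hypothetical observations of three variables X, Y, Z (n = 5).
X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
Z = np.array([9.0, 7.0, 6.0, 4.0, 1.0])

def cov(a, b):
    """Sample covariance with the (n - 1) denominator."""
    n = len(a)
    return np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)

variables = [X, Y, Z]
C = np.array([[cov(a, b) for b in variables] for a in variables])

# np.cov uses the same (n - 1) denominator by default.
C_numpy = np.cov(np.vstack(variables))
```

The diagonal of `C` holds the variances, and `C` equals its own transpose, matching the two properties listed above.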
PCA
Numerical Example
X     Y
2.5   2.4
0.5   0.7
2.2   2.9
1.9   2.2
3.1   3.0
2.3   2.7
2.0   1.6
1.0   1.1
1.5   1.6
1.1   0.9
• After adjustment (subtracting the means, 1.81 for X and 1.91 for Y)
X (adj)   Y (adj)
 0.69      0.49
-1.31     -1.21
 0.39      0.99
 0.09      0.29
 1.29      1.09
 0.49      0.79
 0.19     -0.31
-0.81     -0.81
-0.31     -0.31
-0.71     -1.01
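• Covariance matrix
$$\mathrm{cov} = \begin{pmatrix} 0.6165 & 0.6154 \\ 0.6154 & 0.7166 \end{pmatrix}$$
• Eigenvalues
$$\begin{vmatrix} 0.6165 - \lambda & 0.6154 \\ 0.6154 & 0.7166 - \lambda \end{vmatrix} = 0$$
λ = 1.2840, 0.0491
• Eigenvectors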
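The whole numerical example can be reproduced in a few lines. This is a sketch assuming NumPy; it takes the ten (X, Y) pairs from the example, subtracts the means, forms the covariance matrix, and extracts eigenvalues and eigenvectors. The final projection step is an addition for illustration.

```python
import numpy as np

# The ten (X, Y) observations from the numerical example.
data = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Step 1: subtract the mean of each column.
means = data.mean(axis=0)
adjusted = data - means

# Step 2: covariance matrix with the (n - 1) denominator.
cov = adjusted.T @ adjusted / (len(data) - 1)

# Step 3: eigen-decomposition; eigh suits symmetric matrices and
# returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: sort by decreasing eigenvalue and project the data.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
scores = adjusted @ eigenvectors
```

The variance of the projected data along each new axis equals the corresponding eigenvalue, so the first axis carries almost all of the spread.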
Example: Amino Acid (AA) - Basic
Clustering of AAs
• How many clusters?
• Use 4 AA groups
  • Good for acidic and basic
  • P in polar group
  • Nonpolar group is widespread
• Similarities of AAs determine the ease of substitutions
• Some alignment tools show similar AAs in colors
• Needs a more systematic approach
Physico-Chemical Properties
• Physico-chemical properties of AAs determine protein structures
(1) Size in volume
(2) Partial volume: measures the expanded volume in solution when dissolved
(3) Bulkiness: the ratio of side-chain volume to its length, i.e. the average cross-sectional area of the side chain
(4) pH of the isoelectric point of the AA (pI)
(5) Hydrophobicity
(6) Polarity index
(7) Surface area
(8) Fraction of area: the fraction of the accessible surface area that is buried in the interior in a set of known crystal structures
Name        Abbr.  Code   Vol.  Bulk   Pol.    pI  Hydro  Surf  Frac
Alanine     Ala    A        67  11.5    0.0   6.0    1.8   113  0.74
Arginine    Arg    R       148  14.3   52.0  10.8   -4.5   241  0.64
Asparagine  Asn    N        96  12.3    3.4   5.4   -3.5   158  0.63
Aspartic    Asp    D        91  11.7   49.7   2.8   -3.5   151  0.62
Cysteine    Cys    C        86  13.5    1.5   5.1    2.5   140  0.91
Glutamine   Gln    Q       114  14.5    3.5   5.7   -3.5   189  0.62
Glu. Acid   Glu    E       109  13.6   49.9   3.2   -3.5   183  0.62
Glycine     Gly    G        48   3.4    0.0   6.0   -0.4    85  0.72
Histidine   His    H       118  13.7   51.6   7.6   -3.2   194  0.78
Isoleucine  Ile    I       124  21.4    0.1   6.0    4.5   182  0.88
Leucine     Leu    L       124  21.4    0.1   6.0    3.8   180  0.85
Lysine      Lys    K       135  13.7   49.5   9.7   -3.9   211  0.52
Methionine  Met    M       124  16.3    1.4   5.7    1.9   204  0.85
Phenyl.     Phe    F       135  10.8    0.4   5.5    2.9   218  0.88
Proline     Pro    P        90  17.4    1.6   6.3   -1.6   143  0.64
Serine      Ser    S        73   9.5    1.7   5.7   -0.8   122  0.66
Threonine   Thr    T        93  15.8    1.7   5.7   -0.7   146  0.70
Tryptophan  Trp    W       163  21.7    2.1   5.9   -0.9   259  0.85
Tyrosine    Tyr    Y       141  18.0    1.6   5.7   -1.3   229  0.76
Valine      Val    V       105  21.6    0.1   6.0    4.2   160  0.86
Mean                       109  15.4   13.6   6.0   -0.5   175  0.74
Color key: red = acidic; orange = basic; green = polar (hydrophilic); yellow = non-polar (hydrophobic)
PCA of AAs
• How can different properties be incorporated in order to group similar AAs?
• Visual clustering with volume and pI
PCA
• Given an N×P matrix (e.g., 20×7), each row represents a P-dimensional data point
• Each data point is
  • scaled and shifted to the origin
  • rotated to spread out the points as much as possible
Scaling
• For property j, compute the average and the s.d.:
  $\mu_j = \sum_i x_{ij} / N$, $\sigma_j^2 = \sum_i (x_{ij} - \mu_j)^2 / N$
• Since each property has a different scale and mean, define normalized variables:
  $z_{ij} = (x_{ij} - \mu_j) / \sigma_j$
• $z_{ij}$ measures the deviation from the mean for each property, with a mean of 0 and an s.d. of 1
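The scaling step can be sketched with NumPy as follows. The input uses the volume and pI values of five amino acids from the property table (Ala, Arg, Asn, Asp, Cys); restricting to five rows and two columns is only to keep the example short.

```python
import numpy as np

# Volume and pI for Ala, Arg, Asn, Asp, Cys from the property table;
# a subset chosen only for illustration.
x = np.array([
    [67.0,  6.0],
    [148.0, 10.8],
    [96.0,  5.4],
    [91.0,  2.8],
    [86.0,  5.1],
])

N = x.shape[0]
mu = x.sum(axis=0) / N                            # mu_j
sigma = np.sqrt(((x - mu) ** 2).sum(axis=0) / N)  # sigma_j (divides by N)
z = (x - mu) / sigma                              # z_ij
```

After this step both columns have mean 0 and s.d. 1, so volume (tens to hundreds) no longer dominates pI (single digits).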
PCA
• New orthogonal coordinate system
• Find $v_j = (v_{j1}, v_{j2}, \ldots, v_{jP})$ such that
  $\sum_k v_{ik} v_{jk} = 0$ for $i \ne j$ (orthogonal) and $\sum_k v_{jk}^2 = 1$ (unit length)
• $v_j$ represents a new coordinate vector
• Data points in the z coordinates become
  $y_{ij} = \sum_k z_{ik} v_{jk}$
• The new y coordinate system is a rotation of the z coordinate system
• The $v_{jk}$ turn out to be related to the correlation coefficients
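One common way to obtain such a coordinate system, assumed here since the slides only hint at it, is to take the eigenvectors of the correlation matrix of the scaled data. The sketch below uses NumPy and randomly generated data (hypothetical); i indexes data points and j indexes the new axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 points with 3 correlated properties.
x = rng.normal(size=(20, 3)) @ np.array([[1.0, 0.8, 0.0],
                                         [0.0, 0.6, 0.5],
                                         [0.0, 0.0, 0.3]])

# Scale each property to mean 0 and s.d. 1.
z = (x - x.mean(axis=0)) / x.std(axis=0)

# Correlation matrix between properties (P x P).
corr = z.T @ z / len(z)

# Rows of v are the new coordinate vectors v_j, largest variance first.
eigenvalues, eigenvectors = np.linalg.eigh(corr)
order = np.argsort(eigenvalues)[::-1]
v = eigenvectors[:, order].T

# y_ij = sum_k z_ik * v_jk
y = z @ v.T
```

Because the rows of `v` are orthonormal, the step from z to y preserves the length of every data vector, and the new coordinates are mutually uncorrelated.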
PCA
• Correlation coefficient, $C_{ij}$:
  $C_{ij} = \sum_k (z_{ik} - m_i)(z_{jk} - m_j) / (P\, s_i s_j)$, where $m_i$ and $s_i$ are the mean and s.d. of the i-th row
• $-1 \le C_{ij} \le 1$
• Results in an N×N similarity matrix, $S_{ij}$
        Vol    Bulk   Polar  pI     Hyd    SA     FrA
Vol     1.00   0.73   0.24   0.37   -0.08  0.99   0.18
Bulk           1.00   -0.20  0.08   0.44   0.64   0.49
Polar                 1.00   0.27   -0.69  0.29   -0.53
pI                           1.00   -0.20  0.36   -0.18
Hyd                                 1.00   -0.18  0.84
SA                                         1.00   0.12
FrA                                               1.00
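A correlation coefficient can be computed directly from its definition and compared with `np.corrcoef`. The sketch below assumes NumPy and uses the volume and surface-area values of five amino acids from the property table (Ala, Arg, Asn, Asp, Cys), so its result is for that subset only and is not expected to match the table entry for all 20.

```python
import numpy as np

# Volume and surface area for Ala, Arg, Asn, Asp, Cys from the property table.
vol = np.array([67.0, 148.0, 96.0, 91.0, 86.0])
surf = np.array([113.0, 241.0, 158.0, 151.0, 140.0])

def correlation(a, b):
    """Pearson correlation: mean product of deviations over the two s.d.s."""
    return np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())

c_manual = correlation(vol, surf)

# np.corrcoef returns the full symmetric matrix with ones on the diagonal.
C = np.corrcoef(np.vstack([vol, surf]))
```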
Clustering
• A family of related sequences that evolved from a common ancestor is studied with phylogenetic trees showing the order of evolution
• Criteria needed
  • Closeness between sequences
  • The number of clusters
• Hierarchical clustering algorithm: connectivity-based
• k-means: centroid-based
Hierarchical Clustering
• Hierarchical clustering algorithm
  1. Initially, each point forms its own cluster
  2. Join the two clusters with the highest similarity to form a single larger cluster
  3. Recompute similarities between all clusters
  4. Repeat the two steps above until all points are connected into clusters
• Criteria of similarity?
  • Use scaled coordinates z
  • Vector $z_i$ from the origin to each data point i, with length $|z_i|^2 = \sum_k z_{ik}^2$
  • Use the cosine of the angle between two points for similarity:
    $\cos\theta_{ij} = \sum_k z_{ik} z_{jk} / (|z_i||z_j|)$
• N elements, N×N distance matrix d
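The cosine similarity between all pairs of points can be computed in one matrix product. A minimal sketch with NumPy on hypothetical scaled coordinates; the conversion `1 - cos` into a distance is one possible choice and is an assumption, not something the slides specify.

```python
import numpy as np

# Hypothetical scaled coordinates z: 4 data points, 3 properties.
z = np.array([
    [ 1.0,  0.5, -0.2],
    [ 2.0,  1.0, -0.4],   # same direction as the first point
    [-1.0, -0.5,  0.2],   # opposite direction
    [ 0.5, -1.0,  0.0],   # perpendicular to the first point
])

lengths = np.sqrt((z ** 2).sum(axis=1))          # |z_i|
cos_theta = (z @ z.T) / np.outer(lengths, lengths)

# One possible conversion (an assumption) from similarity to distance.
d = 1.0 - cos_theta
```

Points in the same direction get similarity 1 regardless of their lengths, which is why the cosine is applied to the scaled coordinates rather than to the raw values.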
Hierarchical_Clustering(d, n)
  Form n clusters, each with 1 element
  Construct a graph T by assigning an isolated vertex to each cluster
  while there is more than 1 cluster
    Find the two closest clusters C1 and C2
    Merge C1 and C2 into new cluster C with |C1| + |C2| elements
    Compute distance from C to all other clusters
    Add a new vertex C to T
    Remove rows and columns of d for C1 and C2, and add for C
  return T
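The pseudocode can be turned into a short runnable sketch. The version below assumes NumPy and the group-average rule for the distance between clusters; instead of editing rows and columns of d, it recomputes cluster distances from the original matrix, which is simpler but slower. The function name, the tuple-based tree, and the example distances are all illustrative choices.

```python
import numpy as np

def hierarchical_clustering(d):
    """Agglomerative clustering on a symmetric n x n distance matrix d.

    Returns the merge tree T as a list of (cluster_a, cluster_b, merged)
    tuples, where each cluster is a frozenset of element indices.
    """
    d = np.asarray(d, dtype=float)
    n = len(d)
    clusters = [frozenset([i]) for i in range(n)]
    tree = []

    def average_distance(c1, c2):
        # Group-average rule: mean distance over all cross-cluster pairs.
        return np.mean([d[i, j] for i in c1 for j in c2])

    while len(clusters) > 1:
        # Find the two closest clusters C1 and C2.
        pairs = [(average_distance(a, b), ia, ib)
                 for ia, a in enumerate(clusters)
                 for ib, b in enumerate(clusters) if ia < ib]
        _, ia, ib = min(pairs)
        c1, c2 = clusters[ia], clusters[ib]
        merged = c1 | c2
        tree.append((c1, c2, merged))
        # Replace C1 and C2 by the merged cluster.
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return tree

# Hypothetical distances: elements 0 and 1 are close, as are 2 and 3.
d = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
])
tree = hierarchical_clustering(d)
```

Each entry of `tree` corresponds to one internal vertex of T: the two clusters that were joined and the cluster they formed.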
k-means Clustering
• The number of clusters, k, is known ahead of time
• Minimize the squared error between the data points and k cluster centers
• No known polynomial-time algorithm
• Heuristic: Lloyd algorithm
  • Initially partition the n points arbitrarily among k centers, then move some points between clusters
  • Converges to a local minimum; may move many points in each iteration
k-means Clustering Problem
• Given n data points, find k center points minimizing the squared error distortion
  $d(V, X) = \sum_i d(v_i, X)^2 / n$
• Input: a set V of n data points and a parameter k
• Output: a set X consisting of k center points minimizing d(V, X) over all possible choices of X
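A minimal sketch of the Lloyd heuristic and the distortion it tries to reduce, assuming NumPy. As a simplification, the centers start at k randomly chosen data points rather than from an arbitrary partition; the function names and the data are hypothetical.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's heuristic: alternate assignment and center updates."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    # Start from k distinct data points chosen at random as centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def distortion(points, centers):
    """Squared error distortion d(V, X): mean squared distance to nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.mean(dists.min(axis=1) ** 2)

# Hypothetical data: two well-separated groups of three points.
V = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
X, labels = lloyd_kmeans(V, k=2)
```

On data this well separated the heuristic finds the obvious grouping; in general it only reaches a local minimum, so different seeds can give different answers.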
k-means Clustering
• Consider every possible partition of the n elements into k clusters
• Each partition P has a cost, Cost(P)
• Move one point in each iteration

Progressive_Greedy_k-means(n)
  Select an arbitrary partition P into k clusters
  while forever
    bestChange ← 0
    for every cluster C
      for every element i not in C
        if moving i to C reduces Cost(P)
          if Δ(i → C) > bestChange
            bestChange ← Δ(i → C)
            i* ← i
            C* ← C
    if bestChange > 0
      change partition P by moving i* to C*
    else
      return P
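The progressive greedy procedure can be sketched in Python as follows, assuming NumPy. Here Cost(P) is taken to be the mean squared distance of each point to its cluster mean, which is an assumption consistent with the squared error distortion; the function names, data, and starting partition are hypothetical.

```python
import numpy as np

def cost(points, labels, k):
    """Cost(P): mean squared distance of each point to its cluster mean."""
    total = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total / len(points)

def progressive_greedy_kmeans(points, labels, k):
    """Move one point per iteration, always the move that lowers the cost most."""
    points = np.asarray(points, dtype=float)
    labels = np.array(labels)
    while True:
        current = cost(points, labels, k)
        best_change, best_i, best_c = 0.0, None, None
        for c in range(k):
            for i in range(len(points)):
                if labels[i] == c:
                    continue
                trial = labels.copy()
                trial[i] = c
                change = current - cost(points, trial, k)   # Δ(i → C)
                if change > best_change:
                    best_change, best_i, best_c = change, i, c
        if best_change > 0:
            labels[best_i] = best_c
        else:
            return labels

# Hypothetical points and a deliberately poor starting partition.
V = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
start = [0, 0, 0, 0, 0, 1]
final = progressive_greedy_kmeans(V, start, k=2)
```

Every accepted move strictly lowers the cost, so the loop must stop; it stops at a partition that no single move can improve, which is a local minimum rather than a guaranteed global one.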
Dynamic Modeling in Chameleon
• Similarity between clusters is determined by
  • Relative interconnectivity (RI)
  • Relative closeness (RC)
• Select pairs with high RI and RC to merge
Hierarchical Clustering - Cluto
• Generates a set of clusters within clusters
• The result can be arranged as a tree
• Each node marks where two smaller clusters join
• CLUTO package with cosine and group-average rules
• Red/green indicates values significantly higher/lower than the average
• Dark colors are close to the average
1. red on both pI and polarity scale
2. green on hydrophobicity and pI (can be separated into two smaller clusters)
3. green on volume and surface area
4. C is unusual in protein structure due to its potential to form disulfide bonds between pairs of cysteine residues (thus, difficult to interchange for other residues)
5. Hydrophobic
6. Two largest AAs
Clustering of properties: properties can be ordered illustrating groups of properties that are correlated
6 clusters
Cluster  Property         AA
1        Basic            K, R, H
2        Acid and amide   E, D, Q, N
3        Small            P, T, S, G, A
4        Cysteine         C
5        Hydrophobic      V, L, I, M, F
6        Large, aromatic  W, Y
Dayhoff Clustering - 1978
• The PAM matrix considers the probabilities of pairs of amino acids appearing together
• Pairs of amino acids that tend to appear together are grouped into a cluster
• Six clusters: (KRH) (EDQN) (PTSGA) (C) (VLIM) (FWY)
• Contrast with the clusters from hierarchical clustering: (KRH) (EDQN) (PTSGA) (C) (VLIMF) (WY)
Murphy, Wallqvist, Levy, 2000
• To study protein folding
• Used the BLOSUM50 similarity matrix
• Determine correlation coefficients between similarity matrix elements for all pairs of AAs
  e.g., $C_{AV} = \left(\sum_i M_{A,i} M_{V,i}\right) / \sqrt{\left(\sum_i M_{A,i} M_{A,i}\right)\left(\sum_i M_{V,i} M_{V,i}\right)}$, where the summation over i is taken over the 20 AAs
• Group the two AAs with the highest CCs, then either add the next AA with the highest CC to an existing group or start a new group