Protein Folding - School of Computing and Information...
10/7/2003 CAP/CGS 5991: Lecture 7 1
Protein Folding
Unfolded → (Rapid, < 1 s) → Molten Globule State → (Slow, 1 – 1000 s) → Folded Native State
• How to find minimum energy configuration?
Modular Nature of Protein Structures
Example: Diphtheria Toxin
Structural Classification of Proteins
• SCOP (Structural Classification of Proteins)
– Based on structural & evolutionary relationships.
– Contains ~ 40,000 domains
– Classes (groups of folds), Folds (proteins sharing folds), Families (proteins related by function/evolution), Superfamilies (distantly related proteins)
SCOP Family View
CATH: Protein Structure Classification
• Semi-automatic classification; ~36K domains
• 4 levels of classification:
– Class (C), depends on secondary structure content
– Architecture (A), orientation of secondary structures
– Topology (T), topological connections &
– Homologous Superfamily (H), similar structures and functions.
DALI Domain Dictionary
• Completely automated; 3724 domains
• Criteria of compactness & recurrence
• Each domain is assigned a Domain Classification number DC_l_m_n_p representing fold space attractor region (l), globular folding topology (m), functional family (n) and sequence family (p).
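As a small illustration of the DC_l_m_n_p scheme, the sketch below splits such a label into its four named parts. It assumes the four fields are integers separated by underscores; the `DomainClass` type, the `parse_dc` helper, and the sample label `DC_3_12_5_1` are hypothetical and not taken from the DALI database itself.

```python
from typing import NamedTuple


class DomainClass(NamedTuple):
    """Hypothetical container for the four parts of a DC_l_m_n_p label."""
    attractor: int          # l: fold space attractor region
    topology: int           # m: globular folding topology
    functional_family: int  # n: functional family
    sequence_family: int    # p: sequence family


def parse_dc(label: str) -> DomainClass:
    """Split a label such as 'DC_3_12_5_1' into its four integer fields."""
    prefix, *fields = label.split("_")
    if prefix != "DC" or len(fields) != 4:
        raise ValueError(f"not a DC_l_m_n_p label: {label!r}")
    return DomainClass(*(int(field) for field in fields))


# Made-up label, used only to show the field order.
print(parse_dc("DC_3_12_5_1"))
```

Two domains with the same `attractor` and `topology` values but different `functional_family` values would, under this reading, share a fold while belonging to different functional families.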
5 Fold Space classes
Attractor 1 can be characterized as alpha/beta, attractor 2 as all-beta, attractor 3 as all-alpha, attractor 5 as alpha-beta meander (1mli), and attractor 4 contains antiparallel beta-barrels e.g. OB-fold (1prtF).
Fold Types & Neighbors
Structural neighbours of 1urnA (top left). 1mli (bottom right) has the same topology even though there are shifts in the relative orientation of secondary structure elements.
Sequence Alignment of Fold Neighbors
Frequent Fold Types
Gene Expression
• The process of transcription and/or translation of a gene is called gene expression.
• Every cell of an organism has the same genetic material, but different genes are expressed at different times.
• The pattern of gene expression in a cell is indicative of its state.
Hybridization
• If two complementary strands of DNA or mRNA are brought together, under appropriate experimental conditions they will hybridize.
• A hybridizes to B ⇒
– A is reverse complementary to B, or
– A is reverse complementary to a subsequence of B.
• It is possible to experimentally verify whether A hybridizes to B, by labeling A or B with a radioactive or fluorescent tag, followed by excitation by laser.
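The reverse-complement condition on this slide can be checked computationally. The sketch below is a simplified model that assumes DNA-only sequences over A, C, G, T and perfect base pairing; real hybridization also depends on experimental conditions and tolerates some mismatches. The helper names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(a: str, b: str) -> bool:
    """True if a is reverse complementary to b or to a subsequence of b.

    Simplifying assumption: only perfect complementarity counts.
    """
    return reverse_complement(a) in b.upper()


print(reverse_complement("ATGC"))           # GCAT
print(can_hybridize("GCAT", "TTATGCTT"))    # True: GCAT pairs with ATGC inside B
print(can_hybridize("AAAA", "CCCC"))        # False
```

The same check is what makes a probe specific to its gene in the microarray setting: a probe is chosen so that it pairs with one target sequence and, ideally, no other.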
Measuring gene expression
• Gene expression for a single gene can be measured by extracting mRNA from the cell and doing a simple hybridization experiment.
• Given a sample of cells, gene expression for every gene can be measured using a single microarray experiment.
Microarray/DNA chip technology
• High-throughput method to study gene expression of thousands of genes simultaneously.
• Many applications:
– Genetic disorders & Mutation/polymorphism detection
– Study of disease subtypes
– Drug discovery & toxicology studies
– Pathogen analysis
– Differing expressions over time, between tissues, between drugs, across disease states
Microarray Data
[Table: columns "Gene" and "Expression Level"; rows Gene1, Gene2, Gene3, …]
Microarray/DNA chips (Simplified)
• Construct probes corresponding to reverse complements of genes of interest.
• Microscopic quantities of probes placed on solid surfaces at defined spots on the chip.
• Extract mRNA from sample cells and label them.
• Apply labeled sample (mRNA extracted from cells) to every spot, and allow hybridization.
• Wash off unhybridized material.
• Use optical detector to measure amount of fluorescence from each spot.
Affymetrix DNA chip schematic
www.affymetrix.com
DNA Chips & Images
Microarrays: competing technologies
• Affymetrix & Synteni/Stanford
• Differ in:
– Method to place DNA: Spotting vs. photolithography
– Length of probe
– Complete sequence vs. series of fragments
How to compare 2 cell samples?
• mRNA from sample 1 is extracted and labeled with a red fluorescent dye.
• mRNA from sample 2 is extracted and labeled with a green fluorescent dye.
• Mix the samples and apply the mixture to every spot on the microarray. Hybridize sample mixture to probes.
• Use optical detector to measure the amount of green and red fluorescence at each spot.
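One common way to summarize the red and green readings at a spot is the base-2 logarithm of their ratio, so that equal expression gives 0 and a two-fold difference gives +1 or -1. The sketch below assumes that convention; the intensity values and the pseudocount (added to avoid dividing by zero on empty spots) are made up for illustration.

```python
import math


def log2_ratio(red: float, green: float, pseudocount: float = 1.0) -> float:
    """log2 of red/green intensity; positive means higher in sample 1 (red)."""
    return math.log2((red + pseudocount) / (green + pseudocount))


# Hypothetical (red, green) fluorescence readings per spot.
spots = {
    "Gene1": (1599.0, 399.0),   # stronger in sample 1
    "Gene2": (499.0, 499.0),    # equal in both samples
    "Gene3": (99.0, 799.0),     # stronger in sample 2
}

for gene, (red, green) in spots.items():
    print(f"{gene}: log2(red/green) = {log2_ratio(red, green):+.2f}")
```

Working on the log scale treats over- and under-expression symmetrically, which a raw ratio does not.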
Studying effect of a treatment over time
Sample
Treated Sample(t1) → Expt 1
Treated Sample(t2) → Expt 2
Treated Sample(t3) → Expt 3
…
Treated Sample(tn) → Expt n
Sources of Variations & Errors
• Variations in cells/individuals.
• Variations in mRNA extraction, isolation, introduction of dye, variation in dye incorporation, dye interference.
• Variations in probe concentration, probe amounts, substrate surface characteristics
• Variations in hybridization conditions and kinetics
• Variations in optical measurements, spot misalignments, discretization effects, noise due to scanner lens and laser irregularities
• Cross-hybridization of sequences with high sequence identity.
• Limit of factor 2 in precision of results.
Need to Normalize data
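The slide does not say which normalization to use. One simple option, shown here purely as an assumption, is median centering of log2 ratios: subtract the array-wide median so that a typical gene sits at 0, which compensates for a global dye or scanner bias if most genes are unchanged. The second helper applies the factor-of-2 precision limit by flagging only changes larger than two-fold. Function names and data are illustrative.

```python
import statistics


def median_center(log_ratios: list[float]) -> list[float]:
    """Shift log2 ratios so that their median becomes 0."""
    median = statistics.median(log_ratios)
    return [value - median for value in log_ratios]


def exceeds_twofold(log_ratios: list[float]) -> list[bool]:
    """Flag values beyond a two-fold change, i.e. |log2 ratio| > 1."""
    return [abs(value) > 1.0 for value in log_ratios]


# Hypothetical log2 ratios from one array with a global bias of about +0.5.
raw = [0.5, 0.4, 0.6, 2.5, -1.0, 0.5]
centered = median_center(raw)
print(centered)
print(exceeds_twofold(centered))
```

More elaborate schemes exist (for example intensity-dependent normalization), but the same idea applies: remove systematic effects before comparing genes.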
Clustering
• Clustering is a general method to study patterns in gene expressions.
• Several known methods:
– Hierarchical Clustering (Bottom-Up Approach)
– K-means Clustering (Top-Down Approach)
– Self-Organizing Maps (SOM)
Hierarchical Clustering: Example
[Figure: three scatter-plot panels; both axes run from 0 to 10]
A Dendrogram
Hierarchical Clustering [Johnson, SC, 1967]
• Given n points in R^d, compute the distance between every pair of points
• While (not done)
– Pick closest pair of points s_i and s_j and make them part of the same cluster.
– Replace the pair by an average of the two, s_ij
Try the applet at: http://www.cs.mcgill.ca/~papou/#applet
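The loop on this slide can be sketched in a few lines. This version assumes Euclidean distance, takes the plain average of the two representatives being merged (not weighted by cluster size), and runs until a single cluster remains; the function name and the sample points are illustrative. It recomputes all pairwise distances on every pass, which is simple but slow for large n.

```python
import math
from itertools import combinations


def hierarchical_cluster(points):
    """Repeatedly merge the closest pair and replace it by its average.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram is drawn from.
    """
    # Each active cluster: (representative point, indices of its members).
    clusters = [(tuple(p), (i,)) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of representatives.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (p_i, m_i), (p_j, m_j) = clusters[i], clusters[j]
        merges.append((m_i, m_j, math.dist(p_i, p_j)))
        # Replace the pair by the average of the two representatives.
        average = tuple((a + b) / 2 for a, b in zip(p_i, p_j))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((average, m_i + m_j))
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6.5), (10, 0)]
for a, b, dist in hierarchical_cluster(points):
    print(f"merge {a} + {b} at distance {dist:.2f}")
```

Cutting the merge history at a chosen distance yields a flat clustering, which is why the number of clusters need not be fixed in advance.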
Distance Metrics
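• For clustering, define a distance function:
– Euclidean distance metrics
$$D_k(X, Y) = \left[ \sum_{i=1}^{d} (X_i - Y_i)^k \right]^{1/k}$$
k=2: Euclidean Distance
– Pearson correlation coefficient
$$\rho_{xy} = \frac{1}{d} \sum_{i=1}^{d} \left( \frac{X_i - \bar{X}}{\sigma_x} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_y} \right)$$
$$-1 \le \rho_{xy} \le 1$$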
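The two measures on this slide can be written out directly. The sketch below uses the absolute difference inside the power (an assumption, so that odd k also behaves as a distance) and the population standard deviation (dividing by d), matching the 1/d factor in the correlation formula. Function names are illustrative.

```python
import math


def minkowski(x, y, k: int = 2) -> float:
    """D_k(X, Y); k=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)


def pearson(x, y) -> float:
    """Pearson correlation coefficient, always between -1 and 1."""
    d = len(x)
    mean_x, mean_y = sum(x) / d, sum(y) / d
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / d)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / d)
    return sum(
        ((a - mean_x) / sigma_x) * ((b - mean_y) / sigma_y)
        for a, b in zip(x, y)
    ) / d


print(minkowski((0, 0), (3, 4)))          # 5.0
print(pearson([1, 2, 3], [2, 4, 6]))      # close to 1: same shape, different scale
print(pearson([1, 2, 3], [3, 2, 1]))      # close to -1: opposite trend
```

Because correlation is a similarity rather than a distance, a common convention is to cluster on 1 - ρ, so that genes with the same expression shape are close even when their magnitudes differ.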
Clustering of gene expressions
• Represent each gene as a vector or a point in d-space where d is the number of arrays or experiments being analyzed.
From Eisen MB, et al, PNAS 1998 95(25):14863-8
Clustering Random vs. Biological Data
K-Means Clustering: Example
Example from Andrew Moore’s tutorial on Clustering.
Start
Start
End
K-Means Clustering [MacQueen ’67]
Repeat
– Start with randomly chosen cluster centers
– Assign points to give greatest increase in score
– Recompute cluster centers
– Reassign points
until (no changes)
Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html
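A minimal sketch of this loop follows. It reads "greatest increase in score" as assigning each point to its nearest center under Euclidean distance, which is an interpretation rather than something the slide states, and it picks the initial centers once from the data with a fixed seed so that runs are repeatable. The function name and sample points are illustrative.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Alternate between assigning points and recomputing centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # randomly chosen initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:   # no changes: stop
            break
        assignment = new_assignment
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                    # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(points, k=2)
print(centers)
print(assignment)
```

The result can depend on the initial centers, so in practice the procedure is often restarted several times and the best clustering kept.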
Comparisons
• Hierarchical clustering
– Number of clusters not preset.
– Complete hierarchy of clusters
– Not very robust, not very efficient.
• K-Means
– Need definition of a mean. Categorical data?
– More efficient and often finds optimum clustering.
Functionally related genes behave similarly across experiments
Self-Organizing Maps [Kohonen]
• Kind of neural network.
• Clusters data and finds complex relationships between clusters.
• Helps reduce the dimensionality of the data.
• Map of 1 or 2 dimensions produced.
• Unsupervised Clustering
• Like K-Means, except for visualization
SOM Architectures
• 2-D Grid
• 3-D Grid
• Hexagonal Grid
SOM Algorithm
• Select SOM architecture, and initialize weight vectors and other parameters.
• While (stopping condition not satisfied) do
for each input point x
– winning node q has weight vector closest to x.
– Update weight vector of q and its neighbors.
– Reduce neighborhood size and learning rate.
SOM Algorithm Details
• Distance between x and weight vector: $\lVert x - w_i \rVert$
• Winning node: $q(x) = \arg\min_i \lVert x - w_i \rVert$
• Weight update function (for neighbors): $w_i(k+1) = w_i(k) + \mu(k, x, i)\,[x(k) - w_i(k)]$
• Learning rate: $\mu(k, x, i) = \eta_0(k) \exp\left( -\dfrac{\lVert r_i - r_{q(x)} \rVert^2}{\sigma^2} \right)$
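The update rule can be sketched on a small rectangular grid. The linear decay of the learning rate η0 and neighborhood width σ, the fixed number of epochs as the stopping condition, and the random initial weights in [0, 1] are all assumptions made for this example; the slides leave those choices open. Function names are illustrative.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, eta0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns grid positions r_i and weights w_i."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]      # positions r_i
    weights = [[rng.random() for _ in range(dim)] for _ in grid]   # vectors w_i
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        # Reduce learning rate and neighborhood size (assumed linear decay).
        decay = 1 - epoch / epochs
        eta, sigma = eta0 * decay, sigma0 * decay
        for x in data:
            # Winning node q: weight vector closest to x.
            q = min(range(len(grid)), key=lambda i: math.dist(x, weights[i]))
            for i, r_i in enumerate(grid):
                # mu shrinks with grid distance between node i and the winner.
                mu = eta * math.exp(-math.dist(r_i, grid[q]) ** 2 / sigma ** 2)
                weights[i] = [w + mu * (xi - w) for w, xi in zip(weights[i], x)]
    return grid, weights


def winner(x, weights):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


data = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
grid, weights = train_som(data, rows=2, cols=2)
for x in data:
    print(x, "->", grid[winner(x, weights)])
```

Because neighbors of the winner are pulled along with it, nearby grid nodes end up with similar weight vectors, which is what makes the trained map usable for visualization.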
World Bank Statistics
• Data: World Bank statistics of countries in 1992.
• 39 indicators considered e.g., health, nutrition, educational services, etc.
• The complex joint effect of these factors can be visualized by organizing the countries using the self-organizing map.
World Poverty PCA
World Poverty SOM
World Poverty Map
Viewing SOM Clusters on PCA axes
SOM Example [Xiao-rui He]