Introduction to Bioinformatics Introduction to Bioinformatics -
CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II)...
-
date post
15-Jan-2016 -
Category
Documents
-
view
224 -
download
0
Transcript of CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II)...
![Page 1: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/1.jpg)
CISC667, F05, Lec23, Liao 1
CISC 667 Intro to Bioinformatics(Fall 2005)
Support Vector Machines (II)
Bioinformatics Applications
![Page 2: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/2.jpg)
CISC667, F05, Lec23, Liao 2
![Page 3: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/3.jpg)
CISC667, F05, Lec23, Liao 3
![Page 4: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/4.jpg)
CISC667, F05, Lec23, Liao 4
![Page 5: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/5.jpg)
CISC667, F05, Lec23, Liao 5
Combining pairwise similarity with SVMs for protein homology detection
Protein homologs
Protein non-homologs
Positivepairwise score
vectors
Negativepairwise score
vectors
Support vector machine
Binary classification
Target protein of unknown function
1
23
Positive train Negative train
Testing data
![Page 6: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/6.jpg)
CISC667, F05, Lec23, Liao 6
Experiment: known protein families
Jaakkola, Diekhans and Haussler 1999
![Page 7: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/7.jpg)
CISC667, F05, Lec23, Liao 7
Vectorization
![Page 8: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/8.jpg)
CISC667, F05, Lec23, Liao 8
![Page 9: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/9.jpg)
CISC667, F05, Lec23, Liao 9
A measure of sensitivity and specificity
ROC = 1
ROC = 0
ROC = 0.67
6
5
ROC: receiver operating characteristic score is the normalized area
under a curve the plots true positives as a function of false positives
![Page 10: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/10.jpg)
CISC667, F05, Lec23, Liao 10
Performance Comparison (1)
![Page 11: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/11.jpg)
CISC667, F05, Lec23, Liao 11
![Page 12: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/12.jpg)
CISC667, F05, Lec23, Liao 12
Using Phylogenetic Profiles & SVMs YAL001C
E-value Phylogenetic profile
0.122 1
1.064 0
3.589 0
0.008 1
0.692 1
8.49 0
14.79 0
0.584 1
1.567 0
0.324 1
0.002 1
3.456 0
2.135 0
0.142 1
0.001 1
0.112 1
1.274 0
0.234 1
4.562 0
3.934 0
0.489 1
0.002 1
2.421 0
0.112 1
![Page 13: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/13.jpg)
CISC667, F05, Lec23, Liao 13
phylogenetic profiles and Evolution Patterns1
1
1 1
10
0
1 1 0 1 0 0 0 1 1 0x
Impossible to know for sure if the gene followed exactly this
evolution pattern
![Page 14: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/14.jpg)
CISC667, F05, Lec23, Liao 14
Tree Kernel (Vert, 2002) For a phylogenetic profile x and an evolution pattern e:• P(e) quantifies how “natural” the pattern is
• P(x|e) quantifies how likely the pattern e is the “true history” of the profile x
Tree Kernel :
K tree(x,y) = Σe p(e)p(x|e)p(y|e) Can be proved to be a kernel Intuition: two profiles get closer in the feature space when
they have shared common evolution patterns with high probability.
![Page 15: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/15.jpg)
CISC667, F05, Lec23, Liao 15
1 1 0 1 0 0 0 1 1
10.33
0.67
0.34
0.5
0.75
0.55
1 0.33 0.67 0.34 0.5 0.75 0.55
Post-order traversal
Tree-Encoded Profile (Narra & Liao, 2004)
![Page 16: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/16.jpg)
CISC667, F05, Lec23, Liao 16
![Page 17: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/17.jpg)
CISC667, F05, Lec23, Liao 17
Using Support Vector Machines
![Page 18: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/18.jpg)
CISC667, F05, Lec23, Liao 18
Kernel function:
where r = 0.10
Soft margin regularization C = 1.50
Coding scheme: BIN21
L() = i ½ i j yi yj (K(xi · xj) + ij /C)
Evaluation:
Q3 = (P1+P2+P3)/N
C = (TPTN - FP FN) / ( PP PN AP AN)
SOV: segment overlap accuracy
![Page 19: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/19.jpg)
CISC667, F05, Lec23, Liao 19
Design tertiary classifiers
![Page 20: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/20.jpg)
CISC667, F05, Lec23, Liao 20
![Page 21: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/21.jpg)
CISC667, F05, Lec23, Liao 21
Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)
![Page 22: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/22.jpg)
CISC667, F05, Lec23, Liao 22
A two-stage SVM
![Page 23: CISC667, F05, Lec23, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications.](https://reader036.fdocuments.net/reader036/viewer/2022062322/56649d585503460f94a37f1c/html5/thumbnails/23.jpg)
CISC667, F05, Lec23, Liao 23