Study of Protein Prediction Related Problems Ph.D. candidate 2013.10.16 Le-Yi WEI 1.
Study of Protein Prediction Related Problems
Ph.D. candidate
2013.10.16
Le-Yi WEI
Contents
1 Background
2 Methods
3 Experiments
Background
Definition of protein
>Example
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTTADELKKSADVRWHAERIINAVDDAVASMDDTEKMSMKLRNLSGKHAKSFQVDPEYFKVLAAVIADTVAAGDAGFEKLMSMI
20 different amino acids: A C D … … V W Y
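A minimal sketch of reading a FASTA-style record like the example above and checking it against the 20-letter amino acid alphabet. The function names `read_fasta` and `nonstandard_residues` are illustrative and do not come from the slides.

```python
# The 20 standard amino acids, in alphabetical one-letter order.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; everything after '>' is the header.
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so they are concatenated.
            records[header] += line.upper()
    return records


def nonstandard_residues(sequence):
    """Return the sorted letters in `sequence` that are not standard amino acids."""
    return sorted(set(sequence) - set(STANDARD_AMINO_ACIDS))


if __name__ == "__main__":
    records = read_fasta(">Example\nPIVDTGSVAPLS\nAAEKTKIRSAWAPV\n")
    for name, seq in records.items():
        print(name, len(seq), nonstandard_residues(seq))
```

Checking for non-standard letters (for example X or B) before feature extraction avoids silently miscounting residues later.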
Protein prediction related problems
• Protein structural class prediction
• Protein fold prediction
• Multi-functional enzyme prediction
• Protein remote homology detection
• Protein subcellular localization prediction
• Other protein-related problems, etc.
Common points
Treat the protein-related problems as classification tasks
The framework of a classification task:
Query protein sequence → Data presentation → Classification algorithms → Predicted results
Two major components: data presentation and classification algorithms
Methods
Feature extraction methods
• Primary sequence based: e.g. physicochemical features, n-gram, functional domain, PSSM profile (auto-covariance), etc.
• Secondary structure based: e.g. secondary structure sequence based and probability matrix based
• Sequence-structure based: e.g. triple sequence-structure features
Primary-sequence based
• n-gram model
Given a query protein sequence:
Compute
Obtain
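The slide's formulas did not survive in the transcript, so the following is only a sketch of one common form of the n-gram model: the relative frequency of every possible n-gram over the 20-letter alphabet. The function name `ngram_features` and the normalization by the number of windows are assumptions.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(sequence, n=2):
    """Return relative n-gram frequencies in a fixed order (20**n values).

    Assumption: frequencies are normalized by the number of sliding
    windows, len(sequence) - n + 1.
    """
    total = len(sequence) - n + 1
    if total <= 0:
        raise ValueError("sequence is shorter than n")
    # Count every overlapping window of length n.
    counts = Counter(sequence[i:i + n] for i in range(total))
    # Emit one value per possible n-gram so every protein gets the same layout.
    return [counts["".join(gram)] / total
            for gram in product(AMINO_ACIDS, repeat=n)]


if __name__ == "__main__":
    features = ngram_features("ACAC", n=2)
    print(len(features))
```

With n = 1 this gives a 20-D vector and with n = 2 a 400-D vector, so the dimension grows quickly with n.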
Primary-sequence based
• Functional domain
[Figure: a query protein sequence is searched with PSI-BLAST against a functional protein database (database sequences 1, 2, 3, …, n-2, n-1, n), giving a binary feature vector (0, 1, 0, …, 1, 0, 0)]
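A sketch of how such a binary functional-domain vector could be assembled once the PSI-BLAST hits are known. Running PSI-BLAST itself is not shown; the identifiers, the `hit_evalues` dictionary, and the E-value cutoff `max_evalue` are hypothetical and not taken from the slides.

```python
def functional_domain_vector(database_ids, hit_evalues, max_evalue=1e-3):
    """Return a 0/1 vector with one entry per database sequence.

    database_ids: ordered identifiers of the functional database entries.
    hit_evalues:  dict mapping an identifier to the E-value of its hit
                  (entries without a hit are simply absent).
    max_evalue:   hypothetical significance cutoff (an assumption).
    """
    return [
        1 if hit_evalues.get(entry, float("inf")) <= max_evalue else 0
        for entry in database_ids
    ]


if __name__ == "__main__":
    ids = ["d1", "d2", "d3", "d4"]       # hypothetical database entries
    hits = {"d2": 1e-10, "d4": 0.5}      # hypothetical search results
    print(functional_domain_vector(ids, hits))
```

The vector length equals the number of database entries, so every query protein is mapped to the same feature space regardless of its own length.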
Primary-sequence based
• Evolution information
[Figure: a PSI-BLAST search against a protein database yields a Position-Specific Score Matrix (PSSM)]
Primary-sequence based
• AAC features
Compute
Obtain
20-D features
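The slide's own formula is missing from the transcript. As a hedged illustration, the sketch below computes the conventional amino acid composition (AAC): the fraction of each of the 20 amino acids in the sequence, which gives a 20-D vector. Whether the slide computes AAC from the raw sequence or from a PSSM is not recoverable here, so the sequence-based form is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-D vector of amino acid fractions for `sequence`.

    Assumption: the composition is taken over the raw sequence and
    normalized by the sequence length.
    """
    if not sequence:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


if __name__ == "__main__":
    print(amino_acid_composition("AACD")[:3])
```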
Primary-sequence based
• Auto-covariance (AC) transformation
Compute
Obtain
20*g-D features
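The formula on this slide is not preserved, so the sketch below uses the usual definition of the auto-covariance transformation on a PSSM: for each column j and each lag from 1 to g, the average product of mean-centered scores at positions i and i + lag. With 20 columns this yields 20*g values. The function name `auto_covariance` and the normalization by (L - lag) are assumptions.

```python
def auto_covariance(pssm, max_lag):
    """Auto-covariance features of a PSSM given as a list of rows.

    pssm:    L rows (one per residue), each with the same number of columns.
    max_lag: the largest lag g; must be smaller than L.
    Returns n_columns * max_lag values, grouped by column, then by lag.
    """
    length = len(pssm)
    if length == 0 or max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    n_cols = len(pssm[0])
    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


if __name__ == "__main__":
    toy = [[1.0, 0.0], [3.0, 0.0], [1.0, 0.0], [3.0, 0.0]]  # toy 4x2 matrix
    print(auto_covariance(toy, max_lag=2))
```

Because the output length depends only on the number of columns and on g, proteins of different lengths map to vectors of the same size.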
Primary-sequence based
PSSM profile Frequency profile
• Consensus sequence
Consensus sequence:
A query sequence:
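The example sequences on this slide are not preserved. As a rough sketch, a consensus sequence can be read off a frequency profile by taking, at each position, the amino acid with the highest value. This plain arg-max rule is an assumption; the slide's method may weight frequencies differently. The column order below is the one used in PSI-BLAST ASCII PSSM files.

```python
# Column order of amino acids in PSI-BLAST ASCII PSSM output.
PSSM_COLUMN_ORDER = "ARNDCQEGHILKMFPSTWYV"


def consensus_sequence(frequency_profile, alphabet=PSSM_COLUMN_ORDER):
    """Pick the highest-frequency letter at each position.

    frequency_profile: one row per residue, one value per letter of `alphabet`.
    Ties are resolved in favor of the first letter in `alphabet`.
    """
    consensus = []
    for row in frequency_profile:
        if len(row) != len(alphabet):
            raise ValueError("row width does not match the alphabet")
        best = max(range(len(row)), key=row.__getitem__)
        consensus.append(alphabet[best])
    return "".join(consensus)


if __name__ == "__main__":
    # Toy profile over a hypothetical 3-letter alphabet for readability.
    profile = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
    print(consensus_sequence(profile, alphabet="ACD"))
```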
Secondary structure based
• Secondary structure sequence
An example of a query protein sequence:
SLFEQLGGQAAVQAVTAQFYANIQADA
Secondary structure sequence predicted by PSI-PRED, which has three states, C (coil), H (helix), E (strand):
CCHEHEEEEECCCCHHHHHHEEEEECC
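A small sketch of two summaries that can be read off a three-state secondary structure string: the fraction of each state and the list of consecutive segments. These statistics are illustrative only; the slides do not say which ones are used as features, and the function names are hypothetical.

```python
from itertools import groupby

STATES = "CHE"  # coil, helix, strand


def state_content(structure):
    """Return the fraction of residues in each of the three states."""
    if not structure:
        raise ValueError("empty structure sequence")
    return {s: structure.count(s) / len(structure) for s in STATES}


def segments(structure):
    """Return (state, length) pairs for each run of identical states."""
    return [(state, len(list(run))) for state, run in groupby(structure)]


if __name__ == "__main__":
    predicted = "CCHEHEEEEECCCCHHHHHHEEEEECC"  # structure string from the slide
    print(state_content(predicted))
    print(segments(predicted))
```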
Secondary structure based
• Structure state confidence matrix
An example of a structure state confidence matrix:
[Figure: rows for the query protein sequence, the predicted structure sequence, and the predicted confidence]
Secondary structure based
• Global structural features
Compute Obtain
Structure state confidence matrix:
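The slide's formulas are not preserved. One plausible global feature, shown here purely as an assumption, is the mean confidence of each structural state over the whole sequence, which turns an L x 3 confidence matrix into a 3-D vector. The column order (coil, helix, strand) and the function name are also assumptions.

```python
def mean_state_confidence(confidence_matrix):
    """Average each state's confidence over all residues.

    confidence_matrix: one row per residue; each row is assumed to hold
    (P_coil, P_helix, P_strand).
    Returns a list of three means in the same order.
    """
    length = len(confidence_matrix)
    if length == 0:
        raise ValueError("empty confidence matrix")
    return [sum(row[k] for row in confidence_matrix) / length for k in range(3)]


if __name__ == "__main__":
    toy = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]  # toy values, not real predictions
    print(mean_state_confidence(toy))
```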
Secondary structure based
• Local structural features
Compute Obtain
Structure state confidence matrix:
Sequence-structure based
The framework of triple sequence-structure feature extraction method
Classification algorithms
Commonly used classification algorithms
e.g. Support Vector Machine (SVM), Random Forest (RF), SMO, Naive Bayes, etc.
Ensemble classification algorithms
e.g. Majority Vote, Average Probability, Selective Ensemble, etc.
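A minimal sketch of the two simplest ensemble rules named above, majority vote and average probability, applied to the outputs of already-trained base classifiers. The base classifiers are not shown; the class labels, the tie-breaking rule, and the function names are illustrative assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    predictions: one class label per base classifier.
    Ties go to the tied label that appears first in `predictions`.
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def average_probability(probabilities):
    """Average class probabilities across classifiers and pick the best label.

    probabilities: one dict (label -> probability) per base classifier.
    Returns (best_label, mean_probabilities).
    """
    labels = sorted({label for dist in probabilities for label in dist})
    mean = {
        label: sum(dist.get(label, 0.0) for dist in probabilities) / len(probabilities)
        for label in labels
    }
    return max(labels, key=lambda label: mean[label]), mean


if __name__ == "__main__":
    print(majority_vote(["all-alpha", "all-beta", "all-alpha"]))
    print(average_probability([
        {"all-alpha": 0.6, "all-beta": 0.4},
        {"all-alpha": 0.2, "all-beta": 0.8},
    ]))
```

Majority vote needs only hard labels, while average probability uses each classifier's confidence, so a single very confident classifier can outweigh two uncertain ones.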
Experiments
The framework of RF_PSCP
Webserver site : http://59.77.16.70:8080/RF_PSCP/Index.html
Datasets
Three benchmark datasets
Three updated large-scale datasets
Sequence similarity
• Protein structural class prediction
Results
Comparison with existing methods on three benchmark datasets
Results
Tests of the proposed method on three updated large-scale datasets
Results
Comparison with different combinations of feature subsets on three benchmark datasets
Results
Optimization of Random forest classifier
Q&A!