Protein Prediction II Exercise - ROSTLAB.ORG
L. Richter
Task Schedule

17.10, 24.10, 31.10, 7.11
• Orga/Group formation, accounts for i12r-biolab-machines
• Get familiar with HPO database and test set sequences, data preparation
• Find most similar sequences (with n different methods: Blast, HHBlits), write scripts with parameter n
• Extract HPO paths for each sequence, design data structures to store and manipulate paths and trees

14.11, 21.11, 28.11, 5.12
• Merge paths into trees
• Merge trees from different neighbors
• Implement performance measures
• Complete parameter estimation, make final predictions

10.12 (Tuesday), 19.12, 9.1, 16.1
• Midterm review / Handover of predictions for CAFA submission
• Evaluation of improvements
• Integrate on meta-server platform
• Server integration

23.1, 30.1, 4.2
• Final Presentation
• Hints for Publication Writing
• Exam
Performance Measurement Nov 28th
• Use the same performance measures as described in Radivojac et al., Nature Methods 10(3), March 2013, pp. 221-230; doi:10.1038/nmeth.2340
• Also read Hamp et al., BMC Bioinformatics 2013, 14(Suppl 3):S7. http://www.biomedcentral.com/1471-2105/14/S3/S7
• Implement the measures for precision and recall
• Try to run with threshold steps of 0.1 and construct a precision/recall curve
• Get a sufficient number of data points for the curve
• E.g. choose n high (10/20) and use a fine-grained score for the terms
• Iterate over different values for the different parameters
Performance Measurement
• Precision:
\[ pr_i(t) = \frac{\sum_f I\big(f \in P_i(t) \wedge f \in T_i\big)}{\sum_f I\big(f \in P_i(t)\big)} \]
• Recall:
\[ rc_i(t) = \frac{\sum_f I\big(f \in P_i(t) \wedge f \in T_i\big)}{\sum_f I\big(f \in T_i\big)} \]
• t: threshold, i.e. probability of being true, 0 ≤ t ≤ 1
• f: functional term from an ontology
• i: a given target protein
• T_i: the set of experimentally determined terms for protein i
• P_i(t): the set of predicted terms for protein i with a score greater than or equal to t
• I(): the standard indicator function
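The per-protein precision and recall defined above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `precision_recall`, the dict-of-scores input layout, and the example term IDs are assumptions made for the sketch, not part of the exercise material. Since the precision is undefined when no term reaches the threshold, the sketch returns `None` in that case.

```python
def precision_recall(predicted_scores, true_terms, t):
    """Per-protein precision pr_i(t) and recall rc_i(t).

    predicted_scores: dict term -> score in [0, 1] (assumed input layout)
    true_terms: set of experimentally determined terms T_i
    t: score threshold
    Returns (precision, recall); a value is None when its denominator is 0.
    """
    # P_i(t): predicted terms with a score greater than or equal to t
    predicted = {f for f, score in predicted_scores.items() if score >= t}
    # Numerator of both measures: terms that are predicted AND true
    overlap = len(predicted & true_terms)
    precision = overlap / len(predicted) if predicted else None
    recall = overlap / len(true_terms) if true_terms else None
    return precision, recall


# Illustrative data only (term IDs and scores are made up for the example)
scores = {"HP:0000118": 0.9, "HP:0000707": 0.6, "HP:0001250": 0.3}
truth = {"HP:0000118", "HP:0001250"}
print(precision_recall(scores, truth, 0.5))  # (0.5, 0.5)
```

At t = 0.5 two terms are predicted, one of which is true, and one of the two true terms is recovered, so both measures equal 0.5.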
Complex Measures – Performance Curves
Recall Precision Curves
Taken from http://scikit-learn.github.io/scikit-learn.org/
• preferred in information retrieval
• positives are the documents retrieved in response to a query
• true positives are documents really relevant to the query
• y-axis: precision
• x-axis: recall
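The data points for such a recall/precision curve, using the threshold steps of 0.1 suggested earlier, might be collected as in the following sketch. The function name `pr_curve` and the nested-dict input layout are hypothetical. The averaging is also an assumption: precision is averaged over proteins with at least one prediction at the threshold, recall over all proteins with known terms, which is how the averaging in Radivojac et al. is understood here; check the paper before relying on it.

```python
def pr_curve(predictions, truths, step=0.1):
    """Collect (threshold, mean precision, mean recall) points.

    predictions: dict protein -> {term: score}   (assumed layout)
    truths:      dict protein -> set of true terms
    """
    points = []
    n_steps = int(round(1 / step))
    for k in range(n_steps + 1):
        # Round to avoid float artifacts such as 3 * 0.1 = 0.30000000000000004
        t = round(k * step, 10)
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            scores = predictions.get(protein, {})
            predicted = {f for f, s in scores.items() if s >= t}
            overlap = len(predicted & true_terms)
            if predicted:  # precision is undefined without predictions
                precisions.append(overlap / len(predicted))
            if true_terms:
                recalls.append(overlap / len(true_terms))
        if precisions:  # skip thresholds at which nothing is predicted
            mean_pr = sum(precisions) / len(precisions)
            mean_rc = sum(recalls) / len(recalls) if recalls else 0.0
            points.append((t, mean_pr, mean_rc))
    return points


# Toy data, made up for illustration
predictions = {"P1": {"a": 0.9, "b": 0.4}, "P2": {"a": 0.7, "c": 0.2}}
truths = {"P1": {"a"}, "P2": {"c"}}
for t, pr, rc in pr_curve(predictions, truths):
    print(f"t={t:.1f}  precision={pr:.2f}  recall={rc:.2f}")
```

Plotting recall on the x-axis against precision on the y-axis then gives the curve. A smaller `step` or finer-grained term scores yield more distinct points.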
Richter, Cejuela Technische Universität München
MiniTalk3