Protein Prediction II Exercise - ROSTLAB.ORG

L. Richter

Task Schedule

17.10  Orga/group formation; accounts for i12r-biolab-machines
24.10  Get familiar with the HPO database and test set sequences; data preparation
31.10  Find the most similar sequences (with n different methods: BLAST, HHblits); write scripts with parameter n
7.11   Extract HPO paths for each sequence; design data structures to store and manipulate paths and trees

14.11  Merge paths into trees
21.11  Merge trees from different neighbors
28.11  Implement performance measures
5.12   Complete parameter estimation; make final predictions

10.12 (Tuesday)  Midterm review / handover of predictions for CAFA submission
19.12  Evaluation of improvements
9.1    Integrate on meta-server platform
16.1   Server integration

23.1.  Final presentation
30.1.  Hints for publication writing
4.2    Exam

Performance Measurement (Nov 28th)
• Use the same performance measures as described in Radivojac et al., Nature Methods 10(3), March 2013, pp. 221-230; doi:10.1038/nmeth.2340
• Read also Hamp et al., BMC Bioinformatics 2013, 14(Suppl 3):S7. http://www.biomedcentral.com/1471-2105/14/S3/S7
• Implement the measures for precision and recall
• Try to run with threshold steps of 0.1 and construct a precision/recall curve
• Get a sufficient number of data points for the curve
• E.g. choose n high (10/20) and use a fine-grained score for the terms
• Iterate over different values for the different parameters

Performance Measurement
• Precision:

$$pr_i(t) = \frac{\sum_f I\big(f \in P_i(t) \wedge f \in T_i\big)}{\sum_f I\big(f \in P_i(t)\big)}$$

• Recall:

$$rc_i(t) = \frac{\sum_f I\big(f \in P_i(t) \wedge f \in T_i\big)}{\sum_f I\big(f \in T_i\big)}$$

• t: threshold, i.e. probability of being true, 0 ≤ t ≤ 1
• f: functional term from an ontology
• i: a given target protein
• T_i: the set of experimentally determined terms for protein i
• P_i(t): the set of predicted terms for protein i with a score greater than or equal to t
• I(): the standard indicator function
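The per-protein definitions above reduce to set operations. The following is a minimal sketch, not the course's reference implementation: the function name `precision_recall`, the dict-of-scores input format, and the term IDs in the example are assumptions chosen for illustration. Precision is returned as `None` when no term reaches the threshold, since the denominator is then zero.

```python
def precision_recall(predicted_scores, true_terms, t):
    """Return (pr_i(t), rc_i(t)) for a single protein i.

    predicted_scores: dict mapping term -> score in [0, 1] (assumed input format)
    true_terms:       set of experimentally determined terms T_i
    t:                score threshold, 0 <= t <= 1
    """
    # P_i(t): all predicted terms with a score greater than or equal to t
    predicted = {f for f, score in predicted_scores.items() if score >= t}
    # Numerator of both measures: terms that are predicted and true
    hits = len(predicted & true_terms)
    precision = hits / len(predicted) if predicted else None
    recall = hits / len(true_terms) if true_terms else None
    return precision, recall


# Illustrative data only; the term IDs are placeholders.
scores = {"HP:0000118": 1.0, "HP:0000707": 0.8, "HP:0001250": 0.4, "HP:0000478": 0.2}
truth = {"HP:0000118", "HP:0000707", "HP:0012638"}

pr, rc = precision_recall(scores, truth, 0.5)
print(pr, rc)
```

Lowering the threshold to 0.2 admits the two false terms, so precision drops while recall stays the same.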

Complex Measures – Performance Curves

Recall Precision Curves

[Figure: recall-precision curve, taken from http://scikit-learn.github.io/scikit-learn.org/]

• preferred in information retrieval
• positives are the documents retrieved in response to a query
• true positives are documents really relevant to the query
• y-axis: precision
• x-axis: recall
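To get the data points for such a curve, one can sweep the threshold in steps of 0.1 and average precision and recall over all target proteins at each step. The sketch below assumes the averaging convention of Radivojac et al.: precision is averaged only over proteins with at least one prediction at the threshold, recall over all proteins. The function name `pr_curve` and the nested-dict input format are hypothetical, and every protein is assumed to have at least one true term.

```python
def pr_curve(predictions, truths, steps=10):
    """Return a list of (t, precision, recall) points.

    predictions: dict protein -> {term: score}   (assumed input format)
    truths:      dict protein -> non-empty set of true terms
    steps:       number of threshold steps; 10 gives t = 0.0, 0.1, ..., 1.0
    """
    points = []
    for k in range(steps + 1):
        t = k / steps  # computed by division to avoid accumulating float error
        precisions, recalls = [], []
        for protein, true_terms in truths.items():
            scores = predictions.get(protein, {})
            predicted = {f for f, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # precision is only defined if something was predicted
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if precisions:
            # thresholds at which no protein has a prediction yield no point
            points.append((t,
                           sum(precisions) / len(precisions),
                           sum(recalls) / len(recalls)))
    return points


# Illustrative toy data with placeholder protein and term names.
predictions = {"P1": {"A": 0.9, "B": 0.5, "C": 0.1}, "P2": {"A": 0.7, "D": 0.3}}
truths = {"P1": {"A", "B"}, "P2": {"D"}}

points = pr_curve(predictions, truths)
for t, pr, rc in points:
    print(f"t={t:.1f}  precision={pr:.3f}  recall={rc:.3f}")
```

Plotting recall on the x-axis against precision on the y-axis then gives the curve; a finer score resolution or a larger `steps` value produces more distinct points.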

Richter, Cejuela Technische Universität München

MiniTalk3

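Performance Measurement
• Combine both numbers into one:
• Fmax:

$$F_{\max} = \max_t \left\{ \frac{2 \cdot pr(t) \cdot rc(t)}{pr(t) + rc(t)} \right\}$$

• Optional: do the term-centric metrics (sensitivity and specificity per term f, summed over the proteins i):

$$sn_f(t) = \frac{\sum_i I\big(f \in P_i(t) \wedge f \in T_i\big)}{\sum_i I\big(f \in T_i\big)}$$

$$sp_f(t) = \frac{\sum_i I\big(f \notin P_i(t) \wedge f \notin T_i\big)}{\sum_i I\big(f \notin T_i\big)}$$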
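Both measures can be sketched in a few lines. The code below is an illustration under stated assumptions, not a reference implementation: `f_max` expects a list of already averaged `(t, pr(t), rc(t))` points, and `term_centric` assumes a nested dict `protein -> {term: score}` where a missing term means "not predicted". All function names and the toy data are hypothetical.

```python
def f_max(curve):
    """Return (Fmax, threshold) for an iterable of (t, pr, rc) points."""
    best_f, best_t = 0.0, None
    for t, pr, rc in curve:
        if pr + rc == 0:
            continue  # harmonic mean is undefined when both values are zero
        f = 2 * pr * rc / (pr + rc)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t


def term_centric(term, predictions, truths, t):
    """Return (sn_f(t), sp_f(t)) for one term f, counting over proteins i."""
    tp = tn = pos = neg = 0
    for protein, true_terms in truths.items():
        score = predictions.get(protein, {}).get(term)
        predicted = score is not None and score >= t  # f in P_i(t)
        if term in true_terms:
            pos += 1
            tp += predicted
        else:
            neg += 1
            tn += not predicted
    sensitivity = tp / pos if pos else None
    specificity = tn / neg if neg else None
    return sensitivity, specificity


# Illustrative toy data with placeholder names.
curve = [(0.1, 0.5, 1.0), (0.5, 0.75, 0.75), (0.9, 1.0, 0.25)]
print(f_max(curve))

predictions = {"P1": {"A": 0.9, "B": 0.5}, "P2": {"A": 0.7, "D": 0.3}}
truths = {"P1": {"A", "B"}, "P2": {"D"}}
print(term_centric("A", predictions, truths, 0.5))
print(term_centric("A", predictions, truths, 0.8))
```

Returning the threshold alongside Fmax is a design choice that makes it easy to see at which operating point the best trade-off between precision and recall is reached.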
