Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy Bin Ma, CTO Bioinformatics Solutions Inc. June 5, 2011.

Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy

Bin Ma, CTO, Bioinformatics Solutions Inc.

June 5, 2011

The Sensitivity and Accuracy Dilemma

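[Figure: score distributions of false and true hits along the score axis.]

$$\mathrm{FDR} = \frac{\#\,\text{reported false hits}}{\#\,\text{reported hits}}$$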
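
The trade-off behind this definition can be sketched in a few lines of Python. This is only an illustration: the function name `fdr_and_hits`, the toy scores, and the true/false labels are all made up here, and real data do not come with such labels.

```python
from typing import List, Tuple


def fdr_and_hits(hits: List[Tuple[float, bool]], threshold: float) -> Tuple[float, int]:
    """Return (FDR, number of reported true hits) at a score threshold.

    Each hit is (score, is_true). The labels are hypothetical and exist
    only to illustrate FDR = # reported false hits / # reported hits.
    """
    reported = [is_true for score, is_true in hits if score >= threshold]
    if not reported:
        return 0.0, 0
    false_hits = sum(1 for is_true in reported if not is_true)
    return false_hits / len(reported), len(reported) - false_hits


# Made-up toy data: (score, is_true)
hits = [(9.0, True), (8.0, True), (7.0, True), (6.0, False),
        (5.0, True), (4.0, False), (3.0, True), (2.0, False)]

for t in (7.0, 5.0, 2.0):
    fdr, true_hits = fdr_and_hits(hits, t)
    print(f"threshold={t}: FDR={fdr:.3f}, true hits={true_hits}")
```

Lowering the threshold reports more true hits (sensitivity) but also raises the FDR (accuracy), which is the dilemma named in the slide title.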

Publication Guideline

• Earlier experiments paid too much attention to sensitivity and not enough to accuracy.
• MCP (Molecular & Cellular Proteomics) started the guideline in 2004 to ensure accuracy.

iPRG/ABRF 2011 Study

“People are generally over-optimistic about how reliable their results are.” – ABRF iPRG 2011.

30 out of 45 submissions have an FDR much higher than the required 1%.

[Figure: estimated FDR lower bound and estimated FDR upper bound of the submissions, with the 1% level marked.]

PEAKS Achieved both Sensitivity and Accuracy

[Figure: submissions plotted with the 1% FDR level and the PEAKS submissions marked; axis label “More peptides in submission”.]

Outline

1. FDR – pitfalls and solutions
2. De novo sequencing assisted database search
3. Three essential examinations to ensure result quality.

1. FDR – pitfalls and solutions

FDR Estimation

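[Figure: the protein DB, containing target and decoy sequences, is searched by the search engine to produce the identified peptides.]

# false target hits ≈ # decoy hits

$$\mathrm{FDR} = \frac{\#\,\text{decoy hits}}{\#\,\text{target hits}}$$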
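
A minimal Python sketch of this target-decoy estimate is shown here. The function names, the `(score, is_decoy)` tuple layout, and the toy PSM list are assumptions made for illustration; they are not taken from any particular search engine.

```python
from typing import List, Optional, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); hypothetical layout


def estimate_fdr(psms: List[Psm], threshold: float) -> float:
    """Estimate FDR as # decoy hits / # target hits at or above the threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_at_fdr(psms: List[Psm], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is within max_fdr.

    The estimate is not monotone in the threshold, so every observed
    score is checked rather than stopping at the first failure.
    """
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if estimate_fdr(psms, score) <= max_fdr:
            best = score
    return best


# Made-up toy data
psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, False),
        (6.0, True), (5.0, False), (4.0, True), (3.0, False)]

print(estimate_fdr(psms, 5.0))
print(lowest_threshold_at_fdr(psms, 0.2))
```

The estimate is only as good as the assumption that false target hits and decoy hits are about equally likely, which is what the pitfalls in this part of the talk are about.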

Pitfall 1 – Multiple Round Search

Round 1. Fast Search

Round 2. More Sensitive Search

Round 2 searches only the proteins shortlisted in round 1, a set with more targets than decoys. As a result, # false target hits > # decoy hits, and the FDR is underestimated.

Craig and Beavis 2004. Bioinformatics 20, 1466–67.

Bern and Kil 2011. J Proteome Res. 10, 2123–27.

Everett et al. 2010. J Proteome Res. 9, 700–707.

Our Solution: Decoy Fusion

A decoy sequence is appended to each target protein, so both the fast search and the more sensitive search see equal targets and decoys:

# false target hits ≈ # decoy hits

PEAKS DB paper. Submitted.
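
The fusion step itself can be sketched as follows. The slide only says that a decoy sequence is appended to each target protein; generating the decoy by reversing the target, and the helper names `make_decoy`, `fuse`, and `classify_hit`, are assumptions made here for illustration.

```python
from typing import Dict


def make_decoy(sequence: str) -> str:
    """Build a decoy for one protein (assumption: simple reversal)."""
    return sequence[::-1]


def fuse(targets: Dict[str, str]) -> Dict[str, str]:
    """Append a decoy to each target protein, giving one fused entry per protein."""
    return {name: seq + make_decoy(seq) for name, seq in targets.items()}


def classify_hit(target_length: int, start: int, length: int) -> str:
    """Classify a peptide hit by its position in the fused sequence.

    Hits spanning the target/decoy junction are labeled separately;
    how a real tool treats them is not stated in the talk.
    """
    if start + length <= target_length:
        return "target"
    if start >= target_length:
        return "decoy"
    return "junction"


fused = fuse({"P1": "MKTAYIAK"})  # made-up sequence
print(fused["P1"])
print(classify_hit(8, 0, 8), classify_hit(8, 8, 8), classify_hit(8, 5, 6))
```

Because every protein carries its own decoy half, any protein that survives a first-round filter brings equal amounts of target and decoy sequence into the next round.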

Pitfall 2 – Mix Protein and Peptide ID

Idea: Peptides on a multi-hit protein get a bonus on their scores to increase sensitivity.

Pitfall: More multi-hit proteins come from the target DB, so more false hits are “saved” from the target DB. The result is FDR underestimation.

[Figure: a weak hit is “saved” due to the bonus, and so is a weak false hit. Labels: decoy hit, target false hit.]

Our Solution: Decoy Fusion

Weak false hits are “saved” with approx. equal probabilities in target and decoy.

Get the sensitivity, but still estimate the FDR correctly.

Pitfall 3 – Machine Learning with Decoy

Idea: Re-train the coefficients of the scoring function for every search after knowing the decoy hits.

Pitfall: Risk of over-fitting. Machine learning experts only.

[Figure: after the search, the scoring function is adjusted to remove decoy hits. Fewer target false hits than decoy hits are removed, so the FDR is underestimated.]

Solutions

1. Don’t use it. Judges cannot be players.
or
2. Only use it for very large datasets.
or
3. Train the coefficients once and reuse them; don’t re-train for every search.

PEAKS 5.3

• PEAKS DB used all these techniques (and many more) to ensure the accuracy while maximizing sensitivity.

• Reliable FDR estimation is the top priority in PEAKS DB design.

2. De novo sequencing assisted database search

An Idea to Improve Score Function

[Figure: score distributions of false and true hits along the score axis.]

Idea: If the de novo sequence matches a DB peptide, that peptide is likely to be correct.

De Novo Assisted DB Search

[Figure: scatter plot of hits. Axes: DB search score; # matched amino acids between de novo & DB search. The line labeled x+4y is the best separation line.]
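
As a rough sketch of how such a combined feature could be computed, the Python below assumes that x is the DB search score and y is the number of matched amino acids, and it counts matches position by position. Both are simplifying assumptions; the slide does not say how PEAKS aligns the two sequences, and the function names are hypothetical.

```python
def matched_amino_acids(de_novo: str, db_peptide: str) -> int:
    """Count positions where the de novo sequence agrees with the DB peptide.

    Position-by-position comparison is a simplification chosen for
    this sketch, not the method described in the talk.
    """
    return sum(1 for a, b in zip(de_novo, db_peptide) if a == b)


def combined_score(db_score: float, matched: int) -> float:
    """Score along the x + 4y direction (x: DB search score, y: matched residues)."""
    return db_score + 4 * matched


# Made-up example: the de novo result differs from the DB peptide at one residue.
y = matched_amino_acids("PEPTADEK", "PEPTIDEK")
print(y, combined_score(20.0, y))
```

Two hits with the same DB search score are then ranked by how well the de novo sequence supports them.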

Including de novo matching as a feature gives the score function better discriminative power.

[Figure: score distributions of false and true hits, before and after.]

This is just one example of the many new features in PEAKS 5.3 for improving the score function.

“… far better than what I could ever squeeze out of my data” – Stefano Gotta, Siena Biotech

[Figure: FDR (0.0% to 2.5%) versus # of PSM (0 to 4000) for product M and PEAKS DB.]

PEAKS DB Workflow

[Flowchart: All Spectra → De Novo → DB search → Found? Yes: DB peptides. No: De novo only.]

De novo both helps to improve the DB search and reports novel peptides.
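
The routing in this workflow can be sketched in Python as below. The order of steps (de novo first, then a DB search that may use the de novo result) is read from the slide's labels and is an assumption, as are the function name `route_spectra` and the two callables, which stand in for real de novo and DB search engines.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def route_spectra(
    spectra: Iterable[str],
    de_novo: Callable[[str], str],
    db_search: Callable[[str, str], Optional[str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split spectra into DB peptides and de novo only results.

    de_novo(spectrum_id) returns a sequence; db_search(spectrum_id, tag)
    returns a DB peptide or None when nothing is found. Both are
    hypothetical placeholders.
    """
    db_peptides: Dict[str, str] = {}
    de_novo_only: Dict[str, str] = {}
    for spectrum_id in spectra:
        tag = de_novo(spectrum_id)
        peptide = db_search(spectrum_id, tag)
        if peptide is not None:
            db_peptides[spectrum_id] = peptide   # Found? Yes
        else:
            de_novo_only[spectrum_id] = tag      # Found? No
    return db_peptides, de_novo_only


# Made-up stand-ins for the two engines
toy_de_novo = {"s1": "PEPTADEK", "s2": "LGSSEVEQVK"}
toy_db = {"s1": "PEPTIDEK"}

found, novel = route_spectra(
    ["s1", "s2"],
    de_novo=lambda sid: toy_de_novo[sid],
    db_search=lambda sid, tag: toy_db.get(sid),
)
print(found, novel)
```

Spectra that the DB search cannot explain are not discarded; their de novo sequences are kept as candidate novel peptides.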

3. Three essential examinations to ensure result quality.

Don’t Trust Software Blindly!

• A Google search for “Don’t trust software blindly” returned 5,140,000 results.
• As you quality control your experiments, quality control the software’s results too.

Essential Examination 1

#decoy ≈ #target in low score region

Low #decoy in high score region
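
The two checks on this slide (comparable decoy and target counts in the low score region, few decoys in the high score region) can be sketched in Python. The function name, the `(score, is_decoy)` layout, the threshold of 5.0, and the toy data are all hypothetical.

```python
from typing import List, Tuple

Psm = Tuple[float, bool]  # (score, is_decoy); hypothetical layout


def decoy_target_counts(psms: List[Psm], low: float, high: float) -> Tuple[int, int]:
    """Count (decoys, targets) among PSMs with low <= score < high."""
    in_region = [is_decoy for score, is_decoy in psms if low <= score < high]
    decoys = sum(in_region)
    return decoys, len(in_region) - decoys


# Made-up toy data
psms = [(1.0, True), (1.5, False), (2.0, False), (2.5, True),
        (3.0, True), (3.5, False), (8.0, False), (9.0, False),
        (10.0, False), (11.0, False), (12.0, True), (13.0, False)]
threshold = 5.0  # hypothetical score threshold

low_decoys, low_targets = decoy_target_counts(psms, float("-inf"), threshold)
high_decoys, high_targets = decoy_target_counts(psms, threshold, float("inf"))

print(f"low score region:  {low_decoys} decoys vs {low_targets} targets")
print(f"high score region: {high_decoys} decoys vs {high_targets} targets")
```

If the low score counts are far from equal, the decoy hits are not a good model of the false target hits and the FDR estimate should not be trusted.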

Essential Examination 2

High-scoring peptides should have low precursor error.

Precursor error starts to scatter below the threshold.
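
A simple way to look at this numerically is to compare the precursor mass error on either side of the score threshold. The sketch below expresses the error in ppm and uses the largest absolute error as the measure of scatter; the function names, the tuple layout, the threshold, and the masses are assumptions made for illustration.

```python
from typing import List, Tuple

Psm = Tuple[float, float, float]  # (score, observed mass, theoretical mass)


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Precursor mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def max_abs_error(psms: List[Psm], low: float, high: float) -> float:
    """Largest absolute ppm error among PSMs with low <= score < high."""
    errors = [abs(ppm_error(obs, theo)) for score, obs, theo in psms
              if low <= score < high]
    return max(errors) if errors else 0.0


# Made-up toy data
psms = [(90.0, 1000.001, 1000.0), (80.0, 1500.003, 1500.0),
        (20.0, 1200.030, 1200.0), (10.0, 999.980, 1000.0)]
threshold = 50.0  # hypothetical score threshold

above = max_abs_error(psms, threshold, float("inf"))
below = max_abs_error(psms, float("-inf"), threshold)
print(f"max |error| above threshold: {above:.1f} ppm, below: {below:.1f} ppm")
```

A scatter plot of error against score shows the same thing at a glance; the numbers are just a compact summary.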

Essential Examination 3

• Spectrum annotation around score threshold.

Take Home Message

• Another year of dedicated work on PEAKS.
• Ensured accuracy; maximized sensitivity.
• Do the three essential examinations.
  – They are simple … at least in PEAKS.

“a big step forward” – Christian Schmelzer, Martin Luther University

Enjoy!

http://www.bioinfor.com/peaks-download-a-pricing