Interpreting MS/MS Proteomics Results

Brian C. Searle, Proteome Software Inc., Portland, Oregon USA

[email protected]

NPC Progress Meeting (February 2nd, 2006)

The first thing I should say is that none of the material presented is original research done at Proteome Software, but we do strive to make the tools presented here available in our software product Scaffold. With that caveat aside…

Illustrated by Toni Boudreault

Organization

• Identify
• SEQUEST
• X! Tandem/Mascot
• Differ
• Combine

This is foremost an introduction, so we’re first going to talk about how you go about identifying proteins with tandem mass spectrometry in the first place.

Then we’re going to talk about the motivations behind the development of the first really useful bioinformatics technique in our field, SEQUEST.

This technique has been extended by two other tools called X! Tandem and Mascot.

We’re also going to talk about how these programs differ and how we can use that to our advantage by considering them simultaneously using probabilities.

So, this is proteomics, so we’re going to use tandem mass spectrometry to identify proteins-- hopefully many of them, and hopefully very quickly.

[Figure: a protein drawn as a chain of single-letter amino acid codes]

Start with a protein

And to use this technique you generally have to digest the protein into peptides about 8 to 20 amino acids in length and…

[Figure: the same protein chain, cut into peptides]

Cut with an enzyme

[Figure: the cut protein chain with one peptide selected]

Select a peptide

Look at each peptide individually.

We select the peptide by mass using the first half of the tandem mass spectrometer.

A E P T I R H2O

Impart energy in collision cell

The mass spectrometer imparts energy into the peptide causing it to fragment at the peptide bonds between amino acids.

[Figure: fragment spectrum, intensity vs. m/z, with peaks for A (72.0), AE (201.1), AEP (298.1), and AEPT (399.2)]

Measure mass of daughter ions

The masses of these fragment ions are recorded using the second mass spectrometer.

[Figure: B-type ion spectrum of A-E-P-T-I-R (plus H2O), intensity vs. m/z, with peak spacings labeled 72.0, 129.0, 97.0, 101.0, 113.1, 174.1]

These ions are commonly called B ions, based on nomenclature you don’t really want to know about…

But the mass difference between the peaks corresponds directly to the amino acid sequence.

[Figure: the same B-type ion spectrum with each spacing annotated as a difference: A-0, AE-A, AEP-AE, AEPT-AEP, AEPTI-AEPT, AEPTIR-AEPTI]

For example, the A-E peak minus the A peak should produce the mass of E.

You can build these mass differences up and derive a sequence for the original peptide.

This is pretty neat and it makes tandem mass spectrometry one of the best tools out there for sequencing novel peptides.
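The mass-difference idea can be sketched in a few lines of Python. This is only an illustration under idealized assumptions: a clean, complete ladder of singly charged B ions, standard monoisotopic residue masses, and a hypothetical helper name `read_b_ladder` with an arbitrary 0.02 Da tolerance. Isoleucine and leucine have the same mass, so they are reported together.

```python
# Monoisotopic residue masses in daltons (standard values); I and L share a mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I/L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728  # the charge-carrying proton on a singly charged B ion


def read_b_ladder(b_peaks, tolerance=0.02):
    """Turn a list of singly charged B-ion m/z values into residue labels."""
    residues = []
    previous = PROTON  # the "zero" peak: no residues yet, just the proton
    for peak in sorted(b_peaks):
        gap = peak - previous
        # Pick the residue whose mass is closest to the gap between peaks.
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - gap))
        residues.append(name if abs(mass - gap) <= tolerance else "?")
        previous = peak
    return residues


# Idealized B-ion peaks for the first five residues of AEPTIR.
print(read_b_ladder([72.04, 201.09, 298.14, 399.19, 512.27]))
```

Any gap that does not match a residue within the tolerance is reported as "?", which is exactly where real spectra start to cause trouble.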

So, it seems pretty easy, doesn’t it?

But there are a couple confounding factors.

For example…

[Figure: B-type ion spectrum of A-E-P-T-I-R, with a CO loss marked at each fragment]

B ions have a tendency to degrade and lose carbon monoxide, producing…

[Figure: A-type ion spectrum of A-E-P-T-I-R]

…A ions.

Furthermore…

[Figure: Y-type ion spectrum, read in the reverse order R-I-T-P-E-A]

…the second half of each fragmented peptide shows up as Y ions, which sequence backwards.
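As a rough sketch of how the three ion series relate, the following computes singly charged B, A, and Y ion masses for a peptide. The function name `fragment_ions` is made up for illustration, and the masses are standard monoisotopic values; only the residues needed for AEPTIR are included.

```python
RESIDUE_MASS = {"A": 71.03711, "E": 129.04259, "P": 97.05276,
                "T": 101.04768, "I": 113.08406, "R": 156.10111}
PROTON, WATER, CO = 1.00728, 18.01056, 27.99491


def fragment_ions(peptide):
    """Return singly charged B, A, and Y ion m/z lists for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    b_ions, running = [], PROTON
    for m in masses[:-1]:            # B ions grow from the N-terminus
        running += m
        b_ions.append(running)

    a_ions = [b - CO for b in b_ions]  # an A ion is a B ion that lost CO

    y_ions, running = [], WATER + PROTON
    for m in reversed(masses[1:]):   # Y ions grow from the C-terminus
        running += m
        y_ions.append(running)

    return b_ions, a_ions, y_ions


b_ions, a_ions, y_ions = fragment_ions("AEPTIR")
for name, series in (("B", b_ions), ("A", a_ions), ("Y", y_ions)):
    print(name, [round(mz, 2) for mz in series])
```

Because the Y series starts from the other end of the peptide, its first peak corresponds to R and its spacings spell the sequence in reverse.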

And, unfortunately, this is the real world, so…

[Figure: Y-type ion spectrum with uneven peak heights and missing peaks]

…all the peaks have different measured heights, and many peaks can often be missing.

[Figure: B-type, A-type, and Y-type ions overlaid in one spectrum]

All these peaks are seen together simultaneously, and we don’t even know…

[Figure: the same spectrum with the ion types unlabeled]

…what type of ion they are, making the mass differences approach even more difficult.

Finally, as with all analytical techniques…

[Figure: spectrum with noise peaks added]

…there’s noise, producing a final spectrum that looks like…

[Figure: noisy spectrum]

…this, on a good day. And so it’s actually fairly difficult to…

[Figure: noisy spectrum annotated with the A-E-P-T-I-R spacings 72.0, 129.0, 97.0, 101.0, 113.1, 174.1]

…compute the mass differences to sequence the peptide, certainly in a computer-automated way.

So the community needed a new technique.

Now, it wasn’t all without hope…

Known Ion Types

• B-type ions
• A-type ions
• Y-type ions

We knew a couple of things about peptide fragmentation. Not only do we know to expect B, A, and Y ions, but…

Known Ion Types

• B-type ions
• A-type ions
• Y-type ions
• B- or Y-type +2H ions
• B- or Y-type -NH3 ions
• B- or Y-type -H2O ions

…we also know a couple of other variations on those ions that come up. We even know something about the…

Known Ion Types

• B-type ions: 100%
• A-type ions: 20%
• Y-type ions: 100%
• B- or Y-type +2H ions: 50%
• B- or Y-type -NH3 ions: 20%
• B- or Y-type -H2O ions: 20%

…likelihood of seeing each type of ion, where generally B and Y ions are most prominent.

If we know the amino acid sequence of a peptide, we can guess what the spectrum should look like!

So it’s actually pretty easy to guess what a spectrum should look like if we know what the peptide sequence is.

Model Spectrum

ELVISLIVESK

*Courtesy of Dr. Richard Johnson, http://www.hairyfatguy.com/

So as an example, consider the peptide ELVIS LIVES K that was synthesized by Rich Johnson in Seattle.

Model Spectrum

• B/Y type ions (100%)
• B/Y +2H type ions (50%)
• A type ions, B/Y -NH3/-H2O (20%)

We can create a hypothetical spectrum based on our rules, where B and Y ions are estimated at 100%, plus 2 ions are estimated at 50%, and other stragglers are at 20%.
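A minimal sketch of those rules in Python might look like the following. Several things here are assumptions rather than anything stated in the talk: the function name `model_spectrum`, the 0.1 m/z rounding, and in particular the reading of "+2H" as a doubly charged version of each B or Y ion. Masses are standard monoisotopic values for the residues in ELVISLIVESK.

```python
MASS = {"E": 129.04259, "L": 113.08406, "V": 99.06841,
        "I": 113.08406, "S": 87.03203, "K": 128.09496}
PROTON, WATER, AMMONIA, CO = 1.00728, 18.01056, 17.02655, 27.99491


def model_spectrum(peptide):
    """Return {m/z rounded to 0.1: relative intensity} using the 100/50/20 rules."""
    spectrum = {}

    def add(mz, intensity):
        key = round(mz, 1)
        # If two ions land on the same m/z, keep the taller model peak.
        spectrum[key] = max(spectrum.get(key, 0), intensity)

    for k in range(1, len(peptide)):
        b = sum(MASS[aa] for aa in peptide[:k]) + PROTON
        y = sum(MASS[aa] for aa in peptide[k:]) + WATER + PROTON
        for ion in (b, y):
            add(ion, 100)                # B and Y ions
            add((ion + PROTON) / 2, 50)  # assumed meaning of "+2H": doubly charged
            add(ion - AMMONIA, 20)       # loss of NH3
            add(ion - WATER, 20)         # loss of H2O
        add(b - CO, 20)                  # A ion: B ion minus CO
    return spectrum


spectrum = model_spectrum("ELVISLIVESK")
print(len(spectrum), "model peaks")
```

The intensities are deliberately crude; the point is only that the peak positions follow mechanically from the sequence.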

Model Spectrum

So if we consider the spectrum that was derived from the ELVIS LIVES K peptide…

We can find where the overlap is between the hypothetical and the actual spectra…

And say conclusively, based on the evidence, that the spectrum does belong to the ELVIS LIVES K peptide.

But who cares?

The more important question is

“what about situations where we don’t know the sequence?”

We guess!

PepSeq

AAAAAAAAAA
AAAAAAAAAC
AAAAAAAACC
AAAAAAACCC
…
ELVISLIVESK
…
WYYYYYYYYY
YYYYYYYYYY

J. Rozenski et al., Org. Mass Spectrom., 29 (1994) 654-658.

And so this was an approach followed by a program called PepSeq, which would guess every combination of amino acids possible, build a hypothetical spectrum, and find the best matching hypothetical.

PepSeq

• Impossibly hard after 7 or 8 amino acids!
• High false positive rate because you consider so many options

This was a start, but it’s clearly impossibly hard with larger peptides, and there’s a lot of room to overfit the data.

Another strategy is needed!

So obviously this isn’t going to work in the long run.
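To see why the guess-everything approach runs out of steam, here is a small sketch that counts and lazily enumerates candidate sequences. It is not PepSeq's actual implementation; the names `candidate_count` and `all_candidates` are made up for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def candidate_count(length):
    """Number of distinct peptides of a given length."""
    return len(AMINO_ACIDS) ** length


def all_candidates(length):
    """Lazily yield every possible peptide of the given length."""
    for letters in product(AMINO_ACIDS, repeat=length):
        yield "".join(letters)


for n in (3, 7, 11):
    print(n, "residues:", candidate_count(n), "candidates")
```

Each candidate would need its own hypothetical spectrum and comparison, so the cost grows by a factor of 20 with every added residue.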

Sequencing Explosion

• 1977 Shotgun sequencing invented, bacteriophage φX174 sequenced
• 1989 Yeast Genome project announced
• 1990 Human Genome project announced
• 1992 First chromosome (Yeast) sequenced
• 1995 H. influenzae sequenced
• 1996 Yeast Genome sequenced
• 2000 Human Genome draft
• …

We needed a new invention to come around, and that was shotgun Sanger sequencing. In ’89 and ’90 the Yeast and Human Genome projects were announced, followed by the first chromosome in ’92, et cetera, et cetera.

Eng, J. K.; McCormack, A. L.; Yates, J. R. III J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.

In 1994 Jimmy Eng and John Yates published a technique to exploit genome sequencing for use in tandem mass spectrometry.

And the idea was…

SEQUEST

…instead of searching all possible peptide sequences, search only those in genome databases.

Now, in the post-genomic world this seems like a pretty trivial idea, but back then there was a lot of assumption placed on the idea that we’d actually have a complete Human genome in a reasonable amount of time.

SEQUEST

• 2×10^14: all possible 11mers (ELVISLIVESK)
• 2×10^10: all possible peptides in NR
• 1×10^8: all tryptic peptides in NR
• 4×10^6: all Human tryptic peptides in NR

So, in terms of 11 amino acid peptides, we’re talking about a 10 thousand fold difference between searching every possible 11mer and searching those in the current non-redundant (NR) protein database from the NCBI, and roughly a 50 million fold difference for searching only human tryptic peptides.

So that was huge: it made hypothetical spectrum matching feasible.
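Restricting the search to database peptides starts with an in-silico digest. The sketch below applies the usual textbook trypsin rule (cleave after K or R unless the next residue is P) to a made-up protein string; it is not SEQUEST's code, the function name is hypothetical, and it relies on `re.split` accepting zero-width matches, which needs Python 3.7 or later.

```python
import re


def tryptic_peptides(protein, min_length=1):
    """Cleave after K or R, except when the next residue is P."""
    # The pattern matches the empty position after K/R that is not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [piece for piece in pieces if len(piece) >= min_length]


# A made-up protein sequence, used only to show the cleavage rule.
print(tryptic_peptides("AAKGRPELVISLIVESKAEPTIR"))
```

Only the resulting peptides whose mass matches the selected precursor would then need a hypothetical spectrum, which is where the large reduction in candidates comes from.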

SEQUEST Model Spectrum

SEQUEST made a couple of other interesting improvements as well. Jimmy and John noted that there was a discontinuity between the intensities of the hypothetical spectrum and the actual spectrum. Instead of trying to make a better model, they decided just to make the actual spectrum look like the model with normalization…

Like so.

For a scoring function they decided to use Cross-Correlation, which basically sums the peaks that overlap between the hypothetical and the actual spectra.

And then they shifted the spectra back and forth so that the peaks shouldn’t align. They used this number, also called the Auto-Correlation, as their background.

SEQUEST XCorr

[Figure: correlation score vs. offset (AMU), with the Cross Correlation (direct comparison) and the Auto Correlation (background) marked. Gentzel M. et al., Proteomics 3 (2003) 1597-1610]

This is another representation of the Cross Correlation and the Auto Correlation.

SEQUEST XCorr

$$\mathrm{XCorr} = \frac{\mathrm{CrossCorr}}{\mathrm{avg}\left(\mathrm{AutoCorr}\right)_{\mathrm{offset}=-75\ \mathrm{to}\ 75}}$$

The XCorr score is the Cross Correlation divided by the average of the Auto Correlation over a 150 AMU range.

The XCorr is high if the direct comparison is significantly greater than the background, which is obviously good for peptide identification.
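Here is a toy version of that calculation on binned spectra, following the ratio as described in this talk. It is a sketch, not SEQUEST's implementation: the function names, the 1 AMU bins, and the choice to leave the zero offset out of the background average are all assumptions. Some published descriptions of SEQUEST subtract the background mean from the direct comparison instead of dividing by it.

```python
def correlation(observed, model, offset):
    """Sum of products of overlapping bins after shifting the model by `offset`."""
    total = 0.0
    for i, value in enumerate(observed):
        j = i + offset
        if 0 <= j < len(model):
            total += value * model[j]
    return total


def xcorr_ratio(observed, model, max_offset=75):
    """Direct comparison divided by the mean of the shifted comparisons."""
    direct = correlation(observed, model, 0)
    background = [correlation(observed, model, k)
                  for k in range(-max_offset, max_offset + 1) if k != 0]
    mean_background = sum(background) / len(background)
    return direct / mean_background if mean_background > 0 else float("inf")


def binned(peak_positions, size=400):
    """Unit-intensity spectrum with one bin per AMU (already 'normalized')."""
    bins = [0.0] * size
    for position in peak_positions:
        bins[position] = 1.0
    return bins


observed = binned([100, 157, 228, 300])
print("matching model:", xcorr_ratio(observed, binned([100, 157, 228, 300])))
print("wrong model:   ", xcorr_ratio(observed, binned([110, 160, 240, 310])))
```

The matching model scores far above its own background, while the wrong model has no peaks aligned at zero offset and scores zero.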

SEQUEST DeltaCn

$$\mathrm{DeltaCn} = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}$$

And this XCorr is actually a pretty robust method for estimating how accurate the match is, and so far, there really haven’t been any significant improvements on it.

The DeltaCn is another score that scientists often use. It measures how good the XCorr is relative to the next best match.

As you can see, this is actually a pretty crude calculation.
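In code the DeltaCn is a one-liner once the candidate XCorr values are sorted. The function name and the example scores below are made up for illustration, and the sketch assumes at least two candidates with a positive best score.

```python
def delta_cn(xcorrs):
    """Relative gap between the best and second-best XCorr for one spectrum."""
    best, second = sorted(xcorrs, reverse=True)[:2]
    return (best - second) / best


# XCorr values for three hypothetical candidate peptides of one spectrum.
print(round(delta_cn([3.0, 1.5, 2.4]), 3))
```

Only the top two candidates enter the calculation, which is why it says so little about how unusual the best score is compared with everything else.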

[Table: Accuracy Score vs. Relative Score]

SEQUEST: Accuracy Score is Strong (XCorr), Relative Score is Weak (DeltaCn)

Here’s another representation of that sentiment. The XCorr is a strong measure of accuracy, whereas the DeltaCn is a weak measure of relative goodness.

[Table: Accuracy Score vs. Relative Score]

SEQUEST: Accuracy Score is Strong (XCorr), Relative Score is Weak (DeltaCn)
Alternate Method: Accuracy Score is Weak, Relative Score is Strong

Obviously, there could be an alternative method that focuses more on the success of the relative score.

Mascot and X! Tandem fit that bill.

X! Tandem Scoring

by-Score = sum of intensities of peaks matching B-type or Y-type ions

$$\mathrm{HyperScore} = \text{by-Score} \times N_y! \times N_b!$$

Fenyo, D.; Beavis, R. C. Anal. Chem., 75 (2003) 768-774

Now the X! Tandem accuracy score is rather crude. It only considers B and Y ions and attaches these factorial terms with an admittedly hand-waving argument.
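A literal reading of that score fits in a few lines. This sketch assumes that Nb and Ny are the numbers of matched B and Y ions, which the slide does not spell out; the function name and the example intensities are invented.

```python
from math import factorial


def hyperscore(matched_b, matched_y):
    """by-Score times Nb! times Ny!, given intensities of the matched peaks."""
    by_score = sum(matched_b) + sum(matched_y)
    return by_score * factorial(len(matched_b)) * factorial(len(matched_y))


# Two matched B peaks and three matched Y peaks with made-up intensities.
print(hyperscore([10, 20], [5, 5, 5]))
```

The factorials reward matching many ions of each series far more steeply than the intensities themselves do.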

Distribution of “Incorrect” Hits

[Figure: histogram of # of matches vs. Hyper Score, with the Best Hit and the Second Best hit marked]

But instead of just comparing the best match to the second best, it looks at the distribution of lower scoring hits, assuming that they are all wrong. This is somewhat based on ideas pioneered with the BLAST algorithm.

Here, every bar represents the number of matches at a given score.

The X! Tandem creators found that the distribution decays (or slopes down) exponentially…

Estimate Likelihood (E-Value)

[Figure: log(# of matches) vs. Hyper Score, with the Best Hit marked]

…and the log of the distribution is relatively linear because of the exponential decay.

[Figure: the same plot with a linear fit labeled “Expected Number of Random Matches”]

If the distribution represents the number of random matches at any given score, the linear fit should correspond to the expected number of random matches.

[Figure: the linear fit extended to the Best Hit, annotated “Score of 60 has 1/10 chance of occurring at random”]

This is called an E-Value, or Expected-Value. And from this, you can calculate the likelihood that the best match is random.

In this case, a score of 60 corresponds to a log number of matches of -1, which means the estimated number of random matches for that score is 0.1.
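The extrapolation can be sketched with an ordinary least-squares line through the log of the histogram. This is an illustration of the idea, not X! Tandem's code: the function names are made up, and the histogram is synthetic, chosen so that the fitted line passes through -1 at a score of 60 like the example above.

```python
import math


def fit_log_linear(scores, counts):
    """Least-squares line through (score, log10(count)); returns slope, intercept."""
    logs = [math.log10(c) for c in counts]
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_random_matches(score, slope, intercept):
    """Extrapolate the fitted line to a score and undo the log."""
    return 10 ** (intercept + slope * score)


# Synthetic histogram of low-scoring (assumed incorrect) hits.
slope, intercept = fit_log_linear([10, 20, 30, 40], [10000, 1000, 100, 10])
print(expected_random_matches(60, slope, intercept))
```

A real histogram has empty and noisy bins, so deciding which bins to include in the fit matters a great deal in practice.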

X! Tandem and Mascot

• E-Value = likelihood that match is incorrect relative to N guesses. Empirical (X! Tandem)
• P-Value = likelihood that match is incorrect (E ≈ P·N). Theoretical (Mascot)

Now, X! Tandem calculates this E-Value empirically.

Another search engine, Mascot, tries to get at the same kind of number using theoretical calculations, most likely based on the number of identified peaks and the likelihood of finding certain amino acids in the genome database. They’ve never explicitly published their algorithm, so we’ll never really know, but I suspect it’s something smart.

I just want to bring up a point that we’ll touch on a little later…

X! Tandem and Mascot

• E-Value = likelihood that match is incorrect relative to N guesses. Empirical (X! Tandem)
• P-Value = likelihood that match is incorrect (E ≈ P·N). Theoretical (Mascot)
• Probability = likelihood that match is correct. Note (Probability ≠ 1-P)!

…the E-Value that X! Tandem calculates and the P-Value that Mascot calculates are probabilistically based, but they can only estimate the likelihood that the match is wrong.

This is realistically not nearly as useful as knowing the probability that a peptide identification is right, which is NOT 1 minus the P-Value.


[Table: accuracy score vs. relative score for each search engine]

SEQUEST: accuracy score is XCorr, relative score is DeltaCn
X! Tandem: accuracy score is HyperScore, relative score is E-Value

Now, let’s go back and fill in the X! Tandem part of our accuracy/relativity scoring grid.

To reiterate, the XCorr is an excellent measure of accuracy…

…whereas the E-Value is an excellent measure of how good the best score is relative to the rest.

If we assume that accuracy and relativity scores are independent measures of goodness, could we use both SEQUEST’s XCorr and X! Tandem’s E-Value together?

10 Protein Control Sample

[Figure: scatter plot of SEQUEST Discriminant Score vs. X! Tandem -log(E-Value)]

And the answer is a resounding yes.

Each point on this graph is a spectrum, where correct identifications are marked in red, while incorrect identifications are marked in blue. We know what’s correct and incorrect because this is a control sample.

Although in general the spectra SEQUEST scores well are spectra X! Tandem also scores well, there is considerable scatter between the search engines.

10 Protein Control Sample

[Figure: scatter plot of Mascot Ion-Identity Score vs. X! Tandem -log(E-Value)]

One might wonder, if X! Tandem and Mascot use similar scoring approaches, would they benefit as much? But the answer is surprisingly still yes!

Now, why are the scores so different?

Why So Different?

• SEQUEST
– Considers relative intensities
• X! Tandem
– Considers semi-tryptic peptides
– Considers only B/Y-type ions
• Mascot
– Considers theoretical P-Value relative to search space

Well, here are a couple of possible reasons.

SEQUEST is the only method to consider relative intensities.

X! Tandem is the only method to consider peptides outside the standard search space by default, such as semi-tryptic peptides. However, it’s also the only one whose score considers only B and Y ions, as opposed to a complete model.

And Mascot is the only search engine to compute a completely theoretical P-Value.

Consider Multiple Algorithms?

[Figure: the scatter plot against X! Tandem -log(E-Value) again]

So we clearly want to consider multiple search engines simultaneously, but how?

How To Compare Search Engines?

– SEQUEST: XCorr > 2.5, DeltaCn > 0.1
– Mascot: Ion Score - Identity Score > 0
– X! Tandem: E-Value < 0.01

You can’t use a thresholding system because it’s impossible to find corresponding thresholds. For example, a SEQUEST match with an XCorr of 2.5 doesn’t mean the same thing as an X! Tandem match with an E-Value of 0.01.

Need to convert scores to probabilities!

The simplest way would be to convert the scores into probabilities and compare those.

We advocate for Andrew Keller and Alexey Nesvizhskii’s Peptide Prophet approach because it actually calculates a true probability, not just a p-value.

10 Protein Control Sample (Q-ToF): X! Tandem approach

[Figure: histogram of # of matches vs. score for one spectrum, showing “Other Incorrect IDs for Spectrum” and a best match marked “Possibly Correct?”]

So if you remember, X! Tandem considers the best peptide match for a spectrum against a distribution of incorrect matches.

10 Protein Control Sample (Q-ToF): Peptide Prophet approach

[Figure: histogram of # of matches vs. score across the whole sample, showing “ALL Other ‘Best’ Matches” and a match marked “Possibly Correct?”. Keller, A. et al., Anal. Chem. 74, 5383-5392]

Well, Peptide Prophet looks across the entire sample, and not at just one spectrum at a time. It compares the best match against all of the other best matches in the sample, and that distribution is clearly bimodal.


The low mode represents matches that are most likely wrong, while the high mode represents matches that are probably right.

[Figure: the same histogram with two fitted curves labeled “Incorrect” and “Correct”]

Peptide Prophet curve-fits two distributions to the modes, following the assumption that the lower scoring distribution is “incorrect” and that the higher scoring distribution is “correct”.

10 Protein Control Sample (Q-ToF)

$$p(+ \mid D) = \frac{p(D \mid +)\,p(+)}{p(D \mid +)\,p(+) + p(D \mid -)\,p(-)}$$

Here + means the match is correct, - means it is incorrect, and D is the observed score.

These two distributions can be analyzed using Bayesian statistics with this formula. Now that formula looks pretty complex, but…

It just calculates the height of the correct distribution at a particular score, divided by the combined height of both distributions.

This is essentially the probability of having that score and being correct, divided by the probability of just having that score.
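To make the formula concrete, here is a small sketch that evaluates it for two fitted curves. It is not Peptide Prophet's code: both curves are modeled as normal distributions purely for simplicity, and the means, widths, and the name `probability_correct` are made-up assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Height of a normal curve at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def probability_correct(score, prior_correct,
                        correct=(4.0, 1.0), incorrect=(0.0, 1.0)):
    """p(+|D): weighted height of the correct curve over the sum of both."""
    up = normal_pdf(score, *correct) * prior_correct            # p(D|+) p(+)
    down = normal_pdf(score, *incorrect) * (1 - prior_correct)  # p(D|-) p(-)
    return up / (up + down)


for score in (0.0, 2.0, 4.0):
    print(score, round(probability_correct(score, prior_correct=0.5), 4))
```

The prior p(+) is the fraction of best matches in the sample that are correct, so the same score maps to a lower probability when fewer of the identifications in the sample are right.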


This is a neat method because it actually considers the likelihood of being correct, unlike X! Tandem and Mascot, which only calculate the probability of being incorrect.

It’s because of this that Peptide Prophet can produce a true probability, which is important when the sample characteristics change.


[Figure: “Incorrect” and “Correct” distributions for the Q-ToF data]

For example, the control sample we’ve been looking at was derived from Q-ToF data, which produces pretty high quality results.

[Figure: “Incorrect” and “Correct” distributions for the Q-ToF data compared with the Ion Trap data]

If you compare that to the same sample run on an Ion Trap, the probability of being correct is greatly diminished.

If you’ll note, the Incorrect distribution doesn’t change very much between the two analyses; however, the likelihood that the identification is right changes dramatically!

[Figure: “Incorrect” and “Correct” distributions for the Ion Trap data]

As Peptide Prophet considers the correct distribution, it is immune to fluctuations between samples.

P-Values and E-Values don’t consider this information, so they can’t be compared across multiple samples, or across different examinations of the same sample, which is why we need to use Peptide Prophet for comparing two different search engines.



[Figure: scatter plot of Mascot score vs. X! Tandem -log(E-Value)]

So going back to the scatter plot between X! Tandem and Mascot, we can use Peptide Prophet to compute the score threshold that represents a 95% cut-off…

[Figure: the same scatter plot with the 95% thresholds drawn in: X! Tandem: 2.6 = 95%, Mascot: -2.5 = 95%]

Like so.

This allows you to fairly consider the answers from both search engines simultaneously.

The important thing to note is that if you looked at a different sample, these thresholds should change depending on the height of the correct distributions.
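A toy calculation shows how such a threshold moves with the height of the correct distribution. As before, this is only a sketch under invented assumptions: two normal curves with made-up means and equal widths stand in for the fitted distributions, and the threshold is found by bisection because the probability rises steadily with score in this simple setup.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Height of a normal curve at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def probability_correct(score, prior_correct, correct_mean=4.0, incorrect_mean=0.0):
    """Probability that a match with this score is correct."""
    up = normal_pdf(score, correct_mean) * prior_correct
    down = normal_pdf(score, incorrect_mean) * (1 - prior_correct)
    return up / (up + down)


def score_threshold(target, prior_correct, low=-10.0, high=20.0):
    """Smallest score whose probability of being correct reaches `target`."""
    for _ in range(100):  # bisection on a steadily increasing function
        mid = (low + high) / 2
        if probability_correct(mid, prior_correct) < target:
            low = mid
        else:
            high = mid
    return high


# Same curves, but a sample where half vs. a tenth of the best matches are correct.
print(round(score_threshold(0.95, prior_correct=0.5), 3))
print(round(score_threshold(0.95, prior_correct=0.1), 3))
```

With fewer correct identifications in the sample, a higher score is needed to reach the same 95% probability, so a fixed score cut-off does not carry over from one sample to the next.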

Conclusion

• All search engines use different criteria, producing different scores
• Using multiple search engines simultaneously yields better results
• Peptide Prophet can normalize search engine results

So in conclusion, all of the search engines look at different criteria, and we can leverage this to identify more peptides.

And Peptide Prophet is a great mechanism for doing that because it calculates true probabilities, instead of p-values.

The End
