Direct Experimental Observation of Functional Protein Isoforms by Tandem Mass Spectrometry

Nathan Edwards
Center for Bioinformatics and Computational Biology
University of Maryland, College Park

Synopsis

• MS/MS spectra provide evidence for the amino-acid sequence of functional proteins.

• Key concepts:
  • Spectrum acquisition is unbiased
  • Direct observation of amino-acid sequence
  • Sensitive to small sequence variations

• Applications:
  • Cancer biomarkers
  • Genome annotation

Mass Spectrometry for Proteomics

• Measure mass of many (bio)molecules simultaneously
  • High bandwidth

• Mass is an intrinsic property of all (bio)molecules
  • No prior knowledge required

Mass Spectrometer

Sample → Ionizer → Mass Analyzer → Detector

• Ionizer: MALDI; Electrospray Ionization (ESI)

• Mass analyzer: Time-Of-Flight (TOF); Quadrupole; Ion-Trap

• Detector: Electron Multiplier (EM)

High Bandwidth

[Figure: example mass spectrum, % intensity (0 to 100) vs. m/z (0 to 1000)]

Mass is fundamental!

Mass Spectrometry for Proteomics

• Measure mass of many molecules simultaneously
  • ...but not too many: abundance bias

• Mass is an intrinsic property of all (bio)molecules
  • ...but need a reference to compare to

Mass Spectrometry for Proteomics

• Mass spectrometry has been around since the turn of the 20th century...
  • ...so why is MS-based proteomics so new?

• Ionization methods
  • MALDI, Electrospray

• Protein chemistry & automation
  • Chromatography, Gels, Computers

• Protein / genome sequences
  • A reference for comparison

Sample Preparation for Peptide Identification

Enzymatic Digest and Fractionation

Single Stage MS

[Figure: single-stage MS spectrum, intensity vs. m/z]

Tandem Mass Spectrometry (MS/MS)

Precursor selection

[Figure: two spectra (intensity vs. m/z) illustrating precursor selection]

Tandem Mass Spectrometry (MS/MS)

Precursor selection + collision-induced dissociation (CID)

[Figure: MS and MS/MS spectra, intensity vs. m/z]

Peptide Identification

• For each (likely) peptide sequence:
  1. Compute fragment masses
  2. Compare with spectrum
  3. Retain those that match well

• Peptide sequences from (any) sequence database
  • Swiss-Prot, IPI, NCBI’s nr, ESTs, genomes, ...

• Automated, high-throughput peptide identification in complex mixtures
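The three identification steps above can be sketched in a few lines. This is a toy illustration, not the scoring used by Mascot, X!Tandem, or OMSSA: it assumes singly charged b- and y-ions, standard monoisotopic residue masses, a fixed ±0.5 m/z tolerance, and a plain count of matched peaks. The names `fragment_masses`, `match_score`, and `identify` are hypothetical, and real search engines also filter candidates by precursor mass and use probabilistic scores.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # carried by y-ions (C-terminal fragments)
PROTON = 1.00728   # charge carrier for singly charged ions


def fragment_masses(peptide):
    """Step 1: m/z of singly charged b- and y-ions for every backbone cleavage."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: N-terminal piece
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: C-terminal piece
    return ions


def match_score(peptide, peaks, tol=0.5):
    """Step 2: count predicted fragments lying within tol of an observed peak."""
    return sum(
        any(abs(ion - peak) <= tol for peak in peaks)
        for ion in fragment_masses(peptide)
    )


def identify(peaks, candidates, min_matches=3):
    """Step 3: keep candidates with at least min_matches matches, best first."""
    scored = [(match_score(pep, peaks), pep) for pep in candidates]
    return sorted((s for s in scored if s[0] >= min_matches), reverse=True)


if __name__ == "__main__":
    # Synthetic "spectrum": the exact fragment ions of one peptide.
    spectrum = fragment_masses("DLATVYVDVLK")
    print(identify(spectrum, ["DLATVYVDVLK", "MVMARNR", "MARNR"]))
```

Because every candidate sequence is scored the same way, any database (Swiss-Prot, ESTs, a six-frame genome translation) can supply the `candidates` list.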

Peptide Identification

...can provide direct experimental evidence for the amino-acid sequence of functional proteins.

Evidence for:
• Functional protein isoforms
• Translation start and frame
• Proteins with short open-reading-frames

Why is this useful for... genome annotation?

• Evidence for SNPs and alternative splicing stops at transcription

• No genomic or transcript evidence for translation start-site.

• Conservation doesn’t stop at coding bases!

• Statistical gene-finders struggle with micro-exons, translation start-site, and short ORFs.

Why is this useful for... cancer biomarkers?

• Alternative splicing is the norm!
  • Only 20-25K human genes
  • Each gene makes many proteins
  • Some splicing is believed to be silencing
  • Lots of splicing in cancer

• Proteins have clinical implications
  • Statistical biomarker discovery
  • Putative malfunctioning proteins

What can be observed?

• Known coding SNPs

• Novel coding mutations

• Alternative splicing isoforms

• Microexons (non-canonical splice-sites)

• Alternative translation start-sites (codons)

• Alternative translation frames

• “Dark” open-reading-frames

Splice Isoform

• Human Jurkat leukemia cell-line
  • Lipid-raft extraction protocol, targeting T cells
  • von Haller, et al. MCP 2003.

• LIME1 gene:
  • LCK interacting transmembrane adaptor 1

• LCK gene:
  • Leukocyte-specific protein tyrosine kinase
  • Proto-oncogene
  • Chromosomal aberration involving LCK in leukemias.

• Multiple significant peptide identifications

Novel Mutation

• HUPO Plasma Proteome Project
  • Pooled samples from 10 male & 10 female healthy Chinese subjects
  • Plasma/EDTA sample protocol
  • Li, et al. Proteomics 2005. (Lab 29)

• TTR gene
  • Transthyretin (pre-albumin)
  • Defects in TTR are a cause of amyloidosis.
  • Familial amyloidotic polyneuropathy
    • late-onset, dominant inheritance

Translation Start-Site

• Human erythroleukemia K562 cell-line
  • Depth of coverage study
  • Resing et al. Anal. Chem. 2004.

• THOC2 gene:
  • Part of the heteromultimeric THO/TREX complex.

• Initially believed to be a “novel” ORF
  • RefSeq mRNA in Jun 2007, no RefSeq protein
  • TrEMBL entry Feb 2005, no SwissProt entry
  • Genbank mRNA in May 2002 (complete CDS)
  • Plenty of EST support
  • ~100,000 bases upstream of other isoforms

Translation Start-Site

Easily distinguish minor sequence variations

Two B. anthracis Sterne α/β SASP annotations

• RefSeq/Gb: MVMARN... (7441 Da)
• CMR: MARN... (7211 Da)

• Intact proteins differ by 230 Da
  • 7441 Da vs 7211 Da

• N-terminal tryptic peptides:
  • MVMAR (606.3 Da), MVMARNR (876.4 Da), vs
  • MARNR (646.3 Da)
  • Very different MS/MS spectra
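The peptide masses quoted above can be checked with a short calculation. The neutral monoisotopic mass of a peptide is the sum of its residue masses plus one water (18.01056 Da) for the terminal H and OH groups; the residue values below are standard monoisotopic masses, restricted to the five amino acids these peptides use. The helper name `peptide_mass` is illustrative.

```python
# Monoisotopic residue masses (Da); only the residues these peptides need.
RESIDUE = {"A": 71.03711, "M": 131.04049, "N": 114.04293,
           "R": 156.10111, "V": 99.06841}
WATER = 18.01056  # terminal H (N-terminus) + OH (C-terminus)


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


for pep in ("MVMAR", "MVMARNR", "MARNR"):
    print(f"{pep}: {peptide_mass(pep):.1f} Da")  # 606.3, 876.4, 646.3

# The two annotations differ only by the extra N-terminal Met-Val.
print(f"M + V: {RESIDUE['M'] + RESIDUE['V']:.1f} Da")  # 230.1
```

The Met-Val residue pair (about 230 Da) accounts for the gap between the two intact-protein masses as well as between MVMARNR and MARNR.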

Bacterial Gene-Finding

…TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA…

[Figure labels: stop codon, stop codon]

• Find all the open-reading-frames...

...courtesy of Art Delcher

Bacterial Gene-Finding

…TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA…

[Figure labels: stop codon, stop codon]

…ATCTTTTTACCGAGAAATCTATTTAAAGTACTTTTTATAACT…

[Figure labels: shifted stop, stop codon, reverse strand]

• Find all the open-reading-frames...

...but they overlap – which ones are correct?

...courtesy of Art Delcher
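Enumerating the candidate open-reading-frames is mechanical; deciding among overlapping ones is the hard part. The sketch below assumes the simplest definition, an ORF as any stop-free run of codons between two in-frame stop codons (TAA, TAG, TGA), and scans all six reading frames. The complementary strand on the slide is written 3'→5' under the forward strand; here it is read 5'→3' as the reverse complement. The function names and the `min_codons` cutoff are illustrative, and no coding-sequence scoring is attempted.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq, min_codons=1):
    """List (strand, frame, start, end) for stop-to-stop ORFs in all six frames.

    start is the first base after a stop codon and end is one past the next
    in-frame stop (0-based, end exclusive, on that strand's sequence).
    Runs with fewer than min_codons sense codons are skipped.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            prev_end = None  # end of the previous in-frame stop codon
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if prev_end is not None and (i - prev_end) // 3 >= min_codons:
                        orfs.append((strand, frame, prev_end, i + 3))
                    prev_end = i + 3
    return orfs


if __name__ == "__main__":
    dna = "TAGAAAAATGGCTCTTTAGATAAATTTCATGAAAAATATTGA"  # fragment from the slide
    for orf in stop_to_stop_orfs(dna):
        print(orf)
```

On the forward strand of the slide's fragment this finds the long frame-0 region bounded by the leading TAG and trailing TGA, plus a short frame-2 run; overlapping candidates like these are exactly what a gene-finder must then choose between.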

Coding-Sequence “Score”

...courtesy of Art Delcher

Glimmer3 Performance

| Organism | Length | GC% | # Genes | Glimmer3 matches | Matches (%) | Correct starts | Correct starts (%) | Extra predictions |
|---|---|---|---|---|---|---|---|---|
| Archaeoglobus fulgidus | 2.18 Mb | 48.6 | 1165 | 1162 | 99.7 | 875 | 75.1 | 1305 |
| Bacillus anthracis | 5.23 Mb | 35.4 | 3132 | 3129 | 99.9 | 2768 | 88.4 | 2340 |
| Bacillus subtilis | 4.21 Mb | 43.5 | 1576 | 1567 | 99.4 | 1429 | 90.7 | 2879 |
| Campylobacter jejuni | 1.78 Mb | 30.3 | 1233 | 1233 | 100.0 | 1149 | 93.2 | 668 |
| Carboxydothermus hydrogenoformans | 2.40 Mb | 42.0 | 1753 | 1752 | 99.9 | 1590 | 90.7 | 865 |
| Caulobacter crescentus | 4.02 Mb | 67.2 | 2192 | 2187 | 99.8 | 1552 | 70.8 | 1559 |
| Chlorobium tepidum | 2.15 Mb | 56.5 | 1292 | 1289 | 99.8 | 949 | 73.5 | 765 |
| Clostridium perfringens | 3.03 Mb | 28.6 | 1504 | 1503 | 99.9 | 1385 | 92.1 | 1178 |
| Colwellia psychrerythraea | 5.37 Mb | 38.0 | 3063 | 3060 | 99.9 | 2663 | 86.9 | 1714 |
| Dehalococcoides ethenogenes | 1.47 Mb | 48.9 | 1069 | 1059 | 99.1 | 929 | 86.9 | 483 |
| Escherichia coli | 4.64 Mb | 50.8 | 3603 | 3553 | 98.6 | 3150 | 87.4 | 913 |
| Geobacter sulfurreducens | 3.81 Mb | 60.9 | 2351 | 2340 | 99.5 | 1974 | 84.0 | 1091 |
| Haemophilus influenzae | 1.83 Mb | 38.1 | 1170 | 1170 | 100.0 | 1054 | 90.1 | 639 |
| Helicobacter pylori | 1.67 Mb | 38.9 | 915 | 914 | 99.9 | 805 | 88.0 | 765 |
| Listeria monocytogenes | 2.91 Mb | 38.0 | 1966 | 1965 | 99.9 | 1797 | 91.4 | 845 |
| Methylococcus capsulatus | 3.30 Mb | 63.6 | 2015 | 2005 | 99.5 | 1542 | 76.5 | 1231 |
| Mycobacterium tuberculosis | 4.40 Mb | 65.6 | 2217 | 2205 | 99.5 | 1493 | 67.3 | 2104 |
| Neisseria meningitidis | 2.27 Mb | 51.5 | 1232 | 1217 | 98.8 | 1042 | 84.6 | 1329 |
| Porphyromonas gingivalis | 2.34 Mb | 48.3 | 1200 | 1198 | 99.8 | 933 | 77.8 | 887 |
| Pseudomonas fluorescens | 7.07 Mb | 63.3 | 4535 | 4503 | 99.3 | 3577 | 78.9 | 1871 |
| Pseudomonas putida | 6.18 Mb | 61.5 | 3633 | 3596 | 99.0 | 2825 | 77.8 | 1916 |
| Ralstonia solanacearum | 3.72 Mb | 67.0 | 2512 | 2487 | 99.0 | 2061 | 82.0 | 1077 |
| Staphylococcus epidermidis | 2.62 Mb | 32.1 | 1650 | 1649 | 99.9 | 1511 | 91.6 | 771 |
| Streptococcus agalactiae | 2.16 Mb | 35.6 | 1441 | 1438 | 99.8 | 1336 | 92.7 | 683 |
| Streptococcus pneumoniae | 2.16 Mb | 39.7 | 1359 | 1355 | 99.7 | 1214 | 89.3 | 780 |
| Thermotoga maritima | 1.86 Mb | 46.2 | 1092 | 1090 | 99.8 | 892 | 81.7 | 804 |
| Treponema denticola | 2.84 Mb | 37.9 | 1463 | 1463 | 100.0 | 1332 | 91.0 | 1210 |
| Treponema pallidum | 1.14 Mb | 52.8 | 575 | 572 | 99.5 | 425 | 73.9 | 557 |
| Ureaplasma parvum | 0.75 Mb | 25.5 | 327 | 327 | 100.0 | 300 | 91.7 | 293 |
| Wolbachia endosymbiont | 1.08 Mb | 34.2 | 628 | 627 | 99.8 | 528 | 84.1 | 537 |
| Averages | | | | | 99.6 | | 84.3 | |

• Glimmer3 trained & compared to RefSeq genes with annotated function

• Correct STOP: 99.6%

• Correct START: 84.3%

• “Not all the genomes necessarily have carefully/accurately annotated start sites, so the results for number of correct starts may be suspect.”
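The Averages row of the Glimmer3 table can be recomputed from the per-genome counts. Assuming each percentage is relative to that genome's "# Genes" count (this holds for the rows checked, e.g. E. coli: 3553 / 3603 ≈ 98.6%), the 99.6% and 84.3% figures are reproduced by an unweighted mean of the per-genome percentages, so each genome counts equally regardless of size. The counts below are transcribed from the table; the variable names are illustrative.

```python
# (organism, RefSeq genes, Glimmer3 matching stops, correct starts), from the table.
ROWS = [
    ("Archaeoglobus fulgidus", 1165, 1162, 875),
    ("Bacillus anthracis", 3132, 3129, 2768),
    ("Bacillus subtilis", 1576, 1567, 1429),
    ("Campylobacter jejuni", 1233, 1233, 1149),
    ("Carboxydothermus hydrogenoformans", 1753, 1752, 1590),
    ("Caulobacter crescentus", 2192, 2187, 1552),
    ("Chlorobium tepidum", 1292, 1289, 949),
    ("Clostridium perfringens", 1504, 1503, 1385),
    ("Colwellia psychrerythraea", 3063, 3060, 2663),
    ("Dehalococcoides ethenogenes", 1069, 1059, 929),
    ("Escherichia coli", 3603, 3553, 3150),
    ("Geobacter sulfurreducens", 2351, 2340, 1974),
    ("Haemophilus influenzae", 1170, 1170, 1054),
    ("Helicobacter pylori", 915, 914, 805),
    ("Listeria monocytogenes", 1966, 1965, 1797),
    ("Methylococcus capsulatus", 2015, 2005, 1542),
    ("Mycobacterium tuberculosis", 2217, 2205, 1493),
    ("Neisseria meningitidis", 1232, 1217, 1042),
    ("Porphyromonas gingivalis", 1200, 1198, 933),
    ("Pseudomonas fluorescens", 4535, 4503, 3577),
    ("Pseudomonas putida", 3633, 3596, 2825),
    ("Ralstonia solanacearum", 2512, 2487, 2061),
    ("Staphylococcus epidermidis", 1650, 1649, 1511),
    ("Streptococcus agalactiae", 1441, 1438, 1336),
    ("Streptococcus pneumoniae", 1359, 1355, 1214),
    ("Thermotoga maritima", 1092, 1090, 892),
    ("Treponema denticola", 1463, 1463, 1332),
    ("Treponema pallidum", 575, 572, 425),
    ("Ureaplasma parvum", 327, 327, 300),
    ("Wolbachia endosymbiont", 628, 627, 528),
]


def mean(values):
    return sum(values) / len(values)


# Per-genome percentages, each relative to that genome's RefSeq gene count.
stop_pct = [100.0 * stops / genes for _, genes, stops, _ in ROWS]
start_pct = [100.0 * starts / genes for _, genes, _, starts in ROWS]

print(f"Correct STOP:  {mean(stop_pct):.1f}%")
print(f"Correct START: {mean(start_pct):.1f}%")
```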

N-terminal peptides

• (Protein) N-terminal peptides establish the start-site of known & unexpected ORFs

Use:
• Directly to annotate genomes
• Evaluate and improve algorithms
• Map cross-species

N-terminal peptide workflows

• Typical proteomics workflows sample peptides from the proteome “randomly”

• Caulobacter crescentus (70%)
  • 3733 proteins (RefSeq genome annotation)
  • 66K tryptic peptides (600 Da to 3000 Da)
  • 2085 N-terminal tryptic peptides (3%)
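The counting behind the Caulobacter numbers can be sketched as an in-silico digest: cleave each protein after K or R (but not before P), keep peptides whose monoisotopic mass falls in the 600 to 3000 Da window, and note which of them is the protein's first peptide. The sketch assumes no missed cleavages and an intact initial Met, and it runs on toy sequences rather than the RefSeq proteome; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residues plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def count_peptides(proteins, lo=600.0, hi=3000.0):
    """Return (all in-window tryptic peptides, protein N-terminal ones)."""
    total = n_terminal = 0
    for protein in proteins:
        for index, pep in enumerate(tryptic_digest(protein)):
            if lo <= peptide_mass(pep) <= hi:
                total += 1
                if index == 0:  # first peptide carries the protein N-terminus
                    n_terminal += 1
    return total, n_terminal


if __name__ == "__main__":
    proteins = ["MVMARNRK", "MK" + "G" * 10 + "K"]
    print(count_peptides(proteins))  # (2, 1)
```

Only the first peptide of each protein can be N-terminal, so on a real proteome the N-terminal share stays small, in line with the roughly 3% quoted above.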

N-terminal peptide workflow

• Protect protein N-terminus

• Digest to peptides

• Chemically modify free peptide N-terminus

• Use chem. mod. to capture unwanted peptides

Nat Biotech, Vol. 21, pp. 566-569, 2003.

Increasing N-terminal peptide coverage

• Multiple (digest) enzymes:
  • trypsin-R: 60% (80%)
  • acid + lys-C + trypsin: 85% (94%)
• Repeated LC-MS/MS
• Precursor exclusion / inclusion lists
• MALDI / ESI
• Protein separation and/or orthogonal fractionation

Anal Chem, Vol. 76, pp. 4193-4201, 2004.

Proteomics Informatics

• Search spectra against:
  • Entire bacterial genome;
  • All Met-initiated peptides; or
  • Statistically likely Met-initiated peptides.

• Easily consider initial Met loss PTM, too

• Off-the-shelf MS/MS search engines (Mascot / X!Tandem / OMSSA)
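Generating the "all Met-initiated peptides" search space, with and without the initial Met, can be sketched as follows. For every Met in an ORF's amino-acid translation, it emits the tryptic peptide that would form the protein N-terminus if translation started there. Two simplifications are assumptions of this sketch: only Met residues are treated as starts (alternative start codons such as GTG are also read as an initiator Met, so a genome-level version would translate from those codons too), and the Met-loss form is emitted unconditionally, whereas in cells removal by methionine aminopeptidase depends mainly on the residue after the Met. The function names are hypothetical.

```python
def tryptic_prefix(seq):
    """Return seq up to and including its first K/R not followed by P."""
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            return seq[:i + 1]
    return seq  # no cleavage site: the whole remainder


def met_initiated_peptides(orf_protein, include_met_loss=True):
    """(offset, peptide) for each Met-initiated N-terminal tryptic peptide."""
    candidates = []
    for offset, aa in enumerate(orf_protein):
        if aa != "M":
            continue
        peptide = tryptic_prefix(orf_protein[offset:])
        candidates.append((offset, peptide))
        if include_met_loss and len(peptide) > 1:
            candidates.append((offset, peptide[1:]))  # initial Met removed
    return candidates


if __name__ == "__main__":
    for cand in met_initiated_peptides("KMVMARNRQMARNRK"):
        print(cand)
```

In the toy sequence, the Met at offset 1 yields MVMAR (and VMAR) while the Met at offset 3 yields MAR, the same kind of competing start sites as in the B. anthracis example; the candidate list can be written as a FASTA file for an off-the-shelf search engine.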

Other Practical Issues

• Suitable for commonly available instrumentation
  • Only the sample prep. is (somewhat) novel.

• Need living organism
  • Stage of life-cycle?

• Bang for buck?
  • N-terminal peptides / $$$$

• In discussions with JCVI (ex TIGR)
  • Possible pilot project?

Other Research Projects

• Improving peptide identification by MS/MS
  • Spectral matching using HMMs
  • Combining search engine results
  • Spectral matching for detection and quantitation

• Microorganism identification using MS
  • Live public web-site and database

• (Inexact) uniqueness guarantees
  • Primer/Probe oligo design
  • Pathogen detection (DNA & Peptide)
  • Significant false-positive peptide identifications

Spectral Matching

• Detection vs. identification
  • Increased sensitivity
  • No novel peptides

• NIST GC/MS Spectral Library
  • Identifies small molecules
  • 100,000’s of (consensus) spectra
  • Bundled/Sold with many instruments
  • “Dot-product” spectral comparison
  • Current project: Peptide MS/MS
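A "dot-product" comparison treats each spectrum as a vector of intensities over m/z bins and scores two spectra by the cosine of the angle between them: 1 for identical peak patterns (up to overall scale), 0 for no shared peaks. The sketch below is that plain normalized dot product; the 1 m/z bin width is an assumption, and production library-search tools commonly rescale intensities and weight by m/z before comparing, which is omitted here. The function names are illustrative.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins; peaks is [(mz, intensity)]."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def dot_product_similarity(a, b, bin_width=1.0):
    """Normalized dot product (cosine) of two spectra; 0.0 if either is empty."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(intensity * vb.get(key, 0.0) for key, intensity in va.items())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    library = [(175.1, 30.0), (288.2, 55.0), (401.3, 100.0)]
    query = [(175.2, 15.0), (288.1, 30.0), (401.2, 45.0), (520.0, 5.0)]
    print(round(dot_product_similarity(library, query), 3))
```

Because only peaks in the library spectrum contribute, a match can be detected without interpreting the spectrum as a sequence, which is the detection-versus-identification trade-off noted above.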

Peptide DLATVYVDVLK

Hidden Markov Models for Spectral Matching

• Capture statistical variation and consensus in peak intensity

• Capture semantics of peaks
  • Extrapolate model to other peptides

• Good specificity with superior sensitivity for peptide detection
  • Assign 1000’s of additional spectra (w/ p-value < 10⁻⁵)

www.RMIDb.org


Statistics:
• 16.7 × 10⁶ (6.4 × 10⁶) protein sequences
• ~40,000 organisms, ~19,700 species
• 557 (415) complete genomes

Sources:
• TIGR’s CMR, SwissProt, TrEMBL, Genbank Proteins, RefSeq Proteins & Genomes
• Inclusive Glimmer3 predictions on Genomes
• Pfam and GO assignments using BOINC grid

www.RMIDb.org

Accessed from all over the world...

Uniqueness guarantees

• 20-mer oligo signatures for B. anthracis
  • In all available strains as exact match
  • No (inexact) match to other Bacillus species

| Specificity | # Signatures | % of genome |
|---|---|---|
| Exact | 2035086 | 39.4% |
| k = 1 | 866787 | 16.8% |
| k = 2 | 75795 | 1.5% |
| k = 3 | 174 | 0.003% |

Uniqueness guarantees

• Human genome primer design problem

• “4-unique” DNA 20-mers:
  • Edit-distance ≥ 5 to any non-specific hybridization site
  • No such valid loci on Chr. 22!
  • Currently analyzing entire genome

• “3-unique” DNA 20-mers:
  • Initial experiments suggest ~0.01% valid
  • Approx. 1 valid oligo every 10,000 bases
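The k-uniqueness test amounts to approximate string matching: an oligo is k-unique if every non-specific site is at edit distance greater than k (so "4-unique" means distance ≥ 5). The sketch uses Sellers' dynamic-programming variant of edit distance, in which the match may start and end anywhere in the background sequence. It costs O(m·n) per background sequence, so a genome-scale analysis would need an index-based search, and a real check would scan both strands; the function names and toy sequences are hypothetical.

```python
def min_edit_distance_to_substring(pattern, text):
    """Smallest edit distance between pattern and any substring of text.

    Row 0 is all zeros, so an alignment may begin at any position of text;
    taking the minimum of the last row lets it end anywhere as well.
    """
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        curr = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match / substitution
                          prev[j] + 1,         # pattern base deleted
                          curr[j - 1] + 1)     # extra base in text
        prev = curr
    return min(prev)


def is_k_unique(oligo, background, k):
    """True if every background sequence is at edit distance > k from oligo."""
    return all(min_edit_distance_to_substring(oligo, s) > k for s in background)


if __name__ == "__main__":
    probe = "ACGTACGTTGCA"
    others = ["TTTTACGTACCTTGCATTTT", "GGGGGGGGGGGGGGGG"]
    for k in range(3):
        print(k, is_k_unique(probe, others, k))
```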

Future Research Plans

• Cancer biomarkers:

• Optimize proteomics workflow for protein sequence coverage

• Improve informatics infrastructure to make interpretation easier

• Identify splice variants in cancer cell-lines (MCF-7) and clinical brain tumor samples

Future Research Plans

• Genome Annotation

• Collect evidence for functional alternative splicing in public datasets into dbPEP.

• Conduct pilot project for bacterial genome annotation with JCVI.

• Improve informatics infrastructure to make interpretation easier.

Future Research Plans

• Peptide Identification

• Expand library of HMM models for high-confidence spectral matching

• Spectral matching for biomarkers and quantitation (with Calibrant).

• Specificity metric for peptides identified using MS/MS

Future Research Plans

• Microorganism identification by mass spectrometry

• Specificity of tandem mass spectra

• Revamp RMIDb prototype

• Incorporate spectral matching, top-down.

Future Research Plans

• Oligonucleotide Design

• Uniqueness oracle for inexact match in human

• Integration with Primer3

• Tiling, multiplexing, pooling, & tag arrays

Acknowledgements

• Catherine Fenselau, Steve Swatkoski
  • UMCP Biochemistry

• Chau-Wen Tseng, Xue Wu
  • UMCP Computer Science

• Cheng Lee, Brian Balgley
  • Calibrant Biosystems

• PeptideAtlas, HUPO PPP, X!Tandem

• Funding: NIH/NCI, USDA/ARS