MS2012 Lecture 6 - Global and discovery proteomics … and Discovery Proteomics Christine A....

11/9/2012

Global and Discovery Proteomics

Christine A. Jelinek, Ph.D.

Johns Hopkins University School of Medicine

Department of Pharmacology and Molecular Sciences

Middle Atlantic Mass Spectrometry Laboratory

Global and Discovery Proteomics Lecture Agenda

Genomics vs. Proteomics

Discovery Proteomics: Basic Mass Spectrometry Techniques

Discovery Proteomics: Basic Bioinformatic Techniques

Database Searching:

Cloud Computing

Mascot

Sequest

Combining Algorithms

Human Genome Project
Sequencing the human genome has transformed current biomedical research

Completion of genome sequencing inspired a corresponding approach to identify and characterize the proteins comprising the human proteome

Initiated: October 1990
Working Draft: 2000
Complete: 2003

Inferences about biological systems

Ensembl Current Totals
Coding genes: 20,476
Non-coding genes: 22,170
Pseudogenes: 13,322

http://www.ncbi.nlm.nih.gov/genome

Protein-Coding Genes

Gregory, TR. Nature Reviews Genetics. 6, 699-708. doi:10.1038/nrg1674

Proteomics

A proteome consists of all proteins present in a sample (cell, tissue, body fluid, etc.) at a defined point in time and under defined conditions

Proteomics is the large-scale study of the expression, localization, function, and interaction of proteins expressed by an organism’s genome

Proteomics Using Mass Spectrometry for Proteomics Experiments

“Over the last two decades, mass spectrometry-based technologies have undergone rapid advances and a high degree of innovation to fulfill the expectations of the proteomics and life science communities”

Surinova S. et al. J. Prot. Res. 2011 10:5-16

Leading mass spectrometry-based proteomics laboratories have demonstrated that protein products of up to ~10,000 of the ~20,000 protein-coding human genes can be identified and quantified in a single experimental system

Schwanhäusser B et al. Nature 2011 473:337-342

Beck M et al. Mol. Syst. Biol. 2011 7:549

Nagaraj N et al. Mol. Syst. Biol. 2011 7:548

Proteomics Using Mass Spectrometry for Proteomics Experiments

http://www.proteome.ru/en/avogadro

The “-omics Iceberg”

Human Plasma Proteome A case of the “-omics Iceberg”

Anderson N L , Anderson N G Mol Cell Proteomics 2002;1:845-867

Aebersold R. Nature Methods. 6: 411-412. doi:10.1038/nmeth.f.255

Challenges in Proteomics

Shotgun Proteomics

Sample Preparation: Proteins → Peptides

HPLC-MS/MS Analysis

MS Data

MS/MS Data

Bioinformatics

Statistical Analysis

Aebersold R and Mann M. Nature 2003 422:198-207

Bottom-Up Proteomics General procedure using LC-MS/MS for proteomic profiling

Bottom-Up Proteomics Common Sample Preparative Steps

Commonly used Enzymes Bottom-up Mass Spectrometry

[Figure: Fourier transformed MS scan with peaks labeled A, B, and C, followed by fragmentation and the corresponding tandem MS/MS scans]

Bottom-Up Proteomics Data-Dependent Tandem Mass Spectrometry
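The figure's three labeled peaks (A, B, C) suggest a "top N" style of data-dependent acquisition, where the most intense ions in a survey scan are picked for fragmentation. The sketch below illustrates that idea only. The function name, the peak values, and the simple exclusion list are assumptions for illustration and are not taken from the slides or from any instrument software.

```python
def select_precursors(ms1_peaks, top_n=3, excluded=(), tolerance=0.01):
    """Pick the top_n most intense MS1 peaks that are not on the exclusion list.

    ms1_peaks: list of (m/z, intensity) tuples from one survey scan.
    excluded:  m/z values already fragmented (a simple form of dynamic exclusion).
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if all(abs(mz - ex) > tolerance for ex in excluded)
    ]
    # Most intense first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan: (m/z, intensity)
survey = [(445.12, 1.0e4), (512.30, 8.5e5), (623.84, 3.2e5),
          (701.40, 9.1e4), (812.45, 6.0e5)]

first_cycle = select_precursors(survey)
second_cycle = select_precursors(survey, excluded=first_cycle)
print(first_cycle)
print(second_cycle)
```

Because selection is driven by intensity, low-abundance peptides are reached only after the intense ones have been excluded, which is one way to picture the sampling limitation discussed later in the lecture.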

Bottom-Up Proteomics Peptide Fragmentation

Biomed. Mass Spectrom. 11 (11): 601. doi:10.1002/bms.1200111109

Bottom-Up Proteomics Peptide Fragmentation
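As a rough illustration of peptide fragmentation, the sketch below computes singly charged b- and y-ion m/z values for a peptide from standard monoisotopic residue masses. The function name and the example peptide (ILGFR, one of the tryptic peptides of the sequence used later in this lecture) are choices made here for illustration. Modifications, higher charge states, and other ion series are not modeled.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # charge carrier


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is b_i (first i residues); y_ions[i-1] is y_i (last i residues).
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("ILGFR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

A useful sanity check is complementarity: for a peptide of n residues, b_i + y_(n-i) equals the neutral peptide mass plus two protons.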

Bottom-Up Proteomics Identifying Post-Translational Modifications

Technical Limitations: LOD Data Dependent Mass Spectrometry

http://www.proteome.ru/en/avogadro

Technical Limitations: LOD Data Dependent Mass Spectrometry

Ghaemmaghami S. et al. Nature. 2003. 425: 737-741.

Smith R. et al. Advances in Protein Chemistry. 2003. 65: 85–131.

Technical Limitations: Dynamic Range Data Dependent Mass Spectrometry

Michalski A., Cox J., and Mann M. J. Proteome Res. 10, 1785–1793.

Technical Limitations: Sampling Data Dependent Mass Spectrometry

Bottom-Up Proteomics Protein and Peptide Identification

SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV

SEMHIKHYTTKILGFR

EEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAK

YLLDESSTYKLMHDDSV

[Figure: experimental MS/MS spectrum, BSA_500fmol_02_120601 #1854, RT: 21.52, AV: 1, NL: 1.91E3 (ITMS + c NSI d Full ms2, scan range 115.00-925.00). x-axis: m/z (200 to 900); y-axis: Relative Abundance (0 to 100). Labeled peaks: 132.93, 213.13, 312.35, 391.80, 400.63, 440.22, 448.20, 515.36, 585.04, 602.48, 683.26, 701.38, 782.43, 800.43, 814.37]

Protein → Tryptic Peptides → Experimental Mass Spectrum

Protein Sequence → Theoretical Tryptic Peptides → Theoretical Mass Spectrum
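The step from protein sequence to theoretical tryptic peptides can be sketched in a few lines. The example below applies the usual trypsin rule (cleave C-terminal to K or R, but not before P) to the sequence shown on this slide and computes monoisotopic peptide masses. Function names are illustrative. Missed cleavages and modifications are not modeled.

```python
import re

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


protein = ("SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADK"
           "LLEYEEKILLFNSAKYLLDESSTYKLMHDDSV")

peptides = tryptic_digest(protein)
for pep in peptides:
    print(f"{pep:<12} {peptide_mass(pep):10.4f}")
```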

Bioinformatics Resources Protein and Peptide Identification

Database Searching

Comparing raw MS/MS data with molecular sequence databases to identify constituent proteins

Database Search Algorithms Protein and Peptide Identification

Search engine inputs:
• Fragment ion mass & intensity values
• Peptide molecular masses
• Protein/DNA sequence databases

Search engine output: protein identification and characterization
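One early step in most database searches is narrowing the database to peptides whose mass agrees with the measured precursor. The sketch below shows that filter under stated assumptions: the candidate masses are monoisotopic values calculated from standard residue masses, the observed m/z is a made-up doubly charged ion, and the 10 ppm default tolerance is arbitrary. It is not the filtering logic of any particular search engine.

```python
PROTON = 1.00728


def neutral_mass(mz, charge):
    """Convert an observed m/z and charge state to a neutral mass."""
    return (mz - PROTON) * charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_within(observed_mass, peptide_masses, tol_ppm=10.0):
    """Return peptides whose theoretical mass lies within tol_ppm of the observation."""
    return [
        peptide
        for peptide, mass in peptide_masses.items()
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    ]


# Theoretical monoisotopic masses (Da) for three tryptic peptides.
theoretical = {"ILGFR": 604.36966, "HYTTK": 648.32312, "QWDDSK": 777.32932}

observed = neutral_mass(303.19328, charge=2)   # hypothetical 2+ precursor
print(round(observed, 4), candidates_within(observed, theoretical))
```

Only the surviving candidates then need their fragment ions compared with the MS/MS spectrum, which keeps the search tractable for large databases.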

Public Proteomic Databases

MSDB: Comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London

NCBInr: Comprehensive, non-identical protein database maintained by NCBI. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB

SwissProt: High quality, curated protein database

dbEST: Division of GenBank containing "single-pass" cDNA sequences, or Expressed Sequence Tags

ThermoFisher

FASTA file

Public Proteomic Databases Uniprot

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
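A FASTA file is simple enough to read without special tooling: each record starts with a ">" header line, followed by one or more sequence lines. The minimal parser below is a sketch using only the standard library, and the function name is illustrative. It is run on the cytochrome b record shown above.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text.

    The header is stored without the leading '>'; wrapped sequence
    lines are joined into a single string.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


fasta_text = """\
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
"""

records = parse_fasta(fasta_text)
for header, sequence in records.items():
    print(header)
    print(len(sequence), "residues")
```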

Mascot Search Algorithm

MASCOT Software Protein and Peptide Identification

• Mascot combines 3 types of searches:
  - Peptide Mass Fingerprinting
  - MS/MS Ions
  - Sequence Query

• Searches against any FASTA database
• Unique, true probability-based scoring
• Accepts mass spectrometry data from all leading instrument manufacturers
• High throughput format for single and multi-processor systems and clusters
• Automates search submission without custom programming
• Summarizes search results in web browser format
• Licensed by more than a thousand academic and commercial laboratories

Peptide Mass Fingerprinting (PMF) Peptide Identification using MASCOT
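The idea behind peptide mass fingerprinting can be sketched as follows: digest each database protein in silico, then count how many observed peptide masses match its theoretical peptides. This is only an illustration of the principle. Mascot uses probability-based scoring rather than a raw match count, and the second protein, the observed mass list, and the 0.01 Da tolerance are hypothetical.

```python
import re

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """Tryptic digest: cleave after K/R, not before P, no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mass(peptide):
    """Neutral monoisotopic peptide mass."""
    return sum(MONO[aa] for aa in peptide) + WATER


def pmf_score(observed_masses, protein, tol=0.01):
    """Count observed masses that match any theoretical peptide within tol Da."""
    theoretical = [mass(p) for p in digest(protein)]
    return sum(
        1 for m in observed_masses
        if any(abs(m - t) <= tol for t in theoretical)
    )


proteins = {
    # Sequence from the identification slide earlier in this lecture.
    "slide_protein": ("SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADK"
                      "LLEYEEKILLFNSAKYLLDESSTYKLMHDDSV"),
    # Made-up sequence, for contrast only.
    "hypothetical_protein": "MKTAYIAKQR",
}
observed = [604.3697, 648.3231, 999.9000]   # hypothetical measured masses (Da)

scores = {name: pmf_score(observed, seq) for name, seq in proteins.items()}
best = max(scores, key=scores.get)
print(scores, "best match:", best)
```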

MS/MS Ion Searching Peptide Identification using MASCOT
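For MS/MS ion searching, the comparison moves from intact peptide masses to fragment ions. A minimal sketch, assuming singly charged b and y ions and a simple shared-peak count, is shown below. Mascot's actual score is probability based, so the count here is only a stand-in. The observed peak list and the 0.5 Da tolerance are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_fragments(peptide):
    """Singly charged b- and y-ion m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)            # b_i
        ions.append(sum(masses[-i:]) + WATER + PROTON)   # y_i
    return ions


def shared_peak_count(observed_mz, peptide, tol=0.5):
    """Number of observed peaks explained by a b or y ion of the peptide."""
    fragments = theoretical_fragments(peptide)
    return sum(
        1 for mz in observed_mz
        if any(abs(mz - f) <= tol for f in fragments)
    )


observed_mz = [175.12, 227.18, 322.19, 400.00]   # hypothetical MS/MS peaks
for candidate in ("ILGFR", "LMHDDSV"):
    print(candidate, shared_peak_count(observed_mz, candidate))
```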

Protein Identification using MASCOT Using MS and MS/MS spectra

Protein identified using both MS and MS/MS spectra

Protein Identification using MASCOT Combining MS and MS/MS spectra

Protein Identification using MASCOT Combining MS and MS/MS spectra

Protein Identification using MASCOT Interpreting Results

ThermoFisher

Protein Identification using MASCOT Common Settings for Search

ThermoFisher

Sequest Search Algorithm

Data-Dependent Mass Spectral Scans: MS/MS depends on the MS

FASTA Protein Database

>gi|84670|pir||B27257 coagulogen II precursor - horseshoe crab (Tachypleus tridentatus) gi|10809 (X04192) coagulogen type II [Tachypleus tridentatus] gi|217395|gnl|PID|d1000491 (D00077) coagulogen type 2 [Tachypleus tridentatus] gi|356167|prf||1208319A coagulogen [Tachypleus sp.] [MASS=21826]
MEKKLFGIALLLTTVASVLAADTNAPICLCDEPGVLGRTQIVTTEIKDKIEKAVEAVAQESGVSGRGFSIFSHHPVFRECGKYECRTVRPEHSRCYNFPPFIHFKSECPVSTRDCEPVFGYTVAGEFRVIVQAPRAGFRQCVWQHKCRFGSNSCGYNGRCTQQRSVVRLVTYNLEKDGFLCESFRTCCGCPCRSF
>gi|585398|sp|P28175|LFC_TACTR LIMULUS

Correlation Analysis

Protein / Peptide / Modification Identification

Protein Identification using Sequest MS/MS spectra-based Identification

Protein Identification using Sequest Sequest Search Workflow
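The workflow above names a "Correlation Analysis" step without detail. Sequest is generally described as scoring candidates by cross-correlating the experimental spectrum with a theoretical one, so the sketch below shows a simplified score of that style: the dot product at zero offset minus the average dot product over nearby offsets. The bin width, the offset window of 75 bins, the example peaks, and all function names are assumptions. Real implementations also preprocess and normalize intensities, which is omitted here.

```python
def bin_spectrum(peaks, max_mz=2000, bin_width=1.0):
    """Convert (m/z, intensity) peaks to a fixed-length intensity vector."""
    bins = [0.0] * int(max_mz / bin_width)
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < len(bins):
            bins[index] = max(bins[index], intensity)
    return bins


def dot_at_offset(a, b, offset):
    """Sum of a[i] * b[i + offset] over all valid i."""
    total = 0.0
    for i, value in enumerate(a):
        j = i + offset
        if value and 0 <= j < len(b):
            total += value * b[j]
    return total


def xcorr(experimental, theoretical, max_offset=75):
    """Zero-offset correlation minus the mean correlation at nonzero offsets."""
    at_zero = dot_at_offset(theoretical, experimental, 0)
    background = sum(
        dot_at_offset(theoretical, experimental, t)
        for t in range(-max_offset, max_offset + 1) if t != 0
    ) / (2 * max_offset)
    return at_zero - background


# Hypothetical binned spectra with unit intensities.
experimental = bin_spectrum([(175.1, 1.0), (227.2, 1.0), (322.2, 1.0)])
matching = bin_spectrum([(175.1, 1.0), (227.2, 1.0), (322.2, 1.0)])
mismatching = bin_spectrum([(114.1, 1.0), (245.1, 1.0), (382.2, 1.0)])

score_match = xcorr(experimental, matching)
score_mismatch = xcorr(experimental, mismatching)
print(round(score_match, 4), round(score_mismatch, 4))
```

Subtracting the offset background penalizes candidates that overlap the spectrum only by chance, which a plain dot product would not do.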

Extraction Parameters Search Parameters Modifications

Protein Identification using Sequest Sequest Search Parameters

Comparing Algorithms Mascot vs. Sequest

Comparing Algorithms Mascot vs. Sequest Data file format

ThermoFisher

Proteome Discoverer Software

Combining Algorithms Mascot and Sequest using Proteome Discoverer

ThermoFisher

Proteome Discoverer Interpreting Results

ThermoFisher

Cloud Computing Strategies

Cloud Computing

OLD: 40:00:00

NEW: 01:00:00

Combining Bioinformatic Tools Integrated Analysis Inc. Pass Software

Custom

Convert MS (.mzXML, .mgf, .mzML)

X! Tandem

OMSSA

FASTA Merger

Create Decoy FASTA

Peptide Prophet

Isobaric Labeled Quantitation

Refine & Merge Results

Protein Prophet

Custom Algorithms

Combining Bioinformatic Tools Customizing Bioinformatic Workflows

High Resolution MS – OMSSA and X!Tandem

Convert MS

Merge FASTA

Create Reverse Sequence
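The "Create Reverse Sequence" and "Merge FASTA" steps correspond to a common target-decoy strategy: reversed protein sequences are appended to the database, and hits to them are used to estimate the false discovery rate. The sketch below assumes a "REV_" header prefix, hypothetical protein names and scores, and the simple decoys/targets estimate. None of these details come from the slides.

```python
def make_decoys(records, prefix="REV_"):
    """Return reversed-sequence decoys keyed by prefixed protein name."""
    return {prefix + name: seq[::-1] for name, seq in records.items()}


def merge_target_decoy(records, prefix="REV_"):
    """Concatenate target and decoy entries into one search database."""
    merged = dict(records)
    merged.update(make_decoys(records, prefix))
    return merged


def estimate_fdr(psms, threshold, prefix="REV_"):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold.

    psms: list of (protein_name, score) tuples.
    """
    accepted = [name for name, score in psms if score >= threshold]
    decoys = sum(1 for name in accepted if name.startswith(prefix))
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


targets = {"P1": "MKTAYIAK", "P2": "ILGFR"}           # hypothetical entries
database = merge_target_decoy(targets)

# Hypothetical peptide-spectrum matches: (protein, score)
psms = [("P1", 50), ("P2", 40), ("REV_P3", 35), ("P4", 20)]
print(sorted(database))
print(estimate_fdr(psms, threshold=30))
```

Raising the score threshold until the estimated FDR falls below a chosen level (1% is a frequent choice) gives a consistent way to compare or combine results from different search engines.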