MS2012 Lecture 6 - Global and discovery proteomics … and Discovery Proteomics Christine A....
11/9/2012
Global and Discovery Proteomics
Christine A. Jelinek, Ph.D.
Johns Hopkins University School of Medicine
Department of Pharmacology and Molecular Sciences
Middle Atlantic Mass Spectrometry Laboratory
Global and Discovery Proteomics Lecture Agenda
Genomics vs. Proteomics
Discovery Proteomics: Basic Mass Spectrometry Techniques
Discovery Proteomics: Basic Bioinformatic Techniques
Database Searching:
Cloud Computing
Mascot
Sequest
Combining Algorithms
Human Genome Project
Sequencing the human genome has transformed current biomedical research
Completion of genome sequencing inspired a corresponding approach to identify and characterize the proteins comprising the human proteome
Initiated: October 1990
Working Draft: 2000
Complete: 2003
Inferences about biological systems
Ensembl Current Totals
Coding genes: 20,476
Non-coding genes: 22,170
Pseudogenes: 13,322
http://www.ncbi.nlm.nih.gov/genome
Protein-Coding Genes
Gregory, TR. Nature Reviews Genetics. 6, 699-708. doi:10.1038/nrg1674
Proteomics
A proteome consists of all proteins present in a sample (cell, tissue, body fluid, etc.) at a defined point in time and under defined conditions
Proteomics is the large-scale study of the expression, localization, function, and interaction of proteins expressed by an organism’s genome
Proteomics: Using Mass Spectrometry for Proteomics Experiments
“Over the last two decades, mass spectrometry-based technologies have undergone rapid advances and a high degree of innovation to fulfill the expectations of the proteomics and life science communities”
Surinova S. et al. J. Prot. Res. 2011 10:5-16
Leading mass spectrometry-based proteomics laboratories have demonstrated that protein products of up to ~10,000 of the ~20,000 protein-coding human genes can be identified and quantified in a single experimental system
Schwanhäusser B et al. Nature 2011 473:337-342
Beck M et al. Mol. Syst. Biol. 2011 7:549
Nagaraj N et al. Mol. Syst. Biol. 2011 7:548
http://www.proteome.ru/en/avogadro
The “-omics Iceberg”
Human Plasma Proteome: A case of the “-omics Iceberg”
Anderson NL, Anderson NG. Mol Cell Proteomics 2002 1:845-867
Aebersold R. Nature Methods. 6: 411-412. doi:10.1038/nmeth.f.255
Challenges in Proteomics
Shotgun Proteomics
Bottom-Up Proteomics: General procedure using LC-MS/MS for proteomic profiling
Sample Preparation: Proteins → Peptides
HPLC-MS/MS Analysis
MS Data
MS/MS Data
Bioinformatics
Statistical Analysis
Aebersold R and Mann M. Nature 2003 422:198-207
Bottom-Up Proteomics: Common Sample Preparative Steps
Commonly Used Enzymes: Bottom-Up Mass Spectrometry
Bottom-Up Proteomics: Data-Dependent Tandem Mass Spectrometry
[Figure: Fourier transform MS scan with peaks labeled A, B, and C, followed by fragmentation and tandem MS/MS scans]
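The data-dependent logic named on this slide (the MS/MS scans depend on what the MS scan shows) can be sketched as a "top N" selection. This is a minimal illustration, not any instrument vendor's method: the function name, the exclusion list, the tolerance, and the example peak list are all assumptions made for the sketch.

```python
def select_precursors(ms1_peaks, top_n=3, excluded=(), tolerance=0.01):
    """Pick the top_n most intense survey-scan peaks not already fragmented.

    ms1_peaks: list of (m/z, intensity) tuples from one MS scan.
    excluded:  m/z values to skip (a simple stand-in for dynamic exclusion).
    """
    candidates = [
        (mz, intensity)
        for mz, intensity in ms1_peaks
        if not any(abs(mz - ex) <= tolerance for ex in excluded)
    ]
    # Most intense peaks are selected for fragmentation first.
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:top_n]]


# Hypothetical survey scan: (m/z, intensity)
survey = [(400.2, 1.0e4), (512.3, 8.0e5), (633.8, 2.5e5), (701.4, 9.0e4), (850.9, 4.0e5)]

first_cycle = select_precursors(survey, top_n=3)
# In the next cycle, the peaks already fragmented are excluded.
second_cycle = select_precursors(survey, top_n=3, excluded=first_cycle)
print(first_cycle, second_cycle)
```

Because selection is driven by intensity, low-abundance peptides are reached only after the intense ones are excluded, which connects to the sampling and dynamic range limitations discussed later in the lecture.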
Bottom-Up Proteomics: Peptide Fragmentation
Biomed. Mass Spectrom. 11 (11): 601. doi:10.1002/bms.1200111109
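As a rough illustration of peptide fragmentation, the sketch below computes singly charged b- and y-type fragment ion m/z values from monoisotopic residue masses. It assumes unmodified residues and charge 1+ only; the function name and the choice of ILGFR (one tryptic piece of the example protein used later in the lecture) are illustrative assumptions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # charge carrier


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[k] is b(k+1), counted from the N-terminus;
    y_ions[k] is y(k+1), counted from the C-terminus.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("ILGFR")
for k, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{k} = {b:.3f}   y{k} = {y:.3f}")
```

A useful sanity check is that each complementary pair, b(i) and y(n-i), sums to the peptide mass plus two protons.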
Bottom-Up Proteomics: Identifying Post-Translational Modifications
Technical Limitations: LOD Data Dependent Mass Spectrometry
http://www.proteome.ru/en/avogadro
Ghaemmaghami S. et al. Nature. 2003. 425: 737-741.
Smith R. et al. Advances in Protein Chemistry. 2003. 65: 85–131.
Technical Limitations: Dynamic Range Data Dependent Mass Spectrometry
Michalski A., Cox J., and Mann M. J. Proteome Res. 10, 1785–1793.
Technical Limitations: Sampling Data Dependent Mass Spectrometry
Bottom-Up Proteomics: Protein and Peptide Identification
SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV
SEMHIKHYTTKILGFR
EEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAK
YLLDESSTYKLMHDDSV
[Figure: experimental MS/MS spectrum. Scan header: BSA_500fmol_02_120601 #1854 RT: 21.52 AV: 1 NL: 1.91E3 T: ITMS + c NSI d Full ms2 [email protected] [115.00-925.00]. x-axis: m/z (200 to 900); y-axis: relative abundance (0 to 100). Labeled peaks: 132.93, 213.13, 312.35, 391.80, 400.63, 440.22, 448.20, 515.36, 585.04, 602.48, 683.26, 701.38, 782.43, 800.43, 814.37]
[Diagram: Protein → Tryptic Peptides → Experimental Mass Spectrum, compared with Protein Sequence → Theoretical Tryptic Peptides → Theoretical Mass Spectrum]
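The "Protein Sequence → Theoretical Tryptic Peptides" step can be sketched as an in silico digest of the example sequence shown above. This is a minimal sketch that assumes the common trypsin rule (cleave after K or R, but not before P); the function name and the missed_cleavages parameter are illustrative, not taken from any search engine.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence after K/R (not before P).

    Also returns peptides spanning up to `missed_cleavages` uncleaved sites.
    """
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa not in ("P", ""):
            pieces.append(sequence[start:i + 1])
            start = i + 1
    pieces.append(sequence[start:])  # C-terminal peptide

    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


protein = "SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV"
fully_cleaved = tryptic_digest(protein)
with_missed = tryptic_digest(protein, missed_cleavages=2)
print(fully_cleaved)
```

Under this rule the sequence yields ten fully cleaved peptides. The three longer peptides drawn on the slide each span several K/R sites, so they appear in the output only when missed cleavages are allowed.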
Bioinformatics Resources: Protein and Peptide Identification
Database Searching
Comparing raw MS/MS data with molecular sequence databases to identify constituent proteins
Database Search Algorithms: Protein and Peptide Identification
Search engine inputs:
Fragment ion mass & intensity values
Peptide molecular masses
Protein/DNA sequence databases
Search engine output: protein identification and characterization
Public Proteomic Databases
MSDB: Comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London
NCBInr: Comprehensive, non-identical protein database maintained by NCBI. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB
SwissProt: High-quality, curated protein database
dbEST: Division of GenBank containing "single-pass" cDNA sequences, or Expressed Sequence Tags
ThermoFisher
FASTA file
Public Proteomic Databases: UniProt
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
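A FASTA file is a ">" header line followed by one or more sequence lines per entry. The sketch below shows one simple way to read such text into a dictionary; the function name is an illustrative assumption, and the example uses only the first two sequence lines of the cytochrome b entry above.

```python
def parse_fasta(text):
    """Map each FASTA header (without the leading '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are wrapped; collect and join them later.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Excerpt of the entry shown on the slide (first two sequence lines only).
example = """>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```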
Mascot Search Algorithm
MASCOT Software: Protein and Peptide Identification
• Mascot combines 3 types of searches: Peptide Mass Fingerprinting, MS/MS Ions, and Sequence Query
• Searches against any FASTA database
• Unique, true probability-based scoring
• Accepts mass spectrometry data from all leading instrument manufacturers
• High-throughput format for single- and multi-processor systems and clusters
• Automates search submission without custom programming
• Summary of search results in web browser format
• Licensed by more than a thousand academic and commercial laboratories
Peptide Mass Fingerprinting (PMF): Peptide Identification using MASCOT
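The idea behind peptide mass fingerprinting can be sketched as follows: digest every database protein in silico, compute the peptide masses, and count how many observed masses fall within a tolerance of a theoretical one. This is a deliberately simplified match count, not Mascot's probability-based scoring; the function names, the 0.2 Da tolerance, and the entry "protein_B" are assumptions made for illustration.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def digest(sequence):
    """Fully cleaved tryptic peptides (after K/R, not before P)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def pmf_scores(observed, database, tol=0.2):
    """Count observed masses that match a theoretical peptide, per protein."""
    scores = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in digest(sequence)]
        scores[name] = sum(
            any(abs(m - t) <= tol for t in theoretical) for m in observed
        )
    return scores


database = {
    # Example sequence from the lecture slides
    "protein_A": "SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLFNSAKYLLDESSTYKLMHDDSV",
    # Hypothetical second entry, made up for the sketch
    "protein_B": "MAAAAKGGGGGRVVVVVK",
}
# Simulated "observed" masses, generated from three protein_A peptides.
observed = [peptide_mass(p) for p in ("ILGFR", "QWDDSK", "LLEYEEK")]

scores = pmf_scores(observed, database)
best = max(scores, key=scores.get)
print(scores, best)
```

A real search engine also has to decide whether a given number of matches could have arisen by chance, which is what probability-based scoring addresses.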
MS/MS Ion Searching: Peptide Identification using MASCOT
Protein Identification using MASCOT: Using MS and MS/MS spectra
Protein identified using both MS and MS/MS spectra
Protein Identification using MASCOT: Combining MS and MS/MS spectra
Protein Identification using MASCOT: Interpreting Results
ThermoFisher
Protein Identification using MASCOT: Common Settings for Search
ThermoFisher
Sequest Search Algorithm
Protein Identification using Sequest: MS/MS spectra-based Identification
Data-Dependent Mass Spectral Scans: MS/MS depends on the MS
FASTA Protein Database
>gi|84670|pir||B27257 coagulogen II precursor - horseshoe crab (Tachypleus tridentatus) gi|10809 (X04192) coagulogen type II [Tachypleus tridentatus] gi|217395|gnl|PID|d1000491 (D00077) coagulogen type 2 [Tachypleus tridentatus] gi|356167|prf||1208319A coagulogen [Tachypleus sp.] [MASS=21826]
MEKKLFGIALLLTTVASVLAADTNAPICLCDEPGVLGRTQIVTTEIKDKIEKAVEAVAQESGVSGRGFSIFSHHPVFRECGKYECRTVRPEHSRCYNFPPFIHFKSECPVSTRDCEPVFGYTVAGEFRVIVQAPRAGFRQCVWQHKCRFGSNSCGYNGRCTQQRSVVRLVTYNLEKDGFLCESFRTCCGCPCRSF
>gi|585398|sp|P28175|LFC_TACTR LIMULUS
Correlation Analysis
Protein / Peptide / Modification Identification
Protein Identification using Sequest: Sequest Search Workflow
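The correlation step in this workflow compares an experimental MS/MS spectrum with theoretical spectra predicted from database peptides. The sketch below is a much-simplified stand-in (a binned, normalized dot product), not Sequest's actual cross-correlation score; the function names, bin width, peak values, and candidate fragment lists are all hypothetical.

```python
def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into integer bins, keeping the max intensity."""
    binned = {}
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        binned[index] = max(binned.get(index, 0.0), intensity)
    return binned


def correlation_score(experimental, theoretical_mz, bin_width=1.0):
    """Sum the normalized experimental intensity found at each predicted fragment bin."""
    binned = bin_spectrum(experimental, bin_width)
    norm = max(binned.values(), default=1.0)
    predicted_bins = {int(mz / bin_width) for mz in theoretical_mz}
    return sum(binned.get(b, 0.0) / norm for b in predicted_bins)


# Hypothetical experimental spectrum: (m/z, intensity)
experimental = [(175.1, 50.0), (322.2, 100.0), (435.3, 80.0), (500.0, 5.0)]

# Hypothetical predicted fragment m/z values for two candidate peptides
candidates = {
    "peptide_A": [175.12, 322.19, 435.27],
    "peptide_B": [175.12, 300.15, 410.22],
}

ranked = sorted(
    ((correlation_score(experimental, mzs), name) for name, mzs in candidates.items()),
    reverse=True,
)
print(ranked)
```

The candidate whose predicted fragments coincide with the most intense experimental peaks ranks first, which is the intuition behind correlation-based identification.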
Protein Identification using Sequest: Sequest Search Parameters
Extraction Parameters, Search Parameters, Modifications
Comparing Algorithms: Mascot vs. Sequest
Comparing Algorithms: Mascot vs. Sequest data file format
ThermoFisher
Proteome Discoverer Software
Combining Algorithms: Mascot and Sequest using Proteome Discoverer
ThermoFisher
Proteome Discoverer: Interpreting Results
ThermoFisher
Combining Bioinformatic Tools: Integrated Analysis Inc. Pass Software
Custom
Convert MS (.mzXML, .mgf, .mzML)
X! Tandem
OMSSA
FASTA Merger
Create Decoy FASTA
Peptide Prophet
Isobaric Labeled Quantitation
Refine & Merge Results
Protein Prophet
Custom Algorithms
Combining Bioinformatic Tools: Customizing Bioinformatic Workflows
High Resolution MS – OMSSA and X!Tandem
Convert MS
Merge FASTA
Create Reverse Sequence
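The "Create Reverse Sequence" and "Merge FASTA" steps can be sketched as building a combined target-plus-decoy database. Reversed decoys are commonly used to estimate how many accepted hits are false; the FDR function here is one simple form of that estimate, added as an assumption rather than something stated on the slide. The "REV_" prefix, function names, and example entries and scores are hypothetical.

```python
def make_decoys(targets, prefix="REV_"):
    """Create a decoy entry for each target by reversing its sequence."""
    return {prefix + name: sequence[::-1] for name, sequence in targets.items()}


def merge_target_decoy(targets, prefix="REV_"):
    """Merge targets and their reversed decoys into one search database."""
    merged = dict(targets)
    merged.update(make_decoys(targets, prefix))
    return merged


def estimate_fdr(hits, threshold, prefix="REV_"):
    """Estimate FDR as (decoy hits) / (target hits) at or above a score threshold."""
    accepted = [name for name, score in hits if score >= threshold]
    decoy_count = sum(name.startswith(prefix) for name in accepted)
    target_count = len(accepted) - decoy_count
    return decoy_count / target_count if target_count else 0.0


# Hypothetical two-entry target database
targets = {"P1": "MEKK", "P2": "ACDE"}
merged = merge_target_decoy(targets)

# Hypothetical search results: (database entry, score)
hits = [("P1", 50), ("P2", 40), ("REV_P1", 35), ("REV_P2", 10)]
fdr = estimate_fdr(hits, threshold=30)
print(sorted(merged), fdr)
```

Raising the score threshold until the estimated FDR drops below a chosen level is a typical way to use the decoy hits.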