Proteomics Informatics (BMSC-GA 4437)


Page 1: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics (BMSC-GA 4437)

Course Director

David Fenyö

Contact information

[email protected]

http://fenyolab.org/presentations/Proteomics_Informatics_2014/

Page 2: Proteomics Informatics (BMSC-GA 4437)

http://fenyolab.org/presentations/Proteomics_Informatics_2014/

Page 3: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Learning Objectives

Be able to analyze proteomics data sets and understand the limitations of the results.

Page 4: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Syllabus

Week 1 Overview of proteomics (1/28/2014 at 4 pm in TRB 718)

Week 2 Overview of mass spectrometry (2/4/2014 at 4 pm in TRB 718)

Week 3 Analysis of mass spectra: signal processing, peak finding, and isotope clusters (2/11/2014 at 4 pm in TRB 119)

Week 4 Protein identification I: searching protein sequence collections and significance testing (2/18/2014 at 4 pm in TRB 718)

Week 5 Protein identification II: de novo sequencing (2/25/2014 at 4 pm in TRB 718)

Week 6 Databases, data repositories and standardization (3/4/2014 at 4 pm in TRB 718)

Week 7 Proteogenomics (3/11/2014 at 4 pm in TRB 718)

Week 8 Protein quantitation I: Overview (3/18/2014 at 4 pm in TRB 718)

Week 9 Protein quantitation II: Targeted (3/25/2014 at 4 pm in TRB 718)

Week 10 Protein characterization I: post-translational modifications (4/1/2014 at 4 pm in TRB 718)

Week 11 Protein characterization II: Protein interactions (4/10/2014 at 4 pm in TRB 718)

Week 12 Molecular Signatures (4/17/2014 at 4 pm in TRB 718)

Week 13 Presentations of projects (4/22/2014 at 4 pm in TRB 718)

Page 5: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Overview of Proteomics (Week 1)

• Why proteomics?

• Bioinformatics

• Overview of the course

Page 6: Proteomics Informatics (BMSC-GA 4437)

Motivating Example: Protein Regulation

Geiger et al., “Proteomic changes resulting from gene copy number variations in cancer cells”, PLoS Genet. 2010 Sep 2;6(9). pii: e1001090.

Page 7: Proteomics Informatics (BMSC-GA 4437)

Motivating Example: Protein Complexes

Alber et al., Nature 2007

Page 8: Proteomics Informatics (BMSC-GA 4437)

Motivating Example: Signaling

Choudhary & Mann, Nature Reviews Molecular Cell Biology 2010

Page 9: Proteomics Informatics (BMSC-GA 4437)

Bioinformatics

Biological System

Samples

Measurements

Experimental Design

Raw Data

Information

Data Analysis

Page 10: Proteomics Informatics (BMSC-GA 4437)

Mass Spectrometry Based Proteomics

Mass spectrometry

Lysis

Fractionation

MS

Digestion

Identified and Quantified Proteins

Peak Finding

Charge determination

De-isotoping

Integrating Peaks

Searching

Page 11: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Overview of Mass spectrometry (Week 2)

Ion Source → Mass Analyzer → Detector

[Figure: mass spectrum, intensity vs. mass/charge]

Page 12: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Overview of Mass spectrometry (Week 2)

Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector

Fragment ions: b, y

Page 13: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Overview of Mass spectrometry (Week 2)

LC → Ion Source → Mass Analyzer 1 → Fragmentation → Mass Analyzer 2 → Detector

[Figure: series of spectra (intensity vs. mass/charge) acquired over Time]

Page 14: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Analysis of mass spectra: signal processing, peak finding, and isotope clusters (Week 3)

[Figure: mass spectrum, Intensity vs. m/z]
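The isotope-cluster and charge-determination topics named in this slide title can be illustrated with a small sketch. It assumes centroided isotope peaks from a single peptide, spaced by roughly 1.003355/z in m/z (the 13C-12C mass difference divided by the charge z). The function names, the tolerance, and the maximum charge are illustrative choices, not part of the course material.

```python
C13_C12_DIFF = 1.003355   # Da, mass difference between 13C and 12C
PROTON = 1.007276         # Da, mass of a proton

def infer_charge(mz_peaks, max_charge=6, tol=0.01):
    """Guess the charge state from the mean spacing of isotope peaks."""
    peaks = sorted(mz_peaks)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    if not spacings:
        return None  # a single peak carries no spacing information
    mean_spacing = sum(spacings) / len(spacings)
    for z in range(1, max_charge + 1):
        if abs(mean_spacing - C13_C12_DIFF / z) <= tol:
            return z
    return None  # spacing does not match any tested charge

def neutral_mass(monoisotopic_mz, z):
    """Convert the monoisotopic m/z of a z-fold protonated ion to a neutral mass."""
    return z * (monoisotopic_mz - PROTON)

cluster = [500.0, 500.5017, 501.0034]   # made-up isotope cluster
z = infer_charge(cluster)
print(z, neutral_mass(cluster[0], z))
```

De-isotoping then amounts to replacing the whole cluster by its monoisotopic peak and charge; real data would also need noise handling and overlapping-cluster resolution, which this sketch omits.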

Page 15: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)

Experiment: Lysis, Fractionation, Digestion, LC-MS, MS/MS

Search:

Sequence DB

Pick Protein

Pick Peptide

All Fragment Masses

Compare, Score, Test Significance

Repeat for all peptides

Repeat for all proteins
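The search loop on this slide (pick a protein, pick a peptide, compute all fragment masses, compare and score, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions: trypsin is assumed to cleave after K or R but not before P, only singly charged b and y ions are generated, the score is a plain shared-peak count, and the significance test is left out. All function names are hypothetical; the residue masses are the monoisotopic values from the amino acid mass table on the Week 5 slide.

```python
import re

# Monoisotopic residue masses (Da), taken from the Week 5 amino acid table.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da

def tryptic_peptides(protein):
    """In-silico digestion: cleave after K/R unless followed by P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def fragment_mz(peptide):
    """Singly charged b- and y-ion m/z values for one peptide."""
    masses = [MONO[a] for a in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y

def count_matches(observed, theoretical, tol=0.5):
    """Number of theoretical fragments with an observed peak within tol."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)

def best_peptide(spectrum, precursor_mass, proteins, tol=0.5):
    """Loop over all proteins and peptides; return (score, peptide) of the best match."""
    best = (0, None)
    for protein in proteins:                      # repeat for all proteins
        for pep in tryptic_peptides(protein):     # repeat for all peptides
            mass = sum(MONO[a] for a in pep) + WATER
            if abs(mass - precursor_mass) > tol:
                continue                          # precursor mass filter
            score = count_matches(spectrum, fragment_mz(pep), tol)
            if score > best[0]:
                best = (score, pep)
    return best
```

A shared-peak count alone does not say whether a match is better than chance; the "Test Significance" step would compare the best score against the distribution of scores for the other candidates, which is not shown here.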

Page 16: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein identification I: searching protein sequence collections and significance testing (Week 4)

Page 17: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein identification II: de novo sequencing (Week 5)

[Figure: MS/MS spectrum, % Relative Abundance (0 to 100) vs. m/z (250 to 1000). Labeled peaks: 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080. Label: [M+2H]2+]

Mass Differences

Sequences consistent with spectrum

Amino acid masses

1-letter code   3-letter code   Chemical formula   Monoisotopic   Average
A               Ala             C3H5ON             71.0371        71.0788
R               Arg             C6H12ON4           156.101        156.188
N               Asn             C4H6O2N2           114.043        114.104
D               Asp             C4H5O3N            115.027        115.089
C               Cys             C3H5ONS            103.009        103.139
E               Glu             C5H7O3N            129.043        129.116
Q               Gln             C5H8O2N2           128.059        128.131
G               Gly             C2H3ON             57.0215        57.0519
H               His             C6H7ON3            137.059        137.141
I               Ile             C6H11ON            113.084        113.159
L               Leu             C6H11ON            113.084        113.159
K               Lys             C6H12ON2           128.095        128.174
M               Met             C5H9ONS            131.04         131.193
F               Phe             C9H9ON             147.068        147.177
P               Pro             C5H7ON             97.0528        97.1167
S               Ser             C3H5O2N            87.032         87.0782
T               Thr             C4H7O2N            101.048        101.105
W               Trp             C11H10ON2          186.079        186.213
Y               Tyr             C9H9O2N            163.063        163.176
V               Val             C5H9ON             99.0684        99.1326
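The mass-difference idea on this slide can be sketched in a few lines: consecutive peaks of one fragment-ion series differ by the mass of one residue, so each difference is looked up in the amino acid mass table. The example treats seven of the peak labels on the slide (292 to 1020) as one ion series; that grouping is an assumption made for illustration, as are the function names and the 0.5 Da tolerance for the nominal masses read from the figure.

```python
# Monoisotopic residue masses (Da), from the amino acid mass table on this slide.
MONO = {
    "A": 71.0371, "R": 156.101, "N": 114.043, "D": 115.027, "C": 103.009,
    "E": 129.043, "Q": 128.059, "G": 57.0215, "H": 137.059, "I": 113.084,
    "L": 113.084, "K": 128.095, "M": 131.04, "F": 147.068, "P": 97.0528,
    "S": 87.032, "T": 101.048, "W": 186.079, "Y": 163.063, "V": 99.0684,
}

def residues_for(delta, tol=0.5):
    """All residues whose mass is within tol of a peak-to-peak difference."""
    hits = sorted(a for a, m in MONO.items() if abs(m - delta) <= tol)
    return "/".join(hits) if hits else "?"

def read_ladder(peaks, tol=0.5):
    """Translate consecutive mass differences of one ion series into residues."""
    peaks = sorted(peaks)
    return [residues_for(b - a, tol) for a, b in zip(peaks, peaks[1:])]

# Assumed single ion series, using nominal peak labels from the slide.
ladder = [292, 405, 534, 663, 778, 907, 1020]
print(read_ladder(ladder))  # ['I/L', 'E', 'E', 'D', 'E', 'I/L']
```

Isoleucine and leucine have the same mass and cannot be told apart this way, which is one reason several sequences can be consistent with a single spectrum. With a tighter tolerance on accurate masses, near-isobaric pairs such as K and Q become distinguishable.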

Page 18: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Databases, data repositories and standardization (Week 6)

Page 19: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Databases, data repositories and standardization (Week 6)

Most proteins show very reproducible peptide patterns

Page 20: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Databases, data repositories and standardization (Week 6)

Query Spectrum

Best match in GPMDB

Second-best match in GPMDB

Page 21: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Proteogenomics (Week 7)

[Figure: building a Tumor Specific Protein DB]

Tumor Sample: Genome sequencing, RNA-Seq. Identify alternative splicing, somatic variants and novel expression.

Non-Tumor Sample: Genome sequencing. Identify germline variants.

Reference Human Database (Ensembl)

Panels: Variants, Alt. Splicing, Novel Expression, Fusion Genes

Sequence shown: TCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGAGAGCTGTCGATAGCTG

Exon labels: Exon 1, Exon 2, Exon 3, Exon X

Fusion gene labels: Gene X Exon 1, Gene X Exon 2, Gene Y Exon 1, Gene Y Exon 2

Kelly Ruggles

Page 22: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein quantitation I: Overview (Week 8)

Fractionation

Digestion

LC-MS

Lysis

MS

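Indices: Sample i, Protein j, Peptide k

Symbols: C_{ij}, I_{ik}, and one factor p per workflow step: p^{L}_{ij}, p^{Pr}_{ij}, p^{D}_{ijk}, p^{Pep}_{ik}, p^{LC}_{ik}, p^{MS}_{ik}

I_{ik} = \sum_{j} C_{ij} \, p^{L}_{ij} \, p^{Pr}_{ij} \, p^{D}_{ijk} \, p^{Pep}_{ik} \, p^{LC}_{ik} \, p^{MS}_{ik}

Solving for C_{ij} using a peptide k that comes only from protein j:

C_{ij} = \frac{I_{ik}}{p^{L}_{ij} \, p^{Pr}_{ij} \, p^{D}_{ijk} \, p^{Pep}_{ik} \, p^{LC}_{ik} \, p^{MS}_{ik}}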

Page 23: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein quantitation I: Overview (Week 8)

Fractionation

Digestion

LC-MS

Lysis

MS MS

Indices: Sample i, Protein j, Peptide k

Assumption: p^{L}_{ij} \, p^{Pr}_{ij} \, p^{D}_{ijk} \, p^{Pep}_{ik} \, p^{LC}_{ik} \, p^{MS}_{ik} is constant for all samples

\frac{C_{i_m j}}{C_{i_n j}} = \frac{I_{i_m j}}{I_{i_n j}}
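The assumption on this slide means the step-dependent factors cancel when two samples are compared, so a relative protein amount can be read directly from an intensity ratio. A minimal sketch, assuming each sample is given as a mapping from peptide to measured intensity for one protein, and assuming the median is used to combine the per-peptide ratios (the slide does not say how peptides are combined; the function name and data are made up):

```python
from statistics import median

def protein_ratio(intensities_m, intensities_n):
    """Relative amount of one protein in sample m vs. sample n.

    Each argument maps peptide -> intensity for the same protein.
    Only peptides measured in both samples contribute.
    """
    ratios = [
        intensities_m[k] / intensities_n[k]
        for k in intensities_m
        if k in intensities_n and intensities_n[k] > 0
    ]
    if not ratios:
        raise ValueError("no peptide was measured in both samples")
    return median(ratios)  # median as an assumed, outlier-tolerant summary

sample_m = {"pepA": 200.0, "pepB": 90.0, "pepC": 410.0}   # made-up intensities
sample_n = {"pepA": 100.0, "pepB": 50.0, "pepC": 200.0}
print(protein_ratio(sample_m, sample_n))  # 2.0
```

If the factors are not constant across samples (for example, different digestion efficiency), the per-peptide ratios will scatter, and the spread of those ratios gives a rough indication of how far the assumption holds.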

Page 24: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein quantitation II: Targeted (Week 9)

Fractionation

Digestion

LC-MS

Lysis

MS

Shotgun proteomics: Data Dependent Acquisition (DDA)

1. Records m/z (MS)

2. Selects peptides based on abundance and fragments (MS/MS)

3. Protein database search for peptide identification

Targeted MS: Uses predefined set of peptides

1. Select precursor ion (MS)

2. Precursor fragmentation (MS/MS)

3. Use Precursor-Fragment pairs for identification

Page 25: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Protein characterization I: post-translational modifications (Week 10)

Peptide with two possible modification sites

[Figure: MS/MS spectrum, Intensity vs. m/z]

Matching

Which assignment does the data support?

1, 1 or 2, or 1 and 2?

Page 26: Proteomics Informatics (BMSC-GA 4437)

AB

AC

D

Digestion

Mass spectrometry

EF

Identification

Proteomics Informatics – Protein Characterization II: protein interactions (Week 11)

Page 27: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Molecular Signatures (Week 12)

Page 28: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Molecular Signatures (Week 12)

Page 29: Proteomics Informatics (BMSC-GA 4437)

Proteomics Informatics – Presentations of projects (Week 13)

Select a published data set that has been made public and reanalyze it.

Highlighted data sets: http://www.thegpm.org/

10 min presentations
