
Page 1: Correlating mRNA and protein abundance via genomic and proteomic characteristics Dov Greenbaum Gerstein Lab Thesis Seminar April 21, 2004.

Correlating mRNA and protein abundance via genomic and proteomic characteristics

Dov Greenbaum

Gerstein Lab Thesis Seminar

April 21, 2004

Outline

• Why analyze mRNA and protein correlations
• Background
• Disparate data sources
• Correlating mRNA and protein
• Results
• Other analyses
• Formalism: comparing genome, transcriptome and proteome in terms of broad categories
• New data sets
• Analysis via broad categories
• Analysis of factors affecting correlations
• Another reason to expect correlations: expression and protein interactions

Why Correlate mRNA & Protein?

[Figure: mRNA vs. protein, axis labeled "Experiments" (0 to 5000)]

Both mRNA and Protein Levels are necessary for complete analysis

Combinations of RNA and protein detection approaches have recently aided in the identification of biomarkers in cancer (Hegde et al., Current Opinion in Biotech, 2003)

Shown mathematically in Hatzimanikatis et al., Biotechnology, 1999

Relationship between mRNA and Protein levels

$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$

where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively.

At steady state:

$$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$
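As a minimal sketch of the relationship above, the following Python snippet evaluates the steady-state expression and checks it against a simple numerical integration of the rate equation. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def steady_state_protein(k_s, k_d, mrna):
    """Steady-state protein level from dP/dt = 0: P = k_s * mRNA / k_d."""
    if k_d <= 0:
        raise ValueError("k_d must be positive")
    return k_s * mrna / k_d


def simulate_protein(k_s, k_d, mrna, p0=0.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of dP/dt = k_s * mRNA - k_d * P."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * (k_s * mrna - k_d * p)
    return p


# Hypothetical values: two genes with identical mRNA levels but different
# degradation constants settle at different protein levels.
mrna_level = 10.0
p_stable = steady_state_protein(k_s=2.0, k_d=0.5, mrna=mrna_level)
p_unstable = steady_state_protein(k_s=2.0, k_d=5.0, mrna=mrna_level)

# The simulated trajectory should approach the analytic steady state.
p_simulated = simulate_protein(k_s=2.0, k_d=0.5, mrna=mrna_level)
```

Under this model, identical mRNA levels can produce very different protein levels whenever the rate constants differ between genes, which is one reason a perfect mRNA-protein correlation is not expected.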

Methods for determining mRNA expression

Each has strengths and weaknesses

Methods for determining protein abundance

2DE Gel Electrophoresis (Klose, 1975; O’Farrell, 1975)
• Multiple staining options
• Small dynamic range
• Limited in what it can detect

Methods for determining protein abundance

ICAT
– ICAT reagent: relative levels
– Very big dynamic range
– Cannot detect post-translational modifications
– Requires proteins to contain cysteine residues, and these residues must be in the region of a peptide that is produced during proteolytic cleavage

MudPIT

Really the only high-throughput (HT) method that can detect post-translational (PT) modifications

Other Methods for determining protein abundance

DIGE
– e.g. Cy3 vs. Cy5 labeling
– Very big dynamic range

2D-electrophoresis

TAP tagging: Weissman & O’Shea (Oct 2003)

Other Methods for determining protein abundance

[Figure: two bar charts comparing methods (2DE, DIGE, ICAT, MudPIT, TAP, Affy); one labeled "Max" with axis 0 to 80,000, the other labeled "Max Prot" with axis 0 to 4,000]


Same mRNA levels yet protein data varied > 20X

N ~100, r = 0.9

Protein Quantification via measurement of radioactivity

Gygi et al Molecular and Cellular Biology,1999.


Same mRNA levels yet protein data varied > 20X

Do some ORFs bias the results?

73 proteins (69%) R = 0.356
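The drop from r = 0.9 to r = 0.356 when only a subset of proteins is considered can be illustrated with a small sketch. The data below are synthetic and hypothetical (they are not the Gygi et al. measurements); the point is only that a few highly abundant genes can dominate a Pearson correlation computed on a linear scale.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: many low-abundance genes with a weak relationship...
low_mrna = [1, 2, 3, 4, 5, 6, 7, 8]
low_protein = [5, 1, 7, 2, 8, 3, 6, 4]
# ...plus a few highly abundant genes that follow the trend closely.
high_mrna = [50, 100, 200]
high_protein = [60, 110, 190]

r_low = pearson_r(low_mrna, low_protein)
r_all = pearson_r(low_mrna + high_mrna, low_protein + high_protein)
```

Here `r_all` is close to 1 while `r_low` stays small, mirroring the question of whether some ORFs bias the result.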

mRNA vs Protein

r = 0.74

Protein Quantification via image analysis

Futcher et al Molecular and Cellular Biology, 1999

Jury is out…

Gygi et al: “This study revealed that transcript levels provide little predictive value with respect to the extent of protein expression.”

Futcher et al: “there is a good correlation between protein abundance and mRNA abundance for the proteins that we have studied”.

mRNA vs Protein

r = 0.67

Greenbaum et al Bioinformatics 2001

3 Genes in Lung Adenocarcinomas

Op18, Annexin IV, and GAPD: r = 0.025

Chen et al Molecular & Cellular Proteomics, 2002.

Murine hematopoietic precursor MPRO: change in expression, 0 - 72 hr

R = 0.58

~80% of the genes are located in the first and third quadrants

Ratios of wt+gal to wt-gal: ICAT vs microarray

N ~ 290, r = 0.6

Ideker et al Science, 2001

Yeast growth under two different media

r = 0.45, but almost 1.0 for loci in the same pathway

Washburn et al PNAS 2003

Integrating multiple sources of Information

The challenge for computational biology is to provide methodologies for transforming high-throughput heterogeneous data sets into biological insights about the underlying mechanisms. Although high-throughput assays provide a global picture, the details are often noisy, hence conclusions should be supported by several types of observations. Integration of data from assays that examine cellular systems from different viewpoints (for instance, gene expression and protein-protein interactions) can lead to a more coherent reconstruction and reduce the effects of noise. Nir Friedman, Science 2004

Sources of Data

| Category | Data set | Description | Size [ORFs] | Reference |
|---|---|---|---|---|
| mRNA expression | Young | Gene chip profiles of yeast cells with mutations that affect transcription | 5455 | Holstege et al. (1998) |
| mRNA expression | Church | Gene chip profiles of yeast cells under four different conditions | 6263 | Roth et al. (1998) |
| mRNA expression | Samson | Comparing gene chip profiles for yeast cells subjected to alkylating agent | 6090 | Jelinsky et al. (1998) |
| mRNA expression | SAGE | Yeast cells during vegetative growth | 3778 | Velculescu et al. (1997) |
| mRNA expression | Reference expression | Scaling and integrating the mRNA expression sets into one data source | 6249 | - |
| Protein abundance | 2-DE #1 | Measurement of yeast protein abundance by two-dimensional (2D) gel electrophoresis and mass spectrometry | 156 | Gygi et al. (1999) |
| Protein abundance | 2-DE #2 | Similar to 2-DE set #1 | 71 | Futcher et al. (1999) |
| Protein abundance | Transposon | Large-scale fusions of yeast genes with lacZ by transposon insertion | 1410 | Ross-Macdonald et al. (1999) |
| Protein abundance | Reference abundance | Scaling and integrating the 2-DE data sets into one data source | 181 | - |
| Annotation | Annotated localization | Subcellular localizations of yeast proteins | 2133 (6280) | Drawid et al. (2000) |
| Annotation | Transmembrane segments | Predicted transmembrane and soluble proteins in yeast | 2710 (6280) | Gerstein (1998) |
| Annotation | MIPS functions | Functional categories for yeast ORFs | 3519 (6194) | Mewes et al. (2000) |
| Annotation | GOR secondary structure | Predicted secondary structure for yeast ORFs | 6280 | Gerstein (1998) |
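The table describes the reference sets only as "scaling and integrating" several data sets into one source. One possible way to do this is sketched below; the median scaling and per-ORF averaging are assumptions made for illustration and are not necessarily the procedure used in the talk. The function names and expression values are hypothetical.

```python
from statistics import mean, median


def scale_to_median(dataset, target=1.0):
    """Rescale one data set (ORF -> level) so that its median equals target."""
    m = median(dataset.values())
    return {orf: value * target / m for orf, value in dataset.items()}


def merge_datasets(datasets):
    """Average the scaled levels of each ORF over the sets that contain it."""
    scaled = [scale_to_median(d) for d in datasets]
    orfs = set().union(*scaled)
    return {
        orf: mean(d[orf] for d in scaled if orf in d)
        for orf in sorted(orfs)
    }


# Hypothetical expression levels on two different measurement scales.
chip = {"YBR118W": 200.0, "YOL086C": 100.0, "YJR104C": 50.0}
sage = {"YBR118W": 40.0, "YOL086C": 20.0}

reference = merge_datasets([chip, sage])
```

An ORF that is missing from one data set still receives a value from the others, which is how a merged set can cover more ORFs than any single source.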

Reference mRNA Sets

[Figure: panels for the Young, Church, Samson and SAGE data sets]


Fitting Protein Data

Original Set

mRNA vs Protein

r = 0.67

Greenbaum et al Bioinformatics 2001

mRNA expression reference set: 3 Affy chip sets and SAGE, 6249 ORFs

Outliers (2 STDEV from the mean)

| ORF | Function | MIPS |
|---|---|---|
| YBR118W | Translation elongation factor eEF1 alpha-A chain | 5, 30 |
| YER065C | Isocitrate lyase | 1, 2, 30 |
| YMR303C | Alcohol dehydrogenase II | 1, 2, 30 |
| YOL086C | Alcohol dehydrogenase I | 1, 2, 30 |
| YJR009C | Glyceraldehyde-3-phosphate dehydrogenase 2 | 1, 2, 30 |
| YGR192C | Glyceraldehyde-3-phosphate dehydrogenase 3 | 1, 2, 30 |
| YJR104C | Copper-zinc superoxide dismutase | 11, 30 |
| YML054C | Lactate dehydrogenase cytochrome b2 | 1, 2, 30 |
| YJL052W | Glyceraldehyde-3-phosphate dehydrogenase 1 | 1, 2, 30 |
| YKR059W | Translation initiation factor | 5, 30 |
| YML008C | S-adenosyl-methionine delta-24-sterol-c-methyltransferase | 1, 30 |
| YFL022C | Phenylalanine--tRNA ligase beta chain | 5, 30 |
| YJL008C | Component of chaperonin-containing T-complex | 6, 30 |
| YPL160W | Leucine--tRNA ligase | 5, 30 |
| YOR361C | Translation initiation factor eIF3 subunit | 3, 5, 30 |
| YCL030C | Phosphoribosyl-AMP cyclohydrolase | 1 |
| YNL209W | Heat shock protein of HSP70 family | 5, 30 |

Row groups in the table: above trendline, below trendline

High protein: Metabolism (1), Energy (2)

Low protein: Prot. Syn. (5), Prot. Fate (6)
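A minimal sketch of how outliers more than 2 standard deviations from a trend line might be flagged is given below. It assumes the trend line is an ordinary least-squares fit in log-log space, which is an assumption made here and not stated on the slide; the function names and data are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(mrna, protein, n_sd=2.0):
    """Label each gene 'above', 'below' or None relative to the trend line."""
    lx = [math.log10(v) for v in mrna]
    ly = [math.log10(v) for v in protein]
    slope, intercept = fit_line(lx, ly)
    residuals = [y - (slope * x + intercept) for x, y in zip(lx, ly)]
    sd = pstdev(residuals)
    return [
        "above" if r > n_sd * sd else "below" if r < -n_sd * sd else None
        for r in residuals
    ]


# Synthetic data: protein = 100 * mRNA, except one gene with 1000x more protein.
mrna = [2.0 ** k for k in range(11)]
protein = [100.0 * m for m in mrna]
protein[5] *= 1000.0

flags = flag_outliers(mrna, protein)
```

Genes flagged "above" have more protein than their mRNA level predicts, and genes flagged "below" have less.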

Later, larger datasets concurred with these results in that, generally…

[Figure: protein vs. mRNA scatter plot, log-log axes (mRNA 0.1 to 1,000; protein 1 to 10,000,000)]

Alcohol dehydrogenase is also a stress-induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994). Faster ramp-up?

Amino acid (AA) metabolism & Energy genes are 2X as likely to have high protein vs. mRNA as the general population

Protein synthesis (~35% of all protein synthesis genes) and Protein fate (folding, modification, destination) genes are more likely to have low protein vs. mRNA than the general population

Non-Outliers Generally…

Tight regulation by the cell

Only 3% of transcription associated genes (n = 441) have significantly uncorrelated mRNA and protein levels (2STDEV from trendline)

Transcription Assoc. genes are 25% of the essential genes in yeast.

Essential Genes as a group have higher correlations than the general yeast population

7% of Cell Cycle associated genes (n = 432) have significant non-correlation

Quick Summary

• Why correlate mRNA and protein levels?
• Merged disparate data sets
– Distinct but complementary
• Global correlations
• Outliers are interesting:
– Metabolism & Energy: relatively high protein levels
– Protein Synthesis & Protein Fate: low protein levels

Data Set Size

~6,000 ORFs

~6,000 ORFs: 5 Affymetrix GeneChips + SAGE data

~170 ORFs: 2 DE-gel datasets

Enrichments

$$\Delta(F,[v,S],[w,G]) = \frac{\mu(F,[v,S]) - \mu(F,[w,G])}{\mu(F,[w,G])}$$

Here $F$ is the feature, $\mu(F,[w,G])$ denotes the $w$-weighted average of $F$ over set $G$, and $v$ and $w$ are the weights (expression levels) of sets $S$ and $G$.
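The enrichment can be read as a relative difference between two weighted averages of a feature. The sketch below assumes exactly that reading: a weighted mean of the feature over each set, then the relative difference. The function names, ORF labels and numbers are hypothetical.

```python
def weighted_mean(feature, weights):
    """Weighted average of a per-ORF feature, using per-ORF weights."""
    orfs = [orf for orf in weights if orf in feature]
    total = sum(weights[orf] for orf in orfs)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(weights[orf] * feature[orf] for orf in orfs) / total


def enrichment(feature, weights_s, weights_g):
    """Relative difference of the weighted feature average in S versus G."""
    mu_s = weighted_mean(feature, weights_s)
    mu_g = weighted_mean(feature, weights_g)
    return (mu_s - mu_g) / mu_g


# Hypothetical feature: fraction of alanine in each ORF.
alanine = {"orfA": 0.10, "orfB": 0.05, "orfC": 0.05}
# Genome: every ORF counts once. Transcriptome: weight by expression level.
genome_weights = {"orfA": 1.0, "orfB": 1.0, "orfC": 1.0}
transcriptome_weights = {"orfA": 8.0, "orfB": 1.0, "orfC": 1.0}

delta = enrichment(alanine, transcriptome_weights, genome_weights)
```

A positive value means the feature is enriched in the expression-weighted population relative to the genome, and a negative value means it is depleted.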


Visual Formalism

~170 ORFs ~6,000 ORFs

Depletion of Random Coil Secondary Structure: STABILITY

Concurrence with data from Perczel et al., Chemistry 2003, regarding stability of specific secondary structures

Enrichment of Amino Acids: STABILITY

Alanines, glycines and valines result in more compact structures. More compact = more stable (e.g. thermophilic enzymes tend to be very compact)

Enrichment of Amino Acids

Simple story: translatome is enriched in same way as transcriptome

Enrichment of Molecular Weights/Biomass

Abundant proteins are smaller = reduces cost

The yeast cell favors the expression of shorter ORFs over longer ones (as opposed to long lightweight ORFs; see MW of aa)

This selection is happening, for the most part, at the transcriptome level

Negative correlation between ORF length and mRNA expression, Jansen & Gerstein 2000 (and, to a lesser degree, with protein abundance)

Effect of transcription

Enrichment of Molecular Weights/Biomass

Abundant proteins are smaller = reduces cost

Concurs with experimental results from Akashi, Genetics 2003. See also: Akashi, Genetics 1996 & Moriyama and Powell, NAR 1998

Hypothesize that this trend exists in S. cerevisiae, D. melanogaster and E. coli (although probably not in C. elegans)

Effect of transcription

Enrichment of Functional Categories

[Figure: protein vs. mRNA scatter plot, log-log axes (mRNA 0.1 to 1,000; protein 1 to 10,000,000)]

Enrichment of localization - BIAS?

(Drawid & Gerstein, 2000)

Review

• Formalism
– Different gene sets because of limited data
• Enrichments
– Concur with experimental results

Fitting Protein Data

Newer set: MudPIT data are fit first into mRNA space, then inverse fit back into protein space; then each of the data sets is fit via least squares onto the Aebersold data set

Number of ORFs shared between pairs of data sets (diagonal: size of each set):

| | Aebersold | Futcher | Reference | Yates | Gygi | mRNA |
|---|---|---|---|---|---|---|
| Aebersold | 125 | 29 | 113 | 102 | 116 | 125 |
| Futcher | | 73 | 61 | 56 | 64 | 69 |
| Reference | | | 150 | 143 | 128 | 150 |
| Yates | | | | 1436 | 785 | 1346 |
| Gygi | | | | | 1504 | 1480 |
| mRNA | | | | | | 6250 |
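The least-squares step described above can be sketched as follows: the ORFs shared with a reference data set are used to fit a line, and the fit is then applied to every ORF in the data set being rescaled. Fitting in log space is an assumption made for this sketch, and the function name and values are hypothetical.

```python
import math


def fit_onto_reference(dataset, reference):
    """Rescale `dataset` onto `reference` via a log-log least-squares fit
    computed on the ORFs that both sets share."""
    shared = sorted(set(dataset) & set(reference))
    if len(shared) < 2:
        raise ValueError("need at least two shared ORFs")
    xs = [math.log10(dataset[orf]) for orf in shared]
    ys = [math.log10(reference[orf]) for orf in shared]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Apply the fit to all ORFs, including those absent from the reference.
    return {
        orf: 10 ** (slope * math.log10(value) + intercept)
        for orf, value in dataset.items()
    }


# Hypothetical abundances: the second set is on a 10x smaller scale.
reference_set = {"orf1": 10.0, "orf2": 100.0, "orf3": 1000.0}
other_set = {"orf1": 1.0, "orf2": 10.0, "orf3": 100.0, "orf4": 1000.0}

rescaled = fit_onto_reference(other_set, reference_set)
```

The overlap counts in the table indicate how many ORFs would be available for such a fit between each pair of sets.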


Global Correlation

[Figure: protein abundance vs. mRNA expression, log-log axes (0.1 to 1,000); series: MudPIT (1), MudPIT (2), 2DE (1), 2DE (2); R = 0.66]

mRNA set: 6249 ORFs. Protein set #2: 2 2DE sets & 2 MudPIT sets, ~2000 ORFs

Functional Categories

[Figure: protein abundance vs. mRNA expression, log-log axes; series: Cell Cycle (R = 0.71), Reference Data (R = 0.66), Cell Rescue (R = 0.45)]

Co-regulated proteins

High: ion transport, interaction with the cellular environment, cell fate

Low: metabolism, fate, cellular communication/signal transduction mechanism
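Computing a separate correlation for each broad category, as in the figure above, can be sketched as below. The correlation is taken on log-transformed values, which is an assumption suggested by the log-log axes; category names and measurements are synthetic.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_category(records, min_size=3):
    """records: iterable of (category, mrna, protein) tuples.
    Returns category -> Pearson r of log10(mRNA) vs. log10(protein)."""
    groups = defaultdict(list)
    for category, mrna, protein in records:
        groups[category].append((math.log10(mrna), math.log10(protein)))
    result = {}
    for category, points in groups.items():
        if len(points) >= min_size:  # skip categories that are too small
            xs, ys = zip(*points)
            result[category] = pearson_r(xs, ys)
    return result


# Synthetic measurements for two hypothetical categories.
records = [
    ("cell cycle", 1.0, 10.0),
    ("cell cycle", 10.0, 100.0),
    ("cell cycle", 100.0, 1000.0),
    ("cell rescue", 1.0, 100.0),
    ("cell rescue", 10.0, 10.0),
    ("cell rescue", 100.0, 1000.0),
]

r_by_category = correlation_by_category(records)
```

Categories with a higher r than the reference set would be candidates for tight co-regulation of mRNA and protein.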

Subcellular Localization

[Figure: protein abundance vs. mRNA expression, log-log axes (0.1 to 100); series: Nucleolus (R = 0.8), Cell Periphery (R = 0.74), Reference Data (R = 0.66), Mitochondria (R = 0.42)]

MudPIT does not have the 2DE biases

Lack of correlation in mitochondria concurs with experimental results from Ohlmeier S et al., JBC 2004

| Localization | r |
|---|---|
| Bud | 0.76 |
| Golgi | 0.28 |
| Extracellular | 0.33 |
| Nucleus | 0.49 |
| Cytoplasm | 0.50 |
| Mitochondria | 0.50 |
| Cell Wall | 0.52 |
| Endosome | 0.87 |
| ER | 0.61 |
| Membrane | 0.73 |

P M

r global = 0.46

Expression as a function of localization is well correlated with protein levels (latest data)


Why would we not find strong correlations?

Post translational modifications

Protein degradation

Error and Bias

Ribosomal Occupancy

[Figure: bar chart of mRNA-protein correlation (0 to 1) for the top and bottom fractions of genes ranked by Occupancy, CAI and Coefficient of Variation]

Ribosomal occupancy data: Arava et al. (2003) Proc. Natl. Acad. Sci. USA

Top fraction: 0.78; bottom fraction: 0.30

Our results concurred with experimental findings by Brown and Herschlag’s groups.

Moreover: mRNAs not associated with any polysomes have even less of a correlation (r = 0.2), i.e. very strong translational control
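The top-versus-bottom comparison reported above can be sketched as follows: genes are ranked by a feature such as ribosomal occupancy, and the mRNA-protein correlation is computed separately within the highest and lowest fractions. The fraction size, function names and data are hypothetical choices for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def top_bottom_correlation(records, fraction=0.25):
    """records: list of (feature, log_mrna, log_protein) tuples.
    Returns (r_top, r_bottom) for the top and bottom fractions by feature."""
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    k = max(3, int(len(ranked) * fraction))
    if 2 * k > len(ranked):
        raise ValueError("not enough records for two disjoint fractions")
    top, bottom = ranked[:k], ranked[-k:]
    r_top = pearson_r([m for _, m, _ in top], [p for _, _, p in top])
    r_bottom = pearson_r([m for _, m, _ in bottom], [p for _, _, p in bottom])
    return r_top, r_bottom


# Synthetic records: high-feature genes track mRNA, low-feature genes do not.
records = [
    (8, 0.0, 1.0), (7, 1.0, 2.0), (6, 2.0, 3.0), (5, 3.0, 4.0),
    (4, 0.0, 1.0), (3, 1.0, 3.0), (2, 2.0, 0.0), (1, 3.0, 2.0),
]

r_top, r_bottom = top_bottom_correlation(records, fraction=0.5)
```

The same routine applies unchanged to the other ranking features in the figure, since only the first element of each record changes.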

Variability of mRNA expression

[Figure: bar chart of mRNA-protein correlation (0 to 1) for the top and bottom fractions of genes ranked by coefficient of variation; inset: mRNA expression over time (0 to 40)]

mRNA Expression Variability

Top fraction: 0.89; bottom fraction: 0.20
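The coefficient of variation used to rank genes by expression variability is the standard deviation divided by the mean. A minimal sketch, with hypothetical gene names and time courses:

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation undefined")
    return pstdev(values) / m


# Hypothetical mRNA time courses for two genes.
timecourses = {
    "steady_gene": [10.0, 10.0, 10.0, 10.0],
    "variable_gene": [5.0, 15.0, 5.0, 15.0],
}

cv = {gene: coefficient_of_variation(v) for gene, v in timecourses.items()}
# Genes ordered from most to least variable.
ranked = sorted(cv, key=cv.get, reverse=True)
```

Because it is normalized by the mean, this measure allows highly and lowly expressed genes to be ranked on the same scale.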


Codon Adaptation Index

[Figure: bar chart of mRNA-protein correlation (0 to 0.6) for the top and bottom fractions of genes ranked by CAI]

Codon Usage

Top fraction: 0.48; bottom fraction: 0.02

Concurs with experimental data: CAI does not predict mRNA and protein the same way, which has been shown to be the result of different levels of degradation

Another summary

Newer, larger data set

Looking at broad categories

I. Post-translational modifications? Where we expect PT control --> low r. Where we don’t expect it --> high r
– Occupancy
– Variability

II. Protein degradation? CAI

III. Experimental error? Next section

Expression and interactions

Types of protein-protein interactions
– Protein complexes
• For example: proteasome, ribosome
– Aggregated interactions
• Yeast two-hybrid (Y2H)
• Genetic/physical interactions from MIPS

Relationship of P-P-interactions to abs. expression level

[Equation: a pairwise measure D_ij computed from the expression levels E_i and E_j of proteins i and j]

Similar protein results

Protein-Protein Interactions & Expression Correlations

[Figure: correlations between selected expression timecourses (Cell Cycle CDC28 expt., Davis) for sets of interactions. Labels: all pairs (control); strong interactions in permanent complexes (clearly different); pairwise interactions (from MIPS; Uetz et al.)]
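The comparison in this figure, co-expression among interacting proteins versus all pairs as a control, can be sketched as a mean pairwise correlation between expression time courses. Gene names and profiles below are synthetic, and averaging the pairwise Pearson r is one possible summary assumed for this sketch rather than the talk's stated method.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient (profiles must not be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(profiles, genes):
    """Average Pearson r over all pairs of the given genes."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        raise ValueError("need at least two genes")
    total = sum(pearson_r(profiles[a], profiles[b]) for a, b in pairs)
    return total / len(pairs)


# Synthetic expression time courses.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}

complex_r = mean_pairwise_correlation(profiles, ["g1", "g2"])  # one "complex"
control_r = mean_pairwise_correlation(profiles, profiles)      # all pairs
```

A complex whose `complex_r` is well above `control_r` behaves like the permanent complexes described in the talk.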


Permanent vs. Transient Complexes

[Figure: scatter plot with axes "CC" and "Rosetta" (about -0.2 to 1); series: transient, permanent; labeled points: L Ribosome, S Ribosome, SAGA]

Representing Expression Correlations within a Large Complex in a Matrix

[Figure: correlation matrix with rows and columns MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1; scale: correlation]

Permanent? Transient?

[Figure: correlation matrix]

L7/L12

[Figure: correlation matrix]

Cell degrades all excess ribosomal proteins, except L7 & L12

Expression Correlations Segment Large Replication Complex into Component Parts

[Figure: correlation matrix over MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1; block labels: MCM prots., Polym. &, ORC; annotation: temporally transient]

Proteasome

No distinction visible between components: indicative of the possibility that the two components are really one?

Division is an artifact of their discovery (M. Hochstrasser)

Correlations: overall 0.43; 20S 0.50; 19S 0.51

% ORFs in complexes with significant correlation

| Complex (> 2 ORFs, P < 0.001) | n | alpha | Cdc15 | Cdc28 | Rosetta |
|---|---|---|---|---|---|
| Alpha, al-treh. anchor (50) | 4 | | | 75% | 75% |
| Calcineurin B (100) | 3 | 67% | | 67% | |
| Chaperone containing T-complex TRiC (130) | 8 | 50% | | 25% | |
| Pho85p (133.20) | 6 | | | 33% | |
| Glycine decarboxylase (200) | 3 | | 67% | | |
| ATPase (210) | 4 | | 100% | 50% | |
| TRAPP (260.60) | 10 | 40% | | | |
| Vps4p ATPase (260.70) | 3 | | 67% | | |
| Nucleosome protein (320) | 8 | 100% | 87% | 37% | 75% |
| Cytochrome bc1 complex (420.30) | 9 | | 44% | 78% | 78% |
| Cytochrome c oxidase (420.40) | 8 | 50% | 38% | 88% | 50% |
| F0/F1 ATP synthase (complex V) (420.5) | 15 | | | | 60% |
| Ribonucleoside reductase (430) | 4 | 50% | | | |
| Nuclear processing (440.10.10) | 5 | | 40% | | |
| RNA polymerase I (510.10) | 8 | 38% | 38% | | 50% |
| RNA polymerase II (510.40.10) | 9 | 44% | | | |

Tornow & Mewes NAR 2003

Average Expression of all subunits in a complex

$$y = 3028.4\,x^{1.0635}, \qquad R^2 = 0.6076$$

[Figure: protein abundance (1 to 10,000,000) vs. mRNA expression (×10³, 0.1 to 100), log-log axes, with the power-law fit above]
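A power-law trend of the form y = a·x^b, such as the one reported above, is commonly obtained by a straight-line fit in log-log space. The sketch below shows that approach on synthetic data; whether the slide's fit was produced this way is an assumption, and the function name is hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on log10 values.
    Returns (a, b, r_squared), with r_squared measured in log space."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx
    log_a = my - b * mx
    ss_res = sum((y - (b * x + log_a)) ** 2 for x, y in zip(lx, ly))
    ss_tot = sum((y - my) ** 2 for y in ly)
    return 10 ** log_a, b, 1.0 - ss_res / ss_tot


# Synthetic complex-averaged values lying exactly on y = 3000 * x.
mrna_avg = [0.1, 1.0, 10.0, 100.0]
protein_avg = [3000.0 * x for x in mrna_avg]

a, b, r_squared = fit_power_law(mrna_avg, protein_avg)
```

An exponent b close to 1, as in the reported fit, corresponds to protein abundance that is roughly proportional to mRNA expression.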

PP INT Summary

Complexes: broad categories minimize noise
– Permanent complexes show strong co-expression. Posttranscriptional regulation functions at a whole-complex level (Washburn et al PNAS 2003)
– Transient complexes have weaker co-expression

Aggregated BINARY interactions (Y2H, physical, genetic): weak co-expression similar to transient complexes -- noisy data?

ERROR? Minimized in larger groups


Global Summary

mRNA expression is related to protein abundance

Broad categories minimize noise that prevents us from seeing this correlation

Integrating various genomic data is integral to an analysis

Biologically relevant results can be seen when looking at mRNA and protein populations

Future Research

Further in-depth analysis of protein degradation

Integrate new Tap Tagging data into protein abundance ref set

More intensive modeling of the relationship between mRNA and protein

Relationship between mRNA and Protein levels

$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$

where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively, and $\mu$ is the growth rate

At steady state:

$$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$

N end rule / PEST?

N End Rule in Yeast

[Figure: in vivo half-life (min), log axis 1 to 10,000, by N-terminal amino acid: Arg, Lys, Phe, Leu, Trp, Asn, His, Asp, Gln, Tyr, Ile, Glu, Cys, Ala, Ser, Thr, Gly, Val, Pro, Met; groups: fast decay, slow decay]

Results of protein degradation

Significantly higher correlation for fast-decaying proteins

Not for slow decay

A high decay rate is indicative of greater cellular control over the protein level; e.g. for proteins with half-lives of days, the cell can’t tightly control the level

Results are the same for mRNA degradation -- half-lives have been quantified
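The link between decay rate and control can be made concrete with the first-order model from earlier in the talk, dP/dt = k_s·mRNA − k_d·P. After a step change in mRNA, the fraction of the shift to the new steady state completed by time t is 1 − exp(−k_d·t). The half-lives below are hypothetical examples, not measured values.

```python
import math


def degradation_constant(half_life):
    """First-order degradation constant k_d from a half-life (same time unit)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


def fraction_of_shift_completed(k_d, t):
    """Fraction of the move to the new steady state completed after time t,
    following a step change in mRNA level."""
    return 1.0 - math.exp(-k_d * t)


# Hypothetical half-lives in minutes: a fast-decaying and a very stable protein.
fast_k_d = degradation_constant(10.0)
slow_k_d = degradation_constant(2 * 24 * 60.0)  # two days

fast_after_hour = fraction_of_shift_completed(fast_k_d, 60.0)
slow_after_hour = fraction_of_shift_completed(slow_k_d, 60.0)
```

In this sketch the fast-decaying protein has almost fully tracked the mRNA change after an hour, while the stable protein has barely moved, which is consistent with the higher correlation reported for fast-decaying proteins.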

Acknowledgments

Gerstein Lab

This work: Ronald Jansen (MSKCC), Yuval Kluger (NYU)

Other projects: Haiyuan Yu, Hedi Hegyi, Jimmy Lin, Rajdeep Das, Jiang Qian, Nick Luscombe

Entire Gerstein Lab

Weissman Lab: Zheng Lian

Keck (HHMI Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource Laboratory): Christopher Colangelo, Ken Williams

Thesis Committee: Mark Gerstein, Sherman Weissman, Kevin White

Genetics Department

SABRINA


Liana