Correlating mRNA and protein abundance via genomic and proteomic characteristics


Dov Greenbaum, Gerstein Lab Thesis Seminar, April 21, 2004.

Page 2: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Correlating mRNA and protein abundance via genomic and proteomic characteristics

Dov Greenbaum

Gerstein Lab Thesis Seminar

April 21, 2004

Page 3: Correlating mRNA and protein abundance via genomic and proteomic characteristics

outline

Why analyze mRNA and protein correlations
Background
Disparate Data Sources
Correlating mRNA and Protein
Results
Other analyses
Formalism – comparing genome, transcriptome and proteome in terms of broad categories
New Data Sets
Analysis via Broad Categories
Analysis of factors affecting correlations
Another reason to expect correlations
Expression and Protein Interactions

Page 4: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Why Correlate mRNA & Protein?

[Figure: chart of experiments, mRNA vs. protein; axis 0 to 5000]

Page 5: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Both mRNA and Protein Levels are necessary for complete analysis

Combinations of RNA and protein detection approaches have recently aided in the identification of biomarkers in cancer (Hegde et al., Current Opinion in Biotech 2003)

Shown mathematically in Hatzimanikatis et al Biotechnology 1999

Page 6: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Relationship between mRNA and Protein levels

$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$

where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively.

At steady state:

$$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$
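The synthesis/degradation model on this slide can be checked numerically. The following is a minimal sketch, not part of the talk: the function names and all parameter values are illustrative assumptions. It integrates the rate equation with a simple forward-Euler step and compares the result with the closed-form steady state.

```python
def steady_state_protein(mrna, k_s, k_d):
    """Closed-form steady state: P = k_s * mRNA / k_d."""
    return k_s * mrna / k_d


def simulate_protein(mrna, k_s, k_d, p0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dP/dt = k_s * mRNA - k_d * P."""
    p = p0
    for _ in range(steps):
        p += dt * (k_s * mrna - k_d * p)
    return p


# Illustrative values only (assumptions, not data from the talk).
mrna, k_s, k_d = 10.0, 2.0, 0.5
p_star = steady_state_protein(mrna, k_s, k_d)
p_sim = simulate_protein(mrna, k_s, k_d)

# Same mRNA level, 20-fold different k_s / k_d ratio: 20-fold different protein.
p_other = steady_state_protein(mrna, k_s * 20.0, k_d)
```

Under this model, two genes with identical mRNA levels can differ in protein level by exactly the ratio of their `k_s / k_d` values, which is one way to read the spread in protein levels at equal mRNA reported later in the talk.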

Page 7: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Methods for determining mRNA expression

Each has strengths and weaknesses

Page 8: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Methods for determining protein abundance

2D Gel Electrophoresis (2DE)
– (Klose, 1975; O’Farrell, 1975)
• Multiple staining options
• Small dynamic range
• Limited in what it can detect

Page 9: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Methods for determining protein abundance

ICAT
– ICAT reagent: relative levels
– Very big dynamic range
– Cannot detect post-translational modifications
– Requires proteins to contain cysteine residues, and these residues must be in the region of a peptide that is produced during proteolytic cleavage

Page 10: Correlating mRNA and protein abundance via genomic and proteomic characteristics

MudPit

Really the only high-throughput (HT) method that can detect post-translational (PT) modifications

Page 11: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Other Methods for determining protein abundance

DIGE
– e.g. Cy3 vs Cy5 labeling
– Very big dynamic range

2D-electrophoresis

Tap Tagging: Weissman & O’Shea (Oct 2003)

Page 12: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Other Methods for determining protein abundance

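[Figure: two bar charts comparing methods. First panel (axis 0 to 80000, labeled "Max"): 2DE, DIGE, ICAT, MudPit, Tap, Affy. Second panel (axis 0 to 4000, labeled "Max Prot"): 2DE, DIGE, ICAT, MudPit, TAP.]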

Page 13: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Same mRNA levels yet protein data varied > 20X

N ~100, r = 0.9

Protein Quantification via measurement of radioactivity

Gygi et al., Molecular and Cellular Biology, 1999

Page 14: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Same mRNA levels yet protein data varied > 20X

Do some ORFs bias the results?

73 proteins (69%) R = 0.356
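The question of whether a few ORFs bias the result can be illustrated with a small sketch. The numbers below are hypothetical, chosen only to show the effect: a handful of very abundant genes can push the Pearson coefficient close to 1 even when the remaining genes show little relationship. The helper name `pearson` is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: five low-abundance genes with a weak trend,
# plus two very abundant genes.
mrna = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0, 200.0]
protein = [30.0, 10.0, 50.0, 20.0, 40.0, 1000.0, 2500.0]

r_all = pearson(mrna, protein)            # dominated by the two abundant genes
r_low = pearson(mrna[:5], protein[:5])    # low-abundance genes only
r_log = pearson([math.log10(v) for v in mrna],
                [math.log10(v) for v in protein])  # log scale damps the effect
```

Comparing `r_all` with `r_low` mirrors the contrast on these slides between the correlation over all proteins and the much weaker correlation over the lower-abundance subset.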

Page 15: Correlating mRNA and protein abundance via genomic and proteomic characteristics

mRNA vs Protein

r = 0.74

Protein Quantification via image analysis

Futcher et al Molecular and Cellular Biology, 1999

Page 16: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Jury is out…

Gygi et al: “This study revealed that transcript levels provide little predictive value with respect to the extent of protein expression.”

Futcher et al: “there is a good correlation between protein abundance and mRNA abundance for the proteins that we have studied”.

Page 17: Correlating mRNA and protein abundance via genomic and proteomic characteristics

mRNA vs Protein

r = 0.67

Greenbaum et al Bioinformatics 2001

Page 18: Correlating mRNA and protein abundance via genomic and proteomic characteristics

3 Genes in Lung Adenocarcinomas

Op18, Annexin IV, and GAPD: r = 0.025

Chen et al Molecular & Cellular Proteomics, 2002.

Page 20: Correlating mRNA and protein abundance via genomic and proteomic characteristics

murine hematopoietic precursor MPRO

change in expression 0 - 72 hr

R = 0.58

~80% of the genes are located in the first and third quadrants

Page 21: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Ratios of wt +gal to wt −gal: ICAT vs microarray

N ~ 290, r = 0.6

Ideker et al Science, 2001

Page 22: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Yeast growth under two different media

r = 0.45, but almost 1.0 for same loci in same pathway

Washburn et al PNAS 2003

Page 23: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Integrating multiple sources of Information

"The challenge for computational biology is to provide methodologies for transforming high-throughput heterogeneous data sets into biological insights about the underlying mechanisms. Although high-throughput assays provide a global picture, the details are often noisy, hence conclusions should be supported by several types of observations. Integration of data from assays that examine cellular systems from different viewpoints (for instance, gene expression and protein-protein interactions) can lead to a more coherent reconstruction and reduce the effects of noise."

Nir Friedman, Science 2004

Page 24: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Sources of Data

Data set | Description | Size [ORFs] | Reference

mRNA expression

Young | Gene chip profiles yeast cells with mutations that affect transcription | 5455 | Holstege et al. (1998)
Church | Gene chip profiles of yeast cells under four different conditions | 6263 | Roth et al. (1998)
Samson | Comparing gene chip profiles for yeast cells subjected to alkylating agent | 6090 | Jelinsky et al. (1998)
SAGE | Yeast cells during vegetative growth | 3778 | Velculescu et al. (1997)
Reference expression | Scaling and integrating the mRNA expression set into one data source | 6249 | -

Protein abundance

2-DE #1 | Measurement of yeast protein abundance by two-dimensional (2D) gel electrophoresis and mass spectrometry | 156 | Gygi et al. (1999)
2-DE #2 | Similar to 2-DE set #1 | 71 | Futcher et al. (1999)
Transposon | Large-scale fusions of yeast genes with lacZ by transposon insertion | 1410 | Ross-Macdonald et al. (1999)
Reference abundance | Scaling and integrating the 2-DE data sets into one data source | 181 | -

Annotation

Annotated Localization | Subcellular localizations of yeast proteins | 2133 (6280) | Drawid et al. (2000)
Transmembrane segments | Predicted transmembrane and soluble proteins in yeast | 2710 (6280) | Gerstein (1998)
MIPS functions | Functional categories for yeast ORFs | 3519 (6194) | Mewes et al. (2000)
GOR secondary structure | Predicted secondary structure for yeast ORFs | 6280 | Gerstein (1998)

Page 25: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Reference mRNA Sets

Young

Church

Samson

SAGE

Page 26: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Fitting Protein Data

Original Set

Page 27: Correlating mRNA and protein abundance via genomic and proteomic characteristics

mRNA vs Protein

r = 0.67

Greenbaum et al Bioinformatics 2001

mRNA expression reference set: 3 Affy chip sets and SAGE, 6249 ORFs

Page 28: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Outliers (2 STDEV from the mean)

ORF | Function | MIPS
YBR118W | translation elongation factor eEF1 alpha-A chain | 5, 30
YER065C | Isocitrate Lyase | 1, 2, 30
YMR303C | Alcohol dehydrogenase II | 1, 2, 30
YOL086C | Alcohol dehydrogenase I | 1, 2, 30
YJR009C | Glyceraldehyde-3-phosphate dehydrogenase 2 | 1, 2, 30
YGR192C | Glyceraldehyde-3-phosphate dehydrogenase 3 | 1, 2, 30
YJR104C | Copper-zinc superoxide dismutase | 11, 30
YML054C | lactate dehydrogenase cytochrome b2 | 1, 2, 30
YJL052W | glyceraldehyde-3-phosphate dehydrogenase 1 | 1, 2, 30
YKR059W | Translation initiation factor | 5, 30
YML008C | S-adenosyl-methionine delta-24-sterol-c-methyltransferase | 1, 30
YFL022C | Phenylalanine--tRNA Ligase beta chain | 5, 30
YJL008C | Component of chaperonin-containing T-complex | 6, 30
YPL160W | leucine--tRNA ligase | 5, 30
YOR361C | translation initiation factor eIF3 subunit | 3, 5, 30
YCL030C | phosphoribosyl-AMP cyclohydrolase | 1
YNL209W | heat shock protein of HSP70 family | 5, 30

Row groups on the slide are labeled "above trendline" and "below trendline".

High Protein: Metabolism (1), Energy (2)

Low Protein: Prot. Syn. (5), Prot. Fate (6)
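One way to flag outliers of this kind is sketched below. It is an assumption-laden illustration, not the procedure used in the talk: it fits a straight trendline to log10(protein) against log10(mRNA) by ordinary least squares and reports points whose residual exceeds two standard deviations. The function names and the choice of log scale are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def trendline_outliers(mrna, protein, n_sd=2.0):
    """Return (index, 'above' | 'below') for points whose log-log residual
    from the fitted trendline exceeds n_sd standard deviations."""
    lx = [math.log10(v) for v in mrna]
    ly = [math.log10(v) for v in protein]
    slope, intercept = fit_line(lx, ly)
    resid = [y - (slope * x + intercept) for x, y in zip(lx, ly)]
    # OLS residuals have zero mean, so this is their standard deviation.
    sd = math.sqrt(sum(r * r for r in resid) / len(resid))
    return [(i, "above" if r > 0 else "below")
            for i, r in enumerate(resid) if abs(r) > n_sd * sd]


# Hypothetical data: protein = 100 * mRNA, except one gene with 1000x more protein.
mrna = [float(i) for i in range(1, 21)]
protein = [100.0 * m for m in mrna]
protein[10] *= 1000.0
flagged = trendline_outliers(mrna, protein)
```

Points flagged "above" would correspond to relatively high protein for their mRNA level, and "below" to relatively low protein.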

Page 29: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Later larger datasets concurred with these results in that Generally…

[Figure: log-log scatter plot of protein (1 to 10000000) vs. mRNA (0.1 to 1000)]

Alcohol dehydrogenase is also a stress induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994), Faster Ramp Up?

AA metabolism & Energy are 2X as likely to have high protein vs mRNA than the general population

Protein synthesis (~35% of all protein synthesis genes) and Protein fate (folding, modification, destination) are more likely to have low protein vs mRNA than the general population

Page 30: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Non-Outliers Generally…Tight Regulation by the cell

Only 3% of transcription associated genes (n = 441) have significantly uncorrelated mRNA and protein levels (2STDEV from trendline)

Transcription Assoc. genes are 25% of the essential genes in yeast.

Essential Genes as a group have higher correlations than the general yeast population

7% of Cell Cycle associated genes (n = 432) have significant non-correlation

Page 31: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Quick Summary

• Why correlate mRNA and protein levels?
• Merged Disparate Data Sets
– Distinct but complementary
• Global Correlations
• Outliers are interesting:
– Metabolism & Energy: relatively high protein levels
– Protein Synthesis & Protein Fate: low protein levels

Page 32: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Data Set Size

~6,000 ORFs

~6,000 ORFs: 5 Affymetrix GeneChips + SAGE data

~170 ORFs: 2 DE-gel datasets

Page 33: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichments

$$\Delta(F,[v,S],[w,G]) = \frac{\mu(F,[v,S]) - \mu(F,[w,G])}{\mu(F,[w,G])}$$

where F is the feature, and v and w are the weights (expression levels) of sets S and G
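A minimal sketch of the enrichment calculation follows. It assumes that each bracketed term, such as (F,[v,S]), denotes the weighted average of feature F over the ORFs in a set, with the expression levels as weights; that reading, the function names, and the example numbers are assumptions of this sketch rather than definitions taken from the slide.

```python
def weighted_mean(feature, weights):
    """Weighted average of a per-ORF feature value."""
    return sum(f * w for f, w in zip(feature, weights)) / sum(weights)


def enrichment(feature_s, v, feature_g, w):
    """Relative difference between the weighted feature average in set S
    (weights v) and in set G (weights w)."""
    mu_s = weighted_mean(feature_s, v)
    mu_g = weighted_mean(feature_g, w)
    return (mu_s - mu_g) / mu_g


# Hypothetical example: feature = ORF length for three ORFs.
lengths = [300.0, 600.0, 900.0]
genome_weights = [1.0, 1.0, 1.0]        # every ORF counted once
expression_weights = [10.0, 5.0, 1.0]   # short ORFs more highly expressed

delta = enrichment(lengths, expression_weights, lengths, genome_weights)
```

A negative value here means the expression-weighted population is depleted in the feature (in this toy case, ORF length) relative to the unweighted genome.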

Page 34: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Visual Formalism

~170 ORFs ~6,000 ORFs

Page 35: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Depletion of Random Coil Secondary Structure STABILITY

Concurrence with data from Perczel et al., Chemistry 2003, regarding stability of specific secondary structures

Page 36: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of Amino Acids STABILITY

Alanines, glycines, and valines result in more compact structures. More compact = more stable (e.g. thermophilic enzymes tend to be very compact)

Page 37: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of Amino Acids

Simple story: translatome is enriched in same way as transcriptome

Page 38: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of Molecular Weights/Biomass

Abundant proteins are smaller = reduces cost

Yeast cell favors the expression of shorter ORFs over longer ones (as opposed to long lightweight ORFs – see MW of aa)

This selection is happening, for the most part, at the transcriptome level

Neg. correlation between ORF length and mRNA expression, Jansen & Gerstein 2000 (and to a lesser degree with protein abundance)

Effect of transcription

Page 39: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of Molecular Weights/Biomass

Abundant proteins are smaller = reduces cost

CONCURS with experimental results from Akashi, Genetics 2003

See also: Akashi, Genetics 1996 & Moriyama and Powell, NAR 1998

Hypothesize that this trend exists in S. cerevisiae, D. melanogaster and E. coli (although probably not in C. elegans)

Effect of transcription

Page 40: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of Functional Categories

[Figure: log-log scatter plot of protein (1 to 10000000) vs. mRNA (0.1 to 1000)]

Page 42: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Enrichment of localization - BIAS?

(Drawid & Gerstein, 2000)

Page 43: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Review

Formalism

Different gene sets b/c of limited data

Enrichments

concur with experimental results

Page 44: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Fitting Protein Data

Newer Set

MudPit fit first into mRNA space, then inverse fit back into protein space; then each of the data sets is fit via least squares onto the Aebersold data set

 | Aebersold | Futcher | Reference | Yates | Gygi | mRNA
Aebersold | 125 | 29 | 113 | 102 | 116 | 125
Futcher | | 73 | 61 | 56 | 64 | 69
Reference | | | 150 | 143 | 128 | 150
Yates | | | | 1436 | 785 | 1346
Gygi | | | | | 1504 | 1480
mRNA | | | | | | 6250
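The least-squares step can be sketched as follows. This is only an illustration under stated assumptions: the fit is done on log10 values over the ORFs two data sets share, the fitted line is then applied to every ORF in the data set being rescaled, and the ORF identifiers and values are made up. Whether the original fit used a log or a linear scale is not stated on the slide.

```python
import math


def shared_orfs(a, b):
    """ORF identifiers present in both data sets (dicts keyed by ORF)."""
    return sorted(set(a) & set(b))


def fit_onto_reference(data, reference):
    """Least-squares fit log10(reference) = a * log10(data) + b over shared
    ORFs; returns the whole data set rescaled onto the reference scale."""
    shared = shared_orfs(data, reference)
    xs = [math.log10(data[o]) for o in shared]
    ys = [math.log10(reference[o]) for o in shared]
    n = len(shared)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return {orf: 10 ** (a * math.log10(v) + b) for orf, v in data.items()}


# Hypothetical data sets: 'data' is on a scale 10x lower than 'reference'.
reference = {"ORF_A": 10.0, "ORF_B": 100.0, "ORF_C": 1000.0}
data = {"ORF_A": 1.0, "ORF_B": 10.0, "ORF_C": 100.0, "ORF_D": 1000.0}

common = shared_orfs(data, reference)
rescaled = fit_onto_reference(data, reference)
```

Counting `shared_orfs` for every pair of data sets yields an overlap matrix of the same shape as the table on this slide.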

Page 49: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Global Correlation

[Figure: log-log scatter plot of protein abundance (0.1 to 1000) vs. mRNA expression (0.1 to 1000); series: MudPit (1), MudPit (2), 2DE (1), 2DE (2); R = 0.66]

mRNA set: 6249 ORFs

Protein set #2: 2 2DE sets & 2 MudPit sets, ~2000 ORFs

Page 50: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Functional Categories

[Figure: log-log scatter plot of protein abundance (0.1 to 1000) vs. mRNA expression (0.1 to 100); series: Cell Cycle (R=0.71), Reference Data (R=0.66), Cell Rescue (R=0.45)]

Co-regulated proteins

High: ion transport, interaction with the cellular environment, cell fate

Low: metabolism, fate, cellular communication/signal transduction mechanism

Page 51: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Subcellular Localization

[Figure: log-log scatter plot of protein abundance (0.1 to 100) vs. mRNA expression (0.1 to 100); series: Nucleolus (R=0.8), Cell Periphery (R=0.74), Reference Data (R=0.66), Mitochondria (R=0.42)]

MudPit does not have the 2DE biases

Lack of correlation in mitochondria concurs with experimental results from Ohlmeier S et al., JBC 2004

Page 52: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Bud: r = 0.76
Golgi: r = 0.28
Extracellular: r = 0.33
Nucleus: r = 0.49
Cytoplasm: r = 0.50
Mitochondria: r = 0.50
Cell Wall: r = 0.52
Endosome: r = 0.87
ER: r = 0.61
Membrane: r = 0.73
P M

r global = 0.46

Expression as a function of localization is well correlated with protein levels (latest data)

Page 53: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Why would we not find strong correlations?

Post translational modifications

Protein degradation

Error and Bias

Page 54: Correlating mRNA and protein abundance via genomic and proteomic characteristics

[Figure: bar chart of correlation (0 to 1) for top and bottom fractions, grouped by Occupancy, CAI, and Coefficient of Variation]

Ribosomal Occupancy

Arava et al. (2003) Proc. Natl. Acad. Sci. USA

Top Frac. 0.78
Bot. Frac. 0.30

Our results concurred with experimental findings by Brown and Herschlag’s groups.

Moreover: mRNAs not associated with any polysomes have even less of a correlation (r = 0.2): very strong translational control
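The top-fraction versus bottom-fraction comparison can be sketched as below. The fraction size, the use of log10 values, and all names and numbers are assumptions made for illustration; the slide does not state how the fractions were chosen.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_correlations(rows, fraction=0.25):
    """rows: list of (feature, mrna, protein) tuples.
    Rank ORFs by the feature (e.g. ribosomal occupancy) and return the
    mRNA-protein correlation in the top and bottom fractions."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    k = min(len(ranked), max(3, int(len(ranked) * fraction)))

    def r_of(subset):
        return pearson([math.log10(m) for _, m, _ in subset],
                       [math.log10(p) for _, _, p in subset])

    return r_of(ranked[:k]), r_of(ranked[-k:])


# Hypothetical rows: high-feature ORFs track mRNA, low-feature ORFs do not.
rows = [
    (8, 1.0, 10.0), (7, 10.0, 100.0), (6, 100.0, 1000.0), (5, 1000.0, 10000.0),
    (4, 1.0, 1000.0), (3, 10.0, 10.0), (2, 100.0, 100.0), (1, 1000.0, 1.0),
]
r_top, r_bottom = split_correlations(rows, fraction=0.5)
```

The same function applies unchanged to any per-ORF ranking feature, such as occupancy, CAI, or the coefficient of variation of mRNA expression.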

Page 55: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Variability of mRNA expression

[Figure: bar chart of correlation (0 to 1) for top and bottom fractions by Coefficient of Variation; inset: mRNA expression over time (0 to 40)]

mRNA Expression Variability

Top Frac. 0.89
Bot. Frac. 0.20

Page 57: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Codon Adaptation Index

[Figure: bar chart of correlation (0 to 0.6) for top and bottom fractions by CAI]

Codon Usage

Top Frac. 0.48
Bot. Frac. 0.02

Concurs with experimental data: CAI does not predict mRNA and protein the same way; shown to be the result of different levels of degradation

Page 58: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Another summary

Newer, larger data set

Looking at Broad Categories

I. Post-translational modifications? Where we expect PT control --> low r; where we don’t expect it --> high r
Occupancy, Variability

II. Protein Degradation?
CAI

III. Experimental Error?
next section

Page 59: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Expression and interactions

Types of protein-protein interactions
– Protein complexes
• For example: proteasome, ribosome
– Aggregated interactions
• Yeast two-hybrid (Y2H)
• Genetic/physical interactions from MIPS

Page 60: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Relationship of P-P-interactions to abs. expression level

$$D_{ij} = \frac{|E_i - E_j|}{E_i + E_j}$$

similar protein results

Page 61: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Protein-Protein Interactions & Expression Correlations

[Figure: distributions of correlations between selected expression timecourses (Cell Cycle CDC28 expt., Davis) for sets of interactions: all pairs (control); pairwise interactions (from MIPS; Uetz et al.); strong interactions in permanent complexes (clearly diff.)]

Page 63: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Permanent vs. Transient Complexes

[Figure: scatter plot with axes CC and Rosetta (each about -0.2 to 1); series: transient, permanent; labeled points: L Ribosome, S Ribosome, SAGA]

Page 64: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Representing Expression Correlations within a Large Complex in a Matrix

[Figure: correlation matrix with rows and columns MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1; scale: correlation]
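A matrix of this kind can be built by correlating the expression timecourses of every pair of subunits. The sketch below uses placeholder subunit names and made-up timecourses; it is an illustration of the idea, not the data behind the slide.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """profiles: {subunit: [expression at each time point]}.
    Returns (subunit names, square matrix of pairwise Pearson r)."""
    names = list(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical timecourses for three subunits of one complex.
profiles = {
    "subunit_A": [1.0, 2.0, 3.0, 4.0],
    "subunit_B": [2.0, 4.0, 6.0, 8.0],   # co-expressed with A
    "subunit_C": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with A
}
names, matrix = correlation_matrix(profiles)
```

Blocks of high values along the diagonal of such a matrix are what would let a large complex be segmented into co-expressed components.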

Page 65: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Permanent? Transient?

correlation

Page 66: Correlating mRNA and protein abundance via genomic and proteomic characteristics

L7/L12

correlation

Cell degrades all excess ribosomal proteins, except L7 & L12

Page 67: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Expression Correlations Segment Large Replication Complex into Component Parts

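[Figure: correlation matrix for MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1, segmented into blocks labeled "MCM prots.", "ORC", "Polym. &", and "Temporally transient"]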

Page 68: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Proteasome

No distinction visible between components; indicative of the possibility that the two components are really one?

Division is an artifact of their discovery (M Hochstrasser)

Correlations: Overall .43, 20S .50, 19S .51

Page 69: Correlating mRNA and protein abundance via genomic and proteomic characteristics

%ORFs in complexes with significant correlation

Complex (> 2 ORFs, P < 0.001) | n | alpha | Cdc15 | Cdc28 | Rosetta
Alpha, al-treh. anchor (50) | 4 | | | 75% | 75%
Calcineurin B (100) | 3 | 67% | | 67% |
Chaperone containing T-complex TRiC (130) | 8 | 50% | | 25% |
Pho85p (133.20) | 6 | | | 33% |
Glycine decarboxylase (200) | 3 | | 67% | |
ATPase (210) | 4 | | 100% | 50% |
TRAPP (260.60) | 10 | 40% | | |
Vps4p ATPase (260.70) | 3 | | 67% | |
Nucleosome protein (320) | 8 | 100% | 87% | 37% | 75%
Cytochrome bc1 complex (420.30) | 9 | | 44% | 78% | 78%
Cytochrome c oxidase (420.40) | 8 | 50% | 38% | 88% | 50%
F0/F1 ATP synthase (complex V) (420.5) | 15 | | | | 60%
Ribonucleoside reductase (430) | 4 | 50% | | |
Nuclear processing (440.10.10) | 5 | | 40% | |
RNA polymerase I (510.10) | 8 | 38% | 38% | | 50%
RNA polymerase II (510.40.10) | 9 | 44% | | |

Tornow & Mewes NAR 2003

Page 70: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Average Expression of all subunits in a complex

$$y = 3028.4\,x^{1.0635}, \qquad R^2 = 0.6076$$

[Figure: log-log scatter plot of protein abundance (1 to 10000000) vs. mRNA expression (x10^3, 0.1 to 100)]
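A power-law trendline of the form y = a·x^b, like the one quoted on this slide, can be obtained by fitting a straight line to the log-transformed values. The sketch below assumes that approach and computes R² in log space; both are assumptions, since the slide does not say how the fit was made. The synthetic points are generated from the quoted coefficients purely to check that the fit recovers them.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log10 x, log10 y).
    Returns (a, b, r_squared), with r_squared computed in log space."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    syy = sum((y - my) ** 2 for y in ly)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    b = sxy / sxx
    a = 10 ** (my - b * mx)
    return a, b, (sxy * sxy) / (sxx * syy)


# Synthetic, noise-free points generated from the coefficients on the slide.
xs = [0.1, 1.0, 10.0, 100.0]
ys = [3028.4 * x ** 1.0635 for x in xs]
a, b, r2 = fit_power_law(xs, ys)
```

An exponent close to 1, as quoted on the slide, corresponds to protein abundance scaling roughly in proportion to mRNA expression when averaged over a complex.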

Page 71: Correlating mRNA and protein abundance via genomic and proteomic characteristics

PP INT Summary

Complexes: broad categories minimize noise

– Permanent complexes show strong co-expression. Posttranscriptional regulation functions at a whole-complex level (Washburn et al PNAS 2003)

– Transient complexes have weaker co-expression

Aggregated BINARY interactions (Y2H, physical, genetic): weak co-expression similar to transient complexes (noisy data?)

ERROR? Minimized in larger groups

Page 72: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Global Summary

mRNA expression is related to protein abundance

Broad categories minimize noise that prevents us from seeing this correlation

Integrating various genomic data is integral to an analysis

Biologically relevant results can be seen when looking at mRNA and protein populations

Page 73: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Future Research

Further in-depth analysis into protein degradation

Integrate new Tap Tagging data into protein abundance ref set

More intensive modeling of the relationship between mRNA and protein

Page 74: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Relationship between mRNA and Protein levels

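$$\frac{dP_i}{dt} = k_{s,i}\,\mathrm{mRNA}_i - k_{d,i}\,P_i$$

where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants, respectively (a growth rate term is also named, but its symbol is missing here)

At steady state:

$$P_i = \frac{k_{s,i}\,\mathrm{mRNA}_i}{k_{d,i}}$$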

Page 75: Correlating mRNA and protein abundance via genomic and proteomic characteristics

N end rule PEST?

[Figure: bar chart "N End Rule in Yeast": in vivo half-life (min, log scale 1 to 10000) by amino acid (AA), in the order Arg, Lys, Phe, Leu, Trp, Asn, His, Asp, Gln, Tyr, Ile, Glu, Cys, Ala, Ser, Thr, Gly, Val, Pro, Met; regions labeled Fast Decay and Slow Decay]

Page 76: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Results of protein degradation

Significantly higher correlation for fast decaying proteins

Not for slow decay: high decay rate is indicative of greater cellular control over level, e.g. proteins with half lives of days – cell can’t tightly control

Results are same for mRNA degradation; half lives have been quantified
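The link between half-life and control can be made concrete with a short sketch. It assumes simple first-order decay, so that the degradation rate constant is ln 2 divided by the half-life; the half-life values are illustrative, not measurements from the talk.

```python
import math


def degradation_rate(half_life_min):
    """First-order rate constant (per minute) for a given half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(half_life_min, t_min):
    """Fraction of the initial protein pool left after t_min minutes
    if synthesis stops and decay is first order."""
    return math.exp(-degradation_rate(half_life_min) * t_min)


# Illustrative half-lives: a fast-decaying and a slow-decaying protein.
fast = fraction_remaining(half_life_min=2.0, t_min=10.0)
slow = fraction_remaining(half_life_min=1200.0, t_min=10.0)
```

With a short half-life the protein pool responds within minutes to a change in mRNA level, whereas a protein with a half-life of many hours barely changes over the same interval, which is consistent with the tighter mRNA-protein coupling reported for fast-decaying proteins.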

Page 77: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Acknowledgments

Gerstein Lab

This work: Ronald Jansen (MSKCC), Yuval Kluger (NYU)

Other Projects: Haiyuan Yu, Hedi Hegyi, Jimmy Lin, Rajdeep Das, Jiang Qian, Nick Luscombe

Entire Gerstein Lab

Weissman Lab: Zheng Lian

Keck (HHMI Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource Laboratory): Christopher Colangelo, Ken Williams

Thesis Committee: Mark Gerstein, Sherman Weissman, Kevin White

Genetics Department

SABRINA

Page 78: Correlating mRNA and protein abundance via genomic and proteomic characteristics

Liana