Deep proteome and trancriptome mapping of human cervical cancer cell line
-
Upload
jamie-barba -
Category
Technology
-
view
763 -
download
2
Transcript of Deep proteome and trancriptome mapping of human cervical cancer cell line
Deep proteome and
transcriptome mapping
of a human cancer
cell line
BARBA | BIADOMANG |CUA | DAYANAN | LOPEZ
AUTHORS: N. Nagaraj, J. Wisniewski, T. Geiger, J. Cox, M. Kircher, J. Kelso, S. Paabo, M. Mann
JOURNAL: Molecular Systems Biology
DATE PUBLISHED: October 29, 2011
Human genome is comprise of a mere 20,000protein coding genes.
RNA-Seq Transcriptomics
Transcripts between
8000-16000
of protein coding genes
High-res MS-based Proteomics
Essentially complete proteome of model
organism (yeast)
Limited to 4000-6000 protein groups in
mammalian systems
“…explore a human proteome in the depth achievable with current technology and to
compare it with the corresponding transcriptome.”
METHODOLOGY
HeLa Cells Lysate
Flash freezingStore at
-80oC
Cell LysisSonication
CentrifugationProtein content determination
Protein Fractionation by Gel
Filtration
0.1mL cell lysate
Load onto GL Column
Elution with buffer
Protein Digestion and Peptide
Fractionation
Removal of detergent
Protein Digestion
LysC
Gluc
Trypsin
Mass Spectroscopy
PurificationRP C18
ChromatographyMS
RNA sequence
Extraction QuantificationRNA library preparation
Enrichment
FragmentationRNA fragments
copied into DNABlunt ends conversion
Addition of deoxyadenosine
Ligation of forked adaptors
Amplify Sequencing
Data Availability
Gene and Transcript Quantification
Data analysis
RESULTS AND DISCUSSION
Sample: The HeLa cells
• HeLa cells– Human cervical carcinoma cell line
• “Immortal cells”: can grow indefinitely, be frozen for decades
• Standardized field of tissue culture
– Named after Henriette Lacks by a scientist at John Hopkins Hospital• A piece of her tumor was taken
• Her cells never died
– Prolific growth maybe due to HPV18• HPV18 viral proteins (E6 and E7) suppresses p53 and pRb
gene products, respectively.
Proteome Coverage Study
• Objective: to achieve maximum proteome coverage at a reasonable measurement time
– Procedure: Investigate effects of protein fractionation, proteolytic digestion, peptide fractionation, and reverse phase chromatography
Proteome Coverage Study
• Protein fractionation– Gel filtration: separation based on size and shape
• Proteolytic digestion– Trypsin: C-terminal side of lysine and arginine
– Glu-C: C-terminal of glutamic residues
– Lys-C: carboxyl side of lysine residues
– Note: Protein digestion heavily affects effective protein characterization and identification by mass spectrometry• Overlapping fragments = larger sequence coverage
Proteome Coverage Study
• Pipette-based prefractionations
– Strong anion exchange resin
• Independent of pH
• Mostly used for deep coverage of the composition of the sample or if specific peptides should be enriched
• Reverse phase chromatography
– Reduced the complexity of the peptide mixture by selecting peptides for tandem mass spectrometry according to their polarity
Proteome Coverage Study
• LC-MS/MS analysis– Peptide MS spectra
• Interpretation by comparison with lists from generated from theoretical digestion of protein
– Fragment MS/MS spectra• Interpretation by comparison from theoretical
fragmentation of peptide
– Elution time of peptide is based on its polarity
– Repeated extensively, in order to increase the number of peptides, thereby making the protein less complex, for which tandem mass spectra are acquired
Proteome Coverage Study
• Procedure is referred as “shotgun” proteomics
– Most successful strategy to achieve extensive proteome coverafe
– Summary: protein sample is extracted from their biological source, subjected to enzymatic digestion, the resulting peptide mixtures are analysed by LC-MS/MS
• Additional augmented fractionation steps for proteins/peptides can also be conducted
MaxQuant Computational Proteomics Environment
• MaxQuant– Quantitative proteomics software package
designed for analyzing large mass spectrometric data sets
– Has an integrated search engine, Andromeda
– Supported instrument: LTQ-Orbitrap• Orbitrap: ions circulate around a central, spindle-
shaped electrode
• Highly accurate: axial frequency oscillation, determined with high precision, is proportional to the square root of m/z.
MaxQuant Computational Proteomics Environment
• Number of runs: 2 337 336 high resolution fragmentation spectra and high-accuracy precursor masses
• Search Engine: Andromeda– Algorithm: uses a probability based approach to
match tandem mass (MS/MS) spectra to peptide sequences in databases
– Median peptide score: 121, 6% below has a score of 60• For each score, corresponds to the sum of the highest ions
score for each distinct sequence
MaxQuant Computational Proteomics Environment
• Average identification of fragmentation spectra: 43%
• Average absolute mass deviation of the precursors for the matched fragment masses: 1.2 and 4.8 p.p.m
MaxQuant Computational Proteomics Environment
• Result of analysis
– Identified and quantified number of peptides
• 163 784 peptides
– FDR (false discovery rate): 1%
– Out of 163 784 peptides
• 84 051 from tryptic digestion
• 52 108 from Lys-C digestion
• 44 704 from Glu-C digestion
MaxQuant Computational Proteomics Environment
• Result of analysis– From obtained data, MaxQuant identified 10 255
proteins with 99% confidence• Lower bound of the number of proteins expressed in HeLa
cells
– There were observed overlapping fragments of enzymatic cleavage• Tryptic digestion: yielded highest number of identifications• Lys-C digewstion: 85% overlapped with Trypsin• Glu-C digestion: 5.2% novel identifications• Shows that <5% of all proteins were only identified by one
peptide• Taken all together, >25% median sequence coverage
ENSEMBL database and GENSCAN predictions
• ENSEMBL-annotated human protein-coding genes– MS/MS spectra: searched against the ENSEMBL
database with GENSCAN predictions
– 10 255 proteins were mapped to 9207 human protein-coding genes• Most identified number of genes at chromosome 1
• Least number of identified genes at chromosome 21
– GENSCAN preidictions: >1900 peptides not known to ENSEMBL genes
ZAB
Completeness of Detected Proteome
• Inspecting the macrocomplexes which are functionally necessary
• Proteosome, spliceosome, histone modifying complexes and respiratory chain complexes
• Corum protein complex database
Corum protein complex database
• Collection of experimentally verified mammalian protein complexes
– protein complex function
– Localization
– subunit composition
– literature references
• Mean proteome coverage of all Corum protein was >95%
• Transcriptome coverage 96.5%
• Among the lower coverage which is due to cell type specificity are (next slide)
• Sarcoglycan-sarcospan provides structural integrity in muscle tissues
• SNARE for neurotransmitter release in synapses• ITGA2b-ITGB3 - a fibronectin receptor that plays a
crucial role in coagulation
Complex Normally Expressed % Coverage
Sarcoglycan-sarcospan
Muscle 20
SNARE (Soluble N-
ethylmaleimide sensitive factor Attachment protein
Receptor )
Neuronal tissue 40
ITGA2b-ITGB3 Platelets 50
• 5% of the HeLa cell population was in mitosis
– 61/63 proteins in a reference set of cell cycle-specific proteins
– High coverage of the most metabolic pathways pertaining to basic cellular function
• Comprehensiveness of the proteome is hard to determine by comparison with pathway databases because they contain cell type-specific proteins
Quantitative Analysis
• Deep-sequencing transcriptomics– Proteomics data - >90% complete
• Transciptome + proteome data – 10,000 - 12, 000 genes expressed in HeLa cells
• iBAQ (intensity based absolute quantification)– incorporating individual peptide signals in MS and
normalized by the number of observable peptides of the protein
– Estimate the absolute amount of each protein
Quantitative Analysis
• 40 most abundant protein comprised 25% of the proteome
– Filamin A, pyruvate kinase, enolase, vimentin, Hsp 60
• 600 proteins-> 75% of the HeLa cell proteome mass
Quantitative Analysis
• Contribution of each protein to the total mass in combination with the knowledge of number of cells in the initial sample
– roughly estimate the absolute copy number of the proteins in HeLa cells
Quantitative Analysis
• Ranked distribution of proteins
– 90% protein is within a range of a factor of 60 above or below the median protein copy number of 18, 000 molecules per cel
– The lower half accounts for <2% of its total mass
Quantitative Analysis
• Protein abundance values
– Used to estimate the proportional contribution of any:
• individual protein,
• protein complex and
• protein class
to the total proteome
Quantitative Analysis
• Ribosomes (encoded by only 1% human gene)
– 195 proteins contributed 6% to total protein mass
• Actin cytoskeletoncontributes four-fold more to the proteome mass than expected from the number of genes and proteins
Quantitative Analysis
• Integral membrane
– 25% of the genome
– 7.6% protein mass
Quantitative Analysis
• Protein folding
– 2% of the identified proteome by number
– 8% of proteome mass
Quantitative Analysis
Percentage to the Total Mass
“Protein folding” proteins
Integral membrane Proteins
Human genome 25 2
Protein Mass 7.6 8
• Differences are due to cell-type specific functions of these proteins
Structural proteins and proteins in basic machineries
Regulatory proteins
>
Ribosome proteins form tight cluster at the top end
Proteosome also abundant but not its regulatory subunits (factor of 100 less)
Cytoskeletal and metabolic proteins extend over a broad range
Enolase – highest expression value
Glycogen phosphorylase – 100,000-fold less at protein level and 10,000 less at transcript level
Regulatory proteins such as protein kinases and transcription factors have, on average, lower expression than the structural proteins
Each category spans a large expression range
Expression levels can provide starting points for systems biologicalmodeling of the cell
“Given the rapid technological progress in both fields, we predict that the required
depth of 10,000–12,000 genes will be routinely reachable
soon.”
TRANSCIPTOME RNA-Seq
PROTEOME High-res MS
THANK YOU