Protein interaction codes(s)? • Real world programming • … · 2020. 7. 9. · Harvard-MIT...
Harvard-MIT Division of Health Sciences and Technology
HST.508: Genomics and Computational Biology
Protein1: Last week's take home lessons
• Protein interaction code(s)?
• Real world programming
• Pharmacogenomics: SNPs
• Chemical diversity: Nature/Chem/Design
• Target proteins: structural genomics
• Folding, molecular mechanics & docking
• Toxicity animal/clinical: cross-talk
Protein2: Today's story & goals
• Separation of proteins & peptides
• Protein localization & complexes
• Peptide identification (MS/MS)
– Database searching & sequencing
• Protein quantitation
– Absolute & relative
• Protein modifications & crosslinking
• Protein - metabolite quantitation
Why purify?
• Reduce one source of noise (in identification/quantitation)
• Prepare materials for in vitro experiments (sufficient causes)
• Discover biochemical properties
(Protein) Purification Methods
• Charge: ion-exchange chromatography, isoelectric focusing
• Size: dialysis, gel-filtration chromatography, gel-electrophoresis, sedimentation velocity
• Solubility: salting out
• Hydrophobicity: reverse phase chromatography
• Specific binding: affinity chromatography
• Complexes: immune precipitation (± crosslinking)
• Density: sedimentation equilibrium
Protein Separation by Gel Electrophoresis
• Separated by mass: sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis.
– Sensitivity: 0.02 µg protein with a silver stain.
– Resolution: 2% mass difference.
• Separated by isoelectric point (pI): polyampholytes pH gradient gel.
– Resolution: 0.01 pI.
Comparison of predicted with observed protein properties (localization, postsynthetic modifications) E. coli
See Link et al. 1997 Electrophoresis 18:1259-313 (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9298646&dopt=Abstract)
Computationally checking proteomic data
Property         Basis of calculation
Protein charge   RKHYCDE (N,C), pKa, pH (Pub) (http://gcg.nhri.org.tw/isoelectric.html#algorithm)
Protein mass     Calibrate with knowns (complexes)
Peptide mass     Isotope sum (incl. modifications)
Peptide LC       aa composition linear regression
Subcellular      Hydrophobicity, motifs (Pub)
Expression       Codon Adaptation Index (CAI)
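The protein charge row can be illustrated with a minimal sketch: sum the Henderson-Hasselbalch charge of each ionizable group (R, K, H, Y, C, D, E plus the N and C termini) at a given pH, then bisect for the pH where the net charge is zero. The pKa values and function names below are assumptions chosen for illustration; published pKa tables differ, and the linked algorithm may use other values.

```python
# Assumed, textbook-style pKa values; real tools use slightly different tables.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 2.0}


def net_charge(seq, ph):
    """Net charge of a protein sequence at the given pH."""
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else seq.count(group)
        # Fraction protonated (charge +1) for a basic group.
        charge += count / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else seq.count(group)
        # Fraction deprotonated (charge -1) for an acidic group.
        charge -= count / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """pH at which the net charge is zero, found by bisection.

    Net charge falls monotonically as pH rises, so bisection is enough.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "DAFLGSFLYEYSR"
    print(round(isoelectric_point(peptide), 2))
```

A predicted pI of this kind is what gets compared with the observed position on an isoelectric focusing gel.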
Cell fraction: Periplasm
2D gel: SDS mobility vs. isoelectric pH
See Link et al. 1997 Electrophoresis 18:1259-313 (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9298646&dopt=Abstract)
Cell localization predictions
TargetP: using N-terminal sequence discriminates mitochondrion, chloroplast, secretion, & "other" localizations with a success rate of 85%. (pub) (http://www.cbs.dtu.dk/services/TargetP/), (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10891285&dopt=Abstract)
Gromiha 1999, Protein Eng 12:557-61. A simple method for predicting transmembrane alpha helices with better accuracy. (pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10436081&dopt=Abstract)
Using the information from the topology of 70 membrane proteins... correctly identifies 295 transmembrane helical segments in 70 membrane proteins with only two overpredictions.
Isotope calculations
Mass resolution 0.1% vs. 1 ppm
Symbol  Mass       Abund. (%)   Symbol  Mass       Abund. (%)
H(1)    1.007825   99.99        H(2)    2.014102   0.015
C(12)   12.000000  98.90        C(13)   13.003355  1.10
N(14)   14.003074  99.63        N(15)   15.000109  0.37
O(16)   15.994915  99.76        O(17)   16.999131  0.038
S(32)   31.972072  95.02        S(33)   32.971459  0.75
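As a sketch of the "isotope sum" idea, the monoisotopic mass of an unmodified peptide can be computed from the most abundant isotope masses in the table. The residue compositions are the standard elemental formulas of the 20 amino acid residues; the function name is an assumption made for this example, and modifications are not handled.

```python
# Elemental composition of each amino acid residue as (C, H, N, O, S).
RESIDUE = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}

# Masses of C(12), H(1), N(14), O(16), S(32) from the isotope table.
ELEMENT_MASS = (12.000000, 1.007825, 14.003074, 15.994915, 31.972072)


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    totals = [0, 0, 0, 0, 0]
    for aa in peptide:
        for k, n in enumerate(RESIDUE[aa]):
            totals[k] += n
    totals[1] += 2  # the free termini add one H2O: two H ...
    totals[3] += 1  # ... and one O
    return sum(n * m for n, m in zip(totals, ELEMENT_MASS))


if __name__ == "__main__":
    print(round(monoisotopic_mass("DAFLGSFLYEYSR"), 4))
    # Ile and Leu share a formula, so they cannot be told apart by mass.
    print(monoisotopic_mass("I") == monoisotopic_mass("L"))
```

At 1 ppm resolution a mass near 1500 is pinned down to about 0.0015, which is why the six-decimal isotope masses matter.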
trypsin
High Performance Liquid Chromatography
Mobile Phase of HPLC
• The interaction between the mobile phase and the sample determines the migration speed.
– Isocratic elution: constant mobile-phase composition, so constant migration speed in the column.
– Gradient elution: mobile-phase composition changes during the run, so migration speed changes in the column.
Stationary Phase of HPLC
• The degree of interaction with samples determines the migration speed.
– Liquid-Solid: polarity.
– Liquid-Liquid: polarity.
– Size-Exclusion: porous beads.
– Normal Phase: hydrophilicity and lipophilicity.
– Reverse Phase: hydrophilicity and lipophilicity.
– Ion Exchange.
– Affinity: specific affinity.
See
Sereda, T. et al. “Effect of the α-amino group on peptide retention behaviour in reversed-phase chromatography.”
Wilce, et al. “High-performance liquid chromatography of amino acids, peptides and proteins.” Journal of Chromatography, 632 (1993) 11-18.
A Map is Like a 2D Peptide Gel
First Dimension: Reverse Phase Chromatography, Separation By Hydrophobicity (axis: RT, min)
Second Dimension: Mass Spectrometry, Separation by Mass (axis: m/z)
[Figure: peptides of differing amino acid composition spread across the RT vs. m/z plane]
What Information Can Be Extracted From A Single Peptide Peak
[Figure: peak of the peptide DAFLGSFLYEYSR. Panel 1: abundance vs. rt. Panel 2: abundance vs. m/z at 36.418 min, showing the isotopic variants with 0, 1, 2, and 3 × 13C. K. Leptos 2001]
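The relative heights of the 0 to 3 × 13C variants can be estimated with a binomial model, sketched below under a simplifying assumption: only carbon isotopes are counted (13C abundance 1.10% from the isotope table), and the heavy isotopes of H, N, O, and S are ignored. The function names are illustrative.

```python
from math import comb

# Carbon atoms per amino acid residue (standard residue formulas).
CARBONS = {
    "G": 2, "A": 3, "S": 3, "P": 5, "V": 5, "T": 4, "C": 3, "L": 6,
    "I": 6, "N": 4, "D": 4, "Q": 5, "K": 6, "E": 5, "M": 5, "H": 6,
    "F": 9, "R": 6, "Y": 9, "W": 11,
}

P_13C = 0.0110  # natural abundance of C(13) from the isotope table


def carbon_count(peptide):
    """Total number of carbon atoms in the peptide."""
    return sum(CARBONS[aa] for aa in peptide)


def c13_distribution(peptide, max_heavy=3):
    """Probability of finding exactly k 13C atoms, for k = 0..max_heavy."""
    n = carbon_count(peptide)
    return [
        comb(n, k) * P_13C ** k * (1.0 - P_13C) ** (n - k)
        for k in range(max_heavy + 1)
    ]


if __name__ == "__main__":
    for k, prob in enumerate(c13_distribution("DAFLGSFLYEYSR")):
        print(f"{k} x 13C: {prob:.3f}")
```

For a peptide of this size the all-12C peak is still the tallest; for much larger peptides the binomial model predicts that the 1 × 13C peak overtakes it.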
Directed Analysis of Large Protein Complexes by 2D separation: strong cation exchange and reversed-phase liquid chromatography.
Link, et al. 1999, Nature Biotech. 17:676-82. (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10404161&dopt=Abstract)
A new 40S subunit protein
See Link AJ, et al., Nat Biotechnol. 1999 Jul;17(7):676-82. Direct analysis of protein complexes using mass spectrometry.
Tandem Mass Spectrometry
See Siuzdak, Gary. “The emergence of mass spectrometry in biochemical research.” Proc. Natl. Acad. Sci. 1994, 91, 11290-11297.
Roepstorff, P.; Fohlman, J. Biomed. Mass Spectrom. 1984, 11, 601.
Quadrupole Q1 scans or selects m/z. Q2 transmits those ions through collision gas (Ar). Q3 analyzes the resulting fragment ions.
Ions
Peptide Fragmentation and Ionization
Tandem Mass Spectra Analysis
See Gygi et al. Mol. Cell Bio. (1999)
Mass Spectrum Interpretation Challenge
• It is unknown whether an ion is a b-ion, a y-ion, or something else.
• Some ions are missing.
• Each ion has multiple isotopic forms.
• Other ions (a or z) may appear.
• Some ions may lose a water or an ammonia.
• Noise.
• Amino acid modifications.
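To make the b-ion and y-ion vocabulary concrete, here is a minimal sketch that lists the singly charged b and y fragment m/z values an ideal spectrum would contain. It uses standard monoisotopic residue masses; it ignores isotopes, neutral losses, a and z ions, modifications, and higher charge states, which are exactly the complications listed above. The function name is an assumption for illustration.

```python
# Monoisotopic residue masses (amino acid minus H2O), in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] covers the first i residues (N-terminal fragment);
    y_ions[i-1] covers the last i residues (C-terminal fragment).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("DAFLGSFLYEYSR")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:9.4f}   y{i} = {yi:9.4f}")
```

Complementary fragments satisfy b_i + y_(n-i) = peptide mass + 2 protons, which is one of the constraints de novo sequencing methods can exploit.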
A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry
See Chen et al 2000. 11th Annual ACM-SIAM Symp. of Discrete Algorithms pp. 389-398.
SEQUEST: Sequence-Spectrum Correlation
Given a raw tandem mass spectrum and a protein sequence database:
• For every protein in the database,
• For every subsequence of this protein
– Construct a hypothetical tandem mass spectrum
– Overlap the two spectra and compute the correlation coefficient (CC).
• Report the proteins in the order of CC score.
Eng, et al. 1994, Amer. Soc. for Mass Spect. 5: 976-989 (Sequest) (http://thompson.mbt.washington.edu/sequest/)
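The loop above can be sketched in a few lines. This is a simplified illustration, not the SEQUEST implementation: it bins spectra at 1 m/z, builds the hypothetical spectrum from unit-intensity b and y ions only, and scores with a plain Pearson correlation, whereas the published method applies spectrum preprocessing and a cross-correlation score. The candidate list stands in for "every subsequence of every protein", and all function names are assumptions.

```python
import math

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER = 1.007276, 18.010565


def hypothetical_spectrum(peptide):
    """m/z values of singly charged b and y ions for a candidate peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(masses)):
        peaks.append(sum(masses[:i]) + PROTON)          # b ion
        peaks.append(sum(masses[i:]) + WATER + PROTON)  # y ion
    return peaks


def binned(mz_values, intensities=None, width=1.0, max_mz=2000.0):
    """Turn a peak list into a fixed-length intensity vector."""
    vector = [0.0] * int(max_mz / width)
    for k, mz in enumerate(mz_values):
        index = int(mz / width)
        if index < len(vector):
            vector[index] += 1.0 if intensities is None else intensities[k]
    return vector


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(observed_mz, observed_intensity, candidates):
    """Score each candidate peptide against the observed spectrum."""
    observed = binned(observed_mz, observed_intensity)
    scored = [
        (correlation(observed, binned(hypothetical_spectrum(p))), p)
        for p in candidates
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Fake "observed" spectrum: the ideal fragments of one peptide.
    mz = hypothetical_spectrum("SAMPLER")
    ranking = rank_candidates(mz, [1.0] * len(mz), ["PEPTIDE", "SAMPLER"])
    for score, peptide in ranking:
        print(f"{peptide}  CC = {score:.3f}")
```

In practice the candidate list is usually restricted to subsequences whose mass matches the measured precursor mass, which keeps the inner loop tractable.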
Expression quantitation methods
RNA                                                    Protein
Genes immobilized, labeled RNA                         Antibody arrays
RNAs immobilized, labeled genes (Northern gel blot)    Westerns
QRT-PCR                                                -none-
Reporter constructs                                    same
Fluorescent In Situ (Hybridization)                    same (Antibodies)
Tag counting (SAGE)                                    -none-
Differential display                                   mass spec
Molecules per cell
                     E. coli/yeast     Human
Individual mRNAs:    10^-1 to 10^3     10^-4 to 10^5
Proteins:            10 to 10^6        10^-1 to 10^8
MS Protein quantitation R=.84
Yeast Protein ESI-MS Quantitation
[Figure: log-log scatter plot of Day 2 measure (0.1 to 10000) vs. Day 1 measure (0.1 to 1000); fit y = 0.8754x + 0.1573, R2 = 0.8381. Link, et al.]
MS quantitation reproducibility
Coefficients of Variation
CV = σ/µ
Sample: Angiotensin, Neurotensin, Bradykinin
Map: 600 – 700 m/z
[Figure: histogram of Frequency (0 to 5) vs. CV]
Correlation between protein and mRNA abundance in yeast
See Gygi et al. 1999, Mol. Cell Biol. 19:1720-30 (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10022859&dopt=Abstract)
Normality tests
See Weiss 5th ed. Page 920. Types of non-normality: kurtosis, skewness (www); (log) transformations to normal. (http://www.marketminer.com/prophet/statguide/n-dist_exam_res.html)
Futcher et al 1999, A sampling of the yeast proteome. Mol.Cell.Biol. 19:7357-7368. (Pub) (http://mcb.asm.org/cgi/content/full/19/11/7357?view=full&pmid=10523624)
Spearman correlation rank test
$r_s = 1 - \frac{6S}{n^3 - n}$
Rank (from 1 to n, where n is the number of pairs of data) the numbers in each column. If there are ties within a column, assign all the tied measurements the same median rank. Note: avoid ties (which reduce the power of the test) by measuring with as fine a scale as possible. S = sum of the squared differences in rank. (ref) (http://www.wisc.edu/ecology/labs/phenology/spearman.html)
X  Y  Rx  Ry
1  8  1   4
6  2  3   1
6  3  3   2
6  4  3   3
n=4
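The four-pair example can be worked through in code. The sketch below ranks each column, gives tied values their shared median rank (which equals the mean of the tied positions), and applies the formula for r_s. The function names are illustrative; note that with ties this simple formula is only an approximation to the Pearson correlation of the ranks.

```python
def ranks(values):
    """Ranks from 1 to n; tied values all receive their median rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # median of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def spearman(x, y):
    """r_s = 1 - 6S / (n^3 - n), with S the sum of squared rank differences."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    s = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * s / (n ** 3 - n)


if __name__ == "__main__":
    x = [1, 6, 6, 6]
    y = [8, 2, 3, 4]
    print(ranks(x), ranks(y))   # Rx and Ry columns of the table
    print(spearman(x, y))       # S = 9 + 4 + 1 + 0 = 14
```

For the table above, S = 14 and n^3 - n = 60, so r_s = 1 - 84/60 = -0.4.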
Correlation of (phosphorimager 35S met) protein & mRNA
rp = 0.76 for log(adjusted RNA) to log(protein)
rs = .74 overall; 0.62 for the top 33 proteins & 0.56 (not significantly different) for the bottom 33 proteins
Observed (Phosphorimage) protein levels vs. Codon Adaptation Index (CAI)
Codon Adaptation Index (CAI), Sharp and Li (1987): $f_i$ is the relative frequency of codon $i$ in the coding sequence, and $W_i$ the ratio of the frequency of codon $i$ to the frequency of the major codon for the same amino acid.
$\ln(\mathrm{CAI}) = \sum_{i=1}^{61} f_i \ln(W_i)$
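A minimal sketch of the CAI formula follows. Since f_i is the fraction of codons in the sequence that are codon i, the sum of f_i ln(W_i) is simply the mean of ln(W) over the codons of the sequence. The weight table passed in is a hypothetical four-codon example, not real yeast or E. coli values, and codons missing from the table (for example stop codons) are skipped by assumption.

```python
import math


def codon_adaptation_index(coding_seq, weights):
    """CAI = exp(sum_i f_i * ln(W_i)) over the codons of the sequence.

    weights maps codon -> W (relative adaptiveness, between 0 and 1).
    Codons without a weight are ignored (an assumption of this sketch).
    """
    usable = len(coding_seq) - len(coding_seq) % 3
    codons = [coding_seq[i:i + 3] for i in range(0, usable, 3)]
    scored = [c for c in codons if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a weight")
    # Mean of ln(W) over codons equals sum_i f_i ln(W_i).
    mean_log_w = sum(math.log(weights[c]) for c in scored) / len(scored)
    return math.exp(mean_log_w)


if __name__ == "__main__":
    # Hypothetical weights: GCT and AAA are the "major" codons (W = 1).
    w = {"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}
    print(codon_adaptation_index("GCTAAA", w))  # only major codons
    print(codon_adaptation_index("GCCAAG", w))  # only minor codons
```

A gene built only from major codons scores 1, and each minor codon pulls the geometric mean down, which is why CAI serves as a rough expression predictor.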
ICAT Strategy for Quantifying Differential Protein Expression.
X= H or D
See Gygi et al. Nature Biotechnology (1999)
Mass Spectrum and Reconstructed Ion Chromatograms.
See Gygi et al. Nature Biotechnology (1999)
Protein & mRNA Ratios +/- Galactose
See Ideker et al 2001 (http://arep.med.harvard.edu/pdf/Ideker01.pdf)
Post-synthetic modifications
• Radioisotopic labeling: PO4 S,T,Y,H
• Affinity selection:
– Cys: ICAT biotin-avidin selection
– PO4: immobilized metal Ga(III) affinity chromatography (IMAC) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10424175&dopt=Abstract)
– Specific PO4 antibodies
– Lectins for carbohydrates
• Mass spectrometry
32P labeled phosphoproteomics
Low abundance cell cycle proteins not detected above background from abundant proteins
See Futcher et al 1999, A sampling of the yeast proteome. Mol.Cell.Biol. 19:7357-7368. (Pub)(http://mcb.asm.org/cgi/content/full/19/11/7357?view=full&pmid=10523624)
Natural crosslinks
Disulfides              Cys-Cys
Collagen                Lys-Lys
Ubiquitin               C-term-Lys
Fibrin                  Gln-Lys
Glycation               Glucose-Lys
Adeno primer proteins   dCMP-Ser
Crosslinked peptide Matrix-assisted laser desorption ionization Post-Source Decay (MALDI-PSD-MS)
tryptic digest of BS3 cross-linked FGF-2. Cross-linked peptides are identified by using the program ASAP and are denoted with an asterisk (9). (B) MALDI-PSD spectrum of cross-linked peptide E45R60 (M + H+ = m/z 2059.08).
Constraints for homology modeling based on MS crosslinking distances
The 15 nonlocal through-space distance constraints generated by the chemical cross-links (yellow dashed lines) superimposed on the average NMR structure of FGF-2 (1BLA). The 14 lysines of FGF-2 are shown in red.
See Young et al 2000, PNAS 97: 5802 (Pub) (http://www.pnas.org/cgi/content/full/97/11/5802)
Homology modeling accuracy
[Figure: % sequence identity (20 to 100) vs. Swiss-model RMSD of the test set in Angstroms (1 to 4)]
(http://www.expasy.ch/swissmod/SM_LikelyPrecision.html)
Top 20 threading models for FGF ranked by crosslinking constraint error
See Young et al. PNAS | May 23, 2000 | vol. 97 | no. 11 | 5802-5806 (http://www.pnas.org/cgi/content/full/97/11/5802)
Challenges for accurately measuring metabolites
• Rapid kinetics
• Rapid changes during isolation
• Idiosyncratic detection methods: enzyme-linked, GC, LC, NMR (albeit fewer molecular types than RNA & protein)
1634 Metabolite Masses
598 have identical mass, e.g. Ile & Leu = 131.17
[Figure: histogram from metabolite databases; X = Mass (80 to 1520), Y = Frequency (0 to 600); other labels: 256, amino acids]
Databases:
Karp et al. (1998) NAR 26:50. EcoCyc
Selkov, et al. (1997) NAR 25:37. WIT
Ogata et al. (1998) Biosystems 47:119-128. KEGG
[Figure: Y = RP LC retention time in min. (higher hydrophobicity); X = Mass; labeled points: I, L, W]
Metabolite fragmentation & stable isotope labeling
See Wunschel J Chromatogr A 1997, 776:205-19. Quantitative analysis of neutral & acidic sugars in whole bacterial cell hydrolysates using high-performance anion-exchange LC-ESI-MS2. (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=9291597&dopt=Abstract)
Isotopomers
See Klapa et al. Biotechnol Bioeng 1999; 62:375. Metabolite and isotopomer balancing in the analysis of metabolic cycles: I. Theory. (Pub) "accounting for the contribution of all pathways to label distribution is required, especially ... multiple turns of metabolic cycles... 13C (or 14C) labeled substrates." (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10099550&dopt=Abstract)
MetaFoR: Metabolic Flux Ratios
Fractional 13C labeling > Quantitative 2D NMR
Why use amino acids from proteins rather than metabolites directly?
See:
Sauer et al. J Bacteriol 1999;181:6679-88 (Pub) (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10542169&dopt=Abstract)
Szyperski et al 1999 Metab. Eng. 1:189. (http://www.biotech.biol.ethz.ch/sauer/pdf/1999 - Uwe Sauer Metabol Eng.pdf)
Dauner et al. 2001 Biotec Bioeng 76:144 (http://www.biotech.biol.ethz.ch/sauer/pdf/2001 - Michael Dauner et al B&B.pdf)
A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations
See Raamsdonk et al. 2001 Nature Biotech 19:45. (http://arep.med.harvard.edu/pdf/Raamsdonk01.pdf)
Types of interaction models
Quantum Electrodynamics          subatomic
Quantum mechanics                electron clouds
Molecular mechanics              spherical atoms (101Pro1)
Master equations                 stochastic single molecules (Net1)
Phenomenological rates ODE       Concentration & time (C,t)
Flux Balance                     $dC_{ik}/dt$ optima steady state (Net1)
Thermodynamic models             $dC_{ik}/dt = 0$, k reversible reactions
Steady State                     $\sum_k dC_{ik}/dt = 0$ (sum k reactions)
Metabolic Control Analysis       $d(dC_{ik}/dt)/dC_j$ (i = chem. species)
Spatially inhomogeneous models   $dC_i/dx$
(Top to bottom: increasing scope, decreasing resolution)
How do enzymes & substrates formally differ?
[Diagram: two reaction schemes. First: species A, B, E, EA, EB. Second: species ATP, ADP, E, EATP, EP, E2+P.]
Catalysts increase the rate (& specificity) without being consumed.
Enzyme rate equations with one Substrate & one Product
[Scheme: S ⇌ P, catalyzed by E]
$\frac{dP}{dt} = \frac{V\,(S/K_s - P/K_p)}{1 + S/K_s + P/K_p}$
As P approaches 0: $\frac{dP}{dt} = \frac{V}{1 + K_s/S}$
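A short numerical check of the rate equation, as a sketch: the function below evaluates the reversible one-substrate, one-product rate law and confirms that it reduces to V/(1 + Ks/S) when P is 0, and to zero net rate when S/Ks equals P/Kp. The parameter values are arbitrary illustrations, not measured constants.

```python
def reversible_rate(s, p, v, ks, kp):
    """dP/dt = V (S/Ks - P/Kp) / (1 + S/Ks + P/Kp)."""
    return v * (s / ks - p / kp) / (1.0 + s / ks + p / kp)


def irreversible_limit(s, v, ks):
    """Limit as P approaches 0: dP/dt = V / (1 + Ks/S)."""
    return v / (1.0 + ks / s)


if __name__ == "__main__":
    v, ks, kp = 10.0, 1.0, 2.0  # arbitrary illustrative parameters
    for s in (0.1, 1.0, 10.0):
        print(s, reversible_rate(s, 0.0, v, ks, kp), irreversible_limit(s, v, ks))
    # Net rate vanishes when S/Ks equals P/Kp.
    print(reversible_rate(1.0, 2.0, v, ks, kp))
```

The P = 0 form is the familiar Michaelis-Menten saturation curve: the rate is V/2 at S = Ks and approaches V as S grows.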
Enzyme Kinetic Expressions
Phosphofructokinase
[Equation: rate law for v_PFK in terms of F6P and Mg•ATP, with v_mx^PFK and K^PFK constants, and an allosteric term N_PFK built from L_0^PFK and fourth-power factors in ATP_free, Mg, AMP, and F6P. The exact form is not recoverable from this transcript.]
Allosteric kinetic parameters for AMP, etc.
Human Red Blood Cell ODE model
ODE model: Jamshidi et al. 2000 (Pub) (http://atlas.med.harvard.edu/gmc/rbc.html)
[Figure: metabolic network of the model. Glycolysis: GLCe, GLCi, G6P, F6P, FDP, GA3P, DHAP, 1,3 DPG, 2,3 DPG, 3PG, 2PG, PEP, PYR, LACi, LACe. Pentose phosphate pathway: GL6P, GO6P, RU5P, R5P, X5P, S7P, E4P, with GSH/GSSG. Nucleotide metabolism: ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P, and external ADOe, INOe, ADEe. Cofactors and ions: ATP/ADP, NADH/NAD, NADPH/NADP, Na+, K+, Cl-, HCO3-, pH.]
Red Blood Cell in Mathematica
ODE model: Jamshidi et al. 2000 (Pub) (http://atlas.med.harvard.edu/gmc/rbc.html)