![Page 1: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/1.jpg)
Integrating protein-protein interaction data: navigating the maze
Shoshana J. Wodak
VIB Structural Biology Research Center, VUB, Brussels Belgium
Omics data integration, Gent Nov 19, 2018
![Page 2: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/2.jpg)
Genome-scale protein interaction (PPI) networks: an embarrassment of riches
Hairy monster: a typical PPI network (yeast, human, fly...)
In the last 15 years:
- Over 30 PPI networks derived from experiments (yeasts, human, E. coli, D. melanogaster, C. elegans, P. falciparum and more)
- 25 PPI networks (and counting) inferred by computational methods
![Page 3: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/3.jpg)
Predict protein function
Model evolutionary processes
Predict disease associations
Interpret information on mutations
Interpret information on phenotype perturbations
Use as restraints in multiscale modeling
Build 3D models
No information on stoichiometry; limited or absent temporal, spatial and functional information… MUST MAKE MEANINGFUL USE OF THE DATA
Interactions explain everything, do they?
![Page 4: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/4.jpg)
$$A + B \underset{k_d}{\overset{k_a}{\rightleftharpoons}} C$$

$$K_d = \frac{k_d}{k_a} = \frac{[A][B]}{[C]}$$

$K_d$: equilibrium dissociation constant (molar units)

$\Delta G_d = -RT \ln(K_d / c^\circ)$: Gibbs free energy of dissociation ($RT$: thermal energy; standard state $c^\circ = 1\,\mathrm{M}$)

$K_d$ and $\Delta G_d$ quantify the binding affinity. Their values determine whether the complex is formed given the component concentrations.

The dynamics and time scales are governed by the rate constants $k_a$ (bimolecular) and $k_d$ (monomolecular):
• it takes $\tau_a = 1/(k_a[A])$ to form a complex ($[A]$ in excess)
• the complex has a life-time $\tau_d = 1/k_d$

Adapted from J. Janin, 2014
Binding affinities and rates
Genome-wide studies give a YES/NO answer to the question: do proteins A and B form a complex?
Yet PPIs are dynamic and subject to the law of mass action!
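The relations on this slide can be turned into a small calculation. The sketch below is illustrative only: the function names, the default temperature of 298.15 K and the example concentrations are assumptions, not values from the talk. It computes the free energy of dissociation, the equilibrium complex concentration implied by mass action, and the two time scales.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def delta_g_dissociation(kd_molar, temperature_k=298.15, c0=1.0):
    """Gibbs free energy of dissociation in kJ/mol: -RT ln(Kd / c0)."""
    return -R * temperature_k * math.log(kd_molar / c0) / 1000.0


def complex_concentration(a_total, b_total, kd_molar):
    """Equilibrium [C] for A + B <-> C.

    Solves Kd = [A][B]/[C] with [A] = a_total - [C] and [B] = b_total - [C],
    taking the physically meaningful root of the quadratic.
    """
    s = a_total + b_total + kd_molar
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


def lifetimes(ka, kd, a_conc):
    """Return (tau_a, tau_d) in seconds: 1/(ka*[A]) and 1/kd."""
    return 1.0 / (ka * a_conc), 1.0 / kd


# Hypothetical example: Kd = 1 uM, both partners at 1 uM total concentration.
bound = complex_concentration(1e-6, 1e-6, 1e-6)
print(f"fraction of A bound: {bound / 1e-6:.2f}")
print(f"dG_d: {delta_g_dissociation(1e-6):.1f} kJ/mol")
```

With both partners present at a concentration equal to Kd, well under half of each is in the complex, which is the kind of concentration dependence a YES/NO interaction list cannot express.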
![Page 5: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/5.jpg)
| Kd | 1 M | 1 mM | 1 µM | 1 nM | 1 pM |
|---|---|---|---|---|---|
| τd | < microsecond (random) | millisecond (short-lived) | second (transient) | hour (stable) | days (permanent) |

Type of assembly (examples placed along the scale): cell adhesion, redox complexes, antigen-antibody, crystal packing, enzyme-substrate, enzyme-inhibitor, signal transduction, weak dimers, oligomeric proteins
Figure labels: measurable range; non-specific → specific
The functional role of a PPI depends on Kd and the life-time τd = 1/kd
PPI in the cell: wide range of binding affinities & life-times
Adapted from J. Janin, 2014
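The pairing of Kd values with life-times on this slide follows from Kd = kd/ka and τd = 1/kd once an association rate is fixed. The sketch below assumes ka = 10^6 M⁻¹ s⁻¹. That value is an assumption made here for illustration (it is not stated on the slide), chosen because it gives about 1 s at 1 µM and about 1 ms at 1 mM.

```python
def lifetime_from_kd(kd_molar, ka=1.0e6):
    """Complex life-time tau_d = 1/kd in seconds, using kd = Kd * ka.

    ka is an assumed association rate constant in 1/(M*s).
    """
    return 1.0 / (kd_molar * ka)


# Walk along the affinity scale from weak to tight binding.
for label, kd in [("1 M", 1.0), ("1 mM", 1e-3), ("1 uM", 1e-6),
                  ("1 nM", 1e-9), ("1 pM", 1e-12)]:
    print(f"Kd = {label:>5}: tau_d ~ {lifetime_from_kd(kd):.3g} s")
```

Real association rates vary over orders of magnitude, so these life-times are order-of-magnitude guides rather than predictions for any particular complex.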
![Page 6: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/6.jpg)
Experimentally derived genome-scale PPI datasets: prominent examples
Co-complex methods: AP/MS (with LC/MS); co-fractionation/MS + massive data integration
Binary (A-B) methods: Y2H; split ubiquitin (membrane Y2H); PCA
[Figure labels: proteins Y, A, B, C, D, E; nucleus; cytosol; ≤80 Å]
Rolland et al., human, 2014, Y2H: 4300, 14000, NA
Hein et al., human, 2015, AP-MS: 5400, 28500, 195
Yang et al., human, 2016, Y2H (248-if & 381): 629, 1043, NA
![Page 7: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/7.jpg)
[Figure: four Venn diagrams showing the overlap in interactions and in proteins between yeast networks: Y2H (union), PCA (2008) and AP-MS (Babu et al., 2012); and Y2H (union), BioGRID HC and AP-MS (Babu et al., 2012).]
Limited overlap of interaction networks from different experimental methods (yeast)
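Overlap counts of the kind shown in such Venn diagrams can be computed directly from edge lists. The sketch below is a minimal illustration with made-up protein names and toy networks. It treats interactions as undirected pairs and counts how many fall in each exact combination of networks.

```python
from collections import Counter


def undirected(edges):
    """Normalize (a, b) pairs so that A-B and B-A count as the same edge."""
    return {tuple(sorted(edge)) for edge in edges}


def venn_counts(networks):
    """Map each combination of network names to the number of edges
    found in exactly those networks (one Venn region per combination)."""
    membership = {}
    for name, edges in networks.items():
        for edge in undirected(edges):
            membership.setdefault(edge, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Toy data (hypothetical): three small networks over proteins A to F.
toy_networks = {
    "Y2H": [("A", "B"), ("B", "C")],
    "PCA": [("B", "A"), ("C", "D")],
    "AP-MS": [("A", "B"), ("D", "C"), ("E", "F")],
}
counts = venn_counts(toy_networks)
for region, n in counts.items():
    print(sorted(region), n)
```

The same function works for protein overlap if each network is first reduced to one-element "edges", or more simply by comparing the sets of node names.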
![Page 8: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/8.jpg)
Why is the overlap so limited?
- Different methods probe complementary subspaces of the interactome: does AP-MS probe mainly 'stable' interactions, and Y2H/PCA more transient ones? Are there biases for proteins in different cellular processes?
- Network quality and coverage vary between methods: does AP-MS have a higher rate of false positives (FP), and Y2H a higher rate of false negatives (FN)?
- Co-complex associations (AP-MS) ≠ binary interactions (Y2H/PCA...)
- Is there a sampling problem? If so, why?
Vlasblom et al., Curr. Opin. Struct. Biol. (2013); Pu et al., J. Proteomics (2015)
![Page 9: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/9.jpg)
The challenge of deriving the network (AP-MS)
(Soluble PPI, yeast)
Raw co-complex data (~700,000 PPI)
→ Scoring methods: a plethora of methods (HGScore, SAINT, PE, ComPASS, HART, Dice, etc.)
→ High-confidence (HC) co-complex network (~13,000 PPI)
[Figure: AP/MS and LC/MS schematic with proteins Y, A, B, C, D, E]
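To give a feel for how raw co-purification data become a scored network, here is a minimal sketch of a Dice-style co-occurrence score. It is a simplified illustration and not the implementation used in any of the methods listed on the slide: each protein pair is scored by how often the two proteins are detected in the same purifications. The data layout, function names and cutoff are assumptions.

```python
from itertools import combinations


def dice_scores(purifications):
    """Score protein pairs by co-occurrence across purifications.

    purifications: dict mapping a purification id to the set of proteins
    detected in it (bait included). The score for a pair is
    2 * (shared purifications) / (purifications with a + purifications with b).
    """
    seen_in = {}
    for pur_id, proteins in purifications.items():
        for protein in proteins:
            seen_in.setdefault(protein, set()).add(pur_id)
    scores = {}
    for a, b in combinations(sorted(seen_in), 2):
        shared = len(seen_in[a] & seen_in[b])
        if shared:
            scores[(a, b)] = 2.0 * shared / (len(seen_in[a]) + len(seen_in[b]))
    return scores


def high_confidence(scores, cutoff):
    """Keep only pairs whose score reaches the (hypothetical) cutoff."""
    return {pair: s for pair, s in scores.items() if s >= cutoff}


# Toy data (hypothetical purifications and protein names).
toy = {"p1": {"Y", "A", "B"}, "p2": {"A", "B", "C"}, "p3": {"C", "Z"}}
scores = dice_scores(toy)
print(high_confidence(scores, cutoff=0.9))
```

A score like this ignores spectral counts and bait/prey asymmetry, which is part of why several different scoring schemes exist and why the choice of cutoff shapes the resulting network.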
![Page 10: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/10.jpg)
‘Quality’ assessment of a PPI network (yeast membrane, 2012)
Babu et al. Nature 2012
- Comparison to gold standard PPIs (GO annotations): TAP-MS, Y2H, random
- Correlation of mRNA expression profiles
- Experimental verification by other methods
Yeast integrated PPI network, Babu et al. Nature 2012
![Page 11: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/11.jpg)
[Figure: two plots comparing the Y2H, PCA and AP-MS networks: log10(relative annotation frequency), and density versus log10(protein abundance).]
Biases of different methods
Biases towards different cellular processes, or in sampling co-complex associations, can be rationalized.
The bias towards high-abundance proteins (PCA & AP-MS) is expected in the raw data (long history of contaminants), but not in the HC networks! It is by far the most consequential, since abundant proteins are more likely to form non-specific interactions.
Wodak et al., Curr. Opin. Struct. Biol. (2013)
![Page 12: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/12.jpg)
Different PPI networks may yield different results.
PPIs of yeast soluble proteins
[Figure: HC-Yeast BioGRID network versus integrated HC-Yeast HTP network; node labels: Hub, End, ?]
Mauricio Macossay et al. Submitted
![Page 13: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/13.jpg)
Over 100 databases specialize in curating information on functional and physical interactions from publications describing small-scale and large-scale studies:
- Contain unique as well as redundant information
- May focus on different areas of biology
- Different coverage of the literature
- Differences in cross-referencing genes & proteins
- Different conventions for representing interactions
How can one obtain a comprehensive view of currently known interactions?
Literature curated protein-protein interaction data
![Page 14: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/14.jpg)
iRefWeb: consolidated PPI data
Source databases shown include OPHID, CORUM, InnateDB, MatrixDB and MPIDB
iRefIndex consolidation: Ian Donaldson, UK (VIB-Bioinfo core)
iRefWeb portal (URL: Wodaklab.org/irefweb)
iRefWeb (iRefIndex V13)
Interactions: total 509,876; human 222,465
Proteins: total 91,645; human 18,841
- Tracks source DBs and PubMed records for each PPI
- Matches proteins on the basis of amino-acid sequence + taxon
- Total of 81,132 PubMed records
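Matching proteins by sequence and taxon rather than by database accession can be sketched as follows. This is an illustration of the idea only: the slide does not spell out the identifier scheme, so the use of a base64-encoded SHA-1 digest, the function name and the example sequence are assumptions.

```python
import base64
import hashlib


def protein_key(sequence, taxon_id):
    """Build a database-independent key from amino-acid sequence + taxon.

    Whitespace is removed and the sequence upper-cased, so the same protein
    reported under different accessions or formatting gets the same key.
    """
    cleaned = "".join(sequence.split()).upper()
    digest = hashlib.sha1(cleaned.encode("ascii")).digest()
    token = base64.b64encode(digest).decode("ascii").rstrip("=")
    return token + str(taxon_id)


# Hypothetical example: the same short sequence, formatted differently,
# and assigned to two different organisms (NCBI taxon ids 9606 and 559292).
same_a = protein_key("mkt aye\n", 9606)
same_b = protein_key("MKTAYE", 9606)
other = protein_key("MKTAYE", 559292)
print(same_a == same_b, same_a == other)
```

Keying on sequence plus taxon sidesteps accession mapping between databases, but it also means that two databases assigning the same sequence to different organisms will produce different keys.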
![Page 15: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/15.jpg)
PSICQUIC: ‘real time’ database federator
iRefIndex V15
![Page 16: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/16.jpg)
The importance of standards for data representation
![Page 17: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/17.jpg)
PSI-MITAB 2.5 format
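A PSI-MITAB 2.5 record is one tab-separated line with 15 columns; multiple values in a column are separated by "|" and an empty column is written as "-". The sketch below is a naive reader for illustration: the column key names are labels chosen here, the example record is made up, and the simple split on "|" would mis-handle quoted values that contain that character.

```python
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_dbs",
    "interaction_ids", "confidence",
]


def parse_mitab25_line(line):
    """Split one MITAB 2.5 line into a dict of column name -> list of values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, found {len(fields)}")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values (naive split).
        record[name] = [] if value == "-" else value.split("|")
    return record


# Made-up record: accessions, PubMed id and interaction id are placeholders.
example_line = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2010)", "pubmed:00000000",
    "taxid:559292(yeast)", "taxid:559292(yeast)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
])
record = parse_mitab25_line(example_line)
print(record["id_a"], record["detection_methods"])
```

Because every column has a fixed meaning, records from different databases can be concatenated and compared once protein identifiers and organisms have been reconciled.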
![Page 18: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/18.jpg)
How consistent is the information curated by different databases?
Turinsky A. et al. Donaldson I. and Wodak SJ. Database (Oxford) 2010
![Page 19: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/19.jpg)
Measuring consistency between pairwise co-citations
Sørensen-Dice similarity coefficient: size of overlap over average set size

Example: sets of annotated protein-protein interactions for one publication curated by two databases (DB1, DB2)
- PPI overlap: one database records A-B and A-C, the other A-B and D-C (union: A-B, A-C, D-C), so $S_{ppi} = 1/2$
- Protein overlap: one database covers A, B, C, the other A, B, C, D, so $S_{prot} = 6/7$

Analyzed 15,471 shared publications, each co-curated by two or more of 9 major public PPI databases.
When curating the same publication, two databases on average fully agree on 42% of the interactions and 62% of the proteins.
Agreement levels vary widely between organism categories.
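The two example scores on this slide can be reproduced with a few lines of code. This is a minimal sketch using exact fractions; the function names are chosen here for illustration, and raising an error for two empty sets is an arbitrary convention.

```python
from fractions import Fraction


def sorensen_dice(set_a, set_b):
    """Size of the overlap divided by the average size of the two sets."""
    total = len(set_a) + len(set_b)
    if total == 0:
        raise ValueError("similarity is undefined for two empty sets")
    return Fraction(2 * len(set_a & set_b), total)


def proteins_of(ppis):
    """Collect the proteins that take part in a set of interactions."""
    return {protein for pair in ppis for protein in pair}


# The slide's example: two databases curating the same publication.
db1 = {frozenset({"A", "B"}), frozenset({"A", "C"})}
db2 = {frozenset({"A", "B"}), frozenset({"D", "C"})}

s_ppi = sorensen_dice(db1, db2)
s_prot = sorensen_dice(proteins_of(db1), proteins_of(db2))
print(s_ppi, s_prot)
```

The protein score is higher than the interaction score because two databases can agree on which proteins a paper reports while still pairing them differently.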
![Page 20: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/20.jpg)
Agreement and overlap between databases
Turinsky et al., Nat. Biotech, 2011
![Page 21: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/21.jpg)
[Figure panels: both proteins from the same organism; one protein from other organisms. Axis: interactions curated from shared publications.]
The Babel tower of organism assignment
Turinsky et al., Nat. Biotech, 2011
![Page 22: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/22.jpg)
Inconsistencies in Recording PPIs From HTP Studies
![Page 23: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/23.jpg)
Disagreement between databases: main factors
- Problems with mapping protein/gene IDs, and divergent assignments of splice isoforms: ~10% of the data
- Divergent assignments of organisms: ~21% of the data
- Different ways of representing protein complexes: ~12% of the data
- Inconsistent curation of HTP data: ~1-2% of the data
Most of these factors can be attributed to different curation policies by DBs
(Issues being addressed by PSI standards & IMEx consortium)
![Page 24: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/24.jpg)
PPI data consumers, beware!
- Not all PPI data are created equal
- Different methods probe different types of interactions (e.g. binary/co-complex)
- Double-check data quality claims
- Literature-curated PPIs are a mixed bag: filtering needs to be applied, and there are no global reliability scores!
![Page 25: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches](https://reader030.fdocuments.net/reader030/viewer/2022040708/5e0b9550d53a63087b429fd0/html5/thumbnails/25.jpg)
Acknowledgements
Andrei Turinsky (HSC, Toronto), Brian Turner (HSC, Toronto), Shuye Pu (HSC, Toronto), James Vlasblom (HSC, Toronto), Systems Support team (HSC, Toronto)
Andrew Emili (UoT), Jack Greenblatt (UoT), Edyta Marcon (UoT), Sadhna Phanse (UoT), Ruth Isserlin (UoT), Jonathan Olsen (UoT), Mohan Babu (UoT), Hyungwon Choi (NUS), Anne-Claude Gingras (SLRI, Toronto), Mathew E. Sowa (Harvard), Emmanuel Levy (Weizmann), Joel Janin (Orsay)
Funding Sources:
http://wodaklab.org