Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv...
Transcript of Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv...
![Page 1: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/1.jpg)
Roded Sharan
School of Computer Science, Tel Aviv University
Network alignment and querying
Network biology minicourse (part 4)
Algorithmic challenges in genomics
![Page 2: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/2.jpg)
Multiple Species PPI Data
• Rapid growth in number of species measured.
![Page 3: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/3.jpg)
Distilling Modules
Problem:
Data is partial and noisy.
![Page 4: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/4.jpg)
Being Comparative
Paradigm: Evolutionary conservation implies functional significance. Conservation: similarity in sequence and interaction topology.
Species A Species B
![Page 5: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/5.jpg)
Main challenges Local network alignment: detect conserved subnetworks across two (or more) networks. Global network alignment: find 1-1 mapping between networks. Network querying: given a query subnetwork in species A, find similar instances in the network of species B.
![Page 6: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/6.jpg)
Local Pairwise Alignment
![Page 7: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/7.jpg)
Problem definition Given two networks (of two species), find pairs of subnetworks (one from each species) that are significantly similar. • Similarity is measured both on vertices (sequence similarity) and edges (topological similarity). • Under certain formulations reduces to subgraph isomorphism (NP hard).
![Page 8: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/8.jpg)
Network Alignment
Protein pairs
Conserved interactions
Alignment graph: Nodes: pairs of sequence-similar proteins, one per species. Edges: conserved interactions. • Facilitates search for conserved subnetworks. • First introduced by Ogata et al.’00 and Kelley et al.’03.
![Page 9: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/9.jpg)
Yeast ~15000 PPIs
H. pylori ~1500 PPIs
PathBLAST (Kelley et al.’03)
![Page 10: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/10.jpg)
← P
rote
in se
quen
ce si
mila
rity
→
← Bacteria → ← Yeast →
Best match of bcp is Dot5, which does not interact with the pathway’s proteins.
![Page 11: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/11.jpg)
![Page 12: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/12.jpg)
Identifying conserved Complexes
S. et al. JCB 2005
• Generalize single-species scoring • Given two protein subsets, one in each species, with a many-to-many correspondence between them, wish: 1. Each subset induces a dense subgraph. 2. Matched protein pairs are sequence-similar.
∏⋅⋅=matched , ,
,
)random|Pr()homologs|Pr(
)'()()',(vu vu
vu
SS
CLCLCCL
∏∏∉∈ −
−=
'),('),( ),(11
),()( :Recall
EvuEvu vupp
vuppCL
![Page 13: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/13.jpg)
Evolutionary-based Scoring
![Page 14: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/14.jpg)
A Word on PPI Evolution
• PPI networks are shaped by duplication and indel events. • Indel events arise due to mutations that change protein surface and are much more frequent.
Koyuturk et al., JCB 2006
![Page 15: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/15.jpg)
Scoring (MaWish) • The score of two aligned protein subsets is based on the match, mismatch and duplication events they induce. • Each event is associated with a parameter (heuristically set) which determines its relative weight.
Match reward mS(u,v)S(u’,v’)
Indel penalty -nS(u,v)S(u’,v’)
Duplication reward/penalty
d(S(u,u’)-s)
![Page 16: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/16.jpg)
Score improvement
NetworkBLAST – Probabilistic
scoring model
Network 1 Network 2 A B
Score(A U B) = Score(A U C)
C
(S. et al. ’05) MaWish – Evolution based
scoring model
(Koyuturk et al. ’05)
Likelihood ratio score:
)()(),( BScoreAScoreBAScore ⋅=
M C - Protein Complex Model: Dense subnetwork
M N - Background (Null) Model: Random subnetwork A B Network 1 Network 2
C Match: Interaction in both species
Mismatch: Interaction in one of the species and not the other
The score that is assigned to every two pairs of homologous proteins is proportional to their sequence similarity level
Score(A U C) = 1 = Score(A U B) = 1
)|()|()(
N
C
MAPMAPAScore =
![Page 17: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/17.jpg)
Score improvement (cont.)
v
x
w
Network 1 Network 2
v’
x’
u’ u
B
C
A
Closest Common Ancestor Hypothetical PPI Network Link Gain Link Loss
)|,( CMBAP
Hirsh et al., ECCB 2006
![Page 18: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/18.jpg)
Local multiple alignment
![Page 19: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/19.jpg)
3-way comparison?
S. cerevisiae • 4389 proteins • 14319 interactions
C. elegans • 2718 proteins • 3926 interactions
D. melanogaster • 7038 proteins • 20720 interactions
S. et al. PNAS 2005
![Page 20: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/20.jpg)
Generalizing Network Alignment
• Alignment graph is extensible to multiple species. • Likelihood scoring is easily extensible, up to sequence
similariry terms: require scoring a multiple sequence alignment.
• Ignored till now: need to balance edge and vertex terms. • Practical solution:
– Sensible threshold for sequence similarity. – Nodes in alignment graph are filtered accordingly. – Node terms are removed from score.
![Page 21: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/21.jpg)
71 conserved regions: 183 significant clusters and 240 significant paths.
![Page 22: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/22.jpg)
Interaction Prediction A pair of proteins is predicted to interact if: 1. Sequence-similar proteins interact in the other two species. 2. The proteins co-occur in the same conserved complex.
Species
Sensitivity (%)
Specificity (%)
P-value
Strategy
Yeast
50
77
1-25 ]1[
Worm
43
82
1e-13
]1[
Fly
23
84
5e-5
]1[
Yeast
9
99
1e-6
]1]+[2[
Worm
10
100
6e-4
]1]+[2[
Fly
0.4
100
0.5
]1]+[2[
![Page 23: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/23.jpg)
Experimental Validation
• 65 predictions for yeast using strategies [1]+[2] were tested in lab. • Success rate: 40-52%. • Outperforms the interolog approach (Matthews et al.’01, Yu et al.’04) at 16-31%.
![Page 24: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/24.jpg)
The Scalability Problem • Network alignment scales as nk (in time and space) for n proteins and k species, hence practical only for k=2,3 (takes several hours). • Progressive alignment is fast (Graemlin by Flanick et al., GR 2006) but does not perform as well. Main idea: imitate the greedy search w/o explicitly constructing the alignment graph.
Kalaev et al., RECOMB 2008
![Page 25: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/25.jpg)
Scaling Up Network Alignment • Maintain linear representation. • Observe: “network alignment node” is a vertical “path” • Given a current seed, use dynamic programming to identify the vertical “path” which contributes most to the score. • Complexity reduces to O(m2k) !
![Page 26: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/26.jpg)
Network querying
![Page 27: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/27.jpg)
Problem definition
• Given a query graph Q and a network G, find the subnetwork of G that is: – Aligned with Q – The alignment has maximal score
Query Q
Network G
![Page 28: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/28.jpg)
Isomorphic Alignment
isomorphic to Q
match
match
match
match
match
match
Match of sequence-similar proteins
Species B Species A Q
![Page 29: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/29.jpg)
Homeomorphic Alignment
homeomorphic to Q
insertion
match
match
match
match
match
match
Match of sequence-similar proteins and deletion/insertion of degree-2 nodes
deletion
Species B Species A Q
![Page 30: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/30.jpg)
Score of Alignment
Score Sequence similarity score for matches
Penalty for deletions & insertions
Interaction reliability scores
+ + =
h(q1,v1) q1 v1
h(q2,v2) h(q3,v3)
h(q4,v4)
h(q6,v6)
h(q5,v5)
del pen
ins pen
v2
∑∑ +++= ),()(#)(#),( jiid vvwInsDelvqhScore δδ
![Page 31: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/31.jpg)
Complexity
• Network querying problem is NPC by reduction from subgraph isomorphism
(in contrast to sequence querying!!!)
• Naïve algorithm has O(nk) complexity – n = size of the PPI network, k = size of the query – Intractable for realistic values of n and k – n ~5000, k~10
• Reduction in complexity can be achieved by: – Constraining the network [Pinter et al., Bioinformatics’05] – Allowing vertex repetitions – Constraining the query (fixed parameter algs.)
![Page 32: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/32.jpg)
Reduction to finding paths in an “alignment” graph.
• Repetitions are possible.
• No general handling of insertions/deletions
PathBLAST
Kelley et al., PNAS’03
![Page 33: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/33.jpg)
DP-Based Approach • Use dynamic programming (a la sequence alignment):
W(i,j) is the maximal score of a partial alignment of query nodes {1…i} that ends at vertex j of the network.
+−∈++
∈++−=
d
i
jiWEjmjmwmiW
EjmjmwjihmiWjiW
δδ
),1(),(,),(),(
),(),,(),(),1(max),(
match
insertion
deletion
Shlomi et al., BMC Bioinformatics ’06; Yang & Sze, JCB’07
![Page 34: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/34.jpg)
Cross-Species Comparison of Signaling Pathways
• But DP may introduce protein repetitions along the path.
Yang & Sze, JCB’07
![Page 35: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/35.jpg)
QPath N
etw
ork
Gra
ph
high scoring subnetwork
query
randomly color
DP algorithm
repeat N times
Shlomi et al., BMC Bioinformatics ‘06
![Page 36: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/36.jpg)
Query Network
Ideas can be generalized to tree queries and beyond (QNet)
q1
q2
q3
q4
q5
q6
q7
v1
v2 v3
v6
v7
v4
Dost et al., RECOMB’07
![Page 37: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/37.jpg)
?
Is topology needed?
Bruckner et al., RECOMB 2009
![Page 38: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/38.jpg)
TORQUE: Topology-free querying
Input: Graph G=(V,E) Color set {1,2,...,k} A coloring of
network vertices Output: a connected
subgraph that is colorful.
![Page 39: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/39.jpg)
Algorithmic idea
Every connected subgraph has a spanning tree
Every colorful connected subgraph will have a colorful spanning tree
Instead of looking for a colorful subgraph, look for a colorful tree
• Two implemented approaches: ‒ Dynamic programming (color coding) ‒ ILP
![Page 40: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/40.jpg)
Comparison with QNet
0
5
10
15
20
25
30
35
40
TORQUE QNet
num
ber
of m
atch
es
Fly complexes in Human
no match found
not a quality match
quality match
0 20 40 60 80
100 120 140 160 180
TORQUE QNet
num
ber
of m
atch
es
Human complexes in Yeast
no match found
not a quality match
quality match
0
10
20
30
40
50
60
70
TORQUE QNet
num
ber
of m
atch
es
Yeast complexes in Human
no match found
not a quality match
quality match
0
5
10
15
20
25
30
35
40
TORQUE
num
ber
of m
atch
es
Rat complexes in Fly
no match found
not a quality match
quality match
![Page 41: Network biology minicourse (part 4) Algorithmic …...School of Computer Science, Tel Aviv University Network alignment and querying Network biology minicourse (part 4) Algorithmic](https://reader034.fdocuments.net/reader034/viewer/2022050204/5f57b9bbc8076b3ab81cb1f0/html5/thumbnails/41.jpg)
Summary & the road ahead…