Post on 18-Jan-2018
description
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
1
CS/CBB 545 - Data MiningPredicting Networks through
Bayesian Integration #1 - Theory
Mark Gerstein, Yale Universitygersteinlab.org/courses/545
(class 2007,02.27 14:30-15:45)
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
2
Predicting Networks via Bayesian Integration: Problem Motivation
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
3
Combining networks forms an ideal way of integrating diverse information
Metabolic pathway
Transcriptional regulatory network
Physical protein-protein Interaction
Co-expression Relationship
Part of the TCA cycle
Genetic interaction (synthetic lethal)Signaling pathways
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
4
Binding Experiments on Subunit Pairs
1 1 1 1 0 1 0Pull-down 2 1 1 0 1 0 1 0 1 1 0 1 11 0 0 1 01 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 1Cross-linking 1 11 1 1 11111
1Pull-down 3 1 1 0 0 0 11 0
Far Western 3 1 0 0 10 0
Far Western 2 11 11 1 11 11 1 0 0 0 0 010 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1Far Western 1 0 0 0 0 0 0 0
Interaction experiments before structure was known
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 12
2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12
2 2 2 2 2 22 3 3 3 3 33 5 5 5 55SubunitsSubunits
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
5
RNA polymerase II: Structure
Which subunits interact?Based on Binding
experiments
Source: Cramer et al., 2000, Science, 288:632-633
Compare with Gold Std. Structure
5
6
8
9
10 11
12
1
2
3
Source: Edwards et al., 2002, Trends in Genetics
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
6
Gold-Standard Positives
5
6
8
9
10 11
12
1
2
3
Gold-Standard Positive (GSTD+): 13
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
7
Gold-Standard Negatives
Gold-Standard Negative (GSTD-): 32
5
6
8
9
10 11
12
1
2
3
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
8
RNA Polymerase II: Gold-Standards
Gold-Standard Negative (GSTD-): 32
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
Gold-Standard Positive (GSTD+): 13
5
6
8
9
10 11
12
1
2
3
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
9
Assess Quality and Coverage of PPints
1 1 1 1 0 1 0Pull-down 2 1 1 0 1 0 1 0 1 1 0 1 11 0 0 1 01 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 1Cross-linking 1 11 1 1 11111
1Pull-down 3 1 1 0 0 0 11 0
Far Western 3 1 0 0 10 0
Far Western 2 11 11 1 11 11 1 0 0 0 0 010 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1Far Western 1 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
Likelihood Ratio for Feature f:
|
|f
ff
p x GSTDL
p x GSTD
“Quality Score”
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
10
Data integration: RNA polymerase II
Subunit A 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 11
Subunit B 2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 12 12
structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Far western 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0
Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1
Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0
Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0
Pull-down 1 1 1 0 1 0 0 1 0
Far western 1 0 0 0 1 0
= false
= true
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
11
Integrate using naive Bayes classifier
Majority 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Intersection 1 1 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Union 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Data integration: RNA polymerase II
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
12
Data integration: RNA polymerase II
Integrate using naive Bayes classifier
Majority 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Intersection 1 1 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Union 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
13
Data integration: RNA polymerase II
Integrate using naive Bayes classifier
Majority 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Intersection 1 1 1 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Union 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
(Cross validate)
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
14
Weighted Voting: the Likelihood Ratio
structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0
Far western 1 1 1 1 1 0 0 0 0 0 0 1 0
Far western (dup) 1 1 1 1 1 0 0 0 0 0 0 1 0
Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0
Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0
Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1
Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1
Pull-down 1 1 1 0 1 0 0 1 0
Far western 1 0 0 0 1 0
Combined 0 1 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0
Maj. Vote: 0 = round(avg( 0 + 0 + 0 + 1 + 1 + 0 + 0 )With weights: likelihood ratio L = L1 + L2 + L3 …
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
15
Predicting Networks via Bayesian Integration: Intuition & Formalism
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
16
Classification by Voting (N1)
Derived from "perceptron model" in earlier class R = <w,f> + b
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
17
Classification by Voting (N2)
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
18
Bayes Rule
Which is shorthand for:
[From Mitchell, Machine Learning]
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
19
More Bayes Rule
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
20
More Bayes Rule
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
21
Feature Correlation and Fully
Connected Bayes
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
22
Estimating Probabilities
• We have so far estimated P(X=x | Y=y) by the fraction nx|y/ny, where ny is the number of instances for which Y=y and nx|y is the number of these for which X=x
• This is a problem when nx is small E.g., assume P(X=x | Y=y)=0.05 and the training set is s.t. that ny=5. Then it is
highly probable that nx|y=0 The fraction is thus an underestimate of the actual probability It will dominate the Bayes classifier for all new queries with X=x
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
23
m-estimate
• Replace nx|y/ny by:
mnmpn
y
yx
|
• Where p is our prior estimate of the probability we wish to determine and m is a constant Typically, p = 1/k (where k is the number of possible values of X) m acts as a weight (similar to adding m virtual instances distributed according to
p)
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
24
Training and Testing Set
• Cross Validation: Leave one out, seven-fold
Credits: Munson, 1995; Garnier et al., 1996
4-fold
Cross Validation
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
25
Intuition N2.7 : ROC Curve & the prior
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
26
Predicting Networks via Bayesian Integration:
Worked Examples
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
27
Network Gold-Standards
Prediction of protein interactions: Bayesian integration
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
28
Network Gold-Standards
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Prediction of protein interactions: Bayesian integration
Frac. of Gold-Std Negatives with Feature
Frac. of Gold-Std Positives with Feature
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
= "Quality Score" =
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
29
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-Standards
L1 = (4/4)/(3/6) =2
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
30
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-Standards
L1 = (4/4)/(3/6) =2
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
31
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-Standards
L1 = (4/4)/(3/6) =2
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
32
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-Standards
L1 = (4/4)/(3/6) =2
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
33
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-Standards
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
L1 = (4/4)/(3/6) =2
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
34
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-StandardsL1 = (4/4)/(3/6) =2L2 = (3/4)/(3/6) =1.5
For each protein pair:LR = L1 L2log(LR) = log(L1) + log(L2)
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
Weighted Voting(Assuming uncorrelated features and Naïve Bayes)
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
35
Feature 2, e.g. Y2HFeature 1, e.g. co-expression
Gold-standard +Gold-standard –
Network Gold-StandardsL1 = (4/4)/(3/6) =2L2 = (3/4)/(3/6) =1.5
For each protein pair:LR = L1 L2log(LR) = log(L1) + log(L2)
[Jansen, Yu, et al., Science; Yu, et al., Genome Res.]
( | )( | )i
ii
p xLp x
Likelihood Ratio for Feature i:
Prediction of protein interactions: Bayesian integration
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
36
4
4
3
0
TPLR Cutoffs
4
3
2
1
Feature 1; L1 = 2Feature 2; L2 = 1.5Gold-standard +Gold-standard –Predicted PositivePredicted Negative
LR cutoffs1No Prediction made
Protein
Bayesian Integration and ROC Curves
2 <32
13<
3 <23/2<
4 3/2
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
37
6
3
0
0
FPLR Cutoffs
4
3
2
1
Feature 1; L1 = 2Feature 2; L2 = 1.5Gold-standard +Gold-standard –Predicted PositivePredicted Negative
LR cutoffs1No Prediction made
Protein
Bayesian Integration and ROC Curves
2 <32
13<
3 <23/2<
4 3/2
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
38
64
34
03
00
FPTPLR Cutoffs
4
3
2
1
Feature 1; L1 = 2Feature 2; L2 = 1.5Gold-standard +Gold-standard –Predicted PositivePredicted Negative
LR cutoffs1No Prediction made
Protein
43
2
1
FPR=FP/N"Error Rate"
TPR
=TP/
P"C
over
age"
ROC Curve
1/2 1
1
Bayesian Integration and ROC Curves
2 <32
13<
3 <23/2<
4 3/2
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
39
Likelihood Ratios
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
00
0
||
p x GSTDL
p x GSTD
11
1
||
p x GSTDL
p x GSTD
Likelihood Ratio for Feature f:
|
|f
ff
p x GSTDL
p x GSTD
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
40
Calculating Likelihood Ratios
1 0 1 0 0Pull-down 1 1 0 1 1 1
6 /13
4 /13
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
00
0
||
p x GSTDL
p x GSTD
11
1
||
p x GSTDL
p x GSTD
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
41
Calculating Likelihood Ratios
1 1Pull-down 1 1 0 1 0 1 1 01 1 0 1 01 0 0 0 0 0 1 0 0 0 0
6 /1311/ 32
4 /1314 / 32
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
=1.34
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
=0.70
00
0
||
p x GSTDL
p x GSTD
11
1
||
p x GSTDL
p x GSTD
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
42
Calculating Likelihood Ratios1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
Pull-down 1 L1 = (6/13) / (11/32) = 1.34 L0 = (4/13) / (14/32) = 0.70Pull-down 2 L1 = (7/13) / (9/32) = 1.91 L0 = (2/13) / (16/32) = 0.31Pull-down 3 L1 = (2/13) / (3/32) = 1.64 L0 = (2/13) / (2/32) = 2.46Cross-linking L1 = (10/13) / (7/32) = 3.52 L0 = (0/13) / (3/32) = 0Far Western 1 L1 = (2/13) / (4/32) = 1.23 L0 = (3/13) / (6/32) = 1.23Far Western 2 L1 = (6/13) / (5/32) = 2.95 L0 = (2/13) / (17/32) = 0.29Far Western 3 L1 = (1/13) / (1/32) = 2.46 L0 = (2/13) / (2/32) = 2.46
1 1 1 1 0 1 0Pull-down 2 1 1 0 1 0 1 0 1 1 0 1 11 0 0 1 01 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 1Cross-linking 1 11 1 1 11111
1Pull-down 3 1 1 0 0 0 11 0
Far Western 3 1 0 0 10 0
Far Western 2 11 11 1 11 11 1 0 0 0 0 010 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1Far Western 1 0 0 0 0 0 0 0
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
43
Data Integration: ROC-Curve
0 0 1 0 1 1 1 0 0Combined (Bayes) 1 11 1 1 01110 00 01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 1 0Pull-down 2 1 1 0 1 0 1 0 1 1 0 1 11 0 0 1 01 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 1 0Pull-down 1 1 1 0 1 0 1 0 1 1 0 1 11 1 0 1 01 0 0 0 0 0 1 0 0 0 0
0 0 1 1 1 1 1 1 0 1Cross-linking 1 11 1 1 11111
1Pull-down 3 1 1 0 0 0 11 0
Far Western 3 1 0 0 10 0
Far Western 2 11 11 1 11 11 1 0 0 0 0 010 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 0 0 1Far Western 1 0 0 0 0 0 0 0
3.52
18.2
11.1
13.9
26.6
0.22 0
0.22
0.22
0.29
0.06
0.12
0.22
0.29
0.06
0.06
0.22
0.22
0.36
0.08
0.91
0.220
0.52
2.16
132
19.5
0.53
3.68
5.52
12.7
10.4
2.25
26.6
0.220
2.25
11.1
18.2
10.4
2.25
0.06
0.29
0.290
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD- “Weighted Voting”
1 1,..., ...n nL f f L f L f
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
44
Data Integration: ROC Curve
0 0 1 0 1 1 1 0 0Combined (Bayes) 1 11 1 1 01110 00 01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3.52
18.2
11.1
13.9
26.6
0.22 0
0.22
0.22
0.29
0.06
0.12
0.22
0.29
0.06
0.06
0.22
0.22
0.36
0.08
0.91
0.220
0.52
2.16
132
19.5
0.53
3.68
5.52
12.7
10.4
2.25
26.6
0.220
2.25
11.1
18.2
10.4
2.25
0.06
0.29
0.290
1 1 1 0 1 0 1 0 1Majority 1 11 1 1 01110 00 01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 0 0Intersection 1 11 1 0 01110 00 01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1Union 1 11 1 1 11111 00 01 1 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
0.2
0.4
0.6
0.8
1.0
0.2 0.4 0.6 0.8 1.0
FPR=FP/N=1-Specificity
TPR=
TP/P
=Se
nsit
ivit
y
ROC Curves of RNA Polymerase II
1 1 1 1 1 1 1 1 1 2 3 5 6 6 6 6 6 8 8 8 8 9 9 9 10 10 122 2 2 2 2 22 3 3 3 3 33 5 5 5 55Subunits2 3 5 6 8 9 10 11 12 3 5 6 8 9 10 11 12 5 6 8 9 10 11 12 6 8 9 10 11 12 8 9 10 11 12 9 10 11 12 10 11 12 11 11 12Subunits
TrueFalse
GSTD+GSTD-
Bayesian IntegrationMajorityIntersectionUnion
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
45
Predicting Networks via Bayesian Integration: Features Correlation
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
46
Correlations between similar features
structural contact 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1 1 0 0 0 0
Far western 1 1 1 1 1 0 0 0 0 0 0 1 0
Far western (dup) 1 1 1 1 1 0 0 0 0 0 0 1 0
Cross-linking 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0
Far western 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 0
Pull-down 1 1 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1
Pull-down 1 1 1 1 0 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1
Pull-down 1 1 1 0 1 0 0 1 0
Far western 1 0 0 0 1 0
Combined 0 1 1 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
47
Intuition N3 : Feature Correlation
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
48
Naive Bayes
APABPACPC,B,AP
B C
A
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
49
A ‘correct’ factorisation
APABPB,ACPC,B,AP
B C
A
(
c) M
Ger
stei
n '0
6, g
erst
ein.
info
/talk
s
50
A Typical BBN