Regulatory Network (Part II)
11/05/07
Methods
• Linear
– PCA (Raychaudhuri et al. 2000)
– NIR (Gardner et al. 2003)
• Nonlinear
– Bayesian network (Friedman et al. 2000; Friedman 2004)
Cell-cycle network
Data (Spellman et al. 1998)
• 76 arrays
• 7 time points
• 6177 yeast genes
• 800 cell-cycle related genes identified
PCA
Raychaudhuri et al. 2000
Raychaudhuri et al. 2000
The PCA components identify the dominant modes of variation.
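A minimal sketch of how the dominant modes could be extracted, assuming the expression data is a genes × conditions NumPy array and that PCA is computed via the SVD of the column-centered matrix. The function name `pca` and the data layout are illustrative assumptions, not taken from Raychaudhuri et al.

```python
import numpy as np

def pca(expr, n_components):
    """PCA of an expression matrix (rows = genes, columns = conditions).

    Returns the top components (one per row), the per-gene scores on those
    components, and the fraction of variance each component explains.
    """
    centered = expr - expr.mean(axis=0)            # center each condition
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2 / (expr.shape[0] - 1)        # variance along each mode
    ratio = variance / variance.sum()
    components = vt[:n_components]                 # dominant modes of variation
    scores = centered @ components.T               # gene coordinates on the modes
    return components, scores, ratio[:n_components]
```

The explained-variance ratios indicate how many modes are worth inspecting; a sharp drop after the first few suggests that those few capture most of the variation.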
Limitations of PCA
• Does not directly associate regulators with their target genes.
• Alternatively, PCA can be interpreted as assuming a fully connected network, in which the expression of each gene is regulated by a linear combination of all other genes.
NIR
Idea: The dynamics of gene activities can be approximated by
$\frac{dx}{dt} = Ax + u$, where $u$ is the perturbation.
When gene expression levels approximately reach steady state,
$Ax = -u$
NIR
• Solve for A in $A_{N \times N}\, x_{N \times M} = -u_{N \times M}$, where N = #genes and M = #perturbations.
• This is unidentifiable since M << N.
• Add the constraint that there are at most k connections for any given gene (k < M).
• For each row, use multiple regression to find a linear combination of k genes so that the least-squares error is minimal.
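The row-by-row regression above can be sketched as a brute-force subset search. This is an illustration under stated assumptions, not the implementation of Gardner et al.: `X` is the N × M matrix of steady-state responses, `U` the N × M matrix of perturbations, every gene is given exactly k regulators, and the names `nir_row` and `infer_network` are hypothetical.

```python
import itertools
import numpy as np

def nir_row(X, u_row, k):
    """Best-subset regression for one row of A in A X = -U.

    X: (N, M) steady-state expression, u_row: (M,) perturbation of this gene.
    Tries every set of k candidate regulators and keeps the best fit.
    """
    n_genes = X.shape[0]
    best_err, best_row = np.inf, np.zeros(n_genes)
    for subset in itertools.combinations(range(n_genes), k):
        idx = list(subset)
        # Row i of A X = -U restricted to the subset: X[idx].T @ coef = -u_row
        coef, *_ = np.linalg.lstsq(X[idx].T, -u_row, rcond=None)
        err = np.sum((X[idx].T @ coef + u_row) ** 2)
        if err < best_err:
            best_err = err
            best_row = np.zeros(n_genes)
            best_row[idx] = coef
    return best_row

def infer_network(X, U, k):
    """Stack the per-gene regressions into an estimate of A."""
    return np.vstack([nir_row(X, U[i], k) for i in range(X.shape[0])])
```

The search visits every k-subset of genes for every row, so it is only practical for small networks or small k.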
Application of NIR
[Figure: known E. coli SOS pathway, with edges marked as activation or repression]
Application of NIR
Regression coefficients
Limitations of NIR
• The true dynamics are nonlinear.
• The choice of k is ad hoc.
• The steady-state approximation does not apply to oscillatory genes.
Bayesian network
Directed acyclic graph (DAG)
• Nodes: random variables
• Edges: direct effect --- conditional dependency
Friedman 2004
An example
[Figure: DAG over the nodes Earthquake, Burglary, Radio, Alarm, Call]
This is not a Bayesian network
[Figure: graph over the nodes A, B, C]
Tree: a special kind of DAG
Each node has at most one parent node (the root has none).
[Figure: tree over the nodes A, B, C, D, E]
Advantages
• Intuitive --- popular among biologists
• Graph structure is easy to interpret
• Well-established probabilistic tools for DAG models
• Supports all the features of probabilistic learning
– Model selection criteria
– Handling of missing data
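Known Structure, Complete Data
• Network structure is specified
– Inducer needs to estimate parameters
• Data does not contain missing values
Data (E, B, A): <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
[Figure: the Learner takes the data and the network E → A ← B, whose table P(A | E,B) is unknown ("? ?" in every row), and returns the same network with an estimated table. The four rows of the estimated table, one per configuration of (E, B), read .9 .1, .7 .3, .99 .01 and .8 .2 in the extracted order.]
(Nir Friedman)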
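With the structure known and the data complete, estimating the table P(A | E,B) reduces to counting. The sketch below assumes maximum-likelihood estimation and samples given as (E, B, A) tuples with values 'Y'/'N'; the function name `estimate_cpt` is a hypothetical choice for illustration.

```python
from collections import Counter

def estimate_cpt(samples):
    """Maximum-likelihood estimate of P(A = 'Y' | E, B) by counting.

    samples: iterable of (E, B, A) tuples with values 'Y' or 'N'.
    Returns a dict mapping each observed (E, B) pair to the estimate.
    """
    totals, hits = Counter(), Counter()
    for e, b, a in samples:
        totals[(e, b)] += 1          # how often this parent configuration occurs
        if a == 'Y':
            hits[(e, b)] += 1        # ... and how often A is 'Y' with it
    return {eb: hits[eb] / n for eb, n in totals.items()}
```

Parent configurations that never occur in the data get no entry, which is one reason a prior over parameters is useful in practice.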
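Unknown Structure, Complete Data
• Network structure is not specified
– Inducer needs to select arcs & estimate parameters
• Data does not contain missing values
Data (E, B, A): <Y,N,N>, <Y,N,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
[Figure: the Learner takes the data and the nodes E, B, A with an unknown table P(A | E,B) ("? ?" in every row), and returns the network E → A ← B with an estimated table. The four rows of the estimated table, one per configuration of (E, B), read .9 .1, .7 .3, .99 .01 and .8 .2 in the extracted order.]
(Nir Friedman)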
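Learning parameters
[Figure: network over E, B, A, C]
• Training data has the form:
$D = \begin{bmatrix} E[1] & B[1] & A[1] & C[1] \\ \vdots & \vdots & \vdots & \vdots \\ E[M] & B[M] & A[M] & C[M] \end{bmatrix}$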
Likelihood Function
[Figure: network over E, B, A, C]
• Assume i.i.d. samples
• Likelihood function is
$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$
Likelihood Function
[Figure: network over E, B, A, C]
• By definition of network, we get
$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$
$= \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)\, P(C[m] \mid A[m] : \Theta)$
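Likelihood Function
[Figure: network over E, B, A, C]
• Rewriting terms, we get
$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$
$= \prod_m P(E[m] : \Theta) \cdot \prod_m P(B[m] : \Theta) \cdot \prod_m P(A[m] \mid B[m], E[m] : \Theta) \cdot \prod_m P(C[m] \mid A[m] : \Theta)$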
General Bayesian Networks
Generalization for any Bayesian network:
$L(\Theta : D) = \prod_m P(x_1[m], \ldots, x_n[m] : \Theta)$
$= \prod_i \prod_m P(x_i[m] \mid Pa_i[m] : \Theta_i)$
$= \prod_i L_i(\Theta_i : D)$
Parameters can be estimated independently!
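The decomposition can be checked numerically. The sketch below assumes binary variables coded 0/1, the four-node network from the likelihood slides (E → A ← B, A → C), and tables that store P(child = 1 | parents); the names `PARENTS`, `family_loglik` and `joint_loglik` are hypothetical.

```python
import math

# Assumed structure: E and B are roots, A has parents (E, B), C has parent A.
PARENTS = {"E": (), "B": (), "A": ("E", "B"), "C": ("A",)}

def _prob(sample, child, cpt):
    """P(child[m] | Pa(child)[m]) for one sample, given the child's table."""
    pa = tuple(sample[p] for p in PARENTS[child])
    p_one = cpt[pa]                      # P(child = 1 | parents = pa)
    return p_one if sample[child] == 1 else 1.0 - p_one

def family_loglik(data, child, cpt):
    """log L_i: the likelihood term that depends only on one family."""
    return sum(math.log(_prob(s, child, cpt)) for s in data)

def joint_loglik(data, cpts):
    """log L computed directly from the joint probability of each sample."""
    total = 0.0
    for s in data:
        joint = 1.0
        for child in PARENTS:
            joint *= _prob(s, child, cpts[child])
        total += math.log(joint)
    return total
```

Because the joint log-likelihood equals the sum of the family terms, maximizing it over all tables is the same as maximizing each family term separately.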
Bayesian Inference
• Represent uncertainty about parameters using a probability distribution over parameters, data
• Using Bayes rule
$P(\theta \mid x[1], \ldots, x[M]) = \frac{P(x[1], \ldots, x[M] \mid \theta)\, P(\theta)}{P(x[1], \ldots, x[M])}$
• Common prior distributions:
– Dirichlet (discrete)
– Normal (continuous)
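For a discrete variable with a Dirichlet prior, the Bayes-rule update has a closed form: the posterior is again Dirichlet, with the observed counts added to the prior hyperparameters. A minimal sketch, with hypothetical function names:

```python
def dirichlet_posterior(alpha, counts):
    """Posterior hyperparameters of a Dirichlet prior after multinomial counts.

    alpha: prior hyperparameters, one per outcome.
    counts: how often each outcome was observed.
    """
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Expected outcome probabilities under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]
```

For example, a uniform prior `[1, 1]` combined with counts `[3, 1]` gives the posterior `[4, 2]`, whose mean is (2/3, 1/3) rather than the raw frequencies (3/4, 1/4).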
Why Struggle for Accurate Structure?
[Figure: the true network over Earthquake, Alarm Set, Burglary, Sound, shown next to one variant with a missing arc and one with an added arc]
Missing an arc
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure
Adding an arc
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure
Score-based Learning
Define a scoring function that evaluates how well a structure matches the data
Data (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>
[Figure: three candidate structures G1, G2, G3 over E, B, A]
S(G1) = 10, S(G2) = 1.5, S(G3) = 0.01
Search for a structure that maximizes the score
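Structure Score
Likelihood score:
$L(G : D) = P(D \mid G, \hat{\theta}_G)$, where $\hat{\theta}_G$ are the max likelihood params
Bayesian score:
– Average over all possible parameter values
$P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta$
Here $P(D \mid G)$ is the marginal likelihood, $P(D \mid G, \theta)$ the likelihood, and $P(\theta \mid G)$ the prior over parameters.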
Search for Optimal Network Structure
• Start with a given network
– empty network
– best tree
– a random network
• At each iteration
– Evaluate all possible changes
– Apply change based on score
• Stop when no modification improves score
Search for Optimal Network Structure
• Typical operations:
– Add C → D
– Delete C → E
– Reverse C → E
[Figure: the network over S, C, E, D before and after each operation]
• For Add C → D: Δscore = S({C,E} → D) − S({E} → D)
At each iteration, only the site that is being updated needs to be rescored!
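The search can be sketched as greedy hill climbing over a decomposable score. This is a simplified illustration, not the procedure used in the cited papers: it assumes binary data in an integer NumPy array (rows = samples, columns = variables), uses a BIC-style family score as a stand-in for the scores on the slides, starts from the empty network, and considers only add and delete moves (reversal is omitted for brevity). All function names are hypothetical.

```python
import itertools
import math
import numpy as np

def family_score(data, child, parents):
    """BIC-style score of one family: max log-likelihood minus a penalty."""
    parents = sorted(parents)
    counts = {}
    for row in data:
        key = tuple(int(row[p]) for p in parents)
        counts.setdefault(key, [0, 0])[int(row[child])] += 1
    loglik = sum(c * math.log(c / (n0 + n1))
                 for n0, n1 in counts.values() for c in (n0, n1) if c > 0)
    # One free parameter per parent configuration for a binary child.
    return loglik - 0.5 * math.log(data.shape[0]) * 2 ** len(parents)

def creates_cycle(parents, src, dst):
    """True if adding src -> dst would close a cycle (dst is an ancestor of src)."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data):
    """Greedy search from the empty network; returns {node: set of parents}."""
    n = data.shape[1]
    parents = {i: set() for i in range(n)}
    scores = {i: family_score(data, i, parents[i]) for i in range(n)}
    while True:
        best_gain, best_move = 1e-9, None
        for src, dst in itertools.permutations(range(n), 2):
            if src in parents[dst]:
                new_pa = parents[dst] - {src}          # delete src -> dst
            elif not creates_cycle(parents, src, dst):
                new_pa = parents[dst] | {src}          # add src -> dst
            else:
                continue
            # Only the family of dst changes, so only it is rescored.
            gain = family_score(data, dst, new_pa) - scores[dst]
            if gain > best_gain:
                best_gain, best_move = gain, (dst, new_pa)
        if best_move is None:                          # no move improves the score
            return parents
        dst, new_pa = best_move
        parents[dst] = new_pa
        scores[dst] += best_gain
```

The result is a local maximum of the score, so different starting networks can give different answers.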
Structure Discovery
Task: Discover structural properties
– Is there a direct connection between X & Y?
– Does X separate two "subsystems"?
– Does X causally affect Y?
Example: scientific data mining
– Disease properties and symptoms
– Interactions between the expression of genes
Discovering Structure
– There may be many high scoring models
– Answer should not be based on any single model
– Want to average over many models
[Figure: five candidate networks over E, R, B, A, C, weighted by the posterior P(G|D)]
$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$
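Averaging over models can be sketched as follows, assuming a small set of candidate graphs whose log marginal likelihoods and log priors are already available. P(D) is approximated by normalizing over the supplied candidates only, graphs are represented as sets of (parent, child) pairs, and the function names are hypothetical.

```python
import math

def structure_posterior(log_marginals, log_priors):
    """P(G | D) for each candidate, normalized over the candidates given.

    log_marginals: log P(D | G) per graph; log_priors: log P(G) per graph.
    """
    logs = [lm + lp for lm, lp in zip(log_marginals, log_priors)]
    top = max(logs)                                  # subtract max for stability
    weights = [math.exp(v - top) for v in logs]
    norm = sum(weights)
    return [w / norm for w in weights]

def edge_probability(graphs, posterior, edge):
    """Posterior probability that an edge is present, averaged over graphs."""
    return sum(p for g, p in zip(graphs, posterior) if edge in g)
```

An edge that appears in most of the high-probability graphs gets a probability close to 1 even when no single graph dominates the posterior.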
Cell-cycle network
Friedman et al. 2000
Limitations of Bayesian networks
• Computationally costly
– Identifying the globally optimal network structure is an NP-hard problem
• Heuristic approaches may be trapped in local maxima.
• Choosing a prior distribution over DAGs is tricky.
• In practice, the approach fails to recover network structures more difficult than the cell-cycle data.
Equivalence of graphs
• When two DAGs can represent the same set of conditional independence assertions, we say that these DAGs are equivalent
[Figure: two DAGs over the nodes Y and Z]
• Are these graphs equivalent?
[Figure: two DAGs over the nodes X, Y, Z]
Are these graphs equivalent?
Therefore, the exact graph is unidentifiable!
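One way to see this in the two-node case, assuming the two graphs in question are Y → Z and Z → Y (the arrows did not survive extraction): both structures reach exactly the same maximum likelihood on any data set, so a likelihood-based score cannot tell them apart. The function name below is hypothetical.

```python
import math
from collections import Counter

def max_loglik(pairs, reverse=False):
    """Maximum log-likelihood of Y -> Z (or Z -> Y if reverse) on (y, z) samples."""
    if reverse:
        pairs = [(z, y) for y, z in pairs]
    parent = Counter(p for p, _ in pairs)    # counts of the parent variable
    joint = Counter(pairs)                   # counts of (parent, child) pairs
    m = len(pairs)
    # log P(parent) + log P(child | parent), both at their counting estimates
    return sum(n * (math.log(parent[p] / m) + math.log(n / parent[p]))
               for (p, _), n in joint.items())
```

Both directions reduce to the sum of n · log(n / M) over the joint counts, which is why the two values agree.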
Reading List
• Raychaudhuri et al. 2000
– Applied PCA to analyze gene expression.
• Gardner et al. 2003
– Developed NIR to find regulatory networks.
• Friedman et al. 2000
– Applied Bayesian networks to analyze the cell-cycle network.
• Friedman 2004
– Review of probabilistic graphical models.
Acknowledgement
Some of the slides were obtained from Nir Friedman.