Bayesian network models of Biological signaling pathways
K. Sachs
From Phospho-molecular profiling to Signaling pathways
High throughput data
Raf
Erk
p38
PKA
PKC
Jnk
PIP2
PIP3
Plc
Akt
...
Cell1, Cell2, Cell3, Cell4, ..., Cell600
Signaling Pathways
Flow Measurements
Picture: John Albeck
Outline
What are signaling pathways?
What kind of data is available to study them?
How do we use Bayesian networks to learn their structure?
Two extensions:
Markov neighborhood algorithm
Bayesian network based cyclic networks (BBCs)
Cell death
Proliferation
Secrete cytokines
Cells respond to their environment
Inside each cell is a molecular network
“Central Dogma”
DNA -> (Transcription) -> mRNA -> (Translation) -> Protein -> (Modification) -> Modified Protein
DNA: ‘Blueprint’, instructions for production of all proteins
mRNA: Delivers instructions for a specific gene
Ribosome: Protein-production factory
Signaling & Genetic pathways
[Figure: a signaling pathway (A, B, C) and a genetic pathway (A, B, TF, DNA, RNA, C), both leading to a cell response]
$$\frac{d[R]}{dt} = k_1 [LR] - k_2 [R][L] \dots$$
Spectrum of Modeling Tools in Systems Biology
Graph
Node: Measured level/activity of protein
Edge: Influence (dependency) between proteins
Conditional probability distributions
Each node has a conditional probability given its parents
Protein A
Protein B
Protein C Protein D
Protein E
Bayesian Networks
[Plot: conditional probability distribution P(B|A=‘On’)]
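The statement that each node has a conditional probability given its parents means the joint distribution factorizes into one term per node. A minimal sketch of that factorization follows; the three-protein chain A -> B -> C, the binary ‘On’/‘Off’ states, and every probability value are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical structure: each node maps to its tuple of parents.
parents = {"A": (), "B": ("A",), "C": ("B",)}

# Hypothetical conditional probabilities:
# cpds[node][parent_states] = P(node = "On" | parent_states)
cpds = {
    "A": {(): 0.3},
    "B": {("Off",): 0.1, ("On",): 0.8},
    "C": {("Off",): 0.2, ("On",): 0.7},
}


def joint_probability(assignment):
    """Multiply one conditional term per node: P(x) = prod_i P(x_i | parents(x_i))."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_states = tuple(assignment[p] for p in node_parents)
        p_on = cpds[node][parent_states]
        prob *= p_on if assignment[node] == "On" else 1.0 - p_on
    return prob


# Summing the factorized joint over all assignments should give 1.
states = ("Off", "On")
total = sum(
    joint_probability(dict(zip(parents, combo)))
    for combo in product(states, repeat=len(parents))
)
```

Here `joint_probability({"A": "On", "B": "On", "C": "On"})` is simply the product 0.3 x 0.8 x 0.7 of the three conditional terms.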
How do we use Bayesian Networks to infer pathways?
The Technical Details
$$\text{BayesianScore}(S) = \log P(S \mid D) = \log P(S) + \log P(D \mid S) + c$$
Score candidate models
Use a heuristic search to find high scoring models
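$$P(D \mid S) = \int \dots \int P(D \mid \Theta, S)\, P(\Theta \mid S)\, d\Theta_1 \dots d\Theta_n = \int \dots \int P(D, \Theta \mid S)\, d\Theta_1 \dots d\Theta_n$$
(analytical solution!)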
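To make "score candidate models" concrete, here is a minimal sketch of a closed-form log P(D | S) for discrete data. It assumes a uniform Dirichlet prior on each conditional distribution and a uniform prior over structures (so log P(S) is a constant); the slide does not state which priors were used, so both are assumptions. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import lgamma


def log_marginal_likelihood(data, child, parents, n_states=2, alpha=1.0):
    """Closed-form log P(D_child | parents) under a Dirichlet(alpha, ..., alpha) prior."""
    counts = Counter()
    for row in data:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
    configs = {config for config, _ in counts}
    total = 0.0
    for config in configs:
        n_config = sum(counts[(config, k)] for k in range(n_states))
        total += lgamma(n_states * alpha) - lgamma(n_states * alpha + n_config)
        for k in range(n_states):
            total += lgamma(alpha + counts[(config, k)]) - lgamma(alpha)
    return total


def log_score(data, structure, n_states=2):
    """Sum the per-node terms; the structure prior is assumed uniform and dropped."""
    return sum(
        log_marginal_likelihood(data, child, parents, n_states)
        for child, parents in structure.items()
    )


# Toy data (hypothetical): B always copies A.
data = [{"A": 0, "B": 0}] * 10 + [{"A": 1, "B": 1}] * 10

score_with_edge = log_score(data, {"A": [], "B": ["A"]})
score_without_edge = log_score(data, {"A": [], "B": []})
```

On this toy data the structure with the edge A -> B scores higher than the empty structure. A heuristic search would propose edge additions, deletions, and reversals and keep the changes that raise the score.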
Protein data: Western blot
Protein data: Protein arrays
Protein data: Mass Spectrometry
All of these lysate approaches give 1 measurement per protein for 10^3-10^7 cells
Flow Cytometry: Single Cell Analysis
Thousands of datapoints
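[Figure: T-cell signaling pathway diagram with numbered perturbation sites. Receptors: CD3, CD28, LFA-1. Signaling molecules: LAT, Lck, VAV, SLP-76, Zap70, Cytohesin, JAB-1, RAS, PI3K, PLC, PIP2, PIP3, PKC, PKA, Akt, Raf, Mek1/2, Erk1/2, MAPKKK, MEK3/6, p38, MAPKKK, MEK4/7, JNK]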
Activators
1. α-CD3
2. α-CD28
3. ICAM-2
4. PMA
5. β2cAMP
Inhibitors
6. G06976
7. AKT inh
8. Psitect
9. U0126
10. LY294002
Stimulations and perturbations
Datasets of cells: condition ‘a’, condition ‘b’, ..., condition ‘n’
Raf
Mek1/2
Erk
p38
PKA
PKC
Jnk
PIP2
PIP3
Plc
Akt
12 Color Flow Cytometry
perturbation a
perturbation b
perturbation n
Conditions (multi-well format)
T-Lymphocyte Data
Primary human T-Cells
9 conditions (6 specific interventions)
9 phosphoproteins, 2 phospholipids
600 cells per condition
5400 data-points
Omar Perez
Statistical Dependencies
A
B
C D
E
[Scatter plot: Phospho A vs. Phospho B]
Statistical Dependencies
Edges can be directed (primarily) due to the use of interventions
A
B
C D
E
[Scatter plot: Phospho A vs. Phospho B]
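A small worked example may help show why interventions let edges be directed. In the hypothetical two-protein model below, A activates B. Observationally the dependence looks the same in both directions, but clamping A shifts B while clamping B leaves A untouched. All probabilities are made-up illustration values.

```python
# Hypothetical model: A -> B, both binary (1 = phosphorylated / "on").
P_A_ON = 0.5
P_B_ON_GIVEN_A = {0: 0.1, 1: 0.9}


def observational_joint():
    """Joint P(A, B) when nothing is perturbed."""
    joint = {}
    for a in (0, 1):
        p_a = P_A_ON if a else 1.0 - P_A_ON
        for b in (0, 1):
            p_b = P_B_ON_GIVEN_A[a] if b else 1.0 - P_B_ON_GIVEN_A[a]
            joint[(a, b)] = p_a * p_b
    return joint


joint = observational_joint()

# Observational view: conditioning works equally well in either direction.
p_b_on = joint[(0, 1)] + joint[(1, 1)]
p_a_on_given_b_on = joint[(1, 1)] / p_b_on
p_b_on_given_a_on = joint[(1, 1)] / (joint[(1, 0)] + joint[(1, 1)])

# Interventional view: forcing a node cuts it off from its own parents.
p_b_on_do_a_on = P_B_ON_GIVEN_A[1]  # B still responds to the clamped A
p_a_on_do_b_on = P_A_ON             # A ignores the clamped B
```

Forcing A on moves P(B on) from 0.5 to 0.9, whereas forcing B on leaves P(A on) at 0.5. That asymmetry is what identifies the direction A -> B.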
Overview
Influence diagram of measured variables
Bayesian Network Analysis
Datasets of cells: condition ‘a’, condition ‘b’, ..., condition ‘n’
Raf
Mek1/2
Erk
p38
PKA
PKC
Jnk
PIP2
PIP3
Plc
Akt
Multiparameter Flow Cytometry
perturbation a
perturbation b
perturbation n
Conditions (multi well format)
Inferred Network
[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Phospho-Proteins, Phospho-Lipids, Perturbed in data]
How well did we do?
Direct phosphorylation
[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]
Features of Approach
Direct phosphorylation: Mek -> Erk
Difficult to detect using other forms of high-throughput data:
- Protein-protein interaction data
- Microarrays
How well did we do?
[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]
How well did we do?
Indirect Signaling
[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]
Indirect signaling
Dismissing edges: Raf -> Mek -> Erk
Inferred: PKC -> Jnk
Actual: PKC -> Mapkkk -> Mek4/7 -> Jnk (Mapkkk and Mek4/7 not measured)
Indirect connections can be found even when the intermediate molecule(s) are not measured
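The point about unmeasured intermediates can be checked numerically. The sketch below assumes a chain PKC -> Mapkkk -> Mek4/7 -> Jnk with binary states and made-up activation probabilities, then sums out the two unmeasured intermediates. The dependence of Jnk on PKC survives the marginalization, which is why an edge between the two measured molecules can still be found.

```python
from itertools import product

# Hypothetical P(child = 1 | parent state) for each link in the chain.
P_ON = {
    "Mapkkk": {0: 0.1, 1: 0.9},  # parent: PKC
    "Mek4/7": {0: 0.1, 1: 0.9},  # parent: Mapkkk
    "Jnk": {0: 0.1, 1: 0.9},     # parent: Mek4/7
}


def bernoulli(p_on, state):
    """Probability of `state` (1 or 0) when P(state = 1) = p_on."""
    return p_on if state else 1.0 - p_on


def p_jnk_on_given_pkc(pkc):
    """P(Jnk = 1 | PKC) with the unmeasured Mapkkk and Mek4/7 summed out."""
    total = 0.0
    for mapkkk, mek47 in product((0, 1), repeat=2):
        total += (
            bernoulli(P_ON["Mapkkk"][pkc], mapkkk)
            * bernoulli(P_ON["Mek4/7"][mapkkk], mek47)
            * P_ON["Jnk"][mek47]
        )
    return total


p_high = p_jnk_on_given_pkc(1)
p_low = p_jnk_on_given_pkc(0)
```

With these values the signal is attenuated at each step but `p_high` stays well above `p_low`.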
Indirect signaling - Complex example
Is this a mistake? [Diagram: PKC, Raf, Mek]
The real picture: [Diagram: PKC, Ras, Raf s259, Raf s497, Mek]
Phospho-protein specific
More than one pathway of influence
How well did we do?
15/17 Classic
[Figure: inferred network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Expected Pathway, Phospho-Proteins, Phospho-Lipids, Perturbed in data]
Signaling pathway reconstruction
15/17 Classic, 17/17 Reported, 3 Missed
[Figure: inferred network over PKC, Raf, Erk, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3. Legend: Expected Pathway, Reported, Missed, Reversed, Phospho-Proteins, Phospho-Lipids, Perturbed in data]
[Sachs et al 2005]
Caveats
Inhibitor specificity
Binding site similar across proteins
Reagent availability and specificity
Data quality
These are issues in many biological apps!
[Cartoon caption: “I think I’ll bind here”]
Markov Neighborhood Algorithm
Building larger networks
12 color capability: Model 50-100 variables
4 color capability: Model 12 variables
~80 proteins involved in MAPK signaling (11, at the cutting edge, is NOT enough!)
[Figure: network over PKC, Raf, P44/42, Mek, Plc, PKA, Akt, Jnk, P38, PIP2, PIP3]
Measured subsets = Incomplete dataset (Missing data)
Insufficient information for standard approaches (will perform poorly)
Use a set of biologically motivated assumptions to constrain search
And to reduce the number of experiments: $\binom{11}{4} = 330$
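The figure 330 is the number of ways to choose 4 of 11 molecules, i.e. the number of distinct 4-color panels if every subset had to be measured. A quick check, using the 11 molecule names that appear on the data slides (the variable names are illustrative):

```python
from itertools import combinations
from math import comb

molecules = ["Raf", "Mek", "Erk", "p38", "PKA", "PKC", "Jnk", "PIP2", "PIP3", "Plc", "Akt"]
panel_size = 4  # 4 color capability

# Closed-form count and explicit enumeration should agree.
n_panels = comb(len(molecules), panel_size)
all_panels = list(combinations(molecules, panel_size))
```

Both `n_panels` and `len(all_panels)` come to 330, which is the exhaustive number of experiments the constrained search aims to avoid.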
Constraining the search
Identify candidate parents using ‘Markov neighborhoods’ (for each variable)
Plus potential perturbation parents
Bayesian Network Analysis
(Constrained search)
Raf
Mek1/2
Erk
p38
PKA
PKC
Jnk
PIP2
PIP3
Plc
Akt
Molecules 1, 3, 7, 9
Molecules 2, 4, 7, 10
Molecules 1, 2, 6, 11
Approach overview
Neighborhood reduction
4 color capability
Conditional independencies in the substructure?
[Diagram: variables A, B, C, D, E, F; substructure A, B, C]
411
Accurate Reproduction of Model
~15 experiments, 4-colors
Confidence value different from original model
[Figure: network over PKC, Raf, Erk, Mek, Plc, Akt, Jnk, P38, PIP2, PIP3, PKA]
Raf
Mek1/2
Erk
p38
PKA
PKC
Jnk
PIP2
PIP3
Plc
Akt
Active learning approach
Learning cyclic structures with Bayesian networks
Biological networks contain many loops
Bayesian networks are constrained to be acyclic
So…
Overcoming acyclicity
Signaling pathways contain many cycles
Bayesian networks are constrained to be acyclic
How can we accurately model pathways with cycles?
GRB2/SOS
Raf
MEK
Erk
Ras
Develop a new, Bayesian network derived algorithm that models cycles…
Bayesian Network Based Cyclic Networks (BBNs)
I. Break loops with molecule inhibitors
II. Use BN to learn the structure (now not cyclic!)
III. Close loops
GRB2/SOS
Raf
MEK
Erk
Ras
Mek inhibitor
Solomon Itani
GRB2/SOS
Raf
MEK
Erk
Ras
Bayesian Network Based Cyclic Networks (BBNs)
I. Break loops with molecule inhibitors
Detect loops: P(A)A* ~= P(A)
II. Use BN to learn the structure (now not cyclic!)
III. Close loops
P(B|Pa(B)) A* ~= P(B|Pa(B)) AB
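The three steps can be illustrated at the level of graph structure only. The sketch below is not the authors' algorithm: it assumes the edge directions of the loop (including a feedback from Erk to GRB2/SOS), assumes that inhibiting a molecule severs its outgoing edges, and skips the statistical learning in step II entirely. All function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Assumed loop, stored as child -> set of parents.
cyclic_parents = {
    "Ras": {"GRB2/SOS"},
    "Raf": {"Ras"},
    "MEK": {"Raf"},
    "Erk": {"MEK"},
    "GRB2/SOS": {"Erk"},  # assumed feedback edge that closes the loop
}


def is_acyclic(parents):
    """True if the graph admits a topological order, i.e. is a valid BN structure."""
    try:
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def break_loops(parents, inhibited):
    """Step I (sketch): remove edges leaving the inhibited molecule."""
    broken, removed = {}, set()
    for child, node_parents in parents.items():
        broken[child] = {p for p in node_parents if p != inhibited}
        removed |= {(p, child) for p in node_parents if p == inhibited}
    return broken, removed


def close_loops(parents, removed):
    """Step III (sketch): add the severed edges back to the learned structure."""
    closed = {child: set(node_parents) for child, node_parents in parents.items()}
    for parent, child in removed:
        closed.setdefault(child, set()).add(parent)
    return closed


broken, removed = break_loops(cyclic_parents, "MEK")
# Step II would learn a BN on data from the inhibited condition; here the
# broken graph stands in for the learned structure.
restored = close_loops(broken, removed)
```

With the Mek inhibitor applied, the remaining graph passes the acyclicity check, so standard Bayesian network learning applies; restoring the severed edge recovers the original loop.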
Future work
Larger network from overlapping sets (Markov neighborhood)
Dynamic models over time
Differences in signaling (sub-populations, treatment conditions, cell types, disease states)
Acknowledgements
Shigeru Okumura
Solomon Itani
Garry Nolan
Dana Pe’er
Doug Lauffenburger
Omar Perez
Dennis Mitchell
Mesrob Ohannessian
Funding
LLS post doctoral fellowship
Extra slides
Mathematical Intuition
[Graph over A, B, C] C is independent of A given B.
[Graph over A, B, C, D] C independent of A given B and D
1) No need to introduce time!!!
2) When loops are broken, the result is a BN!!!
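The independence statement "C is independent of A given B" can be verified numerically on a small example. The sketch assumes a chain A -> B -> C (the slide's graph is not recoverable from the transcript, so the chain is an assumption) with made-up probabilities, and compares P(C | A, B) across values of A.

```python
from itertools import product

# Hypothetical chain A -> B -> C with binary states.
P_A_ON = 0.4
P_B_ON_GIVEN_A = {0: 0.2, 1: 0.9}
P_C_ON_GIVEN_B = {0: 0.1, 1: 0.7}


def bern(p_on, state):
    """Probability of `state` (1 or 0) when P(state = 1) = p_on."""
    return p_on if state else 1.0 - p_on


# Full joint built from the chain factorization.
joint = {
    (a, b, c): bern(P_A_ON, a) * bern(P_B_ON_GIVEN_A[a], b) * bern(P_C_ON_GIVEN_B[b], c)
    for a, b, c in product((0, 1), repeat=3)
}


def p_c_on_given_ab(a, b):
    """P(C = 1 | A = a, B = b), computed from the joint."""
    return joint[(a, b, 1)] / (joint[(a, b, 0)] + joint[(a, b, 1)])


def p_c_on_given_a(a):
    """P(C = 1 | A = a), with B summed out."""
    numerator = sum(joint[(a, b, 1)] for b in (0, 1))
    denominator = sum(joint[(a, b, c)] for b in (0, 1) for c in (0, 1))
    return numerator / denominator
```

Once B is fixed, changing A leaves P(C) unchanged, while without conditioning on B the two variables are clearly dependent.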
Prediction: Erk -> Akt
Erk1/2 unperturbed
Erk -> Akt not well established in literature
Predictions:
Erk1/2 influences Akt
While correlated, Erk1/2 does not influence PKA
PKC
Raf
Erk1/2
Mek
PKA
Akt
Validation
siRNA on Erk1/Erk2
Select transfected cells
Measure Akt and PKA
[Histograms: control, stimulated vs. Erk1 siRNA, stimulated. P-Akt (APC-A: p-akt-647, log scale 10^0 to 10^4): P=9.4e-5. P-PKA (PE-A: p-pka-546, log scale 10^0 to 10^4): P=0.28]