R. Guthke
Leibniz-Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany
Guthke R: Data and knowledge integration in systems biological dynamic models
Data and knowledge integration in systems biological dynamic models
Cyclic Operation of Experimental and Modeling Work
[Diagram elements: Experiment; Data pre-processing; Feature selection; Model optimization; Model validation; Hypotheses; Literature & databases (extract hypotheses)]
Two Complementary Approaches of Modeling
Top-down (data-driven): analysing whole systems; descriptions of the entire system
Bottom-up (knowledge-driven): analysing and merging sub-systems; extract hypotheses about relationships between components
Knowledge-driven Modeling
Knowledge, e.g. from Ingenuity Pathway Analysis, describing complex interactions between genes, metabolites or proteins in a living cell, tissue or organism
[Figure: Pathway analysis of representative genes of the immune response to infection, Calvano et al., Nature 437 (2005)]
Data-driven Modelling
[Diagram: transcriptome, proteome and metabolome data; upload; data warehouse and analysis tools; data matrix]
Complexity of Gene Regulatory Network (GRN) Modeling
Different types of molecular interaction in gene regulation: Protein-DNA, Protein-Protein, Protein-Ligand
[Figure labels: Metabolite 1, Metabolite 2, Protein 2, Complex 3-4]
Complexity Reduction by Projection on the Transcriptome Level (Influence Models)
[Figure labels: Protein 1, Protein 2, Protein 3, Protein 4, Complex 3-4; Gene 1, Gene 2, Gene 3, Gene 4]
Prominent examples of large-scale GRN inference for microorganisms

Organism                   Reference               Number of genes   Number of samples
Halobacterium sp. NRC-1    Bonneau et al. (2006)   1934              268
Escherichia coli           Faith et al. (2007)     4345              445
Saccharomyces cerevisiae   Lee et al. (2002)       6312              300
Experimental Data and Prior Knowledge
Faith & Gardner: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol, 5:54-66, 2007. http://gardnerlab.bu.edu
CLR = Context Likelihood of Relatedness
Outline
Introduction
3 Approaches – 3 Examples:
1) Knowledge-driven modeling: the classical approach (Human liver cell bioreactor modeling)
2) Data-driven modeling. Tool: NetGenerator (Infection)
3) Integrated data- and knowledge-driven modeling. Tool: TILAR (Response towards anti-rheumatic and anti-MS therapy)
Outline
Introduction
3 Examples:
1) Knowledge-driven modeling (and model fit to data): the classical approach (Human liver cell bioreactor modeling)
2) Data-driven modeling (Infection)
3) Integrated data- and knowledge-driven modeling (Response towards an anti-rheumatic drug)
Bioreactor for liver support therapy
an option for assisting or replacing the failing organ until regeneration occurs or transplantation can be performed
The BioProcess Kinetics
inoculation by suspensions of primary liver cells obtained from human livers (discarded from transplantation)
use for therapy
[Figure: process kinetics over days 1 to 35; recovery phase (3 days), stand-by phase (up to 34 days)]
Aim
Quantitative Understanding of the Input/Output-Behaviour
over the time period of the first 6 days
[Figure: bioreactor inflow, bioreactor, bioreactor outflow. Panels over t [d], 0 to 6 d: inflow rates FA [ml/h] and FB [ml/h]; outflow time courses of MET, ASP, GLU, NH3, LEU, ASN, GLN and UREA]
Bioprocess Biosyst Eng 28 (2006), 331
We selected 24 variables (from 99) (N- and C-sources): 18 amino acids (AA) & 6 others

 1 LEU    2 HIS    3 ARG    4 VAL    5 TRP    6 PHE
 7 ILE    8 ALA    9 TYR   10 LYS   11 MET   12 SER
13 GLY   14 THR   15 ASP   16 ASN   17 GLU   18 GLN
19 NH3 (Ammonia)    20 UREA (Urea)      21 GAL (Galactose)
22 SOR (Sorbitol)   23 GLC (Glucose)    24 LAC (Lactate)
Types of Network Models (Examples)
Non-directed: e.g. Correlation Network
Directed, probabilistic: e.g. Bayesian Network
Directed, deterministic: ODE (Ordinary Differential Equation)
Metabolic Network of Amino Acids, Ammonia and Urea in a Liver Cell Bioreactor
Schmidt-Heck et al., LN Bioinformatics, 2004
The Model
48 ODEs (Ordinary Differential Equations)
48 variables (for 24 compounds & 2 compartments)

\[ \frac{dc_{0,i}}{dt} = \frac{F_A(t)}{V_1}\, c_{A,i} + \frac{F_B(t)}{V_1}\, c_{B,i} - \frac{F_0(t)}{V_1}\, c_{0,i} - \frac{p_0}{V_1}\,(c_{0,i} - c_i), \quad i = 1,\dots,24 \]

\[ \frac{dc_i}{dt} = \frac{p_0}{V_2}\,(c_{0,i} - c_i) - p_i\, c_i + p_{AA,i}\, p_p, \quad i = 1,\dots,13 \]

[Equations for the remaining compounds, i = 14, ..., 24: layout not recoverable from the transcript]
ODEs 1-24 describing the Inflow & Outflow and Diffusion
[Diagram: fresh medium inflow FA = 150 ml/h with concentrations cA,i (all compounds) and FB = 0 to 50 ml/h with concentrations cB,i (only aspartate); perfusion circuit, 250 ml/min; outflow F0 to waste; measured data c0,i]

\[ \frac{dc_{0,i}}{dt} = \frac{F_A(t)}{V_1}\, c_{A,i} + \frac{F_B(t)}{V_1}\, c_{B,i} - \frac{F_0(t)}{V_1}\, c_{0,i} - \frac{p_0}{V_1}\,(c_{0,i} - c_i), \quad i = 1,\dots,24 \]
Diffusion between the two Compartments:
1) Liver cell compartment: V2 = 600 ml, concentrations ci
2) Perfusion compartment: V1 = 900 ml, concentrations c0,i (measured data)
Flux via the membrane: p0
The first 24 ODEs:
\[ \frac{dc_{0,i}}{dt} = \frac{F_A(t)}{V_1}\, c_{A,i} + \frac{F_B(t)}{V_1}\, c_{B,i} - \frac{F_0(t)}{V_1}\, c_{0,i} - \frac{p_0}{V_1}\,(c_{0,i} - c_i), \quad i = 1,\dots,24 \]
The next 13 ODEs:
\[ \frac{dc_i}{dt} = \frac{p_0}{V_2}\,(c_{0,i} - c_i) - p_i\, c_i + p_{AA,i}\, p_p, \quad i = 1,\dots,13 \]
The remaining 11 ODEs (textbook knowledge)
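As an illustration of the two-compartment balance on these slides, the following minimal sketch integrates one compound in the perfusion compartment (c0) and the liver cell compartment (c). The volumes and FA are the slide values; all other parameter values, the names `Params`, `rhs` and `simulate`, and the assumption that the outflow F0 equals FA + FB are hypothetical choices made only for this example. It is not the fitted model from the talk.

```python
from dataclasses import dataclass


@dataclass
class Params:
    V1: float = 900.0     # perfusion compartment volume [ml] (slide value)
    V2: float = 600.0     # liver cell compartment volume [ml] (slide value)
    FA: float = 150.0     # fresh medium inflow A [ml/h] (slide value)
    FB: float = 0.0       # fresh medium inflow B [ml/h]
    cA: float = 100.0     # inflow concentration, hypothetical
    cB: float = 0.0       # inflow concentration, hypothetical
    p0: float = 300.0     # membrane exchange rate [ml/h], hypothetical
    pi: float = 0.05      # consumption rate [1/h], hypothetical
    pAA: float = 1.0      # proteolysis weight, hypothetical
    p33: float = 0.0      # proteolytic influx, hypothetical
    t_prot: float = 24.0  # influx is active for t < 1 d = 24 h


def rhs(t, c0, c, q):
    """Right-hand side for one compound: (dc0/dt, dc/dt)."""
    F0 = q.FA + q.FB  # assumption: outflow balances total inflow
    pp = q.p33 if t < q.t_prot else 0.0
    dc0 = (q.FA * q.cA + q.FB * q.cB - F0 * c0 - q.p0 * (c0 - c)) / q.V1
    dc = q.p0 / q.V2 * (c0 - c) - q.pi * c + q.pAA * pp
    return dc0, dc


def simulate(q, c0, c, t_end, dt=0.01):
    """Classical 4th-order Runge-Kutta integration from t = 0 to t_end [h]."""
    steps = round(t_end / dt)
    for k in range(steps):
        t = k * dt
        a0, a1 = rhs(t, c0, c, q)
        b0, b1 = rhs(t + dt / 2, c0 + dt / 2 * a0, c + dt / 2 * a1, q)
        d0, d1 = rhs(t + dt / 2, c0 + dt / 2 * b0, c + dt / 2 * b1, q)
        e0, e1 = rhs(t + dt, c0 + dt * d0, c + dt * d1, q)
        c0 += dt / 6 * (a0 + 2 * b0 + 2 * d0 + e0)
        c += dt / 6 * (a1 + 2 * b1 + 2 * d1 + e1)
    return c0, c


if __name__ == "__main__":
    # Start from an empty reactor and integrate over 6 days (144 h).
    print(simulate(Params(), 0.0, 0.0, 144.0))
```

With the proteolytic influx switched off, the simulated concentrations settle at the steady state of the linear balance, which gives a quick plausibility check before fitting parameters to measured data.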
Model Fit to the Measured Data
Example: Run 2
Model Fit to data of 7 liver cell culture runs
Runs × 24 variables → model parameter sets
Prediction: proteolytic inflow, ...
Hypothesis: Proteolysis
The temporal maximum in the time series of LYS and other AA (VAL, LEU, ...) at t=1 d was interpreted as the result of proteolysis during the first hour of cultivation (recovery phase)
Model Fit of LYS to 7 runs:
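\[ \frac{dc_9}{dt} = \frac{p_0}{V_2}\,(c_{0,9} - c_9) - p_9\, c_9 + p_p, \qquad p_p = \begin{cases} p_{33} & \text{for } t < 1\,\text{d} \\ 0 & \text{else} \end{cases} \]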
The model parameters p9 and p33 (after model fit to randomly disturbed LYS data)
How to estimate the influx from proteolysis to the different amino acids?
LYS at t = 1 h was found to be strongly positively correlated with the other AA i at t = 1 h:
AAi(1 h) = pAAi * LYS(1 h) + bi
Regression over 7 runs, e.g. for ASN: pAA16 = 0.25, b16 = -30, with r = 0.97
Mean correlation coefficient: r = 0.90 (±0.05), averaged over 14 AA
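The regression AAi(1 h) = pAAi * LYS(1 h) + bi over the runs can be sketched as an ordinary least-squares fit. The seven LYS values below are made up for illustration, and the ASN values are generated from the slide's example coefficients (0.25 and -30), so the data are perfectly linear and do not reproduce the measured r = 0.97. The function name `linear_fit` is hypothetical.

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson r."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical LYS values at t = 1 h for 7 runs (not measured data).
lys = [200.0, 260.0, 310.0, 340.0, 400.0, 450.0, 520.0]
# Synthetic ASN values built from the slide's example coefficients.
asn = [0.25 * v - 30.0 for v in lys]

if __name__ == "__main__":
    print(linear_fit(lys, asn))
```

The fitted slope plays the role of pAAi, which scales the proteolytic influx estimated for LYS to the other amino acids.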
Outline
Introduction
3 Examples:
1) Knowledge-driven modeling (and model fit to data) (Human liver cell bioreactor modeling)
2) Data-driven modeling. Tool: NetGenerator (Infection)
3) Integrated data- and knowledge-driven modeling (Response towards an anti-rheumatic drug)
Data: Immune response to bacterial infection
Boldrick, JC et al:
“Stereotyped and specific gene expression programs in human innate immune response to bacteria.”
PNAS 99 (2002), 972-977.
http://genome-www.stanford.edu/hostresponse/
• Peripheral blood mononuclear cells (PBMCs)
• infected by heat-killed pathogenic Escherichia coli
• gene expression of 18432 cDNAs of PBMCs measured
• at 5 time points t (t= 0, ½, 1, 2, 4 h) after infection
Model Optimization
Experimental Data (pre-processed, selected)
Model Structure Search
Model Parameter Fit
Guthke et al., Bioinformatics, 21 (2005): 1626-1634
Töpfer et al., Lecture Notes in Bioinformatics, 4366 (2007), 119-130
Preprocessing of Data
Selection of differentially expressed genes: 1336 genes up- or down-regulated by at least a factor of 8 (ratio < 2^-3 or > 2^3) at one or more time points t1, ..., t4 versus t0 (all 73,728 ratios lie between 2^-10.4 and 2^8.7).
Scaling: all profiles start at t = 0 with zero and pass the value +1 or -1 at one of the following time points t1, ..., t4.
Imputing of missing data: k-nearest-neighbour algorithm by Troyanskaya, O. et al.: Bioinformatics 17 (2001), 520
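The selection and scaling steps might look like the following sketch. It assumes the input holds expression ratios versus t0 for the four later time points, works on log2 ratios, and scales each selected profile by its largest absolute log ratio so that the extreme value becomes +1 or -1. That scaling rule is one reading of the slide, not a confirmed detail. The function name and the example genes are hypothetical.

```python
from math import log2


def select_and_scale(ratios, fold=8.0):
    """ratios: gene -> [expression ratio at t1..t4 versus t0].

    Keeps genes whose ratio reaches at least `fold` up or down at some
    time point, and returns scaled log2 profiles that start at 0 (t0).
    """
    limit = log2(fold)
    profiles = {}
    for gene, values in ratios.items():
        logs = [log2(v) for v in values]
        peak = max(abs(v) for v in logs)
        if peak >= limit:
            # Assumption: divide by the largest absolute log ratio.
            profiles[gene] = [0.0] + [v / peak for v in logs]
    return profiles


# Hypothetical ratios for three genes.
example = {
    "g1": [2.0, 4.0, 16.0, 8.0],   # up to 16-fold up: selected
    "g2": [1.5, 2.0, 0.5, 1.0],    # at most 2-fold: dropped
    "g3": [0.5, 0.1, 0.25, 1.0],   # 10-fold down: selected
}

if __name__ == "__main__":
    print(select_and_scale(example))
```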
Feature Selection
by Fuzzy c-means Clustering
Cluster: 1 2 3 4 5 6
# Genes (membership > 50%): 4949767188269137
Optimization of the Cluster Number: 2...20 clusters → 6 classes
Cluster validity index (Kim et al. 2001)
4-level procedure (Moeller et al. 2002): repeating the complete analysis multiple times (40x) to verify that the results do not depend on the initial partitions
Feature Selection: Cluster representative genes
Representatives:
(1) IL-1: interleukin 1, alpha
(2) CD59 antigen
(3) NFKBIE: nuclear factor of kappa light polypeptide gene enhancer
(4) STAT1: signal transducer and activator of transcription 1, 91kD
(5) STAT5: signal transducer and activator of transcription 5A
(6) MHC-II: major histocompatibility complex, class II, DM alpha
[Figure panels: IL-1, CD59, NFKBIE, STAT1, STAT5, MHC-II]
Model Structure Optimization Methods
Model Optimization
NetGenerator
• a heuristic structure optimization method for differential equation systems
• optimizes the model structure and the resulting model parameters
• finds models with a minimum number of relevant parameters and an adequate fit to the measured data
Guthke et al., Bioinformatics, 21 (2005): 1626-1634
Töpfer et al., Lecture Notes in Bioinformatics, 4366 (2007), 119-130
NetGenerator Inference Method
Model:
\[ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij}\, x_j + b_i, \quad i = 1,\dots,n \]
Algorithm: heuristic optimization algorithm minimizing both
- the model fit error mse and
- the number N of non-zero model parameters
\[ \chi^2 = \frac{mse}{n - N}, \qquad GCV = \frac{mse}{(1 - N/n)^2} \quad \text{(Generalized Cross-Validation Index)} \]
mse: mean square error
n: number of observations (sample size)
N: number of parameters to be estimated
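Reading the two criteria on this slide as χ² = mse / (n - N) and GCV = mse / (1 - N/n)², the trade-off between fit error and parameter count can be sketched as follows. The candidate list of (N, mse) pairs is invented for illustration, and the function names are hypothetical.

```python
def chi2(mse, n, N):
    """Fit error normalised by the degrees of freedom n - N."""
    return mse / (n - N)


def gcv(mse, n, N):
    """Generalized cross-validation index: penalises large N relative to n."""
    return mse / (1.0 - N / n) ** 2


def best_by_gcv(candidates, n):
    """candidates: list of (N, mse); returns the pair with the lowest GCV."""
    return min(candidates, key=lambda c: gcv(c[1], n, c[0]))


# Hypothetical candidate models for n = 30 observations: (N, mse).
candidates = [(2, 5.0), (4, 1.0), (8, 0.9), (14, 0.85)]

if __name__ == "__main__":
    for N, mse in candidates:
        print(N, round(chi2(mse, 30, N), 4), round(gcv(mse, 30, N), 4))
    print("selected:", best_by_gcv(candidates, 30))
```

In this made-up example the larger models reduce the mse only slightly, so the index favours the model with four parameters.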
The NetGenerator Structure Optimization Algorithm
• Separate identification of the gene expression time series by submodels
• Combination of an inner and an outer optimization loop
• Outer loop: extension of the overall model by one newly optimized submodel
• Inner loop: test of all potential time series and selection of the best one
• Growing and pruning methods for the optimization of the submodel structure (add/remove interactions between genes, influences from the external input to genes, additional time delay elements)
• Nonlinear optimization of model parameters

\[ \dot{x}_1(t) = w_{1,1} x_1(t) + w_{1,2} x_2(t) + \dots + w_{1,q} x_q(t) + b_1 u(t) \quad \text{(submodel 1)} \]
\[ \dot{x}_2(t) = w_{2,1} x_1(t) + w_{2,2} x_2(t) + \dots + w_{2,q} x_q(t) + b_2 u(t) \quad \text{(submodel 2)} \]
\[ \vdots \]
\[ \dot{x}_q(t) = w_{q,1} x_1(t) + w_{q,2} x_2(t) + \dots + w_{q,q} x_q(t) + b_q u(t) \quad \text{(submodel q)} \]
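The inner loop described above (test every candidate time series, keep the one with the best fit) can be illustrated with a deliberately simplified sketch. It fits a first-order submodel dx/dt = w*x + b*u with a constant input u = 1 by linear least squares on finite differences over a uniform time grid. This is a stand-in for the nonlinear parameter optimization named on the slide, not NetGenerator's actual procedure. The function names and both example series are hypothetical.

```python
from math import exp


def fit_submodel(x, h):
    """Fit dx/dt = w*x + b (u = 1) on a uniform grid with step h.

    Uses difference quotients against interval midpoints and returns
    (w, b, err), where err is the summed squared residual.
    """
    dx = [(x[k + 1] - x[k]) / h for k in range(len(x) - 1)]
    xm = [(x[k + 1] + x[k]) / 2 for k in range(len(x) - 1)]
    n = len(dx)
    mx = sum(xm) / n
    my = sum(dx) / n
    sxx = sum((a - mx) ** 2 for a in xm)
    w = sum((a - mx) * (d - my) for a, d in zip(xm, dx)) / sxx
    b = my - w * mx
    err = sum((d - (w * a + b)) ** 2 for a, d in zip(xm, dx))
    return w, b, err


def inner_loop(series, h):
    """Test all candidate time series and return the best-fitting one."""
    fits = {name: fit_submodel(x, h) for name, x in series.items()}
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits


h = 0.5
t = [k * h for k in range(9)]  # 0 to 4 h
series = {
    # saturating first-order response (hypothetical)
    "A": [4.8 * (1.0 - exp(-3.0 * s)) for s in t],
    # accelerating response that a first-order submodel cannot follow
    "B": [s * s for s in t],
}

if __name__ == "__main__":
    best, fits = inner_loop(series, h)
    print(best, fits[best])
```

An outer loop would then fix the selected submodel, remove that series from the candidates, and repeat with one more input term per step. The fitted w from the difference scheme is a discretized estimate and is not expected to equal the continuous-time rate exactly.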
The NetGenerator Structure Optimization Algorithm
Start: all model parameters are zero,
\[ \dot{x}_i(t) = 0 \cdot x_1(t) + 0 \cdot x_2(t) + \dots + 0 \cdot x_6(t) + 0 \cdot u(t), \quad i = 1,\dots,6 \]
[Figure: gene expression x(t) versus time t [h], 0 to 4 h, for time series 1 to 6]
The NetGenerator Structure Optimization Algorithm
Outer loop: fit of the first submodel,
\[ \dot{x}_1(t) = w_{1,1}\, x_1(t) + b_1\, u(t) \]
Inner loop: each time series 1, ..., 6 is tested as the first submodel.
Error: e1 = 14.027, e2 = 9.579, e3 = 4.536, e4 = 21.705, e5 = 0.317, e6 = 2.943
Best submodel: time series (5), e5 = 0.317:
\[ \dot{x}_1(t) = -3.81\, x_1(t) + 18.33\, u(t) \]
[Figure: gene expression x(t) versus time t [h], 0 to 4 h, measured time series and fitted submodel]
The NetGenerator Structure Optimization Algorithm
Outer loop: extension of the model by a second submodel, with submodel 1 (time series (5)) unchanged,
\[ \dot{x}_1(t) = -3.81\, x_1(t) + 18.33\, u(t) \]
\[ \dot{x}_2(t) = w_{2,1}\, x_1(t) + w_{2,2}\, x_2(t) + b_2\, u(t) \]
Inner loop: each remaining time series 1, 2, 3, 4, 6 is tested as the second submodel.
Error: e5 = 0.317, e1 = 14.027, e2 = 0.082, e3 = 7.675e-05, e4 = 14.469, e6 = 1.285
Best submodel: time series (3), e3 = 7.675e-05:
\[ \dot{x}_2(t) = -12.36\, x_1(t) - 6.10\, x_2(t) + 66.00\, u(t) \]
[Figure: gene expression x(t) versus time t [h], 0 to 4 h, measured time series and fitted submodels]
The NetGenerator Structure Optimization Algorithm
Outer loop: the third submodel is added in the same way,
\[ \dot{x}_3(t) = w_{3,1}\, x_1(t) + w_{3,2}\, x_2(t) + w_{3,3}\, x_3(t) + b_3\, u(t), \]
and the procedure is repeated up to submodel 6:
\[ \dot{x}_1(t) = -3.81\, x_1(t) + 18.33\, u(t) \]
\[ \dot{x}_2(t) = -12.36\, x_1(t) - 6.10\, x_2(t) + 66.00\, u(t) \]
\[ \dot{x}_i(t) = \sum_{j=1}^{6} w_{i,j}\, x_j(t) + b_i\, u(t), \quad i = 3,\dots,6 \]
Error: e5 = 0.317, e3 = 7.675e-05
[Figure: gene expression x(t) versus time t [h], 0 to 4 h; time series (5) and (3) already included]
The NetGenerator Structure Optimization Algorithm
Immune response of peripheral blood mononuclear cells to bacterial infection with heat-killed pathogenic Escherichia coli
Graph of the identified model with 14 relevant model parameters (42 possible model parameters)
Measured and simulated kinetics of the expression of 6 cluster-representative genes
[Figure: gene expression x(t) versus time t [h], 0 to 4 h]
Infection
Response of human blood cells (PBMCs) to infection by the pathogen Escherichia coli
[Workflow: cluster analysis; representative gene expression profiles; network model]
Guthke et al., Bioinformatics (2005)
Outline
Introduction
3 Approaches – 3 Examples:
1) Knowledge-driven modeling (and model fit to data): the classical approach (Human liver cell bioreactor modeling)
2) Data-driven modeling. Tool: NetGenerator (Infection)
3) Integrated data- and knowledge-driven modeling. Tool: TILAR (Response towards anti-rheumatic and anti-MS therapy)
TILAR = "Transcription Factor binding site Integrating Least Angle Regression"
Hecker M, ..., Guthke R (2009): Integrative modeling of transcriptional regulation in response to antirheumatic therapy. BMC Bioinformatics, 10:262
R tool TILAR publicly available: www.sysbio.hki-jena.de >> Software
1st step of TILAR: Selection of candidate genes
[Diagram: gene nodes]
2nd step of TILAR: Addition of transcription factors
[Diagram: gene and TF nodes; TFBS prediction (TF-gene interaction)]
3rd step of TILAR: Addition of gene-TF edges by LARS
[Diagram: gene and TF nodes; TFBS prediction (TF-gene interaction); gene-TF interaction (model parameter) with weights β1, β2, β3, β4, β5]
3rd step of TILAR: Addition of gene-TF edges by LARS
A sparse GRN can be found using LARS = Least Angle Regression (Efron et al., 2004), a fast solution of LASSO = Least Absolute Shrinkage and Selection Operator
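To show how a LASSO penalty produces a sparse set of edges, here is a minimal sketch that solves the LASSO problem by cyclic coordinate descent. Coordinate descent is used only because it is short to write down. It is not LARS and it is not the TILAR implementation. The objective assumed here is (1/2n)·||y - Xβ||² + λ·||β||₁. The function names and the tiny data set are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for the LASSO.

    X: list of rows (n samples, p predictors), y: list of n responses.
    Assumes every column of X has at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical data: the response depends strongly on predictor 0
# and only weakly on predictor 1.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]

if __name__ == "__main__":
    print(lasso_cd(X, y, lam=0.1))
```

The weak predictor is shrunk exactly to zero while the strong one is kept with a reduced weight, which is the property that makes the inferred network sparse.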
The TILAR Approach for integrative [and adaptive] GRN modeling
[Diagram: data; template (text mining); template (TFBS pred.); model]
3rd Example: Response to Anti-Rheumatic Drug Administration
• Data: 19 patients suffering from rheumatoid arthritis
• Anti-TNF-alpha therapy (Etanercept, Enbrel®)
• Samples: Peripheral Blood Mononuclear Cells (PBMC) before and 3 d after drug injection
• Affymetrix Chip U133A
Gene Regulatory Network
inferred from anti-TNF-alpha therapy Response of RA patients
TFBS were derived from the UCSC database build hg18 and Biobase's Transfac
Gene Regulatory Network - Subnetworks of Interest
inferred from anti-TNF-alpha therapy Response of RA patients
Integrative Gene Regulatory Network Inference
using TILAR (2nd Example)
inferred from IFN-beta-1a (Avonex®) therapy response of 24 multiple sclerosis (MS) patients
Gene Regulatory Network - Subnetworks of Interest
inferred from IFN-beta therapy response of MS patients
Gene Regulatory Network Inference Assessment
ROC characteristics comparing the inferred edges with gene-gene links from the literature automatically extracted by PathwayArchitect
[Figure panels: RA network, MS network]
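A ROC assessment of inferred edges against literature links can be sketched as follows. Each candidate edge carries a score (for example an absolute model weight) and a flag saying whether the edge appears in the reference set. The scores and flags below are invented, the function names are hypothetical, and tied scores are not treated specially.

```python
def roc_curve(scores, is_true):
    """Return ROC points (false positive rate, true positive rate).

    Edges are ranked by decreasing score; each edge adds one point.
    Assumes at least one true and one false edge.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives = sum(1 for flag in is_true if flag)
    negatives = len(is_true) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if is_true[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical edge scores and literature support flags.
scores = [0.9, 0.8, 0.7, 0.6]
in_literature = [True, False, True, False]

if __name__ == "__main__":
    pts = roc_curve(scores, in_literature)
    print(pts, auc(pts))
```

An area of 0.5 corresponds to random ranking, so values clearly above 0.5 indicate that high-scoring edges are enriched for literature-supported links.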
Summary
1) Knowledge-driven modeling: Model structure is identified using knowledge. Model parameters are identified by fit to the experimental data.
2) Data-driven modeling: Model structure as well as model parameters are identified by model fit to the experimental data (after feature selection by use of knowledge).
3) Integrated data- and knowledge-driven modeling: Different approaches, current direction of research, ...
The number of papers on "Pathway Inference" or "Reverse Engineering" has been doubling every 2 years since 1995
Stolovitzky G, et al. (2009): Lessons from the DREAM2 Challenges. Ann N Y Acad Sci. 1158:159-95.
Integration of Data and Prior Knowledge
Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R (2009): Gene Regulatory Network Inference - Data Integration in Dynamic Models BioSystems, 96:86-103