R. Guthke: Data and knowledge integration in systems biological dynamic models

Leibniz-Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany

Cyclic Operation of Experimental and Modeling Work

[Figure: cycle connecting Experiment, Data Pre-processing, Feature Selection, Model Optimization, Model Validation and Hypotheses; Literature & Databases are used to extract hypotheses]

Two Complementary Approaches of Modeling

Top-down, data-driven: analysing whole systems (entire system descriptions)
Bottom-up, knowledge-driven: analysing and merging sub-systems
Extract hypotheses about relationships between components

Knowledge-driven Modeling

Knowledge from e.g. Ingenuity Pathway Analysis describing complex interactions between genes, metabolites or proteins in a living cell or tissue or organism

[Figure: Pathway analysis of representative genes of immune response to infection, Calvano et al., Nature 437 (2005)]

Data-driven Modelling

[Figure: Transcriptome, Proteome and Metabolome data are uploaded as a data matrix to a Data Warehouse and Analysis Tools]

Complexity of Gene Regulatory Network (GRN) Modeling

Different types of molecular interaction in gene regulation: Protein-DNA, Protein-Protein, Protein-Ligand

[Figure: Metabolite 1, Metabolite 2, Protein 2, Complex 3-4]

Complexity Reduction by Projection on the Transcriptome Level (Influence Models)

[Figure: network of Protein 1 to Protein 4 and Complex 3-4 projected onto Gene 1 to Gene 4]

Prominent examples of large-scale GRN inference for microorganisms

Organism                    Reference               Number of genes   Number of samples
Halobacterium sp. NRC-1     Bonneau et al. (2006)   1934              268
Escherichia coli            Faith et al. (2007)     4345              445
Saccharomyces cerevisiae    Lee et al. (2002)       6312              300

Experimental Data and Prior Knowledge

http://gardnerlab.bu.edu. Faith & Gardner: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles, PLoS Biol, 5:54-66, 2007.
CLR = Context Likelihood of Relatedness

Outline

Introduction

3 Approaches, 3 Examples:

1) Knowledge-driven modeling
   The classical approach
   (Human liver cell bioreactor modeling)

2) Data-driven modeling
   Tool: NetGenerator
   (Infection)

3) Integrated data- and knowledge-driven modeling
   Tool: TILAR
   (Response towards anti-rheumatic and anti-MS therapy)

Outline

Introduction

3 Examples:

1) Knowledge-driven modeling (and model fit to data)
   The classical approach
   (Human liver cell bioreactor modeling)

2) Data-driven modeling
   (Infection)

3) Integrated data- and knowledge-driven modeling
   (Response towards an anti-rheumatic drug)

Bioreactor for liver support therapy

an option for assisting or replacing the failing organ until regeneration occurs or transplantation can be performed

The Bioprocess Kinetics

• inoculation by suspensions of primary liver cells obtained from human livers (discarded from transplantation)
• recovery phase (3 days)
• stand-by phase (up to 34 days)
• use for therapy

[Figure: bioprocess kinetics over days 1 to 35]

Aim

Quantitative understanding of the input/output behaviour over the time period of the first 6 days

Bioreactor Inflow → Bioreactor → Bioreactor Outflow

[Figure: time courses over t = 0 to 6 d of MET, ASP, GLU, NH3, LEU, ASN, GLN and UREA, and of the feed rates FA [ml/h] and FB [ml/h]]

Bioprocess Biosyst Eng 28 (2006), 331

We selected 24 variables (from 99)
(N- and C-sources)

18 Amino acids (AA) & 6 others

Amino acids:
 1 LEU    2 HIS    3 ARG    4 VAL    5 TRP    6 PHE
 7 ILE    8 ALA    9 TYR   10 LYS   11 MET   12 SER
13 GLY   14 THR   15 ASP   16 ASN   17 GLU   18 GLN

Others:
19 NH3 (Ammonia)    20 UREA (Urea)      21 GAL (Galactose)
22 SOR (Sorbitol)   23 GLC (Glucose)    24 LAC (Lactate)

Types of Network Models (Examples)

Non-directed: e.g. Correlation Network
Directed, probabilistic: e.g. Bayesian Network
Directed, deterministic: ODE (Ordinary Differential Equation)

[Figure: Metabolic Network of Amino Acids, Ammonia and Urea in a Liver Cell Bioreactor, Schmidt-Heck et al., LN Bioinformatics, 2004]

The Model

48 ODEs (Ordinary Differential Equations)
48 variables for 24 compounds & 2 compartments

[Equation slide: the full system of 48 ODEs. Recoverable structure: 24 equations for the perfusion compartment concentrations c_{0,i} (i = 1,...,24), 13 equations of a common form for the liver cell compartment concentrations c_i (i = 1,...,13), and 11 further equations for the remaining c_i. A proteolysis term p_p(t) is non-zero only for t < 1 d. The individual terms of the 11 further equations are not legible in this transcript.]

ODEs 1-24 describing the Inflow & Outflow and Diffusion

[Figure: perfusion circuit (250 ml/min) with inflow of fresh medium at F_A = 150 ml/h (concentrations c_{A,i}, all compounds) and F_B = 0 to 50 ml/h (concentrations c_{B,i}, only aspartate), outflow F_0 to waste, and measured data c_{0,i}]

$$
\frac{dc_{0,i}}{dt} = \frac{F_A(t)}{V_1}\,c_{A,i} + \frac{F_B(t)}{V_1}\,c_{B,i} - \frac{F_0(t)}{V_1}\,c_{0,i} - \frac{p_0}{V_1}\,(c_{0,i} - c_i), \quad i = 1,\dots,24
$$

Diffusion between the two Compartments

1) Liver cell compartment: V_2 = 600 ml, concentrations c_i
2) Perfusion compartment: V_1 = 900 ml, concentrations c_{0,i} (measured data)

Flux via the membrane: p_0

The first 24 ODEs:

$$
\frac{dc_{0,i}}{dt} = \frac{F_A(t)}{V_1}\,c_{A,i} + \frac{F_B(t)}{V_1}\,c_{B,i} - \frac{F_0(t)}{V_1}\,c_{0,i} - \frac{p_0}{V_1}\,(c_{0,i} - c_i), \quad i = 1,\dots,24
$$

The next 13 ODEs:

$$
\frac{dc_i}{dt} = \frac{p_0}{V_2}\,(c_{0,i} - c_i) - p_i\,c_i + p_{AA,i}\,p_p(t), \quad i = 1,\dots,13
$$

The remaining 11 ODEs (textbook knowledge): [equations not legible in this transcript]
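The coupling of the two compartments can be sketched numerically. The following Python code integrates the perfusion compartment equation and the liver cell compartment equation for a single compound with a fixed-step Euler scheme. V_1 = 900 ml, V_2 = 600 ml and F_A = 150 ml/h are taken from the slides; the values of p_0 and p_i, the feed concentration, F_B = 0, the balance F_0 = F_A + F_B and the omission of the proteolysis term are assumptions made only for illustration.

```python
# Sketch: perfusion compartment (c0) and liver cell compartment (c) for one compound.
V1 = 900.0       # ml, perfusion compartment (from the slides)
V2 = 600.0       # ml, liver cell compartment (from the slides)
F_A = 150.0      # ml/h, fresh medium inflow A (from the slides)
F_B = 0.0        # ml/h, inflow B (assumed to be switched off here)
F_0 = F_A + F_B  # ml/h, outflow, assumed to balance the inflow


def derivatives(c0, c, c_A, c_B, p0, p_i):
    """Right-hand sides of the two coupled ODEs (concentration per hour)."""
    dc0 = (F_A / V1) * c_A + (F_B / V1) * c_B - (F_0 / V1) * c0 - (p0 / V1) * (c0 - c)
    dc = (p0 / V2) * (c0 - c) - p_i * c  # proteolysis term omitted (assumption)
    return dc0, dc


def simulate(hours, dt=0.01, c_A=100.0, c_B=0.0, p0=300.0, p_i=0.2):
    """Explicit Euler integration starting from empty compartments.

    c_A, c_B, p0 and p_i are hypothetical values, not fitted parameters.
    """
    c0, c = 0.0, 0.0
    for _ in range(int(round(hours / dt))):
        dc0, dc = derivatives(c0, c, c_A, c_B, p0, p_i)
        c0 += dt * dc0
        c += dt * dc
    return c0, c


c0_end, c_end = simulate(hours=144.0)  # the first 6 days
print(f"c0 = {c0_end:.2f}, c = {c_end:.2f}")
```

With these assumed values the consumption term p_i c_i keeps the liver cell compartment concentration below the perfusion compartment concentration, which is the kind of input/output difference the model fit exploits.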

Model Fit to the Measured Data

Example: Run 2

Model Fit to data of 7 liver cell culture runs

Runs × 24 variables → model parameter sets
Prediction: proteolytic inflow, ...

Hypothesis: Proteolysis

The temporal maximum in the time series of LYS and other AA (VAL, LEU, ...) at t=1 d was interpreted as the result of proteolysis during the first hour of cultivation (recovery phase)

Model Fit of LYS to 7 runs:

$$
\frac{dc_9}{dt} = \frac{p_0}{V_2}\,(c_{0,9} - c_9) - p_9\,c_9 + p_p(t), \qquad
p_p(t) = \begin{cases} p_{33} & \text{for } t < 1\,\text{d} \\ 0 & \text{else} \end{cases}
$$

The Model parameters p_9 and p_33
(after model fit to randomly disturbed LYS data)

How to estimate the influx from proteolysis to the different amino acids?

LYS at t=1 h was found to be strongly positively correlated with the other AA_i at t=1 h:

AA_i(1 h) = p_AAi * LYS(1 h) + b_i

Regression over 7 runs, e.g. for ASN: p_AA16 = 0.25, b_16 = -30, with r = 0.97

Mean correlation coefficient: r = 0.90 (±0.05), averaged over 14 AA
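The regression AA_i(1 h) = p_AAi * LYS(1 h) + b_i is an ordinary least-squares line fit over the runs. A minimal Python sketch follows; the seven LYS and ASN values are hypothetical numbers invented for the example and are not the measured data, and `fit_line` is an illustrative helper name.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical LYS and ASN concentrations at the early time point for 7 runs.
lys = [200.0, 260.0, 310.0, 380.0, 420.0, 500.0, 560.0]
asn = [22.0, 33.0, 49.0, 63.0, 77.0, 93.0, 112.0]

p_aa, b, r = fit_line(lys, asn)
print(f"p_AA = {p_aa:.3f}, b = {b:.1f}, r = {r:.3f}")
```

The slope plays the role of p_AAi: it scales the proteolytic influx estimated for LYS to the influx assumed for the other amino acid.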

Outline

Introduction

3 Examples:

1) Knowledge-driven modeling (and model fit to data)
   (Human liver cell bioreactor modeling)

2) Data-driven modeling
   Tool: NetGenerator
   (Infection)

3) Integrated data- and knowledge-driven modeling
   (Response towards an anti-rheumatic drug)

Data: Immune response to bacterial infection

Boldrick, JC et al.: “Stereotyped and specific gene expression programs in human innate immune response to bacteria.” PNAS 99 (2002), 972-977.
http://genome-www.stanford.edu/hostresponse/

• Peripheral blood mononuclear cells (PBMCs)
• infected by heat-killed pathogenic Escherichia coli
• gene expression of 18432 cDNAs of PBMCs measured
• at 5 time points t (t = 0, ½, 1, 2, 4 h) after infection

Model Optimization

Experimental Data (pre-processed, selected)
Model Structure Search
Model Parameter Fit

Guthke et al., Bioinformatics, 21 (2005): 1626-1634
Töpfer et al., Lecture Notes in Bioinformatics, 4366 (2007), 119-130

Preprocessing of Data

Selection of differentially expressed genes
1336 genes up- or down-regulated by at least the factor 8 (ratio < 2^-3 or > 2^3) at one or more time points t1,..., t4 versus t0 (all 73,728 ratios are between 2^-10.4 and 2^8.7).

Scaling
All profiles start at t=0 with zero and pass the value +1 or -1 at one of the following time points t1,..., t4.

Imputing of missing data
k nearest neighbour algorithm by Troyanskaya, O et al.: Bioinformatics 17 (2001), 520
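The selection and scaling steps can be sketched in a few lines of Python. The expression values below are hypothetical. The scaling rule (dividing each log2-ratio profile by its largest magnitude, so that it reaches +1 or -1) is an assumption about how the scaling on the slide was done, and the imputation step is not shown.

```python
from math import log2


def log_ratios(profile):
    """log2 ratios of every time point versus t0; the value at t0 is zero."""
    t0 = profile[0]
    return [log2(v / t0) for v in profile]


def is_differentially_expressed(ratios, threshold=3.0):
    """True if any log2 ratio after t0 has magnitude >= threshold (3.0 = factor 8)."""
    return any(abs(r) >= threshold for r in ratios[1:])


def scale(ratios):
    """Divide by the largest magnitude so the profile reaches +1 or -1 (assumed rule)."""
    peak = max(abs(r) for r in ratios)
    return [r / peak for r in ratios]


# Hypothetical expression intensities at t = 0, 0.5, 1, 2, 4 h.
genes = {
    "geneA": [100.0, 150.0, 400.0, 1600.0, 800.0],  # induced up to 16-fold
    "geneB": [200.0, 180.0, 210.0, 190.0, 205.0],   # essentially unchanged
    "geneC": [640.0, 320.0, 80.0, 40.0, 60.0],      # repressed up to 16-fold
}

selected = {}
for name, profile in genes.items():
    ratios = log_ratios(profile)
    if is_differentially_expressed(ratios):
        selected[name] = scale(ratios)

for name, scaled in selected.items():
    print(name, [round(v, 2) for v in scaled])
```

Only geneA and geneC pass the factor-8 filter in this toy input; geneB is dropped because none of its ratios leaves the band between 2^-3 and 2^3.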

Feature Selection by Fuzzy c-means Clustering

Cluster: 1 2 3 4 5 6
# Genes (membership >50%): [the six counts run together in this transcript as 4949767188269137]

Optimization of the Cluster Number: 2…20 clusters → 6 classes

• Cluster validity index (Kim et al. 2001)
• 4-level procedure (Moeller et al. 2002)
• repeating the complete analysis multiple times (40x) to verify that the results do not depend on initial partitions
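For readers unfamiliar with fuzzy c-means, the following Python sketch shows the two alternating update steps (memberships, then cluster centers) on a handful of hypothetical scaled profiles. It is a plain textbook variant with fuzzifier m = 2 and a deterministic start from evenly spaced data points; it does not include the cluster validity index or the repeated runs from different initial partitions mentioned on the slide.

```python
def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=100):
    """Plain fuzzy c-means; returns (centers, memberships[point][cluster])."""
    step = len(data) // n_clusters
    # Deterministic start: evenly spaced data points serve as initial centers.
    centers = [list(data[i * step]) for i in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        # Membership update: proportional to distance^(-2/(m-1)), normalised per point.
        memberships = []
        for point in data:
            dists = [sum((a - b) ** 2 for a, b in zip(point, c)) ** 0.5 for c in centers]
            if min(dists) == 0.0:
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                row = [v / sum(inv) for v in inv]
            memberships.append(row)
        # Center update: mean of all points weighted by membership^m.
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[d] for w, p in zip(weights, data)) / total
                for d in range(len(data[0]))
            ]
    return centers, memberships


# Hypothetical scaled profiles at t = 0, 0.5, 1, 2, 4 h: three induced, three repressed.
profiles = [
    [0.0, 0.2, 0.6, 1.0, 0.8],
    [0.0, 0.3, 0.7, 1.0, 0.7],
    [0.0, 0.1, 0.5, 1.0, 0.9],
    [0.0, -0.4, -1.0, -0.6, -0.2],
    [0.0, -0.5, -1.0, -0.5, -0.1],
    [0.0, -0.3, -1.0, -0.7, -0.3],
]

centers, u = fuzzy_c_means(profiles, n_clusters=2)
# Genes counted per cluster are those with membership above 50 %, as on the slide.
clusters = [[k for k, row in enumerate(u) if row[i] > 0.5] for i in range(2)]
print(clusters)
```

Because every gene receives a graded membership in every cluster, the 50 % cut-off leaves out genes that sit between clusters, which is why the per-cluster counts need not add up to the number of selected genes.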

Feature Selection: Cluster representative genes

Representatives:
(1) IL-1: interleukin 1, alpha
(2) CD59: antigen
(3) NFKBIE: nuclear factor of kappa light polypeptide gene enhancer
(4) STAT1: signal transducer and activator of transcription 1, 91kD
(5) STAT5: signal transducer and activator of transcription 5A
(6) MHC-II: major histocompatibility complex, class II, DM alpha

[Figure: expression profiles of IL-1, CD59, NFKBIE, STAT1, STAT5 and MHC-II]

Model Structure Optimization Methods

NetGenerator

• a heuristic structure optimization method for differential equation systems
• optimizes the model structure and the resulting model parameters
• finds models with
  - a minimum number of relevant parameters
  - an adequate fit to the measured data

Guthke et al., Bioinformatics, 21 (2005): 1626-1634
Töpfer et al., Lecture Notes in Bioinformatics, 4366 (2007), 119-130

NetGenerator Inference Method

Model:

$$
\frac{dx_i}{dt} = \sum_{j=1}^{n} a_{i,j}\,x_j + b_i, \quad i = 1,\dots,n
$$

Algorithm: heuristic optimization algorithm minimizing both
- the model fit error mse and
- the number N of non-zero model parameters

$$
\chi^2 = \frac{mse}{n - N}
$$

$$
GCV = \frac{mse}{(1 - N/n)^2} \quad \text{(Generalized Cross-Validation Index)}
$$

mse  mean square error
n    number of observations (sample size)
N    number of parameters to be estimated
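Both criteria penalize the fit error by the number of parameters. Read from the slide as chi^2 = mse / (n - N) and GCV = mse / (1 - N/n)^2, they can be evaluated as in the Python sketch below. The two candidate models and their error values are hypothetical; n = 30 assumes 6 genes observed at 5 time points.

```python
def chi_square(mse, n, N):
    """Fit error divided by the remaining degrees of freedom n - N."""
    return mse / (n - N)


def gcv(mse, n, N):
    """Generalized cross-validation index: mse / (1 - N/n)^2."""
    return mse / (1.0 - N / n) ** 2


n_obs = 30  # assumed: 6 time series with 5 time points each

# Hypothetical candidates: a sparse model and a denser model with a slightly better fit.
sparse = {"mse": 0.05, "N": 14}
dense = {"mse": 0.04, "N": 28}

for name, model in (("sparse", sparse), ("dense", dense)):
    print(
        name,
        round(chi_square(model["mse"], n_obs, model["N"]), 5),
        round(gcv(model["mse"], n_obs, model["N"]), 3),
    )
```

In this toy comparison the denser model has the smaller mse but the larger chi^2 and GCV, so both criteria favour the sparse model.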

The NetGenerator Structure Optimization Algorithm

• Separate identification of the gene expression time series by submodels
• Combination of an inner and an outer optimization loop
• Outer loop: extension of the overall model by one newly optimized submodel
• Inner loop: test of all potential time series and selection of the best one
• Growing and pruning methods for the optimization of the submodel structure (add/remove interactions between genes, influences from the external input to genes, additional time delay elements)
• Nonlinear optimization of model parameters

$$
\begin{aligned}
\dot{x}_1(t) &= w_{1,1}\,x_1(t) + w_{1,2}\,x_2(t) + \dots + w_{1,q}\,x_q(t) + b_1\,u(t) && \text{(submodel 1)} \\
\dot{x}_2(t) &= w_{2,1}\,x_1(t) + w_{2,2}\,x_2(t) + \dots + w_{2,q}\,x_q(t) + b_2\,u(t) && \text{(submodel 2)} \\
&\;\;\vdots \\
\dot{x}_q(t) &= w_{q,1}\,x_1(t) + w_{q,2}\,x_2(t) + \dots + w_{q,q}\,x_q(t) + b_q\,u(t) && \text{(submodel q)}
\end{aligned}
$$
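The inner-loop idea of testing candidate structures and keeping the one with the smallest error can be sketched as follows. The Python code simulates the linear model dx/dt = W x + b u(t) with an Euler scheme and compares two candidate structures for a second submodel. The coefficient values are the ones that appear in the animation slides; the unit step input u(t) = 1, the zero initial state, the use of a simulated reference as stand-in data and the function names are assumptions for illustration. Parameter fitting, growing and pruning are not shown.

```python
def simulate(W, b, times, u=lambda t: 1.0, dt=0.001):
    """Euler integration of dx/dt = W x + b u(t) from x(0) = 0, sampled at `times`."""
    q = len(b)
    x = [0.0] * q
    t = 0.0
    t_prev = 0.0
    samples = []
    for t_sample in times:
        for _ in range(int(round((t_sample - t_prev) / dt))):
            dx = [sum(W[i][j] * x[j] for j in range(q)) + b[i] * u(t) for i in range(q)]
            x = [x[i] + dt * dx[i] for i in range(q)]
            t += dt
        t_prev = t_sample
        samples.append(list(x))
    return samples


def mean_square_error(simulated, measured):
    """Mean of squared differences over all time points and variables."""
    diffs = [
        (s - m) ** 2
        for row_s, row_m in zip(simulated, measured)
        for s, m in zip(row_s, row_m)
    ]
    return sum(diffs) / len(diffs)


times = [0.0, 0.5, 1.0, 2.0, 4.0]  # sampling times in hours
b = [18.33, 66.00]

# Stand-in "measured" data: generated from the structure with an x1 -> x2 influence.
W_reference = [[-3.81, 0.0], [-12.36, -6.10]]
measured = simulate(W_reference, b, times)

# Inner loop (simplified): evaluate each candidate structure and keep the best one.
candidates = {
    "x1 influences x2": [[-3.81, 0.0], [-12.36, -6.10]],
    "no influence of x1 on x2": [[-3.81, 0.0], [0.0, -6.10]],
}
errors = {
    name: mean_square_error(simulate(W, b, times), measured)
    for name, W in candidates.items()
}
best = min(errors, key=errors.get)
print(best, errors)
```

In the full algorithm each candidate would additionally have its parameters re-optimized before the errors are compared, and the outer loop would then append the winning submodel to the overall model.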

[Animation: step-by-step run of the structure optimization for six time series (gene expression x(t) versus time t [h], 0 to 4 h). At the start all model parameters are zero.]

Outer loop, step 1
Inner loop: a submodel is fitted to each of the six time series in turn.
Error: e1 = 14.027, e2 = 9.579, e3 = 4.536, e4 = 21.705, e5 = 0.317, e6 = 2.943
Best submodel: time series (5)

$$
\dot{x}_1(t) = -3.81\,x_1(t) + 18.33\,u(t)
$$

Outer loop, step 2
Inner loop: a second submodel is fitted to each of the five remaining time series.
Error: e1 = 14.027, e2 = 0.082, e3 = 7.675e-05, e4 = 14.469, e6 = 1.285
Best submodel: time series (3)

$$
\dot{x}_2(t) = -12.36\,x_1(t) - 6.10\,x_2(t) + 66.00\,u(t)
$$

Outer loop, steps 3 to 6
The same procedure adds submodels 3 to 6 with the parameters w_{3,1}, ..., w_{6,6} and b_3, ..., b_6.

[Figure: measured and simulated kinetics of the expression of 6 cluster-representative genes (expression x(t) versus time t [h]) for the immune response of peripheral blood mononuclear cells to bacterial infection with heat-killed pathogenic Escherichia coli]

[Figure: graph of the identified model with 14 relevant model parameters (42 possible model parameters)]

Infection

Response of the human blood cells (PBMCs) to infection by pathogen Escherichia coli

Cluster analysis → representative gene expression profiles → network model

Guthke et al., Bioinformatics (2005)

Outline

Introduction

3 Approaches, 3 Examples:

1) Knowledge-driven modeling (and model fit to data)
   The classical approach
   (Human liver cell bioreactor modeling)

2) Data-driven modeling
   Tool: NetGenerator
   (Infection)

3) Integrated data- and knowledge-driven modeling
   Tool: TILAR
   (Response towards anti-rheumatic and anti-MS therapy)

TILAR = “Transcription Factor binding site Integrating Least Angle Regression”

Hecker M, ..., Guthke R (2009): Integrative modeling of transcriptional regulation in response to antirheumatic therapy. BMC Bioinformatics, 10:262

R tool TILAR publicly available: www.sysbio.hki-jena.de >> Software

1st step of TILAR: Selection of candidate genes

[Figure: gene nodes]

2nd step of TILAR: Addition of transcription factors

[Figure: gene and TF nodes; TF-gene interactions from TFBS prediction]

3rd step of TILAR: Addition of gene-TF edges by LARS

[Figure: gene and TF nodes; TF-gene interactions from TFBS prediction; gene-TF interactions as model parameters β1, ..., β5]

A sparse GRN can be found using LARS = Least Angle Regression (Efron et al., 2004) = fast solution of LASSO = Least Absolute Shrinkage and Selection Operator
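To illustrate why an L1 penalty yields a sparse network, the following Python sketch solves a small LASSO problem by cyclic coordinate descent. This is a stand-in technique chosen for brevity; it is not the LARS algorithm and not the TILAR implementation, and the design matrix, the response and the penalty value are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X beta||^2 + lam * ||beta||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction from all other coefficients, excluding predictor j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical data: 4 samples, 3 candidate regulators with orthogonal profiles.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# Response built as 2.0 * x0 + 0.05 * x1 - 1.5 * x2 (the x1 effect is negligible).
y = [0.55, 3.45, -0.45, -3.55]

beta = lasso_coordinate_descent(X, y, lam=0.1)
print([round(v, 3) for v in beta])
```

The weak second regulator receives a coefficient of exactly zero, so the corresponding edge is absent from the inferred network, while the two strong regulators keep shrunken non-zero coefficients.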

The TILAR Approach for integrative [and adaptive] GRN modeling

[Figure: data, template (text mining) and template (TFBS pred.) are combined into the model]

3rd Example: Response to Anti-Rheumatic Drug Administration

• Data: 19 patients suffering from Rheumatoid Arthritis
• Anti-TNF-alpha therapy (Etanercept, Enbrel®)
• Samples: Peripheral Blood Mononuclear Cells (PBMC) before and 3 d after drug injection
• Affymetrix Chip U133A

Gene Regulatory Network
inferred from anti-TNF-alpha therapy response of RA patients

TFBS were derived from the UCSC database build hg18 and Biobase's Transfac

Gene Regulatory Network - Subnetworks of Interest
inferred from anti-TNF-alpha therapy response of RA patients

Integrative Gene Regulatory Network Inference using TILAR (2nd Example)
inferred from IFN-beta-1a (Avonex®) therapy response of 24 Multiple sclerosis (MS) patients

Gene Regulatory Network - Subnetworks of Interest
inferred from IFN-beta therapy response of MS patients

Gene Regulatory Network Inference Assessment

ROC characteristics comparing the inferred edges with gene-gene links from the literature automatically extracted by PathwayArchitect

[Figure: ROC curves for the RA network and the MS network]
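An ROC assessment of this kind ranks the inferred edges by a score and checks each one against the reference links. The Python sketch below computes the ROC points and the area under the curve; the edge names, scores and reference set are hypothetical, and ties between scores are not treated specially.

```python
def roc_points(scored_edges, reference_edges):
    """ROC points (FPR, TPR) from sweeping a threshold down the ranked edge list."""
    ranked = sorted(scored_edges, key=lambda item: item[1], reverse=True)
    n_pos = sum(1 for edge, _ in ranked if edge in reference_edges)
    n_neg = len(ranked) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for edge, _ in ranked:
        if edge in reference_edges:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical inferred edges with confidence scores.
scored = [
    (("TF1", "geneA"), 0.9),
    (("TF1", "geneB"), 0.8),
    (("TF2", "geneA"), 0.6),
    (("geneA", "geneC"), 0.4),
    (("TF2", "geneC"), 0.3),
    (("geneB", "geneC"), 0.1),
]
# Hypothetical reference links, standing in for links extracted from the literature.
reference = {("TF1", "geneA"), ("TF2", "geneA"), ("TF2", "geneC")}

points = roc_points(scored, reference)
auc = area_under_curve(points)
print(points)
print(round(auc, 3))
```

An area of 0.5 corresponds to a random ranking of edges, so values clearly above 0.5 indicate that the inferred edges agree with the reference links more often than expected by chance.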

Summary

1) Knowledge-driven modeling
   Model structure is identified using knowledge
   Model parameters are identified by fit to the experimental data

2) Data-driven modeling
   Model structure as well as model parameters are identified by model fit to the experimental data (after feature selection by use of knowledge)

3) Integrated data- and knowledge-driven modeling
   Different approaches, current direction of research,…

Number of papers on “Pathway Inference” or “Reverse Engineering” has been doubling every 2 years since 1995

Stolovitzky G, et al. (2009): Lessons from the DREAM2 Challenges. Ann N Y Acad Sci. 1158:159-95.

Integration of Data and Prior Knowledge

Hecker M, Lambeck S, Toepfer S, van Someren E, Guthke R (2009): Gene Regulatory Network Inference - Data Integration in Dynamic Models. BioSystems, 96:86-103

Thank you

Michael Hecker, Jörg Linde

Jena