
Reasoning about Uncertainty in Biological Systems

Andrei Doncescu

LAAS CNRS

Aix-en-Provence, 18 September

Structural Bioinformatics

• Cells buzz with activity. They take in nutrients and convert them to energy for a number of purposes, reproduce themselves, and are constantly called upon to synthesize protein molecules.

• Gene: a segment of DNA that is programmed for the production of a specific protein

• Gene expression: the cell produces the protein encoded by a particular gene

• Genome: the entire set of genetic instructions for a given organism

• Nucleotide: the fundamental unit of DNA and RNA

• Protein: a molecule consisting of up to thousands of amino acids

• Amino acid: a class of 20 different molecules (built from C, H, N, O, S) which can link together by forming bonds

Structure and Modeling of Metabolic Pathway

Genome

Proteomic

Transcriptomic

Metabolomics/Fluxomics

DNA

RNA

Proteins

Metabolites

Systemic approach: reconciliation of the 3 levels of observation (3M: macro, micro, molecular)

• Mixing power, macro- and micromixing, reactivity, coupled systems

• Expert systems, supervision

• Scale-up and scale-down; CFD

Tool: bioreactor

• Metrology

• Kinetics

• Stoichiometry, media

• Classification of populations

• Physico-mechanical and physico-chemical environment

• Hydrodynamics, transfers

Information flux

Biochips DNA, proteins,bioinformatic, network of genes, of proteins, of metabolites

Microorganism: a production facility

• Biological kinetics

• Implementation

• Metabolic flux; fluxome

• Metabolic network

• In vivo, ex vivo enzymology, stock flux, energy/matter

• Thermodynamics

• In vivo, ex vivo NMR

• Structured modelling and metabolic descriptor

Signal

Transcriptome, proteome, metabolome, biochips

MICROSCOPIC LEVEL

MACROSCOPIC LEVEL

MOLECULAR LEVEL

Scientific Reasoning

Hypothesis Generation

Prediction

Observation

Deduction

Verification

Abduction

Reasoning about biological systems

• Construction of a system model

– The task of forming a model to explain a given set of experimental results is called model identification.

– This is a form of inductive inference. For example, if the levels of the metabolites in glycolysis are observed over a series of time steps, and the reactions of glycolysis are inferred from these data, this is model identification.

• Simulation of the system behavior based on the model constructed

– This is a form of deductive inference. For example, a dynamic model of glycolysis might tell you how the level of pyruvate in a cell varies over time as the amount of glucose increases. If the deductive predictions of a model are inconsistent with the observed behaviour, then the model is falsified.

• A model is a simplified description of a complex entity or process and consists of:

– A set of system constraints expressed in terms of state variables

– And/or their time derivatives


Representation of Biological Systems

• Directed graphs (for example, decision trees, cluster analysis)

• Matrix models (for example, linear systems, Markov processes)

• Dynamical systems

• Cellular automata

The Problem

• Development of Molecular Biology produces a huge quantity of data

• Interaction between molecules has an effect on the cell behavior

• Mathematical Models are used to extract the emergent laws of the combinatory interactions.

• Difficulties:

– Interactions are non-linear

– Model parameters are difficult to measure

[Figure: M and G linked by activation and inhibition arrows.]

Our approach

Fuzzy logic

Hierarchical Classification

Inductive Logic Programming

Classification machine

Measures

Biological knowledge

Biological rules

Hypotheses or

« Classes »

3 levels of analysis

Relevant Information

Time Series

• Time series analysis is often associated with the discovery of patterns such as:

– Increasing trends

– Decreasing trends

– Frequency of sequences, repeating sequences

– Prediction of future values, specifically termed forecasting in the time series context

[Figure: fermentation parameters versus time (0 to 12 h), in four panels: temperature (°C, 36.8 to 37.4), stirring speed (RPM, 600 to 1800), pH (6.6 to 7.0) and pressure (mb, 510 to 520).]

Batch Fermentation

[Figure: batch culture of CENPK 133-7D ("CFM", glucose 15 g/l). Biomass (g/l, 0 to 6) and glucose, ethanol and glycerol (g/l, 0 to 15) versus time (0 to 25 h). The curves are annotated with three successive phases: fermentative metabolism, diauxie, oxidative metabolism.]

µmax = 0.45 h-1, YS/X = 0.37 g.(g glucose)-1

• We have 4 potential states for the bioreactor (e1, e2, e3, e4).

• We add a specific state e5 corresponding to a stationary state.

• The predicate to learn with our ILP machine is:

– to_state(Ei,Et,P1,P2,T)

We want to obtain a causal relationship between the transitions of the system and the values of the derivative, or of the wavelet coefficients, of the curve.

Formalization of our problem : CProgol4.4

• Solution: add a predicate

– derive(P1,P,T)

– It expresses the fact that, for the curve of the parameter P at time T, the value of the derivative is P1

Formalization of our problem

• We get a lot of rules, but the following one can be explained by biochemical experts:

to_state(E,E,A,B,C,T) :- derive(p1,A,T), derive(p2,B,T), derive(p3,C,T), positive(p1,T), positive(p2,T), positive(p3,T).
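The rule above can be read operationally. The sketch below is only an illustration in Python, not the CProgol 4.4 encoding used in the talk: the helper names `derive_facts` and `stays_in_state`, the use of forward finite differences for the derivative, and the sample values are all assumptions made for the example.

```python
from typing import Dict, List

def derive_facts(times: List[float], values: List[float]) -> Dict[float, float]:
    """Map each time T (except the last) to the forward-difference slope at T."""
    return {
        times[k]: (values[k + 1] - values[k]) / (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    }

def stays_in_state(slopes_by_param: Dict[str, Dict[float, float]], t: float) -> bool:
    """Body of the rule: every monitored parameter has a positive slope at time t."""
    return all(slopes[t] > 0 for slopes in slopes_by_param.values())

# Hypothetical measurements of three parameters p1, p2, p3.
times = [0.0, 1.0, 2.0, 3.0]
slopes = {
    "p1": derive_facts(times, [1.0, 1.5, 2.0, 1.8]),
    "p2": derive_facts(times, [0.2, 0.4, 0.9, 1.1]),
    "p3": derive_facts(times, [5.0, 5.1, 5.3, 5.2]),
}
```

With these made-up values the rule body holds at T = 0 and T = 1 and fails at T = 2, where p1 starts to decrease.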

Results

[Figure: class membership (appartenance, 0 to 0.6) over the fermentation (time axis 5 to 29), shown together with the signals X, pH and CO2; the detected phases are fermentative, diauxie, oxidative and end of batch.]

• Instead of simply giving classification results, we get logical rules establishing a causal relationship between different parameters of the bio-machinery.

This rule indicates that the metabolic state does not change (the bioreactor remains in the same state) when the parameters have an increasing slope and no maximum or minimum is encountered.

Data Processing : Regularity Analysis

[Figure: pH (5.4 to 6.8) and volume of base added (0 to 1800 ml) versus time (0 to 100 h), annotated with "amino acid consumption" and "acid".]

How can a singularity be characterized?

Which tool for on-line analysis?

• Multifractal analysis studies functions whose pointwise regularity varies from one point to another

– Differentiability, continuity

– Hölder exponent

Lipschitz Regularity

A signal is considered to have regularity if it is possible to approximate it by a polynomial.

Analysis of singularities

• The Taylor expansion of f at $x_0$:

$$f(x) = f(x_0) + f'(x_0)\,(x - x_0) + \dots + C\,|x - x_0|^{\alpha(x_0)}$$

Using wavelet analysis, the dominant behavior is given by the term:

$$C\,|x - x_0|^{\alpha(x_0)}$$

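Characterization of the Lipschitz exponent

• Definition

• A function f is Lipschitz of order $\alpha \ge 0$ at a point $\nu$ if there exist $K > 0$ and a polynomial $p_\nu$ of degree $m = \lfloor \alpha \rfloor$ such that:

$$\forall t \in \mathbb{R}, \quad |f(t) - p_\nu(t)| \le K\,|t - \nu|^{\alpha}$$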

Fourier Condition

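• Theorem: a function f is bounded and uniformly Lipschitz $\alpha$ on $\mathbb{R}$ if:

$$\int_{-\infty}^{+\infty} |\hat{f}(\omega)|\,(1 + |\omega|^{\alpha})\,d\omega < +\infty$$

• Global regularity condition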

Holder Regularity

• Hölder exponents measure the remainder of a Taylor expansion.

• Characterize the local scaling properties.

• Measure the local regularity/differentiability.

• Is linked to the decay rate of the Fourier and wavelet coefficients.

Holder Regularity

• Measures the local differentiability:

• 1≤ α , f(t) is continuous and differentiable.

• 0 < α < 1, f(t) is continuous but not differentiable.

• -1 < α ≤ 0, f(t) is discontinuous and non-differentiable.

• α ≤ -1, f(t) is no longer locally integrable.

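• Theorem

• If f is Lipschitz $\alpha$ at $x_0$, with $\alpha \le n$, then there exists a constant A such that, for all x in a neighborhood of $x_0$ and any scale s:

$$|Wf(s,x)| \le A\,(s^{\alpha} + |x - x_0|^{\alpha})$$

Conversely, f(x) is Lipschitz $\alpha$ at $x_0$, with $0 < \alpha < n$, if there exist constants A, B and $\varepsilon > 0$ such that:

$$|Wf(s,x)| \le A\,s^{\varepsilon} \quad\text{and}\quad |Wf(s,x)| \le B\left(s^{\alpha} + \frac{|x - x_0|^{\alpha}}{\bigl|\log|x - x_0|\bigr|}\right)$$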

Characterization of Lipschitz exponent by CWT

• Efficiency for non-stationary signals

• Good localization in time and frequency

• The wavelet transform is defined as an integral operator which transforms a finite-energy signal $f(x) \in L^2(\mathbb{R})$ using a set of functions $\psi_{a,b}$:

$$WT(f, \psi_{a,b}) = \langle f \mid \psi_{a,b} \rangle$$

where $\langle \cdot \mid \cdot \rangle$ is the inner product.

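Morlet Wavelets

Elementary function:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t - b}{a}\right)$$

The wavelet coefficients are numbers:

$$C_f(a,b) = \langle f, \psi_{a,b} \rangle = \int f(t)\,\psi_{a,b}(t)\,dt$$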
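To make the coefficient formula concrete, here is a minimal numerical sketch. It assumes a real-valued Morlet-like wavelet, cos(w0 t) exp(-t²/2) with w0 = 5, and approximates the integral by a Riemann sum; the function names and the test signal are illustrative choices, not taken from the slides.

```python
import numpy as np

def morlet_real(t, w0=5.0):
    """Real-valued Morlet-like mother wavelet (an assumption for this sketch)."""
    return np.cos(w0 * t) * np.exp(-0.5 * t**2)

def wavelet_coefficient(f, t, a, b):
    """Approximate C_f(a, b) = integral of f(t) * psi_{a,b}(t) dt on a uniform grid."""
    dt = t[1] - t[0]
    psi_ab = morlet_real((t - b) / a) / np.sqrt(a)
    return float(np.sum(f * psi_ab) * dt)

t = np.linspace(-20.0, 20.0, 4001)
signal = np.cos(5.0 * t)                                   # oscillation matched to scale a = 1
c_match = wavelet_coefficient(signal, t, a=1.0, b=0.0)     # large coefficient
c_off = wavelet_coefficient(signal, t, a=4.0, b=0.0)       # mismatched scale, near zero
```

The coefficient is large when the dilated wavelet oscillates at the same rate as the signal and close to zero otherwise, which is what makes the scale axis readable as a frequency axis.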

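Combining time and frequency: Short-time Fourier Transform

• Time representation: < s(.) , δ(. - t) >

• Frequency representation: < s(.) , δ(. - f) >

• Time-frequency representation: < s(.) , g_{t,f}(.) > = Q(t,f) = < s(.) , T_t F_f g_0(.) >, where T_t is the time-shift operator and F_f the frequency-shift operator

• Time-frequency resolution bound:

$$\Delta t^2\,\Delta f^2 \ge \frac{1}{(4\pi)^2}$$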

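Combining time and frequency: Wavelet Transform

[Figure: tiling of the time-frequency plane (axes: time, frequency).]

$$\langle s(\cdot),\, T_t D_a \Psi_0 \rangle = O(t, f = f_0/a)$$

The mother wavelet $\Psi_0(u)$ is dilated by $D_a$ and translated by $T_t$ into $\Psi_0((u - t)/a)$.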

The maximum modulus of the wavelet transform (MMWT) is equivalent to the Canny edge detector.

Differentiating biological phenomena from biophysical phenomena (fed-batch).

1. Detection of singularities (Hölder exponent < 0)

2. Temporal segmentation

3. Computation of the correlation between the signal used to control the fermentation and the other signals

4. Comparison of the sign of the correlation before and after the singularities
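Steps 3 and 4 can be sketched as follows. This is a hedged illustration only: it assumes the singularity has already been located at a known sample index `split`, uses the Pearson correlation from NumPy, and runs on synthetic signals; `correlation_signs` is a hypothetical helper name.

```python
import numpy as np

def correlation_signs(control, other, split):
    """Sign of the Pearson correlation before and after the sample index `split`."""
    before = np.corrcoef(control[:split], other[:split])[0, 1]
    after = np.corrcoef(control[split:], other[split:])[0, 1]
    return int(np.sign(before)), int(np.sign(after))

# Synthetic example: the second signal follows the control signal, then opposes it.
t = np.arange(20, dtype=float)
control = t.copy()
other = np.where(t < 10, 2.0 * t, 40.0 - 2.0 * t)
signs = correlation_signs(control, other, split=10)
```

A change of sign across the singularity, as in this synthetic case, is the kind of pattern step 4 looks for.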

Fed-batch process for biomass production

Oxidation: spontaneous oscillations of the yeast

Fuzzy Logic : Clustering and Aggregation


Fuzzy

• Logic

– Semantically, using tables or Boolean algebra

– Syntactically, via proof methods

• Fuzzy logic, based on real numbers

– Dealing with vagueness, e.g. for formalising common natural language

LAMDA (Learning Algorithm for Multivariate Data Analysis)

[Diagram: for an object X described by x1, x2, …, xn, each descriptor xi yields a Marginal Adequacy Degree (DAM, Degré d'Adéquation Marginal) for class Cj. Logical aggregation operators L combine the DAMs into the Global Adequacy Degree (DAG, Degré d'Adéquation Global) for class Cj, written Cj(X).]

DAM = membership function

• Parametrized membership function:

$$\frac{du(x)}{dx} = k\,u(x)\,\bigl(1 - u(x)\bigr)$$

• Its solution is given by:

$$u(x) = \frac{1}{1 + \exp(-a x + b)}$$

• A similar membership function:

$$u(x) = \frac{1}{1 + d(x)}$$

Membership is defined as a function of the distance d(x) between a given object and a standard member.
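The two membership functions can be compared numerically. The sketch below assumes the logistic form u(x) = 1 / (1 + exp(-a x + b)) and the distance-based form u = 1 / (1 + d); the parameter values are arbitrary and the function names are hypothetical.

```python
import math

def logistic_membership(x: float, a: float = 1.0, b: float = 0.0) -> float:
    """Logistic membership; it satisfies du/dx = a * u * (1 - u)."""
    return 1.0 / (1.0 + math.exp(-a * x + b))

def distance_membership(d: float) -> float:
    """Membership decreasing with the distance d >= 0 to a standard member."""
    return 1.0 / (1.0 + d)

# Numerical check of the differential equation at one point (a = 2, b = 1).
x0, h = 0.3, 1e-6
u0 = logistic_membership(x0, a=2.0, b=1.0)
slope = (logistic_membership(x0 + h, 2.0, 1.0) - logistic_membership(x0 - h, 2.0, 1.0)) / (2 * h)
```

Both functions map into [0, 1]; the distance-based one equals 1 for the standard member itself and decays as the object moves away from it.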

LAMDA

Generalization of a binomial law from {0,1} to [0,1]:

$$DAM_{ij}(x_i) = \rho_{ij}^{\,a(x_i, c_{ij})}\,(1 - \rho_{ij})^{\,1 - a(x_i, c_{ij})}$$

$a(x_i, c_{ij}) = 1 -$ distance between $x_i$ and $c_{ij}$

$\rho_{ij}$ depends on the statistical properties of the class
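A small sketch of this marginal adequacy degree follows. It assumes descriptors normalized to [0, 1] and takes the distance to be the absolute difference |x - c|, which is only one possible choice; the function name `dam` and the numeric values are illustrative.

```python
def dam(x: float, center: float, rho: float) -> float:
    """Fuzzy binomial adequacy: rho**a * (1 - rho)**(1 - a), with a = 1 - |x - center|.

    Assumes x and center lie in [0, 1] so that a also lies in [0, 1].
    """
    a = 1.0 - abs(x - center)
    return rho**a * (1.0 - rho) ** (1.0 - a)

at_center = dam(0.5, 0.5, rho=0.8)   # a = 1, so the result is rho
far_away = dam(1.0, 0.0, rho=0.8)    # a = 0, so the result is 1 - rho
```

For a binary value of a the expression reduces to the ordinary binomial (Bernoulli) law, which is the sense in which it generalizes {0,1} to [0,1].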

Aggregation Operators

Cognitive independence

Definition

• An aggregation operator is simply a function which assigns a real number y to any n-tuple $(x_1, x_2, \dots, x_n)$ of real numbers: $y = \mathrm{Aggreg}(x_1, x_2, \dots, x_n)$

• We define an aggregation operator as a function

$$\mathrm{Aggreg} : \bigcup_{n \in \mathbb{N}} [0,1]^n \to [0,1]$$

such that:

• Identity when unary: Aggreg(x) = x

• Boundary conditions: Aggreg(0,…,0) = 0 and Aggreg(1,…,1) = 1

• Non-decreasing: Aggreg(x1,…,xn) ≤ Aggreg(y1,…,yn) if (x1,…,xn) ≤ (y1,…,yn)

T-norm

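• A t-norm is a function $* : [0,1]^2 \to [0,1]$ such that for all $x, y, z \in [0,1]$:

• Commutativity: $x * y = y * x$

• Associativity: $(x * y) * z = x * (y * z)$

• Monotonicity: $x \le y$ implies $x * z \le y * z$

• Identity: $x * 1 = x$

Examples:

• Łukasiewicz t-norm: $x * y = \max(0, x + y - 1)$

• Gödel (Gödel-Dummett) t-norm: $x * y = \min(x, y)$

• Product t-norm: $x * y = x \cdot y$

T-norms generalize intersection to fuzzy sets.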
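The three standard t-norms named on this slide are short enough to write out. The sketch below uses their usual textbook definitions; the function names are illustrative.

```python
def t_lukasiewicz(x: float, y: float) -> float:
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)

def t_godel(x: float, y: float) -> float:
    """Goedel (minimum) t-norm: min(x, y)."""
    return min(x, y)

def t_product(x: float, y: float) -> float:
    """Product t-norm: x * y."""
    return x * y

# Same pair of membership degrees through the three operators.
x, y = 0.6, 0.7
results = (t_lukasiewicz(x, y), t_product(x, y), t_godel(x, y))
```

On any pair of degrees the three results are ordered Łukasiewicz ≤ product ≤ minimum, and none of them exceeds the smaller argument.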

Mean Operator

• A mean operator is a function $m : [0,1]^2 \to [0,1]$ such that:

– $\min(x, y) \le m(x, y) \le \max(x, y)$

– $m(x, y) = m(y, x)$

– m is increasing with respect to both arguments

• Examples:

– Median

– Bisymmetrical means

Reinforcement

• One characteristic of many types of human information processing is what Yager and Rybalov call full reinforcement.

• A collection of high scores reinforce each other to give a resulting score more affirmative than any of the individual scores alone; conversely, a collection of low scores tend to reinforce each other to give a resulting score more "disfirmative" than any of the individual scores.

• Good modeling of the human behavior

• Refine the information related to the real world

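Completely Reinforced Operators: 3Π

• (Silvert 1979, Yager & Rybalov 1998)

Completely reinforced and symmetrical sum:

$$3\Pi(y_1, y_2, \dots, y_n) = \frac{\prod_{i=1}^{n} y_i}{\prod_{i=1}^{n} y_i + \prod_{i=1}^{n} (1 - y_i)} = \frac{1}{1 + \prod_{i=1}^{n} \frac{1 - y_i}{y_i}}$$

If $\forall i,\ y_i > 0.5$ then $3\Pi(y_1, \dots, y_n) \ge \max_i y_i$

If $\forall i,\ y_i < 0.5$ then $3\Pi(y_1, \dots, y_n) \le \min_i y_i$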
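The full-reinforcement behaviour is easy to observe numerically. The sketch below implements the triple Π operator in the form prod(y) / (prod(y) + prod(1 - y)); the function name and the sample scores are illustrative, and the degenerate case where both products are zero is deliberately left unhandled.

```python
from math import prod
from typing import Sequence

def triple_pi(ys: Sequence[float]) -> float:
    """Triple Pi aggregation: prod(y) / (prod(y) + prod(1 - y)).

    Undefined when both products vanish (e.g. one score is 0 and another is 1).
    """
    p = prod(ys)
    q = prod(1.0 - y for y in ys)
    return p / (p + q)

high = triple_pi([0.7, 0.8])     # above every individual score
low = triple_pi([0.3, 0.2])      # below every individual score
neutral = triple_pi([0.5, 0.5])  # 0.5 acts as the neutral element
```

Two moderately high scores yield a result above both, and two low scores yield a result below both, which is the upward and downward reinforcement described above.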

Remark

• T-norms are negatively reinforced, but they are not positively reinforced:

$$T(x_1, x_2, \dots, x_n) \le \min_i x_i$$

• T-conorms are positively reinforced, but they are not negatively reinforced

• A combination of T-norms and T-conorms is not completely reinforced

• Mean operators are, by definition, neither positively nor negatively reinforced

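Mean 3Π

• Approach: mean operator

$$M3\Pi(y_1, y_2, \dots, y_n) = \frac{1}{1 + \left(\prod_{i=1}^{n} \frac{1 - y_i}{y_i}\right)^{1/n}}$$

Generatrix function, positive and increasing: $G(x) = x^{1/n}$

$$M3\Pi(y_1, y_2, \dots, y_n) = \frac{G\!\left(\prod_{i=1}^{n} y_i\right)}{G\!\left(\prod_{i=1}^{n} y_i\right) + G\!\left(\prod_{i=1}^{n} (1 - y_i)\right)}$$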

A new mean: Mean 3Π

• Commutativity: M3Π(x,y) = M3Π(y,x)

• Monotonicity: M3Π(x,y) ≥ M3Π(z,t) if x ≥ z and y ≥ t

• Idempotence: M3Π(x,…,x) = x

• Self-identity: M3Π[B, <M3Π(B)>] = M3Π(B)

The first three conditions can be deduced easily from the properties of the product of n-th root functions.

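« Mean Reinforced »

If $\forall i,\ y_i > 0.5$, then:

$$M3\Pi(y_1, \dots, y_i, \dots, y_n) \ge \frac{1}{n} \sum_{i=1}^{n} y_i$$

If $\forall i,\ y_i < 0.5$, then:

$$M3\Pi(y_1, \dots, y_i, \dots, y_n) \le \frac{1}{n} \sum_{i=1}^{n} y_i$$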
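The mean-reinforcement property can be checked on a few values. This sketch assumes the Mean 3Π operator has the form 1 / (1 + (prod((1 - y) / y))^(1/n)) with all scores strictly between 0 and 1; the function name and the sample scores are illustrative.

```python
from math import prod
from typing import Sequence

def mean_triple_pi(ys: Sequence[float]) -> float:
    """Mean triple Pi: 1 / (1 + (prod((1 - y) / y)) ** (1 / n)), for 0 < y < 1."""
    n = len(ys)
    ratio = prod((1.0 - y) / y for y in ys)
    return 1.0 / (1.0 + ratio ** (1.0 / n))

high_scores = [0.6, 0.9]
low_scores = [0.4, 0.1]
m_high = mean_triple_pi(high_scores)   # at least the arithmetic mean 0.75
m_low = mean_triple_pi(low_scores)     # at most the arithmetic mean 0.25
m_same = mean_triple_pi([0.3, 0.3, 0.3])
```

Unlike the plain triple Π, the result stays between the smallest and largest score, so the reinforcement is measured against the arithmetic mean rather than against the maximum or minimum.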

Elimination of the uncertainties between physiological states

Introduction of the singularities (detected by wavelets) in the form of step functions

Use of the DAM and of the total-reinforcement property of the triple Π to strengthen the classification with the help of step functions

I.L.P.

Overlap between classes

Expert 1

Expert 2

State 1, State 2

State 1, State 2

Uncertainties: noise, acid pulse, oscillations.

Global Adequacy Degree (DAG)

• Completely reinforced aggregation operator: triple Π

• (Silvert 1979, Yager & Rybalov 1998)

$$DAG(X_1, \dots, X_n) = 3\Pi(X_1, \dots, X_n) = \frac{1}{1 + \prod_{i=1}^{n} \frac{1 - X_i}{X_i}}$$

Total reinforcement:

If $\forall i,\ y_i > 0.5$ then $3\Pi(y_1, \dots, y_i, \dots, y_n) \ge \max_i y_i$

If $\forall i,\ y_i < 0.5$ then $3\Pi(y_1, \dots, y_i, \dots, y_n) \le \min_i y_i$

Triple Pi and Mean Triple Pi

SNR | % classification, Mean triple Π | % classification, triple Π

46.02 dB | 13.47% | classification not relevant

40 dB | 11.56% | classification not relevant

30.40 dB | classification not relevant | classification not relevant

The Mean triple Π is more robust in the case of noisy signals.

Tests carried out on batch-type data

Issues in Kinetic Modelling [SCHI02]

• Enzyme = complex protein produced by living cells that promotes a specific biochemical reaction by acting as a catalyst

• Kinetic properties (rate constants, etc.) are not completely known

• Discrepancies exist between in vitro and in vivo behavior

• Enzyme activities in vivo are subject to frequent changes due to inhibition or activation. In contrast, the structure, in terms of how substances are "connected", can be considered constant unless evolutionary time scales are studied

Metabolism

• Union of two processes : anabolism and catabolism

• Metabolism consists of:

– Biochemical reactions

– Metabolites

– Enzymes

• Modeler's perspective

– A network of interconnected reactions, each reaction corresponding to the transformation of metabolites into other metabolites

What pathways make up metabolism?

• Glycolysis

• Pentose phosphate pathway

• Electron transport & oxidative phosphorylation

• Fatty acid oxidation

• Photosynthesis

Metabolic networks

• If we can prepare a list of the reactions occurring in the metabolism of an organism, can we decide:

– what nutrients it can utilize and what products it can produce?

– if there is a route from a particular nutrient to a product?

– which route to a product has the highest yield?

– what are the consequences of deleting an enzyme?

– whether genome annotations for an organism generate a connected and self-consistent metabolism?

• Finding promising targets for the development of new drugs

Metabolic Flux Analysis

• Metabolic pathways are sequences of enzyme-catalyzed reaction steps, converting substrates to a variety of products to meet the needs of the cell.

A huge set of biochemical reactions ensures the reproduction and the survival of the cell.

• Flux is defined as the rate at which material is processed through a metabolic pathway. Fluxes are useful for determining maximum theoretical yields.

$$\frac{dS}{dt} = -\frac{1}{Y_{X/S}}\,\mu X + D\,(S_{in} - S)$$

Model

• In the 1730s, Euler founded network (graph) theory with his solution of the Königsberg bridges problem, and for most of the following two hundred years network theory remained a form of abstract mathematics.

• A network is made up of nodes and links and mathematicians assumed the links between the nodes were randomly distributed.

• If there are, say, 10 nodes and 50 links, they assumed the distribution would be random and each node would get, on average, five links.

Representation

• Consider a metabolic network including r reactions, labeled by their enzymes e1, e2, …, er or by their rates v1, v2, …, vr, and m metabolites x1, x2, …, xm, with stoichiometric coefficients nij

Representation

• Consider a simple pathway, e.g.:

Separating structure and kinetics

• The rate at which the substrate concentrations change is given by N·v, where N is the stoichiometry matrix and v is the vector of enzyme kinetic functions. So for our substrate cycle pathway:

• where each vi is the rate function for enzyme i, depending on the metabolites, Vm, Km, etc.
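The separation between structure (N) and kinetics (v) can be illustrated on a toy network. The pathway below (constant influx into A, then A → B, then B leaving the system), the Michaelis-Menten rate laws and all parameter values are assumptions chosen for the example; it is not the substrate cycle shown on the slide.

```python
import numpy as np

# Rows: metabolites A, B. Columns: reactions v1 (-> A), v2 (A -> B), v3 (B ->).
N = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

def rates(x, vmax=(1.0, 2.0, 2.0), km=(0.5, 0.5)):
    """Hypothetical kinetics: constant influx, then two Michaelis-Menten steps."""
    a, b = x
    v1 = vmax[0]
    v2 = vmax[1] * a / (km[0] + a)
    v3 = vmax[2] * b / (km[1] + b)
    return np.array([v1, v2, v3])

def dxdt(x):
    """Concentration derivatives: stoichiometry matrix times rate vector."""
    return N @ rates(x)

start = dxdt(np.array([0.0, 0.0]))    # only the influx is active
steady = dxdt(np.array([0.5, 0.5]))   # all three rates equal 1, so dx/dt = 0
```

Changing the rate laws only touches `rates`, while adding or removing a reaction only touches a column of `N`, which is the practical benefit of keeping the two apart.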

What kinds of constraints do cells have to abide by?

• Environmental constraints

– Condition-dependent, hence variable constraints

– pH, temperature, osmolarity, availability of electron acceptors, etc.

– Availability of carbon, oxygen, sulfur, nitrogen, and phosphate sources in the surrounding media

• Regulatory constraints

– Self-imposed "restraints"

– Subject to evolutionary change

– Allow cells to eliminate suboptimal phenotypes and confine themselves to behaviors of increased fitness

Constraints

Goal

• Modeling of high-dimensional non-linear systems. Application to the intracellular metabolite network of E. coli:

– Metabolite flux balancing

– Estimation of maximum reaction rates

– Estimation of non-measured steady-state concentrations

– Simulation and results

– Conclusions

Mathematical Model

• Mass balance:

$$\frac{dC_i}{dt} = \sum_j \nu_{ij}\,r_j - \mu\,C_i$$

• The flux is:

$$r_j = r_j^{max} \cdot f_j(C_j, P_j)$$

$C_i$ is the concentration of metabolite i, $\mu$ is the specific growth rate, $\nu_{ij}$ is the stoichiometric coefficient for this metabolite in reaction j, which occurs at the rate $r_j$, and $r_j^{max}$ is the maximum flux.

Since masses were balanced, the equation for extracellular glucose needs to include a conversion factor for the difference between the intracellular volume and the culture volume.

Extracellular glucose:

$$\frac{dC_{glc}^{extracellular}}{dt} = D\,(C_{glc}^{feed} - C_{glc}^{extracellular}) + f_{pulse} - \frac{X\,r_{PTS}}{\rho_x}$$

Here, $C_{glc}^{feed}$ is the glucose concentration in the feed, X is the biomass concentration, and $\rho_x$ denotes the specific weight of biomass in the culture volume. The term $f_{pulse}$ takes into account the sudden change of the glucose concentration due to a glucose pulse.

6-phosphogluconate (6pg):

$$\frac{dC_{6pg}}{dt} = r_{G6PDH} - r_{PGDH} - \mu\,C_{6pg}$$

Glucose-1-phosphate (g1p):

$$\frac{dC_{g1p}}{dt} = r_{PGM} - r_{G1PAT} - \mu\,C_{g1p}$$

Fructose-6-phosphate (f6p):

$$\frac{dC_{f6p}}{dt} = r_{PGI} - r_{PFK} + r_{TKb} + r_{TA} - 2\,r_{MurSynth} - \mu\,C_{f6p}$$

Dynamic response of the co-substrates

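$$\frac{dC_{6PG}(t)}{dt} = r_{G6PDH}^{max}\,\frac{C_{NADP}(t)}{C_{NADP}(t) + K_{NADP,1}\left(1 + \frac{C_{NADPH}(t)}{K_{i,NADPH,1}}\right)\left(1 + \frac{C_{MgATP}(t)}{K_{i,MgATP,1}}\right)} - r_{6PGDH}^{max}\,C_{6PG}(t)\,\frac{C_{NADP}(t)}{C_{NADP}(t) + K_{NADP,2}\left(1 + \frac{C_{NADPH}(t)}{K_{i,NADPH,2}}\right)\left(1 + \frac{C_{MgATP}(t)}{K_{i,MgATP,2}}\right)}$$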

Estimation of maximum reaction rates

• The in vitro determination of enzymatic activities is not really representative of the in vivo maximal rates. To avoid this problem, these values were estimated with the approach suggested by Rizzi et al. (1997), which is based on a calculation of the flux distribution under steady-state conditions. The rate of enzyme i at steady state is:

$$\tilde{r}_i = r_i^{max}\,f_i(\tilde{C}_i, P_i)$$

where $P_i$ is the parameter vector and $\tilde{C}_i$ is the steady-state concentration vector of the metabolites involved in the reaction. From the above equation the maximal rate can be calculated:

$$r_i^{max} = \frac{\tilde{r}_i}{f_i(\tilde{C}_i, P_i)}$$

Assuming

$$r_{6PGDH} = r_{6PGDH}^{max}\,f_{6PGDH}\bigl(\hat{C}_{MgATP}(t), \hat{C}_{NADP}(t), \hat{C}_{NADPH}(t), P_{6PGDH}\bigr)$$

This approach assumes that, during the transient, enzyme concentrations remain at their steady-state value.

The balance equation for 6PG takes the form:

$$\frac{d\hat{C}_{6PG}}{dt} = r_{G6PDH}^{max}\,f_{G6PDH}(t, P_{G6PDH}) - r_{6PGDH}^{max}\,f_{6PGDH}(t, P_{6PGDH})\,\hat{C}_{6PG}(t)$$

Parameters Identification

• The kinetic parameters of these equations were fitted to the measurements of intracellular metabolites by minimizing the sum of relative squared errors using the Differential Evolution algorithm.

Parameter vectors: $P_{G6PDH}$, $P_{6PGDH}$

Differential Evolution

A stochastic nonlinear optimization algorithm by Storn and Price (1996)

A population of solution vectors is successively updated by addition, subtraction and component swapping until the population converges, hopefully to the optimum:

• no derivatives are used

• very few parameters to set

• very reliable method

Evolutionary operators: forming the mutant vector

DE/rand/1: $V_i^{G+1} = X_1^{G} + F\,(X_2^{G} - X_3^{G})$

DE/rand/2: $V_i^{G+1} = X_1^{G} + F\,(X_2^{G} + X_3^{G} - X_4^{G} - X_5^{G})$

DE/best/1: $V_i^{G+1} = X_{best}^{G} + F\,(X_2^{G} - X_3^{G})$

DE/best/2: $V_i^{G+1} = X_{best}^{G} + F\,(X_2^{G} + X_3^{G} - X_4^{G} - X_5^{G})$

F = scaling factor applied to the difference vectors (the crossover constant, usually written CR, is a separate parameter)

The new child replaces a randomly selected vector from the population only if it is better than that vector.
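The DE/rand/1 scheme can be sketched in a few lines. This is a generic textbook variant written for illustration, not the implementation used for the identification: it adds binomial crossover with rate `CR`, and the trial vector competes with its own parent rather than with a randomly selected vector. The function name, the parameter values and the sphere test function are all assumptions.

```python
import numpy as np

def differential_evolution(cost, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize `cost` over a box with DE/rand/1 and binomial crossover."""
    rng = np.random.default_rng(seed)
    low, high = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pop = low + rng.random((pop_size, dim)) * (high - low)
    costs = np.array([cost(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = pop[r1] + F * (pop[r2] - pop[r3])   # DE/rand/1 mutation
            cross = rng.random(dim) < CR                 # component swapping mask
            cross[rng.integers(dim)] = True              # keep at least one mutant component
            trial = np.clip(np.where(cross, mutant, pop[i]), low, high)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:                   # greedy selection against the parent
                pop[i], costs[i] = trial, trial_cost
    best = int(np.argmin(costs))
    return pop[best], float(costs[best])

def sphere(x):
    """Simple test function with its minimum at the origin."""
    return float(np.sum(x**2))

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

In a parameter-identification setting, `cost` would be the relative squared error between measured and simulated concentrations and `bounds` the admissible range of each kinetic parameter.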

Objective Function

$$\varepsilon_{rel,6PG} = \sum_k \left(\frac{C_i(t_k) - \hat{C}_i(t_k)}{C_i(t_k)}\right)^2$$

The sum of squared relative errors over the time courses of the different metabolite and co-metabolite concentrations was calculated in order to assess the quality of the identification process.
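As a small illustration of this criterion, the function below sums the squared relative deviations between measured and simulated concentrations at the sampling times. The function name and the numbers are made up for the example, and measured values are assumed to be non-zero.

```python
from typing import Sequence

def relative_squared_error(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """Sum over samples of ((C - C_hat) / C) ** 2, with C the measured value."""
    return sum(((c - c_hat) / c) ** 2 for c, c_hat in zip(measured, simulated))

# Hypothetical measured and simulated concentrations at two sampling times.
error = relative_squared_error([2.0, 4.0], [1.0, 5.0])   # 0.25 + 0.0625
```

Dividing by the measured value gives metabolites with very different concentration ranges a comparable weight in the fit.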

Metabolite Concentration (mM)

Extracellular glucose 0.0556

g6p 3.480

f6p 0.600

fdp 0.272

gap 0.218

dhap 0.167 (estimation)

pgp 0.008 (estimation)

3pg 2.130 (estimation)

2pg 0.399 (estimation)

pep 2.670

pyr 2.670

6pg 0.808

ribu5p 0.111 (estimation)

xyl5p 0.138 (estimation)

sed7p 0.276 (estimation)

rib5p 0.398 (estimation)

e4p 0.098 (estimation)

g1p 0.653

amp 0.955

adp 0.595

atp 4.270

nadp 0.195

nadph 0.062

nad 1.470

nadh 0.100

Benchmark

[Figure: Simulink block diagram of the benchmark. A flux-computation block ("Calcul des flux") evaluates the rates rPTS, rPGI, rPFK, rALDO, rTIS, rGAPDH, rPGK, rPGluMu, rENO, rPK, rPDH, rPEPCxylase, rPGM and rG1PAT. A concentration-computation block ("Calcul des concentrations") returns the derivatives dCext,glc, dCpep, dCpyr, dCg6p, dCf6p, dC6pg, dCatp, dCadp, dCamp and dCfdp, which ten integrators (1/s) turn into the concentrations Cext,glc, Cpep, Cpyr, Cg6p, Cf6p, C6pg, Catp, Cadp, Camp and Cfdp.]

Model’s Parameters

Results

[Figure: four panels showing the time courses (time in s, concentrations in mg/l) of extracellular glucose, fructose-6-phosphate (f6p), glucose-1-phosphate (g1p) and 6-phosphogluconate (6pg).]

Goal

– Explanation and prediction of metabolic pathways

– Models:

• Saccharomyces cerevisiae

• E. coli

– ILP can explain the qualitative evolution of a dynamic system

$$\frac{dC_B}{dt} = 2\,v_1 - v_2$$

[Figure: network of five metabolites A, B, C, D and E with exchange rates rA, rC, rD, rE and internal fluxes v1 to v5.]

Approaches

• Steady state, so the variation of metabolite concentrations is ignored: Tamaddoni-Nezhad et al. 2004

• Integration of abduction and induction to analyse the inhibitions in metabolic pathways: Muggleton et al.

CF-Induction: principle

B ∧ H ⊨ E

B ∧ ¬E ⊨ ¬H (inverse entailment)

B ∧ ¬E ⊨ Carc(B ∧ ¬E, P) ⊨ CC(B,E) ⊨ ¬H

CC(B,E) ⊆ Carc(B ∧ ¬E, P),

¬CC(B,E) ≡ F, H ⊨ F (where F is in CNF)

A complete system that allows non-Horn clauses

Clausal theory

Generalizer: dropping

Input of CF-induction

• clause(e1,bg,[concentration(a,up)]).

• clause(e2,obs,[concentration(d,up)]).

• clause(e3,obs,[concentration(e,down)]).

• clause(e4,obs,[concentration(c,down)]).

Reaction

• clause(bR1,bg,[reaction(a,b)]).

• clause(bR2,bg,[reaction(b,d)]).

• clause(bR3,bg,[reaction(d,e)]).

• clause(bR4,bg,[reaction(e,c)]).

• clause(bR5,bg,[reaction(c,b)]).

• clause(bR6,bg,[reaction(b,c)]).

[Figure: the five-metabolite network A, B, C, D, E with exchange rates rA, rC, rD, rE and fluxes v1 to v5.]

Explanation:

clause(be1,bg,[-reaction(Y,X),-reaction(X,Z),inhibited(Y,X),-inhibited(Y,Z),concentration(X,up)]).

clause(be2,bg,[-concentration(Y,down),-reaction(Y,X),inhibited(Y,X),concentration(X,down)]).

[concentration(b, up), inhibited(b, c), inhibited(b, d), inhibited(c,b), inhibited(a, b), -concentration(a, up)]

Conclusion

CF-induction explains the evolution of dynamic systems

Relevance of information

Which signals are useful for the classification at a given time?

An objective method for evaluating relevance without a priori expert knowledge (Knowledge Basis Discovery), in order to:

1. Confirm and enrich the expert's knowledge

2. Reduce the number of sensors for the biological signals

There is no generic definition of relevance.

(Blum & Langley 1997, Subramanian et al. 1997, Zadeh 2004)

Definition of relevance (dictionary): the quality of what is more or less appropriate, of what is in line with the objective pursued.

A signal is relevant if:

1. it does not produce aberrant results

2. it generates decisions in agreement with most of the other signals

3. it provides significant information for the classification

Use of another conflict measure, called D:

$$D_{1,2} = \frac{1}{2} \sum_i \left| m_1(C_i) - m_2(C_i) \right|$$

$D_{1,2} = 0$: the conflict between sources 1 and 2 is zero

This measure can only be used for sources that have the same focal elements

1/2: normalization factor
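The conflict measure D can be computed directly from two mass assignments. The sketch below assumes that each source is given as a dictionary mapping the same focal elements (here class labels) to masses that sum to 1; the function name and the example masses are hypothetical.

```python
from typing import Dict

def conflict(m1: Dict[str, float], m2: Dict[str, float]) -> float:
    """D = 0.5 * sum_i |m1(C_i) - m2(C_i)| for sources sharing the same focal elements."""
    if m1.keys() != m2.keys():
        raise ValueError("both sources must have the same focal elements")
    return 0.5 * sum(abs(m1[c] - m2[c]) for c in m1)

same = conflict({"e1": 0.7, "e2": 0.3}, {"e1": 0.7, "e2": 0.3})      # identical sources
opposite = conflict({"e1": 1.0, "e2": 0.0}, {"e1": 0.0, "e2": 1.0})  # total disagreement
partial = conflict({"e1": 0.75, "e2": 0.25}, {"e1": 0.5, "e2": 0.5})
```

The factor 1/2 keeps D in [0, 1]: two sources with the same mass distribution give 0 and two sources committed to different classes give 1.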

Problem: the conflict is non-zero for two sources (2 biochemical parameters) that provide the same distribution of the masses of evidence

Alternatives to the Dempster-Shafer (DS) combination (IEEE Fuzzy Systems 2004, Budapest, Hungary; RNTI 2004):

[Figure: parameters classified as relevant or not relevant; 11 parameters are in agreement with respect to the threshold of 0.3.]

Batch classification

• LAMDA classification: 88.84% classification

• Classification taking relevance into account: 95.06% classification

Conclusion

• The discrepancy between simulation and experimental points can be explained by various factors, for example experimental measurement errors, as some metabolites like pep and pyruvate are especially difficult to measure.

• Another reason could be a lack in the model structure with regard to the mass balance equations or the rate expressions.

• Most of the stoichiometry is well known, but some effectors are involved in a great number of reactions.

• The metabolic regulations and allosteric properties used in these equations have been determined by in vitro studies. Some of these effects might be different under in vivo conditions.

Glycolysis Pathway

CALCULATION OF ELEMENTARY FLUX MODE

• The elementary flux mode contains all the possible conversion pathways in a subsystem with specific inputs and outputs (exchange flux).

• Constraints on the exchange flux also influence the calculated elementary modes. So before calculation the constraints on the exchange flux should be determined.

• If a common metabolite of one subsystem is used only as a substrate in the other subsystems, it is an output of that subsystem; if it is used only as a product, it is an input; otherwise it is unconstrained.

• After determining the constraints on every exchange flux, the elementary modes for each subsystem were calculated from the stoichiometric matrix of the subsystem using the method presented by Schuster.

Adding constraints

Constraints on internal fluxes: $v_i \ge 0,\ \forall i$

Constraints on external fluxes:

• Source: $b_j \le 0$

• Sink: $b_j \ge 0$

• Sink/source: $b_j$ is unconstrained

In other words, flux going into the system is considered negative while flux leaving the system is considered positive.

Remark: later on we will impose further constraints on both the internal fluxes and the external fluxes.

Flux cone and metabolic capabilities

Observation: the number of reactions considerably exceeds the number of metabolites.

$$S \cdot v = 0$$

The S matrix will have more columns than rows.

The null space of viable solutions to our linear set of equations contains an infinite number of solutions.

“The solution space for any system of linear homogeneous equations and inequalities is a convex polyhedral cone.” - Schilling 2000

What about the constraints?

Our flux cone C contains all the points of the null space with non-negative coordinates (apart from the exchange fluxes, which are constrained to be negative or are unconstrained).
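The null space of a stoichiometric matrix can be obtained numerically from a singular value decomposition. The sketch below uses a hypothetical network with 2 metabolites and 4 reactions; the matrix, the tolerance and the helper name `null_space` are assumptions made for the example. The non-negativity constraints that turn the null space into a cone are only checked on one sample flux vector, not enumerated.

```python
import numpy as np

def null_space(S, tol=1e-10):
    """Orthonormal basis of {v : S v = 0}, from the SVD of S."""
    _, singular_values, vt = np.linalg.svd(S)
    rank = int(np.sum(singular_values > tol))
    return vt[rank:].T

# Hypothetical network: more reactions (columns) than metabolites (rows).
S = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])
K = null_space(S)                       # 4 reactions - rank 2 = 2 basis vectors

v = np.array([2.0, 1.0, 1.0, 1.0])      # one steady-state flux with non-negative entries
```

Every steady-state flux distribution is a linear combination of the columns of `K`; the flux cone keeps only those combinations that also respect the sign constraints.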

(V) Flux cone and metabolic capabilities

What is the significance of the flux cone?

•It defines what the network can do and cannot do!

•Each point in this cone represents a flux distribution in which the system can operate at steady state.

• The answers to the following questions (and many more) are found within this cone:

– What are the building blocks that the network can manufacture?

– How efficient is energy conversion?

– Where are the critical links in the system?

(VI) Navigating through the flux cone –using “Extreme pathways”

Next thing to do is develop a way to describe and interpret any location within this space.

•We will not use the traditional reaction/enzyme based perspective

•Instead we use a pathway perspective:

Extreme rays: “extreme rays correspond to edges of the cone. They are said to generate the cone and cannot be decomposed into non-trivial combinations of any other vector in the cone.” - Schilling 2000

Differences

• Unlike a basis, the set of extreme pathways is typically unique

• Any flux in the cone can be described using a non-negative combination of extreme rays.

We use the term extreme pathways when referring to the extreme rays of a convex polyhedral cone that represents metabolic fluxes.

What is the analogy in linear algebra?

(VI) Navigating through the flux cone –using “Extreme pathways”

$$C = \Bigl\{ v : v = \sum_{i=1}^{k} w_i\,EP_i,\ w_i \ge 0\ \forall i \Bigr\}$$

In matrix form, $P\,w = v$, where the columns of P are the extreme pathways.

• Extreme pathways are denoted by the vectors $EP_i$ ($1 \le i \le k$)

• Every point v within the cone C can be written as a non-negative linear combination of the extreme pathways.

In a biological context this means that any steady-state flux distribution can be represented by a non-negative linear combination of extreme pathways.

The entire process from a bird's-eye view

Compute the steady-state flux convex cone (+ constraints)

Identify the extreme pathways (using the algorithm presented in Schilling 2000)

2. Elementary (flux) modes

• Elementary (flux) mode analysis was introduced by Schuster & Hilgetag (1994)

• Example 1

Clarify the mechanisms of metabolic pathway regulation by KBS.

An elementary flux mode is a minimal set of enzymes that could operate at steady state, with all the irreversible reactions used in the appropriate direction (cf. Schuster et al., 2000).

Reversible reactions need not be split into their forward and reverse steps.

As in several other metabolic simulators, this list is automatically translated into a stoichiometry matrix.


Extraction of System Constraints

• If all the kinetic properties of the enzymes involved in a metabolic pathway are known, then an ordinary differential equation (ODE) model is appropriate.

• If these kinetic properties are not known, then an ODE model could be inappropriate: the use of arbitrary kinetics would produce a misleading impression of precision, and a simpler qualitative model would be preferred.

Similarly, for system identification, it is often appropriate to first learn a qualitative structural model, and then parameterise it to form an ODE.

Dynamic Equations

Let {X1, …, Xn} be a set of genes and/or chemical substances in the underlying biological network. Let Xi(t) be the value (expression level or concentration) of a gene or chemical substance Xi at time t. An S-system describes their dynamics by:

$$\frac{dX_i(t)}{dt} = \alpha_i \prod_{j=1}^{n} X_j(t)^{g_{i,j}} - \beta_i \prod_{j=1}^{n} X_j(t)^{h_{i,j}}$$

where αi and βi are multiplicative parameters called rate constants, and gi,j and hi,j are exponential parameters called kinetic orders.

The inference problem is, given time series data Xi(t) that are assumed to be generated from an S-system S, to estimate the parameters αi, βi, gi,j and hi,j of S.
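To show what "data generated from such a system" looks like, the sketch below simulates a small S-system of the standard power-law form, dXi/dt = αi Π Xj^gij − βi Π Xj^hij, with explicit Euler steps. The two-variable example, its parameter values, the step size and the function names are assumptions chosen so that the steady state is easy to verify by hand.

```python
import numpy as np

def s_system_rhs(x, alpha, beta, g, h):
    """Right-hand side of an S-system: production minus degradation power laws."""
    production = alpha * np.prod(x ** g, axis=1)    # row i: product over j of x_j ** g[i, j]
    degradation = beta * np.prod(x ** h, axis=1)
    return production - degradation

def simulate(x0, alpha, beta, g, h, dt=0.01, steps=2000):
    """Explicit Euler integration (adequate for this smooth toy example)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * s_system_rhs(x, alpha, beta, g, h)
    return x

# Toy system: dX1/dt = 2 - X1 and dX2/dt = X1 - X2, steady state (2, 2).
alpha = np.array([2.0, 1.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, 0.0],
              [1.0, 0.0]])
h = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x_final = simulate([1.0, 1.0], alpha, beta, g, h)
```

The inference problem described above runs in the opposite direction: given sampled trajectories such as those produced by `simulate`, recover `alpha`, `beta`, `g` and `h`.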

Emergent Information in Biological Systems

• The behavior of systems with near-critical connectivity is essentially chaotic and may be an important source of variety in biology.

• In most models of biological systems (for example, dynamical systems, cellular automata, state spaces) the patterns of connectivity between elements are isomorphic to directed graphs.

• These isomorphisms suggest that the above sources of emergent behaviour may be universal

Reasoning

• Mathematical reasoning, i.e., the process by which mathematicians arrive at conclusions on the basis of a small number of clear and distinct basic principles.

Goal of our Research

• Analyse and understand patterns in temporal data.

Modeling

• The task of forming a model to explain a given set of experimental results is called model identification.

• This is a form of inductive inference.

– For example, if the levels of the metabolites in glycolysis are observed over a series of time steps, and the reactions of glycolysis are inferred from these data, this is model identification.

Qualitative Equation

One reason to consider qualitative states is that it is experimentally much easier to measure qualitative metabolic states than quantitative ones.

In the case of qualitative reasoning, we do not need numerical parameters.

Displacement of the steady state

Fuzzy aggregation operators

• Once the so-called marginal information has been computed by the membership function for each of the information sources, all this information must be combined in order to decide which class the element will be assigned to.

What is the link between these different pieces of information?

Cognitive independence

Regularity Analysis

• Fourier analysis characterizes the global regularity of a function.

• The wavelet transform characterizes the local regularity of a function.
