Reasoning about Uncertainty in Biological Systems
Andrei Doncescu
LAAS CNRS
Aix-en-Provence, 18 September
Structural Bioinformatics
• Cells buzz with activity. They take in nutrients and convert them to energy for a number of purposes, reproduce themselves, and are constantly called upon to synthesize protein molecules.
• Gene: a segment of DNA that is programmed for the production of a specific protein
• Gene expression: the cell produces the protein encoded by a particular gene
• Genome: the entire set of genetic instructions for a given organism
• Nucleotide: the fundamental unit of DNA and RNA
• Protein: a molecule consisting of up to thousands of amino acids
• Amino acid: one of a class of 20 different molecules (made of C, H, N, O, S) which can bond to one another to form chains
Structure and Modeling of Metabolic Pathway
• Genome: DNA
• Transcriptomics: RNA
• Proteomics: proteins
• Metabolomics/fluxomics: metabolites
Systemic approach: reconciliation of the 3 levels of observation (3M: macro, micro, molecular)
• Mixing power, macromixing, micromixing, reactivity, coupled systems
• Expert systems, supervision
• Scale-up and scale-down; CFD
Tool: bioreactor
• Metrology
• Kinetics
• Stoichiometry, media
• Classification of populations
• Physico-mechanical and physico-chemical environment
• Hydrodynamics, transfers
Information flux
• Biochips: DNA, proteins, bioinformatics, networks of genes, of proteins, of metabolites
Microorganism: a production facility
• Biological kinetics
• Implementation
• Metabolic flux; fluxome
• Metabolic network
• In vivo and ex vivo enzymology, stock flux, energy/matter
• Thermodynamics
• In vivo and ex vivo NMR
• Structured modelling and metabolic descriptors
Signal
• Transcriptome
• Proteome
• Metabolome
• Biochips
MICROSCOPIC LEVEL
MACROSCOPIC LEVEL
MOLECULAR LEVEL
Scientific Reasoning
Hypothesis Generation
Prediction
Observation
Deduction
Verification
Abduction
Reasoning about biological systems
• Construction of a system model
– The task of forming a model to explain a given set of experimental results is called model identification.
– This is a form of inductive inference. For example, if the levels of the metabolites in glycolysis are observed over a series of time steps, and the reactions of glycolysis are inferred from these data, this would be model identification.
• Simulation of the system behavior based on the model constructed
– This is a form of deductive inference. For example, a dynamic model of glycolysis might tell you how the level of pyruvate in a cell varies over time as the amount of glucose increases. If the deductive predictions of a model are inconsistent with observed behaviour, then the model is falsified.
• A model is a simplified description of a complex entity or process and consists of:
– a set of system constraints in terms of state variables
– and/or their time derivatives
Representation of Biological Systems
• Directed graphs (for example, decision trees, cluster analysis)
• Matrix models (for example, linear systems, Markov processes)
• Dynamical systems
• Cellular automata
The Problem
• The development of molecular biology produces a huge quantity of data
• Interactions between molecules have an effect on cell behavior
• Mathematical models are used to extract the emergent laws of the combinatorial interactions
• Difficulties:
– non-linear interactions
– model parameters that are difficult to measure
[Diagram: M acts on G, either by activation or by inhibition]
Our approach
• Fuzzy logic
• Hierarchical classification
• Inductive Logic Programming
• Classification machine
• Measures
• Biological knowledge
• Biological rules
• Hypotheses or « classes »
• 3 levels of analysis
• Relevant information
Time Series
• Time series analysis is often associated with the discovery of patterns such as:
– increasing trends
– decreasing trends
– frequency of sequences, repeating sequences
– prediction of future values, specifically termed forecasting in the time series context
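As a minimal sketch of the simplest patterns listed above, the function below labels each step of a sampled series as increasing, decreasing or steady from its first difference. The function name, the tolerance parameter and the sample values are illustrative assumptions, not taken from the slides.

```python
def label_trends(values, tolerance=0.0):
    """Label each step between consecutive samples by the sign of its difference."""
    labels = []
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        if delta > tolerance:
            labels.append("increasing")
        elif delta < -tolerance:
            labels.append("decreasing")
        else:
            labels.append("steady")
    return labels


# Made-up pH samples, for illustration only.
ph_samples = [6.6, 6.7, 6.9, 6.9, 6.8]
trends = label_trends(ph_samples, tolerance=1e-9)
```

On real fermentation signals a tolerance above the sensor noise level would be needed, otherwise almost every step is labelled as a trend.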
[Figure: fermentation parameters against time (h), from 0 to 12 h: temperature (°C), stirring speed (RPM), pH and pressure (mb)]
Batch Fermentation
[Figure: batch culture of CENPK 133-7D ("CFM" glucose 15 g/l). Biomass (g/l) and glucose, ethanol and glycerol (g/l) against time (h), from 0 to 25 h. Phases marked successively: fermentative metabolism, then diauxie]
Batch Fermentation
µmax = 0.45 h⁻¹; Y_S/X = 0.37 g·(g glucose)⁻¹
[Figure: batch culture of CENPK 133-7D ("CFM" glucose 15 g/l). Biomass (g/l) and glucose, ethanol and glycerol (g/l) against time (h), from 0 to 25 h. Three successive phases: fermentative metabolism, diauxie, oxidative metabolism]
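As a rough reading of the two parameters quoted for this batch culture, the arithmetic below derives a doubling time and an upper bound on the biomass formed. It assumes exponential growth at µmax and reads the yield as grams of biomass per gram of glucose; both are assumptions, and the variable names are illustrative.

```python
import math

MU_MAX_PER_H = 0.45        # maximum specific growth rate quoted on the slide (1/h)
YIELD_G_PER_G = 0.37       # yield quoted on the slide, read here as g biomass per g glucose
GLUCOSE_G_PER_L = 15.0     # initial glucose concentration of the medium (g/l)

# Doubling time during exponential growth: t_d = ln(2) / mu_max.
doubling_time_h = math.log(2.0) / MU_MAX_PER_H

# Biomass that could be formed if all the glucose were converted at this yield.
max_biomass_g_per_l = YIELD_G_PER_G * GLUCOSE_G_PER_L
```

Under these assumptions the doubling time is about 1.5 h and the biomass bound is about 5.5 g/l, which is of the same order as the biomass axis of the plot (0 to 6 g/l).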
Batch Fermentation
• We have 4 potential states for the bioreactor (e1, e2, e3, e4)
• We add a specific state e5 corresponding to a stationary state
• The predicate to learn with our ILP machine is:
– to-state(Ei,Et,P1,P2,T)
We want to obtain a causal relationship between the transitions of the system and the values of the derivative, or of the wavelet coefficients, of the curve.
Formalization of our problem : CProgol4.4
• Solution: add a predicate
– derive(P1,P,T)
– It expresses the fact that, for the curve of the parameter P at time T, the value of the derivative is P1
Formalization of our problem
• We get a lot of rules, but the following one can be explained by biochemical experts:
• to_state(E,E,A,B,C,T) :- derive(p1,A,T), derive(p2,B,T), derive(p3,C,T), positive(p1,T), positive(p2,T), positive(p3,T).
Results
[Figure: class membership (appartenance) and the signals X, pH and CO2 against time, across the phases: fermentative, diauxie, oxidative, end of batch]
• Instead of simply giving classification results, we get logical rules establishing a causal relationship between different parameters of the bio-machinery.
• This rule indicates that the metabolic state does not evolve (the bioreactor remains in the same state) when the parameters have an increasing slope and no maxima or minima are encountered.
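The reading of the learned rule can be mimicked outside the ILP system. The sketch below is not CProgol code: it approximates the derivative by a backward difference and checks that all monitored parameters have a positive slope at a given sample, in which case the rule predicts no change of state. The function names and the three sample series are hypothetical.

```python
def finite_difference(series, t):
    """Backward difference at sample index t (requires t >= 1)."""
    return series[t] - series[t - 1]


def stays_in_same_state(parameter_series, t):
    """True when every monitored parameter has a positive slope at index t."""
    return all(finite_difference(series, t) > 0 for series in parameter_series)


# Made-up curves for three parameters p1, p2, p3.
p1 = [1.0, 1.2, 1.5, 1.4]
p2 = [0.3, 0.4, 0.6, 0.9]
p3 = [5.0, 5.1, 5.3, 5.6]

same_state_at_2 = stays_in_same_state([p1, p2, p3], 2)
same_state_at_3 = stays_in_same_state([p1, p2, p3], 3)
```

At index 3 the first curve has passed a maximum, so the condition of the rule no longer holds and a state transition becomes possible.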
Data Processing : Regularity Analysis
[Figure: pH and volume of base added (ml) against time (h), annotated with amino acid consumption and acid]
• How can a singularity be characterized?
• Which tool for on-line analysis?
• Multifractal analysis studies functions whose pointwise regularity varies from one point to another
– Differentiability, continuity
– Hölder exponent
Lipschitz Regularity
A signal is considered regular if it can be approximated locally by a polynomial.
Analysis of singularities
• The Taylor expansion of f at x0:
$$f(x) = f(x_0) + (x - x_0)\, f'(x_0) + \dots + C\,|x - x_0|^{\alpha(x_0)}$$
Using wavelet analysis, the dominant behavior is given by the term:
$$C\,|x - x_0|^{\alpha(x_0)}$$
Characterization of the Lipschitz exponent
• Definition
• A function f is Lipschitz of order α ≥ 0 at a point ν if there exist K > 0 and a polynomial p_ν of degree m = ⌊α⌋ such that:
$$\forall t \in \mathbb{R}, \quad |f(t) - p_\nu(t)| \le K\,|t - \nu|^{\alpha}$$
Fourier Condition
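• Theorem: a function f is bounded and uniformly Lipschitz α on ℝ if:
$$\int_{-\infty}^{+\infty} |\hat{f}(\omega)|\,\bigl(1 + |\omega|^{\alpha}\bigr)\, d\omega < +\infty$$
• This is a global regularity condition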
Hölder Regularity
• The Hölder exponent measures the remainder of a Taylor expansion.
• It characterizes the local scaling properties.
• It measures the local regularity/differentiability.
• It is linked to the decay rate of the Fourier and wavelet coefficients.
Hölder Regularity
• The exponent α measures the local differentiability:
• α ≥ 1: f(t) is continuous and differentiable.
• 0 < α < 1: f(t) is continuous but not differentiable.
• -1 < α ≤ 0: f(t) is discontinuous and non-differentiable.
• α ≤ -1: f(t) is no longer locally integrable.
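To make the exponent concrete, the sketch below estimates α at a point from the scaling of the increments |f(x0 + h) − f(x0)| ≈ C·h^α, using a least-squares slope in log-log coordinates. This is a simple increment-based estimate, not the wavelet-based method of the slides, and it is only meaningful for 0 < α < 1 (for larger α the polynomial part would have to be removed first). The function name and the test function are illustrative.

```python
import math


def estimate_holder_exponent(f, x0, steps):
    """Least-squares slope of log|f(x0 + h) - f(x0)| against log(h)."""
    xs = [math.log(h) for h in steps]
    ys = [math.log(abs(f(x0 + h) - f(x0))) for h in steps]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Cusp f(x) = |x|^(1/2): continuous but not differentiable at 0.
steps = [2.0 ** -k for k in range(3, 12)]
alpha_cusp = estimate_holder_exponent(lambda x: math.sqrt(abs(x)), 0.0, steps)
```

For this cusp the estimate is 0.5, which falls in the "continuous but not differentiable" range of the list above.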
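• Theorem
• If f is Lipschitz α at x0, with α ≤ n, then there exists a constant A such that, for x in a neighborhood of x0 and any scale s:
$$|Wf(s, x)| \le A\,\bigl(s^{\alpha} + |x - x_0|^{\alpha}\bigr)$$
• Conversely, f(x) is Lipschitz α at x0, with 0 < α < n (α not an integer), if there exist ε > 0 and constants A and B such that:
$$|Wf(s, x)| \le A\, s^{\varepsilon}$$
$$|Wf(s, x)| \le B \left( s^{\alpha} + \frac{|x - x_0|^{\alpha}}{\bigl|\log |x - x_0|\bigr|} \right)$$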
Characterization of Lipschitz exponent by CWT
• Efficiency for non-stationary signals
• Good localization in time and frequency
• The wavelet transform is defined as an integral operator which transforms a finite-energy signal f(x) ∈ L²(ℝ) using a set of functions ψ_{a,b}:
WT(f, ψ_{a,b}) = ⟨f | ψ_{a,b}⟩
where ⟨ · | · ⟩ is the inner (dot) product.
Morlet Wavelets
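Elementary function:
$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t - b}{a}\right)$$
The wavelet coefficients are numbers:
$$C_f(a, b) = \langle f, \psi_{a,b} \rangle = \int f(t)\, \psi_{a,b}^{*}(t)\, dt$$
(the complex conjugate * only matters for complex wavelets such as the Morlet wavelet)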
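The coefficient integral can be approximated numerically. The sketch below uses a real wavelet, the first derivative of a Gaussian, instead of the Morlet wavelet so that no complex arithmetic is needed; that substitution, the Riemann-sum integration, the grid and all names are assumptions made for illustration. Applied to a unit step, the modulus of the coefficients peaks at the position of the discontinuity, which is the idea behind singularity detection by modulus maxima.

```python
import math


def gaussian_derivative(u):
    """Real analysing wavelet: first derivative of exp(-u^2 / 2)."""
    return -u * math.exp(-u * u / 2.0)


def wavelet_coefficient(samples, times, a, b):
    """Riemann-sum approximation of the integral of f(t) * psi((t - b) / a) / sqrt(a)."""
    dt = times[1] - times[0]
    total = 0.0
    for value, t in zip(samples, times):
        total += value * gaussian_derivative((t - b) / a) / math.sqrt(a)
    return total * dt


# Unit step at t = 0, sampled on [-10, 10].
times = [-10.0 + 0.01 * k for k in range(2001)]
step_signal = [1.0 if t >= 0.0 else 0.0 for t in times]

# Scan the translation b at fixed scale a = 1 and locate the modulus maximum.
shifts = [-3.0 + 0.1 * k for k in range(61)]
moduli = [abs(wavelet_coefficient(step_signal, times, 1.0, b)) for b in shifts]
b_at_maximum = shifts[moduli.index(max(moduli))]
```

For this step, the exact modulus at scale a is √a·exp(−b²/(2a²)), so the maximum sits at b = 0 and equals 1 for a = 1.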
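Combining time and frequency: Short-time Fourier Transform
• Time representation: ⟨s(.), δ(. − t)⟩
• Frequency representation: ⟨s(.), δ(. − f)⟩
• Time-frequency representation: Q(t, f) = ⟨s(.), g_{t,f}(.)⟩ = ⟨s(.), T_t F_f g_0(.)⟩, where T_t is a translation in time and F_f a translation in frequency of the window g_0
• The joint resolution is bounded:
$$\Delta t^2\, \Delta f^2 \ge \frac{1}{(4\pi)^2}$$
Combining time and frequency: Wavelet Transform
• O(t, f = f_0/a) = ⟨s(.), T_t D_a Ψ_0⟩, where D_a is a dilation by the scale a and T_t a translation in time, so that Ψ_0(u) becomes Ψ_0((u − t)/a)
[Figure: tiling of the time-frequency plane (axes: time, frequency)]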
Detecting the maximum modulus of the wavelet transform (MMWT) is equivalent to the Canny edge detector.
Distinguishing biological phenomena from biophysical phenomena (fed-batch):
1. Detection of singularities (Hölder exponent < 0)
2. Temporal segmentation
3. Computation of the correlation between the signal used to control the fermentation and the other signals
4. Comparison of the sign of the correlation before and after each singularity
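Steps 3 and 4 can be sketched as follows, assuming the singularity has already been located (steps 1 and 2). The Pearson coefficient is used as the correlation measure, which the slides do not specify; the function names and the two short series are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (norm_x * norm_y)


def correlation_sign_changes(control, other, singularity_index):
    """Compare the sign of the correlation before and after the singularity."""
    before = pearson(control[:singularity_index], other[:singularity_index])
    after = pearson(control[singularity_index:], other[singularity_index:])
    return before * after < 0


# Hypothetical control signal and a second signal whose behavior flips at index 4.
control = [1, 2, 3, 4, 5, 6, 7, 8]
flipping = [2, 4, 6, 8, 8, 6, 4, 2]
following = [2, 4, 6, 8, 10, 12, 14, 16]

flip_detected = correlation_sign_changes(control, flipping, 4)
no_flip = correlation_sign_changes(control, following, 4)
```

A segment with a constant signal would make the coefficient undefined (division by zero), so real data would need a guard for that case.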
Fed-batch process for biomass production
Oxidation: spontaneous oscillations of the yeast
Fuzzy Logic : Clustering and Aggregation
Fuzzy
• Logic
– Semantically, using tables or Boolean algebra
– Syntactically, via proof methods
• Fuzzy logic, based on real numbers
– Dealing with vagueness, e.g. for formalising common natural language
LAMDA (Learning Algorithm for Multivariate Data Analysis)
[Diagram: for an object X with descriptors x1, x2, …, xn, a Marginal Adequacy Degree (DAM) of each descriptor is computed for the class Cj; logical aggregation operators (L) then combine the DAMs into the Global Adequacy Degree (DAG) Cj(X) for the class Cj]
DAM = membership function
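• Parametrized membership function, defined by the differential equation:
$$\frac{du(x)}{dx} = k\, u(x)\, \bigl(1 - u(x)\bigr)$$
• Its solution is given by (with a = k and b an integration constant):
$$u(x) = \frac{1}{1 + \exp\bigl(-(a x + b)\bigr)}$$
• A similar membership function:
$$u(x) = \frac{1}{1 + d(x)}$$
Membership is defined as a function of the distance d(x) between a given object and a standard member.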
LAMDA
Generalization of the binomial law from {0,1} to [0,1]:
$$DAM_{ij}(x_i) = \rho_{ij}^{\,a(x_i, c_{ij})}\, (1 - \rho_{ij})^{\,1 - a(x_i, c_{ij})}$$
a(x_i, c_ij) = 1 − distance between x_i and c_ij
ρ_ij depends on the statistical properties of the class
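A minimal sketch of this marginal adequacy degree, assuming descriptors normalized to [0,1], the absolute difference as the distance, and a class parameter written rho here (an assumed name for the symbol lost in the formula). The function names are illustrative.

```python
def adequacy(x, center):
    """a(x, c) = 1 - distance, with the absolute difference as distance (an assumption)."""
    return 1.0 - abs(x - center)


def marginal_adequacy_degree(x, center, rho):
    """Generalized binomial form: rho^a * (1 - rho)^(1 - a)."""
    a = adequacy(x, center)
    return (rho ** a) * ((1.0 - rho) ** (1.0 - a))


# A descriptor sitting exactly on the class center, and one as far away as possible.
dam_at_center = marginal_adequacy_degree(0.5, 0.5, rho=0.8)
dam_far_away = marginal_adequacy_degree(1.0, 0.0, rho=0.8)
```

With a = 1 the degree reduces to rho and with a = 0 to 1 − rho, which recovers the two outcomes of the ordinary binomial case.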
Aggregation Operators
Cognitive independence
Definition
• An aggregation operator is simply a function which assigns a real number y to any n-tuple (x1, x2, …, xn) of real numbers: y = Aggreg(x1, x2, …, xn)
• We define an aggregation operator as a function
$$\mathrm{Aggreg} : \bigcup_{n \in \mathbb{N}} [0,1]^n \to [0,1]$$
such that:
• Aggreg(x) = x (identity when unary)
• Aggreg(0, …, 0) = 0 and Aggreg(1, …, 1) = 1 (boundary conditions)
• Aggreg(x1, …, xn) ≤ Aggreg(y1, …, yn) if (x1, …, xn) ≤ (y1, …, yn) (non-decreasing)
T-norm
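• A t-norm is a function * : [0,1]² → [0,1] such that for all x, y, z ∈ [0,1]:
• Commutativity: x * y = y * x
• Associativity: (x * y) * z = x * (y * z)
• Monotonicity: x ≤ y implies x * z ≤ y * z
• Identity: 1 * x = x
• Łukasiewicz t-norm: x * y = max(0, x + y − 1)
• Gödel (Dummett) t-norm: x * y = min(x, y)
• Product t-norm: x * y = x · y
T-norms generalize intersection to fuzzy sets.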
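The three t-norms named on this slide are short enough to write out and compare. The sketch below uses their standard definitions; the helper that checks the axioms on a grid is an illustrative addition.

```python
def lukasiewicz(x, y):
    """Lukasiewicz t-norm: max(0, x + y - 1)."""
    return max(0.0, x + y - 1.0)


def godel(x, y):
    """Godel (minimum) t-norm."""
    return min(x, y)


def product(x, y):
    """Product t-norm."""
    return x * y


def satisfies_basic_axioms(t_norm, grid):
    """Check commutativity and the identity element 1 on a finite grid of values."""
    commutative = all(abs(t_norm(x, y) - t_norm(y, x)) < 1e-12 for x in grid for y in grid)
    identity = all(abs(t_norm(1.0, x) - x) < 1e-12 for x in grid)
    return commutative and identity


grid = [k / 10.0 for k in range(11)]
all_valid = all(satisfies_basic_axioms(t, grid) for t in (lukasiewicz, godel, product))
```

For any pair of values the three are ordered as Łukasiewicz ≤ product ≤ minimum, so the minimum is the largest t-norm.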
Mean Operator
• A mean operator is a function m : [0,1]² → [0,1] such that:
– min(x, y) ≤ m(x, y) ≤ max(x, y), with m ≠ min and m ≠ max
– m(x, y) = m(y, x)
– m is increasing w.r.t. both arguments
• Examples:
– Median
– Bisymmetrical
Reinforcement
• One characteristic of many types of human information processing is what Yager and Rybalov call full reinforcement.
• A collection of high scores reinforce each other to give a resulting score more affirmative than any of the individual scores alone; conversely, a collection of low scores reinforce each other to give a resulting score more "disfirmative" than any of the individual scores.
• Good modeling of the human behavior
• Refine the information related to the real world
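Completely Reinforced Operators: 3Π
• (Silvert 1979, Yager & Rybalov 1998)
Completely reinforced and symmetrical sum:
$$3\Pi(y_1, y_2, \dots, y_n) = \frac{\prod_{i=1:n} y_i}{\prod_{i=1:n} y_i + \prod_{i=1:n} (1 - y_i)}$$
If ∀i, y_i > 0.5 then 3Π(y_1, …, y_n) ≥ max_i y_i
If ∀i, y_i < 0.5 then 3Π(y_1, …, y_n) ≤ min_i y_i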
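A small numerical check of full reinforcement, using the usual form of the triple Pi operator: the product of the scores divided by the sum of that product and the product of their complements. The function name and the scores are illustrative.

```python
import math


def triple_pi(values):
    """Triple Pi aggregation: prod(y) / (prod(y) + prod(1 - y))."""
    product_of_scores = math.prod(values)
    product_of_complements = math.prod(1.0 - v for v in values)
    # Undefined when both products vanish (a 0 and a 1 among the inputs).
    return product_of_scores / (product_of_scores + product_of_complements)


high = triple_pi([0.8, 0.7])   # all scores above 0.5
low = triple_pi([0.3, 0.2])    # all scores below 0.5
```

Here the high scores aggregate to about 0.90, above their maximum of 0.8, and the low scores to about 0.10, below their minimum of 0.2.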
Remark
• T-norms are negatively reinforced, but they are not positively reinforced:
$$T(x_1, x_2, \dots, x_n) \le \min_i x_i$$
• T-conorms are positively reinforced, but they are not negatively reinforced
• A combination of t-norms and t-conorms is not completely reinforced
• Mean operators are neither positively nor negatively reinforced, by definition
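Mean 3Π
• Approach: mean operator
$$M3\Pi(y_1, y_2, \dots, y_n) = \frac{1}{1 + \prod_{i=1:n} \left( \frac{1 - y_i}{y_i} \right)^{1/n}}$$
Generatrix function, positive and increasing: G(x) = x^{1/n}
$$M3\Pi(y_1, y_2, \dots, y_n) = \frac{\prod_{i=1:n} G(y_i)}{\prod_{i=1:n} G(y_i) + \prod_{i=1:n} G(1 - y_i)}$$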
A new mean: Mean 3Π
• Commutativity: M3Π(x, y) = M3Π(y, x)
• Monotonicity: M3Π(x, y) ≥ M3Π(z, t) if x ≥ z and y ≥ t
• Idempotence: M3Π(x, …, x) = x
• Self-identity: M3Π[B, ⟨M3Π(B)⟩] = M3Π(B)
The first three conditions can be deduced easily from the properties of the product of n-th root functions.
« Mean Reinforced »
If ∀i, y_i > 0.5, then:
$$M3\Pi(y_1, \dots, y_i, \dots, y_n) \ge \frac{1}{n} \sum_{i=1}^{n} y_i$$
If ∀i, y_i < 0.5, then:
$$M3\Pi(y_1, \dots, y_i, \dots, y_n) \le \frac{1}{n} \sum_{i=1}^{n} y_i$$
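The "mean reinforced" behavior can be checked numerically. The sketch below assumes the mean triple Pi is obtained by applying the generatrix G(x) = x^(1/n) inside the triple Pi ratio, which gives 1 / (1 + prod(((1 − y)/y)^(1/n))); the function name and the scores are illustrative, and all scores must be strictly positive.

```python
def mean_triple_pi(values):
    """Mean triple Pi: 1 / (1 + prod(((1 - y) / y) ** (1 / n))), for y in (0, 1]."""
    n = len(values)
    ratio = 1.0
    for v in values:
        ratio *= ((1.0 - v) / v) ** (1.0 / n)
    return 1.0 / (1.0 + ratio)


idempotent_case = mean_triple_pi([0.7, 0.7, 0.7])   # should give back 0.7
high_scores = mean_triple_pi([0.9, 0.6])            # arithmetic mean is 0.75
low_scores = mean_triple_pi([0.1, 0.4])             # arithmetic mean is 0.25
```

The high scores aggregate to about 0.79, above their arithmetic mean, and the low scores to about 0.21, below theirs, while identical scores are returned unchanged.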
Elimination of the uncertainties between physiological states
• Introduction of the singularities (detected by the wavelets) in the form of step functions
• Use of the DAM and of the total reinforcement property of the triple Pi to reinforce the classification with the help of step functions
I.L.P.
Overlap between classes
[Figure: State 1 and State 2 as delimited by Expert 1 and by Expert 2]
Uncertainties: noise, acid pulse, oscillations.
Global Adequacy Degree (DAG)
• Completely reinforced aggregation operator: triple Π (Silvert 1979, Yager & Rybalov 1998)
$$DAG(X_1, \dots, X_n) = \Pi(X_1, \dots, X_n) = \frac{1}{1 + \prod_{i=1}^{n} \frac{1 - X_i}{X_i}}$$
Total reinforcement:
If ∀i, y_i > 0.5 then Π(y_1, …, y_i, …, y_n) ≥ max_i y_i
If ∀i, y_i < 0.5 then Π(y_1, …, y_i, …, y_n) ≤ min_i y_i
Triple Pi and Mean Triple Pi
% classification, by signal-to-noise ratio (SNR):
• SNR = 46.02 dB: Mean triple Pi 13.47%; triple Pi: classification not relevant
• SNR = 40 dB: Mean triple Pi 11.56%; triple Pi: classification not relevant
• SNR = 30.40 dB: Mean triple Pi: classification not relevant; triple Pi: classification not relevant
Mean Triple Pi is more robust in the case of noisy signals.
Tests carried out on batch-type data.
Issues in Kinetic Modelling [SCHI02]
• Enzyme = complex protein produced by living cells that promotes a specific biochemical reaction by acting as a catalyst
• Kinetic properties (rate constants, etc.) are not completely known
• Discrepancies exist between in vitro and in vivo behavior
• Enzyme activities in vivo are subject to frequent changes due to inhibition or activation. In contrast, the structure, in terms of how substances are "connected", can be considered constant unless evolutionary time scales are studied
Metabolism
• Union of two processes : anabolism and catabolism
• Metabolism consists of:
– Biochemical reactions
– Metabolites
– Enzymes
• Modeler's perspective
– A network of interconnected reactions, each reaction corresponding to the transformation of metabolites into other metabolites
What pathways make up metabolism?
• Glycolysis
• Pentose phosphate pathway
• Electron transport & oxidative phosphorylation
• Fatty acid oxidation
• Photosynthesis
Metabolic networks
• If we can prepare a list of the reactions occurring in the metabolism of an organism, can we decide:
– what nutrients it can utilize and what products it can produce?
– if there is a route from a particular nutrient to a product?
– which route to a product has the highest yield?
– what are the consequences of deleting an enzyme?
– whether genome annotations for an organism generate a connected and self-consistent metabolism?
• Finding promising targets for the development of new drugs
Metabolic Flux Analysis
• Metabolic pathways are sequences of enzyme-catalyzed reaction steps, converting substrates to a variety of products to meet the needs of the cell.
• A huge set of biochemical reactions ensures the reproduction and the survival of the cell.
• Flux is defined as the rate at which material is processed through a metabolic pathway. Fluxes are useful for determining maximum theoretical yields.
$$\frac{dS}{dt} = D\,(S_{in} - S) - \frac{1}{Y_{X/S}}\, \mu X$$
Model
• In the 1730s, Euler founded network (graph) theory with the Königsberg bridges problem, and for most of the following two hundred years network theory remained a form of abstract mathematics.
• A network is made up of nodes and links, and mathematicians assumed the links between the nodes were randomly distributed.
• If there are, say, 10 nodes and 50 links, they assumed the links would be spread at random, so that every node would take part in about the same number of links: on average 2 × 50 / 10 = 10, since each link joins two nodes.
Representation
• Consider a metabolic network including r reactions, labeled by their enzymes e1, e2, …, er or by their rates v1, v2, …, vr, and m metabolites x1, x2, …, xm, with stoichiometric coefficients n_ij
Representation
• Consider a simple pathway, e.g.:
Separating structure and kinetics
• The rate at which the substrate concentrations change is given by N·v, where N is the stoichiometry matrix and v is the vector of enzyme kinetic rate functions. So for our substrate cycle pathway:
• where each v_i is the rate function for enzyme i, depending on the metabolites, Vm, Km, etc.
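The separation of structure (N) from kinetics (v) can be illustrated with a matrix-vector product. The network below is a hypothetical linear chain A → B → C, not the substrate cycle of the slide, and the rate values are made up; in a full model each entry of v would be a kinetic function of the concentrations.

```python
def concentration_rates(stoichiometry, rates):
    """Return dx/dt = N . v for a stoichiometry matrix given as a list of rows."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, rates)) for row in stoichiometry]


# Rows: metabolites A, B, C. Columns: reactions v1 (A -> B) and v2 (B -> C).
N = [
    [-1, 0],   # A is consumed by v1
    [1, -1],   # B is produced by v1 and consumed by v2
    [0, 1],    # C is produced by v2
]
v = [2.0, 0.5]  # made-up reaction rates

dxdt = concentration_rates(N, v)
```

Changing the kinetics only changes v, while N stays fixed, which is why the structural analysis of a network can be carried out before any rate law is known.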
What kinds of constraints do cells have to abide by?
• Environmental constraints
– Condition-dependent -> variable constraints
– pH, temperature, osmolarity, availability of electron acceptors, etc.
– Availability of carbon, oxygen, sulfur, nitrogen, and phosphate sources in the surrounding media
• Regulatory constraints
– Self-imposed "restraints"
– Subject to evolutionary change
– Allow cells to eliminate suboptimal phenotypes and confine themselves to behaviors of increased fitness
Constraints
Goal
• Modeling of high-dimensional non-linear systems
• Application to the intracellular metabolite network of E. coli:
– Metabolite flux balancing
– Estimation of maximum reaction rates
– Estimation of non-measured steady-state concentrations
– Simulation and results
– Conclusions
Mathematical Model
• Mass balance:
$$\frac{dC_i}{dt} = \sum_j \nu_{ij}\, r_j - \mu\, C_i$$
• The flux is:
$$r_j = r_j^{max}\, f_j(C_j, P_j)$$
C_i = the concentration of metabolite i,
μ = the specific growth rate,
ν_ij = the stoichiometric coefficient for this metabolite in reaction j, which occurs at the rate r_j,
r_j^max = the maximum flux.
Since masses were balanced, the equation for extracellular glucose needs to include a conversion factor for the difference between the intracellular volume and the culture volume.
$$\frac{dC_{glc}^{extracellular}}{dt} = D\,\bigl(C_{glc}^{feed} - C_{glc}^{extracellular}\bigr) + f_{pulse} - \frac{X\, r_{PTS}}{\rho_x}$$
Here, C_glc^feed is the glucose concentration in the feed, X is the biomass concentration, and ρ_x denotes the specific weight of biomass in the culture volume. The term f_pulse takes into account the sudden change of the glucose concentration due to a glucose pulse.
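Extracellular glucose:
$$\frac{dC_{glc}^{extracellular}}{dt} = D\,\bigl(C_{glc}^{feed} - C_{glc}^{extracellular}\bigr) + f_{pulse} - \frac{X\, r_{PTS}}{\rho_x}$$
6-phosphogluconate (6pg):
$$\frac{dC_{6pg}}{dt} = r_{G6PDH} - r_{PGDH} - \mu\, C_{6pg}$$
Glucose-1-phosphate (g1p):
$$\frac{dC_{g1p}}{dt} = r_{PGM} - r_{G1PAT} - \mu\, C_{g1p}$$
Fructose-6-phosphate (f6p):
$$\frac{dC_{f6p}}{dt} = r_{PGI} - r_{PFK} + r_{TKb} + r_{TA} - 2\, r_{MurSynth} - \mu\, C_{f6p}$$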
Dynamic response of the co-substrates
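$$\frac{dC_{6PG}(t)}{dt} = r^{max}_{G6PDH}\, \frac{C_{NADP}(t)}{C_{NADP}(t) + K_{NADP,1} \left(1 + \frac{C_{NADPH}(t)}{K_{i,NADPH,1}}\right) \left(1 + \frac{C_{MgATP}(t)}{K_{i,MgATP,1}}\right)} - r^{max}_{6PGDH}\, C_{6PG}(t)\, \frac{C_{NADP}(t)}{C_{NADP}(t) + K_{NADP,2} \left(1 + \frac{C_{NADPH}(t)}{K_{i,NADPH,2}}\right) \left(1 + \frac{C_{MgATP}(t)}{K_{i,MgATP,2}}\right)}$$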
Estimation of maximum reaction rates
• In vitro determinations of enzymatic activities are not really representative of the in vivo maximal rates. To avoid this problem, these values were estimated with the approach suggested by Rizzi et al. (1997), which is based on a calculation of the flux distribution under steady-state conditions. The rate of the enzyme i at steady state is:
$$\tilde{r}_i = \tilde{r}_i^{max}\, f_i(\tilde{C}_i, P_i)$$
where P_i is the parameter vector and C̃_i is the steady-state concentration vector of the metabolites involved in the reaction. From the above equation, the maximal rate can be calculated:
$$\tilde{r}_i^{max} = \frac{\tilde{r}_i}{f_i(\tilde{C}_i, P_i)}$$
Assuming:
$$r_{G6PDH} = r^{max}_{G6PDH}\, f_{G6PDH}\bigl(\hat{C}_{MgATP}(t), \hat{C}_{NADP}(t), \hat{C}_{NADPH}(t), P_{G6PDH}\bigr)$$
This approach assumes that, during the transient, enzyme concentrations remain at their steady-state value.
The balance equation for 6PG takes the form:
$$\frac{d\hat{C}_{6PG}}{dt} = r^{max}_{G6PDH}\, f_{G6PDH}(t, P_{G6PDH}) - r^{max}_{6PGDH}\, f_{6PGDH}(t, P_{6PGDH})\, \hat{C}_{6PG}(t)$$
Parameters Identification
• The kinetic parameters of these equations (P_G6PDH and P_6PGDH) were fitted to the measurements of intracellular metabolites by minimizing the sum of relative squared errors using the Differential Evolution algorithm.
Differential Evolution
A stochastic nonlinear optimization algorithm by Storn and Price (1996).
A population of solution vectors is successively updated by addition, subtraction and component swapping until the population converges, hopefully to the optimum:
• no derivatives are used
• very few parameters to set
• very reliable method
Evolutionary operators: forming the mutant vector
• DE/RAND/1: V_{i,G+1} = X_{1,G} + F · (X_{2,G} − X_{3,G})
• DE/RAND/2: V_{i,G+1} = X_{1,G} + F · (X_{2,G} + X_{3,G} − X_{4,G} − X_{5,G})
• DE/BEST/1: V_{i,G+1} = X_{best,G} + F · (X_{2,G} − X_{3,G})
• DE/BEST/2: V_{i,G+1} = X_{best,G} + F · (X_{2,G} + X_{3,G} − X_{4,G} − X_{5,G})
F = scaling factor applied to the difference vectors (the crossover constant, usually written CR, is a separate parameter)
The new child replaces a randomly selected vector from the population only if it is better than that vector.
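For orientation, here is a minimal sketch of the common DE/rand/1 scheme with binomial crossover. It differs from the slide in one respect, stated plainly: the trial vector competes with its own target vector rather than with a randomly selected one. The function names, default settings and the sphere test function are illustrative assumptions, not the configuration used for the parameter identification.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` with DE/rand/1/bin; returns (best_vector, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct vectors, all different from the target i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            # Mutant vector: X_r1 + F * (X_r2 - X_r3).
            mutant = [population[r1][d] + f * (population[r2][d] - population[r3][d])
                      for d in range(dim)]
            # Binomial crossover: at least one component comes from the mutant.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else population[i][d]
                     for d in range(dim)]
            trial_score = objective(trial)
            # Greedy selection against the target vector.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=lambda k: scores[k])
    return population[best], scores[best]


def sphere(x):
    """Simple convex test function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
```

For a kinetic model, `objective` would be the sum of relative squared errors between measured and simulated concentrations, and each candidate vector a set of kinetic parameters.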
Objective Function
$$\varepsilon_{rel,6PG} = \sum_k \left( \frac{C_i(t_k) - \hat{C}_i(t_k)}{C_i(t_k)} \right)^2$$
The sum of squared relative errors over the time course of the different metabolite and co-metabolite concentrations was calculated in order to assess the quality of the identification process.
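The objective can be sketched as a sum over sampling times of squared relative deviations between measured and simulated concentrations. The function name and the numbers are made up for illustration, and measured values must be non-zero.

```python
def relative_squared_error_sum(measured, simulated):
    """Sum over time points of ((measured - simulated) / measured) ** 2."""
    if len(measured) != len(simulated):
        raise ValueError("both time courses must have the same length")
    return sum(((m - s) / m) ** 2 for m, s in zip(measured, simulated))


# Made-up time course of one metabolite (mM) and a simulated counterpart.
measured_course = [1.0, 2.0]
simulated_course = [1.1, 1.8]
error = relative_squared_error_sum(measured_course, simulated_course)
```

Dividing by the measured value puts metabolites with very different concentration levels on a comparable footing, at the cost of amplifying errors on points measured close to zero.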
Metabolite: concentration (mM)
Extracellular glucose 0.0556
g6p 3.480
f6p 0.600
fdp 0.272
gap 0.218
dhap 0.167 (estimation)
pgp 0.008 (estimation)
3pg 2.130 (estimation)
2pg 0.399 (estimation)
pep 2.670
pyr 2.670
6pg 0.808
ribu5p 0.111 (estimation)
xyl5p 0.138 (estimation)
sed7p 0.276 (estimation)
rib5p 0.398 (estimation)
e4p 0.098 (estimation)
g1p 0.653
amp 0.955
adp 0.595
atp 4.270
nadp 0.195
nadph 0.062
nad 1.470
nadh 0.100
Benchmark
[Figure: block diagram of the benchmark model. A flux computation block ("Calcul des flux") takes the concentrations Cext,glc, Cpep, Cpyr, Cg6p, Cf6p, C6pg, Catp, Cadp, Camp, Cfdp and returns the rates rPTS, rPGI, rPFK, rALDO, rTIS, rGAPDH, rPGK, rPGluMu, rENO, rPK, rPDH, rPEPCxylase, rPGM, rG1PAT. A concentration computation block ("Calcul des concentrations") turns these rates into the derivatives dCext,glc, dCpep, dCpyr, dCg6p, dCf6p, dC6pg, dCatp, dCadp, dCamp, dCfdp, each of which feeds one of ten integrators (1/s).]
Model’s Parameters
Results
[Figure: time courses (concentration in mg/l against time in s) of extracellular glucose, fructose-6-phosphate (f6p), glucose-1-phosphate (g1p) and 6-phosphogluconate (6pg)]
Goal
– Explanation and prediction of metabolic pathways
– Models:
• Saccharomyces cerevisiae
• E. coli
– ILP can explain the qualitative evolution of a dynamic system
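[Figure: example metabolic network with metabolites A, B, C, D, E, internal fluxes v1 to v5 and exchange rates rA, rC, rD, rE, together with the mass balance dC_B/dt of metabolite B written in terms of v1 and v2]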
Approaches
• Steady state, so the variation of metabolite concentrations is ignored: Tamaddoni-Nezhad 2004
• Integration of abduction and induction to analyse inhibition in metabolic pathways: Muggleton et al.
CF-Induction: principle
B ∧ H ⊨ E
B ∧ ¬E ⊨ ¬H (inverse entailment)
B ∧ ¬E ⊨ Carc(B ∧ ¬E, P) ⊨ CC(B,E) ⊨ ¬H
CC(B,E) ⊆ Carc(B ∧ ¬E, P)
¬CC(B,E) ≡ F, H ⊨ F (where F is in CNF)
Complete system that allows non-Horn clauses
Clausal theory
Generalizer: dropping
Input of CF-induction
• clause(e1,bg,[concentration(a,up)]).
• clause(e2,obs,[concentration(d,up)]).
• clause(e3,obs,[concentration(e,down)]).
• clause(e4,obs,[concentration(c,down)]).
Reaction
• clause(bR1,bg,[reaction(a,b)]).
• clause(bR2,bg,[reaction(b,d)]).
• clause(bR3,bg,[reaction(d,e)]).
• clause(bR4,bg,[reaction(e,c)]).
• clause(bR5,bg,[reaction(c,b)]).
• clause(bR6,bg,[reaction(b,c)]).
Explanation:
clause(be1,bg,[-reaction(Y,X),-reaction(X,Z),inhibited(Y,X),-inhibited(Y,Z),concentration(X,up)]).
clause(be2,bg,[-concentration(Y,down),-reaction(Y,X),inhibited(Y,X),concentration(X,down)]).
[concentration(b, up), inhibited(b, c), inhibited(b, d), inhibited(c,b), inhibited(a, b), -concentration(a, up)]
Conclusion
CF-induction explains the evolution of dynamic systems
Relevance of information
Which signals are useful for the classification at a given instant?
Objective method for evaluating relevance without a priori expert knowledge (Knowledge Basis Discovery), in order to:
1. Confirm and enrich the expert's knowledge
2. Reduce the number of sensors for the biological signals
There is no generic definition of relevance.
(Blum & Langley 1997, Subramanian et al. 1997, Zadeh 2004)
Definition of relevance (dictionary): the quality of what is more or less appropriate, of what is in line with the objective pursued.
A signal is relevant if:
1. it does not give aberrant results
2. it generates decisions in agreement with most of the other signals
3. it provides significant information for the classification
Use of another conflict measure, named D:
$$D_{1,2} = \frac{1}{2} \sum_i \bigl| m_1(C_i) - m_2(C_i) \bigr|$$
D_{1,2} = 0: the conflict between sources 1 and 2 is zero
This measure can only be used for sources that have the same focal elements
1/2: normalization factor
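A minimal sketch of such a conflict measure, assuming it is half the sum of absolute differences between the masses that the two sources assign to the same classes. The dictionaries, class names and function name are illustrative.

```python
def conflict(masses_1, masses_2):
    """D = 1/2 * sum over classes of |m1(C) - m2(C)|, for identical focal elements."""
    if set(masses_1) != set(masses_2):
        raise ValueError("both sources must be defined on the same focal elements")
    return 0.5 * sum(abs(masses_1[c] - masses_2[c]) for c in masses_1)


# Two hypothetical sources assigning masses to two physiological states.
same_opinion = conflict({"state1": 0.6, "state2": 0.4}, {"state1": 0.6, "state2": 0.4})
opposite_opinion = conflict({"state1": 1.0, "state2": 0.0}, {"state1": 0.0, "state2": 1.0})
```

With the factor 1/2 the measure stays between 0 (identical mass distributions) and 1 (all mass on different classes).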
Problem: the conflict is non-zero for two sources (2 biochemical parameters) that give the same distribution of evidence masses. Alternatives to the Dempster-Shafer (DS) combination (IEEE Fuzzy Systems 2004, Budapest, Hungary; RNTI 2004):
[Figure: parameters classed as relevant or not relevant; 11 parameters are in agreement with respect to the threshold of 0.3]
Batch classification (% classification):
• LAMDA classification: 88.84%
• Classification taking relevance into account: 95.06%
Conclusion
• The discrepancy between simulation and experimental points can be explained by various factors, for example experimental measurement errors, as some metabolites like pep and pyruvate are especially difficult to measure.
• Another reason could be a lack in the model structure with regard to the mass balance equations or the rate expressions.
• Most of the stoichiometry is well known, but some effectors are involved in a great number of reactions.
• The metabolic regulations and allosteric properties used in these equations have been determined by in vitro studies. Some of these effects might be different under in vivo conditions.
Glycolysis Pathway
CALCULATION OF ELEMENTARY FLUX MODE
• The elementary flux mode contains all the possible conversion pathways in a subsystem with specific inputs and outputs (exchange flux).
• Constraints on the exchange flux also influence the calculated elementary modes. So before calculation the constraints on the exchange flux should be determined.
• If a metabolite shared with other subsystems is only used there as a substrate, it is an output of the subsystem considered; if it is only used there as a product, it is an input; otherwise it is unconstrained.
• After determining the constraints on every exchange flux, the elementary modes for each subsystem were calculated from the stoichiometric matrix of the subsystem using the method presented by Schuster
Adding constraints
Constraints on internal fluxes: v_i ≥ 0, ∀i
Constraints on external fluxes:
• Source: b_j ≤ 0
• Sink: b_j ≥ 0
• Sink/source: b_j is unconstrained
In other words, flux going into the system is considered negative while flux leaving the system is considered positive.
Remark: later on we will impose further constraints on both the internal and the external fluxes…
Flux cone and metabolic capabilities
Observation: the number of reactions considerably exceeds the number of metabolites.
$$S \cdot v = 0$$
The S matrix will have more columns than rows.
The null space of viable solutions to our linear set of equations contains an infinite number of solutions.
“The solution space for any system of linear homogeneous equations and inequalities is a convex polyhedral cone.” - Schilling 2000
Our flux cone C contains all the points of the null space with non-negative coordinates (apart from exchange fluxes, which are constrained to be negative or are unconstrained).
What about the constraints?
(V) Flux cone and metabolic capabilities
What is the significance of the flux cone?
• It defines what the network can do and cannot do!
• Each point in this cone represents a flux distribution in which the system can operate at steady state.
• The answers to the following questions (and many more) are found within this cone:
– What are the building blocks that the network can manufacture?
– How efficient is energy conversion?
– Where are the critical links in the system?
(VI) Navigating through the flux cone –using “Extreme pathways”
Next thing to do is develop a way to describe and interpret any location within this space.
• We will not use the traditional reaction/enzyme-based perspective
• Instead we use a pathway perspective:
Extreme rays: "extreme rays correspond to edges of the cone. They are said to generate the cone and cannot be decomposed into non-trivial combinations of any other vector in the cone." (Schilling 2000)
Differences
• Unlike a basis, the set of extreme pathways is typically unique
• Any flux in the cone can be described using a non-negative combination of extreme rays.
We use the term extreme pathways when referring to the extreme rays of a convex polyhedral cone that represents metabolic fluxes.
What is the analogy in linear algebra?
(VI) Navigating through the flux cone –using “Extreme pathways”
$$C = \Bigl\{ v : v = \sum_{i=1}^{k} w_i\, EP_i,\ w_i \ge 0\ \forall i \Bigr\}$$
In matrix form: P w = v, where the columns of P are the extreme pathways
• Extreme pathways are denoted by the vectors EP_i (1 ≤ i ≤ k)
• Every point v within the cone C can be written as a non-negative linear combination of the extreme pathways.
In a biological context this means that any steady-state flux distribution can be represented by a non-negative linear combination of extreme pathways.
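The statement can be checked on a toy case. The network below is hypothetical: one internal metabolite B, produced by reaction v1 and consumed by reactions v2 and v3, with two extreme pathways written down by hand. Any non-negative combination of them should again satisfy the steady-state condition S·v = 0. All names are illustrative.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def combine(pathways, weights):
    """Non-negative linear combination of pathway vectors."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to stay inside the cone")
    size = len(pathways[0])
    return [sum(w * p[k] for w, p in zip(weights, pathways)) for k in range(size)]


# Stoichiometry of the single internal metabolite B: +v1 - v2 - v3.
S = [[1, -1, -1]]

# Hand-written extreme pathways: all flux through v2, or all flux through v3.
ep1 = [1, 1, 0]
ep2 = [1, 0, 1]

flux = combine([ep1, ep2], [2.0, 3.0])
residual = mat_vec(S, flux)
```

The combined flux [5, 2, 3] balances B exactly, whereas a negative weight would correspond to running an irreversible reaction backwards and is rejected.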
The entire process from a bird's-eye view
1. Compute the steady-state flux cone (convex), with its constraints
2. Identify the extreme pathways (using the algorithm presented in Schilling 2000)
2. Elementary (flux) modes
• Elementary (flux) mode analysis was introduced by Schuster & Hilgetag (1994)
• Example 1
Clarify the mechanisms of metabolic pathway regulation by KBS
• An elementary flux mode is a minimal set of enzymes that could operate at steady state, with all the irreversible reactions used in the appropriate direction (cf. Schuster et al., 2000).
• Reversible reactions need not be split into their forward and reverse steps.
• As in several other metabolic simulators, this list is automatically translated into a stoichiometry matrix.
Extraction of System Constraints
• Hypothesis: if all the kinetic properties of the enzymes involved in a metabolic pathway are known, then an ordinary differential equation (ODE) model is appropriate.
• If these kinetic properties are not known, then an ODE model could be inappropriate: the use of arbitrary kinetics would produce a misleading impression of precision, and a simpler qualitative model would be preferred.
• Similarly, for system identification, it is often appropriate to first learn a qualitative structural model, and then parameterise it to form an ODE.
Dynamic Equations
Let {X1, …, Xn} be a set of genes and/or chemical substances in the underlying biological network. Let Xi(t) be the value (expression level or concentration) of a gene or a chemical substance Xi at time t. In S-system form, the dynamics are:
$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{i,j}} - \beta_i \prod_{j=1}^{n} X_j^{h_{i,j}}$$
where αi and βi are multiplicative parameters called rate constants and gi,j and hi,j are exponential parameters called kinetic orders.
The inference problem is, given time series data Xi(t) that are assumed to be generated from an S-system S, to estimate the parameters αi, βi, gi,j and hi,j of S.
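To make the role of the rate constants and kinetic orders concrete, the sketch below evaluates the right-hand side of a power-law model, assuming the standard S-system form dXi/dt = αi·∏ Xj^gij − βi·∏ Xj^hij. The two-variable parameter set is made up and the function name is illustrative.

```python
def s_system_rhs(x, alpha, beta, g, h):
    """Evaluate dX_i/dt = alpha_i * prod(X_j ** g_ij) - beta_i * prod(X_j ** h_ij)."""
    derivatives = []
    for i in range(len(x)):
        production = alpha[i]
        degradation = beta[i]
        for j, x_j in enumerate(x):
            production *= x_j ** g[i][j]
            degradation *= x_j ** h[i][j]
        derivatives.append(production - degradation)
    return derivatives


# Hypothetical 2-variable system: dX1/dt = 2 - X1 and dX2/dt = X1 - X2.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, 0.0], [1.0, 0.0]]   # kinetic orders of the production terms
h = [[1.0, 0.0], [0.0, 1.0]]   # kinetic orders of the degradation terms

at_steady_state = s_system_rhs([2.0, 2.0], alpha, beta, g, h)
away_from_it = s_system_rhs([1.0, 3.0], alpha, beta, g, h)
```

The inference task runs in the opposite direction: given sampled trajectories, find the α, β, g and h whose right-hand side reproduces the observed derivatives.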
Emergent Information in Biological Systems
• The behavior of systems with near-critical connectivity is essentially chaotic and may be an important source of variety in biology.
• In most models of biological systems (for example, dynamical systems, cellular automata, state spaces) the patterns of connectivity between elements are isomorphic to directed graphs.
• These isomorphisms suggest that the above sources of emergent behaviour may be universal
Reasoning
• Mathematical reasoning, i.e., the process by which mathematicians arrive at conclusions on the basis of a small number of clear and distinct basic principles.
Goal of our Research
• Analyse and understand patterns in temporal data.
Modeling
• The task of forming a model to explain a given set of experimental results is called model identification.
• This is a form of inductive inference.
– For example, if the levels of the metabolites in glycolysis are observed over a series of time steps, and the reactions of glycolysis are inferred from these data, this would be model identification.
Qualitative Equation
One reason to consider qualitative states is that it is experimentally much easier to measure qualitative metabolic states than quantitative ones.
In the case of qualitative reasoning, we do not need numerical parameters.
Displacement of the steady state
Fuzzy aggregation operators
• Once the so-called marginal information has been computed by the membership function for each information source, all this information must be combined in order to decide which class the element will be assigned to.
What is the link between these different pieces of information?
Cognitive independence
Regularity Analysis
• Fourier analysis characterizes the global regularity of a function.
• The wavelet transform characterizes the local regularity of a function.