What is Parameter Identification?

Susanna Röblitz
Zuse Institute Berlin (ZIB)
December 10, 2014
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models

Parameter Identification, Susanna Röblitz
Introductory Example

Reversible reaction A + B <--> C + D with forward rate constant k_1 and backward rate constant k_2:

    A' = B' = -k_1 A B + k_2 C D,    C' = D' = +k_1 A B - k_2 C D,

    A(0) = 2, B(0) = 1, C(0) = 0.5, D(0) = 0,    k_1^0 = 0.2, k_2^0 = 0.5.

[Figures: concentrations of A, B, C, D over time.]
Case (T+E): measurement data cover both transient and equilibrium phase.
Case (E): measurement data cover the equilibrium phase only.
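The example can be reproduced numerically; a minimal sketch with SciPy (rate constants and initial values taken from the slide, the solver choice is an assumption):

```python
# Integrate the reversible reaction A + B <-> C + D from the slide.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k1, k2):
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D   # A' = B' = r, C' = D' = -r
    return [r, r, -r, -r]

y0 = [2.0, 1.0, 0.5, 0.0]
sol = solve_ivp(rhs, (0.0, 30.0), y0, args=(0.2, 0.5), dense_output=True)
print(sol.y[:, -1])   # concentrations near equilibrium at t = 30
```

Note that A - B and A + C are conserved by the reaction, which is a handy sanity check on the integration.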
Problem formulation

Model: y(t, p) = (y_1(t, p), ..., y_d(t, p)) in R^d, e.g. given by an ODE

    y' = f(y, p),    y(t_0) = y_0,

usually nonlinear in p.

Parameters: p = (p_1, ..., p_q) in R^q.

Data: z_kl ≈ y_k(t_l, p), k = 1, ..., d, l = 1, ..., M_k;
mostly m = sum_k M_k >> q  ("data compression").

Task: fit the model (ODE) to the data, i.e. estimate the parameters.
Problem formulation

Questions:
- What algorithm to use?
- How good is the fit?
- Is there a better model?
- Sensitivity?
Problem formulation

Minimize the least-squares error:

    ||F(p)||_2^2 = sum_{k=1}^{d} sum_{l=1}^{M_k} (z_kl - y_k(t_l, p))^2 / (2 σ_kl^2)  ->  min,

σ_kl: measurement error (standard deviation).

[Figure: measurement values, model function, and residues over t.]
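In code, the objective is just the squared 2-norm of a weighted residual vector; a small sketch (the model, data, and weights are made up for illustration):

```python
import numpy as np

def weighted_residuals(p, model, t, z, sigma):
    """Entries (z_l - y(t_l, p)) / (sqrt(2) * sigma_l); the objective is ||F||_2^2."""
    return (z - model(t, p)) / (np.sqrt(2.0) * sigma)

# toy check: a linear model evaluated at its own data gives zero objective
t = np.linspace(0.0, 1.0, 5)
p = np.array([2.0, -1.0])
z = p[0] * t + p[1]
F = weighted_residuals(p, lambda t, p: p[0] * t + p[1], t, z, np.ones_like(t))
print(np.sum(F**2))   # 0.0
```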
Approaches

- Frequentist:
  - Assumes that there is an unknown but fixed parameter p.
  - Estimates p with some confidence.
  - Prediction by using the estimated parameter value.
- Bayesian:
  - Represents uncertainty about the unknown parameter.
  - Uses probability to quantify this uncertainty: unknown parameters as random variables.
  - Prediction follows the rules of probability.
Likelihood

Assumption: normally distributed measurement noise,

    P(z | p) ∝ exp( - sum_{k=1}^{d} sum_{l=1}^{M_k} (z_kl - y_k(t_l, p))^2 / (2 σ_kl^2) ).

Maximum likelihood estimate (MLE):

    p_MLE = argmax_p P(z | p).
Bayes' theorem

Joint probability = conditional probability x marginal probability:

    P(p, z) = P(p | z) P(z) = P(z | p) P(p)

    P(p | z) = P(z | p) P(p) / P(z)

P(p): prior
P(z | p): likelihood
P(p | z): posterior (probability of the parameter given the data)
P(z) = ∫ P(z | p) P(p) dp: evidence

Maximum a posteriori estimate (MAP):

    p_MAP = argmax_p P(p | z).
Evaluation of the posterior

    P(p | z) ∝ P(z | p) P(p)

- Evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) -> convergence?
- Consider the full high-dimensional posterior PDF for statistical inference.
- Necessity of a prior.
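A minimal Metropolis-Hastings sampler illustrating the MCMC step; the Gaussian random-walk proposal, step size, and the toy target (a standard normal) are assumptions for this sketch:

```python
import numpy as np

def metropolis_hastings(log_post, p0, n_samples, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    lp = log_post(p)
    samples = []
    for _ in range(n_samples):
        prop = p + step * rng.standard_normal(p.shape)   # random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:          # accept w.p. min(1, ratio)
            p, lp = prop, lp_prop
        samples.append(p.copy())
    return np.array(samples)

# target: unnormalized standard normal "posterior"
chain = metropolis_hastings(lambda p: -0.5 * float(p @ p), np.zeros(1), 20000)
print(chain.mean(), chain.std())   # close to 0 and 1
```

Only the ratio of posterior values enters the acceptance step, so the evidence P(z) never has to be computed.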
Linear least squares

    y(t, p) = B(t) p,    B(t) in R^{d x q}.

Measurement time points: t_1, ..., t_M in R.
Measurement values: z_1, ..., z_M in R^d.

Set A = [B(t_1), ..., B(t_M)]^T, z = [z_1, ..., z_M]^T.

Linear least-squares problem:
For given z in R^m and A in R^{m x q} with m >= q, find p in R^q such that

    ||z - Ap||_2 -> min.
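A sketch of this problem with NumPy's least-squares solver, on made-up data:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20)
A = np.column_stack([t, np.ones_like(t)])   # model y(t, p) = p_1 t + p_2
p_true = np.array([3.0, -0.5])
z = A @ p_true                               # noise-free synthetic data
p_hat, res, rank, sv = np.linalg.lstsq(A, z, rcond=None)
print(p_hat)   # recovers [3.0, -0.5]
```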
Example: Reaction kinetics

Arrhenius law: dependence of the rate constant K on the temperature T,

    K(T) = C exp(-E / (R T)),

with gas constant R = 8.3145 J/(mol K).

Unknown parameters: pre-exponential factor C, activation energy E.
Measurements: (T_i, K_i), i = 1, ..., m = 21.

[Figure: measured K over T for T in [720, 820] K, K of order 10^-3.]
Example: Reaction kinetics

Taking logarithms yields a linear minimization problem:

    log C - (1 / (R T_i)) E = log K_i    =>    ||log K - A (log C, E)^T||_2 -> min,

    A_i1 = 1,    A_i2 = -1 / (R T_i),    i = 1, ..., 21.

[Figures: fitted K over T, and log K over T (a straight line).]
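The linearized Arrhenius fit as code; the 21 temperatures match the slide's range, while C and E are made-up values, since the slide's actual data are not given:

```python
import numpy as np

R = 8.3145                                   # J/(mol K)
C_true, E_true = 2.0e8, 1.7e5                # assumed parameter values
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))       # synthetic measurements

A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
x, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)    # x = (log C, E)
C_fit, E_fit = np.exp(x[0]), x[1]
print(C_fit, E_fit)
```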
The method of normal equations

[Figure: z, Ap, and the residual z - Ap, orthogonal to the range space R(A).]

    ||z - Ap||_2 = min  <=>  z - Ap in R(A)^⊥,
    R(A) = {Ap : p in R^q}  (range space of A),

    ||z - Ap||_2 = min_{p' in R^q} ||z - Ap'||_2  <=>  A^T A p = A^T z.

Uniquely solvable <=> A^T A invertible <=> rank(A) = q
=> solution via Cholesky decomposition (A^T A is spd).

But if rank(A) = q, then (without proof)

    κ_2(A)^2 = λ_max(A^T A) / λ_min(A^T A) = κ_2(A^T A),

i.e. the normal equations square the condition number. We need a more stable method than solving the normal equations.
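A sketch of the normal-equations route with a Cholesky factorization (random, well-conditioned test data assumed; for ill-conditioned A the QR route below is preferable):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
p_true = np.array([1.0, -2.0, 0.5])
z = A @ p_true

c_and_low = cho_factor(A.T @ A)      # A^T A is symmetric positive definite
p = cho_solve(c_and_low, A.T @ z)
print(p)   # close to [1.0, -2.0, 0.5]
```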
Orthogonalization methods

Let Q in R^{m x m} be an orthogonal matrix, i.e. Q^T Q = I. Then

    ||z - Ap||_2^2 = (z - Ap)^T Q^T Q (z - Ap) = ||Q(z - Ap)||_2^2.

The Euclidean norm ||·||_2 is invariant under orthogonal transformations:

    ||z - Ap||_2 = min  <=>  ||Q(z - Ap)||_2 = min.

Choice of the orthogonal transformation Q in R^{m x m}?
QR factorization

    A = Q (R_1; 0),    Q in R^{m x m} orthogonal,    R_1 in R^{q x q} upper triangular.

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

    ||z - Ap||_2^2 = ||Q^T (z - Ap)||_2^2 = ||(z_1 - R_1 p; z_2)||_2^2
                   = ||z_1 - R_1 p||_2^2 + ||z_2||_2^2  >=  ||z_2||_2^2.

rank(A) = q  =>  rank(R_1) = rank(A) = q  =>  R_1 is invertible:

    ||z - Ap||_2 = min  <=>  p = R_1^{-1} z_1,    with residual ||r||_2^2 = ||z_2||_2^2.

The QR factorization is stable since κ_2(A) = κ_2(R_1).
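The same least-squares problem solved via the (economy-size) QR factorization, avoiding the squared condition number of the normal equations; random test data assumed:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 4))
p_true = np.array([0.5, 1.5, -1.0, 2.0])
z = A @ p_true                              # compatible data, zero residual

Q1, R1 = qr(A, mode='economic')             # A = Q1 R1 with R1 in R^{q x q}
p = solve_triangular(R1, Q1.T @ z)
print(p)
```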
Householder QR

Columnwise application of Householder matrices to A in R^{m x q}:

    Q^T A = (H_q H_{q-1} ... H_2 H_1) A = R.

For k = 1, ..., q we obtain H_k = diag(I_{k-1}, H'_k) in R^{m x m}; the step A^(k) = H_k A^(k-1) annihilates the subdiagonal entries of column k.

[Matrix schematic: only the entries in rows k, ..., m and columns k, ..., q are modified.]
QR with column pivoting

Businger/Golub, Numer. Math. 7 (1965)

Column permutation with the aim

    |r_11| >= |r_22| >= ... >= |r_qq|.

Step k:

    A^(k-1) = (R  S; 0  T^(k)).

Denote column j of T^(k) by T_j^(k).

Pivot strategy: push the column of T^(k) with maximal 2-norm to the front, so that after the exchange

    |α_k| = |r_kk| = ||T_1^(k)||_2 = max_j ||T_j^(k)||_2 = ||T^(k)||.

Finally, A Π = Q R with permutation matrix Π.
Rank decision

rank(A) = n < q:

    Q^T A Π = (R  S; 0  0)          (exact)
            = (R  S; 0  T^(n+1))    (in finite precision),    T^(n+1) "small".

Note: the exact rank n cannot be computed, since by rounding errors ||T^(n+1)|| = |r_(n+1,n+1)| != 0.

Idea: stop the iteration as soon as

    |r_(k+1,k+1)| < δ |r_11|,    δ: relative precision of A.

Definition (numerical rank n):

    |r_(n+1,n+1)| < δ |r_11| <= |r_nn|.

The choice of δ determines the decision for n.
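The rank decision can be sketched directly on top of a pivoted QR; the threshold δ below is an assumed default:

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta=1e-10):
    """Count diagonal entries of the pivoted R with |r_kk| >= delta * |r_11|."""
    _, R, _ = qr(A, mode='economic', pivoting=True)
    d = np.abs(np.diag(R))
    return int(np.sum(d >= delta * d[0]))

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0]])    # third column = sum of the first two
print(numerical_rank(A))            # 2
```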
Subcondition number

Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition (subcondition): sc(A) = |r_11| / |r_qq|.

Properties:
- sc(A) >= 1
- sc(αA) = sc(A)
- A != 0 singular  <=>  sc(A) = ∞
- sc(A) <= κ_2(A)  (hence the name)

A is called almost singular if

    δ sc(A) >= 1,    or equivalently,    δ |r_11| >= |r_qq|.
Scaling

D_row, D_col: diagonal matrices. We consider the scaled least-squares problem

    ||D_row^{-1}(A D_col D_col^{-1} p - z)||_2 = ||Ā p̄ - z̄||_2 -> min,

where Ā = D_row^{-1} A D_col, p̄ = D_col^{-1} p, z̄ = D_row^{-1} z.

- Row (measurement) scaling: D_row = diag(δz_1, ..., δz_m), e.g. relative error concept δz_j = ε|z_j|.
- Column (parameter) scaling: D_col = diag(p_1^scal, ..., p_q^scal) (to account for different physical units).

In general, sc(Ā) != sc(A) and rank(Ā) != rank(A)!
Generalized Inverse

(1) A in R^{q x q}, rank(A) = q (A is regular):
    Ap = z has the unique solution p = A^{-1} z, where A^{-1} A = A A^{-1} = I_q.

(2) A in R^{m x q}, rank(A) = n <= q < m:
    usually Ap = z has no solution, but there is a minimizer of ||Ap - z||_2.

(2a) n = q: ||Ap - z||_2 = min  <=>  A^T A p = A^T z  (normal equations),
     uniquely solvable since A^T A is nonsingular:

         p = (A^T A)^{-1} A^T z =: A^+ z.

(2b) n < q: write p = A^+ z, with A^+ the generalized (pseudo-) inverse.
Penrose axioms

Let A in R^{m x q}, rank(A) = n <= q <= m. Then A^+ is uniquely defined by

(i)   (A^+ A)^T = A^+ A
(ii)  (A A^+)^T = A A^+
(iii) A^+ A A^+ = A^+
(iv)  A A^+ A = A

One can show: p* = A^+ z is the shortest solution, i.e.

    ||p*||_2 <= ||p||_2    for all p in L(z),

with

    L(z) = { p in R^q : ||z - Ap||_2 = min } = p* + N(A).
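NumPy's `pinv` computes exactly this Moore-Penrose inverse; a quick check of the four axioms and the minimum-norm property on a rank-deficient example (random test matrix assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]                   # rank 3 < q = 4
Aplus = np.linalg.pinv(A)

assert np.allclose((Aplus @ A).T, Aplus @ A)  # (i)
assert np.allclose((A @ Aplus).T, A @ Aplus)  # (ii)
assert np.allclose(Aplus @ A @ Aplus, Aplus)  # (iii)
assert np.allclose(A @ Aplus @ A, A)          # (iv)

z = rng.standard_normal(6)
p_star = Aplus @ z                            # shortest least-squares solution
null_vec = np.array([1.0, 1.0, 0.0, -1.0])    # A @ null_vec = 0
print(np.linalg.norm(p_star) <= np.linalg.norm(p_star + 0.3 * null_vec))  # True
```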
Computation of the shortest solution

    Q^T A Π = (R  S; 0  0),    A in R^{m x q},

rank(A) = n, R in R^{n x n} upper triangular, S in R^{n x (q-n)}.

    Π^T p = (p_1; p_2),    p_1 in R^n,    p_2 in R^(q-n),
    Q^T z = (z_1; z_2),    z_1 in R^n,    z_2 in R^(m-n).

    =>  ||z - Ap||_2^2 = ||Q^T z - Q^T A p||_2^2 = ||z_1 - R p_1 - S p_2||_2^2 + ||z_2||_2^2,

    ||z - Ap||_2^2 = min  <=>  z_1 - R p_1 - S p_2 = 0  <=>  p_1 = R^{-1}(z_1 - S p_2).
Computation of the shortest solution

Set V = R^{-1} S and u = R^{-1} z_1. Then

    ||p||^2 = ||Π^T p||^2 = ||p_1||^2 + ||p_2||^2 = ||u - V p_2||^2 + ||p_2||^2
            = ||u||^2 - 2 <u, V p_2> + <V p_2, V p_2> + <p_2, p_2>
            = ||u||^2 + <p_2, (I + V^T V) p_2 - 2 V^T u> =: φ(p_2).

φ(p_2) = min if φ'(p_2) = 2 (I + V^T V) p_2 - 2 V^T u = 0

    <=>  (I + V^T V) p_2 = V^T u    (solution by Cholesky decomposition).

φ''(p_2) = 2 (I + V^T V) spd => p_2 is a minimum.

Note: if n < q/2, a faster algorithm exists.
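Putting the two slides together, a sketch of the whole computation (rank threshold and test data assumed; the rank-deficient case n < q):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular, cho_factor, cho_solve

def shortest_solution(A, z, delta=1e-10):
    """Minimum-norm least-squares solution via pivoted QR (assumes rank < q)."""
    m, q = A.shape
    Q, Rfull, piv = qr(A, pivoting=True)          # A[:, piv] = Q @ Rfull
    d = np.abs(np.diag(Rfull))
    n = int(np.sum(d >= delta * d[0]))            # numerical rank
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = solve_triangular(R, S)                    # V = R^{-1} S
    u = solve_triangular(R, z1)                   # u = R^{-1} z_1
    p2 = cho_solve(cho_factor(np.eye(q - n) + V.T @ V), V.T @ u)
    p1 = u - V @ p2                               # = R^{-1}(z_1 - S p_2)
    p = np.empty(q)
    p[piv] = np.concatenate([p1, p2])             # undo the permutation
    return p

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])    # third column = first + second, rank 2
z = A @ np.array([1.0, 1.0, 0.0])
p = shortest_solution(A, z)
print(p)   # matches np.linalg.pinv(A) @ z
```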
Problem formulation

Data: (t_i, z_i), t_i in R, z_i in R^d, i = 1, ..., M.

Parameters: p = (p_1, ..., p_q)^T.

Model: y(t, p): R x R^q -> R^d, nonlinear w.r.t. p.

Measurement tolerances: δz_i in R^d, D_i = diag(δz_i).

Define

    F(p) = ( D_1^{-1}(z_1 - y(t_1, p)); ... ; D_M^{-1}(z_M - y(t_M, p)) ) in R^m,    m <= M d.
Problem formulation

Nonlinear least-squares problem:

    sum_{i=1}^{M} ||D_i^{-1}(z_i - y(t_i, p))||_2^2 = F(p)^T F(p) = ||F(p)||_2^2 -> min.

Nonlinear least-squares problem:
Given a mapping F: D ⊆ R^q -> R^m with m > q, find a solution p* in D such that

    ||F(p*)||_2^2 = min_{p in D} ||F(p)||_2^2.

The nonlinear least-squares problem is called compatible if F(p*) = 0.
Supplement: Multivariable calculus

Consider p = (p_1, ..., p_q)^T in R^q and a scalar function g(p): R^q -> R.
The derivative of g is the gradient (a column vector):

    g'(p) = ∇g(p) = (∂g/∂p_1, ..., ∂g/∂p_q)^T.

Consider the vector F(p) = (F_1(p), ..., F_m(p))^T: R^q -> R^m.
Its derivative is the Jacobian matrix:

    F'(p) = ( ∂F_1/∂p_1 ... ∂F_1/∂p_q ; ... ; ∂F_m/∂p_1 ... ∂F_m/∂p_q ) in R^{m x q}.
Problem formulation

Minimization problem:

    g(p) = ||F(p)||_2^2 -> min.

Sufficient condition for p* to be a local minimum:

    g'(p*) = 2 F'(p*)^T F(p*) = 0  and  g''(p*) positive definite.

Find, by Newton iteration, a zero of the function G: R^q -> R^q,

    G(p) = F'(p)^T F(p)

(a system of q nonlinear equations).
Newton iteration

Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p^0:

    G(p) = G(p^0) + G'(p^0)(p - p^0) + o(||p - p^0||)  for p -> p^0,
    Ḡ(p) := G(p^0) + G'(p^0)(p - p^0).

Zero p^1 of the linear substitute map Ḡ:

    G'(p^0) Δp^0 = -G(p^0),    p^1 = p^0 + Δp^0.

Newton iteration:

    G'(p^k) Δp^k = -G(p^k),    p^(k+1) = p^k + Δp^k.

A system of nonlinear equations => a sequence of linear systems.
Gauss-Newton iteration

    G'(p) = F'(p)^T F'(p) + F''(p)^T F(p).

Newton iteration in terms of F(p):

    ( F'(p^k)^T F'(p^k) + F''(p^k)^T F(p^k) ) Δp^k = -F'(p^k)^T F(p^k).

- Compatible case: F(p*) = 0  =>  G'(p*) = F'(p*)^T F'(p*).
- Almost compatible case: F(p*) is "small"  =>  save the effort of evaluating F''(p).

=> Gauss-Newton iteration:

    F'(p^k)^T F'(p^k) Δp^k = -F'(p^k)^T F(p^k),    p^(k+1) = p^k + Δp^k.
Gauss-Newton iteration

The iteration

    F'(p^k)^T F'(p^k) Δp^k = -F'(p^k)^T F(p^k),    p^(k+1) = p^k + Δp^k

corresponds to the normal equations for the linear least-squares problem

    ||F'(p^k) Δp^k + F(p^k)||_2 -> min,

with formal solution

    Δp^k = -F'(p^k)^+ F(p^k);

in case of convergence, F'(p*)^+ F(p*) = 0.

If rank(F'(p)) = q, then F'(p)^+ = ( F'(p)^T F'(p) )^{-1} F'(p)^T.
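A plain (undamped) Gauss-Newton iteration as a sketch, solving the linearized least-squares problem with `lstsq` in each step; the exponential test model and starting guess are made up:

```python
import numpy as np

def gauss_newton(F, J, p0, max_iter=50, xtol=1e-12):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp, *_ = np.linalg.lstsq(J(p), -F(p), rcond=None)  # dp = -F'(p)^+ F(p)
        p = p + dp
        if np.linalg.norm(dp) < xtol:
            break
    return p

# compatible toy problem: fit y = exp(p_1 t) + p_2 to noise-free data
t = np.linspace(0.0, 1.0, 10)
z = np.exp(0.7 * t) + 0.3
F = lambda p: np.exp(p[0] * t) + p[1] - z
J = lambda p: np.column_stack([t * np.exp(p[0] * t), np.ones_like(t)])
p_hat = gauss_newton(F, J, [0.0, 0.0])
print(p_hat)   # converges to [0.7, 0.3]
```

Since the problem is compatible, convergence is quadratic near the solution, as the classification below explains.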
Classification

- Local GN methods require a sufficiently good starting guess p^0.
- Global GN methods can handle bad guesses as well.
- m = q: guaranteed local convergence.
- m > q: convergence only for a small subclass of nonlinear LSQ problems, the so-called adequate problems.
Assumptions

We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0:

- F: D -> R^m continuously differentiable, D ⊆ R^q open and convex.
- F'(p): possibly rank-deficient Jacobian.
- F' satisfies a particular Lipschitz condition (i.e. F' behaves "nicely") with Lipschitz constant 0 <= ω < ∞:

      ||F'(p̄)^+ (F'(p) - F'(p̄))(p - p̄)||_2 <= ω ||p - p̄||_2^2    (i)

  for all collinear p, p̄ in D (i.e. they lie on a single straight line) with p - p̄ in R(F'(p̄)^+).
Assumptions

- Compatibility condition (model and data must somehow fit each other):

      ||F'(p̄)^+ (I - F'(p) F'(p)^+) F(p)|| <= κ(p) ||p̄ - p||    for all p, p̄ in D,

      κ(p) <= κ < 1    for all p in D.    (ii)

- The initial update Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

      ||Δp^0|| <= α    and    B_ρ(p^0) ⊂ D,    (iii)

      h = α ω < 2 (1 - κ),    ρ = α / (1 - κ - h/2).    (iv)
Convergence result

Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* in B_ρ(p^0) with

    F'(p*)^+ F(p*) = 0.

The convergence rate can be estimated according to

    ||p^(k+1) - p^k||_2 <= (ω/2) ||p^k - p^(k-1)||_2^2 + κ(p^(k-1)) ||p^k - p^(k-1)||_2.    (*)

This result shows existence of a solution p* in the sense that F'(p*)^+ F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Compatibility

- Linear least-squares problems, F(p) = Ap - z:

      F'(p) = (Ap - z)' = A = F'(p̄)  =>  ω = 0,    (i)
      F'(p̄)^+ (I - F'(p) F'(p)^+) = A^+ (I - A A^+) = 0  =>  κ(p) = κ = 0,    (ii)

  and (*) yields ||p^2 - p^1|| <= 0, i.e. Δp^1 = 0 and p^1 = p*:
  convergence within one iteration.

- Systems of nonlinear equations (m = q):
  if F'(p) is nonsingular, then F'(p)^+ = F'(p)^{-1}
  =>  I - F'(p) F'(p)^+ = 0  =>  κ(p) = 0: quadratic convergence.

- Compatible problems: F(p*) = 0  =>  κ(p*) = 0.
Incompatibility factor

General nonlinear least-squares problems (m > q): by (*),

    lim_{k->∞} ||p^(k+1) - p^k|| / ||p^k - p^(k-1)||
        <= lim_{k->∞} ( (ω/2) ||p^k - p^(k-1)|| + κ(p^(k-1)) ) = κ(p*),

so κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

    κ(p*) < 1.

Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

    Δp̄^(k+1) = -F'(p^k)^+ F(p^(k+1)).

Stop the iteration as soon as linear convergence dominates the iteration process:

    ||Δp^(k+1)|| ≈ ||Δp̄^(k+1)|| <= XTOL.

For incompatible problems in the convergent phase it holds that

    ||Δp̄^(k+1)|| ≈ κ ||Δp^k||,    ||Δp^(k+1)|| ≈ κ^2 ||Δp^k||.

This provides an estimate for κ(p*):

    κ(p*) ≈ ||Δp̄^(k+1)|| / ||Δp^k||.
Convergence monitor

[Figure: norms of the ordinary and simplified GN corrections over the iteration index k, decaying from 10^0 to about 10^-15.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary

Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability

Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate with rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision,

    Q^T F'(p*) Π = R,

=> reordering of the parameters to (p*_π1, p*_π2, ..., p*_πn, ..., p*_πq) in such a way that

    |r_11| >= |r_22| >= ... >= |r_nn|.

-> The parameters (p*_π1, p*_π2, ..., p*_πn) can actually be estimated from the current dataset.
Parameter transformations

p: constrained parameters, u: unconstrained parameters, φ: transformation function,

    p = φ(u),    F̄(u) := F(φ(u)),    F̄'(u) = F'(p) φ'(u).

constraint         transformation φ(u)
p > 0              p = exp(u)
A <= p <= B        p = A + (B - A)/2 * (1 + sin u)
p <= C             p = C + (1 - sqrt(1 + u^2))
p >= C             p = C - (1 - sqrt(1 + u^2))
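The table's transformations as code, checking that each maps all of R into the constrained set:

```python
import numpy as np

def to_positive(u):        return np.exp(u)                              # p > 0
def to_interval(u, A, B):  return A + 0.5 * (B - A) * (1.0 + np.sin(u))  # A <= p <= B
def to_upper(u, C):        return C + (1.0 - np.sqrt(1.0 + u**2))        # p <= C
def to_lower(u, C):        return C - (1.0 - np.sqrt(1.0 + u**2))        # p >= C

u = np.linspace(-5.0, 5.0, 101)
print(to_positive(u).min() > 0)   # True
print(to_interval(u, 2.0, 3.0).min(), to_interval(u, 2.0, 3.0).max())
```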
Levenberg-Marquardt method

Idea: the local linearization is only reasonable in a small neighborhood of p^k, so penalize large steps:

    ||F(p^k) + F'(p^k) Δp^k||_2^2 + ρ ||Δp^k||_2^2 -> min
    <=> (F'(p^k)^T F'(p^k) + ρ I_q) Δp^k = -F'(p^k)^T F(p^k).

- ρ -> 0: Levenberg-Marquardt -> Gauss-Newton.
- ρ > 0: (F'(p^k)^T F'(p^k) + ρ I_q) is always regular -> this masks the question of uniqueness of a given solution and invites incorrect interpretation of numerical results: solutions are often not minima of g(p), but merely saddle points with g'(p) = 0.
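A single Levenberg-Marquardt step for comparison (random test Jacobian and residual assumed): for ρ -> 0 it reproduces the Gauss-Newton step, while for large ρ the step shrinks toward zero along the negative gradient.

```python
import numpy as np

def lm_step(Jk, Fk, rho):
    q = Jk.shape[1]
    return np.linalg.solve(Jk.T @ Jk + rho * np.eye(q), -Jk.T @ Fk)

rng = np.random.default_rng(6)
Jk = rng.standard_normal((10, 3))
Fk = rng.standard_normal(10)
gn_step = np.linalg.lstsq(Jk, -Fk, rcond=None)[0]
print(np.linalg.norm(lm_step(Jk, Fk, 1e-12) - gn_step))  # ~ 0
print(np.linalg.norm(lm_step(Jk, Fk, 1e6)))              # tiny (heavily damped)
```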
Scaling

- Row (measurement) scaling:

      F̄(p) = D_row^{-1} F(p),    D_row = diag(δz_1, ..., δz_M),

      ||F'(p^k) Δp^k + F(p^k)|| = min  vs.  ||D_row^{-1}(F'(p^k) Δp^k + F(p^k))|| = min.

  Different scaling leads to a different solution!

- Column (parameter) scaling:

      p = D_col p̄,    H(p̄) := F(D_col p̄) = F(p),    H'(p̄) = F'(p) D_col.

  With p̄^k = D_col^{-1} p^k:

      Δp̄^k = -H'(p̄^k)^+ H(p̄^k) = -(F'(p^k) D_col)^+ F(p^k)
            = -D_col^{-1} F'(p^k)^+ F(p^k) = D_col^{-1} Δp^k    (if F'(p^k) has full rank).

  No invariance in the rank-deficient case!
Globalization

Global (or damped) Gauss-Newton method:

    F'(p^k) Δp^k = -F(p^k),    p^(k+1) = p^k + λ_k Δp^k,    0 < λ_k <= 1.

How to choose λ_k?

Remember: we want to solve F'(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ||F'(p^k)^+ F(p^k)|| decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

    ||Δp̄^(k+1)(λ_k)|| / ||Δp^k|| < 1.

Divergence criterion: λ_k < λ_min << 1.
Codes

- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Problem formulation

d species, q (unknown) parameters.

Model: y' = f(y, p), y(0) = y_0, with y in R^d, p in R^q.

Set of experimental data: (t_1, z_1, δz_1), ..., (t_M, z_M, δz_M), t_j in R, z_j in R^d, with measurement tolerances δz_i in R^d, D_i = diag(δz_i).

    F(p) = ( D_1^{-1}(y(t_1, p) - z_1); ... ; D_M^{-1}(y(t_M, p) - z_M) ),

    F'(p) = ( D_1^{-1} y_p(t_1, p); ... ; D_M^{-1} y_p(t_M, p) ).

Gauss-Newton method:

    Δp^k = -F'(p^k)^+ F(p^k),    p^(k+1) = p^k + λ_k Δp^k.
Computation of F(p)

Requires the solution of y' = f(y, p) at the time points t_1, ..., t_M.

Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, ..., t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
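The dense-output idea, sketched with SciPy instead of LIMEX (test ODE assumed):

```python
import numpy as np
from scipy.integrate import solve_ivp

# adaptive integration of y' = -0.5 y, then evaluation at arbitrary data times
sol = solve_ivp(lambda t, y: -0.5 * y, (0.0, 10.0), [1.0],
                dense_output=True, rtol=1e-8, atol=1e-10)
t_meas = np.array([0.3, 1.7, 4.2, 9.9])      # not the solver's internal steps
y_meas = sol.sol(t_meas)[0]                  # interpolated solution values
print(np.max(np.abs(y_meas - np.exp(-0.5 * t_meas))))   # small interpolation error
```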
Sensitivity analysis

- Sensitivity matrix at time t:

      S(t) = ∂y/∂p (t) = y_p(t) in R^{d x q},

      F'(p) = ( D_1^{-1} S(t_1); ... ; D_M^{-1} S(t_M) ).

- Scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

      S̄_ij(t) = ∂y_i/∂p_j (t) * p_scal / y_scal,

  p_scal, y_scal: user-specified scaling values.
Computation of sensitivities

Differentiate y' = f(y, p) with respect to p to obtain the variational equation

    y_p' = f_y(y, p) y_p + f_p(y, p),    y_p(0) = 0.

Compact notation:

    S' = f_y S + f_p,    S = [s_1, ..., s_q],    s_k in R^d.

In practice, solve the combined system

    (y'; s_1'; ...; s_q') = (f; f_y s_1 + f_p_1; ...; f_y s_q + f_p_q) in R^{(q+1)d},

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
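For the introductory reaction, the combined state/sensitivity system can be sketched as follows (SciPy instead of LIMEX; d = 4, q = 2):

```python
import numpy as np
from scipy.integrate import solve_ivp

def augmented(t, w, k1, k2):
    y, S = w[:4], w[4:].reshape(4, 2)        # states and sensitivity matrix
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    f = np.array([r, r, -r, -r])
    dr_dy = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])
    fy = np.vstack([dr_dy, dr_dy, -dr_dy, -dr_dy])      # f_y
    fp = np.array([[-A * B,  C * D],                     # f_p, p = (k1, k2)
                   [-A * B,  C * D],
                   [ A * B, -C * D],
                   [ A * B, -C * D]])
    return np.concatenate([f, (fy @ S + fp).ravel()])    # S' = f_y S + f_p

w0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])  # y_p(0) = 0
sol = solve_ivp(augmented, (0.0, 30.0), w0, args=(0.2, 0.5),
                rtol=1e-8, atol=1e-10)
S_end = sol.y[4:, -1].reshape(4, 2)
print(S_end)   # sensitivities of (A, B, C, D) w.r.t. (k1, k2) at t = 30
```

Because A' = B' and A + C is conserved for every parameter value, the sensitivity rows of A and B coincide and those of A and C cancel, which is a useful consistency check.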
Example

Case (T+E):

k   ||F(p^k)||   ||Δp^k||   λ_k     rank
0   4.81e-02     3.55e-01            2
1   1.11e-01     4.57e-01   1.000    2
    2.55e-02     1.75e-01   0.389
2   1.04e-02     4.22e-02   1.000    2
3   9.16e-04     1.90e-03   1.000    2
4   8.00e-06     1.99e-05   1.000    2
5   2.32e-07     1.54e-09   1.000    2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997)  =>  k_21* = k_2*/k_1* = 1.9999994,
κ ≈ 0.008,    sc(F'(p*)) = 6.3.

Case (E):

k   ||F(p^k)||   ||Δp^k||   λ_k     rank
0   3.85e-02     2.96e-02            1
1   2.40e-03     1.85e-03   1.000    1
2   1.11e-05     7.67e-06   1.000    1
3   6.67e-06     6.75e-11   1.000    1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906)  =>  k_21** = 2.0000622,
κ ≈ 0.004,    sc(F'(p**)) = 4.5e+06.

[Figures: fitted concentration curves of A, B, C, D over time for both cases.]
BioPARKIN

Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de

- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on the identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention!

Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 2
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 3
Introductory Example
A + Bk1
GGGGGBFGGGGG
k2
C + D
Aprime = B prime = minusk1AB + k2CD C prime = D prime = +k1AB minus k2CD
A(0) = 2 B(0) = 1 C (0) = 05 D(0) = 0 k01 = 02 k0
2 = 05
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Case (T+E) measurement data cover
both transient and equilibrium phase
0 10 20 300
05
1
15
2
timeconcentr
ation
A B C D
Case (E) measurement data cover
equilibrium phase only
Parameter Identification Susanna Roblitz 4
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 5
Problem formulation
Model y(t p) = (y1(t p) yd(t p)) isin Rd
y prime = f (y p) y(t0) = y0
Usually nonlinear in p
Parameters p = (p1 pq) isin Rq
Data zkl asymp yk(tl p) k = 1 d l = 1 Mk
mostly m =sum
k Mk q data compression
Task fit the model (ODE) to data estimate parameters
Parameter Identification Susanna Roblitz 6
Problem formulation
Questions
I What algorithm to use
I How good is the fit
I Is there a better model
I Sensitivity
Parameter Identification Susanna Roblitz 7
Problem formulation
minimize least squares error
F (p)22 =
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
rarr min
σkl measurement error (standard deviation)
0 5 10 15minus2
0
2
4
6
8
10
12
t
y
measurement values
model function
residues
Parameter Identification Susanna Roblitz 8
Approaches
I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value
I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown
parameters as random variablesI Prediction follows rules of probability
Parameter Identification Susanna Roblitz 9
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 10
Likelihood
assumption normally distributed measurement noise
P(z |p) prop exp
(minus
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
)
maximum likelihood estimate (MLE)
pMLE = argmaxpP(z |p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = Conditional Probability x MarginalProbability
P(p z) = P(p|z)P(z) = P(z |p)P(p)
P(p|z) =P(z |p)P(p)
P(z)
P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =
intP(z |p)P(p)dp evidence
maximum a posteriori estimate (MAP)
pMAP = argmaxpP(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) prop P(z |p)P(p)
I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence
I consider full high dimensional posterior PDF for statisticalinference
I necessity of a prior
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t p) = B(t)p B(t) isin Rdtimesq
Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd
Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T
Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat
z minus Ap2 rarr min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law dependence ofrate constant K ontemperature T
K (T ) = C middot exp
(minus E
RT
)I gas constant
R = 83145 JmolmiddotK
I unknown parameterspre-exponential factor Cactivation energy E
I measurements(Ti Ki) i = 1 m = 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
TK
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I
z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2
2
The Eucledian norm middot 2 is invariant under orthogonaltransformation
z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r11| / |rqq|

Properties:
• sc(A) ≥ 1
• sc(αA) = sc(A)
• A ≠ 0 singular ⇔ sc(A) = ∞
• sc(A) ≤ κ2(A) (hence the name)

A is called almost singular if

δ·sc(A) ≥ 1, or equivalently δ·|r11| ≥ |rqq|
Parameter Identification Susanna Roblitz 24
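The subcondition is a cheap by-product of the pivoted QR; a sketch on a made-up matrix with badly scaled columns, checking the property sc(A) ≤ κ2(A):

```python
import numpy as np
from scipy.linalg import qr

# Sketch: subcondition sc(A) = |r11|/|rqq| from a column-pivoted QR.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5)) @ np.diag([1.0, 1.0, 1.0, 1e-3, 1e-6])

R = qr(A, mode='r', pivoting=True)[0]
d = np.abs(np.diag(R))
sc = d[0] / d[-1]                 # subcondition number
kappa2 = np.linalg.cond(A)        # sigma_max / sigma_min, needs an SVD
```

Unlike κ2, sc requires no extra work once the factorization used for the least-squares solve is available.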
Scaling
Drow, Dcol diagonal matrices.

We consider the scaled least squares problem

‖Drow⁻¹(A·Dcol·Dcol⁻¹·p − z)‖₂ = ‖Ā·p̄ − z̄‖₂ → min

where Ā = Drow⁻¹·A·Dcol, p̄ = Dcol⁻¹·p, z̄ = Drow⁻¹·z.

• row (measurement) scaling: Drow = diag(δz1, …, δzm), e.g. relative error concept δzj = ε·|zj|
• column (parameter) scaling: Dcol = diag(p1,scal, …, pq,scal) (to account for different physical units)

In general sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ ℝ^(q×q), rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A⁻¹z, where A⁻¹A = AA⁻¹ = Iq.

(2) A ∈ ℝ^(m×q), rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:

‖Ap − z‖₂ = min

(2a) n = q: ‖Ap − z‖₂ = min ⇔ AᵀAp = Aᵀz (normal equations),
uniquely solvable since AᵀA is nonsingular:

p = (AᵀA)⁻¹Aᵀ·z,  with A⁺ := (AᵀA)⁻¹Aᵀ

(2b) n < q: write p = A⁺z, A⁺ generalized (pseudo-)inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ ℝ^(m×q), rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by

(i) (A⁺A)ᵀ = A⁺A
(ii) (AA⁺)ᵀ = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A

One can show: p* = A⁺z is the shortest solution, i.e.

‖p*‖₂ ≤ ‖p‖₂ ∀p ∈ L(z)

with

L(z) = { p ∈ ℝ^q : ‖z − Ap‖₂ = min } = p* + N(A)
Parameter Identification Susanna Roblitz 27
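The axioms and the shortest-solution property are easy to check numerically; a sketch with a made-up rank-deficient matrix and numpy's SVD-based pseudoinverse:

```python
import numpy as np

# Sketch: verify the four Penrose axioms for A+ of a rank-2 matrix (q = 3),
# and that p* = A+ z is shorter than any other least-squares minimizer.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 3))   # 6x3, rank 2
Ap = np.linalg.pinv(A)

ax_i   = np.allclose((Ap @ A).T, Ap @ A)    # (A+A)^T = A+A
ax_ii  = np.allclose((A @ Ap).T, A @ Ap)    # (AA+)^T = AA+
ax_iii = np.allclose(Ap @ A @ Ap, Ap)       # A+AA+ = A+
ax_iv  = np.allclose(A @ Ap @ A, A)         # AA+A = A

z = rng.standard_normal(6)
p_star = Ap @ z                             # shortest solution
null = np.linalg.svd(A)[2][-1]              # a vector in N(A)
p_alt = p_star + 0.7 * null                 # same residual, larger norm
```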
Computation of shortest solution
QᵀAΠ = [ R   S ,   A ∈ ℝ^(m×q),
         0   0 ]

rank(A) = n, R ∈ ℝ^(n×n) upper triangular, S ∈ ℝ^(n×(q−n))

Πᵀp = ( p1 ; p2 ),  p1 ∈ ℝ^n, p2 ∈ ℝ^(q−n)
Qᵀz = ( z1 ; z2 ),  z1 ∈ ℝ^n, z2 ∈ ℝ^(m−n)

⇒ ‖z − Ap‖₂² = ‖Qᵀz − QᵀAp‖₂² = ‖z1 − R·p1 − S·p2‖₂² + ‖z2‖₂²

‖z − Ap‖₂² = min ⇔ z1 − R·p1 − S·p2 = 0 ⇔ p1 = R⁻¹(z1 − S·p2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = R⁻¹S,  u = R⁻¹z1

‖p‖₂² = ‖Πᵀp‖₂² = ‖p1‖₂² + ‖p2‖₂² = ‖u − V·p2‖₂² + ‖p2‖₂²
      = ‖u‖₂² − 2⟨u, V·p2⟩ + ⟨V·p2, V·p2⟩ + ⟨p2, p2⟩
      = ‖u‖₂² + ⟨p2, (I + VᵀV)·p2 − 2Vᵀu⟩
      =: φ(p2)

φ(p2) = min if φ′(p2) = 2(I + VᵀV)·p2 − 2Vᵀu = 0
⇔ (I + VᵀV)·p2 = Vᵀu (solution by Cholesky decomposition)

φ″(p2) = 2(I + VᵀV) spd ⇒ p2 is a minimum

Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
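The two slides above assemble into a short algorithm; a sketch under the stated assumptions (pivoted QR, Cholesky solve for p2), checked against numpy's SVD-based pseudoinverse on a made-up rank-deficient problem:

```python
import numpy as np
from scipy.linalg import qr, cho_factor, cho_solve

# Sketch: minimum-norm least-squares solution via rank-revealing pivoted QR.
def shortest_solution(A, z, delta=1e-10):
    Q, Rfull, piv = qr(A, pivoting=True)
    d = np.abs(np.diag(Rfull))
    n = int(np.sum(d >= delta * d[0]))          # numerical rank decision
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = np.linalg.solve(R, S)                   # V = R^{-1} S
    u = np.linalg.solve(R, z1)                  # u = R^{-1} z1
    p2 = cho_solve(cho_factor(np.eye(V.shape[1]) + V.T @ V), V.T @ u)
    p1 = u - V @ p2                             # p1 = R^{-1}(z1 - S p2)
    p = np.empty(A.shape[1])
    p[piv] = np.concatenate([p1, p2])           # undo the column permutation
    return p

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 4))   # rank 2 of 4
z = rng.standard_normal(8)
p = shortest_solution(A, z)
```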
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data: (ti, zi), ti ∈ ℝ, zi ∈ ℝ^d, i = 1, …, M

Parameters: p = (p1, …, pq)ᵀ

Model:

y(t, p): ℝ × ℝ^q ↦ ℝ^d, nonlinear w.r.t. p

Measurement tolerances:

δzi ∈ ℝ^d, Di = diag(δzi)

Define

F(p) = [ D1⁻¹(z1 − y(t1, p))
         ⋮
         DM⁻¹(zM − y(tM, p)) ] ∈ ℝ^m,  m ≤ M·d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
∑_{i=1..M} ‖Di⁻¹(zi − y(ti, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem: Given a mapping F: D ⊆ ℝ^q → ℝ^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if

F(p*) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1, …, pq)ᵀ ∈ ℝ^q and the scalar function g(p): ℝ^q → ℝ.
The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p1, …, ∂g/∂pq )ᵀ

Consider the vector F(p) = (F1(p), …, Fm(p))ᵀ: ℝ^q → ℝ^m.
Its derivative is the Jacobian matrix

F′(p) = [ ∂F1/∂p1  ···  ∂F1/∂pq
          ⋮               ⋮
          ∂Fm/∂p1  ···  ∂Fm/∂pq ] ∈ ℝ^(m×q)
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖₂² = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite

Find by Newton iteration a zero of the function G: ℝ^q → ℝ^q,

G(p) = F′(p)ᵀF(p)

(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor expansion around a starting guess p0:

G(p) = G(p0) + G′(p0)(p − p0) + o(‖p − p0‖) for p → p0
     ≈ G(p0) + G′(p0)(p − p0) =: Ḡ(p)

Zero p1 of the linear substitute map Ḡ:

G′(p0)∆p0 = −G(p0),  p1 = p0 + ∆p0

Newton iteration:

G′(pk)∆pk = −G(pk),  pk+1 = pk + ∆pk

system of nonlinear equations ⇒ sequence of linear systems
Parameter Identification Susanna Roblitz 35
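A minimal Newton iteration in code; the 2×2 system G(p) = 0 below is a made-up example, not from the slides:

```python
import numpy as np

# Sketch: Newton iteration for G(p) = 0 with G: R^2 -> R^2.
def G(p):
    return np.array([p[0]**2 + p[1]**2 - 4.0, p[0] - p[1]])

def G_prime(p):                               # Jacobian of G
    return np.array([[2.0 * p[0], 2.0 * p[1]], [1.0, -1.0]])

p = np.array([1.0, 0.5])                      # starting guess p0
for _ in range(20):
    dp = np.linalg.solve(G_prime(p), -G(p))   # G'(pk) dpk = -G(pk)
    p = p + dp                                # pk+1 = pk + dpk
    if np.linalg.norm(dp) < 1e-12:
        break
```

Each step solves one linear system, illustrating the "sequence of linear systems" remark.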
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

Newton iteration in terms of F(p):

( F′(pk)ᵀF′(pk) + F″(pk)ᵀF(pk) )·∆pk = −F′(pk)ᵀF(pk)

• compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
• almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(pk)ᵀF′(pk)·∆pk = −F′(pk)ᵀF(pk),  pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(pk)ᵀF′(pk)·∆pk = −F′(pk)ᵀF(pk),  pk+1 = pk + ∆pk

corresponds to the normal equations for the linear least squares problem

‖F′(pk)∆pk + F(pk)‖₂ = min

Formal solution:

∆pk = −F′(pk)⁺F(pk)

In case of convergence: F′(p*)⁺F(p*) = 0

If rank(F′(p)) = q ⇒ F′(p)⁺ = ( F′(p)ᵀF′(p) )⁻¹F′(p)ᵀ
Parameter Identification Susanna Roblitz 37
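A compact Gauss-Newton loop, solving the linear least-squares subproblem at every step; the exponential-decay model, its data, and the starting guess are made up for illustration:

```python
import numpy as np

# Sketch: Gauss-Newton for fitting y(t, p) = p1 * exp(-p2 * t) to data z.
t = np.linspace(0.0, 4.0, 30)
p_true = np.array([2.0, 0.7])
z = p_true[0] * np.exp(-p_true[1] * t)        # synthetic, noise-free data

def F(p):                                     # residual vector, length 30
    return p[0] * np.exp(-p[1] * t) - z

def J(p):                                     # Jacobian F'(p), shape (30, 2)
    e = np.exp(-p[1] * t)
    return np.column_stack([e, -p[0] * t * e])

p = np.array([1.0, 1.0])                      # starting guess p0
for _ in range(50):
    # linear LSQ subproblem ||J(pk) dp + F(pk)|| = min
    dp = np.linalg.lstsq(J(p), -F(p), rcond=None)[0]
    p = p + dp
    if np.linalg.norm(dp) < 1e-12:
        break
```

Because the data are noise-free the problem is compatible, so the iteration converges quadratically near the solution.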
Classification
• local GN methods require a sufficiently good starting guess p0
• global GN methods can handle bad guesses as well
• m = q: guaranteed local convergence
• m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p0:

• F: D → ℝ^m continuously differentiable, D ⊆ ℝ^q open and convex
• F′(p): possibly rank-deficient Jacobian
• F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̂)⁺(F′(p) − F′(p̄))(p − p̄)‖₂ ≤ ω·‖p − p̄‖₂²  (i)

for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) and p − p̄ ∈ R(F′(p̂)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
• compatibility condition (model and data must somehow fit each other):

‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)·‖p̄ − p‖  ∀p, p̄ ∈ D,
κ(p) ≤ κ < 1 ∀p ∈ D  (ii)

• the initial update ∆p0 must be small enough, and a ball Bρ(p0) with radius ρ around p0 must be contained in D:

‖∆p0‖ ≤ α and Bρ(p0) ⊂ D  (iii)

h = α·ω < 2(1 − κ),  ρ = α / (1 − κ − h/2)  (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {pk} of Gauss-Newton iterates is well-defined, remains in Bρ(p0), and converges to some p* ∈ Bρ(p0) with

F′(p*)⁺F(p*) = 0

The convergence rate can be estimated according to

‖pk+1 − pk‖₂ ≤ (ω/2)·‖pk − pk−1‖₂² + κ(pk−1)·‖pk − pk−1‖₂  (*)

This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
• linear least squares problems: F(p) = Ap − z

F′(p) = (Ap − z)′ = A = (Ap̄ − z)′ = F′(p̄)
(i) ⇒ ω = 0

F′(p̄)⁺(I − F′(p)F′(p)⁺) = F′(p)⁺(I − F′(p)F′(p)⁺) = 0
(ii) ⇒ κ(p) = κ = 0

(*) ⇒ ‖p2 − p1‖ ≤ 0 ⇒ ∆p1 = 0 ⇒ p1 = p*

convergence within one iteration

• systems of nonlinear equations (m = q):

if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹
⇒ I − F′(p)F′(p)⁺ = 0, (ii) ⇒ κ(p) = 0

quadratic convergence

• compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖pk+1 − pk‖ / ‖pk − pk−1‖ ≤ lim_{k→∞} ( (ω/2)·‖pk − pk−1‖ + κ(pk−1) ) = κ(p*)   (by (*))

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / Termination criteria

Define

∆p̄k+1 := −F′(pk)⁺F(pk+1)  (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖∆p̄k+1‖ ≈ ‖∆pk+1‖ ≤ XTOL

For incompatible problems in the convergent phase it holds

‖∆p̄k+1‖ ≈ κ·‖∆pk‖,  ‖∆pk+1‖ ≈ κ²·‖∆pk‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖∆p̄k+1‖ / ‖∆pk‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: ‖∆pk‖ for ordinary and simplified GN corrections versus iteration k, log scale from 10⁻¹⁵ to 10⁰]
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

QᵀF′(p*)Π = R

⇒ reordering of the parameters to (p*π1, p*π2, …, p*πn, …, p*πq) in such a way that

|r11| ≥ |r22| ≥ ··· ≥ |rnn|

→ the parameters (p*π1, p*π2, …, p*πn) can actually be estimated from the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u),  F(p) = F(φ(u)) =: F̄(u),  F̄′(u) = F′(p)·φ′(u)

constraint        transformation φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u²))
p ≥ C             p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
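The transformation table is straightforward to express in code and to check: every φ maps an arbitrary unconstrained u into the feasible region. A sketch (function names are made up):

```python
import numpy as np

# Sketch: transformations from an unconstrained u to a constrained parameter p.
def phi_pos(u):            # p > 0
    return np.exp(u)

def phi_box(u, A, B):      # A <= p <= B
    return A + 0.5 * (B - A) * (1.0 + np.sin(u))

def phi_upper(u, C):       # p <= C, since sqrt(1 + u^2) >= 1
    return C + (1.0 - np.sqrt(1.0 + u**2))

def phi_lower(u, C):       # p >= C
    return C - (1.0 - np.sqrt(1.0 + u**2))

u = np.linspace(-10.0, 10.0, 401)
```

The Gauss-Newton iteration then runs on u, so no step can ever leave the feasible region.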
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of pk:

‖F(pk) + F′(pk)∆pk‖₂² + ρ·‖∆pk‖₂² = min

( F′(pk)ᵀF′(pk) + ρ·Iq )·∆pk = −F′(pk)ᵀF(pk)

• ρ → 0: Levenberg-Marquardt → Gauss-Newton
• ρ > 0: ( F′(pk)ᵀF′(pk) + ρ·Iq ) is always regular → masking of the uniqueness of a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
Parameter Identification Susanna Roblitz 49
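A single Levenberg-Marquardt step is one regularized linear solve; a sketch on a made-up residual and Jacobian, showing both limits of ρ:

```python
import numpy as np

# Sketch: one LM step (F'^T F' + rho*I) dp = -F'^T F.
def lm_step(J, Fp, rho):
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fp)

rng = np.random.default_rng(5)
J = rng.standard_normal((10, 3))              # Jacobian F'(pk), full rank
Fp = rng.standard_normal(10)                  # residual F(pk)

gn = np.linalg.lstsq(J, -Fp, rcond=None)[0]   # Gauss-Newton step for comparison
lm_small = lm_step(J, Fp, 1e-12)              # rho -> 0: recovers Gauss-Newton
lm_large = lm_step(J, Fp, 1e6)                # rho large: heavily damped, tiny step
```

In practice ρ is adapted per iteration; the regular system remains solvable even for a rank-deficient Jacobian, which is exactly the masking effect criticized above.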
Scaling
• row (measurement) scaling:

F̄(p) = Drow⁻¹·F(p),  Drow = diag(δz1, …, δzM)

‖F′(pk)∆pk + F(pk)‖ = min is replaced by ‖Drow⁻¹(F′(pk)∆pk + F(pk))‖ = min

Different scaling leads to a different solution.

• column (parameter) scaling:

p = Dcol·p̄,  H(p̄) := F(Dcol·p̄) = F(p),  H′(p̄) = F′(p)·Dcol

assume pk = Dcol·p̄k:

∆p̄k = −H′(p̄k)⁺H(p̄k) = −( F′(pk)·Dcol )⁺F(pk)
     = (if F′(pk) has full rank) −Dcol⁻¹·F′(pk)⁺F(pk) = Dcol⁻¹·∆pk

No invariance in the rank-deficient case.
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:

F′(pk)∆pk = −F(pk),  pk+1 = pk + λk·∆pk,  0 < λk ≤ 1

How to choose λk?

Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λk in such a way that ‖F′(pk)⁺F(pk)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λk are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖∆p̄k+1(λk)‖ / ‖∆pk‖ < 1

Divergence criterion: λk < λmin ≪ 1
Parameter Identification Susanna Roblitz 51
Codes
• NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
• NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y0, with y ∈ ℝ^d, p ∈ ℝ^q

Set of experimental data:
(t1, z1, δz1), …, (tM, zM, δzM), tj ∈ ℝ, zj ∈ ℝ^d,
with measurement tolerances δzj ∈ ℝ^d, Di = diag(δzi)

F(p) = [ D1⁻¹(y(t1, p) − z1)          F′(p) = [ D1⁻¹·yp(t1, p)
         ⋮                                      ⋮
         DM⁻¹(y(tM, p) − zM) ]                  DM⁻¹·yp(tM, p) ]

Gauss-Newton method:

∆pk = −F′(pk)⁺F(pk),  pk+1 = pk + λk·∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t1, …, tM

Problem: the time points chosen by adaptive step-size control generally do not agree with t1, …, tM.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
• sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = yp(t) ∈ ℝ^(d×q)

F′(p) = [ D1⁻¹·S(t1)
          ⋮
          DM⁻¹·S(tM) ]

• scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄ij(t) = ∂yi/∂pj (t) · pscal/yscal

pscal, yscal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

yp′ = fy(y, p)·yp + fp(y, p),  yp(0) = 0

Compact notation:

S′ = fy·S + fp,  S = [s1, …, sq], sk ∈ ℝ^d

In practice, solve the combined system

( y′ ; s1′ ; … ; sq′ ) = ( f ; fy·s1 + fp1 ; … ; fy·sq + fpq ) ∈ ℝ^((q+1)·d)

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
Parameter Identification Susanna Roblitz 57
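The combined state-plus-sensitivity system can be integrated with any ODE solver with dense output; a sketch for the toy scalar model y′ = −p·y, y(0) = 1 (chosen here because the sensitivity is known analytically, not from the slides), using SciPy instead of LIMEX:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: variational equation for y' = -p*y. With f = -p*y we have
# f_y = -p and f_p = -y, so the sensitivity s = dy/dp obeys
# s' = f_y*s + f_p = -p*s - y, s(0) = 0.
p = 0.5

def rhs(t, w):
    y, s = w
    return [-p * y, -p * s - y]

sol = solve_ivp(rhs, [0.0, 2.0], [1.0, 0.0], rtol=1e-10, atol=1e-12,
                dense_output=True)            # dense output, as on slide 55

t1 = 1.5                                      # an arbitrary measurement time
y_num, s_num = sol.sol(t1)

# Analytic check: y = exp(-p*t) and dy/dp = -t*exp(-p*t)
s_exact = -t1 * np.exp(-p * t1)
```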
Example
Case (T+E):

k   ‖F(pk)‖   ‖∆pk‖     λk      rank
0   4.81e-02  3.55e-01          2
1   1.11e-01  4.57e-01  1.000   2
    2.55e-02  1.75e-01  0.389
2   1.04e-02  4.22e-02  1.000   2
3   9.16e-04  1.90e-03  1.000   2
4   8.00e-06  1.99e-05  1.000   2
5   2.32e-07  1.54e-09  1.000   2

p* = (k1*, k2*) = (0.10000001, 0.19999997) ⇒ k2*/k1* = 1.9999994
κ = 0.008, sc(F′(p*)) = 6.3

Case (E):

k   ‖F(pk)‖   ‖∆pk‖     λk      rank
0   3.85e-02  2.96e-02          1
1   2.40e-03  1.85e-03  1.000   1
2   1.11e-05  7.67e-06  1.000   1
3   6.67e-06  6.75e-11  1.000   1

p** = (k1**, k2**) = (0.24155702, 0.48312906) ⇒ k2**/k1** = 2.0000622
κ = 0.004, sc(F′(p**)) = 4.5×10⁶

[Figures: fitted concentrations of A, B, C, D versus time for case (T+E) and case (E)]
Parameter Identification Susanna Roblitz 58
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
• SBML import and export
• deterministic simulation methods (LIMEX; others will follow)
• sensitivity analysis based on variational equations
• parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
• output of information on identifiability of parameters
• time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor

General nonlinear least-squares problems (m > q): by (∗),

lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤ lim_{k→∞} ( (ω/2)‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1.

Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

∆p̄^{k+1} = −F′(p^k)⁺F(p^{k+1}).

Stop the iteration as soon as linear convergence dominates the iteration process:

‖∆p̄^{k+1}‖ ≈ ‖∆p^{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖∆p̄^{k+1}‖ ≈ κ‖∆p^k‖, ‖∆p^{k+1}‖ ≈ κ²‖∆p^k‖.

This provides an estimate for κ(p*):

κ(p*) ≈ ‖∆p̄^{k+1}‖ / ‖∆p^k‖
Convergence monitor
[Figure: ‖∆p^k‖ (ordinary GN corrections) and ‖∆p̄^k‖ (simplified GN corrections) over the iteration index k, log scale from 10⁻¹⁵ to 10⁰.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
interpretation of the result in the case of a rank-deficient Jacobian

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F′(p*) Π = R

⇒ reordering of the parameters to (p*_{π1}, p*_{π2}, …, p*_{πn}, …, p*_{πq}) in such a way that

|r11| ≥ |r22| ≥ ⋯ ≥ |rnn|

→ the parameters (p*_{π1}, p*_{π2}, …, p*_{πn}) can actually be estimated from the current dataset
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u), F(p) = F(φ(u)) =: F̄(u), F̄′(u) = F′(p) φ′(u)

constraint → transformation p = φ(u):
p > 0: p = exp(u)
A ≤ p ≤ B: p = A + (B − A)/2 · (1 + sin u)
p ≤ C: p = C + (1 − √(1 + u²))
p ≥ C: p = C − (1 − √(1 + u²))
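The transformation table can be written out directly in code; a minimal numpy sketch (the function names are illustrative, not from the slides):

```python
import numpy as np

# Transformations p = phi(u) from an unconstrained u to a constrained p,
# following the table above.

def positive(u):            # p > 0
    return np.exp(u)

def box(u, A, B):           # A <= p <= B
    return A + 0.5 * (B - A) * (1.0 + np.sin(u))

def upper_bound(u, C):      # p <= C  (note 1 - sqrt(1+u^2) <= 0)
    return C + (1.0 - np.sqrt(1.0 + u**2))

def lower_bound(u, C):      # p >= C
    return C - (1.0 - np.sqrt(1.0 + u**2))
```

The optimizer then iterates freely in u, and every iterate automatically satisfies the constraint on p.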
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

‖F(p^k) + F′(p^k)∆p^k‖₂² + ρ‖∆p^k‖₂² = min

⇔ (F′(p^k)^T F′(p^k) + ρ I_q) ∆p^k = −F′(p^k)^T F(p^k)

I ρ → 0: Levenberg-Marquardt → Gauss-Newton
I ρ > 0: (F′(p^k)^T F′(p^k) + ρ I_q) is always regular → masking of the uniqueness of a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
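One Levenberg-Marquardt correction is a single regularized linear solve; a minimal numpy sketch (not the adaptive ρ-control of production codes):

```python
import numpy as np

def lm_step(Fp, J, rho):
    """One Levenberg-Marquardt correction: solve
    (J^T J + rho*I) dp = -J^T F(p).  rho = 0 recovers Gauss-Newton."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fp)
```

For ρ → 0 the step approaches the Gauss-Newton correction; a large ρ shrinks the step toward the steepest-descent direction.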
Scaling
I row (measurement) scaling:

F̄(p) = D_row⁻¹ F(p), D_row = diag(δz₁, …, δz_M)

‖F′(p^k)∆p^k + F(p^k)‖ = min ⇔ ‖D_row⁻¹(F′(p^k)∆p^k + F(p^k))‖ = min

A different scaling leads to a different solution!

I column (parameter) scaling:

p = D_col p̄, H(p̄) := F(D_col p̄) = F(p), H′(p̄) = F′(p) D_col

assume p^k = D_col p̄^k:

∆p̄^k = −H′(p̄^k)⁺H(p̄^k) = −(F′(p^k) D_col)⁺ F(p^k) = [F′(p^k) full rank] = −D_col⁻¹ F′(p^k)⁺F(p^k) = D_col⁻¹ ∆p^k

No invariance in the rank-deficient case!
Globalization
Global (or damped) Gauss-Newton method:

F′(p^k)∆p^k = −F(p^k), p^{k+1} = p^k + λ_k ∆p^k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p^k)⁺F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖∆p̄^{k+1}(λ_k)‖ / ‖∆p^k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
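A minimal sketch of such a damped iteration: the step size is simply halved until the natural monotonicity test passes. This is a toy simplification of the predictor-corrector strategy used in the actual codes (NLSQ-ERR, NLSCON):

```python
import numpy as np

def damped_gauss_newton(F, J, p0, tol=1e-10, lam_min=1e-4, max_iter=50):
    """Damped Gauss-Newton sketch: ordinary correction via the
    pseudoinverse, lambda halved until the natural monotonicity
    test ||dp_bar(lam)|| < ||dp|| is satisfied."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jp_pinv = np.linalg.pinv(J(p))
        dp = -Jp_pinv @ F(p)                 # ordinary correction
        if np.linalg.norm(dp) <= tol:
            break
        lam = 1.0
        while lam >= lam_min:
            p_trial = p + lam * dp
            dp_bar = -Jp_pinv @ F(p_trial)   # simplified correction
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                break
            lam *= 0.5
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = p + lam * dp
    return p
```

For a compatible toy problem (m = 2 > q = 1, residual zero at the solution) the iteration converges with full steps near the solution.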
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Problem formulation
d species, q (unknown) parameters

model: y′ = f(y, p), y(0) = y₀, with y ∈ R^d, p ∈ R^q

set of experimental data (t₁, z₁, δz₁), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d, with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)

F(p) = [D₁⁻¹(y(t₁, p) − z₁); … ; D_M⁻¹(y(t_M, p) − z_M)], F′(p) = [D₁⁻¹ y_p(t₁, p); … ; D_M⁻¹ y_p(t_M, p)]

Gauss-Newton method:

∆p^k = −F′(p^k)⁺F(p^k), p^{k+1} = p^k + λ_k ∆p^k
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t₁, …, t_M

Problem: the time points chosen by adaptive step-size control generally do not agree with t₁, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.
Example LIMEX has a dense output based on Hermiteinterpolation
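The idea can be imitated with a toy fixed-step integrator: integrate on the solver's own grid and attach a cubic Hermite interpolant, which can then be evaluated at arbitrary measurement times. A minimal numpy sketch (`rk4_dense` is an illustrative name; LIMEX itself uses adaptive steps):

```python
import numpy as np

def rk4_dense(f, t0, t1, y0, n_steps):
    """Fixed-step RK4 integration of y' = f(t, y) plus a cubic Hermite
    interpolant through the accepted steps -- a toy stand-in for the
    dense output of a production integrator such as LIMEX."""
    ts = np.linspace(t0, t1, n_steps + 1)
    h = ts[1] - ts[0]
    ys = [np.asarray(y0, dtype=float)]
    for t in ts[:-1]:
        y = ys[-1]
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2 * k1)
        k3 = f(t + h/2, y + h/2 * k2)
        k4 = f(t + h, y + h * k3)
        ys.append(y + h/6 * (k1 + 2*k2 + 2*k3 + k4))
    ys = np.array(ys)

    def sol(t):
        # locate the step interval containing t, then interpolate
        i = min(int(np.searchsorted(ts, t, side='right')) - 1, n_steps - 1)
        s = (t - ts[i]) / h
        y0_, y1_ = ys[i], ys[i + 1]
        d0, d1 = h * f(ts[i], y0_), h * f(ts[i + 1], y1_)
        return ((2*s**3 - 3*s**2 + 1) * y0_ + (s**3 - 2*s**2 + s) * d0
                + (-2*s**3 + 3*s**2) * y1_ + (s**3 - s**2) * d1)

    return sol
```

Both the RK4 steps and the cubic Hermite interpolant are fourth-order accurate, so evaluating between grid points does not destroy the integration accuracy.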
Sensitivity analysis
I sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F′(p) = [D₁⁻¹ S(t₁); … ; D_M⁻¹ S(t_M)]

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y_p′ = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p, S = [s₁, …, s_q], s_k ∈ R^d

In practice, solve the combined system

[y′; s₁′; …; s_q′] = [f; f_y·s₁ + f_{p₁}; …; f_y·s_q + f_{p_q}] ∈ R^{(q+1)d}
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
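For the scalar model y′ = −p·y (d = q = 1, chosen here for illustration) the combined system is small enough to write out by hand: the variational equation is s′ = −p·s − y with s(0) = 0. A fixed-step RK4 sketch (the slides use LIMEX instead):

```python
import numpy as np

# Augmented system for y' = -p*y: state (y, s) with s = dy/dp,
# s' = f_y*s + f_p = -p*s - y, s(0) = 0.
def rhs(t, state, p):
    y, s = state
    return np.array([-p * y, -p * s - y])

def integrate(p, t_end=2.0, n=2000):
    h = t_end / n
    state = np.array([1.0, 0.0])      # y(0) = 1, s(0) = 0
    t = 0.0
    for _ in range(n):
        k1 = rhs(t, state, p)
        k2 = rhs(t + h/2, state + h/2 * k1, p)
        k3 = rhs(t + h/2, state + h/2 * k2, p)
        k4 = rhs(t + h, state + h * k3, p)
        state = state + h/6 * (k1 + 2*k2 + 2*k3 + k4)
        t += h
    return state                      # (y(t_end), dy/dp(t_end))

y_T, s_T = integrate(p=0.7)
```

For this model the exact values are y(t) = e^{−pt} and s(t) = −t·e^{−pt}, which the augmented integration reproduces.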
Example
Case (T+E):

k   ‖F(p^k)‖   ‖∆p^k‖     λ_k     rank
0   4.81e-02   3.55e-01            2
1   1.11e-01   4.57e-01   1.000    2
1   2.55e-02   1.75e-01   0.389    2
2   1.04e-02   4.22e-02   1.000    2
3   9.16e-04   1.90e-03   1.000    2
4   8.00e-06   1.99e-05   1.000    2
5   3.21e-07   1.54e-09   1.000    2

p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994

κ = 0.008, sc(F′(p*)) = 6.3

Case (E):

k   ‖F(p^k)‖   ‖∆p^k‖     λ_k     rank
0   3.85e-02   2.96e-02            1
1   2.40e-03   1.85e-03   1.000    1
2   1.11e-05   7.67e-06   1.000    1
3   6.67e-06   6.75e-11   1.000    1

p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622

κ = 0.004, sc(F′(p**)) = 4.5×10⁶

[Figure: fitted concentration profiles of A, B, C, D over time for the two cases (two panels).]
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
I output of information on the identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Problem formulation
minimize the least-squares error:

‖F(p)‖₂² = Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_{kl} − y_k(t_l, p))² / (2σ_{kl}²) → min

σ_{kl}: measurement error (standard deviation)
[Figure: measurement values, model function, and residues of y over t.]
Approaches
I Frequentist
  I Assumes that there is an unknown but fixed parameter p
  I Estimates p with some confidence
  I Prediction by using the estimated parameter value
I Bayesian
  I Represents uncertainty about the unknown parameter
  I Uses probability to quantify this uncertainty: unknown parameters as random variables
  I Prediction follows the rules of probability
Likelihood
assumption: normally distributed measurement noise

P(z|p) ∝ exp( −Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_{kl} − y_k(t_l, p))² / (2σ_{kl}²) )

maximum likelihood estimate (MLE):

p_MLE = argmax_p P(z|p)
Bayes' theorem

Joint probability = conditional probability × marginal probability:

P(p, z) = P(p|z)P(z) = P(z|p)P(p)

⇒ P(p|z) = P(z|p)P(p) / P(z)

P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of the parameter given the data)
P(z) = ∫ P(z|p)P(p) dp: evidence

maximum a posteriori estimate (MAP):

p_MAP = argmax_p P(p|z)
Evaluation of posterior
P(p|z) ∝ P(z|p)P(p)

I evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence?
I consider the full high-dimensional posterior PDF for statistical inference
I necessity of a prior
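A random-walk Metropolis-Hastings sampler for an unnormalized posterior can be sketched in a few lines (a minimal illustration, not the tuned samplers used in practice; all names and defaults are illustrative):

```python
import numpy as np

def metropolis_hastings(log_post, p0, n_samples=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for an unnormalized
    log-posterior log P(p|z) ~ log P(z|p) + log P(p)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    lp = log_post(p)
    samples = []
    for _ in range(n_samples):
        prop = p + step * rng.standard_normal(p.shape)   # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:         # accept/reject
            p, lp = prop, lp_prop
        samples.append(p)
    return np.array(samples)
```

Because only differences of log-posterior values enter the acceptance step, the evidence P(z) never has to be computed — the main practical appeal of MCMC.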
Linear least squares
y(t, p) = B(t) p, B(t) ∈ R^{d×q}

Measurement time points: t₁, …, t_M ∈ R
Measurement values: z₁, …, z_M ∈ R^d

Set A = [B(t₁), …, B(t_M)]^T, z = [z₁, …, z_M]^T.

Linear least-squares problem: for given z ∈ R^m and A ∈ R^{m×q} with m ≥ q, find p ∈ R^q such that

‖z − Ap‖₂ → min
Example Reaction kinetics
I Arrhenius law: dependence of the rate constant K on the temperature T:

K(T) = C · exp(−E / (R T))

I gas constant R = 8.3145 J/(mol·K)
I unknown parameters: pre-exponential factor C, activation energy E
I measurements: (T_i, K_i), i = 1, …, m = 21

[Figure: measured rate constants K (of order 10⁻³) versus temperature T between 720 and 820 K.]
Example Reaction kinetics
minimization problem:

log C − E / (R T_i) = log K_i, i = 1, …, 21 ⇒ ‖log K − A (log C, E)^T‖₂ = min

with A_{i1} = 1, A_{i2} = −1/(R T_i), i = 1, …, 21
[Figure: measurements and fit, K versus T (left) and log K versus T (right).]
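The linearized Arrhenius fit is a two-parameter linear least-squares problem; a numpy sketch. The data here are synthetic, generated from assumed values C = 10⁹ and E = 2×10⁵ J/mol (the slide's measured values are not reproduced):

```python
import numpy as np

R = 8.3145                           # gas constant, J/(mol*K)

# synthetic noiseless measurements from assumed "true" parameters
T = np.linspace(720.0, 820.0, 21)
C_true, E_true = 1.0e9, 2.0e5
K = C_true * np.exp(-E_true / (R * T))

# design matrix for log K_i = log C - E/(R*T_i)
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
C_est, E_est = np.exp(coef[0]), coef[1]
```

With noiseless data the fit recovers the generating parameters up to rounding; with real measurements the residual reflects the measurement errors.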
The method of normal equations
[Figure: geometric interpretation: z, its projection Ap onto R(A), and the residual Ap − z orthogonal to R(A).]

‖z − Ap‖₂ = min ⇔ z − Ap ∈ R(A)^⊥

R(A) = {Ap : p ∈ R^q} (range space of A)

‖z − Ap‖₂ = min_{p′ ∈ R^q} ‖z − Ap′‖₂ ⇔ A^T A p = A^T z

uniquely solvable ⇔ A^T A invertible ⇔ rank(A) = q

⇒ solution via Cholesky decomposition (A^T A is spd)

if rank(A) = q: κ₂(A)² = λ_max(A^T A) / λ_min(A^T A) = κ₂(A^T A) (without proof)

We need a more stable method than solving the normal equations.
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. Q^T Q = I. Then

‖z − Ap‖₂² = (z − Ap)^T Q^T Q (z − Ap) = ‖Q(z − Ap)‖₂²

The Euclidean norm ‖·‖₂ is invariant under orthogonal transformations:

‖z − Ap‖₂ = min ⇔ ‖Q(z − Ap)‖₂ = min

Choice of the orthogonal transformation Q ∈ R^{m×m}?
QR factorization
A = Q [R₁; 0], Q ∈ R^{m×m} orthogonal, R₁ ∈ R^{q×q} upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

‖z − Ap‖₂² = ‖Q^T(z − Ap)‖₂² = ‖[z̄₁ − R₁p; z̄₂]‖₂² = ‖z̄₁ − R₁p‖₂² + ‖z̄₂‖₂² ≥ ‖z̄₂‖₂²

where [z̄₁; z̄₂] = Q^T z

rank(A) = q ⇒ rank(R₁) = q ⇒ R₁ is invertible

‖z − Ap‖₂ = min ⇔ p = R₁⁻¹ z̄₁, with residual ‖r‖₂² = ‖z̄₂‖₂²

The QR factorization is stable, since κ₂(A) = κ₂(R₁).
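A small numpy comparison of the QR route against the normal equations (illustrative random data; for this well-conditioned example both agree, the difference only shows for ill-conditioned A):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((8, 3))            # m = 8, q = 3, full rank
z = rng.standard_normal(8)

Q, R1 = np.linalg.qr(A)                    # reduced QR: Q (8x3), R1 (3x3)
p_qr = np.linalg.solve(R1, Q.T @ z)        # R1 p = z1

p_ne = np.linalg.solve(A.T @ A, A.T @ z)   # normal equations, for comparison
```

The QR variant works with κ₂(A) instead of κ₂(A^T A) = κ₂(A)², which is what makes it the preferred method.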
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:

Q^T A = (H_q H_{q−1} ⋯ H₂ H₁) A = R

For k = 1, …, q we obtain H_k = diag(I_{k−1}, H̃_k) ∈ R^{m×m}.

[Scheme: A^{(k)} = H_k A^{(k−1)}; step k leaves the first k−1 rows unchanged, zeroes the entries below the diagonal in column k, and modifies only the entries of the trailing block.]
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim

|r11| ≥ |r22| ≥ ⋯ ≥ |rqq|

step k:

A^{(k−1)} = [R S; 0 T^{(k)}]

Denote column j of T^{(k)} by T_j^{(k)}.

Pivot strategy: push the column of T^{(k)} with maximal 2-norm to the front, so that after the exchange

|α_k| = |r_kk| = ‖T_1^{(k)}‖₂ = max_j ‖T_j^{(k)}‖₂

Finally, AΠ = QR with a permutation matrix Π.
Rank decision
rank(A) = n < q:

Q^T A Π = [R S; 0 0] (exact) = [R S; 0 T^{(n+1)}] (in finite precision), T^{(n+1)} "small"

Note: the exact rank n cannot be computed, since by rounding errors T^{(n+1)} ≠ 0, i.e. |r_{n+1,n+1}| ≠ 0.

Idea: stop the factorization as soon as

|r_{k+1,k+1}| < δ|r11|, δ: relative precision of A

Definition (numerical rank n):

|r_{n+1,n+1}| < δ|r11| ≤ |rnn|

choice of δ ⇒ decision for n
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r11| / |rqq|

Properties:
I sc(A) ≥ 1
I sc(αA) = sc(A)
I A ≠ 0 singular ⇔ sc(A) = ∞
I sc(A) ≤ κ₂(A) (hence the name)

A is called almost singular if

δ · sc(A) ≥ 1, or equivalently δ|r11| ≥ |rqq|
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least-squares problem

‖D_row⁻¹(A D_col D_col⁻¹ p − z)‖₂ = ‖Ā p̄ − z̄‖₂ → min

where Ā = D_row⁻¹ A D_col, p̄ = D_col⁻¹ p, z̄ = D_row⁻¹ z

I row (measurement) scaling: D_row = diag(δz₁, …, δz_m), e.g. relative error concept δz_j = ε|z_j|
I column (parameter) scaling: D_col = diag(p_{1,scal}, …, p_{q,scal}) (to account for different physical units)

In general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A)!
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A⁻¹z, where A⁻¹A = AA⁻¹ = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer of

‖Ap − z‖₂ = min

(2a) n = q: ‖Ap − z‖₂ = min ⇔ A^T A p = A^T z (normal equations), uniquely solvable since A^T A is nonsingular:

p = (A^T A)⁻¹ A^T z =: A⁺ z

(2b) n < q: write p̄ = A⁺z, A⁺ generalized (pseudo) inverse
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by

(i) (A⁺A)^T = A⁺A
(ii) (AA⁺)^T = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A

One can show: p* = A⁺z is the shortest solution, i.e.

‖p*‖₂ ≤ ‖p‖₂ ∀ p ∈ L(z)

with

L(z) = {p ∈ R^q : ‖z − Ap‖₂ = min} = p* + N(A)
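The four axioms and the shortest-solution property are easy to check numerically with numpy's `pinv` on a rank-deficient example (here the third column is the sum of the first two):

```python
import numpy as np

A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])     # rank 2: col3 = col1 + col2
Ap = np.linalg.pinv(A)

ax1 = np.allclose((Ap @ A).T, Ap @ A)   # (A+ A)^T = A+ A
ax2 = np.allclose((A @ Ap).T, A @ Ap)   # (A A+)^T = A A+
ax3 = np.allclose(Ap @ A @ Ap, Ap)      # A+ A A+ = A+
ax4 = np.allclose(A @ Ap @ A, A)        # A A+ A = A
```

Adding any null-space vector to p* = A⁺z leaves the residual unchanged but strictly increases the norm, illustrating L(z) = p* + N(A).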
Computation of shortest solution
Q^T A Π = [R S; 0 0], A ∈ R^{m×q}, rank(A) = n, R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

Π^T p = [p₁; p₂], p₁ ∈ R^n, p₂ ∈ R^{q−n}

Q^T z = [z₁; z₂], z₁ ∈ R^n, z₂ ∈ R^{m−n}

⇒ ‖z − Ap‖₂² = ‖Q^T z − Q^T A p‖₂² = ‖z₁ − R p₁ − S p₂‖₂² + ‖z₂‖₂²

‖z − Ap‖₂² = min ⇔ z₁ − R p₁ − S p₂ = 0 ⇔ p₁ = R⁻¹(z₁ − S p₂)

Set V = R⁻¹S, u = R⁻¹z₁. Then

‖p‖² = ‖Π^T p‖² = ‖p₁‖² + ‖p₂‖² = ‖u − V p₂‖² + ‖p₂‖²
     = ‖u‖² − 2⟨u, V p₂⟩ + ⟨V p₂, V p₂⟩ + ⟨p₂, p₂⟩
     = ‖u‖² + ⟨p₂, (I + V^T V) p₂ − 2 V^T u⟩ =: φ(p₂)

φ(p₂) = min if φ′(p₂) = 2(I + V^T V) p₂ − 2 V^T u = 0
⇔ (I + V^T V) p₂ = V^T u (solution by Cholesky decomposition; I + V^T V is spd)

φ″(p₂) = 2(I + V^T V) spd ⇒ p₂ is a minimum

Note: if n < q/2, a faster algorithm exists.
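The whole procedure can be sketched with numpy only: a small Householder QR with column pivoting, followed by the spd system for p₂. The names `qr_pivot` and `shortest_solution` are illustrative, and the (numerical) rank n is assumed to be known:

```python
import numpy as np

def qr_pivot(A):
    """Householder QR with column pivoting: A[:, piv] = Q @ R."""
    A = A.astype(float).copy()
    m, q = A.shape
    Q = np.eye(m)
    piv = np.arange(q)
    for k in range(min(m, q)):
        # pivot: bring the trailing column of maximal 2-norm to position k
        j = k + int(np.argmax(np.linalg.norm(A[k:, k:], axis=0)))
        A[:, [k, j]] = A[:, [j, k]]
        piv[[k, j]] = piv[[j, k]]
        # Householder reflection zeroing A[k+1:, k]
        x = A[k:, k]
        alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
        v = x.copy()
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv > 1e-14:
            v /= nv
            A[k:, :] -= 2.0 * np.outer(v, v @ A[k:, :])
            Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, A, piv

def shortest_solution(A, z, n):
    """Minimum-norm least-squares solution, following the derivation above."""
    Q, Rfull, piv = qr_pivot(A)
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = np.linalg.solve(R, S)
    u = np.linalg.solve(R, z1)
    p2 = np.linalg.solve(np.eye(A.shape[1] - n) + V.T @ V, V.T @ u)
    p1 = u - V @ p2
    p = np.empty(A.shape[1])
    p[piv] = np.concatenate([p1, p2])    # undo the column permutation
    return p
```

On a rank-deficient matrix the result coincides with the pseudoinverse solution A⁺z, as it must.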
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M

Parameters: p = (p₁, …, p_q)^T

Model: y(t, p): R × R^q → R^d, nonlinear w.r.t. p

Measurement tolerances: δz_i ∈ R^d, D_i = diag(δz_i)

Define

F(p) = [D₁⁻¹(z₁ − y(t₁, p)); … ; D_M⁻¹(z_M − y(t_M, p))] ∈ R^m, m ≤ M·d
Problem formulation
Nonlinear least-squares problem:

Σ_{i=1}^{M} ‖D_i⁻¹(z_i − y(t_i, p))‖₂² = F(p)^T F(p) = ‖F(p)‖₂² = min

Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖₂² = min_{p ∈ D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if

F(p*) = 0.
Supplement Multivariable calculus
Consider p = (p₁, …, p_q)^T ∈ R^q and a scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = (∂g/∂p₁, …, ∂g/∂p_q)^T

Consider the vector function F(p) = (F₁(p), …, F_m(p))^T: R^q → R^m. Its derivative is the Jacobian matrix

F′(p) = [∂F₁/∂p₁ ⋯ ∂F₁/∂p_q ; ⋮ ; ∂F_m/∂p₁ ⋯ ∂F_m/∂p_q] ∈ R^{m×q}
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖₂² = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2 F′(p*)^T F(p*) = 0 and g″(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F′(p)^T F(p)

(a system of q nonlinear equations)
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor series expansion around a starting guess p^0:

G(p) = G(p^0) + G′(p^0)(p − p^0) + O(‖p − p^0‖²) for p → p^0
     ≈ G(p^0) + G′(p^0)(p − p^0) =: Ḡ(p)

Zero p¹ of the linear substitute map Ḡ:

G′(p^0)∆p^0 = −G(p^0), p¹ = p^0 + ∆p^0

Newton iteration:

G′(p^k)∆p^k = −G(p^k), p^{k+1} = p^k + ∆p^k

system of nonlinear equations ⇒ sequence of linear systems
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)

iteration in terms of F(p):

(F′(p^k)^T F′(p^k) + F″(p^k)^T F(p^k)) ∆p^k = −F′(p^k)^T F(p^k)

I compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)
I almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p^k)^T F′(p^k) ∆p^k = −F′(p^k)^T F(p^k), p^{k+1} = p^k + ∆p^k
Gauss-Newton iteration
F′(p^k)^T F′(p^k) ∆p^k = −F′(p^k)^T F(p^k), p^{k+1} = p^k + ∆p^k

corresponds to the normal equations for the linear least-squares problem

‖F′(p^k)∆p^k + F(p^k)‖₂ = min

formal solution:

∆p^k = −F′(p^k)⁺F(p^k)

in case of convergence: F′(p*)⁺F(p*) = 0

if rank(F′(p)) = q ⇒ F′(p)⁺ = (F′(p)^T F′(p))⁻¹ F′(p)^T
Classification
I local GN methods require a sufficiently good starting guess p^0
I global GN methods can handle bad guesses as well
I m = q: guaranteed local convergence
I m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 5
Problem formulation
Model y(t p) = (y1(t p) yd(t p)) isin Rd
y prime = f (y p) y(t0) = y0
Usually nonlinear in p
Parameters p = (p1 pq) isin Rq
Data zkl asymp yk(tl p) k = 1 d l = 1 Mk
mostly m =sum
k Mk q data compression
Task fit the model (ODE) to data estimate parameters
Parameter Identification Susanna Roblitz 6
Problem formulation
Questions
I What algorithm to use
I How good is the fit
I Is there a better model
I Sensitivity
Parameter Identification Susanna Roblitz 7
Problem formulation
minimize least squares error
F (p)22 =
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
rarr min
σkl measurement error (standard deviation)
0 5 10 15minus2
0
2
4
6
8
10
12
t
y
measurement values
model function
residues
Parameter Identification Susanna Roblitz 8
Approaches
• Frequentist
  – Assumes that there is an unknown but fixed parameter p
  – Estimates p with some confidence
  – Prediction by using the estimated parameter value
• Bayesian
  – Represents uncertainty about the unknown parameter
  – Uses probability to quantify this uncertainty: unknown parameters as random variables
  – Prediction follows the rules of probability
Parameter Identification Susanna Roblitz 9
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 10
Likelihood
Assumption: normally distributed measurement noise. Then
P(z|p) ∝ exp( − Σ_{k=1}^d Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))² / (2σ_kl²) )
Maximum likelihood estimate (MLE):
p_MLE = argmax_p P(z|p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = conditional probability × marginal probability:
P(p, z) = P(p|z) P(z) = P(z|p) P(p)
⇒ P(p|z) = P(z|p) P(p) / P(z)
P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of the parameters given the data)
P(z) = ∫ P(z|p) P(p) dp: evidence
Maximum a posteriori estimate (MAP):
p_MAP = argmax_p P(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) ∝ P(z|p) P(p)
• evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence?
• consider the full, high-dimensional posterior PDF for statistical inference
• necessity of a prior
Parameter Identification Susanna Roblitz 13
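The MCMC evaluation of the posterior can be sketched with a minimal random-walk Metropolis-Hastings sampler. The one-parameter decay model y(t, p) = exp(−p t), the synthetic data, and all numerical values below are illustrative assumptions, not the lecture's implementation:

```python
import numpy as np

def log_posterior(p, t, z, sigma, prior_sd=10.0):
    """Unnormalized log P(p|z): Gaussian likelihood times Gaussian prior, p > 0."""
    if p <= 0.0:
        return -np.inf                            # prior support: p > 0
    resid = z - np.exp(-p * t)
    return -0.5 * np.sum((resid / sigma) ** 2) - 0.5 * (p / prior_sd) ** 2

def metropolis_hastings(log_post, p0, n_steps, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    p, lp = p0, log_post(p0)
    for k in range(n_steps):
        q = p + step * rng.standard_normal()      # propose
        lq = log_post(q)
        if np.log(rng.random()) < lq - lp:        # accept with prob min(1, ratio)
            p, lp = q, lq
        chain[k] = p                              # current state (repeated on rejection)
    return chain

# synthetic data generated from an assumed "true" parameter p = 0.5
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 20)
sigma = 0.02
z = np.exp(-0.5 * t) + sigma * rng.standard_normal(t.size)

chain = metropolis_hastings(lambda p: log_posterior(p, t, z, sigma),
                            p0=1.0, n_steps=5000)
posterior_mean = chain[1000:].mean()              # discard burn-in
```

Statistical inference then uses the whole chain (posterior mean, credible intervals), not just a point estimate.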
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t, p) = B(t) p, B(t) ∈ R^{d×q}
Measurement time points t_1, …, t_M ∈ R, measurement values z_1, …, z_M ∈ R^d.
Set A = [B(t_1), …, B(t_M)]^T, z = [z_1, …, z_M]^T.
Linear least-squares problem: for given z ∈ R^m and A ∈ R^{m×q} with m ≥ q, find p ∈ R^q such that
‖z − Ap‖₂ → min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
• Arrhenius law: dependence of the rate constant K on the temperature T:
  K(T) = C · exp(−E/(R T))
• gas constant R = 8.3145 J/(mol·K)
• unknown parameters: pre-exponential factor C, activation energy E
• measurements: (T_i, K_i), i = 1, …, m = 21
[Figure: measured rate constants K (up to about 1.2·10^−3) versus temperature T/K between 720 and 820]
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
Minimization problem: taking logarithms,
log C − E/(R T_i) = log K_i, i = 1, …, 21
⇒ ‖log K − A (log C, E)^T‖₂ = min with A_i1 = 1, A_i2 = −1/(R T_i).
[Figure: the data (T_i, K_i) on the original scale and the linearized data (T_i, log K_i)]
Parameter Identification Susanna Roblitz 17
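The linearized Arrhenius fit can be sketched as follows. The "measurements" are synthetic, generated from assumed values C = 5·10⁹ and E = 2·10⁵ J/mol, since the lecture's data are not reproduced here:

```python
import numpy as np

R = 8.3145                                  # gas constant, J/(mol K)

# hypothetical measurements generated from assumed C, E with mild noise
rng = np.random.default_rng(0)
T = np.linspace(720.0, 820.0, 21)
C_true, E_true = 5.0e9, 2.0e5
K = C_true * np.exp(-E_true / (R * T)) * np.exp(0.01 * rng.standard_normal(T.size))

# linearization: log K_i = log C - E/(R T_i)  =>  A @ (log C, E)^T = log K
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
C_est, E_est = np.exp(coef[0]), coef[1]
```

Note that the fit is done in log space, so the implicit error model is relative rather than absolute.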
The method of normal equations
[Figure: geometric view — the residual z − Ap* is orthogonal to the range space R(A)]
‖z − Ap‖₂ = min ⇔ z − Ap ∈ R(A)^⊥, where R(A) = {Ap : p ∈ R^q} (range space of A)
‖z − Ap‖₂ = min_{p′∈R^q} ‖z − Ap′‖₂ ⇔ A^T A p = A^T z (normal equations)
uniquely solvable ⇔ A^T A invertible ⇔ rank(A) = q
⇒ solution via Cholesky decomposition (A^T A is spd)
But if rank(A) = q, then (without proof) κ₂(A)² = λ_max(A^T A)/λ_min(A^T A) = κ₂(A^T A):
the condition number is squared, so we need a more stable method than the normal equations.
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. Q^T Q = I. Then
‖z − Ap‖₂² = (z − Ap)^T Q^T Q (z − Ap) = ‖Q(z − Ap)‖₂²
The Euclidean norm ‖·‖₂ is invariant under orthogonal transformations:
‖z − Ap‖₂ = min ⇔ ‖Q(z − Ap)‖₂ = min
Choice of the orthogonal transformation Q ∈ R^{m×m}?
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization:
A = Q (R₁; 0), Q ∈ R^{m×m} orthogonal, R₁ ∈ R^{q×q} upper triangular
(MATLAB: [Q,R] = qr(A)).
Solution of the least-squares problem: with Q^T z = (z₁; z₂), z₁ ∈ R^q,
‖z − Ap‖₂² = ‖Q^T(z − Ap)‖₂² = ‖(z₁ − R₁p; z₂)‖₂² = ‖z₁ − R₁p‖₂² + ‖z₂‖₂² ≥ ‖z₂‖₂²
rank(A) = q ⇒ rank(R₁) = rank(A) = q ⇒ R₁ is invertible:
‖z − Ap‖₂ = min ⇔ p = R₁^{−1} z₁, with residual ‖r‖₂ = ‖z₂‖₂
The QR factorization is stable since κ₂(A) = κ₂(R).
Parameter Identification Susanna Roblitz 20
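A sketch of the QR-based solve (reduced QR, then back substitution); the random full-rank test problem is an assumed stand-in for real data:

```python
import numpy as np

def lls_qr(A, z):
    """Solve min ||z - A p||_2 via QR factorization -- the stable
    alternative to forming the normal equations A^T A p = A^T z."""
    Q, R1 = np.linalg.qr(A)            # reduced QR: Q is m x q, R1 is q x q
    z1 = Q.T @ z                       # first q components of Q^T z
    return np.linalg.solve(R1, z1)     # p = R1^{-1} z1

# random full-rank test problem
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
p_true = np.array([1.0, -2.0, 0.5])
z = A @ p_true + 1e-8 * rng.standard_normal(50)

p_qr = lls_qr(A, z)
```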
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:
Q^T A = (H_q H_{q−1} ··· H_2 H_1) A = R
For k = 1, …, q we obtain H_k = diag(I_{k−1}, H_k′) ∈ R^{m×m}; the step A^(k) = H_k A^(k−1) annihilates the entries below the diagonal in column k and modifies only the entries of the trailing block (rows k, …, m, columns k, …, q).
Parameter Identification Susanna Roblitz 21
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)
Column permutation with the aim |r_11| ≥ |r_22| ≥ … ≥ |r_qq|.
Step k:
A^(k−1) = (R S; 0 T^(k))
Denote column j of T^(k) by T_j^(k).
Pivot strategy: push the column of T^(k) with maximal 2-norm to the front, so that after the exchange
|α_k| = |r_kk| = ‖T_1^(k)‖₂ = max_j ‖T_j^(k)‖₂.
Finally AΠ = QR with a permutation matrix Π.
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:
Q^T A Π = (R S; 0 0) in exact arithmetic, but = (R S; 0 T^(n+1)) in floating-point arithmetic, with T^(n+1) "small".
Note: the exact rank n cannot be computed, since by rounding errors T^(n+1) ≠ 0, i.e. |r_{n+1,n+1}| ≠ 0.
Idea: stop the factorization as soon as
|r_{k+1,k+1}| < δ |r_11|, δ: relative precision of A.
Definition (numerical rank n):
|r_{n+1,n+1}| < δ |r_11| ≤ |r_nn|
Choice of δ ⇒ decision for n.
Parameter Identification Susanna Roblitz 23
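The rank decision rule |r_{n+1,n+1}| < δ|r_11| ≤ |r_nn| can be sketched with SciPy's column-pivoted QR; the test matrices and the choice of δ are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta=1e-10):
    """Numerical rank: number of pivoted-QR diagonal entries with
    |r_kk| >= delta * |r_11| (diagonal is non-increasing in magnitude)."""
    _, R, _ = qr(A, pivoting=True)     # A P = Q R with column pivoting
    d = np.abs(np.diag(R))
    return int(np.sum(d >= delta * d[0]))

rng = np.random.default_rng(0)
B = rng.standard_normal((20, 2))
# 4 columns, but columns 3 and 4 are linear combinations of the first two
A_deficient = np.column_stack([B, B @ [1.0, 2.0], B @ [3.0, -1.0]])
A_full = rng.standard_normal((20, 4))

rank_deficient = numerical_rank(A_deficient)   # exact rank is 2
rank_full = numerical_rank(A_full)             # full column rank, 4
```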
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)
Definition (subcondition): sc(A) = |r_11| / |r_qq|
Properties:
• sc(A) ≥ 1
• sc(αA) = sc(A)
• A ≠ 0 singular ⇔ sc(A) = ∞
• sc(A) ≤ κ₂(A) (hence the name)
A is called almost singular if δ · sc(A) ≥ 1, or equivalently δ |r_11| ≥ |r_qq|.
Parameter Identification Susanna Roblitz 24
Scaling
D_row, D_col: diagonal matrices.
We consider the scaled least-squares problem
‖D_row^{−1}(A D_col D_col^{−1} p − z)‖₂ = ‖Ā p̄ − z̄‖₂ → min,
where Ā = D_row^{−1} A D_col, p̄ = D_col^{−1} p, z̄ = D_row^{−1} z.
• row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε |z_j|
• column (parameter) scaling: D_col = diag(p_1,scal, …, p_q,scal) (to account for different physical units)
In general sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A)!
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A regular):
⇒ Ap = z has the unique solution p = A^{−1} z, where A^{−1}A = AA^{−1} = I_q.
(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m:
⇒ usually Ap = z has no solution, but there is a minimizer of ‖Ap − z‖₂.
(2a) n = q: ‖Ap − z‖₂ = min ⇔ A^T A p = A^T z (normal equations), uniquely solvable since A^T A is nonsingular:
p = (A^T A)^{−1} A^T z =: A⁺ z
(2b) n < q: write p = A⁺ z, with A⁺ the generalized (pseudo-)inverse.
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by
(i) (A⁺A)^T = A⁺A
(ii) (AA⁺)^T = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A
One can show: p* = A⁺ z is the shortest solution, i.e.
‖p*‖₂ ≤ ‖p‖₂ for all p ∈ L(z),
with L(z) = {p ∈ R^q : ‖z − Ap‖₂ = min} = p* + N(A).
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
Q^T A Π = (R S; 0 0), A ∈ R^{m×q}, rank(A) = n,
R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}.
Partition
Π^T p = (p_1; p_2), p_1 ∈ R^n, p_2 ∈ R^{q−n},
Q^T z = (z_1; z_2), z_1 ∈ R^n, z_2 ∈ R^{m−n}.
⇒ ‖z − Ap‖₂² = ‖Q^T z − Q^T A p‖₂² = ‖z_1 − R p_1 − S p_2‖₂² + ‖z_2‖₂²
‖z − Ap‖₂² = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^{−1}(z_1 − S p_2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
Set V = R^{−1} S and u = R^{−1} z_1. Then
‖p‖₂² = ‖Π^T p‖₂² = ‖p_1‖₂² + ‖p_2‖₂² = ‖u − V p_2‖₂² + ‖p_2‖₂²
      = ‖u‖₂² − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
      = ‖u‖₂² + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩ =: φ(p_2)
φ(p_2) = min if φ′(p_2) = 2(I + V^T V) p_2 − 2 V^T u = 0
⇔ (I + V^T V) p_2 = V^T u (solve by Cholesky decomposition)
φ″(p_2) = 2(I + V^T V) spd ⇒ p_2 is a minimum.
Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
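The "shortest solution" property p* = A⁺z can be checked numerically with the Moore-Penrose pseudoinverse; the rank-deficient test matrix below is an assumed example:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((10, 2))
A = np.column_stack([B, B[:, 0] + B[:, 1]])    # q = 3 columns, rank 2
z = rng.standard_normal(10)

p_star = np.linalg.pinv(A) @ z                 # shortest solution p* = A^+ z

# every least-squares solution is p* + (element of N(A));
# by construction (1, 1, -1)^T spans the null space here
null_vec = np.array([1.0, 1.0, -1.0])
p_other = p_star + 0.7 * null_vec              # same residual, but longer vector
```

Since p* lies in N(A)^⊥, adding any null-space component can only increase the norm.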
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M
Parameters: p = (p_1, …, p_q)^T
Model: y(t, p): R × R^q → R^d, nonlinear w.r.t. p
Measurement tolerances: δz_i ∈ R^d, D_i = diag(δz_i)
Define
F(p) = (D_1^{−1}(z_1 − y(t_1, p)); …; D_M^{−1}(z_M − y(t_M, p))) ∈ R^m, m ≤ M · d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Σ_{i=1}^{M} ‖D_i^{−1}(z_i − y(t_i, p))‖₂² = F(p)^T F(p) = ‖F(p)‖₂² = min
Nonlinear least-squares problem: given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that
‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂².
The nonlinear least-squares problem is called compatible if F(p*) = 0.
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p_1, …, p_q)^T ∈ R^q and a scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):
g′(p) = ∇g(p) = (∂g/∂p_1, …, ∂g/∂p_q)^T
Consider a vector-valued function F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix
F′(p) = [∂F_i(p)/∂p_j]_{i=1,…,m; j=1,…,q} ∈ R^{m×q}.
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem: g(p) = ‖F(p)‖₂² = min.
Sufficient condition for p* to be a local minimum:
g′(p*) = 2 F′(p*)^T F(p*) = 0 and g″(p*) positive definite.
⇒ Find, by Newton iteration, a zero of the function G: R^q → R^q,
G(p) = F′(p)^T F(p)
(a system of q nonlinear equations).
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p_0:
G(p) = G(p_0) + G′(p_0)(p − p_0) + o(‖p − p_0‖) for p → p_0,
Ĝ(p) := G(p_0) + G′(p_0)(p − p_0).
Zero p_1 of the linear substitute map Ĝ:
G′(p_0) Δp_0 = −G(p_0), p_1 = p_0 + Δp_0.
Newton iteration:
G′(p_k) Δp_k = −G(p_k), p_{k+1} = p_k + Δp_k
System of nonlinear equations ⇒ sequence of linear systems.
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)
Newton iteration in terms of F(p):
(F′(p_k)^T F′(p_k) + F″(p_k)^T F(p_k)) Δp_k = −F′(p_k)^T F(p_k)
• compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)
• almost compatible case: F(p*) is "small" ⇒ drop the second-order term and save the effort of evaluating F″(p)
⇒ Gauss-Newton iteration:
F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k), p_{k+1} = p_k + Δp_k
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k), p_{k+1} = p_k + Δp_k
corresponds to the normal equations for the linear least-squares problem
‖F′(p_k) Δp_k + F(p_k)‖₂ = min
with formal solution
Δp_k = −F′(p_k)⁺ F(p_k);
in case of convergence: F′(p*)⁺ F(p*) = 0.
If rank(F′(p)) = q: F′(p)⁺ = (F′(p)^T F′(p))^{−1} F′(p)^T.
Parameter Identification Susanna Roblitz 37
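A minimal sketch of the local, undamped Gauss-Newton iteration Δp_k = −F′(p_k)⁺F(p_k); the exponential model, the compatible data, and the starting guess are assumptions for illustration:

```python
import numpy as np

def gauss_newton(F, Fprime, p0, tol=1e-12, max_iter=50):
    """Local Gauss-Newton iteration: p_{k+1} = p_k - F'(p_k)^+ F(p_k)."""
    p = np.array(p0, dtype=float)
    for _ in range(max_iter):
        dp = -np.linalg.pinv(Fprime(p)) @ F(p)
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# compatible test problem: data generated exactly by the model y = p1 * exp(p2 t)
t = np.linspace(0.0, 2.0, 15)
p_true = np.array([2.0, -1.3])
z = p_true[0] * np.exp(p_true[1] * t)

F = lambda p: p[0] * np.exp(p[1] * t) - z                      # residual vector
Fprime = lambda p: np.column_stack([np.exp(p[1] * t),          # dF/dp_1
                                    p[0] * t * np.exp(p[1] * t)])  # dF/dp_2

p_fit = gauss_newton(F, Fprime, p0=[1.8, -1.2])
```

Because the problem is compatible and the starting guess is close, the iteration converges quadratically; for a poor starting guess this plain version may diverge (hence the damped variant later in the slides).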
Classification
• local GN methods: require a sufficiently good starting guess p_0
• global GN methods: can handle poor starting guesses as well
• m = q: guaranteed local convergence
• m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p_0:
• F: D → R^m continuously differentiable, D ⊆ R^q open and convex
• F′(p): possibly rank-deficient Jacobian
• F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:
  ‖F′(p̂)⁺(F′(p̄) − F′(p))(p̄ − p)‖₂ ≤ ω ‖p̄ − p‖₂²   (i)
  for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) with p̄ − p ∈ R(F′(p̂)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
• compatibility condition (model and data must somehow fit each other):
  ‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p) ‖p̄ − p‖ for all p, p̄ ∈ D,
  with κ(p) ≤ κ < 1 for all p ∈ D   (ii)
• the initial correction Δp_0 must be small enough, and a ball B_ρ(p_0) of radius ρ around p_0 must be contained in D:
  ‖Δp_0‖ ≤ α and B_ρ(p_0) ⊂ D,   (iii)
  where h := αω < 2(1 − κ) and ρ := α / (1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p_0), and converges to some p* ∈ B_ρ(p_0) with
F′(p*)⁺ F(p*) = 0.
The convergence rate can be estimated according to
‖p_{k+1} − p_k‖₂ ≤ (ω/2) ‖p_k − p_{k−1}‖₂² + κ(p_{k−1}) ‖p_k − p_{k−1}‖₂   (∗)
This result shows existence of a solution p* in the sense F′(p*)⁺F(p*) = 0 even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).
Uniqueness, however, requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
• Linear least-squares problems, F(p) = Ap − z:
  F′(p) = (Ap − z)′ = A = F′(p̄) ⇒ ω = 0   (i)
  F′(p̄)⁺(I − F′(p)F′(p)⁺) = A⁺(I − AA⁺) = 0 ⇒ κ(p) ≡ κ = 0   (ii)
  (∗) then gives ‖p_2 − p_1‖ ≤ 0, i.e. Δp_1 = 0 and p_1 = p*:
  convergence within one iteration.
• Systems of nonlinear equations (m = q):
  if F′(p) is nonsingular ⇒ F′(p)⁺ = F′(p)^{−1} ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) ≡ 0:
  quadratic convergence.
• Compatible problems: F(p*) = 0 ⇒ κ(p*) = 0.
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q): by (∗),
lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖ ≤ lim_{k→∞} ( (ω/2) ‖p_k − p_{k−1}‖ + κ(p_{k−1}) ) = κ(p*)
κ(p*) is the asymptotic convergence rate.
Adequate nonlinear least-squares problems are characterized by the condition κ(p*) < 1.
Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / termination criteria
Define the simplified Gauss-Newton corrections
Δp̄_{k+1} := −F′(p_k)⁺ F(p_{k+1}).
Stop the iteration as soon as linear convergence dominates the iteration process:
‖Δp̄_{k+1}‖ ≈ ‖Δp_{k+1}‖ ≤ XTOL
For incompatible problems in the convergent phase it holds that
‖Δp̄_{k+1}‖ ≈ κ ‖Δp_k‖, ‖Δp_{k+1}‖ ≈ κ² ‖Δp_k‖.
This provides an estimate for κ(p*):
κ(p*) ≈ ‖Δp̄_{k+1}‖ / ‖Δp_k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of the ordinary and simplified GN corrections versus iteration index k, decaying from 10^0 to about 10^−15]
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p_0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:
let p* be the final iterate and rank(F′(p*)) = n < q.
QR factorization with column pivoting and rank decision:
Q^T F′(p*) Π = R
⇒ reordering of the parameters to (p*_{π_1}, …, p*_{π_n}, …, p*_{π_q}) in such a way that
|r_11| ≥ |r_22| ≥ … ≥ |r_nn|
→ the parameters (p*_{π_1}, …, p*_{π_n}) can actually be estimated from the current dataset.
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function.
p = φ(u), F̄(u) := F(φ(u)) = F(p), F̄′(u) = F′(p) φ′(u)
constraint → transformation φ(u):
p > 0:       p = exp(u)
A ≤ p ≤ B:   p = A + (B − A)/2 · (1 + sin u)
p ≤ C:       p = C + (1 − √(1 + u²))
p ≥ C:       p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
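The transformation table can be sketched directly; a quick numerical check over arbitrary sample points (an assumption for illustration) confirms that each φ(u) respects its constraint:

```python
import numpy as np

def phi_positive(u):                 # constraint p > 0
    return np.exp(u)

def phi_box(u, A, B):                # constraint A <= p <= B
    return A + 0.5 * (B - A) * (1.0 + np.sin(u))

def phi_upper(u, C):                 # constraint p <= C (1 - sqrt(1+u^2) <= 0)
    return C + (1.0 - np.sqrt(1.0 + u ** 2))

def phi_lower(u, C):                 # constraint p >= C
    return C - (1.0 - np.sqrt(1.0 + u ** 2))

u = np.linspace(-10.0, 10.0, 201)    # arbitrary unconstrained sample points
```

The optimizer then works on u without constraints; by the chain rule, the Jacobian picks up the extra factor φ′(u).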
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p_k, so penalize large corrections:
‖F(p_k) + F′(p_k) Δp_k‖₂² + ρ ‖Δp_k‖₂² = min
⇔ (F′(p_k)^T F′(p_k) + ρ I_q) Δp_k = −F′(p_k)^T F(p_k)
• ρ → 0: Levenberg-Marquardt → Gauss-Newton
• ρ > 0: F′(p_k)^T F′(p_k) + ρ I_q is always regular → masking of non-uniqueness of a given solution and incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0.
Parameter Identification Susanna Roblitz 49
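One Levenberg-Marquardt step can be sketched as follows; the random Jacobian and residual are assumed test data. For ρ → 0 the step approaches the Gauss-Newton step, while a large ρ strongly damps (shortens) it:

```python
import numpy as np

def lm_step(J, Fval, rho):
    """Solve (J^T J + rho I) dp = -J^T F: one Levenberg-Marquardt step."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fval)

rng = np.random.default_rng(0)
J = rng.standard_normal((10, 3))       # stands in for the Jacobian F'(p_k)
Fv = rng.standard_normal(10)           # stands in for the residual F(p_k)

dp_gn = -np.linalg.pinv(J) @ Fv        # Gauss-Newton step for comparison
dp_small_rho = lm_step(J, Fv, 1e-12)   # ~ Gauss-Newton
dp_large_rho = lm_step(J, Fv, 1e6)     # heavily damped, very short step
```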
Scaling
• row (measurement) scaling:
  F̄(p) = D_row^{−1} F(p), D_row = diag(δz_1, …, δz_M)
  ‖F′(p_k)Δp_k + F(p_k)‖ = min vs. ‖D_row^{−1}(F′(p_k)Δp_k + F(p_k))‖ = min:
  a different row scaling leads to a different solution!
• column (parameter) scaling:
  p̄ = D_col^{−1} p, H(p̄) = F(D_col p̄) = F(p), H′(p̄) = F′(p) D_col
  With p̄_k = D_col^{−1} p_k:
  Δp̄_k = −H′(p̄_k)⁺ H(p̄_k) = −(F′(p_k) D_col)⁺ F(p_k)
        = −D_col^{−1} F′(p_k)⁺ F(p_k) = D_col^{−1} Δp_k if F′(p_k) has full rank,
  i.e. the iterates are invariant under column scaling.
  No invariance in the rank-deficient case!
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:
F′(p_k) Δp_k = −F(p_k), p_{k+1} = p_k + λ_k Δp_k, 0 < λ_k ≤ 1
How to choose λ_k?
Remember: we want to solve F′(p)⁺F(p) = 0.
Idea: choose λ_k in such a way that ‖F′(p_k)⁺F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).
The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test
‖Δp̄_{k+1}(λ_k)‖ / ‖Δp_k‖ < 1.
Divergence criterion: λ_k < λ_min ≪ 1.
Parameter Identification Susanna Roblitz 51
Codes
• NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares problems
• NLSCON: solver for Nonlinear Least-Squares problems with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters.
Model: y′ = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q.
Set of experimental data: (t_1, z_1, δz_1), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i).
F(p) = (D_1^{−1}(y(t_1, p) − z_1); …; D_M^{−1}(y(t_M, p) − z_M)),
F′(p) = (D_1^{−1} y_p(t_1, p); …; D_M^{−1} y_p(t_M, p)).
Gauss-Newton method:
Δp_k = −F′(p_k)⁺ F(p_k), p_{k+1} = p_k + λ_k Δp_k
Parameter Identification Susanna Roblitz 54
Computation of F (p)
Requires the solution of y′ = f(y, p) at the time points t_1, …, t_M.
Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M.
Solution: use an integrator with dense output.
Idea: construct an interpolation polynomial based on the available solution such that the integration accuracy is not destroyed.
Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
• sensitivity matrix at time t:
  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}, so that
  F′(p) = (D_1^{−1} S(t_1); …; D_M^{−1} S(t_M))
• scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
  S̄_ij(t) = ∂y_i/∂p_j (t) · p_j,scal / y_i,scal
  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation
y_p′ = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0.
Compact notation: S′ = f_y · S + f_p with S = [s_1, …, s_q], s_k ∈ R^d.
In practice, solve the combined system
(y′; s_1′; …; s_q′) = (f; f_y s_1 + f_{p_1}; …; f_y s_q + f_{p_q}) ∈ R^{(q+1)d},
which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
Parameter Identification Susanna Roblitz 57
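The combined system of model plus variational equation can be sketched on the scalar test problem y′ = −p y, y(0) = 2 (an assumed example, integrated with a hand-rolled RK4 instead of LIMEX). The sensitivity s = ∂y/∂p obeys s′ = f_y s + f_p = −p s − y, s(0) = 0, and can be checked against the analytic value ∂y/∂p = −2 t e^{−p t}:

```python
import numpy as np

def rhs(x, p):
    """Combined system for y' = -p*y: state x = (y, s) with s = dy/dp.
    Variational equation: s' = f_y * s + f_p = -p*s - y."""
    y, s = x
    return np.array([-p * y, -p * s - y])

def rk4(f, x0, t_end, n, p):
    """Classical 4th-order Runge-Kutta with n fixed steps."""
    h = t_end / n
    x = np.array(x0, dtype=float)
    for _ in range(n):
        k1 = f(x, p)
        k2 = f(x + 0.5 * h * k1, p)
        k3 = f(x + 0.5 * h * k2, p)
        k4 = f(x + h * k3, p)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

p, t_end = 0.8, 2.0
y_end, s_end = rk4(rhs, [2.0, 0.0], t_end, 400, p)

# analytic reference: y = 2 e^{-p t},  dy/dp = -2 t e^{-p t}
y_exact = 2.0 * np.exp(-p * t_end)
s_exact = -2.0 * t_end * np.exp(-p * t_end)
```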
Example
Case (T+E), data covering both the transient and the equilibrium phase — the Gauss-Newton iteration converges within a few steps at full rank 2:
p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994,
κ ≈ 0.008, sc(F′(p*)) = 6.3
Case (E), data covering the equilibrium phase only — the iteration converges at numerical rank 1; only the ratio (equilibrium constant) is identifiable:
p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622,
κ ≈ 0.004, sc(F′(p**)) = 4.5·10^6
[Table: iteration histories (k, ‖F(p_k)‖, ‖Δp_k‖, λ_k, rank) for both cases; figure: fitted concentrations of A, B, C, D over time]
Parameter Identification Susanna Roblitz 58
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
• SBML import and export
• deterministic simulation methods (LIMEX; others will follow)
• sensitivity analysis based on variational equations
• parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
• output of information on identifiability of parameters
• time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Problem formulation
Model y(t p) = (y1(t p) yd(t p)) isin Rd
y prime = f (y p) y(t0) = y0
Usually nonlinear in p
Parameters p = (p1 pq) isin Rq
Data zkl asymp yk(tl p) k = 1 d l = 1 Mk
mostly m =sum
k Mk q data compression
Task fit the model (ODE) to data estimate parameters
Parameter Identification Susanna Roblitz 6
Problem formulation
Questions
I What algorithm to use
I How good is the fit
I Is there a better model
I Sensitivity
Parameter Identification Susanna Roblitz 7
Problem formulation
minimize least squares error
F (p)22 =
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
rarr min
σkl measurement error (standard deviation)
0 5 10 15minus2
0
2
4
6
8
10
12
t
y
measurement values
model function
residues
Parameter Identification Susanna Roblitz 8
Approaches
I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value
I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown
parameters as random variablesI Prediction follows rules of probability
Parameter Identification Susanna Roblitz 9
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 10
Likelihood
assumption normally distributed measurement noise
P(z |p) prop exp
(minus
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
)
maximum likelihood estimate (MLE)
pMLE = argmaxpP(z |p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = Conditional Probability x MarginalProbability
P(p z) = P(p|z)P(z) = P(z |p)P(p)
P(p|z) =P(z |p)P(p)
P(z)
P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =
intP(z |p)P(p)dp evidence
maximum a posteriori estimate (MAP)
pMAP = argmaxpP(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) prop P(z |p)P(p)
I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence
I consider full high dimensional posterior PDF for statisticalinference
I necessity of a prior
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t p) = B(t)p B(t) isin Rdtimesq
Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd
Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T
Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat
z minus Ap2 rarr min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law dependence ofrate constant K ontemperature T
K (T ) = C middot exp
(minus E
RT
)I gas constant
R = 83145 JmolmiddotK
I unknown parameterspre-exponential factor Cactivation energy E
I measurements(Ti Ki) i = 1 m = 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
TK
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I
z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2
2
The Eucledian norm middot 2 is invariant under orthogonaltransformation
z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p_1, …, p_q)^T ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):

g′(p) = ∇g(p) = (∂g/∂p_1, …, ∂g/∂p_q)^T

Consider the vector F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix:

F′(p) = ( ∂F_1/∂p_1 ⋯ ∂F_1/∂p_q
          ⋮               ⋮
          ∂F_m/∂p_1 ⋯ ∂F_m/∂p_q ) ∈ R^{m×q}
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖_2^2 = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2 F′(p*)^T F(p*) = 0 and g″(p*) positive definite

Find by Newton iteration a zero of the function G: R^q → R^q,

G(p) = F′(p)^T F(p)

(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p_0:

G(p) = G(p_0) + G′(p_0)(p − p_0) + O(‖p − p_0‖^2) for p → p_0
     ≈ G(p_0) + G′(p_0)(p − p_0) =: Ḡ(p)

Zero p_1 of the linear substitute map Ḡ:

G′(p_0) Δp_0 = −G(p_0), p_1 = p_0 + Δp_0

Newton iteration:

G′(p_k) Δp_k = −G(p_k), p_{k+1} = p_k + Δp_k

system of nonlinear equations ⇒ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)

iteration in terms of F(p):

(F′(p_k)^T F′(p_k) + F″(p_k)^T F(p_k)) Δp_k = −F′(p_k)^T F(p_k)

- compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)
- almost compatible case: F(p*) is "small" ⇒ drop the second term and save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k), p_{k+1} = p_k + Δp_k
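A minimal, undamped Gauss-Newton sketch (helper names are ours). The step is computed as the linear least-squares solution of F′(p_k)Δp_k = −F(p_k), which is mathematically equivalent to the normal equations above but numerically more stable:

```python
import numpy as np

def gauss_newton(F, J, p0, tol=1e-12, max_iter=50):
    """Undamped Gauss-Newton. Each step solves the linear LSQ problem
    ||J(p_k) dp + F(p_k)||_2 = min (equivalent to the normal equations)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp = np.linalg.lstsq(J(p), -F(p), rcond=None)[0]
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p
```

On a compatible toy problem (our own example, exact data z = exp(0.3 t)) this converges quadratically from a nearby starting guess.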
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k), p_{k+1} = p_k + Δp_k

corresponds to the normal equations for the linear least squares problem

‖F′(p_k) Δp_k + F(p_k)‖_2 = min

formal solution:

Δp_k = −F′(p_k)^+ F(p_k)

in case of convergence: F′(p*)^+ F(p*) = 0

if rank(F′(p)) = q ⇒ F′(p)^+ = (F′(p)^T F′(p))^{−1} F′(p)^T
Parameter Identification Susanna Roblitz 37
Classification
- local GN methods: require a sufficiently good starting guess p_0
- global GN methods: can handle bad guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p_0:

- F: D → R^m continuously differentiable, D ⊆ R^q open and convex
- F′(p): possibly rank-deficient Jacobian
- F′ satisfies a particular Lipschitz condition (i.e., F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̂)^+ (F′(p̄) − F′(p))(p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2^2   (i)

for all collinear p, p̄, p̂ ∈ D (i.e., they lie on a single straight line) and p̄ − p ∈ R(F′(p̂)^+)
Parameter Identification Susanna Roblitz 39
Assumptions
- compatibility condition (model and data must somehow fit each other):

‖F′(p̄)^+ (I − F′(p) F′(p)^+) F(p)‖ ≤ κ(p) ‖p̄ − p‖  ∀ p, p̄ ∈ D,
κ(p) ≤ κ < 1  ∀ p ∈ D   (ii)

- the initial update Δp_0 must be small enough, and a ball B_ρ(p_0) with radius ρ around p_0 must be contained in D:

‖Δp_0‖ ≤ α and B_ρ(p_0) ⊂ D   (iii)

h := αω < 2(1 − κ), ρ := α / (1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p_0), and converges to some p* ∈ B_ρ(p_0) with

F′(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

‖p_{k+1} − p_k‖_2 ≤ (ω/2) ‖p_k − p_{k−1}‖_2^2 + κ(p_{k−1}) ‖p_k − p_{k−1}‖_2   (∗)

This result shows existence of a solution p* in the sense that F′(p*)^+ F(p*) = 0 even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
- linear least squares problems: F(p) = Ap − z

  F′(p̄) = (Ap̄ − z)′ = A = (Ap − z)′ = F′(p), so (i) holds with ω = 0
  F′(p̄)^+ (I − F′(p) F′(p)^+) = A^+ (I − A A^+) = 0, so (ii) holds with κ(p) ≡ κ = 0
  (∗) ⇒ ‖p_2 − p_1‖ ≤ 0 ⇒ ‖Δp_1‖ = 0 ⇒ p_1 = p*
  ⇒ convergence within one iteration

- systems of nonlinear equations (m = q):

  if F′(p) nonsingular ⇒ F′(p)^+ = F′(p)^{−1} ⇒ I − F′(p) F′(p)^+ = 0 ⇒ κ(p) = 0
  (∗) ⇒ quadratic convergence

- compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖ ≤(∗) lim_{k→∞} ( (ω/2)‖p_k − p_{k−1}‖ + κ(p_{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / termination criteria

Define

Δp̄_{k+1} := −F′(p_k)^+ F(p_{k+1})   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp_{k+1}‖ ≈ ‖Δp̄_{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds

‖Δp̄_{k+1}‖ ≈ κ ‖Δp_k‖, ‖Δp_{k+1}‖ ≈ κ^2 ‖Δp_k‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄_{k+1}‖ / ‖Δp_k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of ordinary vs. simplified Gauss-Newton corrections over iteration index k, log scale from 10^{−15} to 10^0]
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p_0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let p* be the final iterate, and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F′(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, …, p*_{π_n}, …, p*_{π_q}) in such a way that

|r_11| ≥ |r_22| ≥ ⋯ ≥ |r_nn|

→ the parameters (p*_{π_1}, p*_{π_2}, …, p*_{π_n}) can actually be estimated from the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u), F(p) = F(φ(u)) =: F̃(u), F̃′(u) = F′(p) φ′(u)

constraint        transformation p = φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u^2))
p ≥ C             p = C − (1 − √(1 + u^2))
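A toy illustration of the p = exp(u) transformation: the data, the one-parameter decay model, and the crude undamped Gauss-Newton loop below are all our own invented example, not from the slides.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 20)
z = np.exp(-0.5 * t)                   # synthetic data generated with p_true = 0.5

def F(u):
    return np.exp(-np.exp(u) * t) - z  # residual in u, with p = exp(u) > 0

u = np.array([0.0])                    # start at p = exp(0) = 1
for _ in range(30):                    # crude undamped Gauss-Newton in u
    p = np.exp(u[0])
    # chain rule: dF/du = (dF/dp) * (dp/du), and dp/du = exp(u) = p
    J = (-t * np.exp(-p * t) * p).reshape(-1, 1)
    u = u + np.linalg.lstsq(J, -F(u), rcond=None)[0]
```

Whatever value the iteration produces for u, the recovered parameter p = exp(u) is positive by construction.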
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p_k:

‖F(p_k) + F′(p_k) Δp_k‖_2^2 + ρ ‖Δp_k‖_2^2 = min

(F′(p_k)^T F′(p_k) + ρ I_q) Δp_k = −F′(p_k)^T F(p_k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F′(p_k)^T F′(p_k) + ρ I_q) is always regular → masks non-uniqueness of a given solution; risk of incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
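In practice one rarely hand-rolls this: `scipy.optimize.least_squares` with `method='lm'` wraps a MINPACK Levenberg-Marquardt implementation. A sketch on synthetic data (the model and the "true" values are our own toy choices):

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 4.0, 15)
z = 2.0 * np.exp(-0.7 * t)             # synthetic data, p_true = (2.0, 0.7)

def F(p):
    return p[0] * np.exp(-p[1] * t) - z

# method='lm' selects a Levenberg-Marquardt implementation (MINPACK)
sol = least_squares(F, x0=[1.0, 1.0], method='lm')
```

`sol.x` holds the fitted parameters; `sol.cost` is ½‖F‖² at the solution.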
Parameter Identification Susanna Roblitz 49
Scaling
- row (measurement) scaling:

  F̄(p) = D_row^{−1} F(p), D_row = diag(δz_1, …, δz_M)

  ‖F′(p_k) Δp_k + F(p_k)‖ = min  vs.  ‖D_row^{−1}(F′(p_k) Δp_k + F(p_k))‖ = min:
  different scaling leads to a different solution.

- column (parameter) scaling:

  p = D_col p̄, H(p̄) := F(D_col p̄) = F(p), H′(p̄) = F′(p) D_col

  assume p̄_k = D_col^{−1} p_k:

  Δp̄_k = −H′(p̄_k)^+ H(p̄_k) = −(F′(p_k) D_col)^+ F(p_k)
        =[F′(p_k) full rank]= −D_col^{−1} F′(p_k)^+ F(p_k) = D_col^{−1} Δp_k

  No invariance in the rank-deficient case.
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:

F′(p_k) Δp_k = −F(p_k), p_{k+1} = p_k + λ_k Δp_k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p_k)^+ F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄_{k+1}(λ_k)‖ / ‖Δp_k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
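A sketch of the damping idea (our own simplification). Instead of the error-oriented natural monotonicity test on simplified corrections described above, this stand-in simply requires the residual norm to decrease, halving λ until it does:

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-4, tol=1e-10, max_iter=100):
    """Damped Gauss-Newton with a residual-monotonicity test.
    (The slide's natural monotonicity test compares simplified corrections;
    plain residual decrease is used here as an easy stand-in.)"""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp = np.linalg.lstsq(J(p), -F(p), rcond=None)[0]
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while np.linalg.norm(F(p + lam * dp)) >= np.linalg.norm(F(p)):
            lam *= 0.5                 # reduce the damping factor
            if lam < lam_min:
                raise RuntimeError("divergence: lambda < lambda_min")
        p = p + lam * dp
    return p
```

From a poor starting guess the first steps are damped; near the solution λ_k = 1 is accepted and the fast local convergence of the undamped method takes over.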
Parameter Identification Susanna Roblitz 51
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

model: y′ = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

set of experimental data: (t_1, z_1, δz_1), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_j = diag(δz_j)

F(p) = (D_1^{−1}(y(t_1, p) − z_1); …; D_M^{−1}(y(t_M, p) − z_M))

F′(p) = (D_1^{−1} y_p(t_1, p); …; D_M^{−1} y_p(t_M, p))

Gauss-Newton method:

Δp_k = −F′(p_k)^+ F(p_k), p_{k+1} = p_k + λ_k Δp_k
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t_1, …, t_M

Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M.

Solution: use an integrator with dense output.

Idea: construction of an interpolation polynomial based on the available solution, such that accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
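The same idea is available in `scipy.integrate.solve_ivp` through `dense_output=True` (LIMEX itself is a Fortran code; this is an analogous sketch on a toy decay model of our own):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y, p):
    return -p * y                      # toy model y' = -p*y (our own example)

p = 0.8
t_meas = np.array([0.3, 1.1, 2.5])     # measurement times != integrator steps

sol = solve_ivp(f, (0.0, 3.0), [1.0], args=(p,),
                dense_output=True, rtol=1e-8, atol=1e-10)
y_at_meas = sol.sol(t_meas)[0]         # interpolant evaluated at the data points
```

`sol.sol` is the continuous interpolant built from the internally chosen steps, so the residuals D_j^{-1}(y(t_j, p) − z_j) can be formed at arbitrary measurement times.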
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
- sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F′(p) = (D_1^{−1} S(t_1); …; D_M^{−1} S(t_M))

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y_p′ = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p, S = [s_1, …, s_q], s_k ∈ R^d

In practice, solve the combined system

(y′; s_1′; …; s_q′) = (f; f_y·s_1 + f_{p_1}; …; f_y·s_q + f_{p_q}) ∈ R^{(q+1)d}

can be solved efficiently by LIMEX (inexact Jacobian decoupling)
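For a scalar toy model y′ = −p·y (our own example, not from the slides), the combined state-plus-sensitivity system reads s′ = f_y·s + f_p = −p·s − y with s(0) = 0, and can be integrated with any ODE solver:

```python
import numpy as np
from scipy.integrate import solve_ivp

# toy model: y' = -p*y, y(0) = 1
# variational equation: s' = f_y*s + f_p = -p*s - y, s(0) = 0
def combined(t, w, p):
    y, s = w
    return [-p * y, -p * s - y]

p = 0.5
sol = solve_ivp(combined, (0.0, 2.0), [1.0, 0.0], args=(p,),
                rtol=1e-9, atol=1e-12)
y_T, s_T = sol.y[:, -1]                # y(2) and s(2) = dy/dp at t = 2
```

The analytic check: y(t) = exp(−p t) and s(t) = ∂y/∂p = −t·exp(−p t), so at t = 2 with p = 0.5 one expects s = −2e^{−1}.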
Parameter Identification Susanna Roblitz 57
Example
k   ‖F(p_k)‖   ‖Δp_k‖     λ_k     rank
0   4.81e-02   3.55e-01           2
1   1.11e-01   4.57e-01   1.000   2
    2.55e-02   1.75e-01   0.389   2
2   1.04e-02   4.22e-02   1.000   2
3   9.16e-04   1.90e-03   1.000   2
4   8.00e-06   1.99e-05   1.000   2
5   3.21e-07   1.54e-09   1.000   2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008, sc(F′(p*)) = 6.3

k   ‖F(p_k)‖   ‖Δp_k‖     λ_k     rank
0   3.85e-02   2.96e-02           1
1   2.40e-03   1.85e-03   1.000   1
2   1.11e-05   7.67e-06   1.000   1
3   6.67e-06   6.75e-11   1.000   1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004, sc(F′(p**)) = 4.5×10^6
[Figure: two panels of fitted concentration profiles of A, B, C, D over time 0–30, one for each of the two cases]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin
Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Problem formulation
minimize the least squares error:

‖F(p)‖_2^2 = Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))^2 / (2 σ_kl^2) → min

σ_kl: measurement error (standard deviation)
[Figure: measurement values, model function, and residues of a fit, plotted over t ∈ [0, 15]]
Parameter Identification Susanna Roblitz 8
Approaches
- Frequentist
  - Assumes that there is an unknown but fixed parameter p
  - Estimates p with some confidence
  - Prediction by using the estimated parameter value
- Bayesian
  - Represents uncertainty about the unknown parameter
  - Uses probability to quantify this uncertainty: unknown parameters as random variables
  - Prediction follows rules of probability
Parameter Identification Susanna Roblitz 9
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
- Linear Least Squares Problems
- Nonlinear Least Squares Problems
- Specification for ODE Models
Parameter Identification Susanna Roblitz 10
Likelihood
assumption: normally distributed measurement noise

P(z|p) ∝ exp( −Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))^2 / (2 σ_kl^2) )

maximum likelihood estimate (MLE):

p_MLE = argmax_p P(z|p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = Conditional Probability x MarginalProbability
P(p, z) = P(p|z) P(z) = P(z|p) P(p)

⇒ P(p|z) = P(z|p) P(p) / P(z)

P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of the parameters given the data)
P(z) = ∫ P(z|p) P(p) dp: evidence

maximum a posteriori estimate (MAP):

p_MAP = argmax_p P(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) ∝ P(z|p) P(p)

- evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence
- consider the full, high-dimensional posterior PDF for statistical inference
- necessity of a prior
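A minimal random-walk Metropolis-Hastings sampler can be sketched in a few lines; the function name, the proposal step size, and the sample count below are our own illustrative choices:

```python
import numpy as np

def metropolis_hastings(log_posterior, p0, n_samples=20000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: samples from exp(log_posterior)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    lp = log_posterior(p)
    chain = np.empty((n_samples, p.size))
    for i in range(n_samples):
        cand = p + step * rng.standard_normal(p.shape)  # symmetric proposal
        lc = log_posterior(cand)
        if np.log(rng.random()) < lc - lp:              # accept w.p. min(1, ratio)
            p, lp = cand, lc
        chain[i] = p
    return chain
```

Because only the ratio of posterior values enters the acceptance step, the evidence P(z) never needs to be computed — the main practical advantage of MCMC over direct evaluation of the posterior.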
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
- Linear Least Squares Problems
- Nonlinear Least Squares Problems
- Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t, p) = B(t) p, B(t) ∈ R^{d×q}

Measurement time points: t_1, …, t_M ∈ R
Measurement values: z_1, …, z_M ∈ R^d

Set A = [B(t_1), …, B(t_M)]^T, z = [z_1, …, z_M]^T.

Linear least-squares problem: For given z ∈ R^m and A ∈ R^{m×q} with m ≥ q, find p ∈ R^q such that

‖z − Ap‖_2 → min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
- Arrhenius law: dependence of the rate constant K on the temperature T,

  K(T) = C · exp(−E / (R T))

- gas constant R = 8.3145 J/(mol·K)
- unknown parameters: pre-exponential factor C, activation energy E
- measurements: (T_i, K_i), i = 1, …, m = 21
[Figure: measured rate constants K (scale 10^{−3}) over temperature T ∈ [720, 820] K]
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem:

log C − E / (R T_i) = log K_i ⇒ ‖log K − A (log C, E)^T‖_2 = min

A_i1 = 1, A_i2 = −1/(R T_i), i = 1, …, 21
[Figure: measured data K over T, and the linearized fit log K over T]
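The linearized fit can be carried out directly with `numpy`. The slide does not state the true parameter values, so `C_true` and `E_true` below are invented for illustration (chosen so that K lands in the 10^{−3} range shown in the plot):

```python
import numpy as np

R_GAS = 8.3145                         # gas constant, J/(mol*K)
C_true, E_true = 1.0e7, 1.5e5          # invented "true" values for illustration

T = np.linspace(720.0, 820.0, 21)      # temperature grid as on the slide
K = C_true * np.exp(-E_true / (R_GAS * T))   # noise-free synthetic measurements

# linearized model: log K_i = log C - E/(R*T_i)  ->  A @ (log C, E)^T
A = np.column_stack([np.ones_like(T), -1.0 / (R_GAS * T)])
logC, E = np.linalg.lstsq(A, np.log(K), rcond=None)[0]
```

The two columns of A differ in magnitude by roughly four orders, a first taste of why scaling and stable solvers matter in the following slides.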
Parameter Identification Susanna Roblitz 17
The method of normal equations
[Figure: geometric interpretation — Ap is the orthogonal projection of z onto R(A); the residual z − Ap is orthogonal to R(A)]

‖z − Ap‖_2 = min ⇔ z − Ap ∈ R(A)^⊥

R(A) = {Ap : p ∈ R^q} (range space of A)

‖z − Ap‖_2 = min_{p′ ∈ R^q} ‖z − Ap′‖_2 ⇔ A^T A p = A^T z

uniquely solvable ⇔ A^T A invertible ⇔ rank(A) = q

⇒ solution via Cholesky decomposition (A^T A is s.p.d.)

if rank(A) = q: κ_2(A)^2 = λ_max(A^T A) / λ_min(A^T A) = κ_2(A^T A) (without proof)

We need a more stable method for solving the normal equations.
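Both routes side by side on a small random problem (our own toy data; for a well-conditioned A they agree, while for ill-conditioned A the squared condition number makes the normal equations lose accuracy first):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))       # full column rank with probability 1
z = rng.standard_normal(50)

# normal equations A^T A p = A^T z, solved via Cholesky (A^T A is s.p.d.)
p_ne = cho_solve(cho_factor(A.T @ A), A.T @ z)

# stable reference: QR-based least squares
p_qr = np.linalg.lstsq(A, z, rcond=None)[0]
```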
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. Q^T Q = I. Then

‖z − Ap‖_2^2 = (z − Ap)^T Q^T Q (z − Ap) = ‖Q(z − Ap)‖_2^2

The Euclidean norm ‖·‖_2 is invariant under orthogonal transformations:

‖z − Ap‖_2 = min ⇔ ‖Q(z − Ap)‖_2 = min

Choice of the orthogonal transformation Q ∈ R^{m×m}?
Parameter Identification Susanna Roblitz 19
QR factorization
[Schematic: A = Q (R_1; 0) with Q = (Q_1, Q_2)]

A = Q (R_1; 0), Q ∈ R^{m×m} orthogonal, R_1 ∈ R^{q×q} upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

‖z − Ap‖_2^2 = ‖Q^T(z − Ap)‖_2^2 = ‖(z_1 − R_1 p; z_2)‖_2^2 = ‖z_1 − R_1 p‖_2^2 + ‖z_2‖_2^2 ≥ ‖z_2‖_2^2

rank(A) = q ⇒ rank(R_1) = rank(A) = q ⇒ R_1 is invertible

‖z − Ap‖_2 = min ⇔ p = R_1^{−1} z_1, with residual ‖r‖_2 = ‖z_2‖_2

The QR factorization is stable, since κ_2(A) = κ_2(R_1).
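The same steps in Python (a sketch with `scipy.linalg`; the toy problem is our own):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 4))
z = rng.standard_normal(30)

Q, R = qr(A)                            # full QR: A = Q @ R, R is 30 x 4
c = Q.T @ z
p = solve_triangular(R[:4, :4], c[:4])  # solve R_1 p = z_1
residual = np.linalg.norm(c[4:])        # ||z_2||_2, the minimal residual
```

Note that the residual norm falls out of the transformed right-hand side for free, without ever forming z − Ap.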
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:

Q^T A = (H_q H_{q−1} ⋯ H_2 H_1) A = R

For k = 1, …, q we obtain H_k = diag(I_{k−1}, H′_k) ∈ R^{m×m}:

A^{(k)} = H_k A^{(k−1)}, where H_k annihilates the entries below the diagonal in column k and modifies only the trailing (m − k + 1) × (q − k + 1) block of A^{(k−1)}; all other entries are left unchanged.
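A compact sketch of this columnwise triangularization (our own minimal implementation, without pivoting or the usual WY-block optimizations):

```python
import numpy as np

def householder_qr(A):
    """Columnwise Householder triangularization: returns Q (m x m), R (m x n)."""
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        v = x.copy()
        # sign choice avoids cancellation when forming the reflector
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        nv = np.linalg.norm(v)
        if nv == 0.0:
            continue                   # column already zero below the diagonal
        v /= nv
        # apply H'_k = I - 2 v v^T to the trailing block only
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)  # accumulate Q = H_1...H_k
    return Q, R
```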
Parameter Identification Susanna Roblitz 21
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim

|r_11| ≥ |r_22| ≥ … ≥ |r_qq|

step k:

A^{(k−1)} = (R S; 0 T^{(k)})

Denote column j of T^{(k)} by T_j^{(k)}.

Pivot strategy: push the column of T^{(k)} with maximal 2-norm to the front, so that after the exchange

|α_k| = |r_kk| = ‖T_1^{(k)}‖_2 = max_j ‖T_j^{(k)}‖_2 =: ‖T^{(k)}‖

Finally, A Π = Q R with a permutation matrix Π.
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:

Q^T A Π = (R S; 0 0)  (exact)  = (R S; 0 T^{(n+1)})  (in finite precision),  T^{(n+1)} "small"

Note: the exact rank n cannot be computed, since by rounding errors ‖T^{(n+1)}‖ = |r_{n+1,n+1}| ≠ 0.

Idea: stop the iteration as soon as

|r_{k+1,k+1}| < δ |r_11|, δ: relative precision of A

Definition (numerical rank n):

|r_{n+1,n+1}| < δ |r_11| ≤ |r_nn|

choice of δ ⇒ decision for n
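This rank decision is a one-liner on top of `scipy.linalg.qr` with column pivoting (the helper name and the default δ are our own choices):

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta=1e-10):
    """Numerical rank via pivoted QR: count the |r_kk| with |r_kk| >= delta*|r_11|."""
    R = qr(A, mode='r', pivoting=True)[0]
    d = np.abs(np.diag(R))
    return int(np.sum(d >= delta * d[0]))
```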
Parameter Identification Susanna Roblitz 23
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r_11| / |r_qq|

Properties:
- sc(A) ≥ 1
- sc(αA) = sc(A)
- A ≠ 0 singular ⇔ sc(A) = ∞
- sc(A) ≤ κ_2(A) (hence the name)

A is called almost singular if

δ · sc(A) ≥ 1, or equivalently δ |r_11| ≥ |r_qq|
Parameter Identification Susanna Roblitz 24
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least squares problem

‖D_row^{−1}(A D_col D_col^{−1} p − z)‖_2 = ‖Ā p̄ − z̄‖_2 → min

where Ā = D_row^{−1} A D_col, p̄ = D_col^{−1} p, z̄ = D_row^{−1} z

- row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε|z_j|
- column (parameter) scaling: D_col = diag(p_1,scal, …, p_q,scal) (to account for different physical units)

in general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A^{−1} z, where A^{−1} A = A A^{−1} = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:

‖Ap − z‖_2 = min

(2a) n = q: ‖Ap − z‖_2 = min ⇔ A^T A p = A^T z (normal equations),
uniquely solvable since A^T A is nonsingular:

p = (A^T A)^{−1} A^T z =: A^+ z

(2b) n < q: write p = A^+ z, A^+: generalized (pseudo) inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A^+ is uniquely defined by:

(i) (A^+ A)^T = A^+ A
(ii) (A A^+)^T = A A^+
(iii) A^+ A A^+ = A^+
(iv) A A^+ A = A

One can show: p* = A^+ z is the shortest solution, i.e.

‖p*‖_2 ≤ ‖p‖_2 ∀ p ∈ L(z)

with

L(z) = {p ∈ R^q : ‖z − Ap‖_2 = min} = p* + N(A)
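The Penrose axioms and the shortest-solution property are easy to check numerically with `np.linalg.pinv` (the rank-1 toy matrix is our own example):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])             # rank(A) = 1 < q = 2
z = np.array([1.0, 1.0, 1.0])

Ap = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse
p_star = Ap @ z                        # shortest least-squares solution
```

Here every p with p_1 + p_2 = 1 minimizes ‖z − Ap‖_2; p_star = (0.5, 0.5) is the one of smallest norm.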
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
Q^T A Π = (R S; 0 0), A ∈ R^{m×q}, rank(A) = n,
R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

Π^T p = (p_1; p_2), p_1 ∈ R^n, p_2 ∈ R^{q−n};  Q^T z = (z_1; z_2), z_1 ∈ R^n, z_2 ∈ R^{m−n}

⇒ ‖z − Ap‖_2^2 = ‖Q^T z − Q^T A p‖_2^2 = ‖z_1 − R p_1 − S p_2‖_2^2 + ‖z_2‖_2^2

‖z − Ap‖_2^2 = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^{−1}(z_1 − S p_2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F(p)

requires the solution of y' = f(y, p) at the time points t_1, …, t_M
Problem: time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M
Solution: use an integrator with dense output
Idea: construction of an interpolation polynomial based on the available solution such that accuracy is not destroyed
Example: LIMEX has a dense output based on Hermite interpolation
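To make the dense-output idea concrete, here is a sketch using SciPy's `solve_ivp` on the introductory reaction example A + B ⇌ C + D (LIMEX itself has no standard Python interface, so `solve_ivp` stands in; the measurement times below are made up for illustration).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Evaluating F(p) needs y(t_j, p) at the measurement times t_1..t_M,
# which rarely coincide with the integrator's adaptive steps.  With
# dense_output=True the solver returns an interpolant `sol.sol` that
# can be evaluated anywhere inside the integration interval.
def f(t, y, k1, k2):
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    return [r, r, -r, -r]          # A' = B' = r, C' = D' = -r

t_meas = np.array([0.3, 1.7, 4.0, 9.5])     # hypothetical measurement times
sol = solve_ivp(f, (0.0, 10.0), [2.0, 1.0, 0.5, 0.0],
                args=(0.2, 0.5), dense_output=True, rtol=1e-8, atol=1e-10)
y_at_meas = sol.sol(t_meas)                 # shape (4, M), no re-integration
```

The interpolant inherits the integrator's accuracy, so the conservation relations A + C = const and B + D = const hold at the interpolated points as well.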
Sensitivity analysis
I sensitivity matrix at time t:
S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F'(p) = ( D_1^{-1} S(t_1), …, D_M^{-1} S(t_M) )^T

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal
p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation
y_p' = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0
Compact notation:
S' = f_y · S + f_p,  S = [s_1, …, s_q],  s_k ∈ R^d
In practice, solve the combined system
( y', s_1', …, s_q' )^T = ( f, f_y·s_1 + f_{p_1}, …, f_y·s_q + f_{p_q} )^T ∈ R^{(q+1)d}
can be solved efficiently by LIMEX (inexact Jacobian, decoupling)
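A sketch of the combined system for the introductory reaction example (d = 4 species A, B, C, D; q = 2 parameters k1, k2), again with SciPy's `solve_ivp` standing in for LIMEX; the state layout and time span are illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system y' = f, S' = f_y S + f_p for A + B <-> C + D with
# p = (k1, k2).  State layout: first 4 entries are y, then the 4x2
# sensitivity matrix S, flattened row-wise.
def combined(t, w, k1, k2):
    y, S = w[:4], w[4:].reshape(4, 2)
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    f = np.array([r, r, -r, -r])
    g = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])  # gradient of r w.r.t. y
    fy = np.vstack([g, g, -g, -g])                    # Jacobian f_y
    fp = np.array([[-A * B,  C * D],                  # Jacobian f_p
                   [-A * B,  C * D],
                   [ A * B, -C * D],
                   [ A * B, -C * D]])
    return np.concatenate([f, (fy @ S + fp).ravel()])

w0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])  # y(0), S(0) = 0
sol = solve_ivp(combined, (0.0, 5.0), w0, args=(0.2, 0.5),
                rtol=1e-9, atol=1e-11)
S_end = sol.y[4:, -1].reshape(4, 2)   # sensitivities dy/dp at t = 5
```

Because A' = B' and C' = D' = −A', the sensitivity rows inherit the same symmetry, which gives a cheap consistency check on the implementation.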
Example
Case (T+E):
k  ||F(p^k)||  ||Δp^k||  λ_k    rank
0  4.81e-02    3.55e-01          2
1  1.11e-01    4.57e-01  1.000   2
   2.55e-02    1.75e-01  0.389
2  1.04e-02    4.22e-02  1.000   2
3  9.16e-04    1.90e-03  1.000   2
4  8.00e-06    1.99e-05  1.000   2
5  2.32e-07    1.54e-09  1.000   2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008, sc(F'(p*)) = 6.3

Case (E):
k  ||F(p^k)||  ||Δp^k||  λ_k    rank
0  3.85e-02    2.96e-02          1
1  2.40e-03    1.85e-03  1.000   1
2  1.11e-05    7.67e-06  1.000   1
3  6.67e-06    6.75e-11  1.000   1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004, sc(F'(p**)) = 4.5×10^6

[Figures: fitted concentrations of A, B, C, D over time for case (T+E) (left) and case (E) (right)]
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX, others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Problem formulation
minimize least squares error

||F(p)||_2^2 = Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))^2 / (2σ_kl^2) → min

σ_kl: measurement error (standard deviation)

[Figure: measurement values, model function, and residues over time]
Approaches
I Frequentist
  I Assumes that there is an unknown but fixed parameter p
  I Estimates p with some confidence
  I Prediction by using the estimated parameter value
I Bayesian
  I Represents uncertainty about the unknown parameter
  I Uses probability to quantify this uncertainty: unknown parameters as random variables
  I Prediction follows rules of probability
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
  I Linear Least Squares Problems
  I Nonlinear Least Squares Problems
  I Specification for ODE Models
Likelihood
assumption: normally distributed measurement noise

P(z|p) ∝ exp( − Σ_{k=1}^{d} Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))^2 / (2σ_kl^2) )

maximum likelihood estimate (MLE):
p_MLE = argmax_p P(z|p)
Bayes theorem
Joint probability = Conditional probability × Marginal probability
P(p, z) = P(p|z) P(z) = P(z|p) P(p)

P(p|z) = P(z|p) P(p) / P(z)

P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of the parameters given the data)
P(z) = ∫ P(z|p) P(p) dp: evidence

maximum a posteriori estimate (MAP):
p_MAP = argmax_p P(p|z)
Evaluation of posterior
P(p|z) ∝ P(z|p) P(p)

I evaluation of posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence?
I consider full high-dimensional posterior PDF for statistical inference
I necessity of a prior
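A minimal random-walk Metropolis-Hastings sketch of the posterior evaluation mentioned above; the one-parameter toy model, flat prior, step size, and chain length are all illustrative assumptions.

```python
import numpy as np

# Sample P(p|z) ∝ P(z|p)P(p) for a toy model y(t,p) = exp(-p*t) with
# Gaussian noise level sigma and a flat prior on [0, 10].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 15)
z = np.exp(-1.0 * t)                  # noise-free data, true p = 1
sigma = 0.05

def log_posterior(p):
    if not 0.0 <= p <= 10.0:          # flat prior: zero outside support
        return -np.inf
    return -np.sum((z - np.exp(-p * t)) ** 2) / (2.0 * sigma ** 2)

p, chain = 0.5, []
for _ in range(5000):
    prop = p + 0.1 * rng.standard_normal()        # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(prop) - log_posterior(p):
        p = prop                                  # accept, else keep p
    chain.append(p)
post_mean = np.mean(chain[1000:])                 # discard burn-in
```

In practice one would monitor convergence of the chain (the "→ convergence?" question on the slide) rather than fix the burn-in and chain length in advance.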
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
  I Linear Least Squares Problems
  I Nonlinear Least Squares Problems
  I Specification for ODE Models
Linear least squares
y(t, p) = B(t) p,  B(t) ∈ R^{d×q}
Measurement time points t_1, …, t_M ∈ R
Measurement values z_1, …, z_M ∈ R^d
Set A = [B(t_1), …, B(t_M)]^T, z = [z_1, …, z_M]^T

Linear least-squares problem: For given z ∈ R^m and A ∈ R^{m×q} with m ≥ q, find p ∈ R^q such that
||z − Ap||_2 → min
Example: Reaction kinetics

I Arrhenius law: dependence of the rate constant K on the temperature T:
K(T) = C · exp( −E / (RT) )
I gas constant R = 8.3145 J/(mol·K)
I unknown parameters: pre-exponential factor C, activation energy E
I measurements (T_i, K_i), i = 1, …, m = 21

[Figure: measured rate constants K over T ∈ [720, 820] K]
Example: Reaction kinetics

minimization problem:
log C − E / (R T_i) = log K_i  ⇒  ||log K − A (log C, E)^T||_2 = min
A_i1 = 1,  A_i2 = −1/(R T_i),  i = 1, …, 21

[Figures: K over T (left) and log K over T (right)]
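The linearized fit can be sketched with NumPy's `lstsq`; the values of C and E below are invented for illustration and are not the measured data from the slide.

```python
import numpy as np

# Linearized Arrhenius fit: log K_i = log C - E/(R*T_i) is linear in
# (log C, E), so the design matrix has rows [1, -1/(R*T_i)].
R = 8.3145                       # gas constant, J/(mol K)
C_true, E_true = 5.0e7, 1.5e5    # assumed "true" parameters (synthetic)
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))   # noise-free synthetic data

A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
C_fit, E_fit = np.exp(coef[0]), coef[1]
```

Note that the two columns of A are nearly collinear on this narrow temperature range, which is exactly the kind of ill-conditioning the following slides address.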
The method of normal equations
[Figure: geometric interpretation — the residual z − Ap is orthogonal to the range R(A)]

||z − Ap||_2 = min  ⇔  z − Ap ∈ R(A)^⊥
R(A) = {Ap : p ∈ R^q} (range space of A)

||z − Ap||_2 = min_{p'∈R^q} ||z − Ap'||_2  ⇔  A^T A p = A^T z

uniquely solvable ⇔ A^T A invertible ⇔ rank(A) = q
⇒ solution via Cholesky decomposition (A^T A is spd)

if rank(A) = q: κ_2(A)^2 = λ_max(A^T A) / λ_min(A^T A) = κ_2(A^T A)   (without proof)

We need a more stable method for solving the normal equations!
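A small NumPy experiment illustrating why the normal equations are avoided: forming AᵀA squares the condition number, while orthogonalization works on A directly. The monomial design matrix is just a convenient ill-conditioned example.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)
A = np.vander(t, 8, increasing=True)     # monomial basis: ill-conditioned
kappa_A = np.linalg.cond(A)
kappa_AtA = np.linalg.cond(A.T @ A)      # ~ kappa_A**2: digits lost twice

p_true = np.arange(1.0, 9.0)
z = A @ p_true
Q, R = np.linalg.qr(A)                   # orthogonalization on A itself
p_qr = np.linalg.solve(R, Q.T @ z)
```

The QR-based solution recovers p_true to high accuracy even though solving the explicit normal equations with this A would be numerically fragile.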
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. Q^T Q = I.

||z − Ap||_2^2 = (z − Ap)^T Q^T Q (z − Ap) = ||Q(z − Ap)||_2^2

The Euclidean norm ||·||_2 is invariant under orthogonal transformation:
||z − Ap||_2 = min  ⇔  ||Q(z − Ap)||_2 = min

Choice of the orthogonal transformation Q ∈ R^{m×m}?
QR factorization
QR factorization: A = Q (R_1; 0), Q ∈ R^{m×m} orthogonal, R_1 ∈ R^{q×q} upper triangular

Solution of least-squares problem (MATLAB: [Q,R] = qr(A)), with Q^T z = (z̄_1; z̄_2):

||z − Ap||_2^2 = ||Q^T (z − Ap)||_2^2 = ||(z̄_1 − R_1 p; z̄_2)||_2^2 = ||z̄_1 − R_1 p||_2^2 + ||z̄_2||_2^2 ≥ ||z̄_2||_2^2

rank(A) = q ⇒ rank(R_1) = rank(A) = q ⇒ R_1 is invertible

||z − Ap||_2 = min  ⇔  p = R_1^{-1} z̄_1  ⇒  ||r||_2 = ||z̄_2||_2

The QR factorization is stable since κ_2(A) = κ_2(R_1).
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:
Q^T A = (H_q H_{q−1} ··· H_2 H_1) A = R
For k = 1, …, q we obtain H_k = diag(I_{k−1}, H'_k) ∈ R^{m×m}.
[Schematic: A^{(k)} = H_k A^{(k−1)} annihilates the subdiagonal entries of column k; only the entries in rows k, …, m and columns k, …, q are modified.]
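A textbook sketch of the columnwise Householder elimination described above (no pivoting, no blocking; `householder_qr` is a didactic name, not a library routine).

```python
import numpy as np

def householder_qr(A):
    # Reflect column k onto a multiple of e_k, accumulating Q^T A = R.
    m, q = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(q):
        x = R[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # avoid cancellation
        v /= np.linalg.norm(v)
        # apply H'_k = I - 2 v v^T to the trailing submatrix; only rows
        # k..m and columns k..q are modified, as in the schematic
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)    # accumulate Q
    return Q, R

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Q, R = householder_qr(A)
```

The three defining properties (A = QR, Q orthogonal, R upper triangular) can be checked directly on the output.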
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim |r_11| ≥ |r_22| ≥ … ≥ |r_qq|

step k:
A^{(k−1)} = (R S; 0 T^{(k)})
Denote column j of T^{(k)} by T_j^{(k)}.
Pivot strategy: push the column of T^{(k)} with maximal 2-norm to the front, so that after the change
|α_k| = |r_kk| = ||T_1^{(k)}||_2 = max_j ||T_j^{(k)}||_2

Finally A Π = QR with permutation matrix Π.
Rank decision
rank(A) = n < q:

Q^T A Π = (R S; 0 0)   (exact)
        = (R S; 0 T^{(n+1)})   (in finite precision), T^{(n+1)} "small"

Note: The exact rank n cannot be computed, since by rounding errors ||T^{(n+1)}|| = |r_{n+1,n+1}| ≠ 0.

Idea: Stop the iteration as soon as
|r_{k+1,k+1}| < δ |r_11|,  δ: relative precision of A

Definition numerical rank n:
|r_{n+1,n+1}| < δ |r_11| ≤ |r_nn|

choice of δ ⇒ decision for n
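The rank decision can be sketched with SciPy's pivoted QR: count the diagonal entries of R above the threshold δ|r₁₁|. The helper name and the value of δ are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta=1e-10):
    # QR with column pivoting sorts |r_kk| in decreasing order, so the
    # numerical rank is the number of diagonal entries >= delta*|r_11|.
    R = qr(A, mode='r', pivoting=True)[0]
    d = np.abs(np.diag(R))
    return int(np.sum(d >= delta * d[0]))

# rank-2 example: the third column is the sum of the first two
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
```

As the slide notes, the choice of δ is a modeling decision: it should reflect the relative precision of the entries of A, not just machine precision.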
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition subcondition: sc(A) = |r_11| / |r_qq|

Properties:
I sc(A) ≥ 1
I sc(αA) = sc(A)
I A ≠ 0 singular ⇔ sc(A) = ∞
I sc(A) ≤ κ_2(A) (hence the name)

A is called almost singular if
δ · sc(A) ≥ 1, or equivalently δ |r_11| ≥ |r_qq|
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least squares problem

||D_row^{-1} (A D_col D_col^{-1} p − z)||_2 = ||Ā p̄ − z̄||_2 → min

where Ā = D_row^{-1} A D_col, p̄ = D_col^{-1} p, z̄ = D_row^{-1} z

I row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε|z_j|
I column (parameter) scaling: D_col = diag(p_1,scal, …, p_q,scal) (to account for different physical units)

in general sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A)!
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A^{-1} z, where A^{-1} A = A A^{-1} = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:
||Ap − z||_2 = min

(2a) n = q: ||Ap − z||_2 = min ⇔ A^T A p = A^T z (normal equations),
uniquely solvable since A^T A is nonsingular:
p = (A^T A)^{-1} A^T z =: A^+ z

(2b) n < q: write p = A^+ z, A^+ generalized (pseudo) inverse
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A^+ is uniquely defined by

(i) (A^+ A)^T = A^+ A
(ii) (A A^+)^T = A A^+
(iii) A^+ A A^+ = A^+
(iv) A A^+ A = A

One can show: p* = A^+ z is the shortest solution, i.e.
||p*||_2 ≤ ||p||_2 ∀p ∈ L(z)
with
L(z) = {p ∈ R^q : ||z − Ap||_2 = min} = p* + N(A)
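The four Penrose axioms can be verified numerically on NumPy's SVD-based pseudoinverse; the rank-deficient matrix and right-hand side below are arbitrary illustrations.

```python
import numpy as np

# np.linalg.pinv satisfies the Penrose axioms even for rank-deficient A,
# and Ap @ z is the shortest least-squares solution.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],      # row 2 = 2 * row 1: rank(A) = 2
              [1.0, 0.0, 1.0]])
Ap = np.linalg.pinv(A)
z = np.array([1.0, 2.0, 1.0])
p_star = Ap @ z
# (1, 1, -1) spans N(A): adding it gives another minimizer with larger norm
other = p_star + np.array([1.0, 1.0, -1.0])
```

Since p* is orthogonal to N(A), any other minimizer p* + v with v ∈ N(A), v ≠ 0 is strictly longer.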
Computation of shortest solution
Q^T A Π = (R S; 0 0),  A ∈ R^{m×q},
rank(A) = n, R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

Π^T p = (p_1; p_2),  p_1 ∈ R^n, p_2 ∈ R^{q−n}
Q^T z = (z_1; z_2),  z_1 ∈ R^n, z_2 ∈ R^{m−n}

⇒ ||z − Ap||_2^2 = ||Q^T z − Q^T A p||_2^2 = ||z_1 − R p_1 − S p_2||_2^2 + ||z_2||_2^2

||z − Ap||_2^2 = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^{-1}(z_1 − S p_2)
Computation of shortest solution
V = R^{-1} S,  u = R^{-1} z_1

||p||^2 = ||Π^T p||^2 = ||p_1||^2 + ||p_2||^2 = ||u − V p_2||^2 + ||p_2||^2
= ||u||^2 − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
= ||u||^2 + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩ =: φ(p_2)

φ(p_2) = min if φ'(p_2) = 2(I + V^T V) p_2 − 2 V^T u = 0
⇔ (I + V^T V) p_2 = V^T u  (solution by Cholesky decomposition)

φ''(p_2) = 2(I + V^T V) spd ⇒ p_2 is a minimum

Note: if n < q/2, a faster algorithm exists.
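The two-slide recipe can be sketched end to end; `shortest_solution`, the rank threshold δ, and the example data are illustrative assumptions, not the NLSCON implementation.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular, cho_factor, cho_solve

def shortest_solution(A, z, delta=1e-12):
    # Pivoted QR, numerical rank n, then p2 from (I + V^T V) p2 = V^T u
    # (Cholesky) and p1 = R^{-1}(z1 - S p2), as on the slides above.
    m, q = A.shape
    Q, Rfull, piv = qr(A, pivoting=True)
    d = np.abs(np.diag(Rfull))
    n = int(np.sum(d >= delta * d[0]))          # numerical rank
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = solve_triangular(R, S)                  # R^{-1} S
    u = solve_triangular(R, z1)                 # R^{-1} z1
    M = np.eye(q - n) + V.T @ V                 # spd
    p2 = cho_solve(cho_factor(M), V.T @ u)
    p1 = solve_triangular(R, z1 - S @ p2)
    out = np.empty(q)
    out[piv] = np.concatenate([p1, p2])         # undo column permutation
    return out

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                 # rank 2: col3 = col1 + col2
z = np.array([1.0, 0.0, 2.0, 1.0])
p = shortest_solution(A, z)
```

For a rank-deficient A the result should coincide with the pseudoinverse solution A⁺z, which offers a direct check.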
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M
Parameters: p = (p_1, …, p_q)^T
Model:
y(t, p): R × R^q → R^d, nonlinear w.r.t. p
Measurement tolerances:
δz_i ∈ R^d, D_i = diag(δz_i)
Define

F(p) = ( D_1^{-1} (z_1 − y(t_1, p)), …, D_M^{-1} (z_M − y(t_M, p)) )^T ∈ R^m,  m ≤ M·d
Problem formulation
Nonlinear least-squares problem:

Σ_{i=1}^{M} ||D_i^{-1} (z_i − y(t_i, p))||_2^2 = F(p)^T F(p) = ||F(p)||_2^2 = min

Nonlinear least-squares problem: Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that
||F(p*)||_2^2 = min_{p∈D} ||F(p)||_2^2

The nonlinear least-squares problem is called compatible if F(p*) = 0.
Supplement: Multivariable calculus

Consider p = (p_1, …, p_q)^T ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):

g'(p) = ∇g(p) = ( ∂g/∂p_1, …, ∂g/∂p_q )^T

Consider the vector F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix:

F'(p) = ( ∂F_1/∂p_1 ··· ∂F_1/∂p_q ; ⋮ ; ∂F_m/∂p_1 ··· ∂F_m/∂p_q ) ∈ R^{m×q}
Problem formulation
Minimization problem:
g(p) = ||F(p)||_2^2 = min

sufficient condition for p* to be a local minimum:
g'(p*) = 2 F'(p*)^T F(p*) = 0 and g''(p*) positive definite

Find by Newton iteration a zero of the function G: R^q → R^q,
G(p) = F'(p)^T F(p)
(system of q nonlinear equations)
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor series expansion around a starting guess p^0:

G(p) = G(p^0) + G'(p^0)(p − p^0) + o(||p − p^0||) for p → p^0
     ≈ G(p^0) + G'(p^0)(p − p^0) =: Ḡ(p)

Zero p^1 of the linear substitute map Ḡ:
G'(p^0) Δp^0 = −G(p^0),  p^1 = p^0 + Δp^0

Newton iteration:
G'(p^k) Δp^k = −G(p^k),  p^{k+1} = p^k + Δp^k

system of nonlinear equations ⇒ sequence of linear systems
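The Newton iteration above can be sketched in a few lines; the 2×2 toy system (circle intersected with a line) is an illustrative assumption.

```python
import numpy as np

def newton(G, Gprime, p0, tol=1e-12, max_iter=30):
    # Each step solves the linear system G'(p^k) dp = -G(p^k).
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp = np.linalg.solve(Gprime(p), -G(p))
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# G(p) = (p1^2 + p2^2 - 4, p1 - p2): circle of radius 2 meets the line p1 = p2
G = lambda p: np.array([p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1]])
Gp = lambda p: np.array([[2.0 * p[0], 2.0 * p[1]], [1.0, -1.0]])
root = newton(G, Gp, [1.0, 0.5])
```

Starting near (1, 0.5), the iteration converges quadratically to the nearby root (√2, √2).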
Gauss-Newton iteration
G'(p) = F'(p)^T F'(p) + F''(p)^T F(p)

iteration in terms of F(p):
( F'(p^k)^T F'(p^k) + F''(p^k)^T F(p^k) ) Δp^k = −F'(p^k)^T F(p^k)

I compatible case: F(p*) = 0 ⇒ G'(p*) = F'(p*)^T F'(p*)
I almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F''(p)

⇒ Gauss-Newton iteration:
F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k
Gauss-Newton iteration
F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k

corresponds to the normal equations for the linear least squares problem
||F'(p^k) Δp^k + F(p^k)|| = min

formal solution:
Δp^k = −F'(p^k)^+ F(p^k)

in case of convergence: F'(p*)^+ F(p*) = 0

if rank(F'(p)) = q ⇒ F'(p)^+ = ( F'(p)^T F'(p) )^{-1} F'(p)^T
Classification
I local GN methods: require a sufficiently good starting guess p^0
I global GN methods: can handle bad guesses as well
I m = q: guaranteed local convergence
I m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0.

I F: D → R^m continuously differentiable, D ⊆ R^q open and convex
I F'(p): possibly rank-deficient Jacobian
I F' satisfies a particular Lipschitz condition (i.e. F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

||F'(p̂)^+ (F'(p̄) − F'(p))(p̄ − p)||_2 ≤ ω ||p̄ − p||_2^2   (i)

for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) and p̄ − p ∈ R(F'(p̂)^+)
Assumptions
I compatibility condition (model and data must somehow fit each other):

||F'(p̄)^+ (I − F'(p) F'(p)^+) F(p)|| ≤ κ(p) ||p̄ − p||  ∀p, p̄ ∈ D,
κ(p) ≤ κ < 1 ∀p ∈ D   (ii)

I the initial correction Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

||Δp^0|| ≤ α and B_ρ(p^0) ⊂ D   (iii)
h := αω < 2(1 − κ),  ρ := α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* ∈ B_ρ(p^0) with

F'(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

||p^{k+1} − p^k||_2 ≤ (ω/2) ||p^k − p^{k−1}||_2^2 + κ(p^{k−1}) ||p^k − p^{k−1}||_2   (*)

This result shows existence of a solution p* in the sense that F'(p*)^+ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness ⇒ requires a full-rank Jacobian.
Compatibility
I linear least squares problems: F(p) = Ap − z
F'(p) = (Ap − z)' = A = F'(p̄)  ⇒ (i): ω = 0
F'(p̄)^+ (I − F'(p) F'(p)^+) = A^+ (I − A A^+) = 0  ⇒ (ii): κ(p) ≡ κ = 0
(*) ⇒ ||p^2 − p^1|| ≤ 0 ⇒ Δp^1 = 0 ⇒ p^1 = p*:
convergence within one iteration
I systems of nonlinear equations (m = q):
if F'(p) nonsingular ⇒ F'(p)^+ = F'(p)^{-1}
⇒ I − F'(p) F'(p)^+ = 0 ⇒ (ii): κ(p) = 0:
quadratic convergence
I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ||p^{k+1} − p^k|| / ||p^k − p^{k−1}|| ≤ lim_{k→∞} ( (ω/2) ||p^k − p^{k−1}|| + κ(p^{k−1}) ) = κ(p*)   (by (*))

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition
κ(p*) < 1

Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define
Δp̄^{k+1} = −F'(p^k)^+ F(p^{k+1})   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:
||Δp̄^{k+1}|| ≈ ||Δp^{k+1}|| ≤ XTOL

For incompatible problems in the convergent phase it holds
||Δp̄^{k+1}|| ≈ κ ||Δp^k||,  ||Δp̄^{k+2}|| ≈ κ^2 ||Δp^k||

This provides an estimate for κ(p*):
κ(p*) ≈ ||Δp̄^{k+1}|| / ||Δp^k||
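A sketch of the monitor in action: run an undamped Gauss-Newton iteration on a mildly incompatible problem and estimate κ from the ratio of simplified to ordinary corrections. The toy model, data, and cut-off are illustrative assumptions.

```python
import numpy as np

# Model y = p1*exp(p2*t) with data it cannot fit exactly (incompatible).
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
z = np.array([1.0, 1.7, 2.6, 4.3, 7.8])

F = lambda p: p[0] * np.exp(p[1] * t) - z
Fp = lambda p: np.column_stack([np.exp(p[1] * t),
                                p[0] * t * np.exp(p[1] * t)])

p = np.array([1.0, 1.0])
kappa_est = None
for _ in range(15):
    J = Fp(p)
    dp = np.linalg.lstsq(J, -F(p), rcond=None)[0]     # ordinary correction
    if np.linalg.norm(dp) < 1e-9:
        break                                          # XTOL-style stop
    p_new = p + dp
    dp_bar = np.linalg.lstsq(J, -F(p_new), rcond=None)[0]  # simplified
    kappa_est = np.linalg.norm(dp_bar) / np.linalg.norm(dp)
    p = p_new
```

For this nearly compatible fit the estimated κ stays well below 1, which is exactly the regime where Gauss-Newton converges.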
Convergence monitor
[Figure: ||Δp^k|| (ordinary) and ||Δp̄^{k+1}|| (simplified) Gauss-Newton corrections over k, log scale from 10^0 down to 10^{-15}]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
interpretation of the result in case of a rank-deficient Jacobian

Let p* be the final iterate and rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision:
Q^T F'(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, …, p*_{π_n}, …, p*_{π_q}) in such a way that
|r_11| ≥ |r_22| ≥ ··· ≥ |r_nn|

→ the parameters (p*_{π_1}, p*_{π_2}, …, p*_{π_n}) can actually be estimated from the current dataset
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u),  F̄(u) := F(φ(u)) = F(p),  F̄'(u) = F'(p) φ'(u)

constraint      transformation φ(u)
p > 0           p = exp(u)
A ≤ p ≤ B       p = A + (B − A)/2 · (1 + sin u)
p ≤ C           p = C + (1 − √(1 + u^2))
p ≥ C           p = C − (1 − √(1 + u^2))
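The four transformations from the table can be written down directly; the bound values used below are arbitrary illustrations.

```python
import numpy as np

# Constraint-removing transformations p = phi(u) from the table above;
# the optimizer works in unconstrained u, and the chain-rule factor
# phi'(u) enters the Jacobian as F_bar'(u) = F'(p) * phi'(u).
phi_pos = lambda u: np.exp(u)                                     # p > 0
phi_box = lambda u, A, B: A + 0.5 * (B - A) * (1.0 + np.sin(u))   # A <= p <= B
phi_le  = lambda u, C: C + (1.0 - np.sqrt(1.0 + u * u))           # p <= C
phi_ge  = lambda u, C: C - (1.0 - np.sqrt(1.0 + u * u))           # p >= C

u = np.linspace(-5.0, 5.0, 101)   # any real u maps into the feasible set
```

Whatever value u takes, the mapped parameter automatically satisfies its constraint, so no explicit projection step is needed.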
Levenberg-Marquardt method
Idea: local linearization is only reasonable in a small neighborhood of p^k:

||F(p^k) + F'(p^k) Δp^k||_2^2 + ρ ||Δp^k||_2^2 = min

(F'(p^k)^T F'(p^k) + ρ I_q) Δp^k = −F'(p^k)^T F(p^k)

I ρ → 0: Levenberg-Marquardt → Gauss-Newton
I ρ > 0: (F'(p^k)^T F'(p^k) + ρ I_q) is always regular → masking of the uniqueness of a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g'(p) = 0
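A single regularized step makes both bullet points tangible; the rank-deficient Jacobian and residual below are arbitrary illustrations.

```python
import numpy as np

def lm_step(J, Fval, rho):
    # Regularized normal equations (J^T J + rho*I) dp = -J^T F: solvable
    # for any rho > 0 even when J is rank-deficient, which is exactly
    # how the rank deficiency gets "masked".
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fval)

# rank-deficient Jacobian: the second column is twice the first
J = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
Fval = np.array([1.0, 0.0, -1.0])
dp_reg = lm_step(J, Fval, 1e-3)   # well-defined despite rank deficiency
```

As ρ → 0 the step tends to the minimum-norm Gauss-Newton step −J⁺F, but for any fixed ρ > 0 the system is regular and gives no warning that the parameters are not uniquely determined.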
Approaches
I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value
I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown
parameters as random variablesI Prediction follows rules of probability
Parameter Identification Susanna Roblitz 9
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 10
Likelihood
assumption normally distributed measurement noise
P(z |p) prop exp
(minus
dsumk=1
Mksuml=1
(zkl minus yk(tl p))2
2σ2kl
)
maximum likelihood estimate (MLE)
pMLE = argmaxpP(z |p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = Conditional Probability x MarginalProbability
P(p z) = P(p|z)P(z) = P(z |p)P(p)
P(p|z) =P(z |p)P(p)
P(z)
P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =
intP(z |p)P(p)dp evidence
maximum a posteriori estimate (MAP)
pMAP = argmaxpP(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) prop P(z |p)P(p)
I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence
I consider full high dimensional posterior PDF for statisticalinference
I necessity of a prior
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t p) = B(t)p B(t) isin Rdtimesq
Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd
Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T
Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat
z minus Ap2 rarr min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law dependence ofrate constant K ontemperature T
K (T ) = C middot exp
(minus E
RT
)I gas constant
R = 83145 JmolmiddotK
I unknown parameterspre-exponential factor Cactivation energy E
I measurements(Ti Ki) i = 1 m = 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
TK
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I
z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2
2
The Eucledian norm middot 2 is invariant under orthogonaltransformation
z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ ℝ^{m×q}, rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by

(i) (A⁺A)ᵀ = A⁺A
(ii) (AA⁺)ᵀ = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A

One can show: p* = A⁺z is the shortest solution, i.e.

‖p*‖₂ ≤ ‖p‖₂ for all p ∈ L(z)

with

L(z) = {p ∈ ℝ^q : ‖z − Ap‖₂ = min} = p* + N(A).
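These axioms can be verified numerically (my sketch): NumPy's `pinv` satisfies all four, and the resulting p* = A⁺z is indeed the shortest minimizer.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((7, 2))
A = np.hstack([B, B.sum(axis=1, keepdims=True)])   # 7x3, rank 2 (rank deficient)
Ap = np.linalg.pinv(A)

ax_i   = np.allclose((Ap @ A).T, Ap @ A)    # (A+ A) symmetric
ax_ii  = np.allclose((A @ Ap).T, A @ Ap)    # (A A+) symmetric
ax_iii = np.allclose(Ap @ A @ Ap, Ap)       # A+ A A+ = A+
ax_iv  = np.allclose(A @ Ap @ A, A)         # A A+ A = A

z = rng.standard_normal(7)
p_star = Ap @ z                             # shortest solution
v = np.array([1.0, 1.0, -1.0])              # spans N(A): col3 = col1 + col2
```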
Computation of shortest solution
Q^T A Π = [R S; 0 0], A ∈ ℝ^{m×q}, rank(A) = n, R ∈ ℝ^{n×n} upper triangular, S ∈ ℝ^{n×(q−n)}

Πᵀp = (p̄₁; p̄₂), p̄₁ ∈ ℝⁿ, p̄₂ ∈ ℝ^{q−n}
Qᵀz = (z₁; z₂), z₁ ∈ ℝⁿ, z₂ ∈ ℝ^{m−n}

⇒ ‖z − Ap‖₂² = ‖Qᵀz − QᵀAp‖₂² = ‖z₁ − Rp̄₁ − Sp̄₂‖₂² + ‖z₂‖₂²

‖z − Ap‖₂² = min ⇔ z₁ − Rp̄₁ − Sp̄₂ = 0 ⇔ p̄₁ = R⁻¹(z₁ − Sp̄₂)
Computation of shortest solution
Set V = R⁻¹S, u = R⁻¹z₁. Then

‖p‖² = ‖Πᵀp‖² = ‖p̄₁‖² + ‖p̄₂‖² = ‖u − Vp̄₂‖² + ‖p̄₂‖²
     = ‖u‖² − 2⟨u, Vp̄₂⟩ + ⟨Vp̄₂, Vp̄₂⟩ + ⟨p̄₂, p̄₂⟩
     = ‖u‖² + ⟨p̄₂, (I + VᵀV)p̄₂ − 2Vᵀu⟩ =: φ(p̄₂)

φ(p̄₂) = min if φ′(p̄₂) = 2(I + VᵀV)p̄₂ − 2Vᵀu = 0
⇔ (I + VᵀV)p̄₂ = Vᵀu (solution by Cholesky decomposition)

φ″(p̄₂) = 2(I + VᵀV) spd ⇒ p̄₂ is a minimum.

Note: if n < q/2, a faster algorithm exists.
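The whole algorithm fits in a few lines. A sketch of my own implementation (pivoted QR, p̄₁ = R⁻¹(z₁ − Sp̄₂), Cholesky solve for p̄₂), checked against the minimum-norm solution from `pinv`:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular, cho_factor, cho_solve

def shortest_solution(A, z, delta=1e-10):
    q = A.shape[1]
    Q, R, piv = qr(A, pivoting=True)          # A[:, piv] = Q @ R
    d = np.abs(np.diag(R))
    n = int(np.sum(d >= delta * d[0]))        # numerical rank decision
    R11, S = R[:n, :n], R[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = solve_triangular(R11, S)              # V = R^{-1} S
    u = solve_triangular(R11, z1)             # u = R^{-1} z1
    M = np.eye(q - n) + V.T @ V               # spd
    p2 = cho_solve(cho_factor(M), V.T @ u)    # (I + V^T V) p2 = V^T u
    p1 = u - V @ p2
    p = np.empty(q)
    p[piv] = np.concatenate([p1, p2])         # undo the column permutation
    return p

rng = np.random.default_rng(3)
B = rng.standard_normal((9, 3))
A = np.hstack([B, B @ rng.standard_normal((3, 2))])   # 9x5, rank 3
z = rng.standard_normal(9)
p = shortest_solution(A, z)
p_ref = np.linalg.pinv(A) @ z    # minimum-norm least-squares solution
```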
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
Data: (tᵢ, zᵢ), tᵢ ∈ ℝ, zᵢ ∈ ℝ^d, i = 1, …, M
Parameters: p = (p₁, …, p_q)ᵀ
Model: y(t, p): ℝ × ℝ^q → ℝ^d, nonlinear w.r.t. p
Measurement tolerances: δzᵢ ∈ ℝ^d, Dᵢ = diag(δzᵢ)

Define

F(p) = ( D₁⁻¹(z₁ − y(t₁, p)); …; D_M⁻¹(z_M − y(t_M, p)) ) ∈ ℝ^m, m ≤ M·d.
Problem formulation
Σ_{i=1}^M ‖Dᵢ⁻¹(zᵢ − y(tᵢ, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem: given a mapping F: D ⊆ ℝ^q → ℝ^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂².

The nonlinear least-squares problem is called compatible if F(p*) = 0.
Supplement Multivariable calculus
Consider p = (p₁, …, p_q)ᵀ ∈ ℝ^q and a scalar function g(p): ℝ^q → ℝ. The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p₁, …, ∂g/∂p_q )ᵀ.

Consider a vector-valued function F(p) = (F₁(p), …, F_m(p))ᵀ: ℝ^q → ℝ^m. Its derivative is the Jacobian matrix

F′(p) = [ ∂F₁/∂p₁ ⋯ ∂F₁/∂p_q ; ⋮ ; ∂F_m/∂p₁ ⋯ ∂F_m/∂p_q ] ∈ ℝ^{m×q}.
Problem formulation
Minimization problem: g(p) = ‖F(p)‖₂² = min.

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite.

Find, by Newton iteration, a zero of the function G: ℝ^q → ℝ^q,

G(p) = F′(p)ᵀF(p)

(a system of q nonlinear equations).
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p⁰:

G(p) = G(p⁰) + G′(p⁰)(p − p⁰) + o(‖p − p⁰‖) for p → p⁰
     ≈ G(p⁰) + G′(p⁰)(p − p⁰) =: Ḡ(p)

Zero p¹ of the linear substitute map Ḡ:

G′(p⁰)Δp⁰ = −G(p⁰), p¹ = p⁰ + Δp⁰

Newton iteration:

G′(p^k)Δp^k = −G(p^k), p^{k+1} = p^k + Δp^k

system of nonlinear equations ⇒ sequence of linear systems
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

Newton iteration in terms of F(p):

( F′(p^k)ᵀF′(p^k) + F″(p^k)ᵀF(p^k) ) Δp^k = −F′(p^k)ᵀF(p^k)

- compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
- almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p^k)ᵀF′(p^k)Δp^k = −F′(p^k)ᵀF(p^k), p^{k+1} = p^k + Δp^k
Gauss-Newton iteration
F′(p^k)ᵀF′(p^k)Δp^k = −F′(p^k)ᵀF(p^k), p^{k+1} = p^k + Δp^k

corresponds to the normal equations for the linear least-squares problem

‖F′(p^k)Δp^k + F(p^k)‖₂ = min

with formal solution

Δp^k = −F′(p^k)⁺F(p^k);

in case of convergence: F′(p*)⁺F(p*) = 0.

If rank(F′(p)) = q, then F′(p)⁺ = ( F′(p)ᵀF′(p) )⁻¹ F′(p)ᵀ.
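A minimal Gauss-Newton loop looks like this. This is my own sketch: the exponential model, the time grid, and the starting guess are assumptions for illustration, not the lecture's example; the data are exact, so the problem is compatible and convergence is (locally) quadratic.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 6)
p_true = np.array([2.0, -1.0])
z = p_true[0] * np.exp(p_true[1] * t)          # exact data -> compatible problem

def F(p):
    return p[0] * np.exp(p[1] * t) - z

def Fprime(p):
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])  # Jacobian F'(p), 6x2

p = np.array([1.5, -0.5])                      # starting guess
for k in range(50):
    # Delta p_k = -F'(p_k)^+ F(p_k), here via a linear least-squares solve
    dp = -np.linalg.lstsq(Fprime(p), F(p), rcond=None)[0]
    p = p + dp
    if np.linalg.norm(dp) < 1e-12:
        break
```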
Classification
- Local GN methods require a sufficiently good starting guess p⁰.
- Global GN methods can handle bad guesses as well.
- m = q: guaranteed local convergence.
- m > q: convergence only for a small subclass of nonlinear LSQ problems, the so-called adequate problems.
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p⁰:

- F: D → ℝ^m continuously differentiable, D ⊆ ℝ^q open and convex
- F′(p) possibly rank-deficient Jacobian
- F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̂)⁺(F′(p̄) − F′(p))(p̄ − p)‖₂ ≤ ω‖p̄ − p‖₂² (i)

for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) with p̄ − p ∈ R(F′(p̂)⁺).
Assumptions
- compatibility condition (model and data must somehow fit each other):

‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖ for all p, p̄ ∈ D,
κ(p) ≤ κ < 1 for all p ∈ D (ii)

- the initial update Δp⁰ must be small enough, and a ball B_ρ(p⁰) with radius ρ around p⁰ must be contained in D:

‖Δp⁰‖ ≤ α and B_ρ(p⁰) ⊂ D (iii)
h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2) (iv)
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p⁰), and converges to some p* ∈ B_ρ(p⁰) with

F′(p*)⁺F(p*) = 0.

The convergence rate can be estimated according to

‖p^{k+1} − p^k‖₂ ≤ (ω/2)‖p^k − p^{k−1}‖₂² + κ(p^{k−1})‖p^k − p^{k−1}‖₂. (*)

This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness requires a full-rank Jacobian.
Compatibility
- Linear least-squares problems, F(p) = Ap − z:

F′(p̄) = (Ap̄ − z)′ = A = (Ap − z)′ = F′(p) ⇒ ω = 0 (i)
F′(p̄)⁺(I − F′(p)F′(p)⁺) = F′(p)⁺(I − F′(p)F′(p)⁺) = 0 ⇒ κ(p) = κ = 0 (ii)
(*) ⇒ ‖p² − p¹‖ ≤ 0 ⇒ Δp¹ = 0 ⇒ p¹ = p*:
convergence within one iteration.

- Systems of nonlinear equations (m = q):
if F′(p) is nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹ ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) = 0, so by (*):
quadratic convergence.

- Compatible problems: F(p*) = 0 ⇒ κ(p*) = 0.
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤ lim_{k→∞} ( (ω/2)‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*), by (*).

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1.

Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

Δp̄^{k+1} = −F′(p^k)⁺F(p^{k+1}).

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄^{k+1}‖ ≈ ‖Δp^{k+1}‖ ≤ XTOL.

For incompatible problems in the convergent phase it holds that

‖Δp̄^{k+1}‖ ≈ κ‖Δp^k‖ ≈ κ²‖Δp^{k−1}‖.

This provides an estimate of κ(p*):

κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖.
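The κ estimate can be reproduced numerically. A sketch of my own construction (the noisy exponential model and all tolerances are assumptions, not the lecture's example): run Gauss-Newton on an incompatible problem and track ‖Δp̄^{k+1}‖/‖Δp^k‖.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8)
rng = np.random.default_rng(4)
z = 2.0 * np.exp(-1.0 * t) + 0.05 * rng.standard_normal(t.size)  # noisy -> incompatible

def F(p):
    return p[0] * np.exp(p[1] * t) - z

def Fprime(p):
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([1.5, -0.5])
kappa_est = None
for k in range(30):
    J = Fprime(p)
    dp = -np.linalg.lstsq(J, F(p), rcond=None)[0]           # ordinary correction
    p_new = p + dp
    dp_bar = -np.linalg.lstsq(J, F(p_new), rcond=None)[0]   # simplified correction
    kappa_est = np.linalg.norm(dp_bar) / np.linalg.norm(dp)
    p = p_new
    if np.linalg.norm(dp) < 1e-9:
        break
```

For an adequate problem the estimate settles below 1, in line with the convergence theory above.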
Convergence monitor
[Figure: semilogarithmic plot of the correction norms versus iteration index k = 0, …, 12, decaying from 10⁰ down to about 10⁻¹⁵; legend: ordinary GN corrections, simplified GN corrections.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p⁰) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

QᵀF′(p*)Π = R

⇒ reordering of the parameters to (p*_{π(1)}, p*_{π(2)}, …, p*_{π(n)}, …, p*_{π(q)}) in such a way that

|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |r_{nn}|

→ the parameters (p*_{π(1)}, p*_{π(2)}, …, p*_{π(n)}) can actually be estimated from the current dataset.
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u), F̄(u) := F(φ(u)), F̄′(u) = F′(p)·φ′(u)

constraint → transformation φ(u):
p > 0: p = exp(u)
A ≤ p ≤ B: p = A + (B − A)/2 · (1 + sin u)
p ≤ C: p = C + (1 − √(1 + u²))
p ≥ C: p = C − (1 − √(1 + u²))
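A quick check of these transformations (my sketch, with a hypothetical scalar residual just to exercise the chain rule F̄′(u) = F′(p)·φ′(u)):

```python
import numpy as np

phi = np.exp                  # p = phi(u): maps R onto (0, inf), enforcing p > 0
dphi = np.exp                 # phi'(u) = exp(u)

def F(p):                     # some residual defined only for p > 0 (illustration)
    return np.array([np.log(p) - 1.0])

def Fprime(p):
    return np.array([[1.0 / p]])

u = 0.7
p = phi(u)
J_u = Fprime(p) * dphi(u)     # chain rule: derivative w.r.t. u

# finite-difference check of the chain rule
h = 1e-6
J_fd = (F(phi(u + h)) - F(phi(u - h))) / (2 * h)

# bound constraint A <= p <= B via p = A + (B - A)/2 * (1 + sin u), here A=1, B=3
u_grid = np.linspace(-10.0, 10.0, 101)
p_bounded = 1.0 + (3.0 - 1.0) / 2.0 * (1.0 + np.sin(u_grid))
```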
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

‖F(p^k) + F′(p^k)Δp^k‖₂² + ρ‖Δp^k‖₂² = min
⇔ (F′(p^k)ᵀF′(p^k) + ρ I_q) Δp^k = −F′(p^k)ᵀF(p^k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F′(p^k)ᵀF′(p^k) + ρ I_q) is always regular → masking of the uniqueness of a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0.
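One Levenberg-Marquardt step and its two limits can be illustrated directly (my sketch; J and F(p_k) are random stand-ins for the Jacobian and residual):

```python
import numpy as np

rng = np.random.default_rng(5)
J = rng.standard_normal((8, 3))      # stands in for F'(p_k), full rank
Fk = rng.standard_normal(8)          # stands in for F(p_k)

def lm_step(J, Fk, rho):
    """Solve (J^T J + rho*I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fk)

gn = np.linalg.lstsq(J, -Fk, rcond=None)[0]   # Gauss-Newton step
lm_small = lm_step(J, Fk, 1e-12)              # rho -> 0 recovers GN
lm_large = lm_step(J, Fk, 1e3)                # large rho shrinks the step
```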
Scaling
- row (measurement) scaling:

F̄(p) = D_row⁻¹F(p), D_row = diag(δz₁, …, δz_M)

‖F′(p^k)Δp^k + F(p^k)‖ = min vs. ‖D_row⁻¹(F′(p^k)Δp^k + F(p^k))‖ = min:
different row scaling leads to a different solution.

- column (parameter) scaling:

p = D_col p̄, H(p̄) := F(D_col p̄) = F(p), H′(p̄) = F′(p) D_col

Assume p^k = D_col p̄^k. Then

Δp̄^k = −H′(p̄^k)⁺H(p̄^k) = −(F′(p^k)D_col)⁺F(p^k)
     = −D_col⁻¹F′(p^k)⁺F(p^k) = D_col⁻¹Δp^k, if F′(p^k) has full rank.

No invariance in the rank-deficient case.
Globalization
Global (or damped) Gauss-Newton method:

F′(p^k)Δp^k = −F(p^k), p^{k+1} = p^k + λ_k Δp^k, 0 < λ_k ≤ 1

How to choose λ_k? Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p^k)⁺F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1.

Divergence criterion: λ_k < λ_min ≪ 1.
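A simplified damped Gauss-Newton loop can be sketched as follows. This is my own illustration: the model and starting guess are assumptions, and the λ choice uses plain halving instead of the predictor-corrector strategy mentioned above.

```python
import numpy as np

t = np.linspace(0.0, 2.0, 10)
p_true = np.array([2.0, -1.0])
z = p_true[0] * np.exp(p_true[1] * t)

def F(p):
    return p[0] * np.exp(p[1] * t) - z

def Fprime(p):
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([3.0, -0.3])            # a fairly bad starting guess
lam_min = 1e-8
for k in range(100):
    J = Fprime(p)
    dp = -np.linalg.lstsq(J, F(p), rcond=None)[0]
    if np.linalg.norm(dp) < 1e-12:
        break
    lam = 1.0
    while lam >= lam_min:
        # simplified correction at the trial point, with the OLD Jacobian
        dp_bar = -np.linalg.lstsq(J, F(p + lam * dp), rcond=None)[0]
        if np.linalg.norm(dp_bar) < np.linalg.norm(dp):   # natural monotonicity test
            break
        lam *= 0.5                    # reduce the damping factor and retry
    p = p + lam * dp
```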
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y₀, with y ∈ ℝ^d, p ∈ ℝ^q

Set of experimental data: (t₁, z₁, δz₁), …, (t_M, z_M, δz_M), t_j ∈ ℝ, z_j ∈ ℝ^d, with measurement tolerances δz_j ∈ ℝ^d, Dᵢ = diag(δzᵢ)

F(p) = ( D₁⁻¹(y(t₁, p) − z₁); …; D_M⁻¹(y(t_M, p) − z_M) )
F′(p) = ( D₁⁻¹ y_p(t₁, p); …; D_M⁻¹ y_p(t_M, p) )

Gauss-Newton method:

Δp^k = −F′(p^k)⁺F(p^k), p^{k+1} = p^k + λ_k Δp^k
Computation of F(p)

Requires the solution of y′ = f(y, p) at the time points t₁, …, t_M.

Problem: the time points chosen by adaptive step-size control generally do not agree with t₁, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution, such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
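The same idea is available in SciPy (my sketch; LIMEX itself is not used here): `solve_ivp(dense_output=True)` returns an interpolant that can be evaluated at arbitrary measurement times without constraining the solver's own steps.

```python
import numpy as np
from scipy.integrate import solve_ivp

# y' = -y, y(0) = 1, exact solution y(t) = exp(-t)
sol = solve_ivp(lambda t, y: -y, (0.0, 5.0), [1.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

t_meas = np.array([0.3, 1.7, 4.2])   # measurement times, not solver steps
y_meas = sol.sol(t_meas)[0]          # evaluate the dense-output interpolant
y_exact = np.exp(-t_meas)
```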
Sensitivity analysis
- sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ ℝ^{d×q}

F′(p) = ( D₁⁻¹S(t₁); …; D_M⁻¹S(t_M) )

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄ᵢⱼ(t) = ∂yᵢ/∂pⱼ (t) · p_scal / y_scal

with p_scal, y_scal user-specified scaling values.
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y′_p = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0.

Compact notation:

S′ = f_y · S + f_p, S = [s₁, …, s_q], s_k ∈ ℝ^d.

In practice, solve the combined system

( y′; s′₁; …; s′_q ) = ( f; f_y·s₁ + f_{p₁}; …; f_y·s_q + f_{p_q} ) ∈ ℝ^{(q+1)d},

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
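For the introductory reaction A + B ⇌ C + D with p = (k₁, k₂), the combined system has dimension (q+1)d = 12. A sketch of my own implementation with SciPy's general-purpose `solve_ivp` (the slides use LIMEX), checked against finite differences:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, w, k1, k2):
    """Combined system: state y (4), sensitivities s1 = dy/dk1, s2 = dy/dk2."""
    y, s1, s2 = w[:4], w[4:8], w[8:12]
    A, B, C, D = y
    sign = np.array([-1.0, -1.0, 1.0, 1.0])
    r = k1 * A * B - k2 * C * D                  # reaction rate
    grad_r = np.array([k1 * B, k1 * A, -k2 * D, -k2 * C])
    fy = np.outer(sign, grad_r)                  # Jacobian f_y
    fp1 = sign * (A * B)                         # df/dk1
    fp2 = sign * (-C * D)                        # df/dk2
    return np.concatenate([sign * r, fy @ s1 + fp1, fy @ s2 + fp2])

y0 = [2.0, 1.0, 0.5, 0.0]
w0 = y0 + [0.0] * 8                              # y_p(0) = 0
k1, k2 = 0.1, 0.2
sol = solve_ivp(rhs, (0.0, 5.0), w0, args=(k1, k2), rtol=1e-9, atol=1e-11)
S_k1 = sol.y[4:8, -1]                            # dy/dk1 at t = 5

# finite-difference check of the sensitivity w.r.t. k1
h = 1e-5
def y_end(k1v):
    s = solve_ivp(lambda t, y: np.array([-1.0, -1.0, 1.0, 1.0]) *
                  (k1v * y[0] * y[1] - k2 * y[2] * y[3]),
                  (0.0, 5.0), y0, rtol=1e-9, atol=1e-11)
    return s.y[:, -1]
S_fd = (y_end(k1 + h) - y_end(k1 - h)) / (2 * h)
```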
Example
Case (T+E):

k | ‖F(p^k)‖ | ‖Δp^k‖   | λ_k   | rank
0 | 4.81e-02 | 3.55e-01 | –     | 2
1 | 1.11e-01 | 4.57e-01 | 1.000 | 2
2 | 2.55e-02 | 1.75e-01 | 0.389 | 2
  | 1.04e-02 | 4.22e-02 | 1.000 | 2
3 | 9.16e-04 | 1.90e-03 | 1.000 | 2
4 | 8.00e-06 | 1.99e-05 | 1.000 | 2
5 | 2.32e-07 | 1.54e-09 | 1.000 | 2

p* = (k*₁, k*₂) = (0.10000001, 0.19999997) ⇒ k*₂/k*₁ = 1.9999994
κ ≈ 0.008, sc(F′(p*)) = 6.3

Case (E):

k | ‖F(p^k)‖ | ‖Δp^k‖   | λ_k   | rank
0 | 3.85e-02 | 2.96e-02 | –     | 1
1 | 2.40e-03 | 1.85e-03 | 1.000 | 1
2 | 1.11e-05 | 7.67e-06 | 1.000 | 1
3 | 6.67e-06 | 6.75e-11 | 1.000 | 1

p** = (k**₁, k**₂) = (0.24155702, 0.48312906) ⇒ k**₂/k**₁ = 2.0000622
κ ≈ 0.004, sc(F′(p**)) = 4.5×10⁶

[Figures: fitted concentration profiles of A, B, C, D over time for both cases.]
BioPARKIN
An integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
- Linear Least Squares Problems
- Nonlinear Least Squares Problems
- Specification for ODE Models
Likelihood
Assumption: normally distributed measurement noise:

P(z|p) ∝ exp( −Σ_{k=1}^d Σ_{l=1}^{M_k} (z_kl − y_k(t_l, p))² / (2σ_kl²) )

Maximum likelihood estimate (MLE):

p_MLE = argmax_p P(z|p)
Bayes theorem
Joint probability = conditional probability × marginal probability:

P(p, z) = P(p|z)P(z) = P(z|p)P(p)

⇒ P(p|z) = P(z|p)P(p) / P(z)

P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of the parameters given the data)
P(z) = ∫ P(z|p)P(p) dp: evidence

Maximum a posteriori estimate (MAP):

p_MAP = argmax_p P(p|z)
Evaluation of posterior
P(p|z) ∝ P(z|p)P(p)

- evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence?
- consider the full high-dimensional posterior PDF for statistical inference
- necessity of a prior
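A minimal Metropolis-Hastings sketch (my own construction, not the lecture's setup): sampling P(p|z) ∝ P(z|p)P(p) for a scalar parameter with Gaussian likelihood and a flat prior on a bounded interval.

```python
import numpy as np

rng = np.random.default_rng(6)
p_true, sigma = 2.0, 0.5
z = p_true + sigma * rng.standard_normal(50)     # data z_i = p + noise

def log_post(p):
    if not (0.0 < p < 10.0):                     # flat prior on (0, 10)
        return -np.inf
    return -np.sum((z - p) ** 2) / (2.0 * sigma**2)

samples = []
p = 5.0                                          # starting point
lp = log_post(p)
for k in range(20000):
    p_prop = p + 0.3 * rng.standard_normal()     # symmetric random-walk proposal
    lp_prop = log_post(p_prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept/reject
        p, lp = p_prop, lp_prop
    samples.append(p)
post_mean = np.mean(samples[5000:])              # discard burn-in
```

With a flat prior the posterior mean coincides with the sample mean of the data, which the chain reproduces.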
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t, p) = B(t)p, B(t) ∈ ℝ^{d×q}

Measurement time points: t₁, …, t_M ∈ ℝ
Measurement values: z₁, …, z_M ∈ ℝ^d

Set A = [B(t₁), …, B(t_M)]ᵀ, z = [z₁, …, z_M]ᵀ.

Linear least-squares problem: for given z ∈ ℝ^m and A ∈ ℝ^{m×q} with m ≥ q, find p ∈ ℝ^q such that

‖z − Ap‖₂ → min.
Example Reaction kinetics
- Arrhenius law: dependence of the rate constant K on the temperature T:

K(T) = C · exp( −E / (RT) )

- gas constant R = 8.3145 J/(mol·K)
- unknown parameters: pre-exponential factor C, activation energy E
- measurements (Tᵢ, Kᵢ), i = 1, …, m = 21

[Figure: measured K versus T for T ∈ [720, 820] K, with K up to 1.2×10⁻³.]
Example Reaction kinetics
Taking logarithms yields a minimization problem that is linear in (log C, E):

log C − E/(R Tᵢ) = log Kᵢ ⇒ ‖log K − A (log C, E)ᵀ‖₂ = min,

with A_{i1} = 1, A_{i2} = −1/(R Tᵢ), i = 1, …, 21.

[Figures: data (Tᵢ, Kᵢ) and the linearized data (Tᵢ, log Kᵢ) with the fitted line.]
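The linearized fit can be carried out in a few lines. A sketch with synthetic data (the values of C and E are my assumptions for illustration, the slide's measurements are not reproduced here):

```python
import numpy as np

R = 8.3145                        # gas constant, J/(mol K)
C_true, E_true = 1.0e8, 1.8e5     # assumed parameter values
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))   # noise-free synthetic measurements

# log K_i = log C - E / (R T_i):  A @ (log C, E)^T = log K
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
C_est, E_est = np.exp(coef[0]), coef[1]
```

Note that the two columns of A are nearly collinear over a narrow temperature window, so the problem is moderately ill-conditioned, a point the scaling and subcondition slides address.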
The method of normal equations
[Figure: geometric interpretation: the residual z − Ap is orthogonal to the range space R(A).]

‖z − Ap‖₂ = min ⇔ z − Ap ∈ R(A)^⊥

R(A) = {Ap : p ∈ ℝ^q} (range space of A)

‖z − Ap‖₂ = min_{p′∈ℝ^q} ‖z − Ap′‖₂ ⇔ AᵀAp = Aᵀz

uniquely solvable ⇔ AᵀA invertible ⇔ rank(A) = q
⇒ solution via Cholesky decomposition (AᵀA is spd)

If rank(A) = q: κ₂(A)² = λ_max(AᵀA)/λ_min(AᵀA) = κ₂(AᵀA) (without proof).

We need a more stable method for solving the normal equations.
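Both points can be illustrated numerically (my sketch): the Cholesky solve of the normal equations reproduces the least-squares solution, and the condition number of AᵀA is the square of that of A.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(7)
A = rng.standard_normal((30, 4))
z = rng.standard_normal(30)

M = A.T @ A                          # spd when rank(A) = q
p_chol = cho_solve(cho_factor(M), A.T @ z)     # normal equations via Cholesky
p_ref, *_ = np.linalg.lstsq(A, z, rcond=None)  # reference solution

cond_sq = np.linalg.cond(M)          # = kappa_2(A)^2: the stability penalty
```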
Orthogonalization methods
Let Q ∈ ℝ^{m×m} be an orthogonal matrix, i.e. QᵀQ = I. Then

‖z − Ap‖₂² = (z − Ap)ᵀQᵀQ(z − Ap) = ‖Q(z − Ap)‖₂².

The Euclidean norm ‖·‖₂ is invariant under orthogonal transformations:

‖z − Ap‖₂ = min ⇔ ‖Q(z − Ap)‖₂ = min.

Choice of the orthogonal transformation Q ∈ ℝ^{m×m}?
QR factorization
A = Q (R₁; 0), Q ∈ ℝ^{m×m} orthogonal, R₁ ∈ ℝ^{q×q} upper triangular.

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

‖z − Ap‖₂² = ‖Qᵀ(z − Ap)‖₂² = ‖(z₁ − R₁p; z₂)‖₂² = ‖z₁ − R₁p‖₂² + ‖z₂‖₂² ≥ ‖z₂‖₂²

rank(A) = q ⇒ rank(R₁) = rank(A) = q ⇒ R₁ is invertible

‖z − Ap‖₂ = min ⇔ p = R₁⁻¹z₁, with residual ‖r‖₂ = ‖z₂‖₂.

The QR factorization is stable, since κ₂(A) = κ₂(R₁).
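The QR route in code (my sketch, using NumPy's reduced QR): p = R₁⁻¹z₁, and the residual norm equals ‖z₂‖₂ without ever forming AᵀA.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((12, 3))
z = rng.standard_normal(12)

Q, R1 = np.linalg.qr(A)              # reduced QR: Q is 12x3, R1 is 3x3
z1 = Q.T @ z
p = np.linalg.solve(R1, z1)          # p = R1^{-1} z1 (triangular system)

residual = np.linalg.norm(z - A @ p)
# ||z||^2 = ||z1||^2 + ||z2||^2, so ||z2|| is recoverable without the full Q
z2_norm = np.sqrt(np.linalg.norm(z) ** 2 - np.linalg.norm(z1) ** 2)
```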
Householder QR
Columnwise application of Householder matrices to A ∈ ℝ^{m×q}:

QᵀA = (H_q H_{q−1} ⋯ H₂H₁)A = R

For k = 1, …, q we obtain H_k = diag(I_{k−1}, H′_k) ∈ ℝ^{m×m}:

A^(k) = H_k A^(k−1), where H_k leaves the first k−1 rows unchanged and annihilates the entries below the diagonal in column k; only the trailing (m−k+1) × (q−k+1) block is modified.
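A compact implementation of the columnwise procedure (my own sketch, without the pivoting of the next slide), with each H′_k = I − 2vvᵀ applied only to the trailing block:

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections: returns Q (m x m) and R (m x q)."""
    A = A.astype(float).copy()
    m, q = A.shape
    Q = np.eye(m)
    for k in range(q):
        x = A[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # H'_k = I - 2 v v^T, applied to the trailing submatrix only
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])
        # accumulate Q = H_1 ... H_q so that A_original = Q @ R
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, A    # A has been overwritten by R

rng = np.random.default_rng(9)
A = rng.standard_normal((6, 3))
Q, R = householder_qr(A)
```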
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:

    F′(pk)∆pk = −F(pk),  pk+1 = pk + λk ∆pk,  0 < λk ≤ 1

How to choose λk?

Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λk in such a way that ‖F′(pk)⁺F(pk)‖ decays along the iterates (theory of level functions, Gauss-Newton path)

damping factors λk are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

    ‖∆p̄k+1(λk)‖ / ‖∆pk‖ < 1

divergence criterion: λk < λmin ≪ 1
Parameter Identification Susanna Roblitz 51
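A sketch of the damped iteration with a simple version of the monotonicity test (halving λ instead of the predictor-corrector strategy; the toy residual F and its Jacobian J are invented, with solution p* = (1, 2)):

```python
import numpy as np

def damped_gauss_newton(F, J, p0, tol=1e-10, lam_min=1e-8, max_iter=50):
    """Damped Gauss-Newton: accept a step only if the simplified
    correction is smaller than the ordinary one (monotonicity test)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jp = J(p)
        dp = np.linalg.lstsq(Jp, -F(p), rcond=None)[0]  # ordinary correction
        if np.linalg.norm(dp) < tol:
            return p
        lam = 1.0
        while lam >= lam_min:
            trial = p + lam * dp
            # simplified correction: old Jacobian, new residual
            dp_bar = np.linalg.lstsq(Jp, -F(trial), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                break                                   # test passed
            lam /= 2.0                                  # damp further
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = trial
    return p

# Compatible toy problem with solution p* = (1, 2).
F = lambda p: np.array([p[0] - 1.0, p[1] - 2.0, (p[0] - 1.0) * p[1]])
J = lambda p: np.array([[1.0, 0.0], [0.0, 1.0], [p[1], p[0] - 1.0]])
p_star = damped_gauss_newton(F, J, p0=[3.0, 0.0])
print(p_star)   # close to [1, 2]
```

Since the problem is compatible, full steps (λ = 1) are accepted near the solution and convergence is quadratic.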
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

model: y′ = f(y, p), y(0) = y0, with y ∈ Rd, p ∈ Rq

set of experimental data (t1, z1, δz1), …, (tM, zM, δzM), tj ∈ R, zj ∈ Rd,
with measurement tolerances δzj ∈ Rd, Dj = diag(δzj)

    F(p) = ( D1⁻¹(y(t1, p) − z1), …, DM⁻¹(y(tM, p) − zM) )ᵀ

    F′(p) = ( D1⁻¹ y_p(t1, p), …, DM⁻¹ y_p(tM, p) )ᵀ

Gauss-Newton method:

    ∆pk = −F′(pk)⁺ F(pk),  pk+1 = pk + λk ∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y′ = f(y, p) at the time points t1, …, tM

Problem: time points chosen by adaptive step-size control generally do not agree with t1, …, tM

Solution: use an integrator with dense output

Idea: construction of an interpolation polynomial based on the available solution, such that accuracy is not destroyed

Example: LIMEX has a dense output based on Hermite interpolation
Parameter Identification Susanna Roblitz 55
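A sketch of dense output with a generic stiff-capable integrator (SciPy's `solve_ivp` here, standing in for LIMEX; the reaction A + B ⇌ C + D and rate constants follow the introductory example):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k1, k2):
    a, b, c, d = y
    r = -k1 * a * b + k2 * c * d      # A' = B' = r, C' = D' = -r
    return [r, r, -r, -r]

k1, k2 = 0.2, 0.5
sol = solve_ivp(rhs, (0.0, 30.0), [2.0, 1.0, 0.5, 0.0], args=(k1, k2),
                dense_output=True, rtol=1e-8, atol=1e-10)

t_meas = np.array([1.0, 2.5, 7.0, 20.0])  # need not match internal steps
y_meas = sol.sol(t_meas)                  # interpolated model values
print(y_meas.shape)                       # (4 species, 4 time points)
```

The continuous solution `sol.sol` can be evaluated at every measurement time without restarting or constraining the step-size control.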
Sensitivity analysis
I sensitivity matrix at time t:

    S(t) = ∂y/∂p (t) = y_p(t) ∈ R^(d×q)

    F′(p) = ( D1⁻¹ S(t1), …, DM⁻¹ S(tM) )ᵀ

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

    S̄ij(t) = ∂yi/∂pj (t) · p_scal,j / y_scal,i

  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

    y′_p = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0

Compact notation:

    S′ = f_y · S + f_p,  S = [s1, …, sq],  sk ∈ Rd

In practice, solve the combined system

    (y′, s1′, …, sq′)ᵀ = (f, f_y·s1 + f_p1, …, f_y·sq + f_pq)ᵀ ∈ R^((q+1)d)

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling)
Parameter Identification Susanna Roblitz 57
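A minimal worked instance of the combined system for a scalar model y′ = −p1·y + p2 (d = 1, q = 2), where the sensitivities can be checked against the analytic solution:

```python
import numpy as np
from scipy.integrate import solve_ivp

def combined(t, w, p1, p2):
    """Combined state + variational system: w = (y, s1, s2)."""
    y, s1, s2 = w
    f = -p1 * y + p2
    fy = -p1                 # f_y
    return [f,
            fy * s1 - y,     # s1' = f_y*s1 + f_p1, f_p1 = -y
            fy * s2 + 1.0]   # s2' = f_y*s2 + f_p2, f_p2 = 1

p1, p2, y0 = 2.0, 1.0, 0.0
sol = solve_ivp(combined, (0.0, 1.0), [y0, 0.0, 0.0], args=(p1, p2),
                rtol=1e-10, atol=1e-12)
y_T, s1_T, s2_T = sol.y[:, -1]

# Analytic check: y(t) = p2/p1 * (1 - exp(-p1*t)) for y0 = 0,
# hence dy/dp2 at t = 1 is (1 - exp(-p1))/p1.
dy_dp2 = (1.0 - np.exp(-p1 * 1.0)) / p1
print(abs(s2_T - dy_dp2) < 1e-6)
```

The sensitivities inherit the accuracy of the state integration because both are propagated by the same adaptive solver.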
Example
k   ‖F(pk)‖     ‖∆pk‖      λk      rank
0   4.81e-02    3.55e-01            2
1   1.11e-01    4.57e-01   1.000    2
    2.55e-02    1.75e-01   0.389
2   1.04e-02    4.22e-02   1.000    2
3   9.16e-04    1.90e-03   1.000    2
4   8.00e-06    1.99e-05   1.000    2
5   2.321e-07   1.54e-09   1.000    2

p* = (k1*, k2*) = (0.10000001, 0.19999997) ⇒ k2*/k1* = 1.9999994

κ = 0.008, sc(F′(p*)) = 6.3

k   ‖F(pk)‖     ‖∆pk‖      λk      rank
0   3.85e-02    2.96e-02            1
1   2.40e-03    1.85e-03   1.000    1
2   1.11e-05    7.67e-06   1.000    1
3   6.67e-06    6.75e-11   1.000    1

p** = (k1**, k2**) = (0.24155702, 0.48312906) ⇒ k2**/k1** = 2.0000622

κ = 0.004, sc(F′(p**)) = 4.5×10⁶
[Figures: concentration vs. time for species A, B, C, D — fitted trajectories for the two cases]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX, others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Likelihood
assumption: normally distributed measurement noise

    P(z|p) ∝ exp( − Σ_{k=1..d} Σ_{l=1..Mk} (z_kl − y_k(t_l, p))² / (2σ_kl²) )

maximum likelihood estimate (MLE):

    p_MLE = argmax_p P(z|p)
Parameter Identification Susanna Roblitz 11
Bayes theorem
Joint probability = Conditional probability × Marginal probability:

    P(p, z) = P(p|z) P(z) = P(z|p) P(p)

    P(p|z) = P(z|p) P(p) / P(z)

P(p): prior
P(z|p): likelihood
P(p|z): posterior (probability of parameter given the data)
P(z) = ∫ P(z|p) P(p) dp: evidence

maximum a posteriori estimate (MAP):

    p_MAP = argmax_p P(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) prop P(z |p)P(p)
I evaluation of posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence
I consider full high-dimensional posterior PDF for statistical inference
I necessity of a prior
Parameter Identification Susanna Roblitz 13
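A minimal random-walk Metropolis-Hastings sketch for sampling such a posterior (the Gaussian likelihood, flat positivity prior, and "measurements" here are invented for illustration, not the lecture's model):

```python
import numpy as np

rng = np.random.default_rng(0)
z = np.array([1.9, 2.1, 2.0])          # noisy measurements of a constant p
sigma = 0.1

def log_post(p):
    """log P(p|z) up to a constant: Gaussian likelihood, prior p > 0."""
    if p <= 0:
        return -np.inf                  # prior excludes p <= 0
    return -np.sum((z - p) ** 2) / (2.0 * sigma ** 2)

samples, p = [], 1.0
for _ in range(20000):
    prop = p + 0.1 * rng.standard_normal()            # symmetric proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(p):
        p = prop                                      # accept
    samples.append(p)                                 # else keep old p

post_mean = np.mean(samples[5000:])                   # discard burn-in
print(round(post_mean, 2))                            # near the data mean 2.0
```

The full chain (not just a point estimate) carries the uncertainty information that the Bayesian approach is after.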
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
I Linear Least Squares Problems
I Nonlinear Least Squares Problems
I Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t, p) = B(t) p,  B(t) ∈ R^(d×q)

Measurement time points t1, …, tM ∈ R
Measurement values z1, …, zM ∈ Rd

Set A = [B(t1), …, B(tM)]ᵀ, z = [z1, …, zM]ᵀ

Linear least-squares problem:
For given z ∈ Rm and A ∈ R^(m×q) with m ≥ q, find p ∈ Rq such that

    ‖z − Ap‖₂ → min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law: dependence of rate constant K on temperature T

    K(T) = C · exp(−E / (R T))

I gas constant R = 8.3145 J/(mol·K)
I unknown parameters: pre-exponential factor C, activation energy E
I measurements (Ti, Ki), i = 1, …, m = 21

[Figure: measured rate constants K (×10⁻³) vs. temperature T = 720…820 K]
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem:

    log C − E/(R Ti) = log Ki  ⇒  ‖log K − A (log C, E)ᵀ‖₂ = min

    Ai1 = 1,  Ai2 = −1/(R Ti),  i = 1, …, 21
[Figures: K vs. T (left) and log K vs. T (right) with the fitted Arrhenius law, T = 720…820 K]
Parameter Identification Susanna Roblitz 17
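The linearized Arrhenius fit can be sketched with synthetic data (the "true" C and E below are assumed values for illustration, not the lecture's measurements):

```python
import numpy as np

R = 8.3145                                     # J/(mol*K)
C_true, E_true = 5.0e7, 1.8e5                  # assumed true parameters
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))         # noise-free synthetic data

# Design matrix: A_i1 = 1, A_i2 = -1/(R*T_i); unknowns (log C, E).
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
C_fit, E_fit = np.exp(coef[0]), coef[1]

print(np.isclose(C_fit, C_true), np.isclose(E_fit, E_true))
```

Taking logarithms turns the nonlinear two-parameter model into an ordinary linear least-squares problem, at the price of implicitly reweighting the measurement errors.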
The method of normal equations
[Figure: geometry of the least-squares solution — residual z − Ap orthogonal to R(A)]

    ‖z − Ap‖₂ = min  ⇔  z − Ap ∈ R(A)⊥

    R(A) = {Ap : p ∈ Rq} (range space of A)

    ‖z − Ap‖₂ = min_{p′∈Rq} ‖z − Ap′‖₂  ⇔  AᵀA p = Aᵀ z

uniquely solvable ⇔ AᵀA invertible ⇔ rank(A) = q

⇒ solution via Cholesky decomposition (AᵀA is spd)

if rank(A) = q:  κ₂(A)² = λmax(AᵀA)/λmin(AᵀA) = κ₂(AᵀA)  (without proof)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
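The squaring of the condition number can be demonstrated numerically (a sketch with a random matrix whose columns are deliberately badly scaled):

```python
import numpy as np

rng = np.random.default_rng(1)
# Random 50x3 matrix with column scales 1, 1e-3, 1e-6 -> ill-conditioned.
A = rng.standard_normal((50, 3)) @ np.diag([1.0, 1e-3, 1e-6])

kA = np.linalg.cond(A)           # kappa_2(A)
kAtA = np.linalg.cond(A.T @ A)   # kappa_2(A^T A)
print(np.isclose(kAtA, kA**2, rtol=1e-2))
```

Forming AᵀA thus doubles the number of significant digits lost, which is the motivation for the orthogonalization methods on the next slides.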
Orthogonalization methods
Let Q ∈ R^(m×m) be an orthogonal matrix, i.e. QᵀQ = I:

    ‖z − Ap‖₂² = (z − Ap)ᵀQᵀQ(z − Ap) = ‖Q(z − Ap)‖₂²

The Euclidean norm ‖·‖₂ is invariant under orthogonal transformation:

    ‖z − Ap‖₂ = min  ⇔  ‖Q(z − Ap)‖₂ = min

Choice of the orthogonal transformation Q ∈ R^(m×m)?
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization:

    A = Q (R1; 0),  Q ∈ R^(m×m) orthogonal, R1 ∈ R^(q×q) upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)), with Qᵀz = (z1; z2):

    ‖z − Ap‖₂² = ‖Qᵀ(z − Ap)‖₂² = ‖(z1 − R1 p; z2)‖₂² = ‖z1 − R1 p‖₂² + ‖z2‖₂² ≥ ‖z2‖₂²

rank(A) = q ⇒ rank(R1) = rank(A) = q ⇒ R1 is invertible

    ‖z − Ap‖₂ = min  ⇔  p = R1⁻¹ z1,  residual ‖r‖₂ = ‖z2‖₂

The QR factorization is stable since κ₂(A) = κ₂(R1).
Parameter Identification Susanna Roblitz 20
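The QR-based solve p = R1⁻¹z1 with residual ‖z2‖₂ can be sketched directly (the matrix and exact parameter vector are invented so the residual vanishes):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
p_true = np.array([1.0, -2.0, 0.5])
z = A @ p_true                       # compatible data: zero residual

Q, R = qr(A)                         # full QR: Q is 8x8, R is 8x3
z_bar = Q.T @ z                      # (z1; z2)
p = solve_triangular(R[:3, :3], z_bar[:3])   # p = R1^{-1} z1
residual = np.linalg.norm(z_bar[3:])         # ||z2||_2

print(np.allclose(p, p_true), residual < 1e-10)
```

Only the triangular factor is inverted, so the conditioning of the solve is κ₂(A) rather than κ₂(A)².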
Householder QR
Columnwise application of Householder matrices to A ∈ R^(m×q):

    QᵀA = (Hq Hq−1 ··· H2 H1) A = R

For k = 1, …, q we obtain Hk = diag(I_(k−1), H′k) ∈ R^(m×m).

[Figure: sketch of the k-th step A(k) = Hk A(k−1) — the subdiagonal entries of column k are zeroed; only the entries in rows k, …, m and columns k, …, q are modified]
Parameter Identification Susanna Roblitz 21
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim

    |r11| ≥ |r22| ≥ … ≥ |rqq|

step k:

    A(k−1) = (R S; 0 T(k))

Denote column j of T(k) by T(k)_j.

Pivot strategy: push the column of T(k) with maximal 2-norm to the front, so that after the exchange

    |αk| = |rkk| = ‖T(k)_1‖₂ = max_j ‖T(k)_j‖₂

Finally AΠ = QR with permutation matrix Π.
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:

    QᵀAΠ = (R S; 0 0)  (exact)  = (R S; 0 T(n+1))  (in finite precision),  T(n+1) "small"

Note: The exact rank n cannot be computed, since by rounding errors ‖T(n+1)‖ = |r_(n+1,n+1)| ≠ 0.

Idea: Stop the iteration as soon as

    |r_(k+1,k+1)| < δ |r11|,  δ: relative precision of A

Definition: numerical rank n:

    |r_(n+1,n+1)| < δ |r11| ≤ |rnn|

choice of δ ⇒ decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r11| / |rqq|

Properties:
I sc(A) ≥ 1
I sc(αA) = sc(A)
I A ≠ 0 singular ⇔ sc(A) = ∞
I sc(A) ≤ κ₂(A) (hence the name)

A is called almost singular if

    δ · sc(A) ≥ 1, or equivalently, δ |r11| ≥ |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow, Dcol: diagonal matrices

We consider the scaled least squares problem

    ‖D⁻¹row(A Dcol D⁻¹col p − z)‖₂ = ‖Ā p̄ − z̄‖₂ → min

where Ā = D⁻¹row A Dcol, p̄ = D⁻¹col p, z̄ = D⁻¹row z

I row (measurement) scaling: Drow = diag(δz1, …, δzm), e.g. relative error concept δzj = ε|zj|
I column (parameter) scaling: Dcol = diag(p1,scal, …, pq,scal) (to account for different physical units)

in general: sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ R^(q×q), rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A⁻¹z, where A⁻¹A = AA⁻¹ = Iq

(2) A ∈ R^(m×q), rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:

    ‖Ap − z‖₂ = min

(2a) n = q: ‖Ap − z‖₂ = min ⇔ AᵀAp = Aᵀz (normal eq.), uniquely solvable since AᵀA is nonsingular:

    p = (AᵀA)⁻¹Aᵀ z =: A⁺ z

(2b) n < q: write p = A⁺z, A⁺ generalized (pseudo-)inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ R^(m×q), rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by:

(i) (A⁺A)ᵀ = A⁺A
(ii) (AA⁺)ᵀ = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A

One can show: p* = A⁺z is the shortest solution, i.e.

    ‖p*‖₂ ≤ ‖p‖₂  ∀p ∈ L(z)

with

    L(z) = {p ∈ Rq : ‖z − Ap‖₂ = min} = p* + N(A)
Parameter Identification Susanna Roblitz 27
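The "shortest solution" property can be checked numerically (a sketch; the rank-deficient matrix and data are invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])        # rank 2 < q = 3
z = np.array([3.0, 4.0])

p_star = np.linalg.pinv(A) @ z         # p* = A^+ z

# A null-space direction of A: A @ v = 0.
v = np.array([2.0, -1.0, 0.0])
assert np.allclose(A @ v, 0)

# Any p* + v is also a least-squares solution, but longer:
other = p_star + 0.7 * v
assert np.allclose(A @ other, A @ p_star)
print(np.linalg.norm(p_star) < np.linalg.norm(other))
```

Since p* lies in the row space of A, it is orthogonal to N(A), so adding any null-space component can only increase ‖p‖₂.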
Computation of shortest solution
    QᵀAΠ = (R S; 0 0),  A ∈ R^(m×q)

rank(A) = n, R ∈ R^(n×n) upper triangular, S ∈ R^(n×(q−n))

    Πᵀp = (p1; p2),  p1 ∈ Rn, p2 ∈ R^(q−n)

    Qᵀz = (z1; z2),  z1 ∈ Rn, z2 ∈ R^(m−n)

⇒ ‖z − Ap‖₂² = ‖Qᵀz − QᵀAΠ Πᵀp‖₂² = ‖z1 − R p1 − S p2‖₂² + ‖z2‖₂²

‖z − Ap‖₂² = min  ⇔  z1 − R p1 − S p2 = 0  ⇔  p1 = R⁻¹(z1 − S p2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = R⁻¹S, u = R⁻¹z1

    ‖p‖₂² = ‖Πᵀp‖₂² = ‖p1‖₂² + ‖p2‖₂² = ‖u − V p2‖₂² + ‖p2‖₂²
          = ‖u‖₂² − 2⟨u, V p2⟩ + ⟨V p2, V p2⟩ + ⟨p2, p2⟩
          = ‖u‖₂² + ⟨p2, (I + VᵀV)p2 − 2Vᵀu⟩ =: φ(p2)

φ(p2) = min if φ′(p2) = 2(I + VᵀV)p2 − 2Vᵀu = 0

⇔ (I + VᵀV)p2 = Vᵀu (solution by Cholesky decomposition)

φ″(p2) = 2(I + VᵀV) spd ⇒ p2 is a minimum

Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data: (ti, zi), ti ∈ R, zi ∈ Rd, i = 1, …, M

Parameters: p = (p1, …, pq)ᵀ

Model:

    y(t, p): R × Rq → Rd, nonlinear w.r.t. p

Measurement tolerances:

    δzi ∈ Rd,  Di = diag(δzi)

Define:

    F(p) = ( D1⁻¹(z1 − y(t1, p)), …, DM⁻¹(zM − y(tM, p)) )ᵀ ∈ Rm,  m ≤ M · d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem:

    Σ_{i=1..M} ‖Di⁻¹(zi − y(ti, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem:
Given a mapping F: D ⊆ Rq → Rm with m > q, find a solution p* ∈ D such that

    ‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if

    F(p*) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1, …, pq)ᵀ ∈ Rq and the scalar function g(p): Rq → R.
The derivative of g is the gradient (a column vector):

    g′(p) = ∇g(p) = (∂g/∂p1, …, ∂g/∂pq)ᵀ

Consider the vector F(p) = (F1(p), …, Fm(p))ᵀ: Rq → Rm.
Its derivative is the Jacobian matrix:

    F′(p) = [ ∂F1/∂p1 ··· ∂F1/∂pq ; … ; ∂Fm/∂p1 ··· ∂Fm/∂pq ] ∈ R^(m×q)
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:

    g(p) = ‖F(p)‖₂² = min

sufficient condition for p* to be a local minimum:

    g′(p*) = 2 F′(p*)ᵀ F(p*) = 0 and g″(p*) positive definite

Find by Newton iteration a zero of the function G: Rq → Rq,

    G(p) = F′(p)ᵀ F(p)

(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor series expansion around a starting guess p0:

    G(p) = G(p0) + G′(p0)(p − p0) + o(‖p − p0‖) for p → p0
         ≈ G(p0) + G′(p0)(p − p0) =: Ḡ(p)

Zero p1 of the linear substitute map Ḡ:

    G′(p0)∆p0 = −G(p0),  p1 = p0 + ∆p0

Newton iteration:

    G′(pk)∆pk = −G(pk),  pk+1 = pk + ∆pk

system of nonlinear equations ⇒ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
    G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

iteration in terms of F(p):

    (F′(pk)ᵀF′(pk) + F″(pk)ᵀF(pk)) ∆pk = −F′(pk)ᵀF(pk)

I compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
I almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

    F′(pk)ᵀF′(pk)∆pk = −F′(pk)ᵀF(pk),  pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
    F′(pk)ᵀF′(pk)∆pk = −F′(pk)ᵀF(pk),  pk+1 = pk + ∆pk

corresponds to the normal equations for the linear least-squares problem

    ‖F′(pk)∆pk + F(pk)‖₂ = min

formal solution:

    ∆pk = −F′(pk)⁺F(pk)

in case of convergence: F′(p*)⁺F(p*) = 0

if rank(F′(p)) = q ⇒ F′(p)⁺ = (F′(p)ᵀF′(p))⁻¹ F′(p)ᵀ
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods: require a sufficiently good starting guess p0
I global GN methods: can handle bad guesses as well
I m = q: guaranteed local convergence
I m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p0:

I F: D → Rm continuously differentiable, D ⊆ Rq open and convex
I F′(p): possibly rank-deficient Jacobian
I F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

    ‖F′(p̄)⁺(F′(p) − F′(p̄))(p − p̄)‖₂ ≤ ω‖p − p̄‖₂²  (i)

  for all collinear p, p̄ ∈ D (i.e. they lie on a single straight line) and p − p̄ ∈ R(F′(p̄)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehow fit each other):

    ‖F′(p)⁺(I − F′(p̄)F′(p̄)⁺)F(p̄)‖ ≤ κ(p̄)‖p − p̄‖  ∀p, p̄ ∈ D,

    κ(p̄) ≤ κ < 1  ∀p̄ ∈ D  (ii)

I the initial update ∆p0 must be small enough and a ball Bρ(p0) with radius ρ around p0 is contained in D:

    ‖∆p0‖ ≤ α and Bρ(p0) ⊂ D  (iii)

    h := αω < 2(1 − κ),  ρ := α / (1 − κ − h/2)  (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {pk} of Gauss-Newton iterates is well-defined, remains in Bρ(p0), and converges to some p* ∈ Bρ(p0) with

    F′(p*)⁺F(p*) = 0

The convergence rate can be estimated according to

    ‖pk+1 − pk‖₂ ≤ (ω/2)‖pk − pk−1‖₂² + κ(pk−1)‖pk − pk−1‖₂  (*)

This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness ⇒ requires a full-rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems: F(p) = Ap − z

    F′(p) = (Ap − z)′ = A = F′(p̄)  ⇒ (i): ω = 0

    F′(p)⁺(I − F′(p̄)F′(p̄)⁺) = A⁺(I − AA⁺) = 0  ⇒ (ii): κ(p) = κ = 0

    (*) ⇒ ‖p2 − p1‖₂ ≤ 0 ⇒ ∆p1 = 0 ⇒ p1 = p*

  convergence within one iteration
I systems of nonlinear equations (m = q):
  if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹ ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ (*): κ(p) = 0
  quadratic convergence
I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):

    lim_{k→∞} ‖pk+1 − pk‖ / ‖pk − pk−1‖  ≤(*)  lim_{k→∞} ( (ω/2)‖pk − pk−1‖ + κ(pk−1) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

    κ(p*) < 1

Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define

    ∆p̄k+1 := −F′(pk)⁺F(pk+1)  (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

    ‖∆p̄k+1‖ ≈ ‖∆pk+1‖ ≤ XTOL

For incompatible problems in the convergent phase it holds

    ‖∆p̄k+1‖ ≈ κ‖∆pk‖,  ‖∆pk+1‖ ≈ κ²‖∆pk‖

This provides an estimate for κ(p*):

    κ(p*) ≈ ‖∆p̄k+1‖ / ‖∆pk‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of ordinary and simplified GN corrections vs. iteration k, semilog scale 10⁻¹⁵…10⁰]
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Bayes theorem
Joint probability = Conditional Probability x MarginalProbability
P(p z) = P(p|z)P(z) = P(z |p)P(p)
P(p|z) =P(z |p)P(p)
P(z)
P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =
intP(z |p)P(p)dp evidence
maximum a posteriori estimate (MAP)
pMAP = argmaxpP(p|z)
Parameter Identification Susanna Roblitz 12
Evaluation of posterior
P(p|z) prop P(z |p)P(p)
I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence
I consider full high dimensional posterior PDF for statisticalinference
I necessity of a prior
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t p) = B(t)p B(t) isin Rdtimesq
Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd
Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T
Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat
z minus Ap2 rarr min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law dependence ofrate constant K ontemperature T
K (T ) = C middot exp
(minus E
RT
)I gas constant
R = 83145 JmolmiddotK
I unknown parameterspre-exponential factor Cactivation energy E
I measurements(Ti Ki) i = 1 m = 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
TK
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I
z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2
2
The Eucledian norm middot 2 is invariant under orthogonaltransformation
z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QᵀAΠ = ( R  S ),   A ∈ ℝ^{m×q},
       ( 0  0 )
rank(A) = n, R ∈ ℝ^{n×n} upper triangular, S ∈ ℝ^{n×(q−n)}
Πᵀp = (p₁, p₂)ᵀ with p₁ ∈ ℝⁿ, p₂ ∈ ℝ^{q−n}
Qᵀz = (z₁, z₂)ᵀ with z₁ ∈ ℝⁿ, z₂ ∈ ℝ^{m−n}
⇒ ‖z − Ap‖₂² = ‖Qᵀz − QᵀAp‖₂² = ‖z₁ − Rp₁ − Sp₂‖₂² + ‖z₂‖₂²
‖z − Ap‖₂² = min ⟺ z₁ − Rp₁ − Sp₂ = 0 ⟺ p₁ = R⁻¹(z₁ − Sp₂)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = R⁻¹S,   u = R⁻¹z₁
‖p‖² = ‖Πᵀp‖² = ‖p₁‖² + ‖p₂‖² = ‖u − Vp₂‖² + ‖p₂‖²
     = ‖u‖² − 2⟨u, Vp₂⟩ + ⟨Vp₂, Vp₂⟩ + ⟨p₂, p₂⟩
     = ‖u‖² + ⟨p₂, (I + VᵀV)p₂ − 2Vᵀu⟩ =: φ(p₂)
φ(p₂) = min if φ′(p₂) = 2(I + VᵀV)p₂ − 2Vᵀu = 0
⟺ (I + VᵀV)p₂ = Vᵀu (solution by Cholesky decomposition)
φ″(p₂) = 2(I + VᵀV) spd ⇒ p₂ is a minimum
Note: if n < q/2, a faster algorithm exists.
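The whole construction can be sketched in a few lines with SciPy's pivoted QR, reusing the rank decision from slide 23; for verification, the result is compared with `numpy.linalg.pinv` (the function name and δ are our choices):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def shortest_solution(A, z, delta=1e-10):
    """Minimum-norm least-squares solution via QR with column pivoting:
    p1 = R^{-1}(z1 - S p2), where (I + V^T V) p2 = V^T u."""
    m, q = A.shape
    Q, Rfull, piv = qr(A, pivoting=True)
    diag = np.abs(np.diag(Rfull))
    n = int(np.sum(diag >= delta * diag[0]))     # numerical rank decision
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = solve_triangular(R, S)                   # V = R^{-1} S
    u = solve_triangular(R, z1)                  # u = R^{-1} z1
    p2 = np.linalg.solve(np.eye(q - n) + V.T @ V, V.T @ u)
    p1 = u - V @ p2                              # = R^{-1}(z1 - S p2)
    p = np.empty(q)
    p[piv] = np.concatenate([p1, p2])            # undo the column permutation
    return p

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
A = np.hstack([A, A[:, :1] + A[:, 1:2]])         # rank 3, q = 4
z = rng.standard_normal(8)
assert np.allclose(shortest_solution(A, z), np.linalg.pinv(A) @ z)
```

The minimum-norm least-squares solution is unique, so the QR-based construction must agree with the SVD-based pseudoinverse up to rounding.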
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data: (tᵢ, zᵢ), tᵢ ∈ ℝ, zᵢ ∈ ℝ^d, i = 1, …, M
Parameters: p = (p₁, …, p_q)ᵀ
Model:
y(t, p): ℝ × ℝ^q ↦ ℝ^d, nonlinear w.r.t. p
Measurement tolerances:
δzᵢ ∈ ℝ^d, Dᵢ = diag(δzᵢ)
Define
F(p) = (D₁⁻¹(z₁ − y(t₁, p)), …, D_M⁻¹(z_M − y(t_M, p)))ᵀ ∈ ℝ^m,   m ≤ M·d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem:
Σ_{i=1}^{M} ‖Dᵢ⁻¹(zᵢ − y(tᵢ, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min
Nonlinear least-squares problem:
Given a mapping F: D ⊆ ℝ^q → ℝ^m with m > q, find a solution p* ∈ D such that
‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂²
The nonlinear least-squares problem is called compatible if
F(p*) = 0
Parameter Identification Susanna Roblitz 32
Supplement: Multivariable calculus
Consider p = (p₁, …, p_q)ᵀ ∈ ℝ^q and the scalar function g(p): ℝ^q → ℝ.
The derivative of g is the gradient (a column vector)
g′(p) = ∇g(p) = (∂g/∂p₁, ⋯, ∂g/∂p_q)ᵀ
Consider the vector F(p) = (F₁(p), …, F_m(p))ᵀ: ℝ^q → ℝ^m.
Its derivative is the Jacobian matrix
F′(p) = ( ∂F₁/∂p₁  ⋯  ∂F₁/∂p_q )
        (    ⋮             ⋮   ) ∈ ℝ^{m×q}
        ( ∂F_m/∂p₁ ⋯  ∂F_m/∂p_q )
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:
g(p) = ‖F(p)‖₂² = min
Sufficient condition for p* to be a local minimum:
g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite
Find by Newton iteration a zero of the function G: ℝ^q → ℝ^q,
G(p) = F′(p)ᵀF(p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor series expansion around a starting guess p⁰:
G(p) = G(p⁰) + G′(p⁰)(p − p⁰) + o(‖p − p⁰‖) for p → p⁰
     ≈ G(p⁰) + G′(p⁰)(p − p⁰) =: Ḡ(p)
Zero p¹ of the linear substitute map Ḡ:
G′(p⁰)Δp⁰ = −G(p⁰),   p¹ = p⁰ + Δp⁰
Newton iteration:
G′(p^k)Δp^k = −G(p^k),   p^{k+1} = p^k + Δp^k
system of nonlinear equations ⇒ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)
Iteration in terms of F(p):
(F′(p^k)ᵀF′(p^k) + F″(p^k)ᵀF(p^k)) Δp^k = −F′(p^k)ᵀF(p^k)
- compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
- almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)
⇒ Gauss-Newton iteration:
F′(p^k)ᵀF′(p^k) Δp^k = −F′(p^k)ᵀF(p^k),   p^{k+1} = p^k + Δp^k
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(p^k)ᵀF′(p^k) Δp^k = −F′(p^k)ᵀF(p^k),   p^{k+1} = p^k + Δp^k
corresponds to the normal equations for the linear least-squares problem
‖F′(p^k)Δp^k + F(p^k)‖₂ = min
formal solution:
Δp^k = −F′(p^k)⁺F(p^k)
in case of convergence: F′(p*)⁺F(p*) = 0
if rank(F′(p)) = q ⇒ F′(p)⁺ = (F′(p)ᵀF′(p))⁻¹F′(p)ᵀ
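A minimal Gauss-Newton iteration, sketched for a hypothetical exponential model y(t, p) = p₁·exp(p₂·t) with compatible (exactly reproducible) data; the step Δp = −F′(p)⁺F(p) is obtained from `numpy.linalg.lstsq`:

```python
import numpy as np

# Hypothetical model y(t, p) = p1 * exp(p2 * t); the data are generated
# from known parameters, so the problem is compatible (F(p*) = 0).
t = np.linspace(0.0, 2.0, 10)
p_true = np.array([2.0, -1.3])
z = p_true[0] * np.exp(p_true[1] * t)

def F(p):                         # residual vector, length m = 10
    return p[0] * np.exp(p[1] * t) - z

def J(p):                         # Jacobian F'(p) of shape (m, q)
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([1.5, -1.0])         # starting guess
for k in range(50):
    # lstsq returns the least-squares solution, i.e. dp = -F'(p)^+ F(p)
    dp = np.linalg.lstsq(J(p), -F(p), rcond=None)[0]
    p = p + dp
    if np.linalg.norm(dp) < 1e-12:
        break

print(p)    # converges to [2.0, -1.3]
```

Since the problem is compatible, the iteration converges locally quadratically, in line with the classification on the following slides.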
Parameter Identification Susanna Roblitz 37
Classification
- local GN methods: require a sufficiently good starting guess p⁰
- global GN methods: can handle bad guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p⁰:
- F: D → ℝ^m continuously differentiable, D ⊆ ℝ^q open and convex
- F′(p): possibly rank-deficient Jacobian
- F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:
‖F′(p̂)⁺(F′(p) − F′(p̄))(p − p̄)‖₂ ≤ ω‖p − p̄‖₂²   (i)
for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) with p − p̄ ∈ R(F′(p̂)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
- compatibility condition (model and data must somehow fit each other):
‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖   ∀p, p̄ ∈ D,
κ(p) ≤ κ < 1   ∀p ∈ D   (ii)
- the initial update Δp⁰ must be small enough, and a ball B_ρ(p⁰) with radius ρ around p⁰ must be contained in D:
‖Δp⁰‖ ≤ α and B_ρ(p⁰) ⊂ D   (iii)
h := αω < 2(1 − κ),   ρ := α / (1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p⁰), and converges to some p* ∈ B_ρ(p⁰) with
F′(p*)⁺F(p*) = 0
The convergence rate can be estimated according to
‖p^{k+1} − p^k‖₂ ≤ (ω/2)‖p^k − p^{k−1}‖₂² + κ(p^{k−1})‖p^k − p^{k−1}‖₂   (∗)
This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).
Uniqueness ⇒ requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
- linear least-squares problems: F(p) = Ap − z
  F′(p) = (Ap − z)′ = A = (Ap̄ − z)′ = F′(p̄)
  (i) ⇒ ω = 0
  F′(p̄)⁺(I − F′(p)F′(p)⁺) = F′(p)⁺(I − F′(p)F′(p)⁺) = 0
  (ii) ⇒ κ(p) = κ = 0
  (∗) ⇒ ‖p² − p¹‖ ≤ 0 ⇒ Δp¹ = 0 ⇒ p¹ = p*:
  convergence within one iteration
- systems of nonlinear equations (m = q):
  if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹
  ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) = 0:
  quadratic convergence
- compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):
lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤ lim_{k→∞} ((ω/2)‖p^k − p^{k−1}‖ + κ(p^{k−1})) = κ(p*)   by (∗)
κ(p*) is the asymptotic convergence rate.
Adequate nonlinear least-squares problems are characterized by the condition
κ(p*) < 1
Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / Termination criteria
Define
Δp̄^{k+1} = −F′(p^k)⁺F(p^{k+1})   (simplified Gauss-Newton corrections)
Stop the iteration as soon as linear convergence dominates the iteration process:
‖Δp^{k+1}‖ ≈ ‖Δp̄^{k+1}‖ ≤ XTOL
For incompatible problems in the convergent phase it holds
‖Δp̄^{k+1}‖ ≈ κ‖Δp^k‖,   ‖Δp^{k+1}‖ ≈ κ²‖Δp^k‖
This provides an estimate for κ(p*):
κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of the ordinary and the simplified Gauss-Newton corrections versus iteration index k, on a logarithmic scale from 10⁻¹⁵ to 10⁰]
Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p⁰) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:
Let p* be the final iterate and rank(F′(p*)) = n < q.
QR factorization with column pivoting and rank decision:
QᵀF′(p*)Π = R
⇒ reordering of the parameters to (p*_{π₁}, p*_{π₂}, …, p*_{πₙ}, …, p*_{π_q}) in such a way that
|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |rₙₙ|
→ the parameters (p*_{π₁}, p*_{π₂}, …, p*_{πₙ}) can actually be estimated from the current dataset.
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function
p = φ(u),   F(p) = F(φ(u)) = F̄(u),   F̄′(u) = F′(p)·φ′(u)
constraint        transformation φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u²))
p ≥ C             p = C − (1 − √(1 + u²))
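The transformations in the table can be sketched directly; the function names and the inverse mapping for p > 0 are our additions, not from the slides:

```python
import numpy as np

def to_unconstrained_pos(p):             # inverse of p = exp(u), for p > 0
    return np.log(p)

def to_constrained_pos(u):               # p = exp(u) > 0 for every u
    return np.exp(u)

def to_constrained_box(u, A, B):         # A <= p <= B for every u
    return A + 0.5 * (B - A) * (1.0 + np.sin(u))

# Round trip through the positivity transformation:
p = 3.7
assert np.isclose(to_constrained_pos(to_unconstrained_pos(p)), p)

# The box transformation maps every unconstrained u into [A, B]:
A, B = 0.5, 2.0
for u in np.linspace(-10.0, 10.0, 101):
    q = to_constrained_box(u, A, B)
    assert A <= q <= B
print("all transformed values lie in [A, B]")
```

The point of the construction: the optimizer works on the unconstrained u, while the model always sees a feasible p = φ(u).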
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:
‖F(p^k) + F′(p^k)Δp^k‖₂² + ρ‖Δp^k‖₂² = min
(F′(p^k)ᵀF′(p^k) + ρI_q)Δp^k = −F′(p^k)ᵀF(p^k)
- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F′(p^k)ᵀF′(p^k) + ρI_q) is always regular → masking of possible non-uniqueness of a given solution, incorrect interpretation of numerical results; solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
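A single Levenberg-Marquardt correction, sketched with NumPy on a random toy Jacobian; the ρ → 0 limit is checked against the Gauss-Newton step, and the rank-deficient case shows why the regularized system stays solvable:

```python
import numpy as np

def lm_step(J, Fval, rho):
    """One Levenberg-Marquardt correction:
    (J^T J + rho * I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fval)

rng = np.random.default_rng(3)
J = rng.standard_normal((10, 3))          # toy full-rank Jacobian
Fval = rng.standard_normal(10)

# For rho -> 0 the LM step approaches the Gauss-Newton step -J^+ F:
gn = np.linalg.lstsq(J, -Fval, rcond=None)[0]
assert np.allclose(lm_step(J, Fval, 1e-12), gn, atol=1e-6)

# For rho > 0 the regularized matrix is regular even if J is
# rank-deficient -- which is exactly what masks non-uniqueness:
J[:, 2] = J[:, 0]                          # now rank(J) = 2
dp = lm_step(J, Fval, 1e-3)                # still uniquely solvable
assert np.all(np.isfinite(dp))
```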
Parameter Identification Susanna Roblitz 49
Scaling
- row (measurement) scaling:
  F̄(p) = D_row⁻¹F(p),   D_row = diag(δz₁, …, δz_M)
  ‖F′(p^k)Δp^k + F(p^k)‖ = min  vs.  ‖D_row⁻¹(F′(p^k)Δp^k + F(p^k))‖ = min
  Different scaling leads to a different solution!
- column (parameter) scaling:
  p = D_col p̄,   H(p̄) = F(D_col p̄) = F(p),   H′(p̄) = F′(p)D_col
  assume p^k = D_col p̄^k:
  Δp̄^k = −H′(p̄^k)⁺H(p̄^k) = −(F′(p^k)D_col)⁺F(p^k)
       = −D_col⁻¹F′(p^k)⁺F(p^k) = D_col⁻¹Δp^k   (if F′(p^k) has full rank)
  No invariance in the rank-deficient case!
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:
F′(p^k)Δp^k = −F(p^k),   p^{k+1} = p^k + λ_kΔp^k,   0 < λ_k ≤ 1
How to choose λ_k?
Remember: we want to solve F′(p)⁺F(p) = 0.
Idea: choose λ_k in such a way that ‖F′(p^k)⁺F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).
Damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test
‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1
divergence criterion: λ_k < λ_min ≪ 1
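A sketch of the damped iteration, with a simple λ-halving loop standing in for the predictor-corrector strategy of the actual codes — so this illustrates the natural monotonicity test, it is not the NLSQ-ERR algorithm:

```python
import numpy as np

def damped_gauss_newton(F, J, p, lam_min=1e-4, tol=1e-10, max_iter=50):
    """Damped GN sketch: accept p + lam*dp only if the natural
    monotonicity test ||dp_bar(lam)|| / ||dp|| < 1 holds, halving
    lam otherwise (the real codes predict lam instead of halving)."""
    for _ in range(max_iter):
        Jp = J(p)
        dp = np.linalg.lstsq(Jp, -F(p), rcond=None)[0]  # ordinary correction
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            trial = p + lam * dp
            # simplified correction at the trial point (same Jacobian)
            dp_bar = np.linalg.lstsq(Jp, -F(trial), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                p = trial
                break
            lam *= 0.5
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
    return p

# Hypothetical exponential model with compatible data:
t = np.linspace(0.0, 2.0, 10)
z = 2.0 * np.exp(-1.3 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
J = lambda p: np.column_stack([np.exp(p[1] * t),
                               p[0] * t * np.exp(p[1] * t)])
p = damped_gauss_newton(F, J, np.array([1.0, -0.3]))
print(p)    # converges to [2.0, -1.3]
```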
Parameter Identification Susanna Roblitz 51
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters
model: y′ = f(y, p), y(0) = y₀, with y ∈ ℝ^d, p ∈ ℝ^q
set of experimental data:
(t₁, z₁, δz₁), …, (t_M, z_M, δz_M),   t_j ∈ ℝ, z_j ∈ ℝ^d
with measurement tolerances δz_j ∈ ℝ^d, Dᵢ = diag(δzᵢ)
F(p) = (D₁⁻¹(y(t₁, p) − z₁), …, D_M⁻¹(y(t_M, p) − z_M))ᵀ
F′(p) = (D₁⁻¹y_p(t₁, p), …, D_M⁻¹y_p(t_M, p))ᵀ
Gauss-Newton method:
Δp^k = −F′(p^k)⁺F(p^k),   p^{k+1} = p^k + λ_kΔp^k
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t₁, …, t_M
Problem: time points chosen by adaptive step-size control generally do not agree with t₁, …, t_M.
Solution: use an integrator with dense output.
Idea: construction of an interpolation polynomial based on the available solution such that the accuracy is not destroyed.
Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
- sensitivity matrix at time t:
  S(t) = ∂y/∂p (t) = y_p(t) ∈ ℝ^{d×q}
  F′(p) = (D₁⁻¹S(t₁), …, D_M⁻¹S(t_M))ᵀ
- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
  S̄ᵢⱼ(t) = ∂yᵢ/∂pⱼ (t) · p_scal / y_scal
  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation
y_p′ = f_y(y, p) · y_p + f_p(y, p),   y_p(0) = 0
Compact notation:
S′ = f_y · S + f_p,   S = [s₁, …, s_q], s_k ∈ ℝ^d
In practice, solve the combined system
(y′, s₁′, …, s_q′)ᵀ = (f, f_y·s₁ + f_{p₁}, …, f_y·s_q + f_{p_q})ᵀ ∈ ℝ^{(q+1)d},
which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
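For the introductory reaction A + B ⇌ C + D, the combined system can be sketched with SciPy instead of LIMEX; the solver choice and tolerances below are our assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

# y = (A, B, C, D), p = (k1, k2); y' = f(y, p) as on slide 4.
def f(y, p):
    A, B, C, D = y
    r = -p[0] * A * B + p[1] * C * D
    return np.array([r, r, -r, -r])

def f_y(y, p):                    # Jacobian of f with respect to y
    A, B, C, D = y
    row = np.array([-p[0] * B, -p[0] * A, p[1] * D, p[1] * C])
    return np.array([row, row, -row, -row])

def f_p(y, p):                    # Jacobian of f with respect to p
    A, B, C, D = y
    col = np.array([-A * B, C * D])
    return np.array([col, col, -col, -col])

def rhs(t, x, p, d, q):
    """Combined system: y' = f, S' = f_y S + f_p, stacked into one vector."""
    y, S = x[:d], x[d:].reshape(d, q)
    return np.concatenate([f(y, p), (f_y(y, p) @ S + f_p(y, p)).ravel()])

d, q = 4, 2
p = np.array([0.2, 0.5])
y0 = np.array([2.0, 1.0, 0.5, 0.0])
x0 = np.concatenate([y0, np.zeros(d * q)])       # initial condition y_p(0) = 0
sol = solve_ivp(rhs, (0.0, 30.0), x0, args=(p, d, q), rtol=1e-9, atol=1e-11)
S_end = sol.y[d:, -1].reshape(d, q)              # S(30) = dy(30)/dp
```

The computed sensitivities can be cross-checked against central finite differences of the plain ODE solution.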
Parameter Identification Susanna Roblitz 57
Example
Case (T+E):
k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   4.81e-02   3.55e-01            2
1   1.11e-01   4.57e-01   1.000    2
    2.55e-02   1.75e-01   0.389
2   1.04e-02   4.22e-02   1.000    2
3   9.16e-04   1.90e-03   1.000    2
4   8.00e-06   1.99e-05   1.000    2
5   2.32e-07   1.54e-09   1.000    2
p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994
κ = 0.008,   sc(F′(p*)) = 6.3
Case (E):
k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   3.85e-02   2.96e-02            1
1   2.40e-03   1.85e-03   1.000    1
2   1.11e-05   7.67e-06   1.000    1
3   6.67e-06   6.75e-11   1.000    1
p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622
κ = 0.004,   sc(F′(p**)) = 4.5×10⁶
[Figure: fitted concentration profiles of A, B, C, D over time (0 to 30) for the two cases]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX, others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin
Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Evaluation of posterior
P(p|z) ∝ P(z|p) · P(p)
- evaluation of the posterior by Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) → convergence?
- consider the full high-dimensional posterior PDF for statistical inference
- necessity of a prior
Parameter Identification Susanna Roblitz 13
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
  4.1 Linear Least Squares Problems
  4.2 Nonlinear Least Squares Problems
  4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares
y(t, p) = B(t)·p,   B(t) ∈ ℝ^{d×q}
Measurement time points: t₁, …, t_M ∈ ℝ
Measurement values: z₁, …, z_M ∈ ℝ^d
Set A = [B(t₁); …; B(t_M)] ∈ ℝ^{m×q}, z = [z₁; …; z_M] ∈ ℝ^m (stacked row-wise, m = M·d).
Linear least-squares problem:
For given z ∈ ℝ^m and A ∈ ℝ^{m×q} with m ≥ q, find p ∈ ℝ^q such that
‖z − Ap‖₂ → min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
- Arrhenius law: dependence of the rate constant K on the temperature T:
  K(T) = C · exp(−E / (R·T))
- gas constant R = 8.3145 J/(mol·K)
- unknown parameters: pre-exponential factor C, activation energy E
- measurements: (Tᵢ, Kᵢ), i = 1, …, m = 21
[Figure: measured rate constants K (up to ~1.2×10⁻³) versus temperature T (720–820 K)]
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
Minimization problem:
log C − E/(R·Tᵢ) = log Kᵢ  ⇒  ‖log K − A·(log C, E)ᵀ‖₂ = min
Aᵢ₁ = 1,   Aᵢ₂ = −1/(R·Tᵢ),   i = 1, …, 21
[Figure: data and fit, K versus T (left) and log K versus T (right)]
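A sketch of the linearized fit with NumPy. The data below are synthetic, generated from assumed values of C and E — the measurements behind the slide are not reproduced here:

```python
import numpy as np

# Synthetic Arrhenius data: C_true and E_true are assumed for illustration.
R = 8.3145                                # gas constant, J/(mol K)
C_true, E_true = 5.0e4, 1.2e5
T = np.linspace(720.0, 820.0, 21)
rng = np.random.default_rng(4)
logK = np.log(C_true) - E_true / (R * T) + 1e-3 * rng.standard_normal(T.size)

# Design matrix: A_i1 = 1, A_i2 = -1/(R*T_i); unknowns x = (log C, E)
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
x = np.linalg.lstsq(A, logK, rcond=None)[0]
C_fit, E_fit = np.exp(x[0]), x[1]
print(C_fit, E_fit)    # close to C_true, E_true
```

Taking logarithms turns the nonlinear Arrhenius model into a linear least-squares problem in (log C, E), which is exactly the data compression exploited on this slide.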
Parameter Identification Susanna Roblitz 17
The method of normal equations
[Figure: geometric interpretation — the residual z − Ap is orthogonal to the range space R(A)]
‖z − Ap‖₂ = min ⟺ z − Ap ∈ R(A)^⊥
R(A) = {Ap : p ∈ ℝ^q} (range space of A)
‖z − Ap‖₂ = min_{p′∈ℝ^q} ‖z − Ap′‖₂ ⟺ AᵀAp = Aᵀz
uniquely solvable ⟺ AᵀA invertible ⟺ rank(A) = q
⇒ solution via Cholesky decomposition (AᵀA is spd)
if rank(A) = q:   κ₂(A)² = λ_max(AᵀA)/λ_min(AᵀA) = κ₂(AᵀA)   (without proof)
⇒ We need a more stable method than solving the normal equations!
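The squaring of the condition number is easy to demonstrate numerically; a sketch using a matrix with prescribed singular values:

```python
import numpy as np

# kappa_2(A^T A) = kappa_2(A)^2: forming the normal equations squares
# the condition number, which is why a QR-based solver is preferred.
rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((30, 4)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
s = np.array([1.0, 1e-1, 1e-2, 1e-4])               # prescribed singular values
A = U @ np.diag(s) @ V.T                            # kappa_2(A) = 1e4

kA = np.linalg.cond(A)
kAtA = np.linalg.cond(A.T @ A)
print(kA, kAtA)    # ~1e4 and ~1e8
assert np.isclose(kAtA, kA**2, rtol=1e-3)
```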
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q ∈ ℝ^{m×m} be an orthogonal matrix, i.e. QᵀQ = I. Then
‖z − Ap‖₂² = (z − Ap)ᵀQᵀQ(z − Ap) = ‖Q(z − Ap)‖₂²
The Euclidean norm ‖·‖₂ is invariant under orthogonal transformations:
‖z − Ap‖₂ = min ⟺ ‖Q(z − Ap)‖₂ = min
Choice of the orthogonal transformation Q ∈ ℝ^{m×m}?
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization:
A = Q ( R₁ ),   Q ∈ ℝ^{m×m} orthogonal, R₁ ∈ ℝ^{q×q} upper triangular
      ( 0  )
Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):
‖z − Ap‖₂² = ‖Qᵀ(z − Ap)‖₂² = ‖(z₁ − R₁p, z₂)ᵀ‖₂² = ‖z₁ − R₁p‖₂² + ‖z₂‖₂² ≥ ‖z₂‖₂²
rank(A) = q ⇒ rank(R₁) = rank(A) = q ⇒ R₁ is invertible
‖z − Ap‖₂ = min ⟺ p = R₁⁻¹z₁ ⇒ residual ‖r‖₂² = ‖z₂‖₂²
The QR factorization is stable since κ₂(A) = κ₂(R₁).
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A ∈ ℝ^{m×q}:
QᵀA = (H_q H_{q−1} ⋯ H₂H₁)A = R
For k = 1, …, q we obtain H_k = diag(I_{k−1}, H′_k) ∈ ℝ^{m×m}:
A^(k) = H_k A^(k−1), where H_k eliminates the subdiagonal entries of column k and leaves the first k−1 rows and columns unchanged; only the entries of the trailing (m−k+1)×(q−k+1) block are modified.
Parameter Identification Susanna Roblitz 21
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)
Column permutation with the aim
|r₁₁| ≥ |r₂₂| ≥ … ≥ |r_qq|
Step k:
A^(k−1) = ( R  S     )
          ( 0  T^(k) )
Denote column j of T^(k) by T_j^(k).
Pivot strategy: push the column of T^(k) with maximal 2-norm to the front, so that after the exchange
|α_k| = |r_kk| = ‖T₁^(k)‖₂ = max_j ‖T_j^(k)‖₂
Finally, AΠ = QR with a permutation matrix Π.
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:
QᵀAΠ = ( R  S )  (exact)   vs.   QᵀAΠ = ( R  S       )  (in finite precision),   T^(n+1) "small"
       ( 0  0 )                         ( 0  T^(n+1) )
Note: The exact rank n cannot be computed, since due to rounding errors ‖T^(n+1)‖ = |r_{n+1,n+1}| ≠ 0.
Idea: stop the iteration as soon as
|r_{k+1,k+1}| < δ|r_{11}|,   δ: relative precision of A
Definition (numerical rank n):
|r_{n+1,n+1}| < δ|r_{11}| ≤ |r_{nn}|
choice of δ ⇒ decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation

$d$ species, $q$ (unknown) parameters

model: $y' = f(y, p)$, $y(0) = y_0$, with $y \in \mathbb{R}^d$, $p \in \mathbb{R}^q$

set of experimental data $(t_1, z_1, \delta z_1), \dots, (t_M, z_M, \delta z_M)$, $t_j \in \mathbb{R}$, $z_j \in \mathbb{R}^d$,
with measurement tolerances $\delta z_j \in \mathbb{R}^d$, $D_i = \mathrm{diag}(\delta z_i)$

$F(p) = \begin{pmatrix} D_1^{-1}(y(t_1, p) - z_1) \\ \vdots \\ D_M^{-1}(y(t_M, p) - z_M) \end{pmatrix}, \qquad F'(p) = \begin{pmatrix} D_1^{-1}\,y_p(t_1, p) \\ \vdots \\ D_M^{-1}\,y_p(t_M, p) \end{pmatrix}$

Gauss-Newton method:

$\Delta p^k = -F'(p^k)^+F(p^k), \quad p^{k+1} = p^k + \lambda_k\Delta p^k$
Parameter Identification Susanna Roblitz 54
Computation of F(p)

requires the solution of $y' = f(y, p)$ at the time points $t_1, \dots, t_M$

Problem: time points chosen by adaptive step-size control generally do not agree with $t_1, \dots, t_M$.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis

I sensitivity matrix at time $t$:
$S(t) = \frac{\partial y}{\partial p}(t) = y_p(t) \in \mathbb{R}^{d\times q}, \qquad F'(p) = \begin{pmatrix} D_1^{-1}S(t_1) \\ \vdots \\ D_M^{-1}S(t_M) \end{pmatrix}$
I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
$\bar S_{ij}(t) = \frac{\partial y_i}{\partial p_j}(t) \cdot \frac{p_{\mathrm{scal}}}{y_{\mathrm{scal}}}$
$p_{\mathrm{scal}}, y_{\mathrm{scal}}$: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities

Differentiate $y' = f(y, p)$ with respect to $p$ to obtain the variational equation

$y_p' = f_y(y, p)\cdot y_p + f_p(y, p), \quad y_p(0) = 0$

Compact notation:

$S' = f_y\cdot S + f_p, \quad S = [s_1, \dots, s_q], \; s_k \in \mathbb{R}^d$

In practice, solve the combined system

$\begin{pmatrix} y' \\ s_1' \\ \vdots \\ s_q' \end{pmatrix} = \begin{pmatrix} f \\ f_y\cdot s_1 + f_{p_1} \\ \vdots \\ f_y\cdot s_q + f_{p_q} \end{pmatrix} \in \mathbb{R}^{(q+1)d}$

can be solved efficiently by LIMEX (inexact Jacobian, decoupling)
Parameter Identification Susanna Roblitz 57
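The combined state-sensitivity system can be sketched with a generic integrator in place of LIMEX. Below, a minimal example for the assumed scalar model $y' = -p\,y$ (d = 1, q = 1, not one of the slide's models), whose sensitivity equation is $s' = -p\,s - y$, solved with SciPy's `solve_ivp`:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system for y' = -p*y: state y and sensitivity s = dy/dp together.
def rhs(t, ys, p):
    y, s = ys
    return [-p * y,        # f(y, p)
            -p * s - y]    # f_y * s + f_p, since f_y = -p and f_p = -y

p = 0.7
sol = solve_ivp(rhs, (0.0, 2.0), [1.0, 0.0], args=(p,), rtol=1e-10, atol=1e-12)
y_end, s_end = sol.y[:, -1]

# Analytic check: y(t) = exp(-p t), s(t) = -t exp(-p t)
print(y_end, np.exp(-p * 2.0))
print(s_end, -2.0 * np.exp(-p * 2.0))
```

For $q$ parameters the same pattern stacks $q$ sensitivity blocks onto the state, as in the $(q+1)d$-dimensional system above.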
Example

Case (T+E), measurement data covering transient and equilibrium phase:

k   ‖F(p^k)‖    ‖Δp^k‖      λ_k     rank
0   4.81e-02    3.55e-01            2
1   1.11e-01    4.57e-01    1.000   2
    2.55e-02    1.75e-01    0.389
2   1.04e-02    4.22e-02    1.000   2
3   9.16e-04    1.90e-03    1.000   2
4   8.00e-06    1.99e-05    1.000   2
5   2.321e-07   1.54e-09    1.000   2

$p^* = (k_1^*, k_2^*) = (0.10000001,\; 0.19999997) \Rightarrow k_2^*/k_1^* = 1.9999994$
$\kappa = 0.008$, $\mathrm{sc}(F'(p^*)) = 63$

Case (E), measurement data covering the equilibrium phase only:

k   ‖F(p^k)‖    ‖Δp^k‖      λ_k     rank
0   3.85e-02    2.96e-02            1
1   2.40e-03    1.85e-03    1.000   1
2   1.11e-05    7.67e-06    1.000   1
3   6.67e-06    6.75e-11    1.000   1

$p^{**} = (k_1^{**}, k_2^{**}) = (0.24155702,\; 0.48312906) \Rightarrow k_2^{**}/k_1^{**} = 2.0000622$
$\kappa = 0.004$, $\mathrm{sc}(F'(p^{**})) = 4.5\times 10^6$

(Plots: fitted concentrations of A, B, C, D over time for both cases.)
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
I Linear Least Squares Problems
I Nonlinear Least Squares Problems
I Specification for ODE Models
Parameter Identification Susanna Roblitz 14
Linear least squares

$y(t, p) = B(t)p, \quad B(t) \in \mathbb{R}^{d\times q}$

Measurement time points $t_1, \dots, t_M \in \mathbb{R}$
Measurement values $z_1, \dots, z_M \in \mathbb{R}^d$

Set $A = [B(t_1), \dots, B(t_M)]^T$, $z = [z_1, \dots, z_M]^T$.

Linear least-squares problem:
For given $z \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m\times q}$ with $m \ge q$, find $p \in \mathbb{R}^q$ such that

$\|z - Ap\|_2 \rightarrow \min$
Parameter Identification Susanna Roblitz 15
Example: Reaction kinetics

I Arrhenius law: dependence of the rate constant $K$ on the temperature $T$:
$K(T) = C \cdot \exp\!\left(-\frac{E}{RT}\right)$
I gas constant $R = 8.3145\,\mathrm{J/(mol\cdot K)}$
I unknown parameters: pre-exponential factor $C$, activation energy $E$
I measurements $(T_i, K_i)$, $i = 1, \dots, m = 21$

(Plot: measured $K_i$ versus $T_i$ for $T \in [720, 820]$ K.)
Parameter Identification Susanna Roblitz 16
Example: Reaction kinetics

minimization problem:

$\log C - \frac{1}{RT_i}E = \log K_i \;\Rightarrow\; \left\|\log K - A\begin{pmatrix}\log C \\ E\end{pmatrix}\right\|_2 = \min$

$A_{i1} = 1, \quad A_{i2} = -\frac{1}{RT_i}, \quad i = 1, \dots, 21$

(Plots: data and fit in the original variables $(T, K)$ and in the linearized variables $(T, \log K)$.)
Parameter Identification Susanna Roblitz 17
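The linearized Arrhenius fit can be sketched with NumPy's least-squares solver; the values of $C$ and $E$ below are assumed for illustration and are not the measured data from the slide:

```python
import numpy as np

R = 8.3145                            # gas constant, J/(mol K)
C_true, E_true = 5.0e4, 1.2e5         # assumed illustrative parameter values
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))   # synthetic, noise-free measurements

# Linearized model: log K_i = log C - E / (R T_i), i.e. A @ (log C, E)^T
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
coef, *_ = np.linalg.lstsq(A, np.log(K), rcond=None)
logC, E = coef
print(np.exp(logC), E)                # recovers the assumed C and E
```

Note that the two columns of $A$ are nearly parallel over this narrow temperature range, which is exactly the kind of ill-conditioning the later slides address.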
The method of normal equations

(Figure: geometric interpretation — $Ap$ is the orthogonal projection of $z$ onto $\mathcal{R}(A)$, so $z - Ap \perp \mathcal{R}(A)$.)

$\|z - Ap\|_2 = \min \;\Leftrightarrow\; z - Ap \in \mathcal{R}(A)^\perp$

$\mathcal{R}(A) = \{Ap : p \in \mathbb{R}^q\}$ (range space of $A$)

$\|z - Ap\|_2 = \min_{p' \in \mathbb{R}^q}\|z - Ap'\|_2 \;\Leftrightarrow\; A^TAp = A^Tz$

uniquely solvable $\Leftrightarrow$ $A^TA$ invertible $\Leftrightarrow$ $\mathrm{rank}(A) = q$

$\Rightarrow$ solution via Cholesky decomposition ($A^TA$ is s.p.d.)

if $\mathrm{rank}(A) = q$: $\kappa_2(A)^2 \overset{\text{without proof}}{=} \frac{\lambda_{\max}(A^TA)}{\lambda_{\min}(A^TA)} = \kappa_2(A^TA)$

We need a more stable method for solving the normal equations!
Parameter Identification Susanna Roblitz 18
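The squaring of the condition number can be checked numerically; a small sketch with an assumed random test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 4))      # assumed well-conditioned test matrix

# kappa_2(A)^2 = kappa_2(A^T A): forming the normal equations squares
# the condition number, which motivates the orthogonalization methods below.
cA = np.linalg.cond(A)
cAtA = np.linalg.cond(A.T @ A)
print(cA**2, cAtA)                    # agree up to rounding
```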
Orthogonalization methods

Let $Q \in \mathbb{R}^{m\times m}$ be an orthogonal matrix, i.e. $Q^TQ = I$:

$\|z - Ap\|_2^2 = (z - Ap)^TQ^TQ(z - Ap) = \|Q(z - Ap)\|_2^2$

The Euclidean norm $\|\cdot\|_2$ is invariant under orthogonal transformation:

$\|z - Ap\|_2 = \min \;\Leftrightarrow\; \|Q(z - Ap)\|_2 = \min$

Choice of the orthogonal transformation $Q \in \mathbb{R}^{m\times m}$?
Parameter Identification Susanna Roblitz 19
QR factorization

(Figure: block picture of $A = Q\begin{pmatrix}R_1\\0\end{pmatrix}$ with $Q = [Q_1, Q_2]$.)

$A = Q\begin{pmatrix}R_1 \\ 0\end{pmatrix}$, $Q \in \mathbb{R}^{m\times m}$ orthogonal, $R_1 \in \mathbb{R}^{q\times q}$ upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)), with $Q^Tz = (\bar z_1, \bar z_2)^T$, $\bar z_1 \in \mathbb{R}^q$:

$\|z - Ap\|_2^2 = \|Q^T(z - Ap)\|_2^2 = \left\|\begin{pmatrix}\bar z_1 - R_1p \\ \bar z_2\end{pmatrix}\right\|_2^2 = \|\bar z_1 - R_1p\|_2^2 + \|\bar z_2\|_2^2 \;\ge\; \|\bar z_2\|_2^2$

$\mathrm{rank}(A) = q \Rightarrow \mathrm{rank}(R_1) = \mathrm{rank}(A) = q \Rightarrow R_1$ is invertible

$\|z - Ap\|_2 = \min \;\Leftrightarrow\; p = R_1^{-1}\bar z_1 \;\Rightarrow\; \|r\|_2 = \|\bar z_2\|_2$

The QR factorization is stable since $\kappa_2(A) = \kappa_2(R)$.
Parameter Identification Susanna Roblitz 20
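A minimal sketch of the QR-based solution for a full-rank problem, with assumed synthetic data and `numpy.linalg.qr` in place of the MATLAB call:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))       # assumed full-rank test matrix, m > q
p_true = np.array([1.0, -2.0, 0.5])
z = A @ p_true                        # compatible data: residual is zero

Q, R = np.linalg.qr(A)                # reduced QR: Q is 8x3, R = R1 is 3x3
z1 = Q.T @ z                          # the z1-part of Q^T z
p = np.linalg.solve(R, z1)            # back substitution R1 p = z1
print(p)
```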
Householder QR

Columnwise application of Householder matrices to $A \in \mathbb{R}^{m\times q}$:

$Q^TA = (H_qH_{q-1}\cdots H_2H_1)A = R$

For $k = 1, \dots, q$ we obtain $H_k = \mathrm{diag}(I_{k-1}, H_k') \in \mathbb{R}^{m\times m}$, and the step $A^{(k)} = H_kA^{(k-1)}$ annihilates the subdiagonal entries of column $k$.

Only the entries in rows $k, \dots, m$ and columns $k, \dots, q$ are modified.
Parameter Identification Susanna Roblitz 21
QR with column pivoting

Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim

$|r_{11}| \ge |r_{22}| \ge \dots \ge |r_{qq}|$

step $k$:

$A^{(k-1)} = \begin{pmatrix}R & S \\ 0 & T^{(k)}\end{pmatrix}$

Denote column $j$ of $T^{(k)}$ by $T_j^{(k)}$.

Pivot strategy: push the column of $T^{(k)}$ with maximal 2-norm to the front, so that after the exchange

$|\alpha_k| = |r_{kk}| = \|T_1^{(k)}\|_2 = \max_j \|T_j^{(k)}\|_2$

Finally $A\Pi = QR$ with permutation matrix $\Pi$.
Parameter Identification Susanna Roblitz 22
Rank decision

$\mathrm{rank}(A) = n < q$:

$Q^TA\Pi \overset{\text{exact}}{=} \begin{pmatrix}R & S \\ 0 & 0\end{pmatrix} \overset{\text{eps}}{=} \begin{pmatrix}R & S \\ 0 & T^{(n+1)}\end{pmatrix}, \quad T^{(n+1)}$ "small"

Note: The exact rank $n$ cannot be computed, since by rounding errors $\|T^{(n+1)}\| = |r_{n+1,n+1}| \ne 0$.

Idea: Stop the iteration as soon as

$|r_{k+1,k+1}| < \delta\,|r_{11}|, \quad \delta$: relative precision of $A$

Definition: numerical rank $n$:

$|r_{n+1,n+1}| < \delta\,|r_{11}| \le |r_{nn}|$

choice of $\delta \Longrightarrow$ decision for $n$
Parameter Identification Susanna Roblitz 23
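The rank decision can be sketched with SciPy's pivoted QR; the matrix and the threshold $\delta$ below are assumed for illustration:

```python
import numpy as np
from scipy.linalg import qr

# Assumed rank-deficient matrix: third column = first + 2 * second.
A = np.array([[1., 0., 1.],
              [0., 1., 2.],
              [1., 1., 3.],
              [2., 1., 4.]])
Q, R, piv = qr(A, pivoting=True)      # A[:, piv] = Q @ R
diag = np.abs(np.diag(R))             # |r_11| >= |r_22| >= ... by pivoting
delta = 1e-10                         # assumed relative precision of A
n = int(np.sum(diag >= delta * diag[0]))   # numerical rank decision
print(n)
```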
Subcondition number

Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition $\mathrm{sc}(A) = |r_{11}|/|r_{qq}|$

Properties:
I $\mathrm{sc}(A) \ge 1$
I $\mathrm{sc}(\alpha A) = \mathrm{sc}(A)$
I $A \ne 0$ singular $\Leftrightarrow \mathrm{sc}(A) = \infty$
I $\mathrm{sc}(A) \le \kappa_2(A)$ (hence the name)

$A$ is called almost singular if

$\delta\,\mathrm{sc}(A) \ge 1$, or equivalently, $\delta\,|r_{11}| \ge |r_{qq}|$
Parameter Identification Susanna Roblitz 24
Scaling

$D_{\mathrm{row}}, D_{\mathrm{col}}$: diagonal matrices

We consider the scaled least-squares problem

$\|D_{\mathrm{row}}^{-1}(AD_{\mathrm{col}}D_{\mathrm{col}}^{-1}p - z)\|_2 = \|\bar A\bar p - \bar z\|_2 = \min$

where $\bar A = D_{\mathrm{row}}^{-1}AD_{\mathrm{col}}$, $\bar p = D_{\mathrm{col}}^{-1}p$, $\bar z = D_{\mathrm{row}}^{-1}z$

I row (measurement) scaling: $D_{\mathrm{row}} = \mathrm{diag}(\delta z_1, \dots, \delta z_m)$, e.g. relative error concept $\delta z_j = \varepsilon|z_j|$
I column (parameter) scaling: $D_{\mathrm{col}} = \mathrm{diag}(p_{1,\mathrm{scal}}, \dots, p_{q,\mathrm{scal}})$ (to account for different physical units)

in general: $\mathrm{sc}(\bar A) \ne \mathrm{sc}(A)$ and $\mathrm{rank}(\bar A) \ne \mathrm{rank}(A)$
Parameter Identification Susanna Roblitz 25
Generalized Inverse

(1) $A \in \mathbb{R}^{q\times q}$, $\mathrm{rank}(A) = q$ ($A$ is regular)
$\Rightarrow Ap = z$ has the unique solution $p = A^{-1}z$, where $A^{-1}A = AA^{-1} = I_q$

(2) $A \in \mathbb{R}^{m\times q}$, $\mathrm{rank}(A) = n \le q < m$
$\Rightarrow$ usually $Ap = z$ has no solution, but there is a minimizer:
$\|Ap - z\|_2 = \min$

(2a) $n = q$: $\|Ap - z\|_2 = \min \Leftrightarrow A^TAp = A^Tz$ (normal equations),
uniquely solvable since $A^TA$ is nonsingular:
$p = \underbrace{(A^TA)^{-1}A^T}_{=:A^+}\,z$

(2b) $n < q$: write $p = A^+z$, $A^+$: generalized/pseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms

Let $A \in \mathbb{R}^{m\times q}$, $\mathrm{rank}(A) = n \le q \le m$. Then $A^+$ is uniquely defined by

(i) $(A^+A)^T = A^+A$
(ii) $(AA^+)^T = AA^+$
(iii) $A^+AA^+ = A^+$
(iv) $AA^+A = A$

One can show: $p^* = A^+z$ is the shortest solution, i.e.

$\|p^*\|_2 \le \|p\|_2 \quad \forall p \in L(z)$

with

$L(z) = \{p \in \mathbb{R}^q : \|z - Ap\|_2 = \min\} = p^* + \mathcal{N}(A)$
Parameter Identification Susanna Roblitz 27
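The four axioms can be checked numerically for the SVD-based pseudoinverse `numpy.linalg.pinv`; the rank-deficient matrix below is an assumed example:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])              # rank 1, so n < q: truly rank-deficient
Ap = np.linalg.pinv(A)               # Moore-Penrose pseudoinverse A^+

# The four Penrose axioms, checked up to rounding:
print(np.allclose((Ap @ A).T, Ap @ A))   # (i)
print(np.allclose((A @ Ap).T, A @ Ap))   # (ii)
print(np.allclose(Ap @ A @ Ap, Ap))      # (iii)
print(np.allclose(A @ Ap @ A, A))        # (iv)
```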
Computation of shortest solution

$Q^TA\Pi = \begin{pmatrix}R & S \\ 0 & 0\end{pmatrix}, \quad A \in \mathbb{R}^{m\times q}$

$\mathrm{rank}(A) = n$, $R \in \mathbb{R}^{n\times n}$ upper triangular, $S \in \mathbb{R}^{n\times(q-n)}$

$\Pi^Tp = \begin{pmatrix}p_1 \\ p_2\end{pmatrix}$, $p_1 \in \mathbb{R}^n$, $p_2 \in \mathbb{R}^{q-n}$; $\qquad Q^Tz = \begin{pmatrix}z_1 \\ z_2\end{pmatrix}$, $z_1 \in \mathbb{R}^n$, $z_2 \in \mathbb{R}^{m-n}$

$\Rightarrow \|z - Ap\|_2^2 = \|Q^Tz - Q^TAp\|_2^2 = \|z_1 - Rp_1 - Sp_2\|_2^2 + \|z_2\|_2^2$

$\|z - Ap\|_2^2 = \min \;\Leftrightarrow\; z_1 - Rp_1 - Sp_2 = 0 \;\Leftrightarrow\; p_1 = R^{-1}(z_1 - Sp_2)$
Parameter Identification Susanna Roblitz 28
Computation of shortest solution

$V = R^{-1}S$, $u = R^{-1}z_1$:

$\|p\|^2 = \|\Pi^Tp\|^2 = \|p_1\|^2 + \|p_2\|^2 = \|u - Vp_2\|^2 + \|p_2\|^2$
$= \|u\|^2 - 2\langle u, Vp_2\rangle + \langle Vp_2, Vp_2\rangle + \langle p_2, p_2\rangle$
$= \|u\|^2 + \langle p_2, (I + V^TV)p_2 - 2V^Tu\rangle =: \varphi(p_2)$

$\varphi(p_2) = \min$ if $\varphi'(p_2) = 2(I + V^TV)p_2 - 2V^Tu = 0$

$\Leftrightarrow (I + V^TV)p_2 = V^Tu$ (solution by Cholesky decomposition)

$\varphi''(p_2) = 2(I + V^TV)$ s.p.d. $\Rightarrow p_2$ is a minimum

Note: if $n < q/2$, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
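The elimination above can be sketched directly; the rank-2 matrix and data are assumed for illustration, and the result is compared against the minimum-norm solution returned by `numpy.linalg.lstsq`:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# Assumed rank-deficient example: third column = first + second, so rank = 2.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 1., 3.]])
z = np.array([1., 2., 3., 4.])

Q, Rfull, piv = qr(A, pivoting=True)  # A[:, piv] = Q @ Rfull
n = 2                                 # numerical rank (from the rank decision)
R, S = Rfull[:n, :n], Rfull[:n, n:]
z1 = (Q.T @ z)[:n]

V = solve_triangular(R, S)            # V = R^{-1} S
u = solve_triangular(R, z1)           # u = R^{-1} z1
p2 = np.linalg.solve(np.eye(A.shape[1] - n) + V.T @ V, V.T @ u)
p1 = solve_triangular(R, z1 - S @ p2)

p = np.empty(A.shape[1])
p[piv] = np.concatenate([p1, p2])     # undo the column permutation Pi

p_ref = np.linalg.lstsq(A, z, rcond=None)[0]  # SVD-based minimum-norm solution
print(np.allclose(p, p_ref))
```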
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation

Data: $(t_i, z_i)$, $t_i \in \mathbb{R}$, $z_i \in \mathbb{R}^d$, $i = 1, \dots, M$

Parameters: $p = (p_1, \dots, p_q)^T$

Model: $y(t, p): \mathbb{R}\times\mathbb{R}^q \mapsto \mathbb{R}^d$, nonlinear w.r.t. $p$

Measurement tolerances: $\delta z_i \in \mathbb{R}^d$, $D_i = \mathrm{diag}(\delta z_i)$

Define

$F(p) = \begin{pmatrix}D_1^{-1}(z_1 - y(t_1, p)) \\ \vdots \\ D_M^{-1}(z_M - y(t_M, p))\end{pmatrix} \in \mathbb{R}^m, \quad m \le M\cdot d$
Parameter Identification Susanna Roblitz 31
Problem formulation

Nonlinear least-squares problem:

$\sum_{i=1}^{M}\|D_i^{-1}(z_i - y(t_i, p))\|_2^2 = F(p)^TF(p) = \|F(p)\|_2^2 = \min$

Nonlinear least-squares problem:
Given a mapping $F: D \subseteq \mathbb{R}^q \rightarrow \mathbb{R}^m$ with $m > q$, find a solution $p^* \in D$ such that

$\|F(p^*)\|_2^2 = \min_{p\in D}\|F(p)\|_2^2$

The nonlinear least-squares problem is called compatible if $F(p^*) = 0$.
Parameter Identification Susanna Roblitz 32
Supplement: Multivariable calculus

Consider $p = (p_1, \dots, p_q)^T \in \mathbb{R}^q$ and the scalar function $g(p): \mathbb{R}^q \rightarrow \mathbb{R}$.
The derivative of $g$ is the gradient (a column vector)

$g'(p) = \nabla g(p) = \left(\frac{\partial g}{\partial p_1}, \cdots, \frac{\partial g}{\partial p_q}\right)^T$

Consider the vector $F(p) = (F_1(p), \dots, F_m(p))^T: \mathbb{R}^q \rightarrow \mathbb{R}^m$.
Its derivative is the Jacobian matrix

$F'(p) = \begin{pmatrix}\frac{\partial F_1}{\partial p_1}(p) & \cdots & \frac{\partial F_1}{\partial p_q}(p) \\ \vdots & & \vdots \\ \frac{\partial F_m}{\partial p_1}(p) & \cdots & \frac{\partial F_m}{\partial p_q}(p)\end{pmatrix} \in \mathbb{R}^{m\times q}$
Parameter Identification Susanna Roblitz 33
Problem formulation

Minimization problem:

$g(p) = \|F(p)\|_2^2 = \min$

sufficient condition for $p^*$ to be a local minimum:

$g'(p^*) = 2F'(p^*)^TF(p^*) = 0$ and $g''(p^*)$ positive definite

Find by Newton iteration a zero of the function $G: \mathbb{R}^q \rightarrow \mathbb{R}^q$,

$G(p) = F'(p)^TF(p)$

(system of $q$ nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration

Idea: approximation of $G$ by a "simpler" function, namely its Taylor series expansion around a starting guess $p^0$:

$G(p) = G(p^0) + G'(p^0)(p - p^0) + o(\|p - p^0\|)$ for $p \rightarrow p^0$
$\approx G(p^0) + G'(p^0)(p - p^0) =: \bar G(p)$

Zero $p^1$ of the linear substitute map $\bar G$:

$G'(p^0)\Delta p^0 = -G(p^0), \quad p^1 = p^0 + \Delta p^0$

Newton iteration:

$G'(p^k)\Delta p^k = -G(p^k), \quad p^{k+1} = p^k + \Delta p^k$

system of nonlinear equations $\Longrightarrow$ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration

$G'(p) = F'(p)^TF'(p) + F''(p)^TF(p)$

iteration in terms of $F(p)$:

$\left(F'(p^k)^TF'(p^k) + F''(p^k)^TF(p^k)\right)\Delta p^k = -F'(p^k)^TF(p^k)$

I compatible case: $F(p^*) = 0 \Rightarrow G'(p^*) = F'(p^*)^TF'(p^*)$
I almost compatible case: $\|F(p^*)\|$ is "small" $\Rightarrow$ save the effort of evaluating $F''(p)$

$\Rightarrow$ Gauss-Newton iteration:

$F'(p^k)^TF'(p^k)\Delta p^k = -F'(p^k)^TF(p^k), \quad p^{k+1} = p^k + \Delta p^k$
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration

$F'(p^k)^TF'(p^k)\Delta p^k = -F'(p^k)^TF(p^k), \quad p^{k+1} = p^k + \Delta p^k$

corresponds to the normal equations for the linear least-squares problem

$\|F'(p^k)\Delta p^k + F(p^k)\|_2 = \min$

formal solution:

$\Delta p^k = -F'(p^k)^+F(p^k)$

in case of convergence: $F'(p^*)^+F(p^*) = 0$

if $\mathrm{rank}(F'(p)) = q$: $F'(p)^+ = \left(F'(p)^TF'(p)\right)^{-1}F'(p)^T$
Parameter Identification Susanna Roblitz 37
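A minimal undamped Gauss-Newton loop can be sketched for a compatible problem; the exponential model, data, and starting guess below are assumed for illustration:

```python
import numpy as np

# Assumed model y(t, p) = p1 * exp(p2 * t); compatible synthetic data,
# so local Gauss-Newton converges quadratically.
t = np.linspace(0.0, 1.0, 6)
p_true = np.array([2.0, -1.0])
z = p_true[0] * np.exp(p_true[1] * t)

def F(p):                            # residual vector F(p), length m = 6
    return p[0] * np.exp(p[1] * t) - z

def J(p):                            # Jacobian F'(p), an m x 2 matrix
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([1.5, -0.5])            # assumed starting guess p^0
for k in range(20):
    dp = -np.linalg.lstsq(J(p), F(p), rcond=None)[0]  # dp = -F'(p)^+ F(p)
    p = p + dp
    if np.linalg.norm(dp) < 1e-12:
        break
print(p)                             # close to p_true
```

`lstsq` solves the linearized problem $\|F'(p^k)\Delta p^k + F(p^k)\|_2 = \min$ without explicitly forming the normal equations.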
Classification
I local GN methods: require a sufficiently good starting guess $p^0$
I global GN methods: can handle bad guesses as well
I $m = q$: guaranteed local convergence
I $m > q$: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions

We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function $F(p)$ and the starting guess $p^0$:

I $F: D \rightarrow \mathbb{R}^m$ continuously differentiable, $D \subseteq \mathbb{R}^q$ open and convex
I $F'(p)$: possibly rank-deficient Jacobian
I $F'$ satisfies a particular Lipschitz condition (i.e. $F'$ behaves "nicely") with Lipschitz constant $0 \le \omega < \infty$:

$\|F'(\hat p)^+(F'(p) - F'(\bar p))(p - \bar p)\|_2 \le \omega\,\|p - \bar p\|_2^2 \qquad (i)$

for all collinear $p, \bar p, \hat p \in D$ (i.e. they lie on a single straight line) and $p - \bar p \in \mathcal{R}(F'(\hat p)^+)$
Parameter Identification Susanna Roblitz 39
Assumptions

I compatibility condition (model and data must somehow fit each other):

$\|F'(\bar p)^+(I - F'(p)F'(p)^+)F(p)\| \le \kappa(p)\,\|\bar p - p\| \quad \forall\, p, \bar p \in D$
$\kappa(p) \le \kappa < 1 \quad \forall p \in D \qquad (ii)$

I the initial update $\Delta p^0$ must be small enough, and a ball $B_\rho(p^0)$ with radius $\rho$ around $p^0$ must be contained in $D$:

$\|\Delta p^0\| \le \alpha$ and $B_\rho(p^0) \subset D \qquad (iii)$
$h = \alpha\omega < 2(1-\kappa), \quad \rho = \alpha/(1 - \kappa - h/2) \qquad (iv)$
Parameter Identification Susanna Roblitz 40
Convergence result

Under the above assumptions, the sequence $\{p^k\}$ of Gauss-Newton iterates is well-defined, remains in $B_\rho(p^0)$, and converges to some $p^* \in B_\rho(p^0)$ with

$F'(p^*)^+F(p^*) = 0$

The convergence rate can be estimated according to

$\|p^{k+1} - p^k\|_2 \le \tfrac{1}{2}\,\omega\,\|p^k - p^{k-1}\|_2^2 + \kappa(p^{k-1})\,\|p^k - p^{k-1}\|_2 \qquad (*)$

This result shows existence of a solution $p^*$ in the sense that $F'(p^*)^+F(p^*) = 0$, even in the case of a rank-deficient Jacobian (the rank may vary throughout $D$, as long as $\omega < \infty$ and $\kappa < 1$).

Uniqueness $\Longrightarrow$ requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility

I linear least-squares problems: $F(p) = Ap - z$

$F'(p) = (Ap - z)' = A = (A\bar p - z)' = F'(\bar p) \overset{(i)}{\Longrightarrow} \omega = 0$

$F'(\bar p)^+(I - F'(p)F'(p)^+) = A^+(I - AA^+) = 0 \overset{(ii)}{\Longrightarrow} \kappa(p) = \kappa = 0$

$\overset{(*)}{\Rightarrow} \|p^2 - p^1\| \le 0 \Rightarrow \Delta p^1 = 0 \Rightarrow p^1 = p^*$:
convergence within one iteration!

I systems of nonlinear equations ($m = q$):
if $F'(p)$ nonsingular $\Rightarrow F'(p)^+ = F'(p)^{-1} \Rightarrow I - F'(p)F'(p)^+ = 0 \overset{(ii)}{\Rightarrow} \kappa(p) = 0$:
quadratic convergence!

I compatible problems: $F(p^*) = 0 \Rightarrow \kappa(p^*) = 0$
Parameter Identification Susanna Roblitz 42
Incompatibility factor

General nonlinear least-squares problems ($m > q$):

$\lim_{k\rightarrow\infty}\frac{\|p^{k+1} - p^k\|}{\|p^k - p^{k-1}\|} \overset{(*)}{\le} \lim_{k\rightarrow\infty}\left(\frac{\omega}{2}\|p^k - p^{k-1}\| + \kappa(p^{k-1})\right) = \kappa(p^*)$

$\kappa(p^*)$ is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

$\kappa(p^*) < 1$

Summary:
The Gauss-Newton method converges locally linearly for adequate, and locally quadratically for compatible, nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / Termination criteria

Define

$\overline{\Delta p}^{k+1} = -F'(p^k)^+F(p^{k+1})$ (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

$\|\overline{\Delta p}^{k+1}\| \approx \|\Delta p^{k+1}\| \le \mathrm{XTOL}$

For incompatible problems in the convergent phase it holds

$\|\overline{\Delta p}^{k+1}\| \approx \kappa\,\|\Delta p^k\|, \qquad \|\Delta p^{k+1}\| \approx \kappa^2\,\|\Delta p^k\|$

This provides an estimate for $\kappa(p^*)$:

$\kappa(p^*) \approx \|\overline{\Delta p}^{k+1}\| / \|\Delta p^k\|$
Parameter Identification Susanna Roblitz 44
Convergence monitor

(Plot: norms of the ordinary and simplified GN corrections versus iteration index $k$, logarithmic scale from $10^{-15}$ to $10^0$.)

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess $p^0$) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability

interpretation of the result in case of a rank-deficient Jacobian:

Let $p^*$ be the final iterate and $\mathrm{rank}(F'(p^*)) = n < q$.

QR factorization with column pivoting and rank decision:

$Q^TF'(p^*)\Pi = R$

$\Rightarrow$ reordering of the parameters to $(p^*_{\pi_1}, p^*_{\pi_2}, \dots, p^*_{\pi_n}, \dots, p^*_{\pi_q})$ in such a way that

$|r_{11}| \ge |r_{22}| \ge \cdots \ge |r_{nn}|$

$\rightarrow$ the parameters $(p^*_{\pi_1}, p^*_{\pi_2}, \dots, p^*_{\pi_n})$ can actually be estimated from the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations

$p$: constrained parameters, $u$: unconstrained parameters, $\varphi$: transformation function

$p = \varphi(u), \quad F(p) = F(\varphi(u)) = \bar F(u), \quad \bar F'(u) = F'(p)\,\varphi'(u)$

constraint            transformation $\varphi(u)$
$p > 0$               $p = \exp(u)$
$A \le p \le B$       $p = A + \frac{B-A}{2}(1 + \sin u)$
$p \le C$             $p = C + (1 - \sqrt{1 + u^2})$
$p \ge C$             $p = C - (1 - \sqrt{1 + u^2})$
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method

Idea: local linearization is only reasonable in a small neighborhood of $p^k$:

$\|F(p^k) + F'(p^k)\Delta p^k\|_2^2 + \rho\,\|\Delta p^k\|_2^2 = \min$

$\Leftrightarrow (F'(p^k)^TF'(p^k) + \rho I_q)\Delta p^k = -F'(p^k)^TF(p^k)$

I $\rho \rightarrow 0$: Levenberg-Marquardt $\rightarrow$ Gauss-Newton
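The regularized system $(F'^TF' + \rho I_q)\Delta p = -F'^TF$ can be sketched numerically for an assumed rank-deficient Jacobian, showing that it stays solvable even where the Gauss-Newton normal equations are singular:

```python
import numpy as np

# One Levenberg-Marquardt step with assumed illustrative values:
# J^T J is singular, but J^T J + rho * I_q is always regular.
J = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])             # rank 1, so J^T J (2x2) is singular
Fk = np.array([1., 0., -1.])         # assumed residual vector F(p^k)
rho = 1e-3

M = J.T @ J + rho * np.eye(2)
dp = -np.linalg.solve(M, J.T @ Fk)   # LM step exists despite rank deficiency
print(np.linalg.cond(M))             # finite condition number
```

This is exactly the masking effect noted above: the step is always computable, so rank deficiency of $F'$ goes unnoticed.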
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Linear least squares
y(t p) = B(t)p B(t) isin Rdtimesq
Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd
Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T
Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat
z minus Ap2 rarr min
Parameter Identification Susanna Roblitz 15
Example Reaction kinetics
I Arrhenius law dependence ofrate constant K ontemperature T
K (T ) = C middot exp
(minus E
RT
)I gas constant
R = 83145 JmolmiddotK
I unknown parameterspre-exponential factor Cactivation energy E
I measurements(Ti Ki) i = 1 m = 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
TK
Parameter Identification Susanna Roblitz 16
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I
z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2
2
The Eucledian norm middot 2 is invariant under orthogonaltransformation
z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
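As a minimal illustration (my own toy example, not from the slides), a scalar Newton iteration for G(u) = u² − 2 shows the scheme G'(u^k)Δu^k = −G(u^k):

```python
def newton(G, dG, u0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration: solve G(u) = 0."""
    u = u0
    for _ in range(max_iter):
        du = -G(u) / dG(u)   # 1-D version of the linear system G'(u) du = -G(u)
        u += du
        if abs(du) < tol:
            break
    return u

root = newton(lambda u: u * u - 2.0, lambda u: 2.0 * u, 1.0)
```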
Gauss-Newton iteration
G'(p) = F'(p)^T F'(p) + F''(p)^T F(p)

Newton iteration in terms of F(p):

( F'(p^k)^T F'(p^k) + F''(p^k)^T F(p^k) ) Δp^k = −F'(p^k)^T F(p^k)

• compatible case: F(p*) = 0 ⇒ G'(p*) = F'(p*)^T F'(p*)
• almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F''(p)

⇒ Gauss-Newton iteration:

F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k), p^{k+1} = p^k + Δp^k
Gauss-Newton iteration
F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k), p^{k+1} = p^k + Δp^k

corresponds to the normal equations for the linear least-squares problem

‖F'(p^k) Δp^k + F(p^k)‖_2 = min

Formal solution:

Δp^k = −F'(p^k)^+ F(p^k)

In case of convergence: F'(p*)^+ F(p*) = 0.

If rank(F'(p)) = q ⇒ F'(p)^+ = ( F'(p)^T F'(p) )^{-1} F'(p)^T
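A compact numerical sketch (illustration only; the model and numbers are mine): fitting y(t, p) = e^{−pt} to synthetic data with a plain Gauss-Newton loop, Δp^k = −F'(p^k)^+ F(p^k):

```python
import numpy as np

t = np.linspace(0.0, 4.0, 20)
p_true = 0.5
z = np.exp(-p_true * t)            # synthetic, noise-free data (compatible problem)

def residual(p):                   # F(p)
    return np.exp(-p * t) - z

def jacobian(p):                   # F'(p), an m-by-1 matrix
    return (-t * np.exp(-p * t)).reshape(-1, 1)

p = 0.1                            # starting guess p^0
for _ in range(50):
    J, F = jacobian(p), residual(p)
    dp = np.linalg.lstsq(J, -F, rcond=None)[0]   # dp = -J^+ F
    p = p + dp[0]
    if abs(dp[0]) < 1e-12:
        break
```

Because the problem is compatible (F(p*) = 0), the iteration converges quadratically near the solution.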
Classification
• Local GN methods require a sufficiently good starting guess p^0.
• Global GN methods can handle bad guesses as well.
• m = q: guaranteed local convergence.
• m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems.
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0:

• F: D → R^m continuously differentiable, D ⊆ R^q open and convex
• F'(p): possibly rank-deficient Jacobian
• F' satisfies a particular Lipschitz condition (i.e. F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F'(p̂)^+ ( F'(p̄) − F'(p) )(p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2^2   (i)

for all collinear p, p̄, p̂ ∈ D (i.e. they lie on a single straight line) with p̄ − p ∈ R(F'(p̂)^+)
Assumptions
• compatibility condition (model and data must somehow fit each other):

‖F'(p)^+ ( I − F'(p̄) F'(p̄)^+ ) F(p̄)‖ ≤ κ(p̄) ‖p − p̄‖ for all p, p̄ ∈ D,
κ(p̄) ≤ κ < 1 for all p̄ ∈ D   (ii)

• the initial update Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

‖Δp^0‖ ≤ α and B_ρ(p^0) ⊂ D   (iii)

h = αω < 2(1 − κ), ρ = α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* ∈ B_ρ(p^0) with

F'(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

‖p^{k+1} − p^k‖_2 ≤ (ω/2) ‖p^k − p^{k−1}‖_2^2 + κ(p^{k−1}) ‖p^k − p^{k−1}‖_2   (∗)

This result shows existence of a solution p* in the sense that F'(p*)^+ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness requires a full-rank Jacobian.
Compatibility
• Linear least-squares problems: F(p) = Ap − z

  F'(p) = (Ap − z)' = A = (Ap̄ − z)' = F'(p̄), so (i) holds with ω = 0

  F'(p)^+ ( I − F'(p̄) F'(p̄)^+ ) = A^+ (I − A A^+) = 0, so (ii) holds with κ(p̄) = κ = 0

  (∗) ⇒ ‖p^2 − p^1‖ ≤ 0 ⇒ Δp^1 = 0 ⇒ p^1 = p*:
  convergence within one iteration

• Systems of nonlinear equations (m = q):

  if F'(p) is nonsingular ⇒ F'(p)^+ = F'(p)^{-1} ⇒ I − F'(p) F'(p)^+ = 0 ⇒ κ(p) = 0:
  quadratic convergence

• Compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤(∗) lim_{k→∞} ( (ω/2) ‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define

Δp̄^{k+1} = −F'(p^k)^+ F(p^{k+1})   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄^{k+1}‖ ≈ ‖Δp^{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖Δp̄^{k+1}‖ ≈ κ ‖Δp^k‖ ≈ κ^2 ‖Δp^{k−1}‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Convergence monitor
[Semilog plot: norms of the ordinary and simplified GN corrections versus iteration index k, decaying from 10^0 to about 10^{-15}.]

Typical "scissor" between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F'(p*) Π = R

⇒ reordering of the parameters to (p*_{π1}, p*_{π2}, …, p*_{πn}, …, p*_{πq}) in such a way that

|r_11| ≥ |r_22| ≥ ··· ≥ |r_nn|

→ the parameters (p*_{π1}, p*_{π2}, …, p*_{πn}) can actually be estimated from the current dataset.
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u), F(p) = F(φ(u)) =: F̄(u), F̄'(u) = F'(p)·φ'(u)

constraint       transformation φ(u)
p > 0            p = exp(u)
A ≤ p ≤ B        p = A + (B − A)/2 · (1 + sin u)
p ≤ C            p = C + (1 − √(1 + u²))
p ≥ C            p = C − (1 − √(1 + u²))
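These transformations are easy to check numerically (my own sketch; the helper names are hypothetical):

```python
import math

def positive(u):                      # p > 0
    return math.exp(u)

def bounded(u, A, B):                 # A <= p <= B
    return A + (B - A) / 2.0 * (1.0 + math.sin(u))

def at_most(u, C):                    # p <= C, since sqrt(1+u^2) >= 1
    return C + (1.0 - math.sqrt(1.0 + u * u))

def at_least(u, C):                   # p >= C
    return C - (1.0 - math.sqrt(1.0 + u * u))

samples = [x / 10.0 for x in range(-50, 51)]
ok_pos = all(positive(u) > 0 for u in samples)
ok_box = all(1.0 <= bounded(u, 1.0, 3.0) <= 3.0 for u in samples)
ok_ub = all(at_most(u, 2.0) <= 2.0 for u in samples)
ok_lb = all(at_least(u, 2.0) >= 2.0 for u in samples)
```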
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

‖F(p^k) + F'(p^k) Δp^k‖_2^2 + ρ ‖Δp^k‖_2^2 = min

( F'(p^k)^T F'(p^k) + ρ I_q ) Δp^k = −F'(p^k)^T F(p^k)

• ρ → 0: Levenberg-Marquardt → Gauss-Newton
• ρ > 0: ( F'(p^k)^T F'(p^k) + ρ I_q ) is always regular → masking of uniqueness of a given solution, incorrect interpretation of numerical results; solutions are often not minima of g(p) but merely saddle points with g'(p) = 0
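A one-step sketch of the regularized system (my own illustration): for ρ → 0 the Levenberg-Marquardt step approaches the Gauss-Newton step:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((8, 3))      # stand-in for F'(p^k), full column rank
F = rng.standard_normal(8)           # stand-in for F(p^k)

def lm_step(J, F, rho):
    """Solve (J^T J + rho*I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ F)

gn = np.linalg.lstsq(J, -F, rcond=None)[0]   # Gauss-Newton step -J^+ F
lm = lm_step(J, F, 1e-12)
```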
Scaling
• Row (measurement) scaling:

  F̄(p) = D_row^{-1} F(p), D_row = diag(δz_1, …, δz_M)

  ‖F'(p^k) Δp^k + F(p^k)‖ = min  vs.  ‖D_row^{-1}( F'(p^k) Δp^k + F(p^k) )‖ = min

  Different scaling leads to a different solution.

• Column (parameter) scaling:

  p̄ = D_col^{-1} p, H(p̄) = F(D_col p̄) = F(p), H'(p̄) = F'(p) D_col

  Δp̄^k = −H'(p̄^k)^+ H(p̄^k) = −( F'(p^k) D_col )^+ F(p^k)
        = −D_col^{-1} F'(p^k)^+ F(p^k) = D_col^{-1} Δp^k   (if F'(p^k) has full rank)

  No invariance in the rank-deficient case.
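A small check (my own numbers) that row scaling indeed changes the least-squares solution when the residual cannot vanish:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
z = np.array([0.0, 1.0, 0.5])            # inconsistent data: no exact fit exists
D_row = np.diag([1.0, 1.0, 10.0])        # third measurement has a large tolerance

p_unscaled = np.linalg.lstsq(A, z, rcond=None)[0]
p_scaled = np.linalg.lstsq(np.linalg.inv(D_row) @ A,
                           np.linalg.inv(D_row) @ z, rcond=None)[0]
```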
Globalization
Global (or damped) Gauss-Newton method:

Δp^k = −F'(p^k)^+ F(p^k), p^{k+1} = p^k + λ_k Δp^k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F'(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ‖F'(p^k)^+ F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1

Divergence criterion: λ_k < λ_min ≪ 1
Codes
• NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
• NLSCON: solver for Nonlinear Least-Squares with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

Model: y' = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

Set of experimental data: (t_1, z_1, δz_1), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)

F(p) = ( D_1^{-1}( y(t_1, p) − z_1 ), …, D_M^{-1}( y(t_M, p) − z_M ) )^T

F'(p) = ( D_1^{-1} y_p(t_1, p), …, D_M^{-1} y_p(t_M, p) )^T

Gauss-Newton method:

Δp^k = −F'(p^k)^+ F(p^k), p^{k+1} = p^k + λ_k Δp^k
Computation of F (p)
requires the solution of y' = f(y, p) at the time points t_1, …, t_M

Problem: time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
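A cubic Hermite interpolant over one accepted step (a generic sketch, not LIMEX's actual implementation) reuses the values and slopes the integrator already computed:

```python
def hermite(t, t0, t1, y0, y1, f0, f1):
    """Cubic Hermite interpolation on [t0, t1] from endpoint values y and slopes f."""
    h = t1 - t0
    s = (t - t0) / h
    h00 = (1 + 2 * s) * (1 - s) ** 2   # standard cubic Hermite basis functions
    h10 = s * (1 - s) ** 2
    h01 = s ** 2 * (3 - 2 * s)
    h11 = s ** 2 * (s - 1)
    return h00 * y0 + h10 * h * f0 + h01 * y1 + h11 * h * f1

# the interpolant is exact for cubics: reproduce y(t) = t^3 on [1, 2]
y = lambda t: t ** 3
dy = lambda t: 3 * t ** 2
val = hermite(1.25, 1.0, 2.0, y(1.0), y(2.0), dy(1.0), dy(2.0))
```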
Sensitivity analysis
• Sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F'(p) = ( D_1^{-1} S(t_1), …, D_M^{-1} S(t_M) )^T

• Scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation

y_p' = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S' = f_y · S + f_p, S = [s_1, …, s_q], s_k ∈ R^d

In practice, solve the combined system

( y', s_1', …, s_q' )^T = ( f, f_y·s_1 + f_{p_1}, …, f_y·s_q + f_{p_q} )^T ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
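For the scalar model y' = −p·y (my own toy example), the combined system carries y and s = ∂y/∂p together; a classical RK4 sweep reproduces the analytic sensitivity s(t) = −t·y_0·e^{−pt}:

```python
import math

def rhs(v, p):
    """Combined system: v = (y, s) with y' = -p*y and s' = -p*s - y (variational eq.)."""
    y, s = v
    return (-p * y, -p * s - y)

def rk4(v, p, h, steps):
    """Classical 4th-order Runge-Kutta on the 2-component combined system."""
    for _ in range(steps):
        k1 = rhs(v, p)
        k2 = rhs((v[0] + h / 2 * k1[0], v[1] + h / 2 * k1[1]), p)
        k3 = rhs((v[0] + h / 2 * k2[0], v[1] + h / 2 * k2[1]), p)
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]), p)
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

p, y0, T = 0.7, 2.0, 1.0
y_T, s_T = rk4((y0, 0.0), p, T / 1000, 1000)   # initial sensitivity s(0) = 0
```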
Example
Case (T+E):

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   4.81e-02   3.55e-01           2
1   1.11e-01   4.57e-01   1.000   2
    2.55e-02   1.75e-01   0.389
2   1.04e-02   4.22e-02   1.000   2
3   9.16e-04   1.90e-03   1.000   2
4   8.00e-06   1.99e-05   1.000   2
5   2.32e-07   1.54e-09   1.000   2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008, sc(F'(p*)) = 6.3

Case (E):

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   3.85e-02   2.96e-02           1
1   2.40e-03   1.85e-03   1.000   1
2   1.11e-05   7.67e-06   1.000   1
3   6.67e-06   6.75e-11   1.000   1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004, sc(F'(p**)) = 4.5×10^6

[Two concentration-vs-time plots of the species A, B, C, D for the fitted parameters, as in the introductory example.]
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
• SBML import and export
• deterministic simulation methods (LIMEX; others will follow)
• sensitivity analysis based on variational equations
• parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
• output of information on identifiability of parameters
• time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin,
Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Example Reaction kinetics
• Arrhenius law: dependence of the rate constant K on the temperature T,

  K(T) = C · exp( −E / (R·T) )

• gas constant R = 8.3145 J/(mol·K)
• unknown parameters: pre-exponential factor C, activation energy E
• measurements (T_i, K_i), i = 1, …, m = 21

[Plot: measured K (of order 10^{-3}) over T ∈ [720, 820] K.]
Example Reaction kinetics
Minimization problem:

log C − (1/(R·T_i)) · E = log K_i ⇒ ‖ log K − A · (log C, E)^T ‖_2^2 = min

A_{i1} = 1, A_{i2} = −1/(R·T_i), i = 1, …, 21
[Two plots over T ∈ [720, 820] K: the measured K(T) (left) and the linearizing quantity log K(T) (right).]
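With synthetic data (my own numbers, not the lecture's measurements), the linearized problem is an ordinary linear least-squares fit:

```python
import math
import numpy as np

R = 8.3145                        # gas constant, J/(mol*K)
C_true, E_true = 1.0e10, 1.5e5    # assumed "true" parameters for the sketch
T = np.linspace(720.0, 820.0, 21)
K = C_true * np.exp(-E_true / (R * T))

# design matrix for log K_i = log C - E/(R*T_i)
A = np.column_stack([np.ones_like(T), -1.0 / (R * T)])
logC, E = np.linalg.lstsq(A, np.log(K), rcond=None)[0]
C = math.exp(logC)
```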
The method of normal equations
[Figure: orthogonal projection of z onto the range space R(A); the residual z − Ap is perpendicular to R(A).]

‖z − Ap‖_2 = min ⇔ z − Ap ∈ R(A)^⊥

R(A) = { Ap : p ∈ R^q } (range space of A)

‖z − Ap‖_2 = min_{p'∈R^q} ‖z − Ap'‖_2 ⇔ A^T A p = A^T z

uniquely solvable ⇔ A^T A invertible ⇔ rank(A) = q

⇒ solution via Cholesky decomposition (A^T A is s.p.d.)

If rank(A) = q: κ_2(A)^2 = λ_max(A^T A) / λ_min(A^T A) = κ_2(A^T A)   (without proof)

We need a more stable method for solving the normal equations.
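For reference, the normal-equations route via Cholesky is only a few lines (my sketch); its accuracy degrades like κ_2(A)², which motivates the QR-based methods that follow:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))     # full column rank (with probability 1)
z = rng.standard_normal(10)

# normal equations A^T A p = A^T z via Cholesky (A^T A is s.p.d.)
L = np.linalg.cholesky(A.T @ A)
w = np.linalg.solve(L, A.T @ z)      # forward substitution: L w = A^T z
p = np.linalg.solve(L.T, w)          # back substitution: L^T p = w

p_ref = np.linalg.lstsq(A, z, rcond=None)[0]
```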
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. Q^T Q = I:

‖z − Ap‖_2^2 = (z − Ap)^T Q^T Q (z − Ap) = ‖Q(z − Ap)‖_2^2

The Euclidean norm ‖·‖_2 is invariant under orthogonal transformations:

‖z − Ap‖_2 = min ⇔ ‖Q(z − Ap)‖_2 = min

Choice of the orthogonal transformation Q ∈ R^{m×m}?
QR factorization
QR factorization:

A = Q ( R_1 ; 0 ), Q ∈ R^{m×m} orthogonal, R_1 ∈ R^{q×q} upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

‖z − Ap‖_2^2 = ‖Q^T(z − Ap)‖_2^2 = ‖( z_1 − R_1 p ; z_2 )‖_2^2
             = ‖z_1 − R_1 p‖_2^2 + ‖z_2‖_2^2 ≥ ‖z_2‖_2^2

rank(A) = q ⇒ rank(R_1) = rank(A) = q ⇒ R_1 is invertible

‖z − Ap‖_2 = min ⇔ p = R_1^{-1} z_1 ⇒ ‖r‖_2^2 = ‖z_2‖_2^2

The QR factorization is stable since κ_2(A) = κ_2(R_1).
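The same computation with NumPy's reduced QR factorization (sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((12, 4))
z = rng.standard_normal(12)

Q1, R1 = np.linalg.qr(A)             # reduced QR: Q1 is m-by-q, R1 is q-by-q
z1 = Q1.T @ z
p = np.linalg.solve(R1, z1)          # back substitution: R1 p = z1

p_ref = np.linalg.lstsq(A, z, rcond=None)[0]
```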
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:

Q^T A = (H_q H_{q−1} ··· H_2 H_1) A = R

For k = 1, …, q we obtain H_k = diag(I_{k−1}, H_k') ∈ R^{m×m}. The step A^{(k)} = H_k A^{(k−1)} leaves the first k−1 rows and columns untouched and annihilates the entries below the diagonal in column k; only the entries in the lower-right block (rows k, …, m; columns k, …, q) are modified.
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

Column permutation with the aim

|r_11| ≥ |r_22| ≥ … ≥ |r_qq|

Step k:

A^{(k−1)} = ( R  S ; 0  T^{(k)} )

Denote column j of T^{(k)} by T_j^{(k)}.

Pivot strategy: push the column of T^{(k)} with maximal 2-norm to the front, so that after the exchange

|α_k| = |r_kk| = ‖T_1^{(k)}‖_2 = max_j ‖T_j^{(k)}‖_2

Finally, AΠ = QR with a permutation matrix Π.
Rank decision
rank(A) = n < q:

Q^T A Π = ( R  S ; 0  0 )   (exact)   = ( R  S ; 0  T^{(n+1)} )   (in finite precision), T^{(n+1)} "small"

Note: the exact rank n cannot be computed, since by rounding errors ‖T^{(n+1)}‖ = |r_{n+1,n+1}| ≠ 0.

Idea: stop the factorization as soon as

|r_{k+1,k+1}| < δ |r_11|, δ: relative precision of A

Definition (numerical rank n):

|r_{n+1,n+1}| < δ |r_11| ≤ |r_nn|

choice of δ ⇒ decision for n
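Core NumPy has no pivoted QR, so this sketch (mine) makes the analogous rank decision with singular values, σ_{n+1} < δ·σ_1 ≤ σ_n:

```python
import numpy as np

rng = np.random.default_rng(3)
# build an exactly rank-2 matrix, then perturb it at the 1e-12 level
B = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 5))
A = B + 1e-12 * rng.standard_normal((8, 5))

sigma = np.linalg.svd(A, compute_uv=False)
delta = 1e-6                                   # relative precision of A
num_rank = int(np.sum(sigma >= delta * sigma[0]))
```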
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition (subcondition): sc(A) = |r_11| / |r_qq|

Properties:
• sc(A) ≥ 1
• sc(αA) = sc(A)
• A ≠ 0 singular ⇔ sc(A) = ∞
• sc(A) ≤ κ_2(A) (hence the name)

A is called almost singular if

δ·sc(A) ≥ 1, or equivalently δ|r_11| ≥ |r_qq|
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least-squares problem

‖D_row^{-1}( A D_col D_col^{-1} p − z )‖_2 = ‖Ā p̄ − z̄‖_2 = min

where Ā = D_row^{-1} A D_col, p̄ = D_col^{-1} p, z̄ = D_row^{-1} z

• row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε|z_j|
• column (parameter) scaling: D_col = diag(p_{1,scal}, …, p_{q,scal}) (to account for different physical units)

In general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A^{-1} z, where A^{-1}A = AA^{-1} = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer of

‖Ap − z‖_2 = min

(2a) n = q: ‖Ap − z‖_2 = min ⇔ A^T A p = A^T z (normal equations),
uniquely solvable since A^T A is nonsingular:

p = (A^T A)^{-1} A^T z =: A^+ z

(2b) n < q: write p = A^+ z, A^+ generalized (pseudo-) inverse
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A^+ is uniquely defined by

(i) (A^+ A)^T = A^+ A
(ii) (A A^+)^T = A A^+
(iii) A^+ A A^+ = A^+
(iv) A A^+ A = A

One can show: p* = A^+ z is the shortest solution, i.e.

‖p*‖_2 ≤ ‖p‖_2 for all p ∈ L(z)

with

L(z) = { p ∈ R^q : ‖z − Ap‖_2 = min } = p* + N(A)
Computation of shortest solution
Q^T A Π = ( R  S ; 0  0 ), A ∈ R^{m×q},

rank(A) = n, R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

Π^T p = ( p_1 ; p_2 ), p_1 ∈ R^n, p_2 ∈ R^{q−n}

Q^T z = ( z_1 ; z_2 ), z_1 ∈ R^n, z_2 ∈ R^{m−n}

⇒ ‖z − Ap‖_2^2 = ‖Q^T z − Q^T A p‖_2^2 = ‖z_1 − R p_1 − S p_2‖_2^2 + ‖z_2‖_2^2

‖z − Ap‖_2^2 = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^{-1}( z_1 − S p_2 )
Computation of shortest solution
V = R^{-1} S, u = R^{-1} z_1:

‖p‖^2 = ‖Π^T p‖^2 = ‖p_1‖^2 + ‖p_2‖^2 = ‖u − V p_2‖^2 + ‖p_2‖^2
      = ‖u‖^2 − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
      = ‖u‖^2 + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩ =: φ(p_2)

φ(p_2) = min if φ'(p_2) = 2(I + V^T V) p_2 − 2 V^T u = 0
⇔ (I + V^T V) p_2 = V^T u (solution by Cholesky decomposition)

φ''(p_2) = 2(I + V^T V) s.p.d. ⇒ p_2 is a minimum

Note: if n < q/2, a faster algorithm exists.
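A direct check of this recipe (my own toy matrix, already in the factored form with Q = I, Π = I) against the Moore-Penrose solution:

```python
import numpy as np

# A = [R S; 0 0] with rank n = 2, q = 3, m = 4
R = np.array([[2.0, 1.0],
              [0.0, 1.5]])
S = np.array([[1.0],
              [2.0]])
A = np.zeros((4, 3))
A[:2, :2], A[:2, 2:] = R, S
z = np.array([1.0, 2.0, 0.3, -0.7])   # the z2 part only contributes residual
z1 = z[:2]

V = np.linalg.solve(R, S)             # V = R^{-1} S
u = np.linalg.solve(R, z1)            # u = R^{-1} z1
p2 = np.linalg.solve(np.eye(1) + V.T @ V, V.T @ u)
p1 = u - V @ p2                       # p1 = R^{-1}(z1 - S p2)
p = np.concatenate([p1, p2])          # shortest least-squares solution
```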
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Example Reaction kinetics
minimization problem
log C minus 1
RTiE = log Ki rArr log K minus A
(log C
E
)2 = min
Ai1 = 1 Ai2 = minus 1
RTi i = 1 21
720 740 760 780 800 8200
02
04
06
08
1
12x 10minus3
T
K
720 740 760 780 800 820minus12
minus11
minus10
minus9
minus8
minus7
minus6
T
log
K
Parameter Identification Susanna Roblitz 17
The method of normal equations
Apθ
(A)Apminuszz
R
z minus Ap2 = minhArr z minus Ap isin R(A)perp
R(A) = Ap p isin Rq (range space of A)
z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z
uniquely solvablehArr ATA invertiblehArr rank(A) = q
rArr solution via Cholesky decomposition (ATA is spd)
if rank(A) = q κ2(A)2 without proof= λmax(ATA)
λmin(ATA)= κ2(ATA)
We need a more stable method for solving the normal eq
Parameter Identification Susanna Roblitz 18
Orthogonalization methods
Let Q ∈ R^{m×m} be an orthogonal matrix, i.e. QᵀQ = I:

‖z − Ap‖₂² = (z − Ap)ᵀQᵀQ(z − Ap) = ‖Q(z − Ap)‖₂²

The Euclidean norm ‖·‖₂ is invariant under orthogonal transformations:

‖z − Ap‖₂ = min ⇔ ‖Q(z − Ap)‖₂ = min
Choice of the orthogonal transformation Q isin Rmtimesm
Parameter Identification Susanna Roblitz 19
QR factorization
A = Q (R₁ ; 0), Q ∈ R^{m×m} orthogonal, R₁ ∈ R^{q×q} upper triangular

Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):

‖z − Ap‖₂² = ‖Qᵀ(z − Ap)‖₂² = ‖(z₁ − R₁p ; z₂)‖₂² = ‖z₁ − R₁p‖₂² + ‖z₂‖₂² ≥ ‖z₂‖₂²

rank(A) = q ⇒ rank(R₁) = rank(A) = q ⇒ R₁ is invertible

‖z − Ap‖₂ = min ⇔ p = R₁⁻¹z₁ ⇒ ‖r‖₂ = ‖z₂‖₂

The QR factorization is stable, since κ₂(A) = κ₂(R₁).
Parameter Identification Susanna Roblitz 20
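The QR-based solution is just as short; an illustrative NumPy sketch (`lsq_via_qr` is our name):

```python
import numpy as np

def lsq_via_qr(A, z):
    """Solve min ||z - A p||_2 via a full QR factorization A = Q [R1; 0].
    Returns the minimizer p and the residual norm ||z2||_2."""
    m, q = A.shape
    Q, R = np.linalg.qr(A, mode="complete")   # Q: m x m, R: m x q
    R1 = R[:q, :]                             # q x q upper triangular
    w = Q.T @ z
    z1, z2 = w[:q], w[q:]
    p = np.linalg.solve(R1, z1)               # R1 p = z1
    return p, np.linalg.norm(z2)

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
z = np.array([1.0, 1.0, 1.0])
p, rnorm = lsq_via_qr(A, z)
print(p, rnorm)  # p close to [2/3, 2/3], rnorm close to sqrt(1/3)
```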
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:

QᵀA = (H_q H_{q−1} ⋯ H₂ H₁) A = R

For k = 1, …, q we obtain H_k = diag(I_{k−1}, H′_k) ∈ R^{m×m}. The step A^(k) = H_k A^(k−1) eliminates the subdiagonal entries of column k; only the entries in rows k, …, m and columns k, …, q are modified.
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
Column permutation with the aim

|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |r_qq|

Step k:

A^(k−1) = [ R  S ; 0  T^(k) ]; denote column j of T^(k) by T_j^(k).

Pivot strategy: push the column of T^(k) with maximal 2-norm to the front, so that after the exchange

|α_k| = |r_kk| = ‖T₁^(k)‖₂ = max_j ‖T_j^(k)‖₂
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:

QᵀAΠ = [ R  S ; 0  0 ] exactly, but = [ R  S ; 0  T^(n+1) ] in finite precision, with T^(n+1) "small".

Note: the exact rank n cannot be computed, since by rounding errors ‖T^(n+1)‖ = |r_{n+1,n+1}| ≠ 0.

Idea: stop the iteration as soon as

|r_{k+1,k+1}| < δ|r₁₁|, δ: relative precision of A

Definition (numerical rank n):

|r_{n+1,n+1}| < δ|r₁₁| ≤ |r_nn|

choice of δ ⇒ decision for n
Parameter Identification Susanna Roblitz 23
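This rank decision is easy to sketch with SciPy's pivoted QR (illustrative; `delta` plays the role of the relative precision δ, and the helper name is ours):

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta=1e-10):
    """Numerical rank n via QR with column pivoting:
    largest n with |r_{n+1,n+1}| < delta * |r_11| <= |r_nn|."""
    R = qr(A, pivoting=True)[1]
    d = np.abs(np.diag(R))              # non-increasing by pivoting
    if d.size == 0 or d[0] == 0.0:
        return 0
    return int(np.sum(d >= delta * d[0]))

# rank-1 matrix: second column is twice the first
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
print(numerical_rank(A))          # 1
print(numerical_rank(np.eye(3)))  # 3
```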
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r₁₁|/|r_qq|

Properties:

- sc(A) ≥ 1
- sc(αA) = sc(A)
- A ≠ 0 singular ⇔ sc(A) = ∞
- sc(A) ≤ κ₂(A) (hence the name)

A is called almost singular if

δ·sc(A) ≥ 1, or equivalently δ|r₁₁| ≥ |r_qq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least-squares problem

‖D⁻¹_row(A D_col D⁻¹_col p − z)‖₂ = ‖Ā p̄ − z̄‖₂ = min,

where Ā = D⁻¹_row A D_col, p̄ = D⁻¹_col p, z̄ = D⁻¹_row z.

- row (measurement) scaling: D_row = diag(δz₁, …, δz_m), e.g. relative error concept δz_j = ε|z_j|
- column (parameter) scaling: D_col = diag(p₁,scal, …, p_q,scal) (to account for different physical units)

In general sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A⁻¹z, where A⁻¹A = AA⁻¹ = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:

‖Ap − z‖₂ = min

(2a) n = q: ‖Ap − z‖₂ = min ⇔ AᵀAp = Aᵀz (normal equations), uniquely solvable since AᵀA is nonsingular:

p = (AᵀA)⁻¹Aᵀ z =: A⁺ z

(2b) n < q: write p = A⁺z, A⁺: generalized (pseudo-) inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by

(i) (A⁺A)ᵀ = A⁺A
(ii) (AA⁺)ᵀ = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv) AA⁺A = A
One can show: p* = A⁺z is the shortest solution, i.e.

‖p*‖₂ ≤ ‖p‖₂ ∀p ∈ L(z)

with

L(z) = {p ∈ R^q : ‖z − Ap‖₂ = min} = p* + N(A)
Parameter Identification Susanna Roblitz 27
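The four axioms can be checked numerically against the SVD-based pseudoinverse that NumPy provides (illustrative):

```python
import numpy as np

# rank-deficient 3x2 matrix (rank 1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])
Ap = np.linalg.pinv(A)

ax1 = np.allclose((Ap @ A).T, Ap @ A)   # (A+ A)^T = A+ A
ax2 = np.allclose((A @ Ap).T, A @ Ap)   # (A A+)^T = A A+
ax3 = np.allclose(Ap @ A @ Ap, Ap)      # A+ A A+ = A+
ax4 = np.allclose(A @ Ap @ A, A)        # A A+ A = A
print(ax1, ax2, ax3, ax4)  # True True True True
```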
Computation of shortest solution
QᵀAΠ = [ R  S ; 0  0 ], A ∈ R^{m×q},

rank(A) = n, R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

Πᵀp = (p₁ ; p₂), p₁ ∈ Rⁿ, p₂ ∈ R^{q−n}
Qᵀz = (z₁ ; z₂), z₁ ∈ Rⁿ, z₂ ∈ R^{m−n}

⇒ ‖z − Ap‖₂² = ‖Qᵀz − QᵀAΠ·Πᵀp‖₂² = ‖z₁ − Rp₁ − Sp₂‖₂² + ‖z₂‖₂²

‖z − Ap‖₂² = min ⇔ z₁ − Rp₁ − Sp₂ = 0 ⇔ p₁ = R⁻¹(z₁ − Sp₂)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V := R⁻¹S, u := R⁻¹z₁

‖p‖² = ‖Πᵀp‖² = ‖p₁‖² + ‖p₂‖² = ‖u − Vp₂‖² + ‖p₂‖²
     = ‖u‖² − 2⟨u, Vp₂⟩ + ⟨Vp₂, Vp₂⟩ + ⟨p₂, p₂⟩
     = ‖u‖² + ⟨p₂, (I + VᵀV)p₂ − 2Vᵀu⟩ =: φ(p₂)

φ(p₂) = min if φ′(p₂) = 2(I + VᵀV)p₂ − 2Vᵀu = 0

⇔ (I + VᵀV)p₂ = Vᵀu (solution by Cholesky decomposition)

φ″(p₂) = 2(I + VᵀV) spd ⇒ p₂ is a minimum

Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
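Slides 28 and 29 together give a complete algorithm for the minimum-norm solution; a sketch under the stated assumptions (the helper name `shortest_solution` and the rank tolerance `delta` are ours):

```python
import numpy as np
from scipy.linalg import qr, cho_factor, cho_solve

def shortest_solution(A, z, delta=1e-12):
    """Minimum-norm least-squares solution via pivoted QR:
    p1 = R^{-1}(z1 - S p2),  (I + V^T V) p2 = V^T u."""
    m, q = A.shape
    Q, Rfull, piv = qr(A, pivoting=True)        # A Pi = Q [R S; 0 0]
    d = np.abs(np.diag(Rfull))
    n = int(np.sum(d >= delta * d[0]))          # numerical rank
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = np.linalg.solve(R, S)                   # V = R^{-1} S
    u = np.linalg.solve(R, z1)                  # u = R^{-1} z1
    p2 = cho_solve(cho_factor(np.eye(q - n) + V.T @ V), V.T @ u)
    p1 = u - V @ p2
    p = np.empty(q)
    p[piv] = np.concatenate([p1, p2])           # undo the permutation
    return p

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                      # rank 1
z = np.array([1.0, 2.0, 3.0])
p = shortest_solution(A, z)
print(p)  # matches np.linalg.pinv(A) @ z
```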
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M

Parameters: p = (p₁, …, p_q)ᵀ

Model:

y(t, p): R × R^q → R^d, nonlinear w.r.t. p

Measurement tolerances:

δz_i ∈ R^d, D_i = diag(δz_i)

Define

F(p) = ( D₁⁻¹(z₁ − y(t₁, p)) ; … ; D_M⁻¹(z_M − y(t_M, p)) ) ∈ R^m, m ≤ M·d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Σ_{i=1}^{M} ‖D_i⁻¹(z_i − y(t_i, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem: given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if

F(p*) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p₁, …, p_q)ᵀ ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p₁, …, ∂g/∂p_q )ᵀ

Consider the vector F(p) = (F₁(p), …, F_m(p))ᵀ: R^q → R^m. Its derivative is the Jacobian matrix

F′(p) = ( ∂F₁/∂p₁  ⋯  ∂F₁/∂p_q
            ⋮            ⋮
          ∂F_m/∂p₁ ⋯  ∂F_m/∂p_q ) ∈ R^{m×q}
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = ‖F(p)‖₂² = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F′(p)ᵀF(p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximation of G by a "simpler" function, namely its Taylor expansion around a starting guess p⁰:

G(p) = G(p⁰) + G′(p⁰)(p − p⁰) + o(‖p − p⁰‖) for p → p⁰
     ≈ G(p⁰) + G′(p⁰)(p − p⁰) =: Ḡ(p)

Zero p¹ of the linear substitute map Ḡ:

G′(p⁰)Δp⁰ = −G(p⁰), p¹ = p⁰ + Δp⁰

Newton iteration:

G′(p_k)Δp_k = −G(p_k), p_{k+1} = p_k + Δp_k
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
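The generic Newton iteration in a minimal sketch (the example system is ours, purely illustrative):

```python
import numpy as np

def newton(G, dG, p0, tol=1e-12, itmax=50):
    """Basic Newton iteration: solve G(p) = 0 via
    G'(p_k) dp_k = -G(p_k), p_{k+1} = p_k + dp_k."""
    p = np.asarray(p0, dtype=float)
    for _ in range(itmax):
        dp = np.linalg.solve(dG(p), -G(p))
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# example: intersection of a circle and a line
G = lambda p: np.array([p[0]**2 + p[1]**2 - 2.0, p[0] - p[1]])
dG = lambda p: np.array([[2 * p[0], 2 * p[1]], [1.0, -1.0]])
root = newton(G, dG, [2.0, 0.5])
print(root)  # close to [1, 1]
```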
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

Newton iteration in terms of F(p):

( F′(p_k)ᵀF′(p_k) + F″(p_k)ᵀF(p_k) ) Δp_k = −F′(p_k)ᵀF(p_k)

- compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
- almost compatible case: ‖F(p*)‖ is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p_k)ᵀF′(p_k) Δp_k = −F′(p_k)ᵀF(p_k), p_{k+1} = p_k + Δp_k
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(p_k)ᵀF′(p_k) Δp_k = −F′(p_k)ᵀF(p_k), p_{k+1} = p_k + Δp_k

corresponds to the normal equations for the linear least-squares problem

‖F′(p_k)Δp_k + F(p_k)‖₂ = min

formal solution:

Δp_k = −F′(p_k)⁺F(p_k)

in case of convergence: F′(p*)⁺F(p*) = 0

if rank(F′(p)) = q ⇒ F′(p)⁺ = ( F′(p)ᵀF′(p) )⁻¹ F′(p)ᵀ
Parameter Identification Susanna Roblitz 37
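An illustrative Gauss-Newton sketch; the correction is computed from the linear least-squares subproblem rather than the explicit normal equations (model, data and names are ours):

```python
import numpy as np

def gauss_newton(F, J, p0, tol=1e-10, itmax=50):
    """Undamped Gauss-Newton: dp_k = -F'(p_k)^+ F(p_k), realized via
    the linear least-squares problem min ||F'(p_k) dp + F(p_k)||."""
    p = np.asarray(p0, dtype=float)
    for _ in range(itmax):
        dp, *_ = np.linalg.lstsq(J(p), -F(p), rcond=None)
        p = p + dp
        if np.linalg.norm(dp) < tol * (1.0 + np.linalg.norm(p)):
            break
    return p

# fit y(t, p) = p1 * exp(-p2 * t) to synthetic (compatible) data
t = np.linspace(0.0, 2.0, 10)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(-p[1] * t) - z
J = lambda p: np.column_stack([np.exp(-p[1] * t),
                               -p[0] * t * np.exp(-p[1] * t)])
p_star = gauss_newton(F, J, [1.0, 1.0])
print(p_star)  # close to [2.0, 1.5]
```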
Classification
- local GN methods: require a sufficiently good starting guess p⁰
- global GN methods: can handle bad guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p⁰:

- F: D → R^m continuously differentiable, D ⊆ R^q open and convex
- F′(p): possibly rank-deficient Jacobian
- F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̄)⁺(F′(p̃) − F′(p))(p̃ − p)‖₂ ≤ ω‖p̃ − p‖₂²   (i)

for all collinear p, p̃, p̄ ∈ D (i.e. they lie on a single straight line) and p̃ − p ∈ R(F′(p̄)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
- compatibility condition (model and data must somehow fit each other):

‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖ ∀p, p̄ ∈ D,

κ(p) ≤ κ < 1 ∀p ∈ D   (ii)

- the initial update Δp⁰ must be small enough, and a ball B_ρ(p⁰) with radius ρ around p⁰ must be contained in D:

‖Δp⁰‖ ≤ α and B_ρ(p⁰) ⊂ D   (iii)

h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p⁰), and converges to some p* ∈ B_ρ(p⁰) with

F′(p*)⁺F(p*) = 0

The convergence rate can be estimated according to

‖p_{k+1} − p_k‖₂ ≤ (ω/2)‖p_k − p_{k−1}‖₂² + κ(p_{k−1})‖p_k − p_{k−1}‖₂   (*)

This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
- Linear least-squares problems, F(p) = Ap − z:

  F′(p) = (Ap − z)′ = A = F′(p̄)  ⇒ (i): ω = 0

  F′(p̄)⁺(I − F′(p)F′(p)⁺) = A⁺(I − AA⁺) = A⁺ − A⁺AA⁺ = 0  ⇒ (ii): κ(p) = κ = 0

  (*) ⇒ ‖p₂ − p₁‖ ≤ 0 ⇒ Δp¹ = 0 ⇒ p¹ = p*: convergence within one iteration

- Systems of nonlinear equations (m = q):

  if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹ ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) = 0: quadratic convergence

- Compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖ ≤ lim_{k→∞} ( (ω/2)‖p_k − p_{k−1}‖ + κ(p_{k−1}) ) = κ(p*)   (by (*))
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
Δp̄^{k+1} := −F′(p_k)⁺F(p_{k+1})   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄^{k+1}‖ ≈ ‖Δp^{k+1}‖ ≤ XTOL

For incompatible problems, in the convergent phase it holds

‖Δp̄^{k+1}‖ ≈ κ‖Δp^k‖, ‖Δp^{k+1}‖ ≈ κ²‖Δp^k‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of ordinary and simplified Gauss-Newton corrections over the iteration index k, semilogarithmic scale from 10⁰ down to 10⁻¹⁵.]
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p⁰) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

QᵀF′(p*)Π = R

⇒ reordering of the parameters to (p*_{π₁}, p*_{π₂}, …, p*_{π_n}, …, p*_{π_q}) in such a way that

|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |r_nn|

→ the parameters (p*_{π₁}, p*_{π₂}, …, p*_{π_n}) can actually be estimated from the current dataset.
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u), F̄(u) := F(φ(u)), F̄′(u) = F′(p)φ′(u)

constraint      transformation φ(u)
p > 0           p = exp(u)
A ≤ p ≤ B       p = A + (B − A)/2 · (1 + sin u)
p ≤ C           p = C + (1 − √(1 + u²))
p ≥ C           p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
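The transformation table translates directly into code; an illustrative sketch (the function names are ours):

```python
import numpy as np

# Transformations mapping an unconstrained u to a constrained p:
transform_pos = lambda u: np.exp(u)                                     # p > 0
transform_box = lambda u, A, B: A + 0.5 * (B - A) * (1.0 + np.sin(u))   # A <= p <= B
transform_ub  = lambda u, C: C + (1.0 - np.sqrt(1.0 + u * u))           # p <= C
transform_lb  = lambda u, C: C - (1.0 - np.sqrt(1.0 + u * u))           # p >= C

u = np.linspace(-5.0, 5.0, 101)
assert np.all(transform_pos(u) > 0)
assert np.all((transform_box(u, 1.0, 3.0) >= 1.0) & (transform_box(u, 1.0, 3.0) <= 3.0))
assert np.all(transform_ub(u, 2.0) <= 2.0)
assert np.all(transform_lb(u, 2.0) >= 2.0)
print("all constraints satisfied")
```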
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p_k:

‖F(p_k) + F′(p_k)Δp_k‖₂² + ρ‖Δp_k‖₂² = min

(F′(p_k)ᵀF′(p_k) + ρI_q) Δp_k = −F′(p_k)ᵀF(p_k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F′(p_k)ᵀF′(p_k) + ρI_q) always regular → masking of non-uniqueness of a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p), but merely saddle points with g′(p) = 0
Parameter Identification Susanna Roblitz 49
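One Levenberg-Marquardt correction is a single regularized linear solve (sketch; `lm_step` is a hypothetical helper, the data are ours):

```python
import numpy as np

def lm_step(Fp, Jp, rho):
    """One Levenberg-Marquardt correction:
    (J^T J + rho I) dp = -J^T F.  rho -> 0 recovers the Gauss-Newton step."""
    q = Jp.shape[1]
    return np.linalg.solve(Jp.T @ Jp + rho * np.eye(q), -Jp.T @ Fp)

# on a full-rank linear problem the rho = 0 step is the exact LSQ correction
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
Fv = np.array([1.0, -1.0, 0.5])
dp0 = lm_step(Fv, J, 0.0)
dp1 = lm_step(Fv, J, 10.0)   # heavily damped: much shorter step
print(np.linalg.norm(dp1) < np.linalg.norm(dp0))  # True
```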
Scaling
- row (measurement) scaling:

  F̄(p) = D⁻¹_row F(p), D_row = diag(δz₁, …, δz_M)

  ‖F̄′(p_k)Δp_k + F̄(p_k)‖ = min ⇔ ‖D⁻¹_row(F′(p_k)Δp_k + F(p_k))‖ = min

  Different scaling leads to a different solution.

- column (parameter) scaling:

  p̄ = D⁻¹_col p, H(p̄) := F(D_col p̄) = F(p), H′(p̄) = F′(p)D_col

  assume p_k = D_col p̄_k:

  Δp̄_k = −H′(p̄_k)⁺H(p̄_k) = −(F′(p_k)D_col)⁺F(p_k)
        = −D⁻¹_col F′(p_k)⁺F(p_k) = D⁻¹_col Δp_k   (if F′(p_k) has full rank)

  No invariance in the rank-deficient case.
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F′(p_k)Δp_k = −F(p_k), p_{k+1} = p_k + λ_k Δp_k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)⁺F(p) = 0.
Idea: choose λ_k in such a way that ‖F′(p_k)⁺F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy, such that they satisfy the natural monotonicity test

‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
Parameter Identification Susanna Roblitz 51
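A toy version of the damped iteration (a strongly simplified stand-in for the predictor-corrector strategy of NLSQ-ERR/NLSCON: we simply halve λ until the natural monotonicity test is passed; the example model is ours):

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-4, tol=1e-10, itmax=50):
    """Globalized Gauss-Newton: accept p_k + lam*dp_k only if the
    simplified correction dp_bar(lam) = -F'(p_k)^+ F(p_k + lam*dp_k)
    satisfies the natural monotonicity test ||dp_bar|| < ||dp_k||."""
    p = np.asarray(p0, dtype=float)
    for _ in range(itmax):
        Jk = J(p)
        dp, *_ = np.linalg.lstsq(Jk, -F(p), rcond=None)
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            dp_bar, *_ = np.linalg.lstsq(Jk, -F(p + lam * dp), rcond=None)
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam *= 0.5                                       # damping
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = p + lam * dp
    return p

t = np.linspace(0.0, 2.0, 10)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(-p[1] * t) - z
J = lambda p: np.column_stack([np.exp(-p[1] * t),
                               -p[0] * t * np.exp(-p[1] * t)])
p_star = damped_gauss_newton(F, J, [1.0, 3.0])   # poor starting guess
print(p_star)
```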
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

model: y′ = f(y, p), y(0) = y₀, with y ∈ R^d, p ∈ R^q

set of experimental data (t₁, z₁, δz₁), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)

F(p) = ( D₁⁻¹(y(t₁, p) − z₁) ; … ; D_M⁻¹(y(t_M, p) − z_M) )

F′(p) = ( D₁⁻¹ y_p(t₁, p) ; … ; D_M⁻¹ y_p(t_M, p) )

Gauss-Newton method:

Δp_k = −F′(p_k)⁺F(p_k), p_{k+1} = p_k + λ_k Δp_k
Parameter Identification Susanna Roblitz 54
Computation of F(p)

Requires the solution of y′ = f(y, p) at the time points t₁, …, t_M.

Problem: the time points chosen by adaptive step-size control generally do not agree with t₁, …, t_M.

Solution: use an integrator with dense output.

Idea: construction of an interpolation polynomial based on the available solution, such that accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
- sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F′(p) = ( D₁⁻¹S(t₁) ; … ; D_M⁻¹S(t_M) )

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal/y_scal

  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y′_p = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p, S = [s₁, …, s_q], s_k ∈ R^d

In practice, solve the combined system

( y′ ; s′₁ ; … ; s′_q ) = ( f ; f_y s₁ + f_{p₁} ; … ; f_y s_q + f_{p_q} ) ∈ R^{(q+1)d}
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
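For the introductory reaction A + B ⇌ C + D (d = 4, q = 2), the combined state-plus-sensitivity system can be integrated with a generic solver (here SciPy's `solve_ivp` as a stand-in for LIMEX; an illustrative sketch):

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, Y, k1, k2):
    """Combined state + sensitivity system for A + B <-> C + D,
    p = (k1, k2): Y = (y, vec(S)) with S = dy/dp in R^{4x2}."""
    y, S = Y[:4], Y[4:].reshape(4, 2)
    A, B, C, D = y
    r = k1 * A * B - k2 * C * D                  # net forward rate
    s = np.array([-1.0, -1.0, 1.0, 1.0])         # stoichiometry: f = s * r
    f = s * r
    dr_dy = np.array([k1 * B, k1 * A, -k2 * D, -k2 * C])
    fy = np.outer(s, dr_dy)                      # f_y, 4x4
    fp = np.outer(s, np.array([A * B, -C * D]))  # f_p, 4x2
    dS = fy @ S + fp                             # variational equation
    return np.concatenate([f, dS.ravel()])

y0 = np.array([2.0, 1.0, 0.5, 0.0])
Y0 = np.concatenate([y0, np.zeros(8)])           # y_p(0) = 0
sol = solve_ivp(rhs, (0.0, 30.0), Y0, args=(0.2, 0.5), rtol=1e-8, atol=1e-10)
S_end = sol.y[4:, -1].reshape(4, 2)              # sensitivities at t = 30
print(S_end)
```

Since A − B and A + C are conserved, the corresponding sensitivity rows satisfy S_A = S_B and S_A + S_C = 0, which makes a convenient consistency check.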
Example
Case (T+E):

k   ‖F(p_k)‖    ‖Δp_k‖      λ_k     rank
0   4.81e-02    3.55e-01     -       2
1   1.11e-01    4.57e-01    1.000    2
2   2.55e-02    1.75e-01    0.389    2
2   1.04e-02    4.22e-02    1.000    2
3   9.16e-04    1.90e-03    1.000    2
4   8.00e-06    1.99e-05    1.000    2
5   2.32e-07    1.54e-09    1.000    2

p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994

κ = 0.008, sc(F′(p*)) = 6.3
Case (E):

k   ‖F(p_k)‖    ‖Δp_k‖      λ_k     rank
0   3.85e-02    2.96e-02     -       1
1   2.40e-03    1.85e-03    1.000    1
2   1.11e-05    7.67e-06    1.000    1
3   6.67e-06    6.75e-11    1.000    1

p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622

κ = 0.004, sc(F′(p**)) = 4.5×10⁶
[Figure: fitted concentration trajectories of A, B, C, D over t ∈ [0, 30] for both cases.]
Parameter Identification Susanna Roblitz 58
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX, others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin
Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Orthogonalization methods
Let Q ∈ R^(m×m) be an orthogonal matrix, i.e. Q^T Q = I. Then
‖z − Ap‖_2² = (z − Ap)^T Q^T Q (z − Ap) = ‖Q(z − Ap)‖_2²
The Euclidean norm ‖·‖_2 is invariant under orthogonal transformations:
‖z − Ap‖_2 = min ⇔ ‖Q(z − Ap)‖_2 = min
Choice of the orthogonal transformation Q ∈ R^(m×m)?
Parameter Identification Susanna Roblitz 19
QR factorization
QR factorization:
A = Q (R_1; 0),  Q = (Q_1 Q_2) ∈ R^(m×m) orthogonal,  R_1 ∈ R^(q×q) upper triangular
Solution of the least-squares problem (MATLAB: [Q,R] = qr(A)):
‖z − Ap‖_2² = ‖Q^T(z − Ap)‖_2² = ‖(z_1 − R_1 p; z_2)‖_2² = ‖z_1 − R_1 p‖_2² + ‖z_2‖_2² ≥ ‖z_2‖_2²
where (z_1; z_2) = Q^T z with z_1 ∈ R^q.
rank(A) = q ⇒ rank(R_1) = rank(A) = q ⇒ R_1 is invertible
‖z − Ap‖_2 = min ⇔ p = R_1^(−1) z_1, with residual ‖r‖_2 = ‖z_2‖_2
The QR factorization is stable since κ_2(A) = κ_2(R_1).
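The recipe above can be sketched in NumPy (Python standing in for MATLAB here; the 3×2 system is a made-up illustration):

```python
import numpy as np

def lsq_via_qr(A, z):
    """Solve min ||z - A p||_2 for full-rank A via the QR factorization."""
    Q1, R1 = np.linalg.qr(A)          # reduced factorization: A = Q1 R1
    z1 = Q1.T @ z                     # first q components of Q^T z
    return np.linalg.solve(R1, z1)    # p = R1^{-1} z1 (back substitution)

# overdetermined 3x2 example: fit a line a + b*t through three points
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
z = np.array([1.0, 2.0, 2.0])
p = lsq_via_qr(A, z)
# the residual r = z - A p is orthogonal to the columns of A
```

The orthogonality of the residual to range(A) is exactly the normal-equations condition A^T(z − Ap) = 0.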
Householder QR
Columnwise application of Householder matrices to A ∈ R^(m×q):
Q^T A = (H_q H_(q−1) ··· H_2 H_1) A = R
For k = 1, …, q we obtain H_k = diag(I_(k−1), H′_k) ∈ R^(m×m).
[Diagram: A^(k) = H_k A^(k−1) — step k annihilates the entries below the diagonal in column k; only the entries in rows k, …, m and columns k, …, q are modified.]
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)
column permutation with the aim
|r_11| ≥ |r_22| ≥ … ≥ |r_qq|
Step k:
A^(k−1) = (R  S; 0  T^(k)).  Denote column j of T^(k) by T_j^(k).
Pivot strategy: push the column of T^(k) with maximal 2-norm to the front, so that after the exchange
|α_k| = |r_kk| = ‖T_1^(k)‖_2 = max_j ‖T_j^(k)‖_2
Finally AΠ = QR with permutation matrix Π
Rank decision
rank(A) = n < q:
Q^T A Π = (R  S; 0  0)  (exact arithmetic)   vs.   Q^T A Π = (R  S; 0  T^(n+1))  (finite precision), T^(n+1) “small”
Note: the exact rank n cannot be computed, since by rounding errors ‖T^(n+1)‖ = |r_(n+1,n+1)| ≠ 0.
Idea: stop the factorization as soon as
|r_(k+1,k+1)| < δ |r_11|,  δ: relative precision of A
Definition (numerical rank n):
|r_(n+1,n+1)| < δ |r_11| ≤ |r_nn|
choice of δ ⇒ decision for n
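The pivoting-plus-rank-decision idea can be sketched as follows. This is only a sketch using modified Gram–Schmidt for brevity (production codes such as LAPACK use Householder reflections); the test matrix is made up:

```python
import numpy as np

def numerical_rank(A, delta=1e-10):
    """Numerical rank via QR with column pivoting: stop as soon as
    |r_{k+1,k+1}| < delta * |r_11| (Businger/Golub-type strategy)."""
    A = np.array(A, dtype=float)
    m, q = A.shape
    R = np.zeros((q, q))
    for k in range(q):
        # pivot: move the remaining column with maximal 2-norm to position k
        j = k + int(np.argmax(np.linalg.norm(A[:, k:], axis=0)))
        A[:, [k, j]] = A[:, [j, k]]
        R[:, [k, j]] = R[:, [j, k]]    # keep already-computed rows consistent
        rkk = np.linalg.norm(A[:, k])  # this becomes |r_kk|
        if rkk == 0.0 or (k > 0 and rkk < delta * R[0, 0]):
            return k                   # T^(k+1) is "small": numerical rank k
        qk = A[:, k] / rkk             # next orthonormal direction
        R[k, k] = rkk
        R[k, k+1:] = qk @ A[:, k+1:]
        A[:, k+1:] -= np.outer(qk, R[k, k+1:])
    return q

# third column is the sum of the first two -> numerical rank 2
B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0, 1.0]])
```

By the pivoting, |r_11| is the largest column norm, so the relative test |r_(k+1,k+1)| < δ|r_11| matches the definition above.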
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)
Definition: subcondition sc(A) := |r_11| / |r_qq|
Properties:
I sc(A) ≥ 1
I sc(αA) = sc(A)
I A ≠ 0 singular ⇔ sc(A) = ∞
I sc(A) ≤ κ_2(A) (hence the name)
A is called almost singular if
δ · sc(A) ≥ 1, or equivalently, δ |r_11| ≥ |r_qq|
Scaling
D_row, D_col: diagonal matrices
We consider the scaled least-squares problem
‖D_row^(−1)(A D_col D_col^(−1) p − z)‖_2 = ‖Ā p̄ − z̄‖_2 = min,
where Ā = D_row^(−1) A D_col, p̄ = D_col^(−1) p, z̄ = D_row^(−1) z.
I row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε |z_j|
I column (parameter) scaling: D_col = diag(p_(1,scal), …, p_(q,scal)) (to account for different physical units)
In general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Generalized Inverse
(1) A ∈ R^(q×q), rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A^(−1) z, where A^(−1) A = A A^(−1) = I_q.
(2) A ∈ R^(m×q), rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:
‖Ap − z‖_2 = min
(2a) n = q: ‖Ap − z‖_2 = min ⇔ A^T A p = A^T z (normal equations),
uniquely solvable since A^T A is nonsingular:
p = (A^T A)^(−1) A^T z =: A^+ z
(2b) n < q: write p = A^+ z, A^+ generalized (pseudo-) inverse
Penrose axioms
Let A ∈ R^(m×q), rank(A) = n ≤ q ≤ m. Then A^+ is uniquely defined by
(i) (A^+ A)^T = A^+ A
(ii) (A A^+)^T = A A^+
(iii) A^+ A A^+ = A^+
(iv) A A^+ A = A
One can show: p* = A^+ z is the shortest solution, i.e.
‖p*‖_2 ≤ ‖p‖_2  ∀ p ∈ L(z)
with
L(z) = { p ∈ R^q : ‖z − Ap‖_2 = min } = p* + N(A)
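The four axioms are easy to check numerically; `numpy.linalg.pinv` computes A^+ (via the SVD). The rank-deficient matrix below is a made-up example whose null space is span{(1, 1, −1)}:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # 4x3 with column 3 = column 1 + column 2
Ap = np.linalg.pinv(A)           # Moore-Penrose pseudoinverse A^+

checks = [
    np.allclose((Ap @ A).T, Ap @ A),   # (i)
    np.allclose((A @ Ap).T, A @ Ap),   # (ii)
    np.allclose(Ap @ A @ Ap, Ap),      # (iii)
    np.allclose(A @ Ap @ A, A),        # (iv)
]

# shortest solution: p* = A^+ z is orthogonal to N(A)
z = np.array([1.0, 2.0, 1.0, 1.0])
p_star = Ap @ z
```

Orthogonality to N(A) is what makes p* the shortest element of L(z) = p* + N(A): adding any null-space component can only increase the norm.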
Computation of shortest solution
Q^T A Π = (R  S; 0  0),  A ∈ R^(m×q), rank(A) = n,
R ∈ R^(n×n) upper triangular, S ∈ R^(n×(q−n))
Π^T p = (p_1; p_2),  p_1 ∈ R^n, p_2 ∈ R^(q−n)
Q^T z = (z_1; z_2),  z_1 ∈ R^n, z_2 ∈ R^(m−n)
⇒ ‖z − Ap‖_2² = ‖Q^T z − Q^T A p‖_2² = ‖z_1 − R p_1 − S p_2‖_2² + ‖z_2‖_2²
‖z − Ap‖_2² = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^(−1)(z_1 − S p_2)
Computation of shortest solution
V := R^(−1) S,  u := R^(−1) z_1
‖p‖² = ‖Π^T p‖² = ‖p_1‖² + ‖p_2‖² = ‖u − V p_2‖² + ‖p_2‖²
  = ‖u‖² − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
  = ‖u‖² + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩
  =: φ(p_2)
φ(p_2) = min if φ′(p_2) = 2(I + V^T V) p_2 − 2 V^T u = 0
⇔ (I + V^T V) p_2 = V^T u (solve by Cholesky decomposition)
φ″(p_2) = 2(I + V^T V) s.p.d. ⇒ p_2 is a minimum
Note: if n < q/2, a faster algorithm exists.
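A sketch of this computation, assuming for simplicity that the permutation Π is the identity (i.e. the first n columns of A already span its column space); the matrix and right-hand side are made up:

```python
import numpy as np

def shortest_solution(A, z, n):
    """Minimal-norm least-squares solution for rank(A) = n < q,
    assuming the first n columns of A span its column space (Pi = I)."""
    m, q = A.shape
    Q1, R = np.linalg.qr(A[:, :n])    # reduced QR of the first n columns
    z1 = Q1.T @ z
    S = Q1.T @ A[:, n:]               # trailing columns in the same basis
    V = np.linalg.solve(R, S)
    u = np.linalg.solve(R, z1)
    # (I + V^T V) p2 = V^T u, solved via Cholesky (the matrix is s.p.d.)
    L = np.linalg.cholesky(np.eye(q - n) + V.T @ V)
    p2 = np.linalg.solve(L.T, np.linalg.solve(L, V.T @ u))
    p1 = np.linalg.solve(R, z1 - S @ p2)   # p1 = u - V p2
    return np.concatenate([p1, p2])

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])      # rank 2: column 3 = column 1 + column 2
z = np.array([1.0, 0.0, 2.0, 1.0])
p = shortest_solution(A, z, n=2)
```

Since the minimal-norm minimizer is unique, the result must agree with the pseudoinverse solution A^+ z.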
Problem formulation
Data:
(t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M
Parameters: p = (p_1, …, p_q)^T
Model:
y(t, p): R × R^q → R^d, nonlinear w.r.t. p
Measurement tolerances:
δz_i ∈ R^d, D_i = diag(δz_i)
Define
F(p) = (D_1^(−1)(z_1 − y(t_1, p)); …; D_M^(−1)(z_M − y(t_M, p))) ∈ R^m,  m ≤ M · d
Problem formulation
Nonlinear least-squares problem
Σ_(i=1)^M ‖D_i^(−1)(z_i − y(t_i, p))‖_2² = F(p)^T F(p) = ‖F(p)‖_2² = min
Nonlinear least-squares problem:
Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that
‖F(p*)‖_2² = min_(p∈D) ‖F(p)‖_2²
The nonlinear least-squares problem is called compatible if
F(p*) = 0
Supplement: Multivariable calculus
Consider p = (p_1, …, p_q)^T ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):
g′(p) = ∇g(p) = (∂g/∂p_1, …, ∂g/∂p_q)^T
Consider the vector-valued function F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix:
F′(p) = ( ∂F_1/∂p_1 ··· ∂F_1/∂p_q ; ⋮ ; ∂F_m/∂p_1 ··· ∂F_m/∂p_q ) ∈ R^(m×q)
Problem formulation
Minimization problem
g(p) = ‖F(p)‖_2² = min
Sufficient condition for p* to be a local minimum:
g′(p*) = 2 F′(p*)^T F(p*) = 0 and g″(p*) positive definite
Find, by Newton iteration, a zero of the function G: R^q → R^q,
G(p) = F′(p)^T F(p)
(system of q nonlinear equations)
Newton iteration
Idea: approximate G by a “simpler” function, namely its Taylor expansion around a starting guess p_0:
G(p) = G(p_0) + G′(p_0)(p − p_0) + O(‖p − p_0‖²) for p → p_0
  ≈ G(p_0) + G′(p_0)(p − p_0) =: Ḡ(p)
Zero p_1 of the linear substitute map Ḡ:
G′(p_0) Δp_0 = −G(p_0),  p_1 = p_0 + Δp_0
Newton iteration:
G′(p_k) Δp_k = −G(p_k),  p_(k+1) = p_k + Δp_k
system of nonlinear equations ⇒ sequence of linear systems
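A minimal sketch of the plain (undamped) Newton iteration for a small made-up system of q = 2 nonlinear equations, the intersection of a circle and a line:

```python
import numpy as np

def newton(G, Gprime, p0, tol=1e-12, kmax=50):
    """Basic undamped Newton iteration for G(p) = 0."""
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        dp = np.linalg.solve(Gprime(p), -G(p))   # one linear system per step
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# example: p1^2 + p2^2 = 4 and p1 - p2 = 0
G = lambda p: np.array([p[0]**2 + p[1]**2 - 4.0, p[0] - p[1]])
Gp = lambda p: np.array([[2.0 * p[0], 2.0 * p[1]], [1.0, -1.0]])
p = newton(G, Gp, [1.0, 0.5])
# converges to (sqrt(2), sqrt(2))
```

Each step solves one linear system with the Jacobian G′(p_k) — exactly the "sequence of linear systems" noted above.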
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)
Newton iteration in terms of F(p):
(F′(p_k)^T F′(p_k) + F″(p_k)^T F(p_k)) Δp_k = −F′(p_k)^T F(p_k)
I compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)
I almost compatible case: F(p*) is “small” ⇒ save the effort of evaluating F″(p)
⇒ Gauss-Newton iteration:
F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k),  p_(k+1) = p_k + Δp_k
Gauss-Newton iteration
F′(p_k)^T F′(p_k) Δp_k = −F′(p_k)^T F(p_k),  p_(k+1) = p_k + Δp_k
corresponds to the normal equations of the linear least-squares problem
‖F′(p_k) Δp_k + F(p_k)‖_2 = min
formal solution:
Δp_k = −F′(p_k)^+ F(p_k)
in case of convergence: F′(p*)^+ F(p*) = 0
if rank(F′(p)) = q ⇒ F′(p)^+ = (F′(p)^T F′(p))^(−1) F′(p)^T
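The iteration can be sketched on a small compatible problem — fitting the model y(t, p) = p_1·exp(p_2·t) to synthetic noise-free data (all numbers below are made up for illustration):

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
z = 2.0 * np.exp(-0.5 * t)            # synthetic data: p* = (2, -0.5)

def F(p):                             # residual vector (weights D_i = I here)
    return p[0] * np.exp(p[1] * t) - z

def Fprime(p):                        # Jacobian F'(p), shape (4, 2)
    e = np.exp(p[1] * t)
    return np.column_stack([e, p[0] * t * e])

p = np.array([1.5, -0.8])
for _ in range(50):
    dp = -np.linalg.lstsq(Fprime(p), F(p), rcond=None)[0]   # -F'(p)^+ F(p)
    p = p + dp
    if np.linalg.norm(dp) < 1e-12:
        break
# compatible problem: F(p*) = 0, locally quadratic convergence
```

`lstsq` solves the linearized problem min ‖F′(p_k)Δp + F(p_k)‖_2, i.e. it applies the pseudoinverse without forming the (worse-conditioned) normal equations explicitly.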
Classification
I local GN methods require a sufficiently good starting guess p_0
I global GN methods can handle bad guesses as well
I m = q: guaranteed local convergence
I m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p_0:
I F: D → R^m continuously differentiable, D ⊆ R^q open and convex
I F′(p): possibly rank-deficient Jacobian
I F′ satisfies a particular Lipschitz condition (i.e. F′ behaves “nicely”) with Lipschitz constant 0 ≤ ω < ∞:
‖F′(p̃)^+ (F′(p̄) − F′(p))(p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2²   (i)
for all collinear p, p̄, p̃ ∈ D (i.e. they lie on a single straight line) with p̄ − p ∈ R(F′(p̃)^+)
Assumptions
I compatibility condition (model and data must somehow fit each other):
‖F′(p̄)^+ (I − F′(p) F′(p)^+) F(p)‖ ≤ κ(p) ‖p̄ − p‖  ∀ p, p̄ ∈ D,
κ(p) ≤ κ < 1  ∀ p ∈ D   (ii)
I the initial update Δp_0 must be small enough, and a ball B_ρ(p_0) with radius ρ around p_0 must be contained in D:
‖Δp_0‖ ≤ α and B_ρ(p_0) ⊂ D   (iii)
h := αω < 2(1 − κ),  ρ := α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p_0), and converges to some p* ∈ B_ρ(p_0) with
F′(p*)^+ F(p*) = 0
The convergence rate can be estimated according to
‖p_(k+1) − p_k‖_2 ≤ (ω/2) ‖p_k − p_(k−1)‖_2² + κ(p_(k−1)) ‖p_k − p_(k−1)‖_2   (∗)
This result shows existence of a solution p* in the sense that F′(p*)^+ F(p*) = 0 even in the case of a rank-deficient Jacobian (the rank may vary throughout D as long as ω < ∞ and κ < 1).
Uniqueness, however, requires a full-rank Jacobian.
Compatibility
I linear least-squares problems: F(p) = Ap − z
F′(p) = (Ap − z)′ = A = F′(p̄)  ⇒ (i): ω = 0
F′(p̄)^+ (I − F′(p) F′(p)^+) = F′(p)^+ (I − F′(p) F′(p)^+) = 0  ⇒ (ii): κ(p) = κ = 0
(∗) ⇒ ‖p_2 − p_1‖ ≤ 0 ⇒ Δp_1 = 0 ⇒ p_1 = p*:
convergence within one iteration
I systems of nonlinear equations (m = q):
if F′(p) is nonsingular ⇒ F′(p)^+ = F′(p)^(−1)
⇒ I − F′(p) F′(p)^+ = 0 ⇒ κ(p) = 0:
quadratic convergence
I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):
lim_(k→∞) ‖p_(k+1) − p_k‖ / ‖p_k − p_(k−1)‖ ≤ lim_(k→∞) ((ω/2) ‖p_k − p_(k−1)‖ + κ(p_(k−1))) = κ(p*)   (by (∗))
κ(p*) is the asymptotic convergence rate.
Adequate nonlinear least-squares problems are characterized by the condition
κ(p*) < 1
Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria
Define
Δp̄_(k+1) := −F′(p_k)^+ F(p_(k+1)) (simplified Gauss-Newton corrections)
Stop the iteration as soon as linear convergence dominates the iteration process:
‖Δp̄_(k+1)‖ ≈ ‖Δp_(k+1)‖ ≤ XTOL
For incompatible problems in the convergent phase it holds
‖Δp_(k+1)‖ ≈ κ ‖Δp_k‖,  ‖Δp̄_(k+1)‖ ≈ κ² ‖Δp_k‖
This provides an estimate for κ(p*):
κ(p*) ≈ ‖Δp_(k+1)‖ / ‖Δp_k‖
Convergence monitor
[Figure: norms of ordinary and simplified GN corrections versus iteration index k, on a logarithmic scale from 10^(−15) to 10^0.]
Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, then one should rather improve the model (change equations or initial guess p_0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:
Let p* be the final iterate and rank(F′(p*)) = n < q.
QR factorization with column pivoting and rank decision:
Q^T F′(p*) Π = R
⇒ reordering of the parameters to (p*_(π1), p*_(π2), …, p*_(πn), …, p*_(πq)) in such a way that
|r_11| ≥ |r_22| ≥ ··· ≥ |r_nn|
→ the parameters (p*_(π1), p*_(π2), …, p*_(πn)) can actually be estimated from the current dataset
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function
p = φ(u),  F(p) = F(φ(u)) =: F̄(u),  F̄′(u) = F′(p) φ′(u)
constraint → transformation φ(u):
p > 0:  p = exp(u)
A ≤ p ≤ B:  p = A + ((B − A)/2)(1 + sin u)
p ≤ C:  p = C + (1 − √(1 + u²))
p ≥ C:  p = C − (1 − √(1 + u²))
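A small sketch of how such transformations are used in practice (the bounds A = 0.5, B = 1.5 are made-up numbers):

```python
import numpy as np

# p > 0 via p = exp(u): iterate in the unconstrained u, map back, and by the
# chain rule use F'(p) * diag(exp(u)) as the Jacobian w.r.t. u.
def phi_pos(u):
    return np.exp(u)

# A <= p <= B via the sine transformation from the table
def phi_box(u, A, B):
    return A + (B - A) / 2.0 * (1.0 + np.sin(u))

u = np.linspace(-10.0, 10.0, 201)
p_pos = phi_pos(u)
p_box = phi_box(u, A=0.5, B=1.5)
# every unconstrained u lands inside the feasible region
```

This turns a constrained estimation problem into an unconstrained one, at the price of a modified (and possibly worse-conditioned) Jacobian F′(p)φ′(u).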
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p_k:
‖F(p_k) + F′(p_k) Δp_k‖_2² + ρ ‖Δp_k‖_2² = min
⇒ (F′(p_k)^T F′(p_k) + ρ I_q) Δp_k = −F′(p_k)^T F(p_k)
I ρ → 0: Levenberg-Marquardt → Gauss-Newton
I ρ > 0: (F′(p_k)^T F′(p_k) + ρ I_q) is always regular → masking of the uniqueness question for a given solution, incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
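The effect of the regularization parameter ρ on a single correction can be sketched as follows (the matrix J and residual F0 are arbitrary made-up values):

```python
import numpy as np

def lm_step(F0, J, rho):
    """One Levenberg-Marquardt correction:
    (J^T J + rho*I) dp = -J^T F.  rho -> 0 recovers the Gauss-Newton step."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ F0)

# full-rank 3x2 example: rho = 0 equals the pseudoinverse step
J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
F0 = np.array([1.0, 1.0, 1.0])
dp_gn = lm_step(F0, J, 0.0)
dp_lm = lm_step(F0, J, 10.0)
# the regularized step is shorter: rho damps the correction
```

The shrinking step length for ρ > 0 is exactly the masking effect warned about above: the regularized system is always solvable, even where the underlying Gauss-Newton problem is rank-deficient.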
Scaling
I row (measurement) scaling:
F̄(p) = D_row^(−1) F(p),  D_row = diag(δz_1, …, δz_M)
‖F̄′(p_k) Δp_k + F̄(p_k)‖ = min ⇔ ‖D_row^(−1)(F′(p_k) Δp_k + F(p_k))‖ = min
Different scaling leads to a different solution!
I column (parameter) scaling:
p̄ = D_col^(−1) p,  H(p̄) := F(D_col p̄) = F(p),  H′(p̄) = F′(p) D_col
assume p̄_k = D_col^(−1) p_k:
Δp̄_k = −H′(p̄_k)^+ H(p̄_k) = −(F′(p_k) D_col)^+ F(p_k)
  = −D_col^(−1) F′(p_k)^+ F(p_k) = D_col^(−1) Δp_k   (if F′(p_k) has full rank)
No invariance in the rank-deficient case!
Globalization
Global (or damped) Gauss-Newton method:
F′(p_k) Δp_k = −F(p_k),  p_(k+1) = p_k + λ_k Δp_k,  0 < λ_k ≤ 1
How to choose λ_k?
Remember: we want to solve F′(p)^+ F(p) = 0.
Idea: choose λ_k in such a way that ‖F′(p_k)^+ F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).
The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test
‖Δp̄_(k+1)(λ_k)‖ / ‖Δp_k‖ < 1
divergence criterion: λ_k < λ_min ≪ 1
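A crude sketch of such a damped iteration: the inner loop below simply halves λ until a monotonicity test on the simplified corrections passes (the production codes use a more refined predictor-corrector strategy; the test problem is the made-up exponential fit):

```python
import numpy as np

def damped_gauss_newton(F, Fprime, p0, lam_min=1e-8, kmax=100, tol=1e-10):
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        J = Fprime(p)
        dp = -np.linalg.lstsq(J, F(p), rcond=None)[0]   # ordinary correction
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            # simplified correction at the trial point (same Jacobian J)
            dp_bar = -np.linalg.lstsq(J, F(p + lam * dp), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam *= 0.5                                   # reduce the damping factor
        else:
            raise RuntimeError("divergence: lambda fell below lambda_min")
        p = p + lam * dp
    return p

t = np.array([0.0, 1.0, 2.0, 3.0])
z = 2.0 * np.exp(-0.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
Fp = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p = damped_gauss_newton(F, Fp, [1.0, -1.5])
```

From the poor starting guess (1.0, −1.5) the full step overshoots badly; the damping loop rejects it and recovers, which is the whole point of globalization.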
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Problem formulation
d species, q (unknown) parameters
model: y′ = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q
set of experimental data
(t_1, z_1, δz_1), …, (t_M, z_M, δz_M),  t_j ∈ R, z_j ∈ R^d
with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)
F(p) = (D_1^(−1)(y(t_1, p) − z_1); …; D_M^(−1)(y(t_M, p) − z_M))
F′(p) = (D_1^(−1) y_p(t_1, p); …; D_M^(−1) y_p(t_M, p))
Gauss-Newton method:
Δp_k = −F′(p_k)^+ F(p_k),  p_(k+1) = p_k + λ_k Δp_k
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t_1, …, t_M
Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M.
Solution: use an integrator with dense output.
Idea: construct an interpolation polynomial based on the available solution such that accuracy is not destroyed.
Example: LIMEX has a dense output based on Hermite interpolation.
Sensitivity analysis
I sensitivity matrix at time t
S(t) = ∂y/∂p (t) = y_p(t) ∈ R^(d×q)
F′(p) = (D_1^(−1) S(t_1); …; D_M^(−1) S(t_M))
I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal
p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation
y′_p = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0
Compact notation:
S′ = f_y · S + f_p,  S = [s_1, …, s_q], s_k ∈ R^d
In practice, solve the combined system
(y′; s′_1; …; s′_q) = (f; f_y · s_1 + f_(p_1); …; f_y · s_q + f_(p_q)) ∈ R^((q+1)d)
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
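For a scalar toy model y′ = −p·y the combined system can be integrated directly, and the computed sensitivity can be checked against the analytic value ∂y/∂p = −t·y_0·exp(−pt). A fixed-step RK4 integrator stands in for LIMEX here:

```python
import numpy as np

# state w = (y, s): y' = -p*y, and the variational equation
# s' = f_y*s + f_p = -p*s - y with s(0) = 0
def rhs(w, p):
    y, s = w
    return np.array([-p * y, -p * s - y])

def rk4(w0, p, T, N):
    """Classical fixed-step Runge-Kutta 4 over [0, T] with N steps."""
    h = T / N
    w = np.array(w0, dtype=float)
    for _ in range(N):
        k1 = rhs(w, p)
        k2 = rhs(w + 0.5 * h * k1, p)
        k3 = rhs(w + 0.5 * h * k2, p)
        k4 = rhs(w + h * k3, p)
        w = w + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return w

y0, p, T = 2.0, 0.5, 3.0
y_T, s_T = rk4([y0, 0.0], p, T, 300)
# s_T approximates dy/dp at t = T, analytically -T*y0*exp(-p*T)
```

Note that the sensitivity equation reuses the Jacobian f_y of the model right-hand side — this shared structure is what LIMEX exploits when solving the combined (q+1)·d system efficiently.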
Example
k | ‖F(p_k)‖ | ‖Δp_k‖ | λ_k | rank
0 | 4.81e-02 | 3.55e-01 |       | 2
1 | 1.11e-01 | 4.57e-01 | 1.000 | 2
  | 2.55e-02 | 1.75e-01 | 0.389 | 2
2 | 1.04e-02 | 4.22e-02 | 1.000 | 2
3 | 9.16e-04 | 1.90e-03 | 1.000 | 2
4 | 8.00e-06 | 1.99e-05 | 1.000 | 2
5 | 3.21e-07 | 1.54e-09 | 1.000 | 2
p* = (k*_1, k*_2) = (0.10000001, 0.19999997) ⇒ k*_2/k*_1 = 1.9999994
κ = 0.008, sc(F′(p*)) = 6.3
k | ‖F(p_k)‖ | ‖Δp_k‖ | λ_k | rank
0 | 3.85e-02 | 2.96e-02 |       | 1
1 | 2.40e-03 | 1.85e-03 | 1.000 | 1
2 | 1.11e-05 | 7.67e-06 | 1.000 | 1
3 | 6.67e-06 | 6.75e-11 | 1.000 | 1
p** = (k**_1, k**_2) = (0.24155702, 0.48312906) ⇒ k**_2/k**_1 = 2.0000622
κ = 0.004, sc(F′(p**)) = 4.5 × 10^6
[Figures: fitted concentrations of A, B, C, D over time 0–30 for the two datasets.]
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
QR factorization
QR factorization =A
R
0
Q Q1 2
1
A = Q
(R1
0
) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular
Solution of least-squares problem MATLAB [QR]=qr(A)
z minus Ap22 = QT (z minus Ap)2
2 =
∥∥∥∥(z1 minus R1pz2
)∥∥∥∥2
2
= z1 minus R1p22 + z22 ge z22
2
rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible
z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22
The QR-factorization is stable since κ2(A) = κ2(R)
Parameter Identification Susanna Roblitz 20
Householder QR
Columnwise application of Householder matrices to A isin Rmtimesq
QTA = (HqHqminus1 middot middot middotH2H1)A = R
For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm
A(k) = Hk
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
lowast lowast middot middot middot lowast
=
lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast
lowast middot middot middot middot middot middot middot middot middot lowast
lowast lowast middot middot middot lowast
0
0 lowast middot middot middot lowast
Only the entries lowast and lowast are modified
Parameter Identification Susanna Roblitz 21
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
- local GN methods: require a sufficiently good starting guess p⁰
- global GN methods: can handle bad starting guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear least-squares problems, the so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p⁰:

- F: D → R^m continuously differentiable, D ⊆ R^q open and convex
- F'(p): possibly rank-deficient Jacobian
- F' satisfies a particular Lipschitz condition (i.e., F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

  ‖F'(p̃)⁺ ( F'(p) − F'(p̄) )(p − p̄)‖₂ ≤ ω ‖p − p̄‖₂²   (i)

  for all collinear p, p̄, p̃ ∈ D (i.e., they lie on a single straight line) with p − p̄ ∈ R(F'(p̃)⁺)
Parameter Identification Susanna Roblitz 39
Assumptions
- compatibility condition (model and data must somehow fit each other):

  ‖F'(p̄)⁺ ( I − F'(p) F'(p)⁺ ) F(p)‖ ≤ κ(p) ‖p̄ − p‖   ∀ p, p̄ ∈ D,
  κ(p) ≤ κ < 1   ∀ p ∈ D   (ii)

- the initial update Δp⁰ must be small enough, and a ball B_ρ(p⁰) with radius ρ around p⁰ must be contained in D:

  ‖Δp⁰‖ ≤ α   and   B_ρ(p⁰) ⊂ D   (iii)

  h := α ω < 2 (1 − κ),   ρ := α / (1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p⁰), and converges to some p* ∈ B_ρ(p⁰) with

  F'(p*)⁺ F(p*) = 0.

The convergence rate can be estimated according to

  ‖p^{k+1} − p^k‖₂ ≤ (ω/2) ‖p^k − p^{k−1}‖₂² + κ(p^{k−1}) ‖p^k − p^{k−1}‖₂   (∗)

This result shows existence of a solution p* in the sense that F'(p*)⁺ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
- linear least-squares problems: F(p) = Ap − z

  (i): F'(p) = (Ap − z)' = A = (Ap̄ − z)' = F'(p̄) ⇒ ω = 0
  (ii): F'(p̄)⁺ ( I − F'(p) F'(p)⁺ ) = A⁺ ( I − A A⁺ ) = 0 ⇒ κ(p) = κ = 0
  (∗) ⇒ ‖p² − p¹‖ ≤ 0 ⇒ Δp¹ = 0 ⇒ p¹ = p*:
  convergence within one iteration

- systems of nonlinear equations (m = q):

  if F'(p) nonsingular ⇒ F'(p)⁺ = F'(p)^{−1}
  ⇒ I − F'(p) F'(p)⁺ = 0, (∗) ⇒ κ(p) = 0:
  quadratic convergence

- compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):

  lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖  ≤(∗)  lim_{k→∞} ( (ω/2) ‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

  κ(p*) < 1.

Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / Termination criteria

Define

  Δp̄^{k+1} = −F'(p^k)⁺ F(p^{k+1})   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

  ‖Δp^{k+1}‖ ≈ ‖Δp̄^{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

  ‖Δp̄^{k+1}‖ ≈ κ ‖Δp^k‖,   ‖Δp^{k+1}‖ ≈ κ² ‖Δp^k‖

This provides an estimate for κ(p*):

  κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: semilogarithmic plot of the norms of the ordinary and simplified GN corrections over k = 0, …, 12, decaying from 10⁰ down to about 10⁻¹⁵]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p⁰) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision:

  Q^T F'(p*) Π = R

⇒ reordering of the parameters to (p*_{π(1)}, p*_{π(2)}, …, p*_{π(n)}, …, p*_{π(q)}) in such a way that

  |r_11| ≥ |r_22| ≥ ··· ≥ |r_nn|

→ the parameters (p*_{π(1)}, p*_{π(2)}, …, p*_{π(n)}) can actually be estimated from the current dataset.
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

  p = φ(u),   F(p) = F(φ(u)) =: F̄(u),   F̄'(u) = F'(p) φ'(u)

constraint       transformation φ(u)
p > 0            p = exp(u)
A ≤ p ≤ B        p = A + (B − A)/2 · (1 + sin u)
p ≤ C            p = C + (1 − √(1 + u²))
p ≥ C            p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
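The transformations in the table can be checked numerically. A small sketch; the bound values A = 2, B = 5, C = 1 are arbitrary examples:

```python
import numpy as np

# Transformations p = φ(u) mapping unconstrained u to constrained p
phi_pos   = lambda u: np.exp(u)                                    # p > 0
phi_box   = lambda u, A, B: A + (B - A) / 2.0 * (1.0 + np.sin(u))  # A ≤ p ≤ B
phi_upper = lambda u, C: C + (1.0 - np.sqrt(1.0 + u**2))           # p ≤ C
phi_lower = lambda u, C: C - (1.0 - np.sqrt(1.0 + u**2))           # p ≥ C

u = np.linspace(-10.0, 10.0, 1001)   # sweep of unconstrained values
```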
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

  ‖F(p^k) + F'(p^k) Δp^k‖₂² + ρ ‖Δp^k‖₂² = min

  ( F'(p^k)^T F'(p^k) + ρ I_q ) Δp^k = −F'(p^k)^T F(p^k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: ( F'(p^k)^T F'(p^k) + ρ I_q ) is always regular → masking of the uniqueness of a given solution; risk of incorrect interpretation of numerical results: solutions are often not minima of g(p), but merely saddle points with g'(p) = 0
Parameter Identification Susanna Roblitz 49
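One Levenberg-Marquardt correction is a single regularized linear solve. A minimal sketch with hypothetical Jk and Fk; for ρ = 0 and full-rank Jk it reproduces the Gauss-Newton step, and for ρ > 0 the step is shorter:

```python
import numpy as np

def lm_step(Jk, Fk, rho):
    """One LM correction: (F'(p^k)^T F'(p^k) + ρ I_q) Δp^k = −F'(p^k)^T F(p^k)."""
    q = Jk.shape[1]
    return np.linalg.solve(Jk.T @ Jk + rho * np.eye(q), -Jk.T @ Fk)

# Hypothetical Jacobian and residual at some iterate p^k
Jk = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
Fk = np.array([1.0, -1.0, 0.5])
dp_gn = lm_step(Jk, Fk, 0.0)   # ρ = 0: coincides with the Gauss-Newton step
dp_lm = lm_step(Jk, Fk, 1.0)   # ρ > 0: regularized, shorter step
```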
Scaling
- row (measurement) scaling:

  F̄(p) = D_row^{−1} F(p),   D_row = diag(δz_1, …, δz_M)

  ‖F̄'(p^k) Δp^k + F̄(p^k)‖ = min ⇔ ‖D_row^{−1} ( F'(p^k) Δp^k + F(p^k) )‖ = min

  Different scaling leads to a different solution!

- column (parameter) scaling:

  p = D_col p̄,   H(p̄) := F(D_col p̄) = F(p),   H'(p̄) = F'(p) D_col

  assume p^k = D_col p̄^k:

  Δp̄^k = −H'(p̄^k)⁺ H(p̄^k) = −( F'(p^k) D_col )⁺ F(p^k)
        = [F'(p^k) full rank] = −D_col^{−1} F'(p^k)⁺ F(p^k) = D_col^{−1} Δp^k

  No invariance in the rank-deficient case!
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method:

  F'(p^k) Δp^k = −F(p^k),   p^{k+1} = p^k + λ_k Δp^k,   0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F'(p)⁺ F(p) = 0.

Idea: choose λ_k in such a way that ‖F'(p^k)⁺ F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

  ‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
Parameter Identification Susanna Roblitz 51
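A minimal sketch of the damped iteration with the natural monotonicity test; a simple step-halving correction is assumed in place of the predictor-corrector strategy of the slides, and the exponential fitting problem is a hypothetical example:

```python
import numpy as np

def damped_gauss_newton(F, J, p0, tol=1e-10, lam_min=1e-4, max_iter=100):
    """Damped GN: accept λ when ‖Δp̄^{k+1}(λ)‖ < ‖Δp^k‖ (natural monotonicity test)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jk = J(p)
        dp = np.linalg.lstsq(Jk, -F(p), rcond=None)[0]      # ordinary GN correction
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            p_trial = p + lam * dp
            # simplified correction at the trial point (same Jacobian)
            dp_bar = np.linalg.lstsq(Jk, -F(p_trial), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam *= 0.5                                       # crude correction step
        else:
            raise RuntimeError("divergence: λ_k < λ_min")
        p = p_trial
    return p

# Hypothetical model y(t, p) = p1 * exp(-p2 * t), data generated with p = (3, 1.5)
t = np.linspace(0.0, 2.0, 20)
z = 3.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(-p[1] * t) - z
J = lambda p: np.column_stack([np.exp(-p[1] * t),
                               -p[0] * t * np.exp(-p[1] * t)])
p_star = damped_gauss_newton(F, J, [1.0, 1.0])
```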
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

model: y' = f(y, p), y(0) = y₀, with y ∈ R^d, p ∈ R^q

set of experimental data:
  (t_1, z_1, δz_1), …, (t_M, z_M, δz_M),   t_j ∈ R, z_j ∈ R^d

with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i):

  F(p) = ( D_1^{−1} (y(t_1, p) − z_1) ; … ; D_M^{−1} (y(t_M, p) − z_M) )

  F'(p) = ( D_1^{−1} y_p(t_1, p) ; … ; D_M^{−1} y_p(t_M, p) )

Gauss-Newton method:

  Δp^k = −F'(p^k)⁺ F(p^k),   p^{k+1} = p^k + λ_k Δp^k
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires the solution of y' = f(y, p) at the time points t_1, …, t_M

Problem: the time points chosen by adaptive step-size control generally do not coincide with t_1, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
- sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F'(p) = ( D_1^{−1} S(t_1) ; … ; D_M^{−1} S(t_M) )

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation

  y_p' = f_y(y, p) · y_p + f_p(y, p),   y_p(0) = 0

Compact notation:

  S' = f_y · S + f_p,   S = [s_1, …, s_q],   s_k ∈ R^d

In practice, solve the combined system

  ( y' ; s_1' ; … ; s_q' ) = ( f ; f_y · s_1 + f_{p_1} ; … ; f_y · s_q + f_{p_q} ) ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
Parameter Identification Susanna Roblitz 57
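For the introductory reaction A + B ⇌ C + D (d = 4 species, q = 2 rate constants), the combined system can be integrated with an off-the-shelf solver; here scipy's solve_ivp serves as a stand-in for LIMEX:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reaction A + B ⇌ C + D: y' = f(y, p), f = (−r, −r, r, r), r = k1·A·B − k2·C·D
def f(y, p):
    A, B, C, D = y
    r = p[0]*A*B - p[1]*C*D
    return np.array([-r, -r, r, r])

def f_y(y, p):                       # ∂f/∂y ∈ R^{4×4}
    A, B, C, D = y
    dr = np.array([p[0]*B, p[0]*A, -p[1]*D, -p[1]*C])   # ∂r/∂(A,B,C,D)
    return np.outer([-1, -1, 1, 1], dr)

def f_p(y, p):                       # ∂f/∂p ∈ R^{4×2}
    A, B, C, D = y
    dr = np.array([A*B, -C*D])       # ∂r/∂(k1, k2)
    return np.outer([-1, -1, 1, 1], dr)

def combined(t, w, p, d=4, q=2):
    """Combined system: y' = f, s_k' = f_y·s_k + f_{p_k}, in R^{(q+1)d}."""
    y, S = w[:d], w[d:].reshape(d, q)
    return np.concatenate([f(y, p), (f_y(y, p) @ S + f_p(y, p)).ravel()])

p = np.array([0.2, 0.5])                                  # (k1, k2)
w0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])  # y(0), y_p(0) = 0
sol = solve_ivp(combined, (0.0, 30.0), w0, args=(p,), rtol=1e-8, atol=1e-10)
S_end = sol.y[4:, -1].reshape(4, 2)   # sensitivity matrix S(30) ∈ R^{4×2}
```

A sanity check: since A' = B' and A + C is conserved, the sensitivity rows of A and B coincide, and those of A and C sum to zero.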
Example
Case (T+E):

  k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
  0   4.81e-02   3.55e-01            2
  1   1.11e-01   4.57e-01   1.000    2
      2.55e-02   1.75e-01   0.389
  2   1.04e-02   4.22e-02   1.000    2
  3   9.16e-04   1.90e-03   1.000    2
  4   8.00e-06   1.99e-05   1.000    2
  5   2.321e-07  1.54e-09   1.000    2

  p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
  κ = 0.008,   sc(F'(p*)) = 6.3

Case (E):

  k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
  0   3.85e-02   2.96e-02            1
  1   2.40e-03   1.85e-03   1.000    1
  2   1.11e-05   7.67e-06   1.000    1
  3   6.67e-06   6.75e-11   1.000    1

  p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
  κ = 0.004,   sc(F'(p**)) = 4.5 × 10⁶
[Figure: fitted concentration curves for A, B, C, D over time 0–30, for case (T+E) (left) and case (E) (right)]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Householder QR
Columnwise application of Householder matrices to A ∈ R^{m×q}:

  Q^T A = ( H_q H_{q−1} ··· H_2 H_1 ) A = R

For k = 1, …, q we obtain H_k = diag( I_{k−1}, H'_k ) ∈ R^{m×m}:

  A^{(k)} = H_k A^{(k−1)}

In step k, H_k leaves the first k−1 rows and columns unchanged and annihilates the subdiagonal entries of column k; only the entries of the trailing (m−k+1) × (q−k+1) block are modified.
Parameter Identification Susanna Roblitz 21
QR with column pivoting
Businger/Golub, Numer. Math. 7 (1965)

column permutation with the aim

  |r_11| ≥ |r_22| ≥ … ≥ |r_qq|

step k:

  A^{(k−1)} = ( R  S ; 0  T^{(k)} )

Denote column j of T^{(k)} by T_j^{(k)}.

Pivot strategy: push the column of T^{(k)} with maximal 2-norm to the front, so that after the exchange

  |α_k| = |r_kk| = ‖T_1^{(k)}‖₂ = max_j ‖T_j^{(k)}‖₂ =: ‖T^{(k)}‖

Finally, AΠ = QR with permutation matrix Π.
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n < q:

  Q^T A Π  =(exact)  ( R  S ; 0  0 )  =(eps)  ( R  S ; 0  T^{(n+1)} ),   T^{(n+1)} "small"

Note: The exact rank n cannot be computed, since by rounding errors ‖T^{(n+1)}‖ = |r_{n+1,n+1}| ≠ 0.

Idea: Stop the iteration as soon as

  |r_{k+1,k+1}| < δ |r_11|,   δ: relative precision of A

Definition: numerical rank n:

  |r_{n+1,n+1}| < δ |r_11| ≤ |r_nn|

choice of δ ⇒ decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition: subcondition sc(A) = |r_11| / |r_qq|

Properties:
- sc(A) ≥ 1
- sc(αA) = sc(A)
- A ≠ 0 singular ⇔ sc(A) = ∞
- sc(A) ≤ κ₂(A) (hence the name)

A is called almost singular if

  δ · sc(A) ≥ 1, or equivalently, δ |r_11| ≥ |r_qq|
Parameter Identification Susanna Roblitz 24
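The pivoted factorization, rank decision, and subcondition of the last three slides can be sketched with an off-the-shelf pivoted QR; the rank-deficient matrix below (third column = sum of the first two) and the choice δ = 10⁻¹⁰ are hypothetical examples:

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical rank-deficient matrix: column 3 = column 1 + column 2
A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])

Q, R, piv = qr(A, pivoting=True)          # A Π = Q R with |r_11| ≥ |r_22| ≥ ...
diag = np.abs(np.diag(R))

delta = 1e-10                             # assumed relative precision of A
n = int(np.sum(diag > delta * diag[0]))   # numerical rank: |r_{n+1,n+1}| < δ|r_11| ≤ |r_nn|
sc = diag[0] / diag[n - 1]                # subcondition of the rank-n part
```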
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least-squares problem

  ‖D_row^{−1} ( A D_col D_col^{−1} p − z )‖₂ = ‖Ā p̄ − z̄‖₂ → min

where Ā = D_row^{−1} A D_col, p̄ = D_col^{−1} p, z̄ = D_row^{−1} z.

- row (measurement) scaling: D_row = diag(δz_1, …, δz_m), e.g. relative error concept δz_j = ε |z_j|
- column (parameter) scaling: D_col = diag(p_{1,scal}, …, p_{q,scal}) (to account for different physical units)

In general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A ∈ R^{q×q}, rank(A) = q (A is regular)
⇒ Ap = z has the unique solution p = A^{−1} z, where A^{−1} A = A A^{−1} = I_q

(2) A ∈ R^{m×q}, rank(A) = n ≤ q < m
⇒ usually Ap = z has no solution, but there is a minimizer:

  ‖Ap − z‖₂ = min

(2a) n = q: ‖Ap − z‖₂ = min ⇔ A^T A p = A^T z (normal equations),
uniquely solvable since A^T A is nonsingular:

  p = (A^T A)^{−1} A^T z =: A⁺ z

(2b) n < q: write p = A⁺ z, A⁺ generalized (pseudo-) inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A⁺ is uniquely defined by

(i)   (A⁺A)^T = A⁺A
(ii)  (AA⁺)^T = AA⁺
(iii) A⁺AA⁺ = A⁺
(iv)  AA⁺A = A

One can show: p* = A⁺z is the shortest solution, i.e.,

  ‖p*‖₂ ≤ ‖p‖₂   ∀ p ∈ L(z)

with

  L(z) = { p ∈ R^q : ‖z − Ap‖₂ = min } = p* + N(A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
  Q^T A Π = ( R  S ; 0  0 ),   A ∈ R^{m×q},

rank(A) = n, R ∈ R^{n×n} upper triangular, S ∈ R^{n×(q−n)}

  Π^T p = ( p_1 ; p_2 ),   p_1 ∈ R^n, p_2 ∈ R^{q−n}
  Q^T z = ( z_1 ; z_2 ),   z_1 ∈ R^n, z_2 ∈ R^{m−n}

⇒ ‖z − Ap‖₂² = ‖Q^T z − Q^T A Π Π^T p‖₂² = ‖z_1 − R p_1 − S p_2‖₂² + ‖z_2‖₂²

  ‖z − Ap‖₂² = min ⇔ z_1 − R p_1 − S p_2 = 0 ⇔ p_1 = R^{−1} ( z_1 − S p_2 )
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
  V = R^{−1} S,   u = R^{−1} z_1

  ‖p‖² = ‖Π^T p‖² = ‖p_1‖² + ‖p_2‖² = ‖u − V p_2‖² + ‖p_2‖²
       = ‖u‖² − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
       = ‖u‖² + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩ =: φ(p_2)

  φ(p_2) = min if φ'(p_2) = 2 (I + V^T V) p_2 − 2 V^T u = 0
  ⇔ (I + V^T V) p_2 = V^T u   (solution by Cholesky decomposition)

  φ''(p_2) = 2 (I + V^T V) s.p.d. ⇒ p_2 is a minimum

Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
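The two slides above combine into a short routine. A sketch assuming scipy's pivoted QR and the rank decision of slide 23; the test matrix is the same hypothetical rank-deficient example as before, and the result should agree with the Moore-Penrose solution A⁺z:

```python
import numpy as np
from scipy.linalg import qr, cho_factor, cho_solve

def shortest_solution(A, z, delta=1e-10):
    """Minimum-norm least-squares solution via pivoted QR."""
    m, q = A.shape
    Q, Rfull, piv = qr(A, pivoting=True)
    diag = np.abs(np.diag(Rfull))
    n = int(np.sum(diag > delta * diag[0]))      # numerical rank decision
    R, S = Rfull[:n, :n], Rfull[:n, n:]
    z1 = (Q.T @ z)[:n]
    V = np.linalg.solve(R, S)                    # V = R^{-1} S
    u = np.linalg.solve(R, z1)                   # u = R^{-1} z1
    # (I + V^T V) p2 = V^T u, solved by Cholesky decomposition
    p2 = cho_solve(cho_factor(np.eye(q - n) + V.T @ V), V.T @ u)
    p1 = np.linalg.solve(R, z1 - S @ p2)         # p1 = R^{-1}(z1 − S p2)
    p = np.empty(q)
    p[piv] = np.concatenate([p1, p2])            # undo the column permutation
    return p

A = np.array([[1.0, 2.0,  3.0],
              [4.0, 5.0,  9.0],
              [7.0, 8.0, 15.0],
              [1.0, 0.0,  1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
p = shortest_solution(A, z)
```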
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data:
  (t_i, z_i),   t_i ∈ R, z_i ∈ R^d,   i = 1, …, M

Parameters: p = (p_1, …, p_q)^T

Model:
  y(t, p): R × R^q → R^d, nonlinear w.r.t. p

Measurement tolerances:
  δz_i ∈ R^d,   D_i = diag(δz_i)

Define:
  F(p) = ( D_1^{−1} (z_1 − y(t_1, p)) ; … ; D_M^{−1} (z_M − y(t_M, p)) ) ∈ R^m,   m ≤ M · d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem:

  Σ_{i=1}^M ‖D_i^{−1} (z_i − y(t_i, p))‖₂² = F(p)^T F(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem:
Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that

  ‖F(p*)‖₂² = min_{p ∈ D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if

  F(p*) = 0.
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p_1, …, p_q)^T ∈ R^q and a scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):

  g'(p) = ∇g(p) = ( ∂g/∂p_1, …, ∂g/∂p_q )^T

Consider the vector F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix:

  F'(p) =
    ( ∂F_1/∂p_1  ···  ∂F_1/∂p_q )
    (     ⋮                ⋮    )
    ( ∂F_m/∂p_1  ···  ∂F_m/∂p_q )   ∈ R^{m×q}
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
QR with column pivoting
BusingerGolub Numer Math 7 (1965)
column permutation with the aim
|r11| ge |r22| ge ge |rqq|step k
A(kminus1) =
(R S0 T (k)
)Denote the column j of T (k) by T
(k)j
Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change
|αk | = |rkk | = T (k)1 2 = max
jT (k)
j 2 = T (k)
Finally AΠ = QR with permutation matrix Π
Parameter Identification Susanna Roblitz 22
Rank decision
rank(A) = n lt q
QTAΠexact=
(R S0 0
)eps=
(R S0 T (n+1)
) T (n+1) ldquosmallrdquo
Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0
Idea Stop the iteration as soon as
|rk+1k+1| lt δ|r11| δ relative precision of A
Definition numerical rank n
|rn+1n+1| lt δ|r11| le |rnn|
choice of δ =rArr decision for n
Parameter Identification Susanna Roblitz 23
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem:

Σᵢ₌₁^M ‖Dᵢ⁻¹(zᵢ − y(tᵢ, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Definition: Given a mapping F: D ⊆ R^q → R^m with m > q, find a
solution p* ∈ D such that

‖F(p*)‖₂² = min_{p∈D} ‖F(p)‖₂²

The nonlinear least-squares problem is called compatible if F(p*) = 0.
Supplement Multivariable calculus
Consider p = (p₁, …, p_q)ᵀ ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):

g′(p) = ∇g(p) = (∂g/∂p₁, …, ∂g/∂p_q)ᵀ

Consider the vector F(p) = (F₁(p), …, F_m(p))ᵀ: R^q → R^m.
Its derivative is the Jacobian matrix

F′(p) = [∂F₁/∂p₁ ⋯ ∂F₁/∂p_q; ⋮; ∂F_m/∂p₁ ⋯ ∂F_m/∂p_q] ∈ R^(m×q)
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖₂² = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F′(p)ᵀF(p)

(a system of q nonlinear equations).
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor
expansion around a starting guess p⁰:

G(p) = G(p⁰) + G′(p⁰)(p − p⁰) + o(‖p − p⁰‖)  for p → p⁰
     ≈ G(p⁰) + G′(p⁰)(p − p⁰) =: Ḡ(p)

Zero p¹ of the linear substitute map Ḡ:

G′(p⁰)Δp⁰ = −G(p⁰),  p¹ = p⁰ + Δp⁰

Newton iteration:

G′(pᵏ)Δpᵏ = −G(pᵏ),  pᵏ⁺¹ = pᵏ + Δpᵏ

System of nonlinear equations ⟹ sequence of linear systems.
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

Iteration in terms of F(p):

(F′(pᵏ)ᵀF′(pᵏ) + F″(pᵏ)ᵀF(pᵏ)) Δpᵏ = −F′(pᵏ)ᵀF(pᵏ)

• compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
• almost compatible case: F(p*) is "small" ⇒ drop the second-order
  term and save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(pᵏ)ᵀF′(pᵏ)Δpᵏ = −F′(pᵏ)ᵀF(pᵏ),  pᵏ⁺¹ = pᵏ + Δpᵏ
Gauss-Newton iteration
F′(pᵏ)ᵀF′(pᵏ)Δpᵏ = −F′(pᵏ)ᵀF(pᵏ),  pᵏ⁺¹ = pᵏ + Δpᵏ

corresponds to the normal equations of the linear least squares problem

‖F′(pᵏ)Δpᵏ + F(pᵏ)‖₂ = min

Formal solution:

Δpᵏ = −F′(pᵏ)⁺F(pᵏ)

In case of convergence: F′(p*)⁺F(p*) = 0.

If rank(F′(p)) = q ⇒ F′(p)⁺ = (F′(p)ᵀF′(p))⁻¹F′(p)ᵀ
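The plain iteration Δpᵏ = −F′(pᵏ)⁺F(pᵏ) can be sketched in a few lines of NumPy; the exponential-fit test problem below is my own and `lstsq` computes the pseudoinverse solution of the linearized problem:

```python
import numpy as np

def gauss_newton(F, Fprime, p0, tol=1e-10, maxit=50):
    """Undamped (local) Gauss-Newton: each step solves the linear LSQ
    problem ||F'(p_k) dp + F(p_k)||_2 = min, i.e. dp = -F'(p_k)^+ F(p_k)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        dp, *_ = np.linalg.lstsq(Fprime(p), -F(p), rcond=None)
        p = p + dp
        if np.linalg.norm(dp) <= tol * (1.0 + np.linalg.norm(p)):
            break
    return p

# hypothetical compatible test problem: fit y = p1 * exp(p2 * t)
t = np.linspace(0.0, 1.0, 8)
z = 2.0 * np.exp(-1.5 * t)          # data generated by p* = (2, -1.5)
F = lambda p: p[0] * np.exp(p[1] * t) - z
Fp = lambda p: np.column_stack([np.exp(p[1] * t),
                                p[0] * t * np.exp(p[1] * t)])
p_star = gauss_newton(F, Fp, [1.0, -1.0])
```

Since the problem is compatible (F(p*) = 0), the local iteration converges quadratically near p*.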
Classification
• local GN methods: require a sufficiently good starting guess p⁰
• global GN methods: can handle bad guesses as well
• m = q: guaranteed local convergence
• m > q: convergence only for a small subclass of nonlinear LSQ
  problems, the so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This
can only be expected under some assumptions on the function F(p) and
the starting guess p⁰:

• F: D → R^m continuously differentiable, D ⊆ R^q open and convex
• F′(p): possibly rank-deficient Jacobian
• F′ satisfies a particular Lipschitz condition (i.e., F′ behaves
  "nicely") with Lipschitz constant 0 ≤ ω < ∞:

  ‖F′(p̃)⁺(F′(p) − F′(p̄))(p − p̄)‖₂ ≤ ω‖p − p̄‖₂²   (i)

  for all collinear p, p̄, p̃ ∈ D (i.e., they lie on a single straight
  line) and p − p̄ ∈ R(F′(p̃)⁺)
Assumptions
• compatibility condition (model and data must somehow fit each other):

  ‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖  ∀p, p̄ ∈ D,

  κ(p) ≤ κ < 1  ∀p ∈ D   (ii)

• the initial update Δp⁰ must be small enough, and a ball B̄ρ(p⁰)
  with radius ρ around p⁰ must be contained in D:

  ‖Δp⁰‖ ≤ α and B̄ρ(p⁰) ⊂ D   (iii)

  h := αω < 2(1 − κ),  ρ := α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {pᵏ} of Gauss-Newton
iterates is well-defined, remains in B̄ρ(p⁰), and converges to some
p* ∈ B̄ρ(p⁰) with

F′(p*)⁺F(p*) = 0

The convergence rate can be estimated according to

‖pᵏ⁺¹ − pᵏ‖₂ ≤ (ω/2)‖pᵏ − pᵏ⁻¹‖₂² + κ(pᵏ⁻¹)‖pᵏ − pᵏ⁻¹‖₂   (∗)

This result shows existence of a solution p* in the sense that
F′(p*)⁺F(p*) = 0, even in the case of a rank-deficient Jacobian (the
rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Compatibility
• linear least squares problems: F(p) = Ap − z

  F′(p) = (Ap − z)′ = A = F′(p̄)  ⇒ (i) holds with ω = 0

  F′(p̄)⁺(I − F′(p)F′(p)⁺) = A⁺(I − AA⁺) = 0  ⇒ (ii) holds with κ(p) = κ = 0

  (∗) ⇒ ‖p² − p¹‖ ≤ 0 ⇒ Δp¹ = 0 ⇒ p¹ = p*:
  convergence within one iteration

• systems of nonlinear equations (m = q):

  if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹
  ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) = 0 by (ii):
  quadratic convergence

• compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖pᵏ⁺¹ − pᵏ‖ / ‖pᵏ − pᵏ⁻¹‖
  ≤ (∗) lim_{k→∞} ((ω/2)‖pᵏ − pᵏ⁻¹‖ + κ(pᵏ⁻¹)) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the
condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for
adequate and locally quadratically for compatible nonlinear
least-squares problems.
Convergence monitorTermination criteria
Define

Δ̄pᵏ⁺¹ := −F′(pᵏ)⁺F(pᵏ⁺¹)   (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the
iteration process:

‖Δ̄pᵏ⁺¹‖ ≈ ‖Δpᵏ⁺¹‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖Δ̄pᵏ⁺¹‖ ≈ κ‖Δpᵏ‖ ≈ κ²‖Δpᵏ⁻¹‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δ̄pᵏ⁺¹‖ / ‖Δpᵏ‖
Convergence monitor
[Figure: norms of the ordinary and simplified Gauss-Newton corrections
versus iteration index k, on a logarithmic scale from 10⁰ down to 10⁻¹⁵.]

Typical scissor between ordinary and simplified Gauss-Newton
corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian
fails to converge, one should rather improve the model (change the
equations or the initial guess p⁰) than just turn to a different
iterative solver. In the convergent but rank-deficient case, one
should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

QᵀF′(p*)Π = R

⇒ reordering of the parameters to (p*_π₁, p*_π₂, …, p*_πₙ, …, p*_π_q)
in such a way that

|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |rₙₙ|

→ the parameters (p*_π₁, p*_π₂, …, p*_πₙ) can actually be estimated
from the current dataset.
Parameter transformations
p: constrained parameters, u: unconstrained parameters,
φ: transformation function

p = φ(u),  F̄(u) := F(φ(u)) = F(p),  F̄′(u) = F′(p)φ′(u)

constraint       transformation φ(u)
p > 0            p = exp(u)
A ≤ p ≤ B        p = A + (B − A)/2 · (1 + sin u)
p ≤ C            p = C + (1 − √(1 + u²))
p ≥ C            p = C − (1 − √(1 + u²))
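The transformation table, sketched as small helper functions (the function names are my own); each maps an unconstrained u ∈ R to a p satisfying the constraint:

```python
import numpy as np

def to_positive(u):             # p > 0
    return np.exp(u)

def to_interval(u, A, B):       # A <= p <= B
    return A + 0.5 * (B - A) * (1.0 + np.sin(u))

def bounded_above(u, C):        # p <= C, since sqrt(1 + u^2) >= 1
    return C + (1.0 - np.sqrt(1.0 + u**2))

def bounded_below(u, C):        # p >= C
    return C - (1.0 - np.sqrt(1.0 + u**2))
```

The optimizer then iterates on u, and the chain rule F̄′(u) = F′(p)φ′(u) supplies the Jacobian in the transformed variables.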
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small
neighborhood of pᵏ:

‖F(pᵏ) + F′(pᵏ)Δpᵏ‖₂² + ρ‖Δpᵏ‖₂² = min

(F′(pᵏ)ᵀF′(pᵏ) + ρI_q)Δpᵏ = −F′(pᵏ)ᵀF(pᵏ)

• ρ → 0: Levenberg-Marquardt → Gauss-Newton
• ρ > 0: (F′(pᵏ)ᵀF′(pᵏ) + ρI_q) always regular → masking of the
  uniqueness of a given solution, incorrect interpretation of
  numerical results: solutions are often not minima of g(p) but
  merely saddle points with g′(p) = 0
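One Levenberg-Marquardt correction, as a sketch of the regularized normal equations above (helper name is mine); the ρ → 0 limit recovers the Gauss-Newton step, while large ρ shrinks the correction:

```python
import numpy as np

def lm_step(J, Fk, rho):
    """Solve (J^T J + rho * I) dp = -J^T F for the LM correction dp."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fk)
```

Note that for ρ > 0 the system is solvable even when J is rank-deficient, which is exactly the "masking" effect criticized on the slide.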
Scaling
• row (measurement) scaling:

  F̄(p) = D_row⁻¹ F(p),  D_row = diag(δz₁, …, δz_M)

  ‖F′(pᵏ)Δpᵏ + F(pᵏ)‖ = min  vs.  ‖D_row⁻¹(F′(pᵏ)Δpᵏ + F(pᵏ))‖ = min

  Different scaling leads to a different solution.

• column (parameter) scaling:

  p̄ = D_col⁻¹ p,  H(p̄) := F(D_col p̄) = F(p),  H′(p̄) = F′(p)D_col

  Assume p̄ᵏ = D_col⁻¹ pᵏ:

  Δp̄ᵏ = −H′(p̄ᵏ)⁺H(p̄ᵏ) = −(F′(pᵏ)D_col)⁺F(pᵏ)
       = [F′(pᵏ) full rank] = −D_col⁻¹ F′(pᵏ)⁺F(pᵏ) = D_col⁻¹ Δpᵏ

  No invariance in the rank-deficient case.
Globalization
Global (or damped) Gauss-Newton method:

F′(pᵏ)Δpᵏ = −F(pᵏ),  pᵏ⁺¹ = pᵏ + λₖΔpᵏ,  0 < λₖ ≤ 1

How to choose λₖ?

Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λₖ in such a way that ‖F′(pᵏ)⁺F(pᵏ)‖ decays along the
iterates (theory of level functions, Gauss-Newton path).

The damping factors λₖ are determined via a predictor-corrector
strategy such that they satisfy the natural monotonicity test

‖Δ̄pᵏ⁺¹(λₖ)‖ / ‖Δpᵏ‖ < 1

Divergence criterion: λₖ < λ_min ≪ 1
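A minimal damped Gauss-Newton sketch with the natural monotonicity test. Simple halving of λ replaces the predictor-corrector strategy of the slide, and the test problem below is my own, so this is an illustration rather than the NLSQ-ERR implementation:

```python
import numpy as np

def damped_gauss_newton(F, Fprime, p0, lam_min=1e-4, tol=1e-10, maxit=100):
    """Damped GN: accept p_k + lam*dp_k only if the simplified correction
    satisfies ||F'(p_k)^+ F(p_k + lam*dp_k)|| < ||dp_k||; else halve lam."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        Jpinv = np.linalg.pinv(Fprime(p))
        dp = -Jpinv @ F(p)                  # ordinary GN correction
        if np.linalg.norm(dp) <= tol:
            break
        lam = 1.0
        while lam >= lam_min:
            trial = p + lam * dp
            dp_bar = -Jpinv @ F(trial)      # simplified GN correction
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                break                       # natural monotonicity test passed
            lam *= 0.5
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = trial
    return p
```

The divergence branch mirrors the criterion λₖ < λ_min on the slide: failure to find an acceptable damping factor signals that the model, not the solver, should be improved.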
Codes
• NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
• NLSCON: solver for Nonlinear Least-Squares with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y₀, with y ∈ R^d, p ∈ R^q

Set of experimental data:
(t₁, z₁, δz₁), …, (t_M, z_M, δz_M),  tⱼ ∈ R, zⱼ ∈ R^d,
with measurement tolerances δzⱼ ∈ R^d, Dᵢ = diag(δzᵢ)

F(p) = [D₁⁻¹(y(t₁, p) − z₁); …; D_M⁻¹(y(t_M, p) − z_M)]

F′(p) = [D₁⁻¹y_p(t₁, p); …; D_M⁻¹y_p(t_M, p)]

Gauss-Newton method:

Δpᵏ = −F′(pᵏ)⁺F(pᵏ),  pᵏ⁺¹ = pᵏ + λₖΔpᵏ
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t₁, …, t_M

Problem: time points chosen by adaptive step-size control generally
do not agree with t₁, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available
solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
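For the introductory A + B ⇌ C + D example, the evaluation of F(p) can be sketched as follows; scipy's solve_ivp with dense output stands in for LIMEX, and the initial values are those of the example:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y, k1, k2):
    """Reversible reaction A + B <-> C + D from the introductory example."""
    A, B, C, D = y
    r = k1 * A * B - k2 * C * D
    return [-r, -r, r, r]

def residual(p, t_data, z_data, dz):
    """F(p): weighted residuals D_i^{-1}(y(t_i, p) - z_i), stacked."""
    sol = solve_ivp(f, (0.0, t_data[-1]), [2.0, 1.0, 0.5, 0.0],
                    args=tuple(p), dense_output=True,
                    rtol=1e-8, atol=1e-10)
    y = sol.sol(t_data).T          # dense output at the measurement times
    return ((y - z_data) / dz).ravel()
```

The dense-output interpolant `sol.sol` is evaluated at the measurement times t₁, …, t_M, which generally differ from the solver's internal step points.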
Sensitivity analysis
• sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^(d×q)

  F′(p) = [D₁⁻¹S(t₁); …; D_M⁻¹S(t_M)]

• scaled sensitivity matrix (necessary whenever species or parameters
  cover a broad range of physical units):

  S̄ᵢⱼ(t) = ∂yᵢ/∂pⱼ (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the
variational equation

y_p′ = f_y(y, p)·y_p + f_p(y, p),  y_p(0) = 0

Compact notation:

S′ = f_y·S + f_p,  S = [s₁, …, s_q],  sₖ ∈ R^d

In practice, solve the combined system

(y′; s₁′; …; s_q′) = (f; f_y·s₁ + f_p₁; …; f_y·s_q + f_p_q) ∈ R^((q+1)d)

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
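The combined system for the A + B ⇌ C + D example (d = 4 species, q = 2 parameters k₁, k₂) can be sketched as below; scipy stands in for LIMEX, and the helper names are my own:

```python
import numpy as np
from scipy.integrate import solve_ivp

def combined(t, ys, k1, k2):
    """y' = f(y,p) together with the variational equation S' = f_y S + f_p."""
    y, S = ys[:4], ys[4:].reshape(4, 2)     # S = [s1, s2]
    A, B, C, D = y
    r = k1 * A * B - k2 * C * D
    sign = np.array([-1.0, -1.0, 1.0, 1.0]) # (A', B', C', D') = sign * r'
    dr_dy = np.array([k1 * B, k1 * A, -k2 * D, -k2 * C])
    fy = np.outer(sign, dr_dy)              # Jacobian f_y
    fp = np.outer(sign, [A * B, -C * D])    # derivatives w.r.t. (k1, k2)
    Sdot = fy @ S + fp
    return np.concatenate([sign * r, Sdot.ravel()])

def sensitivities(p, t_eval):
    """Integrate the combined (q+1)*d system; S(0) = 0."""
    y0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])
    sol = solve_ivp(combined, (0.0, t_eval[-1]), y0, args=tuple(p),
                    t_eval=t_eval, rtol=1e-9, atol=1e-11)
    return sol.y[:4], sol.y[4:].reshape(4, 2, -1)
```

A cheap consistency check is to compare a column of S against a finite-difference derivative of y with respect to the corresponding parameter.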
Example
k   ‖F(pᵏ)‖    ‖Δpᵏ‖      λₖ      rank
0   4.81e-02   3.55e-01            2
1   1.11e-01   4.57e-01   1.000    2
    2.55e-02   1.75e-01   0.389
2   1.04e-02   4.22e-02   1.000    2
3   9.16e-04   1.90e-03   1.000    2
4   8.00e-06   1.99e-05   1.000    2
5   2.32e-07   1.54e-09   1.000    2

p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994
κ = 0.008, sc(F′(p*)) = 6.3

k   ‖F(pᵏ)‖    ‖Δpᵏ‖      λₖ      rank
0   3.85e-02   2.96e-02            1
1   2.40e-03   1.85e-03   1.000    1
2   1.11e-05   7.67e-06   1.000    1
3   6.67e-06   6.75e-11   1.000    1

p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622
κ = 0.004, sc(F′(p**)) = 4.5 × 10⁶

[Figure: two panels of fitted concentration profiles of A, B, C, D
over time 0–30, as in the introductory example.]
BioPARKIN
Integrated software environment for simulation and parameter
identification: http://bioparkin.zib.de
BioPARKIN
• SBML import and export
• deterministic simulation methods (LIMEX; others will follow)
• sensitivity analysis based on variational equations
• parameter estimation via Gauss-Newton methods (tunable, possible
  scaling of species and parameters)
• output of information on the identifiability of parameters
• time-shifting of data and concatenation of IVPs for
  multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin,
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Rank decision
rank(A) = n < q

QᵀAΠ =(exact) [R S; 0 0] =(eps) [R S; 0 T⁽ⁿ⁺¹⁾],  T⁽ⁿ⁺¹⁾ "small"

Note: the exact rank n cannot be computed, since by rounding errors
T⁽ⁿ⁺¹⁾ ≠ 0, i.e. |r_(n+1,n+1)| ≠ 0.

Idea: stop the iteration as soon as

|r_(k+1,k+1)| < δ|r₁₁|,  δ: relative precision of A

Definition (numerical rank n):

|r_(n+1,n+1)| < δ|r₁₁| ≤ |r_nn|

Choice of δ ⟹ decision for n.
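The rank decision (together with the subcondition of the next slide) takes only a few lines, assuming scipy's pivoted QR; the helper name is my own:

```python
import numpy as np
from scipy.linalg import qr

def numerical_rank(A, delta):
    """Numerical rank and subcondition from QR with column pivoting:
    n = number of diagonal entries with |r_kk| >= delta * |r_11|."""
    R = qr(A, mode='r', pivoting=True)[0]
    d = np.abs(np.diag(R))
    n = int(np.sum(d >= delta * d[0]))
    sc = d[0] / d[-1] if d[-1] > 0 else np.inf  # sc(A) = |r_11| / |r_qq|
    return n, sc
```

A matrix with one linearly dependent column then reports rank q − 1 and a huge (or infinite) subcondition, i.e. it is "almost singular" in the sense δ·sc(A) ≥ 1.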
Subcondition number
Deuflhard/Sautter, Lin. Alg. Appl. 29 (1980)

Definition (subcondition): sc(A) = |r₁₁| / |r_qq|

Properties:
• sc(A) ≥ 1
• sc(αA) = sc(A)
• A ≠ 0 singular ⟺ sc(A) = ∞
• sc(A) ≤ κ₂(A) (hence the name)

A is called almost singular if

δ·sc(A) ≥ 1, or equivalently δ|r₁₁| ≥ |r_qq|
Scaling
D_row, D_col: diagonal matrices

We consider the scaled least squares problem

‖D_row⁻¹(A D_col D_col⁻¹ p − z)‖₂ = ‖Āp̄ − z̄‖₂ → min

where Ā = D_row⁻¹ A D_col, p̄ = D_col⁻¹ p, z̄ = D_row⁻¹ z

• row (measurement) scaling: D_row = diag(δz₁, …, δz_m),
  e.g. relative error concept δzⱼ = ε|zⱼ|
• column (parameter) scaling: D_col = diag(p_1,scal, …, p_q,scal)
  (to account for different physical units)

In general, sc(Ā) ≠ sc(A) and rank(Ā) ≠ rank(A).
Generalized Inverse
(1) A ∈ R^(q×q), rank(A) = q (A is regular):
⇒ Ap = z has the unique solution p = A⁻¹z, where A⁻¹A = AA⁻¹ = I_q

(2) A ∈ R^(m×q), rank(A) = n ≤ q < m:
⇒ usually Ap = z has no solution, but there is a minimizer:

‖Ap − z‖₂ = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Subcondition number
DeuflhardSautter Lin Alg Appl 29 (1980)
Definition subconditon sc(A) = |r11||rqq|
Properties
I sc(A) ge 1
I sc(αA) = sc(A)
I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)
A is called almost singular if
δsc(A) ge 1 or equivalently δ|r11| ge |rqq|
Parameter Identification Susanna Roblitz 24
Scaling
Drow Dcol diagonal matrices
We consider the scaled least squares problem
Dminus1row(ADcolD
minus1col p minus z)2 = Ap minus z2 min
where A = Dminus1rowADcol p = Dminus1
col p z = Dminus1rowz
I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |
I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)
in general sc(A) 6= sc(A) and rank(A) 6= rank(A)
Parameter Identification Susanna Roblitz 25
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation

Data: (tᵢ, zᵢ), tᵢ ∈ ℝ, zᵢ ∈ ℝ^d, i = 1, …, M
Parameters: p = (p₁, …, p_q)ᵀ
Model: y(t, p): ℝ × ℝ^q → ℝ^d, nonlinear w.r.t. p
Measurement tolerances: δzᵢ ∈ ℝ^d, Dᵢ = diag(δzᵢ)

Define

F(p) = ( D₁⁻¹(z₁ − y(t₁, p)); … ; D_M⁻¹(z_M − y(t_M, p)) ) ∈ ℝ^m, m ≤ M·d.
Parameter Identification Susanna Roblitz 31
Problem formulation

Σᵢ₌₁^M ‖Dᵢ⁻¹(zᵢ − y(tᵢ, p))‖₂² = F(p)ᵀF(p) = ‖F(p)‖₂² = min

Nonlinear least-squares problem: Given a mapping F: D ⊆ ℝ^q → ℝ^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖₂² = min_{p ∈ D} ‖F(p)‖₂².

The nonlinear least-squares problem is called compatible if

F(p*) = 0.
Parameter Identification Susanna Roblitz 32
Supplement: Multivariable calculus

Consider p = (p₁, …, p_q)ᵀ ∈ ℝ^q and a scalar function g(p): ℝ^q → ℝ.
The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p₁, …, ∂g/∂p_q )ᵀ.

Consider a vector-valued F(p) = (F₁(p), …, F_m(p))ᵀ: ℝ^q → ℝ^m.
Its derivative is the Jacobian matrix

F′(p) = ( ∂F₁/∂p₁  ⋯  ∂F₁/∂p_q
            ⋮              ⋮
          ∂F_m/∂p₁ ⋯  ∂F_m/∂p_q ) ∈ ℝ^(m×q).
Parameter Identification Susanna Roblitz 33
Problem formulation

Minimization problem:

g(p) = ‖F(p)‖₂² = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)ᵀF(p*) = 0 and g″(p*) positive definite.

Find, by Newton iteration, a zero of the function G: ℝ^q → ℝ^q,

G(p) = F′(p)ᵀF(p)

(a system of q nonlinear equations).
Parameter Identification Susanna Roblitz 34
Newton iteration

Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p⁰:

G(p) = G(p⁰) + G′(p⁰)(p − p⁰) + o(‖p − p⁰‖) for p → p⁰
     ≈ G(p⁰) + G′(p⁰)(p − p⁰) =: Ḡ(p)

Zero p¹ of the linear substitute map Ḡ:

G′(p⁰)∆p⁰ = −G(p⁰), p¹ = p⁰ + ∆p⁰

Newton iteration:

G′(pᵏ)∆pᵏ = −G(pᵏ), pᵏ⁺¹ = pᵏ + ∆pᵏ

system of nonlinear equations ⟹ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration

G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)

Newton iteration in terms of F(p):

( F′(pᵏ)ᵀF′(pᵏ) + F″(pᵏ)ᵀF(pᵏ) ) ∆pᵏ = −F′(pᵏ)ᵀF(pᵏ)

- compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
- almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(pᵏ)ᵀF′(pᵏ)∆pᵏ = −F′(pᵏ)ᵀF(pᵏ), pᵏ⁺¹ = pᵏ + ∆pᵏ
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration

F′(pᵏ)ᵀF′(pᵏ)∆pᵏ = −F′(pᵏ)ᵀF(pᵏ), pᵏ⁺¹ = pᵏ + ∆pᵏ

corresponds to the normal equations of the linear least squares problem

‖F′(pᵏ)∆pᵏ + F(pᵏ)‖₂ = min,

with formal solution

∆pᵏ = −F′(pᵏ)⁺F(pᵏ);

in case of convergence: F′(p*)⁺F(p*) = 0.

If rank(F′(p)) = q, then F′(p)⁺ = ( F′(p)ᵀF′(p) )⁻¹F′(p)ᵀ.
Parameter Identification Susanna Roblitz 37
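The iteration ∆pᵏ = −F′(pᵏ)⁺F(pᵏ) fits in a few lines; a sketch on a made-up compatible toy problem (exponential fit with exact data), where locally quadratic convergence is expected:

```python
import numpy as np

def gauss_newton(F, Fprime, p0, tol=1e-10, max_iter=50):
    """Ordinary Gauss-Newton: dp_k = -pinv(F'(p_k)) @ F(p_k) (a sketch)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp = -np.linalg.pinv(Fprime(p)) @ F(p)
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# Compatible toy problem (made up for illustration): fit
# y(t, p) = p1 * exp(p2 * t) to exact data generated with p* = (2, -1.5),
# so F(p*) = 0.
t = np.linspace(0.0, 1.0, 6)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
Fprime = lambda p: np.column_stack([np.exp(p[1] * t),
                                    p[0] * t * np.exp(p[1] * t)])
p_star = gauss_newton(F, Fprime, p0=[1.5, -1.2])
assert np.allclose(p_star, [2.0, -1.5])
```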
Classification

- local GN methods require a sufficiently good starting guess p⁰
- global GN methods can handle bad guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions

We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p⁰:

- F: D → ℝ^m continuously differentiable, D ⊆ ℝ^q open and convex
- F′(p) possibly rank-deficient Jacobian
- F′ satisfies a particular Lipschitz condition (i.e. F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̄)⁺(F′(p) − F′(p̂))(p − p̂)‖₂ ≤ ω‖p − p̂‖₂²  (i)

for all collinear p, p̂, p̄ ∈ D (i.e. they lie on a single straight line) with p − p̂ ∈ R(F′(p̄)⁺).
Parameter Identification Susanna Roblitz 39
Assumptions

- compatibility condition (model and data must somehow fit each other):

‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖ for all p, p̄ ∈ D,
κ(p) ≤ κ < 1 for all p ∈ D  (ii)

- the initial update ∆p⁰ must be small enough, and a ball B_ρ(p⁰) with radius ρ around p⁰ must be contained in D:

‖∆p⁰‖ ≤ α and B_ρ(p⁰) ⊂ D  (iii)
h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2)  (iv)
Parameter Identification Susanna Roblitz 40
Convergence result

Under the above assumptions, the sequence {pᵏ} of Gauss-Newton iterates is well-defined, remains in B_ρ(p⁰), and converges to some p* ∈ B_ρ(p⁰) with

F′(p*)⁺F(p*) = 0.

The convergence rate can be estimated according to

‖pᵏ⁺¹ − pᵏ‖₂ ≤ (ω/2)‖pᵏ − pᵏ⁻¹‖₂² + κ(pᵏ⁻¹)‖pᵏ − pᵏ⁻¹‖₂.  (∗)

This result shows existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility

- Linear least squares problems: F(p) = Ap − z

(i): F′(p) = (Ap − z)′ = A = F′(p̂) ⟹ ω = 0
(ii): F′(p̄)⁺(I − F′(p)F′(p)⁺) = A⁺(I − AA⁺) = 0 ⟹ κ(p) ≡ κ = 0
(∗) ⇒ ‖p² − p¹‖ ≤ 0 ⇒ ∆p¹ = 0 ⇒ p¹ = p*:
convergence within one iteration.

- Systems of nonlinear equations (m = q):
if F′(p) is nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹ ⇒ I − F′(p)F′(p)⁺ = 0, so by (∗): κ(p) = 0:
quadratic convergence.

- Compatible problems: F(p*) = 0 ⇒ κ(p*) = 0.
Parameter Identification Susanna Roblitz 42
Incompatibility factor

General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖pᵏ⁺¹ − pᵏ‖ / ‖pᵏ − pᵏ⁻¹‖ ≤ lim_{k→∞} ( (ω/2)‖pᵏ − pᵏ⁻¹‖ + κ(pᵏ⁻¹) ) = κ(p*)   (by (∗))

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1.

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

∆p̄ᵏ⁺¹ = −F′(pᵏ)⁺F(pᵏ⁺¹).

Stop the iteration as soon as linear convergence dominates the iteration process:

‖∆p̄ᵏ⁺¹‖ ≈ ‖∆pᵏ⁺¹‖ ≤ XTOL.

For incompatible problems in the convergent phase it holds that

‖∆p̄ᵏ⁺¹‖ ≈ κ‖∆pᵏ‖ ≈ κ²‖∆pᵏ⁻¹‖.

This provides an estimate for κ(p*):

κ(p*) ≈ ‖∆p̄ᵏ⁺¹‖ / ‖∆pᵏ‖.
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of the ordinary and simplified GN corrections over the iteration index k, decaying from 10⁰ to below 10⁻¹⁵.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p⁰) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability

Interpretation of the result in case of a rank-deficient Jacobian: let p* be the final iterate with rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision,

QᵀF′(p*)Π = R,

reorders the parameters to (p*_π1, p*_π2, …, p*_πn, …, p*_πq) in such a way that

|r₁₁| ≥ |r₂₂| ≥ ⋯ ≥ |rₙₙ|.

→ The parameters (p*_π1, p*_π2, …, p*_πn) can actually be estimated from the current dataset.
Parameter Identification Susanna Roblitz 47
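A small sketch of this rank decision with SciPy's pivoted QR, on a made-up Jacobian in which two parameters enter only through their sum (so only two of three are identifiable):

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical Jacobian: column 3 duplicates column 1, i.e. parameters
# 1 and 3 enter the model only through their sum -> rank 2 < q = 3.
J = np.array([[1.0, 2.0, 1.0],
              [0.5, 1.0, 0.5],
              [2.0, 0.0, 2.0],
              [1.0, 3.0, 1.0]])
Q, R, piv = qr(J, pivoting=True)         # Qᵀ J Π = R, |r11| ≥ |r22| ≥ ...
diag = np.abs(np.diag(R))
n = int(np.sum(diag > 1e-10 * diag[0]))  # rank decision
assert n == 2
# the first n pivot indices name the parameters that can be estimated
identifiable = piv[:n]
```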
Parameter transformations

p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u), F(p) = F(φ(u)) =: F̄(u), F̄′(u) = F′(p)φ′(u)

constraint → transformation p = φ(u):
p > 0:      p = exp(u)
A ≤ p ≤ B:  p = A + (B − A)/2 · (1 + sin u)
p ≤ C:      p = C + (1 − √(1 + u²))
p ≥ C:      p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
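The transformation table translates directly into code; a sketch (function names are my own) that checks each constraint holds for all unconstrained u:

```python
import numpy as np

def positive(u):                 # p > 0
    return np.exp(u)

def boxed(u, A, B):              # A <= p <= B
    return A + (B - A) / 2.0 * (1.0 + np.sin(u))

def upper_bounded(u, C):         # p <= C, since 1 - sqrt(1+u^2) <= 0
    return C + (1.0 - np.sqrt(1.0 + u**2))

def lower_bounded(u, C):         # p >= C
    return C - (1.0 - np.sqrt(1.0 + u**2))

u = np.linspace(-10.0, 10.0, 201)
assert np.all(positive(u) > 0)
p_box = boxed(u, 2.0, 5.0)
assert np.all((p_box >= 2.0) & (p_box <= 5.0))
assert np.all(upper_bounded(u, 3.0) <= 3.0)
assert np.all(lower_bounded(u, 3.0) >= 3.0)
```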
Levenberg-Marquardt method

Idea: the local linearization is only reasonable in a small neighborhood of pᵏ:

‖F(pᵏ) + F′(pᵏ)∆pᵏ‖₂² + ρ‖∆pᵏ‖₂² = min
⇔ (F′(pᵏ)ᵀF′(pᵏ) + ρI_q)∆pᵏ = −F′(pᵏ)ᵀF(pᵏ)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F′(pᵏ)ᵀF′(pᵏ) + ρI_q) is always regular → masks possible non-uniqueness of a given solution, inviting incorrect interpretation of numerical results; solutions are often not minima of g(p), but merely saddle points with g′(p) = 0
Parameter Identification Susanna Roblitz 49
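A sketch of one Levenberg-Marquardt correction (made-up data), illustrating both limits: for ρ → 0 it approaches the Gauss-Newton step, for large ρ the step is heavily damped:

```python
import numpy as np

def lm_step(Fp, J, rho):
    """One Levenberg-Marquardt correction:
    (JᵀJ + rho*I) dp = -Jᵀ F, regular for any rho > 0."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fp)

# J has full column rank here, so the GN step is -pinv(J) @ F
J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
Fp = np.array([0.5, -1.0, 0.25])
gn = -np.linalg.pinv(J) @ Fp

assert np.allclose(lm_step(Fp, J, 1e-12), gn)   # rho -> 0: Gauss-Newton
assert np.linalg.norm(lm_step(Fp, J, 1e6)) < np.linalg.norm(gn)  # damped
```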
Scaling

- row (measurement) scaling:

F̄(p) = Drow⁻¹F(p), Drow = diag(δz₁, …, δz_M)

‖F′(pᵏ)∆pᵏ + F(pᵏ)‖ = min ⇎ ‖Drow⁻¹(F′(pᵏ)∆pᵏ + F(pᵏ))‖ = min:

different scaling leads to a different solution!

- column (parameter) scaling:

p̄ = Dcol⁻¹p, H(p̄) := F(Dcol p̄) = F(p), H′(p̄) = F′(p)Dcol

With p̄ᵏ = Dcol⁻¹pᵏ:

∆p̄ᵏ = −H′(p̄ᵏ)⁺H(p̄ᵏ) = −(F′(pᵏ)Dcol)⁺F(pᵏ)
     = −Dcol⁻¹F′(pᵏ)⁺F(pᵏ) = Dcol⁻¹∆pᵏ  (if F′(pᵏ) has full rank)

No invariance in the rank-deficient case!
Parameter Identification Susanna Roblitz 50
Globalization

Global (or damped) Gauss-Newton method:

F′(pᵏ)∆pᵏ = −F(pᵏ), pᵏ⁺¹ = pᵏ + λₖ∆pᵏ, 0 < λₖ ≤ 1

How to choose λₖ? Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λₖ in such a way that ‖F′(pᵏ)⁺F(pᵏ)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λₖ are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖∆p̄ᵏ⁺¹(λₖ)‖ / ‖∆pᵏ‖ < 1;

divergence criterion: λₖ < λ_min ≪ 1.
Parameter Identification Susanna Roblitz 51
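A minimal sketch of the damping strategy, with a simple halving of λ under the natural monotonicity test (this is not the NLSQ-ERR predictor-corrector strategy). The classic arctan example (m = q = 1) shows the point: the full Newton step diverges from p⁰ = 2, while the damped iteration converges:

```python
import numpy as np

def damped_gauss_newton(F, Fprime, p0, lam_min=1e-4, tol=1e-10, max_iter=100):
    """Damped GN with natural monotonicity test (illustrative sketch):
    accept lambda only if the simplified correction shrinks, else halve."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jplus = np.linalg.pinv(Fprime(p))
        dp = -Jplus @ F(p)                   # ordinary GN correction
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            p_trial = p + lam * dp
            dp_bar = -Jplus @ F(p_trial)     # simplified correction
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                break                        # natural monotonicity test passed
            lam *= 0.5
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = p_trial
    return p

# m = q = 1: F(p) = arctan(p). The undamped Newton step overshoots and
# diverges from p0 = 2; the damped iteration converges to the root p* = 0.
F = lambda p: np.array([np.arctan(p[0])])
Fprime = lambda p: np.array([[1.0 / (1.0 + p[0] ** 2)]])
p_star = damped_gauss_newton(F, Fprime, [2.0])
assert abs(p_star[0]) < 1e-8
```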
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
- NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation

d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y₀, with y ∈ ℝ^d, p ∈ ℝ^q

Set of experimental data (t₁, z₁, δz₁), …, (t_M, z_M, δz_M), tⱼ ∈ ℝ, zⱼ ∈ ℝ^d, with measurement tolerances δzⱼ ∈ ℝ^d, Dᵢ = diag(δzᵢ)

F(p) = ( D₁⁻¹(y(t₁, p) − z₁); … ; D_M⁻¹(y(t_M, p) − z_M) )
F′(p) = ( D₁⁻¹ y_p(t₁, p); … ; D_M⁻¹ y_p(t_M, p) )

Gauss-Newton method:

∆pᵏ = −F′(pᵏ)⁺F(pᵏ), pᵏ⁺¹ = pᵏ + λₖ∆pᵏ
Parameter Identification Susanna Roblitz 54
Computation of F(p)

requires the solution of y′ = f(y, p) at the time points t₁, …, t_M.

Problem: the time points chosen by adaptive step-size control generally do not agree with t₁, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the integration accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
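The slides use LIMEX; as a stand-in, SciPy's `solve_ivp` offers the same dense-output idea. A sketch on the introductory reaction A + B ⇌ C + D, evaluating the solution at arbitrary measurement times without forcing the adaptive integrator onto them:

```python
import numpy as np
from scipy.integrate import solve_ivp

# A + B <-> C + D with A' = B' = -k1*A*B + k2*C*D, C' = D' = -A'
def rhs(t, y, k1, k2):
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    return [r, r, -r, -r]

y0 = [2.0, 1.0, 0.5, 0.0]                    # initial values from slide 4
sol = solve_ivp(rhs, (0.0, 30.0), y0, args=(0.2, 0.5),
                dense_output=True, rtol=1e-8, atol=1e-10)

# Evaluate at arbitrary measurement times t1, ..., tM via the
# interpolant sol.sol, independent of the internal step points.
t_meas = np.array([1.0, 5.0, 12.5, 30.0])
y_meas = sol.sol(t_meas)                     # shape (4, M)
assert y_meas.shape == (4, 4)
# sanity check: A + C is conserved along the trajectory
assert np.allclose(y_meas[0] + y_meas[2], 2.5, atol=1e-6)
```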
Sensitivity analysis

- sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ ℝ^(d×q),
F′(p) = ( D₁⁻¹S(t₁); … ; D_M⁻¹S(t_M) )

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄ᵢⱼ(t) = ∂yᵢ/∂pⱼ (t) · p_scal / y_scal,

with user-specified scaling values p_scal, y_scal.
Parameter Identification Susanna Roblitz 56
Computation of sensitivities

Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y′_p = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0.

Compact notation:

S′ = f_y · S + f_p, S = [s₁, …, s_q], sₖ ∈ ℝ^d.

In practice, solve the combined system

( y′; s′₁; …; s′_q ) = ( f; f_y·s₁ + f_p₁; …; f_y·s_q + f_p_q ) ∈ ℝ^((q+1)d),

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
Parameter Identification Susanna Roblitz 57
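The combined system is easy to set up by hand for a small model; a sketch with a made-up scalar model y′ = −p₁y + p₂, y(0) = 1 (d = 1, q = 2), solved with SciPy as a stand-in for LIMEX and cross-checked against the analytic solution and a finite-difference sensitivity:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system: state y augmented with sensitivities s_k = dy/dp_k.
# f = -p1*y + p2, so f_y = -p1, f_p1 = -y, f_p2 = 1.
def combined(t, w, p1, p2):
    y, s1, s2 = w
    fy = -p1
    return [-p1 * y + p2,
            fy * s1 - y,          # s1' = f_y*s1 + f_p1
            fy * s2 + 1.0]        # s2' = f_y*s2 + f_p2

p1, p2 = 1.5, 0.5
sol = solve_ivp(combined, (0.0, 2.0), [1.0, 0.0, 0.0], args=(p1, p2),
                rtol=1e-10, atol=1e-12, dense_output=True)
y_T, s1_T, s2_T = sol.sol(2.0)

# analytic solution: y(t) = p2/p1 + (1 - p2/p1) * exp(-p1*t)
y_exact = p2 / p1 + (1.0 - p2 / p1) * np.exp(-p1 * 2.0)
assert np.isclose(y_T, y_exact, atol=1e-8)

# cross-check s1 = dy/dp1 against central finite differences
def y_end(p1_, p2_):
    s = solve_ivp(lambda t, y: [-p1_ * y[0] + p2_], (0.0, 2.0), [1.0],
                  rtol=1e-10, atol=1e-12)
    return s.y[0, -1]

eps = 1e-5
fd = (y_end(p1 + eps, p2) - y_end(p1 - eps, p2)) / (2.0 * eps)
assert abs(s1_T - fd) < 1e-4
```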
Example

Case (T+E):

k   ‖F(pᵏ)‖   ‖∆pᵏ‖      λₖ      rank
0   4.81e-02   3.55e-01    –      2
1   1.11e-01   4.57e-01   1.000   2
    2.55e-02   1.75e-01   0.389
2   1.04e-02   4.22e-02   1.000   2
3   9.16e-04   1.90e-03   1.000   2
4   8.00e-06   1.99e-05   1.000   2
5   2.32e-07   1.54e-09   1.000   2

p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994
κ ≈ 0.008, sc(F′(p*)) = 6.3

Case (E):

k   ‖F(pᵏ)‖   ‖∆pᵏ‖      λₖ      rank
0   3.85e-02   2.96e-02    –      1
1   2.40e-03   1.85e-03   1.000   1
2   1.11e-05   7.67e-06   1.000   1
3   6.67e-06   6.75e-11   1.000   1

p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622
κ ≈ 0.004, sc(F′(p**)) = 4.5×10⁶

[Figures: fitted concentration curves of A, B, C, D over time t ∈ [0, 30] for both cases.]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification, http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin
Computational Systems Biology Group, http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Generalized Inverse
(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq
(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer
Ap minus z2 = min
(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular
p = (ATA)minus1AT︸ ︷︷ ︸=A+
z
(2b) n lt q write p = A+z A+ generalizedpseudo inverse
Parameter Identification Susanna Roblitz 26
Penrose axioms
Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by
(i) (A+A)T = A+A
(ii) (AA+)T = AA+
(iii) A+AA+ = A+
(iv) AA+A = A
One can show plowast = A+z is the shortest solution ie
plowast2 le p2 forallp isin L(z)
with
L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)
Parameter Identification Susanna Roblitz 27
Computation of shortest solution
QTAΠ =
(R S0 0
) A isin Rmtimesq
rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)
ΠTp =
(p1
p2
) p1 isin Rn p2 isin Rqminusn
QT z =
(z1
z2
) z1 isin Rn z2 isin Rmminusn
rArr zminusAp22 = QT zminusQTAp2
2 = z1minusRp1minusSp222 +z22
2
zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)
Parameter Identification Susanna Roblitz 28
Computation of shortest solution
V = Rminus1S u = Rminus1z1
p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22
= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)
φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0
hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)
φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum
Note if n lt q2 a faster algorithm exists
Parameter Identification Susanna Roblitz 29
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 30
Problem formulation
Data(ti zi) ti isin R zi isin Rd i = 1 M
Parameters p = (p1 pq)T
Model
y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p
Measurement tolerances
δzi isin Rd Di = diag(δzi)
Define
F (p) =
Dminus11 (z1 minus y(t1 p))
Dminus1
M (zM minus y(tM p))
isin Rm m le M middot d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement: Multivariable calculus

Consider p = (p_1, …, p_q)^T ∈ R^q and a scalar function g(p): R^q → R. The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p_1, …, ∂g/∂p_q )^T

Consider a vector-valued function F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m. Its derivative is the Jacobian matrix

F′(p) = ( ∂F_1/∂p_1 ⋯ ∂F_1/∂p_q ; ⋮ ; ∂F_m/∂p_1 ⋯ ∂F_m/∂p_q ) ∈ R^{m×q}
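Since analytic Jacobians are error-prone to derive, a finite-difference cross-check is a common companion to these formulas; the helper below is an illustrative sketch, not from the slides:

```python
import numpy as np

def jacobian_fd(F, p, eps=1e-7):
    """Central finite-difference approximation of the Jacobian F'(p) in R^{m x q}."""
    p = np.asarray(p, dtype=float)
    m = np.asarray(F(p)).size
    J = np.empty((m, p.size))
    for j in range(p.size):
        e = np.zeros_like(p)
        e[j] = eps
        J[:, j] = (np.asarray(F(p + e)) - np.asarray(F(p - e))) / (2 * eps)
    return J

# Illustrative example: F(p) = (p1*p2, p1^2, sin p2) with known analytic Jacobian
F = lambda p: np.array([p[0] * p[1], p[0] ** 2, np.sin(p[1])])
Fp = lambda p: np.array([[p[1], p[0]], [2 * p[0], 0.0], [0.0, np.cos(p[1])]])
p = np.array([1.5, 0.7])
```

A mismatch between `jacobian_fd(F, p)` and the hand-derived `Fp(p)` usually indicates a sign or index error in the analytic derivative.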
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖_2^2 = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2 F′(p*)^T F(p*) = 0 and g″(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F′(p)^T F(p)

(a system of q nonlinear equations)
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p^0:

G(p) = G(p^0) + G′(p^0)(p − p^0) + O(‖p − p^0‖^2) for p → p^0
     ≈ G(p^0) + G′(p^0)(p − p^0) =: Ĝ(p)

Zero p^1 of the linear substitute map Ĝ:

G′(p^0) Δp^0 = −G(p^0),  p^1 = p^0 + Δp^0

Newton iteration:

G′(p^k) Δp^k = −G(p^k),  p^{k+1} = p^k + Δp^k

system of nonlinear equations ⇒ sequence of linear systems
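A minimal sketch of this iteration on a toy 2×2 system (the system G and all names are illustrative, not from the slides):

```python
import numpy as np

def newton(G, Gprime, p0, tol=1e-12, kmax=50):
    """Newton iteration: solve G'(p_k) dp_k = -G(p_k), then p_{k+1} = p_k + dp_k."""
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        dp = np.linalg.solve(Gprime(p), -G(p))   # one linear system per step
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# Toy system: G(p) = (p1^2 + p2^2 - 4, p1 - p2), root (sqrt(2), sqrt(2))
G = lambda p: np.array([p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1]])
Gp = lambda p: np.array([[2 * p[0], 2 * p[1]], [1.0, -1.0]])
root = newton(G, Gp, [1.0, 2.0])
```

Each step replaces the nonlinear system by the linear one given by the local Taylor model, exactly as described above.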
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)

Newton iteration in terms of F(p):

( F′(p^k)^T F′(p^k) + F″(p^k)^T F(p^k) ) Δp^k = −F′(p^k)^T F(p^k)

• compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)
• almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p^k)^T F′(p^k) Δp^k = −F′(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k
Gauss-Newton iteration
F′(p^k)^T F′(p^k) Δp^k = −F′(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k

corresponds to the normal equations of the linear least-squares problem

‖F′(p^k) Δp^k + F(p^k)‖_2 = min

Formal solution:

Δp^k = −F′(p^k)^+ F(p^k)

In case of convergence: F′(p*)^+ F(p*) = 0

If rank(F′(p)) = q, then F′(p)^+ = ( F′(p)^T F′(p) )^{−1} F′(p)^T
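A minimal sketch of the (undamped) Gauss-Newton iteration, computing Δp^k by a linear least-squares solve instead of the explicitly formed normal equations; the exponential model and all names are illustrative, not from the slides:

```python
import numpy as np

def gauss_newton(F, J, p0, kmax=20, tol=1e-12):
    """dp_k = -J(p_k)^+ F(p_k), realized by a linear least-squares solve."""
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        dp = np.linalg.lstsq(J(p), -F(p), rcond=None)[0]
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# Illustrative model y(t, p) = p1 * exp(p2 * t); compatible (noise-free) data
t = np.linspace(0.0, 1.0, 8)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z                     # residual
J = lambda p: np.column_stack([np.exp(p[1] * t),              # dF/dp1
                               p[0] * t * np.exp(p[1] * t)])  # dF/dp2
p_star = gauss_newton(F, J, [1.0, -1.0])
```

Solving the linearized least-squares problem directly (QR/SVD inside `lstsq`) avoids squaring the condition number, which forming F′^T F′ would do.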
Classification
• local GN methods: require a sufficiently good starting guess p^0
• global GN methods: can handle bad starting guesses as well
• m = q: guaranteed local convergence
• m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0:

• F: D → R^m continuously differentiable, D ⊆ R^q open and convex
• F′(p): possibly rank-deficient Jacobian
• F′ satisfies a particular Lipschitz condition (i.e., F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

  ‖F′(p̃)^+ (F′(p̄) − F′(p))(p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2^2   (i)

  for all collinear p, p̄, p̃ ∈ D (i.e., they lie on a single straight line) with p̄ − p ∈ R(F′(p̃)^+)
Assumptions
• compatibility condition (model and data must somehow fit each other):

  ‖F′(p̄)^+ (I − F′(p)F′(p)^+) F(p)‖ ≤ κ(p) ‖p̄ − p‖  ∀ p, p̄ ∈ D,
  with κ(p) ≤ κ < 1 ∀ p ∈ D   (ii)

• the initial update Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

  ‖Δp^0‖ ≤ α and B_ρ(p^0) ⊂ D   (iii)

  h := αω < 2(1 − κ),  ρ := α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* ∈ B_ρ(p^0) with

F′(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

‖p^{k+1} − p^k‖_2 ≤ (ω/2) ‖p^k − p^{k−1}‖_2^2 + κ(p^{k−1}) ‖p^k − p^{k−1}‖_2   (*)

This result shows the existence of a solution p* in the sense that F′(p*)^+ F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Compatibility
• linear least-squares problems: F(p) = Ap − z

  F′(p̄) = (Ap̄ − z)′ = A = (Ap − z)′ = F′(p)  ⇒ (i) holds with ω = 0

  F′(p̄)^+ (I − F′(p)F′(p)^+) = A^+(I − AA^+) = 0  ⇒ (ii) holds with κ(p) ≡ κ = 0

  (*) ⇒ ‖p^2 − p^1‖ ≤ 0 ⇒ Δp^1 = 0 ⇒ p^1 = p*: convergence within one iteration

• systems of nonlinear equations (m = q):

  if F′(p) is nonsingular ⇒ F′(p)^+ = F′(p)^{−1} ⇒ I − F′(p)F′(p)^+ = 0 ⇒ κ(p) = 0: quadratic convergence

• compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤ lim_{k→∞} ( (ω/2) ‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define

Δp̄^{k+1} := −F′(p^k)^+ F(p^{k+1})  (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄^{k+1}‖ ≈ ‖Δp^{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖Δp̄^{k+1}‖ ≈ κ ‖Δp^k‖,  ‖Δp̄^{k+2}‖ ≈ κ^2 ‖Δp^k‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Convergence monitor
[Semilogarithmic plot: correction norms (10^−15 … 10^0) over iteration index k (0 … 12); ordinary GN corrections vs. simplified GN corrections.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F′(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, …, p*_{π_n}, …, p*_{π_q}) in such a way that

|r_11| ≥ |r_22| ≥ ⋯ ≥ |r_nn|

→ the parameters (p*_{π_1}, p*_{π_2}, …, p*_{π_n}) can actually be estimated from the current dataset.
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u),  F(p) = F(φ(u)) =: F̄(u),  F̄′(u) = F′(p) · φ′(u)

constraint      transformation φ(u)
p > 0           p = exp(u)
A ≤ p ≤ B       p = A + (B − A)/2 · (1 + sin u)
p ≤ C           p = C + (1 − √(1 + u^2))
p ≥ C           p = C − (1 − √(1 + u^2))
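The table of transformations can be written down directly (function names are illustrative, the formulas are those of the table):

```python
import numpy as np

def positive(u):            # p > 0
    return np.exp(u)

def box(u, A, B):           # A <= p <= B
    return A + (B - A) / 2.0 * (1.0 + np.sin(u))

def upper(u, C):            # p <= C, since sqrt(1 + u^2) >= 1
    return C + (1.0 - np.sqrt(1.0 + u ** 2))

def lower(u, C):            # p >= C
    return C - (1.0 - np.sqrt(1.0 + u ** 2))
```

The optimizer then works on the unconstrained u, and every iterate maps back into the feasible region by construction.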
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

‖F(p^k) + F′(p^k) Δp^k‖_2^2 + ρ ‖Δp^k‖_2^2 = min

(F′(p^k)^T F′(p^k) + ρ I_q) Δp^k = −F′(p^k)^T F(p^k)

• ρ → 0: Levenberg-Marquardt → Gauss-Newton
• ρ > 0: (F′(p^k)^T F′(p^k) + ρ I_q) is always regular → non-uniqueness of a solution is masked, which invites incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
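One Levenberg-Marquardt correction, as a sketch (assuming NumPy; the helper name is illustrative):

```python
import numpy as np

def lm_step(J, Fk, rho):
    """One Levenberg-Marquardt correction: (J^T J + rho*I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ Fk)
```

For ρ → 0 the step approaches the Gauss-Newton correction −J^+ F; for large ρ it shrinks toward a small gradient-descent-like step, which is exactly the regularity (and masking) effect discussed above.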
Scaling
• row (measurement) scaling:

  F̄(p) = D_row^{−1} F(p),  D_row = diag(δz_1, …, δz_M)

  ‖F′(p^k) Δp^k + F(p^k)‖ = min  vs.  ‖D_row^{−1}(F′(p^k) Δp^k + F(p^k))‖ = min

  Different scaling leads to a different solution.

• column (parameter) scaling:

  p̄ = D_col^{−1} p,  H(p̄) := F(D_col p̄) = F(p),  H′(p̄) = F′(p) D_col

  With p̄^k = D_col^{−1} p^k:

  Δp̄^k = −H′(p̄^k)^+ H(p̄^k) = −(F′(p^k) D_col)^+ F(p^k)
        = −D_col^{−1} F′(p^k)^+ F(p^k) = D_col^{−1} Δp^k   (if F′(p^k) has full rank)

  No invariance in the rank-deficient case.
Globalization
Global (or damped) Gauss-Newton method:

Δp^k = −F′(p^k)^+ F(p^k),  p^{k+1} = p^k + λ_k Δp^k,  0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p^k)^+ F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1

Divergence criterion: λ_k < λ_min ≪ 1
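A minimal sketch of the damped iteration with the natural monotonicity test; the simple λ-halving corrector and the exponential test model are illustrative stand-ins for the actual predictor-corrector strategy of the codes:

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-8, tol=1e-10, kmax=50):
    """Damped GN: accept p + lam*dp only if ||dp_bar(lam)|| < ||dp||."""
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        Jk = J(p)
        dp = np.linalg.lstsq(Jk, -F(p), rcond=None)[0]       # ordinary correction
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            trial = p + lam * dp
            # simplified correction at the trial point (reuses the Jacobian Jk)
            dp_bar = np.linalg.lstsq(Jk, -F(trial), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam /= 2.0
        else:
            raise RuntimeError("divergence criterion: lambda < lambda_min")
        p = trial
    return p

# Illustrative model y(t, p) = p1 * exp(p2 * t), noise-free data
t = np.linspace(0.0, 1.0, 8)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
J = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p_fit = damped_gauss_newton(F, J, [1.0, -1.0])
```

The test compares the simplified correction at the trial point against the ordinary correction, in contrast to residual-based line searches that merely require ‖F‖ to decrease.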
Codes
• NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
• NLSCON: solver for Nonlinear Least-Squares with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

Set of experimental data: (t_1, z_1, δz_1), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d, with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)

F(p) = ( D_1^{−1}(y(t_1, p) − z_1); …; D_M^{−1}(y(t_M, p) − z_M) )

F′(p) = ( D_1^{−1} y_p(t_1, p); …; D_M^{−1} y_p(t_M, p) )

Gauss-Newton method:

Δp^k = −F′(p^k)^+ F(p^k),  p^{k+1} = p^k + λ_k Δp^k
Computation of F(p)

Requires the solution of y′ = f(y, p) at the time points t_1, …, t_M.

Problem: the time points chosen by the adaptive step-size control generally do not agree with t_1, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution values such that the integration accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Sensitivity analysis
• sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F′(p) = ( D_1^{−1} S(t_1); …; D_M^{−1} S(t_M) )

• scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y_p′ = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p,  S = [s_1, …, s_q],  s_k ∈ R^d

In practice, solve the combined system

( y′; s_1′; …; s_q′ ) = ( f; f_y s_1 + f_{p_1}; …; f_y s_q + f_{p_q} ) ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
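For the introductory reaction A + B ⇌ C + D (d = 4, q = 2), the combined system can be sketched with SciPy's `solve_ivp` standing in for LIMEX (all function names are illustrative):

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(y, p):                       # right-hand side of the reaction ODE
    A, B, C, D = y
    k1, k2 = p
    r = -k1 * A * B + k2 * C * D
    return np.array([r, r, -r, -r])

def fy(y, p):                      # d x d Jacobian of f w.r.t. y
    A, B, C, D = y
    k1, k2 = p
    row = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])
    return np.array([row, row, -row, -row])

def fp(y, p):                      # d x q Jacobian of f w.r.t. p
    A, B, C, D = y
    col = np.array([-A * B, C * D])
    return np.array([col, col, -col, -col])

def combined(t, w, p, d=4, q=2):
    """w = (y, s_1, ..., s_q) stacked; s_k' = f_y s_k + f_{p_k}."""
    y, S = w[:d], w[d:].reshape(q, d).T        # S in R^{d x q}
    Sdot = fy(y, p) @ S + fp(y, p)
    return np.concatenate([f(y, p), Sdot.T.ravel()])

y0 = np.array([2.0, 1.0, 0.5, 0.0])
p = np.array([0.2, 0.5])
w0 = np.concatenate([y0, np.zeros(8)])         # y_p(0) = 0
sol = solve_ivp(combined, (0.0, 30.0), w0, args=(p,), rtol=1e-8, atol=1e-10)
S_T = sol.y[4:, -1].reshape(2, 4).T            # sensitivities at t = 30
```

A finite-difference perturbation of k_1 should reproduce the first column of S(t); that is a cheap consistency check on the variational equations.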
Example
Case (T+E), transient and equilibrium phase measured:

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   4.81e-02   3.55e-01    —      2
1   1.11e-01   4.57e-01   1.000   2
    2.55e-02   1.75e-01   0.389   2
2   1.04e-02   4.22e-02   1.000   2
3   9.16e-04   1.90e-03   1.000   2
4   8.00e-06   1.99e-05   1.000   2
5   2.32e-07   1.54e-09   1.000   2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008, sc(F′(p*)) = 63

Case (E), equilibrium phase only:

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   3.85e-02   2.96e-02    —      1
1   2.40e-03   1.85e-03   1.000   1
2   1.11e-05   7.67e-06   1.000   1
3   6.67e-06   6.75e-11   1.000   1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004, sc(F′(p**)) = 4.5×10^6

[Two plots: concentrations of A, B, C, D over time (0–30) for the fitted parameters in case (T+E) and case (E).]
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
• SBML import and export
• deterministic simulation methods (LIMEX; others will follow)
• sensitivity analysis based on variational equations
• parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
• output of information on the identifiability of parameters
• time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact:
Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Penrose axioms
Let A ∈ R^{m×q}, rank(A) = n ≤ q ≤ m. Then A^+ is uniquely defined by

(i) (A^+A)^T = A^+A
(ii) (AA^+)^T = AA^+
(iii) A^+AA^+ = A^+
(iv) AA^+A = A

One can show: p* = A^+ z is the shortest solution, i.e.,

‖p*‖_2 ≤ ‖p‖_2  ∀ p ∈ L(z)

with

L(z) = { p ∈ R^q : ‖z − Ap‖_2 = min } = p* + N(A)
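The four axioms are easy to check numerically against NumPy's `np.linalg.pinv` (the random test matrix is illustrative):

```python
import numpy as np

# Check the four Penrose axioms for the Moore-Penrose pseudoinverse
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))  # rank-deficient
Ap = np.linalg.pinv(A)

assert np.allclose((Ap @ A).T, Ap @ A)   # (i)
assert np.allclose((A @ Ap).T, A @ Ap)   # (ii)
assert np.allclose(Ap @ A @ Ap, Ap)      # (iii)
assert np.allclose(A @ Ap @ A, A)        # (iv)

# A^+ z is the shortest least-squares solution (matches np.linalg.lstsq)
z = rng.standard_normal(6)
assert np.allclose(Ap @ z, np.linalg.lstsq(A, z, rcond=None)[0])
```

Both `pinv` and `lstsq` work via the SVD, which is why they agree on the minimum-norm solution even for rank-deficient A.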
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Computation of shortest solution

V = R^{-1} S,  u = R^{-1} z_1

||p||_2^2 = ||Π^T p||_2^2 = ||p_1||_2^2 + ||p_2||_2^2 = ||u − V p_2||_2^2 + ||p_2||_2^2
          = ||u||_2^2 − 2⟨u, V p_2⟩ + ⟨V p_2, V p_2⟩ + ⟨p_2, p_2⟩
          = ||u||_2^2 + ⟨p_2, (I + V^T V) p_2 − 2 V^T u⟩ =: φ(p_2)

φ(p_2) = min if φ'(p_2) = 2(I + V^T V) p_2 − 2 V^T u = 0

⇔ (I + V^T V) p_2 = V^T u   (solution by Cholesky decomposition)

φ''(p_2) = 2(I + V^T V) symmetric positive definite ⇒ p_2 is a minimum

Note: if n < q/2, a faster algorithm exists.
Parameter Identification Susanna Roblitz 29
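The Cholesky-based recipe above can be sketched in NumPy/SciPy. This is a minimal illustration under the assumption that the numerical rank n is already known from the pivoted QR rank decision; it is not the PARKIN implementation. The result can be checked against the minimum-norm solution produced by the pseudo-inverse:

```python
import numpy as np
from scipy.linalg import qr, cho_factor, cho_solve

def min_norm_lsq(A, z, n):
    """Minimum-norm least-squares solution of A p ~ z when rank(A) = n < q,
    via pivoted QR and the Cholesky-solved system (I + V^T V) p2 = V^T u."""
    m, q = A.shape
    Q, R, perm = qr(A, mode="economic", pivoting=True)
    Rbar, S = R[:n, :n], R[:n, n:]              # R = [Rbar S; 0 0] after rank decision
    u = np.linalg.solve(Rbar, (Q.T @ z)[:n])    # u = Rbar^{-1} z_1
    V = np.linalg.solve(Rbar, S)                # V = Rbar^{-1} S
    c = cho_factor(np.eye(q - n) + V.T @ V)     # s.p.d. -> Cholesky factorization
    p2 = cho_solve(c, V.T @ u)
    p1 = u - V @ p2                             # general solution p1 = u - V p2
    p = np.empty(q)
    p[perm] = np.concatenate([p1, p2])          # undo the column pivoting
    return p

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2, q = 4
z = rng.standard_normal(6)
p = min_norm_lsq(A, z, n=2)
assert np.allclose(p, np.linalg.pinv(A) @ z)   # pinv also yields the shortest solution
```

The Cholesky route only factors a (q − n) × (q − n) matrix, which is why it pays off when the rank deficiency is small.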
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, …, M

Parameters: p = (p_1, …, p_q)^T

Model: y(t, p): R × R^q → R^d, nonlinear w.r.t. p

Measurement tolerances: δz_i ∈ R^d, D_i = diag(δz_i)

Define

F(p) = ( D_1^{-1}(z_1 − y(t_1, p)), …, D_M^{-1}(z_M − y(t_M, p)) )^T ∈ R^m,  m ≤ M · d
Problem formulation
Nonlinear least-squares problem:

Σ_{i=1}^{M} ||D_i^{-1}(z_i − y(t_i, p))||_2^2 = F(p)^T F(p) = ||F(p)||_2^2 = min

Nonlinear least-squares problem:
Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that

||F(p*)||_2^2 = min_{p ∈ D} ||F(p)||_2^2

The nonlinear least-squares problem is called compatible if F(p*) = 0.
Supplement: Multivariable calculus

Consider p = (p_1, …, p_q)^T ∈ R^q and a scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector):

g'(p) = ∇g(p) = ( ∂g/∂p_1, …, ∂g/∂p_q )^T

Consider the vector function F(p) = (F_1(p), …, F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix

F'(p) = [ ∂F_1/∂p_1  …  ∂F_1/∂p_q
             ⋮               ⋮
          ∂F_m/∂p_1  …  ∂F_m/∂p_q ] ∈ R^{m×q}
Problem formulation
Minimization problem:

g(p) = ||F(p)||_2^2 = min

Sufficient condition for p* to be a local minimum:

g'(p*) = 2 F'(p*)^T F(p*) = 0  and  g''(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F'(p)^T F(p)

(a system of q nonlinear equations).
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p^0:

G(p) = G(p^0) + G'(p^0)(p − p^0) + o(||p − p^0||)  for p → p^0
     ≈ G(p^0) + G'(p^0)(p − p^0) =: Ḡ(p)

Zero p^1 of the linear substitute map Ḡ:

G'(p^0) Δp^0 = −G(p^0),  p^1 = p^0 + Δp^0

Newton iteration:

G'(p^k) Δp^k = −G(p^k),  p^{k+1} = p^k + Δp^k

One system of nonlinear equations ⇒ a sequence of linear systems.
Gauss-Newton iteration
G'(p) = F'(p)^T F'(p) + F''(p)^T F(p)

Iteration in terms of F(p):

( F'(p^k)^T F'(p^k) + F''(p^k)^T F(p^k) ) Δp^k = −F'(p^k)^T F(p^k)

- compatible case: F(p*) = 0 ⇒ G'(p*) = F'(p*)^T F'(p*)
- almost compatible case: F(p*) is "small" ⇒ drop the second-order term and save the effort of evaluating F''(p)

⇒ Gauss-Newton iteration:

F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k
Gauss-Newton iteration
F'(p^k)^T F'(p^k) Δp^k = −F'(p^k)^T F(p^k),  p^{k+1} = p^k + Δp^k

corresponds to the normal equations of the linear least-squares problem

||F'(p^k) Δp^k + F(p^k)||_2 = min

Formal solution:

Δp^k = −F'(p^k)^+ F(p^k)

In case of convergence: F'(p*)^+ F(p*) = 0.

If rank(F'(p)) = q, then F'(p)^+ = ( F'(p)^T F'(p) )^{-1} F'(p)^T.
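A minimal sketch of the ordinary Gauss-Newton iteration: each step solves the linear least-squares problem via `numpy.linalg.lstsq`, i.e. it applies the pseudo-inverse Δp^k = −F'(p^k)^+ F(p^k). The exponential toy model, data, and starting guess are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

def gauss_newton(F, J, p0, tol=1e-10, max_iter=50):
    """Ordinary Gauss-Newton: each step solves min ||J(p_k) dp + F(p_k)||_2,
    here via lstsq, which computes dp = -J(p_k)^+ F(p_k)."""
    p = np.asarray(p0, float)
    for _ in range(max_iter):
        dp = -np.linalg.lstsq(J(p), F(p), rcond=None)[0]
        p = p + dp
        if np.linalg.norm(dp) <= tol * (1.0 + np.linalg.norm(p)):
            break
    return p

# toy model y(t, p) = p0 * exp(p1 * t); compatible data, so F(p*) = 0
t = np.linspace(0.0, 1.0, 8)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
J = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p_star = gauss_newton(F, J, p0=[1.8, -1.3])
assert np.allclose(p_star, [2.0, -1.5])
```

Because the problem is compatible and the starting guess is good, the iteration converges quadratically here, matching the classification on the following slides.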
Classification
- local GN methods: require a sufficiently good starting guess p^0
- global GN methods: can handle bad guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, the so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0:

- F: D → R^m continuously differentiable, D ⊆ R^q open and convex
- F'(p): possibly rank-deficient Jacobian
- F' satisfies a particular Lipschitz condition (i.e., F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

||F'(p̂)^+ (F'(p) − F'(p̄))(p − p̄)||_2 ≤ ω ||p − p̄||_2^2     (i)

for all collinear p, p̄, p̂ ∈ D (i.e., they lie on a single straight line) with p − p̄ ∈ R(F'(p̂)^+)
Assumptions
- compatibility condition (model and data must somehow fit each other):

||F'(p̄)^+ (I − F'(p) F'(p)^+) F(p)||_2 ≤ κ(p) ||p̄ − p||_2   ∀ p, p̄ ∈ D,
κ(p) ≤ κ < 1   ∀ p ∈ D     (ii)

- the initial update Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

||Δp^0|| ≤ α  and  B_ρ(p^0) ⊂ D     (iii)

h := αω < 2(1 − κ),   ρ := α / (1 − κ − h/2)     (iv)
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* ∈ B_ρ(p^0) with

F'(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

||p^{k+1} − p^k||_2 ≤ (ω/2) ||p^k − p^{k−1}||_2^2 + κ(p^{k−1}) ||p^k − p^{k−1}||_2     (*)

This result shows existence of a solution p* in the sense that F'(p*)^+ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness requires a full-rank Jacobian.
Compatibility
- linear least-squares problems, F(p) = Ap − z:

F'(p) = (Ap − z)' = A = (Ap̄ − z)' = F'(p̄)  ⇒  ω = 0     (i)

F'(p̄)^+ (I − F'(p) F'(p)^+) = A^+ (I − A A^+) = 0  ⇒  κ(p) = κ = 0     (ii)

(*) ⇒ ||p^2 − p^1|| ≤ 0 ⇒ Δp^1 = 0 ⇒ p^1 = p*

convergence within one iteration

- systems of nonlinear equations (m = q):

if F'(p) is nonsingular, F'(p)^+ = F'(p)^{-1} ⇒ I − F'(p) F'(p)^+ = 0 ⇒ κ(p) = 0

quadratic convergence

- compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ||p^{k+1} − p^k|| / ||p^k − p^{k−1}|| ≤ lim_{k→∞} ( (ω/2) ||p^k − p^{k−1}|| + κ(p^{k−1}) ) = κ(p*)     (by (*))

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary:
The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

Δp̄^{k+1} = −F'(p^k)^+ F(p^{k+1})

Stop the iteration as soon as linear convergence dominates the iteration process:

||Δp^{k+1}|| ≈ ||Δp̄^{k+1}|| ≤ XTOL

For incompatible problems in the convergent phase it holds that

||Δp^{k+1}|| ≈ κ ||Δp^k||,   ||Δp̄^{k+1}|| ≈ κ^2 ||Δp^k||

This provides an estimate for κ(p*):

κ(p*) ≈ ||Δp^{k+1}|| / ||Δp^k||
Convergence monitor
[Figure: semilogarithmic plot of correction norms versus iteration index k (from 10^0 down to 10^{-15}): ordinary GN corrections vs. simplified GN corrections]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in the case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F'(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, …, p*_{π_n}, …, p*_{π_q}) in such a way that

|r_11| ≥ |r_22| ≥ … ≥ |r_nn|

→ the parameters (p*_{π_1}, p*_{π_2}, …, p*_{π_n}) can actually be estimated from the current dataset.
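The rank decision above can be sketched with SciPy's pivoted QR. The Jacobian below is a hypothetical example in which the columns for k1 and k2 are exactly dependent (only their ratio would be identifiable) while k3 is independent; the tolerance `tol` is an assumed rank-decision threshold:

```python
import numpy as np
from scipy.linalg import qr

def identifiable_parameters(Jac, names, tol=1e-8):
    """Rank decision via QR with column pivoting: parameters whose pivoted
    diagonal entries satisfy |r_ii| > tol * |r_11| are deemed estimable."""
    _, R, perm = qr(Jac, mode="economic", pivoting=True)
    d = np.abs(np.diag(R))            # |r_11| >= |r_22| >= ... by pivoting
    n = int(np.sum(d > tol * d[0]))   # numerical rank
    return [names[j] for j in perm[:n]]

# hypothetical sensitivity Jacobian: column(k2) = 2 * column(k1), k3 independent
c = np.random.default_rng(1).standard_normal(5)
b = np.random.default_rng(2).standard_normal(5)
Jac = np.column_stack([c, 2.0 * c, b])
est = identifiable_parameters(Jac, ["k1", "k2", "k3"])
assert len(est) == 2 and "k3" in est   # rank 2: k3 plus one of k1/k2
```

Pivoting picks the dominant column of each dependent pair first, which is exactly the reordering (p*_{π_1}, …, p*_{π_n}) described on the slide.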
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function

p = φ(u),  F̃(u) := F(φ(u)),  F̃'(u) = F'(p) φ'(u)

constraint        transformation φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u^2))
p ≥ C             p = C − (1 − √(1 + u^2))
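The transformations in the table can be checked numerically. A, B, C below are example bounds chosen for illustration:

```python
import numpy as np

# unconstrained u -> constrained p, following the table above (A, B, C assumed)
A, B, C = 0.5, 3.0, 1.0
to_pos     = lambda u: np.exp(u)                           # p > 0
to_box     = lambda u: A + (B - A) / 2 * (1 + np.sin(u))   # A <= p <= B
to_below_C = lambda u: C + (1 - np.sqrt(1 + u**2))         # p <= C
to_above_C = lambda u: C - (1 - np.sqrt(1 + u**2))         # p >= C

u = np.linspace(-5.0, 5.0, 101)
assert np.all(to_pos(u) > 0)
assert np.all((A <= to_box(u)) & (to_box(u) <= B))         # sin u in [-1, 1]
assert np.all(to_below_C(u) <= C)                          # sqrt(1+u^2) >= 1
assert np.all(to_above_C(u) >= C)
```

The optimizer then iterates in u without constraints, and the chain-rule factor φ'(u) enters the Jacobian as F̃'(u) = F'(p) φ'(u).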
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

||F(p^k) + F'(p^k) Δp^k||_2^2 + ρ ||Δp^k||_2^2 = min

(F'(p^k)^T F'(p^k) + ρ I_q) Δp^k = −F'(p^k)^T F(p^k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F'(p^k)^T F'(p^k) + ρ I_q) is always regular → masking of the non-uniqueness of a given solution and incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g'(p) = 0
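A sketch of a single Levenberg-Marquardt correction on random illustrative data, showing the two limits stated above: for ρ → 0 the step reduces to the Gauss-Newton correction, and larger ρ shrinks the step:

```python
import numpy as np

def lm_step(Jk, Fk, rho):
    """Levenberg-Marquardt correction: solve (J^T J + rho I) dp = -J^T F."""
    q = Jk.shape[1]
    return np.linalg.solve(Jk.T @ Jk + rho * np.eye(q), -Jk.T @ Fk)

rng = np.random.default_rng(3)
Jk, Fk = rng.standard_normal((6, 3)), rng.standard_normal(6)
gn = -np.linalg.lstsq(Jk, Fk, rcond=None)[0]                       # GN correction
assert np.allclose(lm_step(Jk, Fk, 1e-12), gn)                     # rho -> 0: GN
assert np.linalg.norm(lm_step(Jk, Fk, 10.0)) < np.linalg.norm(gn)  # damping shrinks dp
```

The regularized matrix is regular even when F'(p^k) is rank-deficient, which is precisely the masking effect warned about on the slide.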
Scaling
- row (measurement) scaling:

F̃(p) = D_row^{-1} F(p),  D_row = diag(δz_1, …, δz_M)

||F'(p^k) Δp^k + F(p^k)|| = min   vs.   ||D_row^{-1} (F'(p^k) Δp^k + F(p^k))|| = min

Different scaling leads to a different solution.

- column (parameter) scaling:

p̄ = D_col^{-1} p,  H(p̄) := F(D_col p̄) = F(p),  H'(p̄) = F'(p) D_col

Δp̄^k = −H'(p̄^k)^+ H(p̄^k) = −(F'(p^k) D_col)^+ F(p^k)
      = [for F'(p^k) of full rank] = −D_col^{-1} F'(p^k)^+ F(p^k) = D_col^{-1} Δp^k

No invariance in the rank-deficient case.
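The full-rank invariance under column scaling can be verified directly on illustrative random data: the scaled correction Δp̄ = D_col^{-1} Δp reproduces the same step in the original parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
Jk, Fk = rng.standard_normal((6, 3)), rng.standard_normal(6)  # full-rank Jacobian
Dcol = np.diag([10.0, 0.1, 3.0])                              # example column scaling

dp  = -np.linalg.pinv(Jk) @ Fk          # correction in original parameters
dpb = -np.linalg.pinv(Jk @ Dcol) @ Fk   # correction in scaled parameters

# (J D)^+ = D^{-1} J^+ holds for full column rank, so D_col dpb recovers dp
assert np.allclose(Dcol @ dpb, dp)
```

In the rank-deficient case the pseudo-inverse identity breaks down and the iterates do depend on the chosen scaling, as the slide notes.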
Globalization
Global (or damped) Gauss-Newton method:

Δp^k = −F'(p^k)^+ F(p^k),  p^{k+1} = p^k + λ_k Δp^k,  0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F'(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ||F'(p^k)^+ F(p^k)|| decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

||Δp̄^{k+1}(λ_k)|| / ||Δp^k|| < 1

divergence criterion: λ_k < λ_min ≪ 1
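A sketch of the damped iteration with a simple λ-halving strategy in place of the predictor-corrector strategy used in the actual codes; the toy model and starting guess are illustrative assumptions. The acceptance rule is the natural monotonicity test from the slide, evaluated on the simplified correction Δp̄^{k+1}(λ) = −F'(p^k)^+ F(p^k + λ Δp^k):

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-4, tol=1e-10, max_iter=100):
    """Damped GN sketch: halve lambda until ||dp_bar(lam)|| < ||dp|| holds."""
    p = np.asarray(p0, float)
    for _ in range(max_iter):
        Jk = J(p)
        dp = -np.linalg.lstsq(Jk, F(p), rcond=None)[0]
        if np.linalg.norm(dp) <= tol:
            break
        lam = 1.0
        while lam >= lam_min:
            dp_bar = -np.linalg.lstsq(Jk, F(p + lam * dp), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # natural monotonicity test
                break
            lam /= 2.0
        else:
            raise RuntimeError("divergence: lambda < lam_min")
        p = p + lam * dp
    return p

# illustrative compatible exponential fit, as before
t = np.linspace(0.0, 1.0, 8)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
J = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p_fit = damped_gauss_newton(F, J, [1.0, -0.5])
assert np.allclose(p_fit, [2.0, -1.5])
```

Reusing F'(p^k)^+ for the test means each trial λ costs only one residual evaluation plus a cheap linear solve, not a new Jacobian.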
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares problems
- NLSCON: solver for Nonlinear Least-Squares problems with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

model: y' = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

set of experimental data (t_1, z_1, δz_1), …, (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_i = diag(δz_i)

F(p) = ( D_1^{-1}(y(t_1, p) − z_1), …, D_M^{-1}(y(t_M, p) − z_M) )^T

F'(p) = ( D_1^{-1} y_p(t_1, p), …, D_M^{-1} y_p(t_M, p) )^T

Gauss-Newton method:

Δp^k = −F'(p^k)^+ F(p^k),  p^{k+1} = p^k + λ_k Δp^k
Computation of F (p)
requires the solution of y' = f(y, p) at the time points t_1, …, t_M

Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, …, t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Sensitivity analysis
- sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F'(p) = ( D_1^{-1} S(t_1), …, D_M^{-1} S(t_M) )^T

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation

y_p' = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0

Compact notation:

S' = f_y · S + f_p,  S = [s_1, …, s_q],  s_k ∈ R^d

In practice, solve the combined system

( y', s_1', …, s_q' )^T = ( f, f_y · s_1 + f_{p_1}, …, f_y · s_q + f_{p_q} )^T ∈ R^{(q+1)d}

This can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
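For the introductory A + B ⇌ C + D example (d = 4, q = 2 parameters k1, k2), the combined system can be sketched with SciPy's `solve_ivp` standing in for LIMEX. The sensitivities s_j = ∂y/∂k_j solve s_j' = f_y s_j + f_{k_j}, s_j(0) = 0; a finite-difference probe checks the result:

```python
import numpy as np
from scipy.integrate import solve_ivp

# combined state Y = (y, s1, s2) in R^{(q+1)d} = R^12 for A + B <=> C + D
def rhs(t, Y, k1, k2):
    y, s1, s2 = Y[:4], Y[4:8], Y[8:12]
    A, B, C, D = y
    sign = np.array([-1.0, -1.0, 1.0, 1.0])        # A' = B' = -r, C' = D' = +r
    f = sign * (k1 * A * B - k2 * C * D)           # r = k1*A*B - k2*C*D
    grad_r = np.array([k1 * B, k1 * A, -k2 * D, -k2 * C])  # dr/dy
    fy = np.outer(sign, grad_r)                    # Jacobian f_y
    fk1 = sign * (A * B)                           # f_{k1}: dr/dk1 = A*B
    fk2 = sign * (-C * D)                          # f_{k2}: dr/dk2 = -C*D
    return np.concatenate([f, fy @ s1 + fk1, fy @ s2 + fk2])

y0, k1, k2 = [2.0, 1.0, 0.5, 0.0], 0.2, 0.5       # values from the intro example
Y0 = np.concatenate([y0, np.zeros(8)])
sol = solve_ivp(rhs, (0.0, 10.0), Y0, args=(k1, k2), rtol=1e-9, atol=1e-12)
S = sol.y[4:12, -1].reshape(2, 4).T               # S(t_end) in R^{4x2}

# central finite-difference check of dy/dk1 at t_end
eps = 1e-4
yp = solve_ivp(lambda t, y: rhs(t, np.concatenate([y, np.zeros(8)]), k1 + eps, k2)[:4],
               (0.0, 10.0), y0, rtol=1e-9, atol=1e-12).y[:, -1]
ym = solve_ivp(lambda t, y: rhs(t, np.concatenate([y, np.zeros(8)]), k1 - eps, k2)[:4],
               (0.0, 10.0), y0, rtol=1e-9, atol=1e-12).y[:, -1]
fd = (yp - ym) / (2 * eps)
assert np.allclose(S[:, 0], fd, atol=1e-3)
```

Unlike the finite-difference probe, the variational route delivers all q sensitivity columns in a single integration, which is what makes the (q+1)d combined system attractive.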
Example
k    ||F(p^k)||   ||Δp^k||    λ_k     rank
0    4.81e-02     3.55e-01            2
1    1.11e-01     4.57e-01    1.000   2
     2.55e-02     1.75e-01    0.389
2    1.04e-02     4.22e-02    1.000   2
3    9.16e-04     1.90e-03    1.000   2
4    8.00e-06     1.99e-05    1.000   2
5    2.321e-07    1.54e-09    1.000   2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008,  sc(F'(p*)) = 6.3

k    ||F(p^k)||   ||Δp^k||    λ_k     rank
0    3.85e-02     2.96e-02            1
1    2.40e-03     1.85e-03    1.000   1
2    1.11e-05     7.67e-06    1.000   1
3    6.67e-06     6.75e-11    1.000   1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004,  sc(F'(p**)) = 4.5 × 10^6

[Figure: two concentration-time plots (species A, B, C, D, t ∈ [0, 30], concentrations 0 to 2) showing the fitted model for the two cases]
BioPARKIN
An integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
- output of information on the identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Problem formulation
Data: (t_i, z_i), t_i ∈ R, z_i ∈ R^d, i = 1, ..., M

Parameters: p = (p_1, ..., p_q)^T

Model:

y(t, p): R × R^q → R^d, nonlinear w.r.t. p

Measurement tolerances:

δz_i ∈ R^d, D_i = diag(δz_i)

Define

F(p) = ( D_1^{-1}(z_1 − y(t_1, p)) ; ... ; D_M^{-1}(z_M − y(t_M, p)) ) ∈ R^m, m ≤ M · d
Parameter Identification Susanna Roblitz 31
Problem formulation
Nonlinear least-squares problem:

Σ_{i=1}^{M} ‖D_i^{-1}(z_i − y(t_i, p))‖_2^2 = F(p)^T F(p) = ‖F(p)‖_2^2 = min

Nonlinear least-squares problem: Given a mapping F: D ⊆ R^q → R^m with m > q, find a solution p* ∈ D such that

‖F(p*)‖_2^2 = min_{p ∈ D} ‖F(p)‖_2^2

The nonlinear least-squares problem is called compatible if

F(p*) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p_1, ..., p_q)^T ∈ R^q and the scalar function g(p): R^q → R.
The derivative of g is the gradient (a column vector)

g′(p) = ∇g(p) = ( ∂g/∂p_1, ..., ∂g/∂p_q )^T

Consider the vector F(p) = (F_1(p), ..., F_m(p))^T: R^q → R^m.
Its derivative is the Jacobian matrix

F′(p) = ( ∂F_1/∂p_1 ⋯ ∂F_1/∂p_q ; ⋮ ; ∂F_m/∂p_1 ⋯ ∂F_m/∂p_q ) ∈ R^{m×q}
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:

g(p) = ‖F(p)‖_2^2 = min

Sufficient condition for p* to be a local minimum:

g′(p*) = 2F′(p*)^T F(p*) = 0 and g″(p*) positive definite

Find, by Newton iteration, a zero of the function G: R^q → R^q,

G(p) = F′(p)^T F(p)

(a system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor series expansion around a starting guess p^0:

G(p) = G(p^0) + G′(p^0)(p − p^0) + o(‖p − p^0‖) for p → p^0
≈ G(p^0) + G′(p^0)(p − p^0) =: Ĝ(p)

Zero p^1 of the linear substitute map Ĝ:

G′(p^0)Δp^0 = −G(p^0), p^1 = p^0 + Δp^0

Newton iteration:

G′(p^k)Δp^k = −G(p^k), p^{k+1} = p^k + Δp^k
system of nonlinear equations ⇒ sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G′(p) = F′(p)^T F′(p) + F″(p)^T F(p)

Iteration in terms of F(p):

( F′(p^k)^T F′(p^k) + F″(p^k)^T F(p^k) ) Δp^k = −F′(p^k)^T F(p^k)

I compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)^T F′(p*)

I almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)

⇒ Gauss-Newton iteration:

F′(p^k)^T F′(p^k) Δp^k = −F′(p^k)^T F(p^k), p^{k+1} = p^k + Δp^k
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F′(p^k)^T F′(p^k) Δp^k = −F′(p^k)^T F(p^k), p^{k+1} = p^k + Δp^k

corresponds to the normal equations for the linear least-squares problem

‖F′(p^k)Δp^k + F(p^k)‖_2 = min

formal solution:

Δp^k = −F′(p^k)^+ F(p^k)

in case of convergence: F′(p*)^+ F(p*) = 0

if rank(F′(p)) = q ⇒ F′(p)^+ = ( F′(p)^T F′(p) )^{-1} F′(p)^T
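In practice the correction Δp^k = −F′(p^k)^+ F(p^k) is computed by a QR-based least-squares solve rather than by forming the normal equations. A small sketch in Python/NumPy; the exponential-fit test problem and starting guess are hypothetical illustrations (the slides themselves use the LIMEX/NLSCON codes, not this script):

```python
import numpy as np

def gauss_newton(F, Fprime, p0, max_iter=100, tol=1e-12):
    """Undamped Gauss-Newton: solve ||F'(p_k) dp + F(p_k)|| = min via lstsq."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp, *_ = np.linalg.lstsq(Fprime(p), -F(p), rcond=None)  # dp = -F'(p)^+ F(p)
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# Hypothetical compatible test problem: fit y = a*exp(b*t) to exact data
t = np.linspace(0, 1, 5)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
Fprime = lambda p: np.column_stack([np.exp(p[1] * t),
                                    p[0] * t * np.exp(p[1] * t)])
p_star = gauss_newton(F, Fprime, [1.5, -1.2])
assert np.allclose(p_star, [2.0, -1.5], atol=1e-8)
```

Since the problem is compatible (F(p*) = 0), the iteration converges quadratically near the solution.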
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods: require a sufficiently good starting guess p^0

I global GN methods: can handle bad guesses as well

I m = q: guaranteed local convergence

I m > q: convergence only for a small subclass of nonlinear LSQ problems, so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p^0:

I F: D → R^m continuously differentiable, D ⊆ R^q open and convex

I F′(p): possibly rank-deficient Jacobian

I F′ satisfies a particular Lipschitz condition (i.e., F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F′(p̂)^+ ( F′(p̄) − F′(p) )(p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2^2   (i)

for all collinear p, p̄, p̂ ∈ D (i.e., they lie on a single straight line) with p̄ − p ∈ R(F′(p̂)^+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehow fit each other):

‖F′(p̄)^+ ( I − F′(p)F′(p)^+ ) F(p)‖ ≤ κ(p) ‖p̄ − p‖  ∀ p, p̄ ∈ D,

κ(p) ≤ κ < 1  ∀ p ∈ D   (ii)

I the initial update Δp^0 must be small enough, and a ball B_ρ(p^0) with radius ρ around p^0 must be contained in D:

‖Δp^0‖ ≤ α and B_ρ(p^0) ⊂ D   (iii)

h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2)   (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions, the sequence {p^k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p^0), and converges to some p* ∈ B_ρ(p^0) with

F′(p*)^+ F(p*) = 0

The convergence rate can be estimated according to

‖p^{k+1} − p^k‖_2 ≤ (ω/2) ‖p^k − p^{k−1}‖_2^2 + κ(p^{k−1}) ‖p^k − p^{k−1}‖_2   (*)

This result shows existence of a solution p* in the sense that F′(p*)^+ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least-squares problems: F(p) = Ap − z

F′(p) = (Ap − z)′ = A for all p ⇒ (i) holds with ω = 0

F′(p̄)^+ ( I − F′(p)F′(p)^+ ) = A^+ (I − AA^+) = 0 ⇒ (ii) holds with κ(p) ≡ κ = 0

(*) ⇒ ‖p^2 − p^1‖ ≤ 0 ⇒ Δp^1 = 0 ⇒ p^1 = p*:
convergence within one iteration

I systems of nonlinear equations (m = q):

if F′(p) nonsingular ⇒ F′(p)^+ = F′(p)^{-1} ⇒ I − F′(p)F′(p)^+ = 0 ⇒ κ(p) ≡ 0:
quadratic convergence

I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
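The one-iteration convergence for linear least-squares problems can be verified directly in a few lines of NumPy; A, z, and p^0 below are arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))      # m=6 > q=3, full column rank (a.s.)
z = rng.standard_normal(6)
p0 = rng.standard_normal(3)          # arbitrary starting guess

# One Gauss-Newton step: dp = -F'(p)^+ F(p) with F(p) = Ap - z, F'(p) = A
dp = np.linalg.pinv(A) @ -(A @ p0 - z)
p1 = p0 + dp

# Compare with the least-squares solution computed directly
p_star = np.linalg.lstsq(A, z, rcond=None)[0]
assert np.allclose(p1, p_star)       # convergence within one iteration
```

The reason is visible algebraically: p^1 = (I − A^+A) p^0 + A^+ z, and for full column rank A^+A = I, so p^1 = A^+ z regardless of p^0.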
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p^{k+1} − p^k‖ / ‖p^k − p^{k−1}‖ ≤(*) lim_{k→∞} ( (ω/2) ‖p^k − p^{k−1}‖ + κ(p^{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor: termination criteria

Define the simplified Gauss-Newton corrections

Δp̄^{k+1} := −F′(p^k)^+ F(p^{k+1})

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄^{k+1}‖ ≈ ‖Δp^{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖Δp̄^{k+1}‖ ≈ κ ‖Δp^k‖ ≈ κ^2 ‖Δp^{k−1}‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄^{k+1}‖ / ‖Δp^k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor
[Figure: norms of the ordinary and simplified GN corrections versus iteration index k, log scale from 10^0 down to 10^{−15}]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p^0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F′(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, ..., p*_{π_n}, ..., p*_{π_q}) in such a way that

|r_{11}| ≥ |r_{22}| ≥ ⋯ ≥ |r_{nn}|

→ the parameters (p*_{π_1}, p*_{π_2}, ..., p*_{π_n}) can actually be estimated from the current dataset
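The same rank decision can be reproduced with SciPy's pivoted QR. The Jacobian below is a hypothetical rank-2 example with q = 3 parameters (the third column is a multiple of the first):

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical Jacobian F'(p*): q = 3 parameters, but only rank 2
J = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 4.0]])

# Q^T J Pi = R with |r_11| >= |r_22| >= ... on the diagonal of R
Q, R, perm = qr(J, pivoting=True)
diag = np.abs(np.diag(R))
rank = int(np.sum(diag > 1e-10 * diag[0]))   # rank decision by diagonal decay

assert rank == 2
print("estimable parameter indices (pi_1, ..., pi_n):", perm[:rank])
```

`perm` is the column permutation Π; its first `rank` entries index the parameters that can be estimated from the current data.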
Parameter Identification Susanna Roblitz 47
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u), F̃(u) := F(φ(u)), F̃′(u) = F′(φ(u)) φ′(u)

constraint: transformation p = φ(u)

p > 0:       p = exp(u)
A ≤ p ≤ B:   p = A + (B − A)/2 · (1 + sin u)
p ≤ C:       p = C + (1 − √(1 + u^2))
p ≥ C:       p = C − (1 − √(1 + u^2))
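The transformations in the table are easy to check numerically. A sketch in Python/NumPy with hypothetical bounds A, B, C:

```python
import numpy as np

# Transformations p = phi(u) from the table above
def phi_positive(u):             # p > 0
    return np.exp(u)

def phi_box(u, A, B):            # A <= p <= B
    return A + (B - A) / 2.0 * (1.0 + np.sin(u))

def phi_upper(u, C):             # p <= C  (sqrt(1+u^2) >= 1)
    return C + (1.0 - np.sqrt(1.0 + u**2))

def phi_lower(u, C):             # p >= C
    return C - (1.0 - np.sqrt(1.0 + u**2))

# Each constraint holds for all unconstrained u
u = np.linspace(-10.0, 10.0, 1001)
assert np.all(phi_positive(u) > 0)
assert np.all((phi_box(u, 2.0, 5.0) >= 2.0) & (phi_box(u, 2.0, 5.0) <= 5.0))
assert np.all(phi_upper(u, 1.0) <= 1.0)
assert np.all(phi_lower(u, 1.0) >= 1.0)
```

The optimizer then iterates on u, and F̃′(u) = F′(φ(u)) φ′(u) is obtained by the chain rule.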
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p^k:

‖F(p^k) + F′(p^k)Δp^k‖_2^2 + ρ ‖Δp^k‖_2^2 = min

( F′(p^k)^T F′(p^k) + ρ I_q ) Δp^k = −F′(p^k)^T F(p^k)

I ρ → 0: Levenberg-Marquardt → Gauss-Newton

I ρ > 0: ( F′(p^k)^T F′(p^k) + ρ I_q ) is always regular → masking of the uniqueness of a given solution and incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
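A minimal sketch of one Levenberg-Marquardt correction next to the Gauss-Newton correction, illustrating the limit ρ → 0 on a random full-rank Jacobian (NumPy, illustrative data only):

```python
import numpy as np

def lm_step(J, F, rho):
    """Levenberg-Marquardt correction: (J^T J + rho I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ F)

def gn_step(J, F):
    """Gauss-Newton correction via the pseudoinverse: dp = -J^+ F."""
    return np.linalg.pinv(J) @ -F

# For rho -> 0 and a full-rank Jacobian, the LM step approaches the GN step
rng = np.random.default_rng(1)
J = rng.standard_normal((8, 3))
F = rng.standard_normal(8)
assert np.allclose(lm_step(J, F, 1e-12), gn_step(J, F), atol=1e-6)
```

For a rank-deficient J, `lm_step` still returns a correction for any ρ > 0, which is exactly the masking effect warned about above.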
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling:

F̃(p) = D_row^{-1} F(p), D_row = diag(δz_1, ..., δz_M)

‖F̃′(p^k)Δp^k + F̃(p^k)‖ = min ⇔ ‖D_row^{-1}( F′(p^k)Δp^k + F(p^k) )‖ = min

Different scaling leads to a different solution.

I column (parameter) scaling:

p = D_col p̃, H(p̃) := F(D_col p̃) = F(p), H′(p̃) = F′(p) D_col

Assume p̃^k = D_col^{-1} p^k. Then

Δp̃^k = −H′(p̃^k)^+ H(p̃^k) = −( F′(p^k) D_col )^+ F(p^k)
= [F′(p^k) full rank] = −D_col^{-1} F′(p^k)^+ F(p^k) = D_col^{-1} Δp^k,

i.e., the iteration is invariant under column scaling in the full-rank case.

No invariance in the rank-deficient case!
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F′(p^k)Δp^k = −F(p^k), p^{k+1} = p^k + λ_k Δp^k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p^k)^+ F(p^k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄^{k+1}(λ_k)‖ / ‖Δp^k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
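A simple damped Gauss-Newton sketch in Python/NumPy. Instead of the predictor-corrector strategy of the slides, λ_k is merely halved until the natural monotonicity test holds; the exponential-fit problem and its starting guess are hypothetical illustrations:

```python
import numpy as np

def damped_gauss_newton(F, Fprime, p0, lam_min=1e-4, max_iter=100, xtol=1e-10):
    """Damped GN: reduce lambda until ||dp_bar(lambda)|| < ||dp|| holds."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jk = Fprime(p)
        dp, *_ = np.linalg.lstsq(Jk, -F(p), rcond=None)   # ordinary correction
        if np.linalg.norm(dp) < xtol:
            break
        lam = 1.0
        while lam >= lam_min:
            p_trial = p + lam * dp
            # simplified GN correction at the trial point (old Jacobian)
            dp_bar, *_ = np.linalg.lstsq(Jk, -F(p_trial), rcond=None)
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam /= 2.0
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
        p = p + lam * dp
    return p

# Hypothetical test problem: fit y = a*exp(b*t) from a crude starting guess
t = np.linspace(0, 1, 8)
z = 2.0 * np.exp(-1.5 * t)
F = lambda p: p[0] * np.exp(p[1] * t) - z
Fprime = lambda p: np.column_stack([np.exp(p[1] * t),
                                    p[0] * t * np.exp(p[1] * t)])
p_star = damped_gauss_newton(F, Fprime, [1.0, 0.0])
assert np.allclose(p_star, [2.0, -1.5], atol=1e-6)
```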
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares

I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species, q (unknown) parameters

model: y′ = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

set of experimental data:
(t_1, z_1, δz_1), ..., (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d,
with measurement tolerances δz_j ∈ R^d, D_j = diag(δz_j)

F(p) = ( D_1^{-1}(y(t_1, p) − z_1) ; ⋮ ; D_M^{-1}(y(t_M, p) − z_M) )

F′(p) = ( D_1^{-1} y_p(t_1, p) ; ⋮ ; D_M^{-1} y_p(t_M, p) )

Gauss-Newton method:

Δp^k = −F′(p^k)^+ F(p^k), p^{k+1} = p^k + λ_k Δp^k
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t_1, ..., t_M

Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, ..., t_M

Solution: use an integrator with dense output

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed

Example: LIMEX has a dense output based on Hermite interpolation
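SciPy's `solve_ivp` offers the same mechanism as described here for LIMEX: with `dense_output=True` the integrator returns an interpolant that can be evaluated at arbitrary measurement times, independent of the internally chosen steps. A sketch using the introductory reaction A + B ⇌ C + D (the measurement times are hypothetical):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Introductory example A + B <-> C + D with k1 = 0.2, k2 = 0.5
def f(t, y, k1=0.2, k2=0.5):
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    return [r, r, -r, -r]

sol = solve_ivp(f, (0.0, 30.0), [2.0, 1.0, 0.5, 0.0],
                rtol=1e-8, atol=1e-10, dense_output=True)

# Evaluate the interpolant at arbitrary measurement times t_1, ..., t_M
t_meas = np.array([0.5, 3.0, 12.0, 25.0])
y_meas = sol.sol(t_meas)               # shape (d, M) = (4, 4)
assert y_meas.shape == (4, 4)

# Sanity check: A - B and A + C are conserved by this reaction
assert np.allclose(y_meas[0] - y_meas[1], 1.0, atol=1e-6)
assert np.allclose(y_meas[0] + y_meas[2], 2.5, atol=1e-6)
```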
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F′(p) = ( D_1^{-1} S(t_1) ; ⋮ ; D_M^{-1} S(t_M) )

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̃_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y_p′ = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p, S = [s_1, ..., s_q], s_k ∈ R^d

In practice, solve the combined system

( y′ ; s_1′ ; ⋮ ; s_q′ ) = ( f ; f_y · s_1 + f_{p_1} ; ⋮ ; f_y · s_q + f_{p_q} ) ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling)
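A sketch of the combined system for the introductory reaction A + B ⇌ C + D (d = 4, q = 2), solved here with SciPy instead of LIMEX, with a finite-difference cross-check of one sensitivity:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system y' = f, s_k' = f_y s_k + f_{p_k} for p = (k1, k2)
def augmented(t, Y, k1, k2):
    y, S = Y[:4], Y[4:].reshape(4, 2)       # state and sensitivities s1, s2
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    f = np.array([r, r, -r, -r])
    dr = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])   # gradient of r wrt y
    f_y = np.vstack([dr, dr, -dr, -dr])                 # 4x4 Jacobian f_y
    f_p = np.array([[-A * B,  C * D],                   # 4x2 Jacobian f_p
                    [-A * B,  C * D],
                    [ A * B, -C * D],
                    [ A * B, -C * D]])
    return np.concatenate([f, (f_y @ S + f_p).ravel()])

y0 = np.array([2.0, 1.0, 0.5, 0.0])
Y0 = np.concatenate([y0, np.zeros(8)])      # y_p(0) = 0
k1, k2 = 0.2, 0.5
sol = solve_ivp(augmented, (0.0, 10.0), Y0, args=(k1, k2),
                rtol=1e-10, atol=1e-12)
S_end = sol.y[4:, -1].reshape(4, 2)

# Cross-check dA/dk1 at t = 10 against a central finite difference
eps = 1e-5
plus = solve_ivp(augmented, (0.0, 10.0), Y0, args=(k1 + eps, k2),
                 rtol=1e-10, atol=1e-12).y[0, -1]
minus = solve_ivp(augmented, (0.0, 10.0), Y0, args=(k1 - eps, k2),
                  rtol=1e-10, atol=1e-12).y[0, -1]
assert np.isclose(S_end[0, 0], (plus - minus) / (2 * eps), atol=1e-4)
```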
Parameter Identification Susanna Roblitz 57
Example
Case (T+E):

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   4.81e-02   3.55e-01            2
1   1.11e-01   4.57e-01   1.000    2
    2.55e-02   1.75e-01   0.389
2   1.04e-02   4.22e-02   1.000    2
3   9.16e-04   1.90e-03   1.000    2
4   8.00e-06   1.99e-05   1.000    2
5   2.32e-07   1.54e-09   1.000    2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994

κ = 0.008, sc(F′(p*)) = 63

Case (E):

k   ‖F(p^k)‖   ‖Δp^k‖     λ_k     rank
0   3.85e-02   2.96e-02            1
1   2.40e-03   1.85e-03   1.000    1
2   1.11e-05   7.67e-06   1.000    1
3   6.67e-06   6.75e-11   1.000    1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622

κ = 0.004, sc(F′(p**)) = 4.5 × 10^6
[Figure: fitted concentration curves of A, B, C, D over time, two panels corresponding to case (T+E) and case (E)]
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export

I deterministic simulation methods (LIMEX; others will follow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods (tunable, with optional scaling of species and parameters)

I output of information on the identifiability of parameters

I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention!

Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Problem formulation
Nonlinear least-squares problem
Msumi=1
Dminus1i (zi minus y(ti p))2
2 = F (p)TF (p) = F (p)22 = min
Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that
F (plowast)22 = min
pisinDF (p)2
2
The nonlinear least-squares problem is called compatible if
F (plowast) = 0
Parameter Identification Susanna Roblitz 32
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F prime(p) =
partpartp1
F1(p) middot middot middot partpartpq
F1(p)
partpartp1
Fm(p) middot middot middot partpartpq
Fm(p)
isin Rmtimesq
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem
g(p) = F (p)22 = min
sufficient condition on plowast to be a local minimum
g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite
Find by Newton iteration a zero of the function G Rq rarr Rq
G (p) = F prime(p)TF (p)
(system of q nonlinear equations)
Parameter Identification Susanna Roblitz 34
Newton iteration
Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0
G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0
= G (p0) + G prime(p0)(p minus p0) = G (p)
Zero p1 of the linear substitute map G
G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0
Newton iteration
G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk
system of nonlinear equations =rArr sequence of linear systems
Parameter Identification Susanna Roblitz 35
Gauss-Newton iteration
G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)
iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)
)∆pk = minusF prime(pk)TF (pk)
I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)
I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)
rArr Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
Parameter Identification Susanna Roblitz 36
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Supplement Multivariable calculus
Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)
g prime(p) = nablag(p) =
(partg
partp1 middot middot middot partg
partpq
)T
Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix
F′(p) = [ ∂F1/∂p1 ⋯ ∂F1/∂pq ; … ; ∂Fm/∂p1 ⋯ ∂Fm/∂pq ] ∈ R^(m×q)
(rows correspond to the components of F, columns to the parameters)
Parameter Identification Susanna Roblitz 33
Problem formulation
Minimization problem:
g(p) = ‖F(p)‖₂² = min
Sufficient condition for p* to be a local minimum:
g′(p*) = 2 F′(p*)ᵀ F(p*) = 0 and g″(p*) positive definite
Find, by Newton iteration, a zero of the function G: R^q → R^q,
G(p) = F′(p)ᵀ F(p)
(a system of q nonlinear equations)
Newton iteration
Idea: approximate G by a "simpler" function, namely its Taylor expansion around a starting guess p0:
G(p) = G(p0) + G′(p0)(p − p0) + o(‖p − p0‖) for p → p0
≈ G(p0) + G′(p0)(p − p0) =: Ḡ(p)
Zero p1 of the linear substitute map Ḡ:
G′(p0)∆p0 = −G(p0), p1 = p0 + ∆p0
Newton iteration:
G′(pk)∆pk = −G(pk), pk+1 = pk + ∆pk
system of nonlinear equations =⇒ sequence of linear systems
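The iteration above can be sketched in a few lines of NumPy. This is a minimal illustration only; the helper name `newton` and the toy system are ours, not from the slides:

```python
import numpy as np

def newton(G, Gprime, p0, tol=1e-12, maxit=50):
    """Plain Newton iteration: solve G'(pk) dpk = -G(pk), set pk+1 = pk + dpk."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        dp = np.linalg.solve(Gprime(p), -G(p))  # one linear system per step
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# toy system of q = 2 nonlinear equations with a root at (1, 1)
G  = lambda p: np.array([p[0]**2 + p[1]**2 - 2.0, p[0] - p[1]])
Gp = lambda p: np.array([[2.0 * p[0], 2.0 * p[1]], [1.0, -1.0]])
root = newton(G, Gp, [2.0, 0.5])
```

Near the root the corrections shrink quadratically, which is why a handful of linear solves suffices.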
Gauss-Newton iteration
G′(p) = F′(p)ᵀF′(p) + F″(p)ᵀF(p)
iteration in terms of F(p):
(F′(pk)ᵀF′(pk) + F″(pk)ᵀF(pk)) ∆pk = −F′(pk)ᵀF(pk)
▶ compatible case: F(p*) = 0 ⇒ G′(p*) = F′(p*)ᵀF′(p*)
▶ almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F″(p)
⇒ Gauss-Newton iteration:
F′(pk)ᵀF′(pk)∆pk = −F′(pk)ᵀF(pk), pk+1 = pk + ∆pk
Gauss-Newton iteration
F′(pk)ᵀF′(pk)∆pk = −F′(pk)ᵀF(pk), pk+1 = pk + ∆pk
corresponds to the normal equations of the linear least squares problem
‖F′(pk)∆pk + F(pk)‖₂ = min
formal solution (with the pseudo-inverse F′(pk)⁺):
∆pk = −F′(pk)⁺F(pk)
in case of convergence: F′(p*)⁺F(p*) = 0
if rank(F′(p)) = q ⇒ F′(p)⁺ = (F′(p)ᵀF′(p))⁻¹F′(p)ᵀ
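A sketch of this step in NumPy: instead of forming the normal equations explicitly, each step solves the linear least squares problem with `lstsq` (a pseudo-inverse solve, better conditioned than F′ᵀF′). The exponential-fit residual below is a made-up compatible example, not from the slides:

```python
import numpy as np

def gauss_newton(F, Fprime, p0, tol=1e-10, maxit=50):
    """Each step solves || F'(pk) dp + F(pk) ||_2 = min via lstsq."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        dp = np.linalg.lstsq(Fprime(p), -F(p), rcond=None)[0]
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# hypothetical compatible problem (m = 4 > q = 2): fit z = a*exp(b*t)
t = np.array([0.0, 1.0, 2.0, 3.0])
z = 2.0 * np.exp(-0.5 * t)                  # noise-free data, so F(p*) = 0
F  = lambda p: p[0] * np.exp(p[1] * t) - z
Fp = lambda p: np.column_stack([np.exp(p[1] * t),
                                p[0] * t * np.exp(p[1] * t)])
pstar = gauss_newton(F, Fp, [1.5, -0.6])
```

Because the data are noise-free the problem is compatible, and the iteration recovers (a, b) = (2, −0.5).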
Classification
▶ local GN methods: require a sufficiently good starting guess p0
▶ global GN methods: can handle bad starting guesses as well
▶ m = q: guaranteed local convergence
▶ m > q: convergence only for a small subclass of nonlinear LSQ problems, the so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p0:
▶ F: D → R^m continuously differentiable, D ⊆ R^q open and convex
▶ F′(p): possibly rank-deficient Jacobian
▶ F′ satisfies a particular Lipschitz condition (i.e., F′ behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:
‖F′(p̂)⁺(F′(p) − F′(p̄))(p − p̄)‖₂ ≤ ω‖p − p̄‖₂²  (i)
for all collinear p, p̄, p̂ ∈ D (i.e., they lie on a single straight line) with p − p̄ ∈ R(F′(p̂)⁺)
Assumptions
▶ compatibility condition (model and data must somehow fit each other):
‖F′(p̄)⁺(I − F′(p)F′(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖ for all p, p̄ ∈ D,
κ(p) ≤ κ < 1 for all p ∈ D  (ii)
▶ the initial update ∆p0 must be small enough, and a ball Bρ(p0) with radius ρ around p0 must be contained in D:
‖∆p0‖ ≤ α and Bρ(p0) ⊂ D  (iii)
h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2)  (iv)
Convergence result
Under the above assumptions, the sequence {pk} of Gauss-Newton iterates is well-defined, remains in Bρ(p0), and converges to some p* ∈ Bρ(p0) with
F′(p*)⁺F(p*) = 0
The convergence rate can be estimated according to
‖pk+1 − pk‖₂ ≤ (ω/2)‖pk − pk−1‖₂² + κ(pk−1)‖pk − pk−1‖₂  (∗)
This result shows the existence of a solution p* in the sense that F′(p*)⁺F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).
Uniqueness, however, requires a full-rank Jacobian.
Compatibility
▶ linear least squares problems: F(p) = Ap − z
(i): F′(p) = (Ap − z)′ = A = (Ap̄ − z)′ = F′(p̄) ⇒ ω = 0
(ii): F′(p̄)⁺(I − F′(p)F′(p)⁺) = A⁺(I − AA⁺) = 0 ⇒ κ(p) ≡ κ = 0
(∗) ⇒ ‖p2 − p1‖ ≤ 0 ⇒ ∆p1 = 0 ⇒ p1 = p*
⇒ convergence within one iteration
▶ systems of nonlinear equations (m = q):
if F′(p) nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹ ⇒ I − F′(p)F′(p)⁺ = 0 ⇒ κ(p) = 0
⇒ quadratic convergence
▶ compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q): by (∗),
lim_{k→∞} ‖pk+1 − pk‖ / ‖pk − pk−1‖ ≤ lim_{k→∞} ( (ω/2)‖pk − pk−1‖ + κ(pk−1) ) = κ(p*)
κ(p*) is the asymptotic convergence rate.
Adequate nonlinear least-squares problems are characterized by the condition
κ(p*) < 1
Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / termination criteria
Define the simplified Gauss-Newton corrections
∆p̄k+1 = −F′(pk)⁺F(pk+1)
Stop the iteration as soon as linear convergence dominates the iteration process:
‖∆p̄k+1‖ ≈ ‖∆pk+1‖ ≤ XTOL
For incompatible problems, in the convergent phase it holds that
‖∆p̄k+1‖ ≈ κ‖∆pk‖, ‖∆pk+1‖ ≈ κ‖∆pk‖ ≈ κ²‖∆pk−1‖
This provides an estimate for κ(p*):
κ(p*) ≈ ‖∆p̄k+1‖ / ‖∆pk‖
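The κ estimate can be reproduced on a tiny incompatible problem (m = 2 > q = 1). The residual below is our own made-up illustration; `dp` is the ordinary and `dp_bar` the simplified Gauss-Newton correction:

```python
import numpy as np

# incompatible toy residual: no p makes both components vanish
F  = lambda p: np.array([p[0] - 1.0, p[0]**2 - 0.9])
Fp = lambda p: np.array([[1.0], [2.0 * p[0]]])

p = np.array([0.5])
for _ in range(6):
    dp = np.linalg.lstsq(Fp(p), -F(p), rcond=None)[0]           # ordinary correction
    dp_bar = np.linalg.lstsq(Fp(p), -F(p + dp), rcond=None)[0]  # simplified correction
    p = p + dp

# ratio of simplified to ordinary correction estimates kappa(p*)
kappa_est = np.linalg.norm(dp_bar) / np.linalg.norm(dp)
```

The iterates settle near the stationary point p* ≈ 0.9597 of g(p) = ‖F(p)‖₂², and the estimated κ stays well below 1, i.e. the problem is adequate.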
Convergence monitor
[Figure: ‖∆pk‖ (ordinary GN corrections) and ‖∆p̄k+1‖ (simplified GN corrections) versus the iteration index k, on a logarithmic scale from 10⁻¹⁵ to 10⁰.]
Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
interpretation of the result in case of a rank-deficient Jacobian
Let p* be the final iterate and rank(F′(p*)) = n < q.
QR factorization with column pivoting and rank decision:
QᵀF′(p*)Π = R
⇒ reordering of the parameters to (p*π1, p*π2, …, p*πn, …, p*πq) in such a way that
|r11| ≥ |r22| ≥ ⋯ ≥ |rnn|
→ the parameters (p*π1, p*π2, …, p*πn) can actually be estimated from the current dataset
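A sketch of this rank decision using SciPy's column-pivoted QR. The 4×3 Jacobian below is fabricated for illustration: its third column is twice its first, mimicking two parameters that the data cannot separate:

```python
import numpy as np
from scipy.linalg import qr

# hypothetical rank-deficient Jacobian: column 2 = 2 * column 0
J = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 4.0]])

Q, R, piv = qr(J, pivoting=True)        # Q^T J Pi = R with |r11| >= |r22| >= ...
d = np.abs(np.diag(R))
rank = int(np.sum(d > 1e-10 * d[0]))    # rank decision from the diagonal decay
estimable = sorted(int(i) for i in piv[:rank])  # parameters estimable from the data
```

The pivot order puts the best-determined parameter directions first; here the rank decision yields 2, so only two of the three parameters can be estimated from this (hypothetical) dataset.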
Parameter transformations
p: constrained parameters
u: unconstrained parameters
φ: transformation function
p = φ(u), F(p) = F(φ(u)) =: F̃(u), F̃′(u) = F′(p)φ′(u)
constraint → transformation φ(u):
p > 0: p = exp(u)
A ≤ p ≤ B: p = A + (B − A)/2 · (1 + sin u)
p ≤ C: p = C + (1 − √(1 + u²))
p ≥ C: p = C − (1 − √(1 + u²))
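The four transformations from the table can be written down directly; the function names below are our own labels. Every real u maps into the feasible set, so an optimizer can work on u without any constraint handling:

```python
import math

def pos(u):              # p > 0
    return math.exp(u)

def box(u, A, B):        # A <= p <= B
    return A + (B - A) / 2.0 * (1.0 + math.sin(u))

def below(u, C):         # p <= C, since sqrt(1 + u^2) >= 1
    return C + (1.0 - math.sqrt(1.0 + u * u))

def above(u, C):         # p >= C
    return C - (1.0 - math.sqrt(1.0 + u * u))
```

For example, box(u, A, B) oscillates between A (at sin u = −1) and B (at sin u = 1), so the bounds are attainable but never violated.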
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of pk:
‖F(pk) + F′(pk)∆pk‖₂² + ρ‖∆pk‖₂² = min
(F′(pk)ᵀF′(pk) + ρIq)∆pk = −F′(pk)ᵀF(pk)
▶ ρ → 0: Levenberg-Marquardt → Gauss-Newton
▶ ρ > 0: (F′(pk)ᵀF′(pk) + ρIq) is always regular → masks the (non-)uniqueness of a given solution and invites incorrect interpretation of numerical results; solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
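A single Levenberg-Marquardt step is one regularized linear solve. The sketch below (our own toy Jacobian and residual) checks the ρ → 0 limit against the Gauss-Newton step and shows how a larger ρ shrinks the correction:

```python
import numpy as np

def lm_step(J, r, rho):
    """One Levenberg-Marquardt step: (J^T J + rho*I) dp = -J^T r."""
    return np.linalg.solve(J.T @ J + rho * np.eye(J.shape[1]), -(J.T @ r))

J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])   # full-rank 3x2 example
r = np.array([1.0, 1.0, 1.0])

gn = np.linalg.lstsq(J, -r, rcond=None)[0]  # Gauss-Newton step (rho -> 0 limit)
```

For tiny ρ the LM step coincides with the Gauss-Newton step; for large ρ it becomes a short step in roughly the steepest-descent direction.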
Scaling
▶ row (measurement) scaling:
F̄(p) = Drow⁻¹F(p), Drow = diag(δz1, …, δzM)
‖F′(pk)∆pk + F(pk)‖ = min is not equivalent to ‖Drow⁻¹(F′(pk)∆pk + F(pk))‖ = min:
different scaling leads to a different solution!
▶ column (parameter) scaling:
p = Dcol p̄, H(p̄) := F(Dcol p̄) = F(p), H′(p̄) = F′(p)Dcol
assume pk = Dcol p̄k:
∆p̄k = −H′(p̄k)⁺H(p̄k) = −(F′(pk)Dcol)⁺F(pk) = [F′(pk) full rank] = −Dcol⁻¹F′(pk)⁺F(pk) = Dcol⁻¹∆pk
⇒ the Gauss-Newton step is invariant under column scaling.
No invariance in the rank-deficient case!
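Both effects can be checked numerically on a made-up Jacobian and residual (our own toy data): row scaling changes the least-squares minimizer, while column scaling of a full-rank Jacobian transforms the step consistently:

```python
import numpy as np

J = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # toy F'(pk), full rank
r = np.array([1.0, 0.0, 2.0])                        # toy F(pk)

dp = np.linalg.lstsq(J, -r, rcond=None)[0]

# row scaling: trusting the first measurement 10x more changes the step
Drow_inv = np.diag([10.0, 1.0, 1.0])
dp_row = np.linalg.lstsq(Drow_inv @ J, -(Drow_inv @ r), rcond=None)[0]

# column scaling: for a full-rank Jacobian, Dcol @ dp_col equals dp (invariance)
Dcol = np.diag([2.0, 0.5])
dp_col = np.linalg.lstsq(J @ Dcol, -r, rcond=None)[0]
```

So measurement weights are part of the problem statement (they select which residuals matter), whereas parameter scaling is, in the full-rank case, only a change of coordinates.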
Globalization
Global (or damped) Gauss-Newton method
F′(pk)∆pk = −F(pk), pk+1 = pk + λk∆pk, 0 < λk ≤ 1
How to choose λk?
Remember: we want to solve F′(p)⁺F(p) = 0.
Idea: choose λk in such a way that ‖F′(pk)⁺F(p)‖ decays along the iterates (theory of level functions, Gauss-Newton path)
The damping factors λk are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test
‖∆p̄k+1(λk)‖ / ‖∆pk‖ < 1
divergence criterion: λk < λmin ≪ 1
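A crude sketch of a damped Gauss-Newton loop with the natural monotonicity test. This simply halves λ on failure rather than using the predictor-corrector strategy of the slides, and the compatible toy problem is our own:

```python
import numpy as np

def damped_gauss_newton(F, Fprime, p0, lam_min=1e-4, tol=1e-10, maxit=100):
    """Accept pk + lam*dpk only if the simplified correction there is shorter
    than dpk (natural monotonicity test); otherwise halve lam."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxit):
        J = Fprime(p)
        dp = np.linalg.lstsq(J, -F(p), rcond=None)[0]
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while True:
            # simplified GN correction at the trial point, with the old Jacobian
            dp_bar = np.linalg.lstsq(J, -F(p + lam * dp), rcond=None)[0]
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                break
            lam *= 0.5
            if lam < lam_min:
                raise RuntimeError("lambda < lambda_min: divergence")
        p = p + lam * dp
    return p

# compatible toy problem (m = 3 > q = 2) with solution (1, 2)
F  = lambda p: np.array([p[0]**2 - 1.0, p[1] - 2.0, p[0] * p[1] - 2.0])
Fp = lambda p: np.array([[2.0 * p[0], 0.0], [0.0, 1.0], [p[1], p[0]]])
sol = damped_gauss_newton(F, Fp, [1.3, 1.7])
```

Near the solution the full step λ = 1 always passes the test, so the method recovers the undamped (locally quadratic) iteration.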
Codes
▶ NLSQ-ERR: ERRor-oriented Gauss-Newton method for Nonlinear Least-SQuares problems
▶ NLSCON: solver for Nonlinear Least-Squares problems with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters
model: y′ = f(y, p), y(0) = y0, with y ∈ R^d, p ∈ R^q
set of experimental data (t1, z1, δz1), …, (tM, zM, δzM), tj ∈ R, zj ∈ R^d,
with measurement tolerances δzj ∈ R^d; Di = diag(δzi)
F(p) = [ D1⁻¹(y(t1, p) − z1) ; … ; DM⁻¹(y(tM, p) − zM) ]
F′(p) = [ D1⁻¹ yp(t1, p) ; … ; DM⁻¹ yp(tM, p) ]
Gauss-Newton method:
∆pk = −F′(pk)⁺F(pk), pk+1 = pk + λk∆pk
Computation of F (p)
requires the solution of y′ = f(y, p) at the time points t1, …, tM
Problem: the time points chosen by adaptive step-size control generally do not agree with t1, …, tM
Solution: use an integrator with dense output
Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed
Example: LIMEX has a dense output based on Hermite interpolation
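The residual assembly with dense output can be sketched with SciPy's `solve_ivp` (the slides use LIMEX; `solve_ivp` is a stand-in, and the scalar decay model and data are our own fabrication):

```python
import numpy as np
from scipy.integrate import solve_ivp

# model y' = -p*y (d = 1, q = 1) with synthetic data at arbitrary times
t_meas = np.array([0.4, 1.1, 2.7])
z  = 2.0 * np.exp(-0.3 * t_meas)   # "measurements", generated with p = 0.3
dz = 0.1                           # measurement tolerance

def F(p):
    """Weighted residual D^{-1}(y(t_l, p) - z_l), evaluated via dense output."""
    sol = solve_ivp(lambda t, y: -p * y, (0.0, 3.0), [2.0],
                    dense_output=True, rtol=1e-9, atol=1e-12)
    y_at_meas = sol.sol(t_meas)[0]  # interpolant, independent of internal steps
    return (y_at_meas - z) / dz

residual_true = F(0.3)   # near zero: the data were generated with this p
residual_off  = F(0.5)
```

The interpolant `sol.sol` lets F(p) be evaluated at the measurement times without forcing the step-size controller onto them.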
Sensitivity analysis
▶ sensitivity matrix at time t:
S(t) = ∂y/∂p (t) = yp(t) ∈ R^(d×q)
F′(p) = [ D1⁻¹S(t1) ; … ; DM⁻¹S(tM) ]
▶ scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
S̄ij(t) = ∂yi/∂pj (t) · pscal/yscal
pscal, yscal: user-specified scaling values
Computation of sensitivities
Differentiate y′ = f(y, p) with respect to p to obtain the variational equation:
yp′ = fy(y, p) · yp + fp(y, p), yp(0) = 0
Compact notation:
S′ = fy · S + fp, S = [s1, …, sq], sk ∈ R^d
In practice, solve the combined system
(y′; s1′; …; sq′) = (f; fy·s1 + fp1; …; fy·sq + fpq) ∈ R^((q+1)d)
which can be solved efficiently by LIMEX (inexact Jacobian, decoupling)
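For the scalar model y′ = −p·y (d = 1, q = 1, our own example) the combined state-plus-sensitivity system can be integrated directly and checked against the analytic sensitivity, here again with `solve_ivp` as a stand-in for LIMEX:

```python
import numpy as np
from scipy.integrate import solve_ivp

# variational equation for y' = -p*y:  s' = f_y*s + f_p = -p*s - y,  s(0) = 0
def combined(t, ys, p):
    y, s = ys
    return [-p * y, -p * s - y]

p, y0, T = 0.7, 2.0, 3.0
sol = solve_ivp(combined, (0.0, T), [y0, 0.0], args=(p,),
                rtol=1e-9, atol=1e-12)
y_T, s_T = sol.y[:, -1]
# analytic check: y = y0*exp(-p*t)  =>  dy/dp = -t*y0*exp(-p*t)
```

The sensitivity s(T) agrees with the analytic derivative, which is exactly the column of S(T) that the Gauss-Newton Jacobian F′(p) needs.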
Example
k | ‖F(pk)‖ | ‖∆pk‖ | λk | rank
0 | 4.81e-02 | 3.55e-01 | - | 2
1 | 1.11e-01 | 4.57e-01 | 1.000 | 2
  | 2.55e-02 | 1.75e-01 | 0.389 |
2 | 1.04e-02 | 4.22e-02 | 1.000 | 2
3 | 9.16e-04 | 1.90e-03 | 1.000 | 2
4 | 8.00e-06 | 1.99e-05 | 1.000 | 2
5 | 2.321e-07 | 1.54e-09 | 1.000 | 2
p* = (k1*, k2*) = (0.10000001, 0.19999997) ⇒ k21* = k2*/k1* = 1.9999994
κ = 0.008, sc(F′(p*)) = 63
k | ‖F(pk)‖ | ‖∆pk‖ | λk | rank
0 | 3.85e-02 | 2.96e-02 | - | 1
1 | 2.40e-03 | 1.85e-03 | 1.000 | 1
2 | 1.11e-05 | 7.67e-06 | 1.000 | 1
3 | 6.67e-06 | 6.75e-11 | 1.000 | 1
p** = (k1**, k2**) = (0.24155702, 0.48312906) ⇒ k21** = k2**/k1** = 2.0000622
κ = 0.004, sc(F′(p*)) = 4.5 × 10⁶
[Figures: fitted concentrations of A, B, C, D versus time, one plot per dataset.]
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
▶ SBML import and export
▶ deterministic simulation methods (LIMEX; others will follow)
▶ sensitivity analysis based on variational equations
▶ parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
▶ output of information on the identifiability of parameters
▶ time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact:
Zuse Institute Berlin
Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Gauss-Newton iteration
With G(p) := F'(p)^T F(p), the gradient of g(p) = (1/2)‖F(p)‖_2^2, we have

G'(p) = F'(p)^T F'(p) + F''(p)^T F(p)

Newton iteration in terms of F(p):

(F'(p_k)^T F'(p_k) + F''(p_k)^T F(p_k)) Δp_k = −F'(p_k)^T F(p_k)

- compatible case: F(p*) = 0 ⇒ G'(p*) = F'(p*)^T F'(p*)
- almost compatible case: F(p*) is "small" ⇒ save the effort of evaluating F''(p)

⇒ Gauss-Newton iteration:

F'(p_k)^T F'(p_k) Δp_k = −F'(p_k)^T F(p_k),   p_{k+1} = p_k + Δp_k
Parameter Identification, Susanna Röblitz (slide 36)
Gauss-Newton iteration
F'(p_k)^T F'(p_k) Δp_k = −F'(p_k)^T F(p_k),   p_{k+1} = p_k + Δp_k

corresponds to the normal equations of the linear least-squares problem

‖F'(p_k) Δp_k + F(p_k)‖_2 = min

formal solution:

Δp_k = −F'(p_k)^+ F(p_k)

in case of convergence: F'(p*)^+ F(p*) = 0

if rank(F'(p)) = q ⇒ F'(p)^+ = (F'(p)^T F'(p))^{-1} F'(p)^T
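The two equivalent formulations above (normal equations vs. pseudoinverse) translate directly into code. The following is a minimal sketch, not one of the production codes discussed in these slides; the one-parameter exponential model and its data are hypothetical:

```python
import numpy as np

def gauss_newton(F, J, p0, tol=1e-10, max_iter=50):
    """Undamped Gauss-Newton: each step solves the linear least-squares
    problem ||F'(p_k) dp + F(p_k)||_2 = min, i.e. dp = -F'(p_k)^+ F(p_k)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        dp, *_ = np.linalg.lstsq(J(p), -F(p), rcond=None)
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# hypothetical toy model: fit y = exp(-p*t) to exact data (m = 3 > q = 1)
t = np.array([0.0, 1.0, 2.0])
z = np.exp(-0.7 * t)
F = lambda p: np.exp(-p[0] * t) - z               # residual vector in R^3
J = lambda p: (-t * np.exp(-p[0] * t))[:, None]   # Jacobian in R^{3x1}

p_star = gauss_newton(F, J, [0.2])
print(p_star)  # ≈ [0.7]; the problem is compatible, F(p*) = 0
```

Note that solving the linear least-squares subproblem directly (here via `lstsq`) avoids forming F'^T F', which would square the condition number.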
Classification
- local GN methods: require a sufficiently good starting guess p_0
- global GN methods: can handle bad starting guesses as well
- m = q: guaranteed local convergence
- m > q: convergence only for a small subclass of nonlinear LSQ problems, the so-called adequate problems
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p_0:

- F: D → R^m continuously differentiable, D ⊆ R^q open and convex
- F'(p): possibly rank-deficient Jacobian
- F' satisfies a particular Lipschitz condition (i.e. F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

  ‖F'(p̃)^+ (F'(p̄) − F'(p)) (p̄ − p)‖_2 ≤ ω ‖p̄ − p‖_2^2   (i)

  for all collinear p, p̄, p̃ ∈ D (i.e. they lie on a single straight line) with p̄ − p ∈ R(F'(p̃)^+)
Assumptions
- compatibility condition (model and data must somehow fit each other):

  ‖F'(p̄)^+ (I − F'(p) F'(p)^+) F(p)‖ ≤ κ(p) ‖p̄ − p‖   ∀ p, p̄ ∈ D,
  with κ(p) ≤ κ < 1   ∀ p ∈ D   (ii)

- the initial update Δp_0 must be small enough, and a ball B_ρ(p_0) with radius ρ around p_0 must be contained in D:

  ‖Δp_0‖ ≤ α and B_ρ(p_0) ⊂ D   (iii)

  h := αω < 2(1 − κ),   ρ := α / (1 − κ − h/2)   (iv)
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p_0), and converges to some p* ∈ B_ρ(p_0) with

F'(p*)^+ F(p*) = 0.

The convergence rate can be estimated according to

‖p_{k+1} − p_k‖_2 ≤ (ω/2) ‖p_k − p_{k−1}‖_2^2 + κ(p_{k−1}) ‖p_k − p_{k−1}‖_2   (*)

This result shows existence of a solution p* in the sense that F'(p*)^+ F(p*) = 0, even in the case of a rank-deficient Jacobian (the rank may vary throughout D, as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Compatibility
- linear least-squares problems: F(p) = Ap − z

  F'(p̄) = (Ap̄ − z)' = A = (Ap − z)' = F'(p)  ⇒ (i) holds with ω = 0

  F'(p̄)^+ (I − F'(p) F'(p)^+) = A^+ (I − A A^+) = 0  ⇒ (ii) holds with κ(p) ≡ κ = 0

  (*) then gives ‖p_2 − p_1‖ ≤ 0 ⇒ Δp_1 = 0 ⇒ p_1 = p*:
  convergence within one iteration

- systems of nonlinear equations (m = q):

  if F'(p) is nonsingular ⇒ F'(p)^+ = F'(p)^{-1}
  ⇒ I − F'(p) F'(p)^+ = 0 ⇒ (by (ii)) κ(p) = 0:
  quadratic convergence

- compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖  ≤(*)  lim_{k→∞} ( (ω/2) ‖p_k − p_{k−1}‖ + κ(p_{k−1}) ) = κ(p*)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1.

Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor / Termination criteria

Define the simplified Gauss-Newton corrections

Δp̄_{k+1} := −F'(p_k)^+ F(p_{k+1}).

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄_{k+1}‖ ≈ ‖Δp_{k+1}‖ ≤ XTOL

For incompatible problems, in the convergent phase it holds that

‖Δp̄_{k+1}‖ ≈ κ ‖Δp_k‖,   ‖Δp̄_{k+2}‖ ≈ κ^2 ‖Δp_k‖.

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄_{k+1}‖ / ‖Δp_k‖
Convergence monitor
[Figure: norms of the ordinary and simplified GN corrections versus iteration index k = 0, ..., 12, on a logarithmic scale decaying from 10^0 down to about 10^-15]

Typical "scissor" between the ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
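The κ-estimate above can be computed alongside the iteration at the cost of one extra linear least-squares solve per step (the factorization of F'(p_k) can be reused). A minimal sketch, with a hypothetical and deliberately incompatible exponential-fit problem:

```python
import numpy as np

def gn_with_monitor(F, J, p0, max_iter=25, xtol=1e-12):
    """Gauss-Newton iteration that also records the simplified corrections
    dp_bar_{k+1} = -F'(p_k)^+ F(p_{k+1}) and the resulting estimates
    kappa ~ ||dp_bar_{k+1}|| / ||dp_k||."""
    p = np.asarray(p0, dtype=float)
    kappas = []
    for _ in range(max_iter):
        Jk = J(p)
        dp, *_ = np.linalg.lstsq(Jk, -F(p), rcond=None)           # ordinary correction
        if np.linalg.norm(dp) == 0.0:
            break
        p_next = p + dp
        dp_bar, *_ = np.linalg.lstsq(Jk, -F(p_next), rcond=None)  # simplified correction
        kappas.append(np.linalg.norm(dp_bar) / np.linalg.norm(dp))
        p = p_next
        if np.linalg.norm(dp_bar) <= xtol:
            break
    return p, kappas

# hypothetical incompatible data: the points do not lie on any model curve
t = np.array([0.0, 1.0, 2.0, 3.0])
z = np.exp(-0.7 * t) + np.array([0.01, -0.02, 0.01, 0.005])
F = lambda p: np.exp(-p[0] * t) - z
J = lambda p: (-t * np.exp(-p[0] * t))[:, None]

p_star, kappas = gn_with_monitor(F, J, [0.3])
print(p_star, kappas[-1])  # the last ratio approximates kappa(p*) < 1
```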
Summary
Whenever the ordinary Gauss-Newton method with a full-rank Jacobian fails to converge, one should rather improve the model (change the equations or the initial guess p_0) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Identifiability
Interpretation of the result in case of a rank-deficient Jacobian.

Let p* be the final iterate and rank(F'(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Q^T F'(p*) Π = R

⇒ reordering of the parameters to (p*_{π_1}, p*_{π_2}, ..., p*_{π_n}, ..., p*_{π_q}) in such a way that

|r_{11}| ≥ |r_{22}| ≥ ... ≥ |r_{nn}|

→ the parameters (p*_{π_1}, ..., p*_{π_n}) can actually be estimated from the current dataset.
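The pivoted QR step can be reproduced with `scipy.linalg.qr`; the Jacobian below is a made-up example whose third column is a multiple of the first, so only n = 2 parameter directions are identifiable:

```python
import numpy as np
from scipy.linalg import qr

# hypothetical rank-deficient Jacobian F'(p*): column 2 = 2 * column 0
Fp = np.array([[1.0, 0.0, 2.0],
               [1.0, 1.0, 2.0],
               [1.0, 2.0, 2.0],
               [1.0, 3.0, 2.0]])

Q, R, piv = qr(Fp, pivoting=True)        # Q^T F'(p*) Pi = R
diag = np.abs(np.diag(R))                # |r_11| >= |r_22| >= ...
n = int(np.sum(diag > 1e-10 * diag[0]))  # rank decision via relative threshold

print("pivot order:", piv)   # parameters reordered by estimability
print("rank n =", n)         # the leading n pivoted parameters are identifiable
```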
Parameter transformations
p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u),   F(p) = F(φ(u)) =: F̄(u),   F̄'(u) = F'(p) φ'(u)

constraint        transformation φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u^2))
p ≥ C             p = C − (1 − √(1 + u^2))
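As a small illustration of the first row of the table, the positivity constraint p > 0 can be enforced by iterating in u with p = exp(u); the chain rule supplies F̄'(u). The model and data are a hypothetical toy problem:

```python
import numpy as np

# hypothetical toy problem: fit y = exp(-p*t) with the constraint p > 0
t = np.array([0.0, 1.0, 2.0])
z = np.exp(-0.7 * t)

def F(p):             # residual in the constrained parameter p
    return np.exp(-p * t) - z

def F_bar(u):         # F_bar(u) = F(phi(u)) with phi(u) = exp(u)
    return F(np.exp(u))

def F_bar_prime(u):   # chain rule: F_bar'(u) = F'(p) * phi'(u), phi'(u) = exp(u)
    p = np.exp(u)
    Fp = (-t * np.exp(-p * t))[:, None]
    return Fp * np.exp(u)

u = np.log(0.2)       # start at p = 0.2; every iterate exp(u) stays positive
for _ in range(30):   # Gauss-Newton in the unconstrained variable u
    du, *_ = np.linalg.lstsq(F_bar_prime(u), -F_bar(u), rcond=None)
    u = u + du[0]

print(np.exp(u))      # recovered p; positivity holds by construction
```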
Levenberg-Marquardt method
Idea: the local linearization is only reasonable in a small neighborhood of p_k:

‖F(p_k) + F'(p_k) Δp_k‖_2^2 + ρ ‖Δp_k‖_2^2 = min

(F'(p_k)^T F'(p_k) + ρ I_q) Δp_k = −F'(p_k)^T F(p_k)

- ρ → 0: Levenberg-Marquardt → Gauss-Newton
- ρ > 0: (F'(p_k)^T F'(p_k) + ρ I_q) is always regular → masking of the (non-)uniqueness of a given solution and incorrect interpretation of numerical results: solutions are often not minima of g(p), but merely saddle points with g'(p) = 0
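One LM step is a single linear solve with the regularized normal-equation matrix; for any ρ > 0 it succeeds even where the Gauss-Newton system is singular, which is precisely the masking effect warned about above. A minimal sketch with a made-up rank-deficient Jacobian:

```python
import numpy as np

def lm_step(Fk, Jk, rho):
    """One Levenberg-Marquardt step: solve
    (J^T J + rho * I_q) dp = -J^T F.
    For rho -> 0 this approaches the Gauss-Newton step."""
    q = Jk.shape[1]
    return np.linalg.solve(Jk.T @ Jk + rho * np.eye(q), -Jk.T @ Fk)

# hypothetical rank-1 Jacobian: J^T J is singular, yet the step is well-defined
J = np.array([[1.0, 2.0],
              [2.0, 4.0]])
F = np.array([1.0, 0.0])
dp = lm_step(F, J, rho=1e-3)
print(dp)  # finite step despite the rank deficiency
```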
Scaling
- row (measurement) scaling:

  F̄(p) = D_row^{-1} F(p),   D_row = diag(δz_1, ..., δz_M)

  ‖F̄'(p_k) Δp_k + F̄(p_k)‖ = min  ⇔  ‖D_row^{-1} (F'(p_k) Δp_k + F(p_k))‖ = min

  Different scaling leads to a different solution.

- column (parameter) scaling:

  p = D_col p̄,   H(p̄) := F(D_col p̄) = F(p),   H'(p̄) = F'(p) D_col

  assume p̄_k = D_col^{-1} p_k:

  Δp̄_k = −H'(p̄_k)^+ H(p̄_k) = −(F'(p_k) D_col)^+ F(p_k)
        = (if F'(p_k) has full rank) −D_col^{-1} F'(p_k)^+ F(p_k) = D_col^{-1} Δp_k

  No invariance in the rank-deficient case!
Globalization
Global (or damped) Gauss-Newton method:

F'(p_k) Δp_k = −F(p_k),   p_{k+1} = p_k + λ_k Δp_k,   0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F'(p)^+ F(p) = 0.

Idea: choose λ_k in such a way that ‖F'(p_k)^+ F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄_{k+1}(λ_k)‖ / ‖Δp_k‖ < 1

divergence criterion: λ_k < λ_min ≪ 1
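The predictor-corrector strategy in the actual codes is more elaborate, but its effect can be sketched by simply halving λ_k until the monotonicity test passes; model and data are again a hypothetical exponential fit, this time with a poor starting guess:

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-4, tol=1e-10, max_iter=50):
    """Damped Gauss-Newton sketch: reduce lambda until the natural
    monotonicity test ||dp_bar(lambda)|| < ||dp_k|| is satisfied."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jk = J(p)
        dp, *_ = np.linalg.lstsq(Jk, -F(p), rcond=None)
        if np.linalg.norm(dp) < tol:
            break
        lam = 1.0
        while lam >= lam_min:
            trial = p + lam * dp
            dp_bar, *_ = np.linalg.lstsq(Jk, -F(trial), rcond=None)
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):  # monotonicity test
                break
            lam *= 0.5                                       # corrector: damp harder
        else:
            raise RuntimeError("divergence: lambda fell below lam_min")
        p = trial
    return p

# hypothetical toy problem with a poor starting guess
t = np.array([0.0, 1.0, 2.0])
z = np.exp(-0.7 * t)
F = lambda p: np.exp(-p[0] * t) - z
J = lambda p: (-t * np.exp(-p[0] * t))[:, None]

p_fit = damped_gauss_newton(F, J, [1.5])
print(p_fit)  # ≈ [0.7]
```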
Codes
- NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares problems
- NLSCON: solver for Nonlinear Least-Squares problems with CONstraints
Outline
1. Introductory Example
2. Problem Formulation
3. The Bayesian Approach
4. The Local Optimization Approach
   4.1 Linear Least Squares Problems
   4.2 Nonlinear Least Squares Problems
   4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

model: y' = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

set of experimental data (t_1, z_1, δz_1), ..., (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d, with measurement tolerances δz_j ∈ R^d; D_i = diag(δz_i)

F(p) = [ D_1^{-1} (y(t_1, p) − z_1) ; ... ; D_M^{-1} (y(t_M, p) − z_M) ],
F'(p) = [ D_1^{-1} y_p(t_1, p) ; ... ; D_M^{-1} y_p(t_M, p) ]   (blocks stacked vertically)

Gauss-Newton method:

Δp_k = −F'(p_k)^+ F(p_k),   p_{k+1} = p_k + λ_k Δp_k
Computation of F (p)
requires the solution of y' = f(y, p) at the time points t_1, ..., t_M

Problem: the time points chosen by adaptive step-size control generally do not agree with t_1, ..., t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
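With scipy (used here only as a stand-in for LIMEX) the same pattern reads: integrate once with dense output, then evaluate the interpolant at the measurement times to assemble the weighted residual F(p). The measurement values and tolerances below are placeholders:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k1, k2):
    # reversible reaction A + B <-> C + D from the introductory example
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    return [r, r, -r, -r]

def F(p, t_meas, z, dz):
    sol = solve_ivp(rhs, (0.0, t_meas[-1]), [2.0, 1.0, 0.5, 0.0],
                    args=tuple(p), dense_output=True, rtol=1e-8, atol=1e-10)
    # sol.sol is the dense-output interpolant: measurement times need not
    # coincide with the internally chosen integration steps
    res = [(sol.sol(ti) - zi) / dzi for ti, zi, dzi in zip(t_meas, z, dz)]
    return np.concatenate(res)   # stacked D_i^{-1} (y(t_i, p) - z_i)

# placeholder measurements with tolerance 0.05 for every species
t_meas = np.array([5.0, 10.0, 20.0])
z = [np.array([1.6, 0.6, 0.9, 0.4])] * 3
dz = [np.full(4, 0.05)] * 3

print(F([0.2, 0.5], t_meas, z, dz).shape)  # (12,) = d * M residual entries
```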
Sensitivity analysis
- sensitivity matrix at time t:

  S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

  F'(p) = [ D_1^{-1} S(t_1) ; ... ; D_M^{-1} S(t_M) ]   (blocks stacked vertically)

- scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

  S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

  p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation

y_p' = f_y(y, p) · y_p + f_p(y, p),   y_p(0) = 0

Compact notation:

S' = f_y · S + f_p,   S = [s_1, ..., s_q],   s_k ∈ R^d

In practice, solve the combined system

( y' ; s_1' ; ... ; s_q' ) = ( f ; f_y s_1 + f_{p_1} ; ... ; f_y s_q + f_{p_q} ) ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
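For the introductory reaction example (d = 4, q = 2) the combined system has (q+1)·d = 12 components. A sketch with scipy as the integrator (again standing in for LIMEX):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Augment the ODE with the variational equations: the state w holds y
# followed by the sensitivity columns s1 = dy/dk1 and s2 = dy/dk2.
def augmented_rhs(t, w, k1, k2):
    y, s1, s2 = w[:4], w[4:8], w[8:12]
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    f = np.array([r, r, -r, -r])
    # f_y as outer product of the stoichiometric signs with dr/dy
    dr_dy = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])
    fy = np.outer([1.0, 1.0, -1.0, -1.0], dr_dy)
    f_k1 = np.array([-A * B, -A * B, A * B, A * B])   # df/dk1
    f_k2 = np.array([C * D, C * D, -C * D, -C * D])   # df/dk2
    return np.concatenate([f, fy @ s1 + f_k1, fy @ s2 + f_k2])

w0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])  # y_p(0) = 0
sol = solve_ivp(augmented_rhs, (0.0, 30.0), w0, args=(0.2, 0.5), rtol=1e-8)
S_end = sol.y[4:, -1].reshape(2, 4).T   # sensitivity matrix S(30) in R^{4x2}
print(S_end)
```

Since A' = B' in this model and both sensitivities start at zero, the rows of S for A and B coincide, which is a handy sanity check for the implementation.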
Example
k    ‖F(p_k)‖   ‖Δp_k‖     λ_k     rank
0    4.81e-02   3.55e-01    -       2
1    1.11e-01   4.57e-01   1.000    2
     2.55e-02   1.75e-01   0.389
2    1.04e-02   4.22e-02   1.000    2
3    9.16e-04   1.90e-03   1.000    2
4    8.00e-06   1.99e-05   1.000    2
5    2.32e-07   1.54e-09   1.000    2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_2*/k_1* = 1.9999994
κ = 0.008, sc(F'(p*)) = 6.3

k    ‖F(p_k)‖   ‖Δp_k‖     λ_k     rank
0    3.85e-02   2.96e-02    -       1
1    2.40e-03   1.85e-03   1.000    1
2    1.11e-05   7.67e-06   1.000    1
3    6.67e-06   6.75e-11   1.000    1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_2**/k_1** = 2.0000622
κ = 0.004, sc(F'(p**)) = 4.5 × 10^6

In the second run the Jacobian has rank 1: the individual rate constants differ from the first run, but their ratio k_2/k_1 ≈ 2 is reproduced, i.e. only this combination is identifiable.

[Figure: fitted concentration profiles of A, B, C, D over time for the two runs]
BioPARKIN
integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
- SBML import and export
- deterministic simulation methods (LIMEX; others will follow)
- sensitivity analysis based on variational equations
- parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
- output of information on the identifiability of parameters
- time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Gauss-Newton iteration
F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk
corresponds to normal equation for the linear least squaresproblem
F prime(pk)∆pk + F (pk)2 = min
formal solution
∆pk = minusF prime(pk)+F (pk)
in case of convergence F prime(plowast)+F (plowast) = 0
if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)
)minus1F prime(p)T
Parameter Identification Susanna Roblitz 37
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Classification
I local GN methods require sufficiently good starting guessp0
I global GN methods can handle bad guesses as well
I m = q guaranteed local convergence
I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems
Parameter Identification Susanna Roblitz 38
Assumptions
We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0
I F D rarr Rm continuously differentiable D sube Rq openand convex
I F prime(p) possibly rank-deficient Jacobian
I F prime satisfies a particular Lipschitz condition (ie F prime
behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin
F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)
for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)
Parameter Identification Susanna Roblitz 39
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling:

F̄(p) = D_row⁻¹ F(p), D_row = diag(δz_1, ..., δz_M)

‖F'(p_k)Δp_k + F(p_k)‖ = min  vs.  ‖D_row⁻¹(F'(p_k)Δp_k + F(p_k))‖ = min

Different scaling leads to a different solution!

I column (parameter) scaling:

p = D_col p̄, H(p̄) := F(D_col p̄) = F(p), H'(p̄) = F'(p) D_col

assume p_k = D_col p̄_k:

Δp̄_k = −H'(p̄_k)⁺ H(p̄_k) = −(F'(p_k) D_col)⁺ F(p_k)
     = −D_col⁻¹ F'(p_k)⁺ F(p_k) = D_col⁻¹ Δp_k   (if F'(p_k) has full rank)

No invariance in the rank-deficient case!
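The full-rank invariance Δp̄_k = D_col⁻¹ Δp_k can be verified numerically; a sketch with illustrative matrices:

```python
import numpy as np

J = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])   # F'(p_k), full column rank
F = np.array([0.5, -1.0, 2.0])                       # F(p_k)
D = np.diag([10.0, 0.1])                             # column scaling D_col

dp     = -np.linalg.pinv(J) @ F        # correction in the original parameters
dp_bar = -np.linalg.pinv(J @ D) @ F    # correction in the scaled parameters

# Full rank: dp_bar = D_col^{-1} dp, so both iterations generate the
# same points p = D_col p_bar; the iteration is scaling-invariant.
```

For a rank-deficient `J`, the pseudoinverse identity (J D)⁺ = D⁻¹ J⁺ breaks down and the two iterations genuinely differ.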
Globalization
Global (or damped) Gauss-Newton method:

F'(p_k)Δp_k = −F(p_k), p_{k+1} = p_k + λ_k Δp_k, 0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F'(p)⁺F(p) = 0.

Idea: choose λ_k in such a way that ‖F'(p_k)⁺F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖Δp̄_{k+1}(λ_k)‖ / ‖Δp_k‖ < 1

Divergence criterion: λ_k < λ_min ≪ 1.
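A minimal damped Gauss-Newton sketch; simple step-halving stands in for the predictor-corrector strategy of the actual codes, and the pseudoinverse plays the role of F'(p_k)⁺:

```python
import numpy as np

def damped_gauss_newton(F, J, p0, lam_min=1e-4, xtol=1e-10, kmax=50):
    """Damped Gauss-Newton: p_{k+1} = p_k + lam_k dp_k, 0 < lam_k <= 1,
    with lam_k accepted via the natural monotonicity test
    ||dp_bar(lam_k)|| / ||dp_k|| < 1."""
    p = np.asarray(p0, dtype=float)
    for _ in range(kmax):
        Jp = J(p)
        dp = -np.linalg.pinv(Jp) @ F(p)                     # ordinary correction
        if np.linalg.norm(dp) <= xtol:
            break
        lam = 1.0
        while True:
            dp_bar = -np.linalg.pinv(Jp) @ F(p + lam * dp)  # simplified correction
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp): # monotonicity test
                break
            lam *= 0.5                                      # crude correction strategy
            if lam < lam_min:                               # divergence criterion
                raise RuntimeError("damping factor below lambda_min")
        p = p + lam * dp
    return p
```

On a small illustrative system F(p) = (p₁² − 2, p₁p₂ − 1) this converges to (√2, 1/√2) with full steps λ_k = 1 near the solution.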
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Problem formulation
d species, q (unknown) parameters

model: y' = f(y, p), y(0) = y_0, with y ∈ R^d, p ∈ R^q

set of experimental data (t_1, z_1, δz_1), ..., (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ R^d, with measurement tolerances δz_j ∈ R^d, D_j = diag(δz_j)

F(p) = (D_1⁻¹(y(t_1, p) − z_1); ...; D_M⁻¹(y(t_M, p) − z_M)),  F'(p) = (D_1⁻¹ y_p(t_1, p); ...; D_M⁻¹ y_p(t_M, p))

Gauss-Newton method:

Δp_k = −F'(p_k)⁺F(p_k), p_{k+1} = p_k + λ_k Δp_k
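Assembling the weighted residual vector F(p) from model values and data is mechanical; a sketch (the (M, d) array layout is my convention):

```python
import numpy as np

def weighted_residual(y_model, z, dz):
    """F(p): stack the tolerance-weighted residuals D_j^{-1} (y(t_j, p) - z_j)
    over all measurement points.  y_model, z, dz: arrays of shape (M, d).
    Since D_j = diag(dz_j), applying D_j^{-1} is elementwise division."""
    return ((y_model - z) / dz).ravel()
```

The stacked vector has length m = M·d and feeds directly into the Gauss-Newton correction above.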
Computation of F(p)
Requires the solution of y' = f(y, p) at the time points t_1, ..., t_M.

Problem: time points chosen by adaptive step-size control generally do not agree with t_1, ..., t_M.

Solution: use an integrator with dense output.

Idea: construction of an interpolation polynomial based on the available solution such that accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
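As an analogue of LIMEX's dense output, SciPy's `solve_ivp` can return a continuous interpolant of the adaptive solution; a sketch using the introductory A + B ⇌ C + D example with k = (0.2, 0.5) and illustrative measurement times:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mass-action kinetics of the introductory example A + B <-> C + D.
def f(t, y, k1, k2):
    r = k1 * y[0] * y[1] - k2 * y[2] * y[3]
    return [-r, -r, r, r]

t_meas = np.array([1.0, 2.5, 7.0, 20.0])        # measurement time points t_1..t_M

sol = solve_ivp(f, (0.0, 30.0), [2.0, 1.0, 0.5, 0.0], args=(0.2, 0.5),
                dense_output=True, rtol=1e-8, atol=1e-10)
y_meas = sol.sol(t_meas)                        # interpolant at t_1..t_M, shape (d, M)
```

The integrator chooses its own steps; `sol.sol` evaluates the piecewise interpolant at the measurement times without restarting the integration.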
Sensitivity analysis
I sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F'(p) = (D_1⁻¹ S(t_1); ...; D_M⁻¹ S(t_M))

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄_ij(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

p_scal, y_scal: user-specified scaling values
Computation of sensitivities
Differentiate y' = f(y, p) with respect to p to obtain the variational equation

y_p' = f_y(y, p) · y_p + f_p(y, p), y_p(0) = 0

Compact notation:

S' = f_y · S + f_p, S = [s_1, ..., s_q], s_k ∈ R^d

In practice, solve the combined system

(y', s_1', ..., s_q')^T = (f, f_y·s_1 + f_{p_1}, ..., f_y·s_q + f_{p_q})^T ∈ R^{(q+1)d}

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
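A sketch of the combined state/sensitivity system for the introductory example, solved here with SciPy rather than LIMEX (d = 4, q = 2, so the augmented system has dimension 12):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system for A + B <-> C + D with p = (k1, k2):
# y' = f(y,p),  s_j' = f_y s_j + f_{p_j},  s_j(0) = 0.
def augmented(t, Y, k1, k2):
    y, S = Y[:4], Y[4:].reshape(4, 2)              # S = [s1 s2]
    r = k1 * y[0] * y[1] - k2 * y[2] * y[3]
    sign = np.array([-1.0, -1.0, 1.0, 1.0])        # f = sign * r
    r_y = np.array([k1 * y[1], k1 * y[0], -k2 * y[3], -k2 * y[2]])
    fy = np.outer(sign, r_y)                       # f_y in R^{4x4}
    fp = np.outer(sign, [y[0] * y[1], -y[2] * y[3]])   # f_p in R^{4x2}
    return np.concatenate([sign * r, (fy @ S + fp).ravel()])

Y0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])
sol = solve_ivp(augmented, (0.0, 5.0), Y0, args=(0.2, 0.5),
                rtol=1e-10, atol=1e-12)
S_end = sol.y[4:, -1].reshape(4, 2)                # S(5) = dy/dp at t = 5
```

The sensitivity columns agree with central finite differences of the plain ODE solution, but without the cancellation and step-size issues of differencing.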
Example
Full-rank case (rank 2):

k | ‖F(p_k)‖ | ‖Δp_k‖   | λ_k   | rank
0 | 4.81e-02 | 3.55e-01 |       | 2
1 | 1.11e-01 | 4.57e-01 | 1.000 | 2
  | 2.55e-02 | 1.75e-01 | 0.389 |
2 | 1.04e-02 | 4.22e-02 | 1.000 | 2
3 | 9.16e-04 | 1.90e-03 | 1.000 | 2
4 | 8.00e-06 | 1.99e-05 | 1.000 | 2
5 | 2.321e-07 | 1.54e-09 | 1.000 | 2

p* = (k_1*, k_2*) = (0.10000001, 0.19999997) ⇒ k_21* = k_2*/k_1* = 1.9999994
κ = 0.008, sc(F'(p*)) = 6.3

Rank-deficient case (rank 1):

k | ‖F(p_k)‖ | ‖Δp_k‖   | λ_k   | rank
0 | 3.85e-02 | 2.96e-02 |       | 1
1 | 2.40e-03 | 1.85e-03 | 1.000 | 1
2 | 1.11e-05 | 7.67e-06 | 1.000 | 1
3 | 6.67e-06 | 6.75e-11 | 1.000 | 1

p** = (k_1**, k_2**) = (0.24155702, 0.48312906) ⇒ k_21** = k_2**/k_1** = 2.0000622
κ = 0.004, sc(F'(p**)) = 4.5 × 10⁶
[Figure: two concentration-time plots (t = 0...30, concentration 0...2) of the species A, B, C, D, showing the fitted solutions for the two cases.]
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX, others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Assumptions
We want to show that the Gauss-Newton method converges locally. This can only be expected under some assumptions on the function F(p) and the starting guess p_0.

I F: D → R^m continuously differentiable, D ⊆ R^q open and convex
I F'(p): possibly rank-deficient Jacobian
I F' satisfies a particular Lipschitz condition (i.e., F' behaves "nicely") with Lipschitz constant 0 ≤ ω < ∞:

‖F'(p̃)⁺(F'(p) − F'(p̄))(p − p̄)‖₂ ≤ ω‖p − p̄‖₂²  (i)

for all collinear p, p̄, p̃ ∈ D (i.e., they lie on a single straight line) and p − p̄ ∈ R(F'(p̃)⁺).
Assumptions
I compatibility condition (model and data must somehow fit each other):

‖F'(p̄)⁺(I − F'(p)F'(p)⁺)F(p)‖ ≤ κ(p)‖p̄ − p‖  ∀ p, p̄ ∈ D,
κ(p) ≤ κ < 1  ∀ p ∈ D  (ii)

I the initial update Δp_0 must be small enough, and a ball B_ρ(p_0) with radius ρ around p_0 must be contained in D:

‖Δp_0‖ ≤ α and B_ρ(p_0) ⊂ D  (iii)

h := αω < 2(1 − κ), ρ := α/(1 − κ − h/2)  (iv)
Convergence result
Under the above assumptions, the sequence {p_k} of Gauss-Newton iterates is well-defined, remains in B_ρ(p_0), and converges to some p* ∈ B_ρ(p_0) with

F'(p*)⁺F(p*) = 0.

The convergence rate can be estimated according to

‖p_{k+1} − p_k‖₂ ≤ (ω/2)‖p_k − p_{k−1}‖₂² + κ(p_{k−1})‖p_k − p_{k−1}‖₂  (∗)

This result shows existence of a solution p* in the sense that F'(p*)⁺F(p*) = 0, even in case of a rank-deficient Jacobian (the rank may vary throughout D as long as ω < ∞ and κ < 1).

Uniqueness, however, requires a full-rank Jacobian.
Compatibility
I linear least squares problems: F(p) = Ap − z

F'(p) = (Ap − z)' = A = (Ap̄ − z)' = F'(p̄)  ⇒ ω = 0  (i)

F'(p̄)⁺(I − F'(p)F'(p)⁺) = A⁺(I − AA⁺) = 0  ⇒ κ(p) = κ = 0  (ii)

(∗) ⇒ ‖p_2 − p_1‖ ≤ 0 ⇒ Δp_1 = 0 ⇒ p_1 = p*: convergence within one iteration

I systems of nonlinear equations (m = q):

if F'(p) is nonsingular ⇒ F'(p)⁺ = F'(p)⁻¹ ⇒ I − F'(p)F'(p)⁺ = 0 ⇒ κ(p) = 0: quadratic convergence

I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Incompatibility factor
General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖ ≤ lim_{k→∞} ((ω/2)‖p_k − p_{k−1}‖ + κ(p_{k−1})) = κ(p*)  (∗)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1.

Summary: The Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Convergence monitor: termination criteria
Define

Δp̄_{k+1} := −F'(p_k)⁺F(p_{k+1})  (simplified Gauss-Newton corrections)

Stop the iteration as soon as linear convergence dominates the iteration process:

‖Δp̄_{k+1}‖ ≈ ‖Δp_{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖Δp̄_{k+1}‖ ≈ κ‖Δp_k‖,  ‖Δp_{k+1}‖ ≈ κ²‖Δp_k‖.

This provides an estimate for κ(p*):

κ(p*) ≈ ‖Δp̄_{k+1}‖ / ‖Δp_k‖
Convergence monitor
[Figure: semilogarithmic plot of the correction norms over k (from 10⁻¹⁵ to 10⁰), comparing ordinary and simplified GN corrections.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Assumptions
I compatibility condition (model and data must somehowfit each other)
F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D
κ(p) le κ lt 1 forallp isin D (ii)
I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D
∆p0 le α and Bρ(p0) sub D (iii)
h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)
Parameter Identification Susanna Roblitz 40
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Convergence result
Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with
F prime(plowast)+F (plowast) = 0
The convergence rate can be estimated according to
pk+1minus pk2 le1
2ωpk minus pkminus12
2 +κ(pkminus1)pk minus pkminus12 (lowast)
This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)
Uniqueness =rArr requires full rank Jacobian
Parameter Identification Susanna Roblitz 41
Compatibility
I linear least squares problems F (p) = Ap minus z
F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)
=rArr ω = 0
F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)
=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast
convergence within one iterationI systems of nonlinear equations (m = q)
if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1
rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0
quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0
Parameter Identification Susanna Roblitz 42
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Compatibility

I linear least-squares problems: F(p) = Ap − z

(i) F′(p) = (Ap − z)′ = A = F′(p̄) for all p, p̄ ⇒ ω = 0

(ii) F′(p)⁺(I − F′(p̄)F′(p̄)⁺) = A⁺(I − AA⁺) = 0 ⇒ κ(p) = κ = 0,
so by (∗): ‖p₂ − p₁‖ ≤ 0 ⇒ ∆p₂ = 0 ⇒ p₁ = p*:
convergence within one iteration

I systems of nonlinear equations (m = q):
if F′(p) is nonsingular ⇒ F′(p)⁺ = F′(p)⁻¹
⇒ I − F′(p)F′(p)⁺ = 0, so by (∗): κ(p) = 0:
quadratic convergence

I compatible problems: F(p*) = 0 ⇒ κ(p*) = 0
Parameter Identification Susanna Roblitz 42
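The one-iteration convergence for linear problems is easy to verify numerically. A minimal sketch (the matrix A, data z, and starting point p0 are made up for illustration):

```python
import numpy as np

# Linear least-squares problem F(p) = A p - z; A and z are made up.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))              # m = 6 > q = 2, full column rank
z = rng.normal(size=6)

def gauss_newton_step(p):
    # F'(p) = A for every p, so the correction uses the pseudoinverse of A.
    return p - np.linalg.pinv(A) @ (A @ p - z)

p0 = np.array([5.0, -3.0])               # arbitrary starting point
p1 = gauss_newton_step(p0)               # already the least-squares solution
p_star = np.linalg.lstsq(A, z, rcond=None)[0]
```

A second step does not move the iterate anymore, confirming convergence within one iteration.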
Incompatibility factor

General nonlinear least-squares problems (m > q):

lim_{k→∞} ‖p_{k+1} − p_k‖ / ‖p_k − p_{k−1}‖ ≤ lim_{k→∞} ( (ω/2)‖p_k − p_{k−1}‖ + κ(p_{k−1}) ) = κ(p*)   by (∗)

κ(p*) is the asymptotic convergence rate.

Adequate nonlinear least-squares problems are characterized by the condition

κ(p*) < 1

Summary: the Gauss-Newton method converges locally linearly for adequate and locally quadratically for compatible nonlinear least-squares problems.
Parameter Identification Susanna Roblitz 43
Convergence monitor: termination criteria

Define the simplified Gauss-Newton corrections

∆p̄_{k+1} = −F′(p_k)⁺ F(p_{k+1})

Stop the iteration as soon as linear convergence dominates the iteration process:

‖∆p_{k+1}‖ ≈ ‖∆p̄_{k+1}‖ ≤ XTOL

For incompatible problems in the convergent phase it holds that

‖∆p̄_{k+1}‖ ≈ κ‖∆p_k‖,  ‖∆p_{k+1}‖ ≈ κ²‖∆p_k‖

This provides an estimate for κ(p*):

κ(p*) ≈ ‖∆p̄_{k+1}‖ / ‖∆p_k‖
Parameter Identification Susanna Roblitz 44
Convergence monitor

[Figure: semilogarithmic plot of ordinary and simplified GN correction norms (10⁻¹⁵ to 10⁰) over iteration index k = 0, ..., 12.]

Typical scissor between ordinary and simplified Gauss-Newton corrections in the linear convergence phase.
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full-rank Jacobian fails to converge, one should rather improve the model (change equations or the initial guess p₀) than just turn to a different iterative solver. In the convergent but rank-deficient case, one should try to get more information by acquiring better data.
Parameter Identification Susanna Roblitz 46
Identifiability

Interpretation of the result in case of a rank-deficient Jacobian:

Let p* be the final iterate and rank(F′(p*)) = n < q.

QR factorization with column pivoting and rank decision:

Qᵀ F′(p*) Π = R

⇒ reordering of the parameters to (p*_{π₁}, p*_{π₂}, ..., p*_{πₙ}, ..., p*_{π_q}) in such a way that

|r₁₁| ≥ |r₂₂| ≥ ··· ≥ |rₙₙ|

→ the parameters (p*_{π₁}, p*_{π₂}, ..., p*_{πₙ}) can actually be estimated from the current dataset
Parameter Identification Susanna Roblitz 47
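This QR-based rank decision can be reproduced with standard tools. A sketch using scipy.linalg.qr with column pivoting on a made-up rank-deficient Jacobian (the matrix and the tolerance are illustrative, not from the slides):

```python
import numpy as np
from scipy.linalg import qr

# Made-up rank-deficient Jacobian: column 2 is exactly twice column 0,
# mimicking two parameters that are not separately identifiable.
J = np.array([[1.0, 0.3, 2.0],
              [2.0, 0.1, 4.0],
              [3.0, 0.8, 6.0],
              [1.0, 0.5, 2.0]])

Q, R, piv = qr(J, pivoting=True)         # Q^T J Pi = R, |r_11| >= |r_22| >= ...

tol = 1e-10 * abs(R[0, 0])               # coarse rank-decision tolerance
n = int(np.sum(np.abs(np.diag(R)) > tol))

# n = 2 of the q = 3 parameters are estimable; piv[:n] gives their indices.
```

The pivot vector plays the role of the permutation Π: only the first n pivoted parameters can be estimated from the data.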
Parameter transformations

p: constrained parameters, u: unconstrained parameters, φ: transformation function

p = φ(u),  F̄(u) := F(φ(u)),  F̄′(u) = F′(p) φ′(u)

constraint        transformation p = φ(u)
p > 0             p = exp(u)
A ≤ p ≤ B         p = A + (B − A)/2 · (1 + sin u)
p ≤ C             p = C + (1 − √(1 + u²))
p ≥ C             p = C − (1 − √(1 + u²))
Parameter Identification Susanna Roblitz 48
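The transformations in the table are straightforward to code; a quick sanity check (the bound values 0.5, 2.0, 3.0 are arbitrary):

```python
import numpy as np

# Transformations p = phi(u) from the table; A, B, C are assumed bounds.
positive = lambda u: np.exp(u)                                  # p > 0
box = lambda u, A, B: A + (B - A) / 2.0 * (1.0 + np.sin(u))     # A <= p <= B
upper = lambda u, C: C + (1.0 - np.sqrt(1.0 + u**2))            # p <= C
lower = lambda u, C: C - (1.0 - np.sqrt(1.0 + u**2))            # p >= C

u = np.linspace(-10.0, 10.0, 1001)       # unconstrained values
```

Each map sends the whole real line into the feasible set, so the optimizer can work in u without explicit constraints.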
Levenberg-Marquardt method

Idea: the local linearization is only reasonable in a small neighborhood of p_k:

‖F(p_k) + F′(p_k)∆p_k‖₂² + ρ‖∆p_k‖₂² = min

(F′(p_k)ᵀ F′(p_k) + ρ I_q) ∆p_k = −F′(p_k)ᵀ F(p_k)

I ρ → 0: Levenberg-Marquardt → Gauss-Newton
I ρ > 0: (F′(p_k)ᵀ F′(p_k) + ρ I_q) is always regular → masking of possible non-uniqueness of a given solution and incorrect interpretation of numerical results: solutions are often not minima of g(p) but merely saddle points with g′(p) = 0
Parameter Identification Susanna Roblitz 49
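The two limits can be illustrated directly from the normal equations; a sketch with a made-up Jacobian and residual vector:

```python
import numpy as np

def lm_step(J, F, rho):
    """Levenberg-Marquardt correction: (J^T J + rho I) dp = -J^T F."""
    q = J.shape[1]
    return np.linalg.solve(J.T @ J + rho * np.eye(q), -J.T @ F)

# Made-up Jacobian and residual vector.
J = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.3]])
F = np.array([0.4, -0.2, 1.0])

gn = -np.linalg.pinv(J) @ F              # Gauss-Newton correction for comparison
```

For ρ → 0 the step coincides with the Gauss-Newton correction; a large ρ shrinks the step, which regularizes the system regardless of the rank of J and thereby hides possible rank deficiency.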
Scaling

I row (measurement) scaling:

F̄(p) = D_row⁻¹ F(p),  D_row = diag(δz₁, ..., δz_M)

‖F′(p_k)∆p_k + F(p_k)‖ = min  ⇔  ‖D_row⁻¹(F′(p_k)∆p_k + F(p_k))‖ = min

Different scaling leads to a different solution.

I column (parameter) scaling:

p = D_col p̄,  H(p̄) := F(D_col p̄) = F(p),  H′(p̄) = F′(p) D_col

Assume p_k = D_col p̄_k:

∆p̄_k = −H′(p̄_k)⁺ H(p̄_k) = −(F′(p_k) D_col)⁺ F(p_k)
     = −D_col⁻¹ F′(p_k)⁺ F(p_k) = D_col⁻¹ ∆p_k   (if F′(p_k) has full rank)

No invariance in the rank-deficient case.
Parameter Identification Susanna Roblitz 50
Globalization

Global (or damped) Gauss-Newton method:

F′(p_k)∆p_k = −F(p_k),  p_{k+1} = p_k + λ_k ∆p_k,  0 < λ_k ≤ 1

How to choose λ_k?

Remember: we want to solve F′(p)⁺F(p) = 0.

Idea: choose λ_k in such a way that ‖F′(p_k)⁺F(p_k)‖ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors λ_k are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

‖∆p̄_{k+1}(λ_k)‖ / ‖∆p_k‖ < 1

Divergence criterion: λ_k < λ_min ≪ 1
Parameter Identification Susanna Roblitz 51
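A simplified sketch of a damped Gauss-Newton iteration with the natural monotonicity test (the predictor here is just λ = 1 with halving as corrector, much cruder than the strategy in NLSQ-ERR/NLSCON; the test problem is made up):

```python
import numpy as np

def damped_gauss_newton(F, Jac, p0, tol=1e-10, lam_min=1e-4, max_iter=50):
    """Damped GN sketch: accept p + lam*dp only if the simplified
    correction satisfies the natural monotonicity test."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        Jpinv = np.linalg.pinv(Jac(p))
        dp = -Jpinv @ F(p)                      # ordinary GN correction
        if np.linalg.norm(dp) <= tol:
            break
        lam = 1.0
        while lam >= lam_min:
            dp_bar = -Jpinv @ F(p + lam * dp)   # simplified GN correction
            if np.linalg.norm(dp_bar) < np.linalg.norm(dp):
                p = p + lam * dp                # monotonicity test passed
                break
            lam /= 2.0                          # corrector: reduce damping
        else:
            raise RuntimeError("divergence: lambda < lambda_min")
    return p

# Made-up compatible test problem: fit z = exp(p1*t) + p2.
t = np.linspace(0.0, 1.0, 20)
z = np.exp(0.7 * t) + 0.3
res = lambda p: np.exp(p[0] * t) + p[1] - z
jac = lambda p: np.column_stack([t * np.exp(p[0] * t), np.ones_like(t)])
p_hat = damped_gauss_newton(res, jac, [0.0, 0.0])
```

Since the problem is compatible, the undamped steps are accepted near the solution and the iteration converges quadratically.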
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares problems with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation

d species, q (unknown) parameters

Model: y′ = f(y, p), y(0) = y₀, with y ∈ Rᵈ, p ∈ Rᵠ

Set of experimental data (t₁, z₁, δz₁), ..., (t_M, z_M, δz_M), t_j ∈ R, z_j ∈ Rᵈ, with measurement tolerances δz_j ∈ Rᵈ; D_i = diag(δz_i)

F(p) = ( D₁⁻¹(y(t₁, p) − z₁), ..., D_M⁻¹(y(t_M, p) − z_M) )ᵀ

F′(p) = ( D₁⁻¹ y_p(t₁, p), ..., D_M⁻¹ y_p(t_M, p) )ᵀ

Gauss-Newton method:

∆p_k = −F′(p_k)⁺ F(p_k),  p_{k+1} = p_k + λ_k ∆p_k
Parameter Identification Susanna Roblitz 54
Computation of F(p)

Requires the solution of y′ = f(y, p) at the time points t₁, ..., t_M.

Problem: the time points chosen by adaptive step-size control generally do not agree with t₁, ..., t_M.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
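LIMEX itself is not scripted here, but the same dense-output idea can be illustrated with SciPy's solve_ivp on the introductory reaction example (rate constants and initial values from the first slides; the measurement times are assumed):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Introductory example A + B <=> C + D with k1 = 0.2, k2 = 0.5.
def rhs(t, y):
    A, B, C, D = y
    r = -0.2 * A * B + 0.5 * C * D       # A' = B' = r, C' = D' = -r
    return [r, r, -r, -r]

sol = solve_ivp(rhs, (0.0, 30.0), [2.0, 1.0, 0.5, 0.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

t_meas = np.array([1.3, 4.7, 12.0, 25.5])   # assumed measurement times
y_meas = sol.sol(t_meas)                    # interpolant, shape (d, M) = (4, 4)
```

The interpolant can be evaluated at arbitrary measurement times without constraining the internal adaptive grid, and it preserves invariants such as A + C = const to within the integration tolerance.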
Sensitivity analysis

I sensitivity matrix at time t:

S(t) = ∂y/∂p (t) = y_p(t) ∈ R^{d×q}

F′(p) = ( D₁⁻¹ S(t₁), ..., D_M⁻¹ S(t_M) )ᵀ

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

S̄_{ij}(t) = ∂y_i/∂p_j (t) · p_scal / y_scal

p_scal, y_scal: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities

Differentiate y′ = f(y, p) with respect to p to obtain the variational equation

y_p′ = f_y(y, p) · y_p + f_p(y, p),  y_p(0) = 0

Compact notation:

S′ = f_y · S + f_p,  S = [s₁, ..., s_q],  s_k ∈ Rᵈ

In practice, solve the combined system

( y′, s₁′, ..., s_q′ )ᵀ = ( f, f_y·s₁ + f_{p₁}, ..., f_y·s_q + f_{p_q} )ᵀ ∈ R^{(q+1)d},

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
Parameter Identification Susanna Roblitz 57
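For the introductory reaction A + B ⇌ C + D with p = (k1, k2), the combined system can be written out explicitly and checked against finite differences. A sketch using SciPy's non-stiff integrator (for stiff systems one would use an implicit code like LIMEX; tolerances and the final time are arbitrary):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Combined system (y, s1, s2) for A + B <=> C + D, p = (k1, k2), y_p(0) = 0.
def combined_rhs(t, Y, k1, k2):
    y, s1, s2 = Y[:4], Y[4:8], Y[8:12]
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    f = np.array([r, r, -r, -r])
    dr = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])   # gradient of r w.r.t. y
    fy = np.outer([1.0, 1.0, -1.0, -1.0], dr)           # f_y
    fp1 = np.array([-A * B, -A * B, A * B, A * B])      # df/dk1
    fp2 = np.array([C * D, C * D, -C * D, -C * D])      # df/dk2
    return np.concatenate([f, fy @ s1 + fp1, fy @ s2 + fp2])

y0 = np.array([2.0, 1.0, 0.5, 0.0])
Y0 = np.concatenate([y0, np.zeros(8)])
sol = solve_ivp(combined_rhs, (0.0, 10.0), Y0, args=(0.2, 0.5),
                rtol=1e-8, atol=1e-10)
S_T = sol.y[4:, -1].reshape(2, 4)        # rows: dy/dk1, dy/dk2 at t = 10

# Finite-difference check of dy/dk1:
eps = 1e-5
ode = lambda k1: solve_ivp(lambda t, y: combined_rhs(
                               t, np.concatenate([y, np.zeros(8)]), k1, 0.5)[:4],
                           (0.0, 10.0), y0, rtol=1e-10, atol=1e-12).y[:, -1]
fd = (ode(0.2 + eps) - ode(0.2)) / eps
```

The variational approach gives the sensitivities to integrator accuracy in a single solve, whereas finite differences need one extra ODE solve per parameter and suffer from cancellation.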
Example

Case (T+E):

k   ‖F(p_k)‖   ‖∆p_k‖     λ_k     rank
0   4.81e-02   3.55e-01           2
1   1.11e-01   4.57e-01   1.000   2
    2.55e-02   1.75e-01   0.389   2
2   1.04e-02   4.22e-02   1.000   2
3   9.16e-04   1.90e-03   1.000   2
4   8.00e-06   1.99e-05   1.000   2
5   2.32e-07   1.54e-09   1.000   2

p* = (k₁*, k₂*) = (0.10000001, 0.19999997) ⇒ k₂*/k₁* = 1.9999994

κ = 0.008, sc(F′(p*)) = 6.3

Case (E):

k   ‖F(p_k)‖   ‖∆p_k‖     λ_k     rank
0   3.85e-02   2.96e-02           1
1   2.40e-03   1.85e-03   1.000   1
2   1.11e-05   7.67e-06   1.000   1
3   6.67e-06   6.75e-11   1.000   1

p** = (k₁**, k₂**) = (0.24155702, 0.48312906) ⇒ k₂**/k₁** = 2.0000622

κ = 0.004, sc(F′(p**)) = 4.5 × 10⁶
[Figures: fitted concentration profiles of species A, B, C, D over t ∈ [0, 30] for the two cases.]
Parameter Identification Susanna Roblitz 58
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable; possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Incompatibility factor
General nonlinear least-squares problems (m gt q)
limkrarrinfin
pk+1 minus pkpk minus pkminus1
(lowast)le lim
krarrinfin
(ω2pk minus pkminus1+ κ(pkminus1)
)= κ(plowast)
κ(plowast) is the asymptotic convergence rate
Adequate nonlinear least-squares problems arecharacterized by the condition
κ(plowast) lt 1
SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems
Parameter Identification Susanna Roblitz 43
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Convergence monitorTermination criteria
Define
∆pk+1
= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)
Stop iteration as soon as linear convergence dominates theiteration process
∆pk+1 asymp ∆pk+1 le XTOL
For incompatible problems in the convergent phase it holds
∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p
kThis provides an estimate for κ(plowast)
κ(plowast) asymp ∆pk+1∆pk
Parameter Identification Susanna Roblitz 44
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Convergence monitor
0 2 4 6 8 10 1210
minus15
10minus10
10minus5
100
k
ordinary GN correctionssimplified GN corrections
Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase
Parameter Identification Susanna Roblitz 45
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Summary
Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data
Parameter Identification Susanna Roblitz 46
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling:

$$\bar F(p) = D_{\mathrm{row}}^{-1} F(p), \qquad D_{\mathrm{row}} = \mathrm{diag}(\delta z_1, \ldots, \delta z_M)$$

Minimizing $\|F'(p^k)\,\Delta p^k + F(p^k)\|$ is not equivalent to minimizing $\|D_{\mathrm{row}}^{-1}(F'(p^k)\,\Delta p^k + F(p^k))\|$: different scaling leads to a different solution.

I column (parameter) scaling:

$$p = D_{\mathrm{col}}\,\bar p, \qquad H(\bar p) = F(D_{\mathrm{col}}\,\bar p) = F(p), \qquad H'(\bar p) = F'(p)\,D_{\mathrm{col}}$$

Assume $p^k = D_{\mathrm{col}}\,\bar p^k$:

$$\Delta\bar p^k = -H'(\bar p^k)^+ H(\bar p^k) = -(F'(p^k)\,D_{\mathrm{col}})^+ F(p^k) \overset{F'(p^k)\ \text{full rank}}{=} -D_{\mathrm{col}}^{-1} F'(p^k)^+ F(p^k) = D_{\mathrm{col}}^{-1}\,\Delta p^k$$

No invariance in the rank-deficient case.
Parameter Identification Susanna Roblitz 50
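The column-scaling invariance in the full-rank case can be verified numerically. A sketch with a made-up Jacobian `J` $= F'(p^k)$, residual `r` $= F(p^k)$, and scaling `D` $= D_{\mathrm{col}}$:

```python
import numpy as np

J = np.array([[2.0, 0.5],
              [1.0, 3.0],
              [0.0, 1.0]])
r = np.array([0.3, -0.7, 1.1])
D = np.diag([10.0, 0.01])  # column scaling D_col

dp = np.linalg.lstsq(J, -r, rcond=None)[0]          # unscaled Gauss-Newton step
dp_bar = np.linalg.lstsq(J @ D, -r, rcond=None)[0]  # step in scaled parameters

# dp_bar = D^{-1} dp, so mapping back with D recovers the unscaled step:
assert np.allclose(D @ dp_bar, dp)
```

In other words, for a full-rank Jacobian the Gauss-Newton iterates are unaffected by how the parameters are scaled; this breaks down once the pseudoinverse has to resolve a rank deficiency.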
Globalization
Global (or damped) Gauss-Newton method:

$$F'(p^k)\,\Delta p^k = -F(p^k), \qquad p^{k+1} = p^k + \lambda_k\,\Delta p^k, \quad 0 < \lambda_k \le 1$$

How to choose $\lambda_k$?

Remember: we want to solve $F'(p)^+ F(p) = 0$.

Idea: choose $\lambda_k$ in such a way that $\|F'(p^k)^+ F(p^k)\|$ decays along the iterates (theory of level functions, Gauss-Newton path).

The damping factors $\lambda_k$ are determined via a predictor-corrector strategy such that they satisfy the natural monotonicity test

$$\frac{\|\overline{\Delta p}^{\,k+1}(\lambda_k)\|}{\|\Delta p^k\|} < 1$$

Divergence criterion: $\lambda_k < \lambda_{\min} \ll 1$.
Parameter Identification Susanna Roblitz 51
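A toy damped Gauss-Newton iteration with a simplified monotonicity test (a sketch, not the predictor-corrector strategy of NLSQ-ERR; the model $y = e^{p_1 t} + p_2$ and the data are made up for illustration):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20)
z = np.exp(0.8 * t) + 0.3  # synthetic, noise-free "measurements"

def residual(p):
    return np.exp(p[0] * t) + p[1] - z

def jacobian(p):
    return np.column_stack([t * np.exp(p[0] * t), np.ones_like(t)])

p = np.array([0.0, 0.0])
for _ in range(50):
    dp = np.linalg.lstsq(jacobian(p), -residual(p), rcond=None)[0]
    lam = 1.0
    # accept lambda only if the Newton correction shrinks at the trial point
    while lam > 1e-4:
        p_trial = p + lam * dp
        dp_trial = np.linalg.lstsq(jacobian(p_trial),
                                   -residual(p_trial), rcond=None)[0]
        if np.linalg.norm(dp_trial) < np.linalg.norm(dp):
            break
        lam *= 0.5
    p = p + lam * dp

print(p)  # close to the generating parameters (0.8, 0.3)
```

The inner loop halves $\lambda$ until the correction at the trial point is smaller than the current one, which is the spirit of the natural monotonicity test above.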
Codes
I NLSQ-ERR: ERRor-oriented Newton method for Nonlinear Least-SQuares
I NLSCON: solver for Nonlinear Least-Squares with CONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
4.1 Linear Least Squares Problems
4.2 Nonlinear Least Squares Problems
4.3 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
$d$ species, $q$ (unknown) parameters

model: $y' = f(y, p)$, $y(0) = y_0$, with $y \in \mathbb{R}^d$, $p \in \mathbb{R}^q$

set of experimental data $(t_1, z_1, \delta z_1), \ldots, (t_M, z_M, \delta z_M)$, $t_j \in \mathbb{R}$, $z_j \in \mathbb{R}^d$, with measurement tolerances $\delta z_j \in \mathbb{R}^d$, $D_i = \mathrm{diag}(\delta z_i)$

$$F(p) = \begin{pmatrix} D_1^{-1}\,(y(t_1, p) - z_1) \\ \vdots \\ D_M^{-1}\,(y(t_M, p) - z_M) \end{pmatrix}, \qquad F'(p) = \begin{pmatrix} D_1^{-1}\,y_p(t_1, p) \\ \vdots \\ D_M^{-1}\,y_p(t_M, p) \end{pmatrix}$$

Gauss-Newton method:

$$\Delta p^k = -F'(p^k)^+ F(p^k), \qquad p^{k+1} = p^k + \lambda_k\,\Delta p^k$$
Parameter Identification Susanna Roblitz 54
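Assembling the scaled residual $F(p)$ is just a stack of weighted differences. A small sketch with made-up model values, data, and tolerances ($d = 2$ species, $M = 3$ time points, $q = 2$; the Jacobian here is random, purely to show the step formula):

```python
import numpy as np

y_model = np.array([[1.9, 1.5, 1.2],   # y(t_l, p), one column per time point
                    [0.1, 0.4, 0.7]])
z = np.array([[2.0, 1.4, 1.25],        # measurements z_l
              [0.12, 0.38, 0.66]])
dz = np.array([[0.1, 0.1, 0.1],        # measurement tolerances delta z_l
               [0.05, 0.05, 0.05]])

# F(p): stack D_l^{-1} (y(t_l, p) - z_l) over all time points
F = ((y_model - z) / dz).T.ravel()     # shape (d*M,) = (6,)

# With a (here made-up) Jacobian F'(p), the Gauss-Newton step is -F'(p)^+ F(p):
Fprime = np.random.default_rng(0).normal(size=(6, 2))
dp = -np.linalg.pinv(Fprime) @ F
print(F.shape, dp.shape)  # (6,) (2,)
```

Dividing by the tolerances makes each residual component dimensionless, so measurements of very different magnitudes contribute comparably to the fit.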
Computation of F (p)
Requires the solution of $y' = f(y, p)$ at the time points $t_1, \ldots, t_M$.

Problem: the time points chosen by the adaptive step-size control generally do not agree with $t_1, \ldots, t_M$.

Solution: use an integrator with dense output.

Idea: construct an interpolation polynomial based on the available solution such that the accuracy is not destroyed.

Example: LIMEX has a dense output based on Hermite interpolation.
Parameter Identification Susanna Roblitz 55
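Dense output in practice, using SciPy's `solve_ivp` rather than LIMEX as a stand-in: the adaptive solver picks its own internal steps, and the returned interpolant is evaluated at the measurement times. The model is the introductory example $A + B \rightleftharpoons C + D$ with $k_1^0 = 0.2$, $k_2^0 = 0.5$; the measurement times are made up.

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, y, k1, k2):
    A, B, C, D = y
    rate = -k1 * A * B + k2 * C * D
    return [rate, rate, -rate, -rate]

t_meas = np.array([1.0, 2.5, 7.0, 15.0, 30.0])  # measurement times t_1..t_M
sol = solve_ivp(f, (0.0, 30.0), [2.0, 1.0, 0.5, 0.0],
                args=(0.2, 0.5), dense_output=True, rtol=1e-8, atol=1e-10)

# Evaluate the dense-output interpolant at the measurement times:
y_at_meas = sol.sol(t_meas)   # shape (4, M): model values y(t_l, p)
print(y_at_meas.shape)        # (4, 5)
```

Because $A' = B'$ and $A' = -C'$, the combinations $A - B$ and $A + C$ are conserved, which gives a cheap accuracy check on the interpolated values.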
Sensitivity analysis
I sensitivity matrix at time $t$:

$$S(t) = \frac{\partial y}{\partial p}(t) = y_p(t) \in \mathbb{R}^{d \times q}, \qquad F'(p) = \begin{pmatrix} D_1^{-1}\,S(t_1) \\ \vdots \\ D_M^{-1}\,S(t_M) \end{pmatrix}$$

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):

$$\bar S_{ij}(t) = \frac{\partial y_i}{\partial p_j}(t) \cdot \frac{p_{\mathrm{scal}}}{y_{\mathrm{scal}}}$$

$p_{\mathrm{scal}}$, $y_{\mathrm{scal}}$: user-specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate $y' = f(y, p)$ with respect to $p$ to obtain the variational equation

$$y_p' = f_y(y, p) \cdot y_p + f_p(y, p), \qquad y_p(0) = 0$$

Compact notation:

$$S' = f_y \cdot S + f_p, \qquad S = [s_1, \ldots, s_q], \quad s_k \in \mathbb{R}^d$$

In practice, solve the combined system

$$\begin{pmatrix} y' \\ s_1' \\ \vdots \\ s_q' \end{pmatrix} = \begin{pmatrix} f \\ f_y \cdot s_1 + f_{p_1} \\ \vdots \\ f_y \cdot s_q + f_{p_q} \end{pmatrix} \in \mathbb{R}^{(q+1)d}$$

which can be solved efficiently by LIMEX (inexact Jacobian decoupling).
Parameter Identification Susanna Roblitz 57
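The combined system is easiest to see on a deliberately tiny model. For $y' = -k y$ (so $d = q = 1$), the sensitivity $s = \partial y / \partial k$ satisfies $s' = f_y\,s + f_k = -k s - y$ with $s(0) = 0$, and the result can be checked against the analytic derivative of $y(t) = e^{-kt}$ (a sketch using SciPy in place of LIMEX):

```python
import numpy as np
from scipy.integrate import solve_ivp

def combined(t, w, k):
    y, s = w
    return [-k * y,          # y' = f(y, k)
            -k * s - y]      # s' = f_y * s + f_k

k = 0.7
sol = solve_ivp(combined, (0.0, 2.0), [1.0, 0.0], args=(k,),
                rtol=1e-10, atol=1e-12)
s_num = sol.y[1, -1]

# Analytic check: y(t) = exp(-k t), so dy/dk at t = 2 is -t * exp(-k t).
s_exact = -2.0 * np.exp(-k * 2.0)
print(abs(s_num - s_exact))  # tiny
```

For $q$ parameters the same pattern yields the $(q+1)d$-dimensional system of the slide, and all blocks share the Jacobian $f_y$, which is what the decoupling in LIMEX exploits.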
Example
Case (T+E):

k   ‖F(p^k)‖    ‖Δp^k‖      λ_k     rank
0   4.81e-02    3.55e-01            2
1   1.11e-01    4.57e-01    1.000   2
    2.55e-02    1.75e-01    0.389
2   1.04e-02    4.22e-02    1.000   2
3   9.16e-04    1.90e-03    1.000   2
4   8.00e-06    1.99e-05    1.000   2
5   2.321e-07   1.54e-09    1.000   2

$p^* = (k_1^*, k_2^*) = (0.10000001, 0.19999997) \Rightarrow k_2^*/k_1^* = 1.9999994$

$\kappa = 0.008$, $\mathrm{sc}(F'(p^*)) = 6.3$

Case (E):

k   ‖F(p^k)‖    ‖Δp^k‖      λ_k     rank
0   3.85e-02    2.96e-02            1
1   2.40e-03    1.85e-03    1.000   1
2   1.11e-05    7.67e-06    1.000   1
3   6.67e-06    6.75e-11    1.000   1

$p^{**} = (k_1^{**}, k_2^{**}) = (0.24155702, 0.48312906) \Rightarrow k_2^{**}/k_1^{**} = 2.0000622$

$\kappa = 0.004$, $\mathrm{sc}(F'(p^{**})) = 4.5 \times 10^6$
[Two plots: concentration of A, B, C, D over time (0 to 30), showing the fitted model for the two measurement scenarios.]
Parameter Identification Susanna Roblitz 58
BioPARKIN
Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html
Parameter Identification Susanna Roblitz 61
Identifiability
interpretation of result in case of rank deficient Jacobian
Let plowast be the final iterate and rank(F prime(plowast)) = n lt q
QR factorization with column pivoting and rank decision
QTF prime(plowast)Π = R
rArr reordering of parameters to (plowastπ1 plowastπ2
plowastπn plowastπq) in
such a way that
|r11| ge |r22| ge middot middot middot ge |rnn|
rarr parameters (plowastπ1 plowastπ2
plowastπn) can actually be estimatedfrom the current dataset
Parameter Identification Susanna Roblitz 47
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Parameter transformations
p constrained parametersu unconstrained parametersφ transformation function
p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)
constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA
2(1 + sin u)
p le C p = C +(1minusradic
1 + u2)
p ge C p = C minus(1minusradic
1 + u2)
Parameter Identification Susanna Roblitz 48
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Levenberg-Marquardt method
Idea local linearization only reasonable in a smallneighborhood of pk
F (pk) + F prime(pk)∆pk22 + ρ∆pk2
2 = min
(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)
I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton
I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0
Parameter Identification Susanna Roblitz 49
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Scaling
I row (measurement) scaling
F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)
F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min
Different scaling leads to a different solutionI column (parameter) scaling
p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol
assume pk = Dcolpk
∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)
F prime(pk ) full rank= minusDminus1
col F prime(pk)+F (pk) = Dminus1col ∆pk
No invariance in the rank deficient case
Parameter Identification Susanna Roblitz 50
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Globalization
Global (or damped) Gauss-Newton method
F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1
How to choose λk
Remember we want to solve F prime(p)+F (p) = 0
Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)
damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test
∆pk+1
(λk)∆pk lt 1
divergence criterion λk lt λmin 1
Parameter Identification Susanna Roblitz 51
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis
I sensitivity matrix at time t
S(t) =party
partp(t) = yp(t) isin Rdtimesq
F prime(p) =
Dminus11 S(t1)
Dminus1
M S(tM)
I scaled sensitivity matrix (necessary whenever species or
parameters cover a broad range of physical units)
Sij(t) =partyipartpj
(t) middot pscal
yscal
pscal yscal user specified scaling values
Parameter Identification Susanna Roblitz 56
Computation of sensitivities
Differentiate y prime = f (y p) with respect to p to obtain thevariational equation
y primep = fy (y p) middot yp + fp(y p) yp(0) = 0
Compact notation
S prime = fy middot S + fp S = [s1 sq] sk isin Rd
In practice solve combined systemy prime
s prime1
s primeq
=
f
fy middot s1 + fp1
fy middot sq + fpq
isin R(q+1)d
can be solved efficiently by LIMEX (inexact Jacobian decoupling)
Parameter Identification Susanna Roblitz 57
Example
k F (pk ) ∆pk λk rank
0 481e-02 355e-01 21 111e-01 457e-01 1000 2
255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2
plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994
κ = 0008 sc(F prime(plowast)) = 63
k F (pk ) ∆pk λk rank
0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1
plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622
κ = 0004 sc(F prime(plowast)) = 45times 106
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
0 10 20 300
05
1
15
2
time
concentr
ation
A B C D
Parameter Identification Susanna Roblitz 58
BioPARKIN
integrated software environment for simulation and parameteridentification httpbioparkinzibde
Parameter Identification Susanna Roblitz 59
BioPARKIN
I SBML import and export
I deterministic simulation methods (LIMEX others willfollow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatination of IVPs formulti-experiment simulations
Parameter Identification Susanna Roblitz 60
Thank you for your attention
ContactZuse Institute Berlin
Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml
Parameter Identification Susanna Roblitz 61
Codes
I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares
I NLSCON solver for Nonlinear Least-Squares withCONstraints
Parameter Identification Susanna Roblitz 52
Outline
1 Introductory Example
2 Problem Formulation
3 The Bayesian Approach
4 The Local Optimization Approach
41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models
Parameter Identification Susanna Roblitz 53
Problem formulation
d species q (unknown) parameters
model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq
set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd
with measurement tolerances δzj isin Rd Di = diag(δzi)
F (p) =
Dminus11 (y(t1 p)minus z1)
Dminus1
M (y(tM p)minus zM)
F prime(p) =
Dminus11 yp(t1 p)
Dminus1
M yp(tM p)
Gauss-Newton method
∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk
Parameter Identification Susanna Roblitz 54
Computation of F (p)
requires solution of y prime = f (y p) at time points t1 tM
Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM
Solution use integrator with dense output
Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed
Example LIMEX has a dense output based on Hermiteinterpolation
Parameter Identification Susanna Roblitz 55
Sensitivity analysis

I sensitivity matrix at time t:
S(t) = ∂y/∂p(t) = y_p(t) ∈ R^(d×q),
F'(p) = ( D1^(-1) S(t1), ..., DM^(-1) S(tM) )^T

I scaled sensitivity matrix (necessary whenever species or parameters cover a broad range of physical units):
S_ij(t) = ∂y_i/∂p_j(t) · p_scal,j / y_scal,i
p_scal, y_scal: user-specified scaling values
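For intuition, the scaled sensitivities can be approximated by forward differences. A toy sketch with a hypothetical closed-form model and user-chosen scaling values (the function names and scalings are illustrative, not part of the slides):

```python
import numpy as np

def y(t, p):
    # toy model with analytic solution: y1 decays with rate p1, y2 = p2 * t
    return np.array([np.exp(-p[0] * t), p[1] * t])

def scaled_sensitivity(t, p, p_scal, y_scal, h=1e-7):
    S = np.empty((2, len(p)))
    for j in range(len(p)):
        dp = np.zeros_like(p)
        dp[j] = h
        S[:, j] = (y(t, p + dp) - y(t, p)) / h   # forward difference for ∂y_i/∂p_j
    # S_ij scaled by p_scal,j / y_scal,i, as on the slide
    return S * p_scal[None, :] / y_scal[:, None]

p = np.array([0.5, 1000.0])                      # parameters of very different magnitude
S_bar = scaled_sensitivity(2.0, p, p_scal=p, y_scal=np.array([1.0, 2000.0]))
print(S_bar)
```

Without the scaling, the entry for the large parameter p2 would dwarf the others purely for unit reasons; after scaling, both diagonal entries are O(1) and comparable.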
Computation of sensitivities

Differentiate y' = f(y, p) with respect to p to obtain the variational equation

y_p' = f_y(y, p) · y_p + f_p(y, p),   y_p(0) = 0

Compact notation:

S' = f_y · S + f_p,   S = [s1, ..., sq], s_k ∈ R^d

In practice, one solves the combined system

(y', s1', ..., sq')^T = (f, f_y·s1 + f_p1, ..., f_y·sq + f_pq)^T ∈ R^((q+1)d),

which can be solved efficiently by LIMEX (inexact Jacobian, decoupling).
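The combined system can be written out directly for the introductory example A + B ⇌ C + D with p = (k1, k2). This is a plain SciPy sketch of the approach, without LIMEX's inexact-Jacobian decoupling; here f_y and f_p are hand-derived from r = -k1·A·B + k2·C·D:

```python
import numpy as np
from scipy.integrate import solve_ivp

k1, k2 = 0.2, 0.5

def rhs(t, u):
    y, S = u[:4], u[4:].reshape(4, 2)            # state y and sensitivities S = y_p
    A, B, C, D = y
    r = -k1 * A * B + k2 * C * D
    sign = np.array([1.0, 1.0, -1.0, -1.0])      # A' = B' = r, C' = D' = -r
    f = sign * r
    g = np.array([-k1 * B, -k1 * A, k2 * D, k2 * C])   # ∂r/∂(A,B,C,D)
    fy = np.outer(sign, g)                       # Jacobian f_y, 4x4
    fp = np.outer(sign, [-A * B, C * D])         # f_p = [f_k1, f_k2], 4x2
    return np.concatenate([f, (fy @ S + fp).ravel()])  # S' = f_y·S + f_p

u0 = np.concatenate([[2.0, 1.0, 0.5, 0.0], np.zeros(8)])   # y(0); S(0) = 0
sol = solve_ivp(rhs, (0.0, 30.0), u0, rtol=1e-8, atol=1e-10)
S_end = sol.y[4:, -1].reshape(4, 2)              # sensitivities at t = 30
print(S_end)
```

The combined state has dimension (q+1)·d = 12; the rows of S_end give ∂A/∂k, ∂B/∂k, ∂C/∂k, ∂D/∂k, which feed directly into F'(p) on the previous slides.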
Example

Case (T+E): Gauss-Newton iteration history

k   ||F(p_k)||   ||Δp_k||    λ_k     rank
0   4.81e-02     3.55e-01    -       2
1   1.11e-01     4.57e-01    1.000   2
2   2.55e-02     1.75e-01    0.389   2
3   1.04e-02     4.22e-02    1.000   2
4   9.16e-04     1.90e-03    1.000   2
5   8.00e-06     1.99e-05    1.000   2
6   2.32e-07     1.54e-09    1.000   2
p* = (k1*, k2*) = (0.10000001, 0.19999997)  ⇒  k2*/k1* = 1.9999994
κ = 0.008, sc(F'(p*)) = 6.3
Case (E): Gauss-Newton iteration history

k   ||F(p_k)||   ||Δp_k||    λ_k     rank
0   3.85e-02     2.96e-02    -       1
1   2.40e-03     1.85e-03    1.000   1
2   1.11e-05     7.67e-06    1.000   1
3   6.67e-06     6.75e-11    1.000   1
p** = (k1**, k2**) = (0.24155702, 0.48312906)  ⇒  k2**/k1** = 2.0000622
κ = 0.004, sc(F'(p**)) = 4.5×10^6
[Figure: fitted concentration profiles of A, B, C, D over time for the two cases (T+E) and (E).]
BioPARKIN

Integrated software environment for simulation and parameter identification: http://bioparkin.zib.de
BioPARKIN

I SBML import and export
I deterministic simulation methods (LIMEX; others will follow)
I sensitivity analysis based on variational equations
I parameter estimation via Gauss-Newton methods (tunable, possible scaling of species and parameters)
I output of information on identifiability of parameters
I time-shifting of data and concatenation of IVPs for multi-experiment simulations
Thank you for your attention

Contact: Zuse Institute Berlin, Computational Systems Biology Group
http://www.zib.de/en/numerik/csb.html