Zuse Institute Berlin (ZIB) S. R oblitz 1.Introductory Example ... I Estimates p with some con dence...

61
What is... Parameter Identification? S.R¨oblitz Zuse Institute Berlin (ZIB) December 10, 2014

Transcript of Zuse Institute Berlin (ZIB) S. R oblitz 1.Introductory Example ... I Estimates p with some con dence...

What is

Parameter Identification

S Roblitz

Zuse Institute Berlin (ZIB)

December 10 2014

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 2

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 3

Introductory Example

A + Bk1

GGGGGBFGGGGG

k2

C + D

Aprime = B prime = minusk1AB + k2CD C prime = D prime = +k1AB minus k2CD

A(0) = 2 B(0) = 1 C (0) = 05 D(0) = 0 k01 = 02 k0

2 = 05

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Case (T+E) measurement data cover

both transient and equilibrium phase

0 10 20 300

05

1

15

2

timeconcentr

ation

A B C D

Case (E) measurement data cover

equilibrium phase only

Parameter Identification Susanna Roblitz 4

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 5

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 2

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 3

Introductory Example

A + Bk1

GGGGGBFGGGGG

k2

C + D

Aprime = B prime = minusk1AB + k2CD C prime = D prime = +k1AB minus k2CD

A(0) = 2 B(0) = 1 C (0) = 05 D(0) = 0 k01 = 02 k0

2 = 05

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Case (T+E) measurement data cover

both transient and equilibrium phase

0 10 20 300

05

1

15

2

timeconcentr

ation

A B C D

Case (E) measurement data cover

equilibrium phase only

Parameter Identification Susanna Roblitz 4

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 5

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 3

Introductory Example

A + Bk1

GGGGGBFGGGGG

k2

C + D

Aprime = B prime = minusk1AB + k2CD C prime = D prime = +k1AB minus k2CD

A(0) = 2 B(0) = 1 C (0) = 05 D(0) = 0 k01 = 02 k0

2 = 05

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Case (T+E) measurement data cover

both transient and equilibrium phase

0 10 20 300

05

1

15

2

timeconcentr

ation

A B C D

Case (E) measurement data cover

equilibrium phase only

Parameter Identification Susanna Roblitz 4

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 5

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Introductory Example

A + Bk1

GGGGGBFGGGGG

k2

C + D

Aprime = B prime = minusk1AB + k2CD C prime = D prime = +k1AB minus k2CD

A(0) = 2 B(0) = 1 C (0) = 05 D(0) = 0 k01 = 02 k0

2 = 05

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Case (T+E) measurement data cover

both transient and equilibrium phase

0 10 20 300

05

1

15

2

timeconcentr

ation

A B C D

Case (E) measurement data cover

equilibrium phase only

Parameter Identification Susanna Roblitz 4

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 5

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 5

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

Model y(t p) = (y1(t p) yd(t p)) isin Rd

y prime = f (y p) y(t0) = y0

Usually nonlinear in p

Parameters p = (p1 pq) isin Rq

Data zkl asymp yk(tl p) k = 1 d l = 1 Mk

mostly m =sum

k Mk q data compression

Task fit the model (ODE) to data estimate parameters

Parameter Identification Susanna Roblitz 6

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

Questions

I What algorithm to use

I How good is the fit

I Is there a better model

I Sensitivity

Parameter Identification Susanna Roblitz 7

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

minimize least squares error

F (p)22 =

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

rarr min

σkl measurement error (standard deviation)

0 5 10 15minus2

0

2

4

6

8

10

12

t

y

measurement values

model function

residues

Parameter Identification Susanna Roblitz 8

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Approaches

I FrequentistI Assumes that there is an unknown but fixed parameter pI Estimates p with some confidenceI Prediction by using the estimated parameter value

I BayesianI Represents uncertainty about the unknown parameterI Uses probability to quantify this uncertainty unknown

parameters as random variablesI Prediction follows rules of probability

Parameter Identification Susanna Roblitz 9

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 10

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Likelihood

assumption normally distributed measurement noise

P(z |p) prop exp

(minus

dsumk=1

Mksuml=1

(zkl minus yk(tl p))2

2σ2kl

)

maximum likelihood estimate (MLE)

pMLE = argmaxpP(z |p)

Parameter Identification Susanna Roblitz 11

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Bayes theorem

Joint probability = Conditional Probability x MarginalProbability

P(p z) = P(p|z)P(z) = P(z |p)P(p)

P(p|z) =P(z |p)P(p)

P(z)

P(p) priorP(z |p) likelihoodP(p|z) posterior (probability of parameter given the data)P(z) =

intP(z |p)P(p)dp evidence

maximum a posteriori estimate (MAP)

pMAP = argmaxpP(p|z)

Parameter Identification Susanna Roblitz 12

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Evaluation of posterior

P(p|z) prop P(z |p)P(p)

I evaluation of posterior by Markov chain Monte Carlo(MCMC) sampling (Metropolis-Hastings algorithm)rarr convergence

I consider full high dimensional posterior PDF for statisticalinference

I necessity of a prior

Parameter Identification Susanna Roblitz 13

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization ApproachI Linear Least Squares ProblemsI Nonlinear Least Squares ProblemsI Specification for ODE Models

Parameter Identification Susanna Roblitz 14

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Linear least squares

y(t p) = B(t)p B(t) isin Rdtimesq

Measurement time points t1 tM isin RMeasurement values z1 zM isin Rd

Set A(t) = [B(t1) B(tM)]T z = [z1 zM ]T

Linear least-squares problemFor given z isin Rm and A isin Rmtimesq with m ge q find p isin Rq suchthat

z minus Ap2 rarr min

Parameter Identification Susanna Roblitz 15

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Example Reaction kinetics

I Arrhenius law dependence ofrate constant K ontemperature T

K (T ) = C middot exp

(minus E

RT

)I gas constant

R = 83145 JmolmiddotK

I unknown parameterspre-exponential factor Cactivation energy E

I measurements(Ti Ki) i = 1 m = 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

TK

Parameter Identification Susanna Roblitz 16

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Example Reaction kinetics

minimization problem

log C minus 1

RTiE = log Ki rArr log K minus A

(log C

E

)2 = min

Ai1 = 1 Ai2 = minus 1

RTi i = 1 21

720 740 760 780 800 8200

02

04

06

08

1

12x 10minus3

T

K

720 740 760 780 800 820minus12

minus11

minus10

minus9

minus8

minus7

minus6

T

log

K

Parameter Identification Susanna Roblitz 17

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

The method of normal equations

Apθ

(A)Apminuszz

R

z minus Ap2 = minhArr z minus Ap isin R(A)perp

R(A) = Ap p isin Rq (range space of A)

z minus Ap2 = minpprimeisinRq z minus Apprime2 lArrrArr ATAp = AT z

uniquely solvablehArr ATA invertiblehArr rank(A) = q

rArr solution via Cholesky decomposition (ATA is spd)

if rank(A) = q κ2(A)2 without proof= λmax(ATA)

λmin(ATA)= κ2(ATA)

We need a more stable method for solving the normal eq

Parameter Identification Susanna Roblitz 18

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Orthogonalization methods

Let Q isin Rmtimesm be an orthogonal matrix ieQTQ = I

z minus Ap22 = (z minus Ap)TQTQ(z minus Ap) = Q(z minus Ap)2

2

The Eucledian norm middot 2 is invariant under orthogonaltransformation

z minus Ap2 = min lArrrArr Q(z minus Ap)2 = min

Choice of the orthogonal transformation Q isin Rmtimesm

Parameter Identification Susanna Roblitz 19

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

QR factorization

QR factorization =A

R

0

Q Q1 2

1

A = Q

(R1

0

) Q isin Rmtimesm orthogonal R1 isin Rqtimesq upper triangular

Solution of least-squares problem MATLAB [QR]=qr(A)

z minus Ap22 = QT (z minus Ap)2

2 =

∥∥∥∥(z1 minus R1pz2

)∥∥∥∥2

2

= z1 minus R1p22 + z22 ge z22

2

rank(A) = q rArr rank(R) = rank(A) = q rArr R1 is invertible

z minus Ap2 = minhArr p = Rminus11 z1 rArr r2 = z22

The QR-factorization is stable since κ2(A) = κ2(R)

Parameter Identification Susanna Roblitz 20

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Householder QR

Columnwise application of Householder matrices to A isin Rmtimesq

QTA = (HqHqminus1 middot middot middotH2H1)A = R

For k = 1 q we obtain Hk = diag(Ikminus1Hprimek) isin Rmtimesm

A(k) = Hk

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

lowast lowast middot middot middot lowast

=

lowast middot middot middot middot middot middot middot middot middot middot middot middot middot middot middot lowast

lowast middot middot middot middot middot middot middot middot middot lowast

lowast lowast middot middot middot lowast

0

0 lowast middot middot middot lowast

Only the entries lowast and lowast are modified

Parameter Identification Susanna Roblitz 21

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

QR with column pivoting

BusingerGolub Numer Math 7 (1965)

column permutation with the aim

|r11| ge |r22| ge ge |rqq|step k

A(kminus1) =

(R S0 T (k)

)Denote the column j of T (k) by T

(k)j

Pivot strategy push the column of T (k) with maximal 2-normto the front so that after the change

|αk | = |rkk | = T (k)1 2 = max

jT (k)

j 2 = T (k)

Finally AΠ = QR with permutation matrix Π

Parameter Identification Susanna Roblitz 22

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Rank decision

rank(A) = n lt q

QTAΠexact=

(R S0 0

)eps=

(R S0 T (n+1)

) T (n+1) ldquosmallrdquo

Note The exact rank n cannot be computed since byrounding errors T (n+1) = |rn+1n+1| 6= 0

Idea Stop the iteration as soon as

|rk+1k+1| lt δ|r11| δ relative precision of A

Definition numerical rank n

|rn+1n+1| lt δ|r11| le |rnn|

choice of δ =rArr decision for n

Parameter Identification Susanna Roblitz 23

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Subcondition number

DeuflhardSautter Lin Alg Appl 29 (1980)

Definition subconditon sc(A) = |r11||rqq|

Properties

I sc(A) ge 1

I sc(αA) = sc(A)

I A 6= 0 singular lArrrArr sc(A) =infinI sc(A) le κ2(A) (hence the name)

A is called almost singular if

δsc(A) ge 1 or equivalently δ|r11| ge |rqq|

Parameter Identification Susanna Roblitz 24

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Scaling

Drow Dcol diagonal matrices

We consider the scaled least squares problem

Dminus1row(ADcolD

minus1col p minus z)2 = Ap minus z2 min

where A = Dminus1rowADcol p = Dminus1

col p z = Dminus1rowz

I row (measurement) scaling Drow = diag(δz1 δzm)eg relative error concept δzj = ε|zj |

I column (parameter) scalingDcol = diag(p1scal pqscal)(to account for different physical units)

in general sc(A) 6= sc(A) and rank(A) 6= rank(A)

Parameter Identification Susanna Roblitz 25

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Generalized Inverse

(1) A isin Rqtimesq rank(A) = q (A is regular)rArr Ap = z has unique solution p = Aminus1z where Aminus1A = AAminus1 = Iq

(2) A isin Rmtimesq rank(A) = n le q lt mrArr usually Ap = z has no solution but there is aminimizer

Ap minus z2 = min

(2a) n = q Ap minus z2 = minhArr ATAp = AT z (normal eq)uniquely solvable since ATA is nonsingular

p = (ATA)minus1AT︸ ︷︷ ︸=A+

z

(2b) n lt q write p = A+z A+ generalizedpseudo inverse

Parameter Identification Susanna Roblitz 26

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Penrose axioms

Let A isin Rmtimesq rank(A) = n le q le m Then A+ is uniquelydefined by

(i) (A+A)T = A+A

(ii) (AA+)T = AA+

(iii) A+AA+ = A+

(iv) AA+A = A

One can show plowast = A+z is the shortest solution ie

plowast2 le p2 forallp isin L(z)

with

L(z) = p isin Rq z minus Ap2 = min = plowast +N (A)

Parameter Identification Susanna Roblitz 27

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Computation of shortest solution

QTAΠ =

(R S0 0

) A isin Rmtimesq

rank(A) = n R isin Rntimesn upper triangular S isin Rntimes(qminusn)

ΠTp =

(p1

p2

) p1 isin Rn p2 isin Rqminusn

QT z =

(z1

z2

) z1 isin Rn z2 isin Rmminusn

rArr zminusAp22 = QT zminusQTAp2

2 = z1minusRp1minusSp222 +z22

2

zminusAp22 = minhArr z1minusRp1minusSp2 = 0hArr p1 = Rminus1(z1 minus Sp2)

Parameter Identification Susanna Roblitz 28

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Computation of shortest solution

V = Rminus1S u = Rminus1z1

p2 = ΠTp2 = p12 + p22 = u minus Vp22 + p22

= u2 minus 2〈uVp2〉+ 〈Vp2Vp2〉+ 〈p2 p2〉= u2 + 〈p2 (I + V TV )p2 minus 2V Tu〉= φ(p2)

φ(p2) = min if φprime(p2) = 2(I + V TV )p2 minus 2V Tu = 0

hArr (I + V TV )p2 = V Tu (solution by Cholesky decomposition)

φprimeprime(p2) = 2(I + V TV ) spd rArr p2 is minimum

Note if n lt q2 a faster algorithm exists

Parameter Identification Susanna Roblitz 29

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 30

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

Data(ti zi) ti isin R zi isin Rd i = 1 M

Parameters p = (p1 pq)T

Model

y(t p) Rtimes Rq 7rarr Rd nonlinear wrt p

Measurement tolerances

δzi isin Rd Di = diag(δzi)

Define

F (p) =

Dminus11 (z1 minus y(t1 p))

Dminus1

M (zM minus y(tM p))

isin Rm m le M middot d

Parameter Identification Susanna Roblitz 31

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

Nonlinear least-squares problem

Msumi=1

Dminus1i (zi minus y(ti p))2

2 = F (p)TF (p) = F (p)22 = min

Nonlinear least-squares problemGiven a mapping F D sube Rq rarr Rm with m gt q find asolution plowast isin D such that

F (plowast)22 = min

pisinDF (p)2

2

The nonlinear least-squares problem is called compatible if

F (plowast) = 0

Parameter Identification Susanna Roblitz 32

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Supplement Multivariable calculus

Consider p = (p1 pq)T isin Rq and the scalar functiong(p) Rq rarr RThe derivative of g is the gradient (a column vector)

g prime(p) = nablag(p) =

(partg

partp1 middot middot middot partg

partpq

)T

Consider the vector F (p) = (F1(p) Fm(p))T Rq rarr RmIts derivative is the Jacobian matrix

F prime(p) =

partpartp1

F1(p) middot middot middot partpartpq

F1(p)

partpartp1

Fm(p) middot middot middot partpartpq

Fm(p)

isin Rmtimesq

Parameter Identification Susanna Roblitz 33

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

Minimization problem

g(p) = F (p)22 = min

sufficient condition on plowast to be a local minimum

g prime(plowast) = 2F prime(plowast)TF (plowast) = 0 and g primeprime(plowast) positive definite

Find by Newton iteration a zero of the function G Rq rarr Rq

G (p) = F prime(p)TF (p)

(system of q nonlinear equations)

Parameter Identification Susanna Roblitz 34

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Newton iteration

Idea approximation of G by a ldquosimplerrdquo function namely itsTaylor series expansion around a starting guess p0

G (p) = G (p0) + G prime(p0)(p minus p0) + O(p minus p0) for p rarr p0

= G (p0) + G prime(p0)(p minus p0) = G (p)

Zero p1 of the linear substitute map G

G prime(p0)∆p0 = minusG (p0) p1 = p0 + ∆p0

Newton iteration

G prime(pk)∆pk = minusG (pk) pk+1 = pk + ∆pk

system of nonlinear equations =rArr sequence of linear systems

Parameter Identification Susanna Roblitz 35

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Gauss-Newton iteration

G prime(p) = F prime(p)TF prime(p) + F primeprime(p)TF (p)

iteration in terms of F (p)(F prime(pk)TF prime(pk) + F primeprime(pk)TF (pk)

)∆pk = minusF prime(pk)TF (pk)

I compatible caseF (plowast) = 0 rArr G prime(plowast) = F prime(plowast)TF prime(plowast)

I almost compatible case F (plowast) is ldquosmallrdquo rArr save effortfor evaluating F primeprime(p)

rArr Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

Parameter Identification Susanna Roblitz 36

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Gauss-Newton iteration

F prime(pk)TF prime(pk)∆pk = minusF prime(pk)TF (pk) pk+1 = pk + ∆pk

corresponds to normal equation for the linear least squaresproblem

F prime(pk)∆pk + F (pk)2 = min

formal solution

∆pk = minusF prime(pk)+F (pk)

in case of convergence F prime(plowast)+F (plowast) = 0

if rank(F prime(p)) = q rArr F prime(p)+ =(F prime(p)TF prime(p)

)minus1F prime(p)T

Parameter Identification Susanna Roblitz 37

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Classification

I local GN methods require sufficiently good starting guessp0

I global GN methods can handle bad guesses as well

I m = q guaranteed local convergence

I m gt q convergence only for a small subclass of nonlinearLSQ problems so-called adequate problems

Parameter Identification Susanna Roblitz 38

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Assumptions

We want to show that the Gauss-Newton method convergeslocally This can only be expected under some assumptions onthe function F (p) and the starting guess p0

I F D rarr Rm continuously differentiable D sube Rq openand convex

I F prime(p) possibly rank-deficient Jacobian

I F prime satisfies a particular Lipschitz condition (ie F prime

behaves ldquonicelyrdquo) with Lipschitz constant 0 le ω ltinfin

F prime(p)+(F prime(p)minus F prime(p))(p minus p)2 le ωp minus p22 (i)

for all collinear p p p isin D (ie they lie on a singlestraight line) and p minus p isin R(F prime(p)+)

Parameter Identification Susanna Roblitz 39

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Assumptions

I compatibility condition (model and data must somehowfit each other)

F prime(p)+(I minus F prime(p)F prime(p)+)F (p) le κ(p)p minus p forallp p isin D

κ(p) le κ lt 1 forallp isin D (ii)

I the initial update ∆p0 must be small enough and a ballBρ(p0) with radius ρ around p0 is contained in D

∆p0 le α and Bρ(p0) sub D (iii)

h = αω lt 2(1minus κ) ρ = α(1minus κminus h2) (iv)

Parameter Identification Susanna Roblitz 40

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Convergence result

Under the above assumptions the sequence pk ofGauss-Newton iterates is well-defined remains in Bρ(p0) andconverges to some plowast isin Bρ(p0) with

F prime(plowast)+F (plowast) = 0

The convergence rate can be estimated according to

pk+1minus pk2 le1

2ωpk minus pkminus12

2 +κ(pkminus1)pk minus pkminus12 (lowast)

This result shows existence of a solution plowast in the sense thatF prime(plowast)+F (plowast) = 0 even in case of rank deficient Jacobian(rank may vary throughout D as long as ω ltinfin and κ lt 1)

Uniqueness =rArr requires full rank Jacobian

Parameter Identification Susanna Roblitz 41

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Compatibility

I linear least squares problems F (p) = Ap minus z

F prime(p) = (Ap minus z)prime = A = (Ap minus z)prime = F prime(p)(i)

=rArr ω = 0

F prime(p)+(I minus F prime(p)F prime(p)+) = F prime(p)+(I minus F prime(p)F prime(p)+) = 0(ii)

=rArr κ(p) = κ = 0(lowast)rArr p2minusp1 le 0rArr ∆p2 = 0rArr p1 = plowast

convergence within one iterationI systems of nonlinear equations (m = q)

if F prime(p) nonsingular rArr F prime(p)+ = F prime(p)minus1

rArr I minus F prime(p)F prime(p)+ = 0(lowast)rArr κ(p) = 0

quadratic convergenceI compatible problems F (plowast) = 0rArr κ(plowast) = 0

Parameter Identification Susanna Roblitz 42

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Incompatibility factor

General nonlinear least-squares problems (m gt q)

limkrarrinfin

pk+1 minus pkpk minus pkminus1

(lowast)le lim

krarrinfin

(ω2pk minus pkminus1+ κ(pkminus1)

)= κ(plowast)

κ(plowast) is the asymptotic convergence rate

Adequate nonlinear least-squares problems arecharacterized by the condition

κ(plowast) lt 1

SummaryThe Gauss-Newton method converges locally linearly foradequate and locally quadratically for compatible nonlinearleast-squares problems

Parameter Identification Susanna Roblitz 43

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Convergence monitorTermination criteria

Define

∆pk+1

= minusF prime(pk)+F (pk+1) (simplified Gauss-Newton corrections)

Stop iteration as soon as linear convergence dominates theiteration process

∆pk+1 asymp ∆pk+1 le XTOL

For incompatible problems in the convergent phase it holds

∆pk+1 asymp κ∆pk ∆pk+1 asymp κ2∆p

kThis provides an estimate for κ(plowast)

κ(plowast) asymp ∆pk+1∆pk

Parameter Identification Susanna Roblitz 44

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Convergence monitor

0 2 4 6 8 10 1210

minus15

10minus10

10minus5

100

k

ordinary GN correctionssimplified GN corrections

Typical scissor between ordinary and simplified Gauss-Newtoncorrections in the linear convergence phase

Parameter Identification Susanna Roblitz 45

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Summary

Whenever the ordinary Gauss-Newton method with full rankJacobian fails to converge then one should rather improve themodel (change equations or initial guess p0) than just turn toa different iterative solver In the convergent butrank-deficient case one should try to get more information byacquiring better data

Parameter Identification Susanna Roblitz 46

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Identifiability

interpretation of result in case of rank deficient Jacobian

Let plowast be the final iterate and rank(F prime(plowast)) = n lt q

QR factorization with column pivoting and rank decision

QTF prime(plowast)Π = R

rArr reordering of parameters to (plowastπ1 plowastπ2

plowastπn plowastπq) in

such a way that

|r11| ge |r22| ge middot middot middot ge |rnn|

rarr parameters (plowastπ1 plowastπ2

plowastπn) can actually be estimatedfrom the current dataset

Parameter Identification Susanna Roblitz 47

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Parameter transformations

p constrained parametersu unconstrained parametersφ transformation function

p = φ(u) F (p) = F (φ(u)) = F (u) F prime(u) = F prime(p)φprime(u)

constraint transformation φ(u)p gt 0 p = exp(u)A le p le B p = A + BminusA

2(1 + sin u)

p le C p = C +(1minusradic

1 + u2)

p ge C p = C minus(1minusradic

1 + u2)

Parameter Identification Susanna Roblitz 48

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Levenberg-Marquardt method

Idea local linearization only reasonable in a smallneighborhood of pk

F (pk) + F prime(pk)∆pk22 + ρ∆pk2

2 = min

(F prime(pk)TF prime(pk) + ρIq)∆pk = minusF prime(pk)TF (pk)

I ρrarr 0 Levenberg-Marquardt rarr Gauss-Newton

I ρ gt 0 (F prime(pk)TF prime(pk) + ρIq) always regular rarr maskingof uniqueness of a given solution incorrect interpretationof numerical results solutions are often not minima ofg(p) but merely saddle points with g prime(p) = 0

Parameter Identification Susanna Roblitz 49

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Scaling

I row (measurement) scaling

F (p) = Dminus1rowF (p) Drow = diag(δz1 δzM)

F prime(pk)∆pk+F (pk) = minhArr Dminus1row(F prime(pk)∆pk+F (pk)) = min

Different scaling leads to a different solutionI column (parameter) scaling

p = Dcolp H(p) = H(Dminus1col p) = F (p) H prime(p) = F prime(p)Dcol

assume pk = Dcolpk

∆pk = minusH prime(pk)+H(pk) = minus(F prime(pk)Dcol)+F (pk)

F prime(pk ) full rank= minusDminus1

col F prime(pk)+F (pk) = Dminus1col ∆pk

No invariance in the rank deficient case

Parameter Identification Susanna Roblitz 50

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Globalization

Global (or damped) Gauss-Newton method

F prime(pk)∆pk = minusF (pk) pk+1 = pk + λk∆pk 0 lt λk le 1

How to choose λk

Remember we want to solve F prime(p)+F (p) = 0

Idea choose λk in such a way that F prime(pk)+F (pk) decaysalong the iterates (theory of level function Gauss-Newton path)

damping factors λk are determined via a predictor-correctorstrategy such that they satisfy the natural monotonicity test

∆pk+1

(λk)∆pk lt 1

divergence criterion λk lt λmin 1

Parameter Identification Susanna Roblitz 51

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Codes

I NLSQ-ERR ERRor-oriented Newton method forNonlinear Least-SQuares

I NLSCON solver for Nonlinear Least-Squares withCONstraints

Parameter Identification Susanna Roblitz 52

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Outline

1 Introductory Example

2 Problem Formulation

3 The Bayesian Approach

4 The Local Optimization Approach

41 Linear Least Squares Problems42 Nonlinear Least Squares Problems43 Specification for ODE Models

Parameter Identification Susanna Roblitz 53

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Problem formulation

d species q (unknown) parameters

model y prime = f (y p) y(0) = y0 with y isin Rd p isin Rq

set of experimental data(t1 z1 δz1) (tM zM δzM) tj isin R zj isin Rd

with measurement tolerances δzj isin Rd Di = diag(δzi)

F (p) =

Dminus11 (y(t1 p)minus z1)

Dminus1

M (y(tM p)minus zM)

F prime(p) =

Dminus11 yp(t1 p)

Dminus1

M yp(tM p)

Gauss-Newton method

∆pk = minusF prime(pk)+F (pk) pk+1 = pk + λk∆pk

Parameter Identification Susanna Roblitz 54

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Computation of F (p)

requires solution of y prime = f (y p) at time points t1 tM

Problem time points chosen by adaptive step-size controlgenerally do not agree with t1 tM

Solution use integrator with dense output

Idea construction of interpolation polynomial based on theavailable solution such that accuracy is not destroyed

Example LIMEX has a dense output based on Hermiteinterpolation

Parameter Identification Susanna Roblitz 55

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Sensitivity analysis

I sensitivity matrix at time t

S(t) =party

partp(t) = yp(t) isin Rdtimesq

F prime(p) =

Dminus11 S(t1)

Dminus1

M S(tM)

I scaled sensitivity matrix (necessary whenever species or

parameters cover a broad range of physical units)

Sij(t) =partyipartpj

(t) middot pscal

yscal

pscal yscal user specified scaling values

Parameter Identification Susanna Roblitz 56

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Computation of sensitivities

Differentiate y prime = f (y p) with respect to p to obtain thevariational equation

y primep = fy (y p) middot yp + fp(y p) yp(0) = 0

Compact notation

S prime = fy middot S + fp S = [s1 sq] sk isin Rd

In practice solve combined systemy prime

s prime1

s primeq

=

f

fy middot s1 + fp1

fy middot sq + fpq

isin R(q+1)d

can be solved efficiently by LIMEX (inexact Jacobian decoupling)

Parameter Identification Susanna Roblitz 57

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Example

k F (pk ) ∆pk λk rank

0 481e-02 355e-01 21 111e-01 457e-01 1000 2

255e-02 175e-01 03892 104e-02 422e-02 1000 23 916e-04 190e-03 1000 24 800e-06 199e-05 1000 25 2321e-07 154e-09 1000 2

plowast = (klowast1 klowast2 ) = (010000001 019999997) rArr klowast21 = 19999994

κ = 0008 sc(F prime(plowast)) = 63

k F (pk ) ∆pk λk rank

0 385e-02 296e-02 11 240e-03 185e-03 1000 12 111e-05 767e-06 1000 13 667e-06 675e-11 1000 1

plowastlowast = (klowastlowast1 klowastlowast2 ) = (024155702 048312906) rArr klowastlowast21 = 20000622

κ = 0004 sc(F prime(plowast)) = 45times 106

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

0 10 20 300

05

1

15

2

time

concentr

ation

A B C D

Parameter Identification Susanna Roblitz 58

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

BioPARKIN

integrated software environment for simulation and parameteridentification httpbioparkinzibde

Parameter Identification Susanna Roblitz 59

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

BioPARKIN

I SBML import and export

I deterministic simulation methods (LIMEX others willfollow)

I sensitivity analysis based on variational equations

I parameter estimation via Gauss-Newton methods(tunable possible scaling of species and parameters)

I output of information on identifiability of parameters

I time-shifting of data and concatination of IVPs formulti-experiment simulations

Parameter Identification Susanna Roblitz 60

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61

Thank you for your attention

ContactZuse Institute Berlin

Computational Systems Biology Grouphttpwwwzibdeennumerikcsbhtml

Parameter Identification Susanna Roblitz 61