
Hessian Regularisation: Krotov-BFGS & GRAPE-Newton-Raphson Methods

David Goodwin, Sean Wright, Ilya Kuprov.

System 3

[Figure: C, F, H molecular fragment with couplings of +140 Hz and −160 Hz.]

Interaction parameters of a molecular group used in state transfer simulations on a system characterising the fragment of a fluorohydrocarbon molecule (magnetic induction of 9.4 Tesla).

Introduction

The evolution of a quantum system is characterised by the quantum Liouville equation in a vector representation of the density operator,
$$\frac{\partial}{\partial t}\,|\rho(t)\rangle = -i\,\hat{\hat{L}}(t)\,|\rho(t)\rangle,$$
with the general solution
$$|\rho(t)\rangle = \exp_{(\mathrm{o})}\!\left[-i\int_0^{t}\hat{\hat{L}}(t')\,dt'\right]|\rho(0)\rangle$$
where $\exp_{(\mathrm{o})}$ denotes the time-ordered exponential.

The Hamiltonian of a practical NMR quantum system can be split into two parts:
$$\hat{H} = \hat{H}_d + \sum_k c_n^{(k)}\,\hat{H}_k$$
where $\hat{H}_d$ is the drift or free Hamiltonian, and $\hat{H}_k$ are the parts of the Hamiltonian that can be changed externally, called the control Hamiltonians.

We assume that the $c_n^{(k)}$ are piecewise-constant, "slicing" the problem into discrete time intervals $\Delta t$. The density matrix can be calculated at a time $T$ using the propagators $\hat{U}_n$:
$$\rho(T) = \hat{U}_N\cdots\hat{U}_1\,\rho_0\,\hat{U}_1^{\dagger}\cdots\hat{U}_N^{\dagger},\qquad \hat{U}_n = \exp\!\left[-i\,\Delta t\left(\hat{H}_d + \sum_k c_n^{(k)}\hat{H}_k\right)\right]$$

The state of a magnetic resonance system can be controlled using a set of electromagnetic pulses. Design of these pulses may prove difficult for control of complicated systems; numerical optimisation methods can be used to find a maximum "overlap" between the desired state and the state produced by the set of pulses, with the pulse schedule being the parameter of the objective function in the optimisation problem. Here, the metric of "overlap" is the fidelity.

This work attempts to reproduce the Krotov-BFGS optimisation method (Tannor 1992; Eitan 2011) and to develop a scalable Newton-Raphson optimisation method for the GRAPE method (de Fouquieres 2011) in the software toolbox Spinach (Hogben 2011).

The work presented here finds that both of these methods need regularisation of the Hessian matrix (the matrix of second-order derivatives) for optimal control of most systems.

System 1

[Figure: chain of 11 hydrogen atoms.]

11 hydrogen atoms in an initial state of a mixture of $\hat{L}_+$ and $\hat{L}_-$; the target state has all spins along $\hat{L}_z$.

When $\lambda \geq 5\times10^{-4}$, Krotov-BFGS with RFO regularisation outperformed the standard Krotov-Tannor algorithm, reaching convergence in a smaller number of iterations. This is also true regardless of the pulse duration.

The $\lambda = 10^{-4}$ plot indicates that the algorithm failed to converge and should be investigated further in order to improve the stability of the algorithm.

System 2

[Figure: single hydrogen atom.]

Rotation of magnetisation on a hydrogen atom.

Gradient Calculation

Optimisation problems using quasi-Newton methods (e.g. BFGS) require a gradient calculation. This reduces to:
I Propagate forwards from the source (current state $\rho_0$)
I Propagate backwards from the target state $\sigma$
I Compute the expectation of the derivative:
$$\left\langle\sigma\left|\,\hat{U}_N\hat{U}_{N-1}\cdots\frac{\partial}{\partial c_{n=t}^{(k)}}\hat{U}_{n=t}\cdots\hat{U}_2\hat{U}_1\,\right|\rho_0\right\rangle$$
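The forward-backward scheme above can be sketched numerically. The following is a minimal illustration (not Spinach code; the function and variable names are our own) for a single control channel, using `scipy.linalg.expm_frechet` for the propagator derivative:

```python
import numpy as np
from scipy.linalg import expm_frechet

def grape_gradient(Hd, Hk, c, dt, rho0, sigma):
    """Overlap <sigma|U_N ... U_1|rho0> and its gradient with respect to the
    piecewise-constant amplitudes c[n] of a single control channel.

    Hd is the drift generator and Hk the control generator (in Liouville
    space these would be the drift and control Liouvillians); rho0 and
    sigma are the source and target states as vectors.
    """
    N = len(c)
    # Slice propagators U_n and their derivatives dU_n/dc[n]: the Frechet
    # derivative of the matrix exponential in the control direction.
    U, dU = [], []
    for n in range(N):
        Un, dUn = expm_frechet(-1j * dt * (Hd + c[n] * Hk), -1j * dt * Hk)
        U.append(Un)
        dU.append(dUn)
    # Forward propagation from the source: fwd[n] = U_n ... U_1 |rho0>
    fwd = [rho0]
    for n in range(N):
        fwd.append(U[n] @ fwd[-1])
    # Backward propagation from the target (0-indexed):
    # bwd[n] = U_n^dagger ... U_{N-1}^dagger |sigma>
    bwd = [sigma]
    for n in reversed(range(N)):
        bwd.append(U[n].conj().T @ bwd[-1])
    bwd = bwd[::-1]
    # Overlap and its gradient: grad[n] = <bwd[n+1]| dU_n |fwd[n]>
    f = np.vdot(sigma, fwd[-1])
    grad = np.array([np.vdot(bwd[n + 1], dU[n] @ fwd[n]) for n in range(N)])
    return f, grad
```

Storing the forward and backward states means each gradient element costs one matrix-vector product, which is the linear-scaling trick the poster relies on.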

Expectation of Second Order Derivatives

The Newton-Raphson method is a second-order method, additionally requiring the explicit calculation of the Hessian matrix. This requires the expectations of the second-order derivatives:
$$\left\langle\sigma\left|\,\hat{U}_N\hat{U}_{N-1}\cdots\frac{\partial^2}{\partial {c_{n=t}^{(k)}}^{2}}\hat{U}_{n=t}\cdots\hat{U}_2\hat{U}_1\,\right|\rho_0\right\rangle$$
$$\left\langle\sigma\left|\,\hat{U}_N\hat{U}_{N-1}\cdots\frac{\partial}{\partial c_{n=t+1}^{(k)}}\hat{U}_{n=t+1}\frac{\partial}{\partial c_{n=t}^{(k)}}\hat{U}_{n=t}\cdots\hat{U}_2\hat{U}_1\,\right|\rho_0\right\rangle$$
$$\left\langle\sigma\left|\,\hat{U}_N\hat{U}_{N-1}\cdots\frac{\partial}{\partial c_{n=t_2}^{(k)}}\hat{U}_{n=t_2}\cdots\frac{\partial}{\partial c_{n=t_1}^{(k)}}\hat{U}_{n=t_1}\cdots\hat{U}_2\hat{U}_1\,\right|\rho_0\right\rangle$$
I Computation scales as $O(n)$ by storing propagators from the gradient calculation.
I The problem now reduces to finding $n$ second-order derivatives on the block diagonal of the Hessian.

Hessian Element Calculations: Augmented Exponentials

I Based on Sophie Schirmer's method (unpublished) for calculating gradients via an augmented exponential (Van Loan 1978).

$$\exp\begin{pmatrix} -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}^{(k_1)}_{n_1}\Delta t & 0\\[2pt] 0 & -i\hat{\hat{L}}\Delta t & -i\hat{\hat{H}}^{(k_2)}_{n_2}\Delta t\\[2pt] 0 & 0 & -i\hat{\hat{L}}\Delta t \end{pmatrix} = \begin{pmatrix} e^{-i\hat{\hat{L}}\Delta t} & \dfrac{\partial}{\partial c^{(k_1)}_{n_1}}e^{-i\hat{\hat{L}}\Delta t} & \dfrac{1}{2}\dfrac{\partial^2}{\partial c^{(k_1)}_{n_1}\partial c^{(k_2)}_{n_2}}e^{-i\hat{\hat{L}}\Delta t}\\[2pt] 0 & e^{-i\hat{\hat{L}}\Delta t} & \dfrac{\partial}{\partial c^{(k_2)}_{n_2}}e^{-i\hat{\hat{L}}\Delta t}\\[2pt] 0 & 0 & e^{-i\hat{\hat{L}}\Delta t} \end{pmatrix}$$
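The block-matrix identity above is easy to check numerically. A minimal sketch (our own names; one call to `scipy.linalg.expm` on the augmented matrix yields the propagator, a first derivative, and the top-right second-order block):

```python
import numpy as np
from scipy.linalg import expm

def van_loan_blocks(L, H1, H2, dt):
    """Augmented-exponential (Van Loan 1978) evaluation of propagator derivatives.

    Builds exp of the 3x3 block matrix with A = -i*L*dt on the diagonal and
    B_j = -i*H_j*dt on the superdiagonal. Returns the propagator U, the first
    derivative dU/dc1, and the top-right block, which for H1 == H2 equals
    (1/2) d^2U/dc^2.
    """
    d = L.shape[0]
    A = -1j * dt * L
    B1 = -1j * dt * H1
    B2 = -1j * dt * H2
    Z = np.zeros((d, d), dtype=complex)
    M = np.block([[A, B1, Z],
                  [Z, A, B2],
                  [Z, Z, A]])
    E = expm(M)                 # one exponential of the augmented matrix
    U = E[:d, :d]               # propagator exp(A)
    dU1 = E[:d, d:2 * d]        # first derivative with respect to c1
    d2U = E[:d, 2 * d:]         # second-order (top-right) block
    return U, dU1, d2U
```

This is how the $n$ diagonal Hessian blocks can be obtained at the cost of one (larger) matrix exponential per time slice.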

Krotov-BFGS

GRAPE optimises the cost functional (fidelity) with respect to the control pulse sequence. The Krotov approach (Tannor 1992) is to find an iterative procedure that guarantees improvement of the cost functional at every step. This is achieved by solving the Euler-Lagrange equations:
$$J = \langle\psi(T)|\sigma\rangle\langle\sigma|\psi(T)\rangle - \mathrm{Re}\int_0^T \left\langle \chi \,\middle|\, \frac{d}{dt} + i\hat{H}_0 + i\sum_k c^{(k)}(t)\hat{H}_k \,\middle|\, \psi \right\rangle dt - \lambda \sum_k \int dt\, \bigl\| c^{(k)}(t) \bigr\|^2$$
$$\frac{d}{dt}|\psi_n\rangle = -i\left(\hat{H}_0 + \sum_k c_n^{(k)}(t)\hat{H}_k\right)|\psi_n\rangle, \qquad c_n^{(k)}(t) = c_{n-1}^{(k)}(t) + \frac{1}{\lambda}\,\mathrm{Im}\left\langle\chi_{n-1}\middle|\hat{H}_k\middle|\psi_n\right\rangle$$
$$\frac{d}{dt}|\chi_n\rangle = -i\left(\hat{H}_0 + \sum_k \tilde{c}_n^{(k)}(t)\hat{H}_k\right)|\chi_n\rangle, \qquad \tilde{c}_n^{(k)}(t) = \tilde{c}_{n-1}^{(k)}(t) + \frac{1}{\lambda}\,\mathrm{Im}\left\langle\chi_n\middle|\hat{H}_k\middle|\psi_n\right\rangle$$
where $|\psi_n(0)\rangle \equiv |\text{initial state}\rangle$ and $|\chi(T)\rangle \equiv |\text{target state}\rangle\langle\text{target state}|\psi(T)\rangle$.
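The update equations above can be sketched for piecewise-constant controls. This is a minimal single-control illustration of a plain Krotov sweep: the costate is propagated with the previous iteration's controls (a simplification of the coupled $c$/$\tilde{c}$ scheme), the $1/\lambda$ step follows the poster, and the BFGS acceleration is omitted; all names and the test system are our own.

```python
import numpy as np
from scipy.linalg import expm

def krotov_sweep(H0, Hk, c, dt, psi0, target, lam):
    """One Krotov iteration for a single piecewise-constant control.

    Backward-propagates the costate chi from |chi(T)> = |sigma><sigma|psi(T)>
    under the old controls, then sweeps forward, updating each slice as
    c_n -> c_n + (1/lambda) Im<chi_n|Hk|psi_n> before propagating through it.
    """
    N = len(c)
    # Forward pass with the old controls to obtain psi(T), hence chi(T)
    psi = psi0.copy()
    for n in range(N):
        psi = expm(-1j * dt * (H0 + c[n] * Hk)) @ psi
    chi = target * np.vdot(target, psi)
    # Backward pass: chis[n] is the costate at the start of slice n
    chis = [chi]
    for n in reversed(range(N)):
        chis.append(expm(-1j * dt * (H0 + c[n] * Hk)).conj().T @ chis[-1])
    chis = chis[::-1]
    # Forward sweep with the immediate (sequential) control update
    new_c = np.empty(N)
    psi = psi0.copy()
    for n in range(N):
        new_c[n] = c[n] + (1.0 / lam) * np.vdot(chis[n], Hk @ psi).imag
        psi = expm(-1j * dt * (H0 + new_c[n] * Hk)) @ psi
    fidelity = abs(np.vdot(target, psi)) ** 2
    return new_c, fidelity
```

The sequential update, in which each slice is corrected before the state is propagated through it, is what distinguishes Krotov-type methods from concurrent-update GRAPE.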

I de Fouquieres, Schirmer, Glaser, Kuprov; J. Mag. Res., 212, 412-417, (2011).

I Hogben, Krzystyniak, Charnock, Hore, Kuprov; J. Mag. Res., 208, 179-194, (2011).

I Van Loan; IEEE Trans., 23(3), 395–404, (1978).

I Tannor, Kazakov, Orlov; Time-Dependent Quantum Molecular Dynamics, Plenum, 347-360, (1992).

I Eitan, Mundt, Tannor; Phys. Rev. A, 83, 053426 (2011).

The Hessian Matrix

The Hessian matrix must be:
I Symmetric,
I Non-singular (i.e. invertible),
I Diagonally dominant.

We must regularise the Hessian matrix so that it is non-singular and well conditioned.

EIG-1

Take the eigendecomposition of the Hessian matrix:
$$H = Q\Lambda Q^{\dagger}$$
where $\Lambda$ is a diagonal matrix of eigenvalues and $Q$ is a matrix whose columns are the corresponding eigenvectors. Then
$$\lambda_{\min} = \max\left[0,\,-\min(\Lambda)\right],\qquad H_{\mathrm{reg}} = Q(\Lambda + \lambda_{\min}\hat{I}\,)Q^{\dagger}$$

TRM

Similar to the eigendecomposition method above, except we introduce a constant $\delta$: the radius of the region we trust to give a Hessian that is sufficiently positive definite.
$$H_{\mathrm{reg}} = Q(\Lambda + \delta\lambda_{\min}\hat{I}\,)Q^{\dagger}$$
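Both EIG-1 and the TRM variant reduce to a uniform eigenvalue shift. A minimal sketch (the function name and the `delta` default are ours):

```python
import numpy as np

def regularise_eig(H, delta=1.0):
    """Eigenvalue-shift regularisation of a symmetric Hessian.

    delta = 1 reproduces EIG-1 (the most negative eigenvalue is lifted to
    zero, making H positive semidefinite); delta > 1 gives the TRM-style
    shift delta*lambda_min, leaving the result strictly positive definite
    whenever the original Hessian had a negative eigenvalue.
    """
    evals, Q = np.linalg.eigh(H)          # H = Q diag(evals) Q^T
    lam_min = max(0.0, -evals.min())      # size of the most negative eigenvalue
    return Q @ np.diag(evals + delta * lam_min) @ Q.T
```

If the Hessian is already positive semidefinite, the shift is zero and the matrix is returned unchanged.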

RFO

This is the same as the TRM method, except we take the eigendecomposition of an augmented Hessian:
$$H_{\mathrm{aug}} = \begin{bmatrix}\delta^2 H & \delta\vec{g}\\ \delta\vec{g}^{\,\dagger} & 0\end{bmatrix} = Q\Lambda Q^{\dagger}$$
where $\vec{g}$ is the gradient (the vector of first-order derivatives).
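In practice the lowest eigenpair of $H_{\mathrm{aug}}$ yields a level-shifted Newton step. The poster only specifies the augmented eigendecomposition, so the following step construction is a standard RFO sketch rather than the poster's exact recipe (names are ours):

```python
import numpy as np

def rfo_step(H, g, delta=1.0):
    """Rational function optimisation step from the augmented Hessian.

    Builds H_aug = [[delta^2*H, delta*g], [delta*g^T, 0]] and takes its
    lowest eigenpair (lam, (v, alpha)). The step p = delta*v/alpha then
    satisfies the level-shifted Newton equation (H - (lam/delta^2) I) p = -g,
    with the shift guaranteed to make the shifted Hessian positive definite.
    """
    n = len(g)
    H_aug = np.zeros((n + 1, n + 1))
    H_aug[:n, :n] = delta ** 2 * H
    H_aug[:n, n] = delta * g
    H_aug[n, :n] = delta * g
    evals, vecs = np.linalg.eigh(H_aug)   # eigh returns ascending eigenvalues
    v, alpha = vecs[:n, 0], vecs[n, 0]    # lowest eigenpair
    return delta * v / alpha              # rescale by the last component
```

For a well-conditioned positive definite Hessian and a small gradient, the level shift is negligible and the step reduces to the ordinary Newton step.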

SVD

Similar to the eigendecomposition method, except we regularise according to the singular values of the Hessian matrix.
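The poster does not specify the exact singular-value rule; one common choice, shown here purely as an assumption, is to floor the singular values at a fraction of the largest:

```python
import numpy as np

def regularise_svd(H, tol=1e-6):
    """SVD regularisation sketch: floor the singular values of the Hessian
    so that the reconstructed matrix is invertible and its condition number
    is bounded by 1/tol. The floor rule is our assumption; the poster leaves
    it unspecified.
    """
    U, s, Vt = np.linalg.svd(H)
    s_floored = np.maximum(s, tol * s.max())   # lift small singular values
    return U @ np.diag(s_floored) @ Vt
```

A well-conditioned Hessian passes through unchanged; a singular one comes back invertible.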