arXiv:2111.02590v1 [physics.chem-ph] 4 Nov 2021

Applying generalized variational principles to excited-state-specific complete active space self-consistent field theory

Rebecca Hanscam1 and Eric Neuscamman1,2,a)
1) Department of Chemistry, University of California, Berkeley, California 94720, USA
2) Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA

(Dated: 5 November 2021)

We employ a generalized variational principle to improve the stability, reliability, and precision of fully excited-state-specific complete active space self-consistent field theory. Compared to previous approaches that similarly seek to tailor this ansatz’s orbitals and configuration interaction expansion for an individual excited state, we find the present approach to be more resistant to root flipping and better at achieving tight convergence to an energy stationary point. Unlike state-averaging, this approach allows orbital shapes to be optimal for individual excited states, which is especially important for charge transfer states and some doubly excited states. We demonstrate the convergence and state-targeting abilities of this method in LiH, ozone, and MgO, showing in the latter that it is capable of finding three excited state energy stationary points that no previous method has been able to locate.

I. INTRODUCTION

Whether one looks at carotenoids,1–3 photochemical isomerization,4–6 or transition metal oxide diatomics,7–9 molecular excited states often display wavefunction characteristics that go beyond the simplifying assumptions of mean field theory. From the right perspective, this fact is not that surprising, as it is the widening of the HOMO-LUMO gap that helps determine ground state equilibrium geometries and ensure the validity of mean field theory. Upon excitation, a molecule may be far from the excited state’s equilibrium geometry, and in any case there is no longer the HOMO-LUMO gap to prevent near-degeneracies between different fillings of the molecular orbital diagram that may be important for the state under study. The result is that methods like time-dependent density functional theory and equation-of-motion coupled cluster theory that perturb around the mean field limit, while extremely useful in many excited state contexts, are qualitatively inappropriate in many others. Instead, methods that explicitly engage with the strongly multi-configurational nature of these states are called for. Ideally, these methods would be equally capable for excited states as they are for ground states, but, as in so many areas of electronic structure theory, the current reality is that they are not.

For decades, multi-configurational photochemical investigations have been supported by complete active space self-consistent field (CASSCF) theory,10–13 but the approximations introduced in its most common incarnations can cause challenges when treating high-lying states or states with widely varying characters. In particular, the state averaging (SA) approach – in which one finds the orbitals that minimize the average energy of multiple configuration interaction (CI) roots – makes the assumption that all states of interest can be constructed to a similar degree of accuracy with one shared set of orbitals.14 This approximation offers important advantages and has long been a standard and successful approach to excited states in CASSCF,15–20 but it can also create a number of difficulties. Most obviously, it is less appropriate in cases where different states require significantly different orbital relaxations, as occurs in molecules bearing both local and charge transfer (CT) excitations. Indeed, SA-CASSCF relative energies during nuclear motion on a charge transfer excitation’s surface can be in error by 10 kcal/mol or more.21 Further, the state averaging method links all of the states together so that if one state is not well served by the chosen active space and displays a non-analytic point on its energy surface, all states, even those well-served by the active space, will show cusps or discontinuities on their energy surfaces. Finally, because it is only the average energy that is made stationary with respect to the wavefunction variables, evaluating nuclear energy gradients for geometry optimization or dynamics requires solving difficult response equations, which are indeed approximated in some implementations.22,23 In ground state CASSCF, by contrast, the state’s energy is stationary already and nuclear gradient evaluations are much more straightforward. So, although state averaging has been and will continue to be a powerful asset to quantum chemical investigation, there are many reasons why and many settings in which a fully excited-state-specific CASSCF would be valuable.

a) Electronic mail: [email protected]

Looking at the wider world of excited state theory, there has been remarkable progress in formulating fully state-specific methods in recent years, which augurs well for progress in this direction in CASSCF theory. Examples of this progress include work in variational Monte Carlo,24–26 variance-based self-consistent field (SCF) theory,27,28 more robust level shifting approaches in SCF methods,29 core spectroscopy,30–33 and perturbation theory.34 Especially relevant to the current study is the “WΓ” approach to state-specific CASSCF (SS-CASSCF),35 in which an approximate variational principle and density matrix information are used to carefully follow a particular CI root during a two-step optimization that goes back and forth between orbital relaxation steps and CI diagonalization steps. The WΓ approach proved capable of overcoming root flipping in a wider variety of situations than readily-available alternatives, improving CASPT2 energies when compared to state-averaging, and making qualitative improvements to some potential energy surfaces.21,35 However, it was unable to locate at least one of the low-lying states of MgO and, as a method that lacks coupling between orbital and CI variables, it struggles to tightly converge stationary points. The method presented here proves more reliable when faced with root flipping and far superior at tight convergence thanks to its objective function and its coupling of orbital and CI parameters during optimization.

To understand how these advantages come about, let us turn to discussing recent progress in the use of quasi-Newton methods to minimize energy-gradient-based objective functions, which has proven effective in the context of both the excited state mean field (ESMF) ansatz36,37 and Kohn-Sham ∆SCF.38 Essentially, the idea is to search for energy saddle points – which in full CI (FCI) would be the exact excited states – by minimizing the norm of the energy gradient with respect to the variational parameters. By relying on either an initial guess sufficiently close to the desired stationary point38 or a generalized variational principle (GVP) that can use sought-after properties to steer an optimization towards that stationary point,37 these approaches have proven capable of achieving full excited-state-specificity while avoiding root flipping or variational collapse to lower states. While the work in this direction so far has mostly been focused on weakly correlated excited states, there is no formal barrier to applying the GVP approach to the CASSCF ansatz, which is our focus here.

To perform excited-state-specific optimization of the CASSCF ansatz, we will minimize a GVP containing the square gradient norm by purely quasi-Newton descent, eschewing CI diagonalization (except in generating a guess) and more traditional augmented Hessian approaches to orbital rotations.39,40 Of course, it may be that a combination of all of these methods ultimately proves more efficient, as has recently been found for the ground state,41–43 but in this first combination of CASSCF with a GVP, we stick to pure quasi-Newton minimization for simplicity, and so our core computational task is to evaluate gradients of an objective function that contains the square norm of the energy gradient. Recent work has provided multiple ways forward here. On the one hand, automatic differentiation arguments guarantee that in most scenarios, the requisite derivatives can be derived automatically and will have a cost that is a modest and constant multiple of the energy evaluation cost.36 In many cases, this guarantee can motivate the derivation of analytic forms for these derivatives,44 which are often even more efficient in practice, although not necessarily simple or easy to implement. As an alternative, Hait and Head-Gordon have presented a clever finite-difference approach to these derivatives.38 Although finite difference will incur some error relative to analytic or automatic differentiation, their study of orbital optimization shows that this error is small enough that it does not prevent successful convergence to excited state stationary points. The key benefit of this approach is that it requires only that the energy gradient itself be available, and so is more convenient to implement. Although it is possible that a fully analytic formulation of the energy gradient norm derivatives would improve the rate of quasi-Newton convergence by avoiding finite difference errors, we for simplicity adopt the finite difference approach here and find that optimization remains effective even when orbital and CI parameters are optimized together. In future, it may be interesting to explore whether more accurate analytic expressions improve numerical efficiency and whether mixtures with CI and augmented Hessian orbital optimizers are worthwhile, but already the present approach to combining CASSCF with an excited state GVP allows us to succeed in situations where previous CASSCF approaches fail.

II. THEORY

A. CASSCF Ansatz

The standard CASSCF ansatz10–13 has been the foundation for a wide range of CASSCF-derived methods,43,45–51 and is the formulation used in the approach introduced here. CASSCF methods classify subsets of the molecular orbitals as closed orbitals each occupied by two electrons, active orbitals with varying occupation, and virtual orbitals that are completely unoccupied. The CASSCF wavefunction is therefore composed of all possible electronic configurations within the active orbitals, defining the active space. The wavefunction must also account for orbital relaxation effects because, while rotations within the active space are described entirely by changes to the configuration interaction (CI) coefficients, the closed and virtual orbitals remain excluded from the CI expansion. To relax the orbital descriptions we incorporate an orbital rotation operator in the wavefunction, such that

$$ |\Psi_{\mathrm{CAS}}\rangle = e^{\hat{X}} \sum_I c_I |\phi_I\rangle \tag{1} $$

where $|\phi_I\rangle$ represents a Slater determinant and $c_I$ is the corresponding CI coefficient. The total number of Slater determinants, and thus CI variational parameters forming $\vec{c}$, is determined by the size of the active space.
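To give a sense of how quickly the length of $\vec{c}$ grows with the active space, the following minimal sketch counts determinants under the assumption of a plain determinant basis with fixed numbers of alpha and beta electrons and no spin adaptation or spatial symmetry reduction. The function name and the example active spaces are hypothetical and are not taken from the calculations reported here.

```python
from math import comb

def n_determinants(n_active_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Count Slater determinants for a CAS with fixed alpha/beta electron numbers.

    Assumption: alpha and beta occupation strings are chosen independently,
    with no symmetry or spin-adaptation reduction.
    """
    return comb(n_active_orbitals, n_alpha) * comb(n_active_orbitals, n_beta)

# Hypothetical active spaces written as (electrons, orbitals) with equal
# numbers of alpha and beta electrons.
for n_elec, n_orb in [(2, 2), (4, 4), (6, 6), (8, 8)]:
    n_a = n_b = n_elec // 2
    print(f"CAS({n_elec},{n_orb}): {n_determinants(n_orb, n_a, n_b)} determinants")
```

A production code may work with fewer parameters than this count suggests if it exploits symmetry or configuration state functions.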

For a finite basis of spatial orbitals, the operator $\hat{X}$ in Eq. (1) is given by

$$ \hat{X} = \sum_{p<q}^{N_{\mathrm{basis}}} X_{pq} \left( a_p^\dagger a_q - a_q^\dagger a_p \right). \tag{2} $$

It is defined to be real and spin restricted, thereby ensuring the orbital rotation operator $\hat{U} = e^{\hat{X}}$ is unitary and also spin restricted.37,52 Note that only the upper triangle of the matrix $X$ appears in Eq. (2), although it is often useful to consider the full matrix, which is anti-Hermitian and thus defined by the upper triangle. Additionally, rotations between orbitals within the active space do not affect the energy as they are redundant with the flexibility present in the CI expansion. Similarly, rotations within the closed and virtual orbital spaces have no effect on the energy. Were these redundant parameters retained, the variable space would contain an infinite seam of energetic degeneracy, and so to avoid complications during numerical optimization, all redundant parameters are excluded. This choice leads to Figure 1, which shows the blocks of $X$ that are included in the orbital variational parameter set $\vec{x}$. All together, our CASSCF wavefunction’s variational parameters are the concatenated set $\vec{v} = \{\vec{c}, \vec{x}\}$.

Figure 1. Orbital rotation coefficient matrix $X$ where the solid shaded area represents nonzero variational parameters, and the striped region is the negative transpose.
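The block structure described above can be sketched in a few lines. The example below assumes the orbitals are ordered closed, then active, then virtual, and keeps only the closed-active, closed-virtual, and active-virtual pairs. The helper names and the orbital counts are hypothetical illustrations, not part of the implementation used in this work.

```python
def nonredundant_pairs(n_closed: int, n_active: int, n_virtual: int):
    """List index pairs (p, q), p < q, whose rotations X_pq are retained.

    Assumes orbitals are ordered closed, active, virtual. Pairs inside the
    same subspace (closed-closed, active-active, virtual-virtual) are skipped.
    """
    n_total = n_closed + n_active + n_virtual

    def space(p: int) -> int:
        # 0 = closed, 1 = active, 2 = virtual
        if p < n_closed:
            return 0
        return 1 if p < n_closed + n_active else 2

    return [(p, q)
            for p in range(n_total)
            for q in range(p + 1, n_total)
            if space(p) != space(q)]

def build_X(pairs, x, n_total):
    """Fill the full antisymmetric matrix X from its upper-triangle parameters."""
    X = [[0.0] * n_total for _ in range(n_total)]
    for (p, q), value in zip(pairs, x):
        X[p][q] = value
        X[q][p] = -value  # lower triangle is the negative transpose
    return X

# Hypothetical partition: 1 closed, 2 active, 3 virtual orbitals.
pairs = nonredundant_pairs(n_closed=1, n_active=2, n_virtual=3)
X = build_X(pairs, [0.1 * (k + 1) for k in range(len(pairs))], 6)
print(len(pairs), "orbital rotation parameters")
```

For this partition the count is 1x2 + 1x3 + 2x3 = 11 parameters, compared with 15 for the full upper triangle.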

B. Objective Function

1. Generalized Variational Principle

In FCI, when the energy is expressed as a function of the CI coefficients, the exact excited states are the energy saddle points of this function. Even in more approximate theories, the approximate ansatz’s saddle points are often good approximations to the excited states,36,53–55 and thus the focus of the present investigation is to find excited state energy stationary points for the CASSCF ansatz. As these points are not energy minima, gradient-based descent methods are likely to collapse to lower states, and even non-gradient-based methods like self-consistent field algorithms can display similar difficulties.54,55 To retain the convenience of minimization algorithms while avoiding this issue of variational collapse, we choose objective functions that have the square norm of the energy gradient as their centerpiece.

$$ |\nabla_{\vec{v}} E|^2 = \sum_i \left| \frac{\partial E}{\partial c_i} \right|^2 + \sum_j \left| \frac{\partial E}{\partial x_j} \right|^2 \tag{3} $$

In CASSCF, this gradient norm contains contributions from both the CI coefficient gradients and the orbital rotation gradients. It is positive semi-definite by construction, and, for an isolated energy saddle point, is expected to be surrounded by a basin of convergence that, if we can somehow get ourselves inside it, should allow a straightforward minimization of $|\nabla_{\vec{v}} E|^2$ to bring us to the desired excited state energy stationary point. It is important to note that when $\nabla_{\vec{v}} |\nabla_{\vec{v}} E|^2 = 0$ it is possible that $|\nabla_{\vec{v}} E|^2 \neq 0$, meaning that the square gradient norm has stationary points that are not energy stationary points. In the results discussed below, such cases were overcome through a combination of improved initial orbital guesses and the incorporation of additional properties within the generalized variational principle37 (GVP) to which we now turn our attention.
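Both points of this paragraph can be seen in a toy model. The sketch below uses a hypothetical two-variable energy $E(x, y) = x^2 - y^2$, whose only stationary point is a saddle at the origin, and compares fixed-step descent on the energy with fixed-step descent on the square gradient norm. It then gives a one-variable example of a stationary point of the square gradient norm that is not an energy stationary point. None of these functions come from the CASSCF implementation.

```python
def grad_E(x, y):
    """Gradient of the toy energy E(x, y) = x**2 - y**2 (saddle at the origin)."""
    return (2.0 * x, -2.0 * y)

def grad_sq_norm(x, y):
    gx, gy = grad_E(x, y)
    return gx * gx + gy * gy

def grad_of_grad_sq_norm(x, y):
    """Analytic gradient of |grad E|^2 = 4 x^2 + 4 y^2."""
    return (8.0 * x, 8.0 * y)

def descend(grad, start, step, n_steps):
    """Plain fixed-step gradient descent."""
    x, y = start
    for _ in range(n_steps):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y

start = (0.5, 0.3)
# Energy descent slides off the saddle: |y| keeps growing.
x_e, y_e = descend(grad_E, start, step=0.1, n_steps=50)
# Descent on the square gradient norm is attracted to the saddle instead.
x_g, y_g = descend(grad_of_grad_sq_norm, start, step=0.1, n_steps=50)
print("energy descent      :", x_e, y_e)
print("|grad E|^2 descent  :", x_g, y_g, grad_sq_norm(x_g, y_g))

# Caveat from the text: for E(x) = x**3 / 3 + x the slope is x**2 + 1, so
# d/dx |dE/dx|^2 = 4 x (x**2 + 1) vanishes at x = 0 although dE/dx = 1 there.
def slope(x):
    return x**2 + 1

def grad_of_slope_sq(x):
    return 4 * x * (x**2 + 1)

print(grad_of_slope_sq(0.0), slope(0.0))
```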

With the norm of the energy gradient being zero for all energy stationary points, we require some mechanism by which the desired excited state’s stationary point can be targeted. In some cases, a good enough guess is available to place one within the appropriate basin of convergence, but in general such a guess may not be available. To address this problem, we use a GVP approach to expand our objective function beyond the square gradient norm so that other properties of the excited state can help steer the optimization into the desired convergence basin.

$$ L_\mu = \mu \left| \vec{d} \right|^2 + (1-\mu) \left| \nabla_{\vec{v}} E \right|^2 \tag{4} $$

In this objective function, $\vec{d}$ contains functions of the wavefunction that should have values close to zero for the desired excited state, such as the difference $\langle H \rangle - \omega$ between the current wavefunction energy and a guess for the excited state’s energy. Thus, when $\mu$ is greater than zero and we minimize $L_\mu$, the term containing $\vec{d}$ should help drive the optimization towards the energy stationary point belonging to the desired excited state. If the functions within $\vec{d}$ uniquely specify the state (by which we mean the norm of $\vec{d}$ is smaller for that excited state than for any other energy stationary point), then an optimization in which $\mu$ is gradually lowered to zero will arrive at the desired stationary point.37
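The steering role of $\vec{d}$ can be illustrated with a one-dimensional toy model. The sketch below assumes a hypothetical quartic energy with stationary points at $x = -1$, $0$, and $2$, takes $\vec{d} = \{E - \omega\}$, and uses plain fixed-step gradient descent rather than L-BFGS. The starting point is deliberately placed outside the basin of the target stationary point at $x = 0$.

```python
def E(x):
    """Toy energy with stationary points at x = -1, 0 and 2."""
    return x**4 / 4 - x**3 / 3 - x**2

def dE(x):
    return x**3 - x**2 - 2 * x      # = x (x - 2) (x + 1)

def d2E(x):
    return 3 * x**2 - 2 * x - 2

def grad_L(x, mu, omega):
    """d/dx of L_mu = mu (E - omega)^2 + (1 - mu) (dE/dx)^2."""
    return 2 * mu * (E(x) - omega) * dE(x) + 2 * (1 - mu) * dE(x) * d2E(x)

def minimize(x, mu, omega, step=0.01, n_steps=3000):
    """Fixed-step gradient descent on L_mu (a stand-in for L-BFGS)."""
    for _ in range(n_steps):
        x -= step * grad_L(x, mu, omega)
    return x

x0 = 1.5        # hypothetical guess lying outside the basin of x = 0
omega = 0.1     # rough estimate of the target energy E(0) = 0

# Gradient norm alone (mu = 0) converges to the wrong stationary point.
x_plain = minimize(x0, mu=0.0, omega=omega)
# Steering with mu = 0.5 first, then finishing at mu = 0, reaches the target.
x_gvp = minimize(minimize(x0, mu=0.5, omega=omega), mu=0.0, omega=omega)
print(x_plain, x_gvp)
```

In this model the energy guess $\omega$ is off by 0.1, which is enough to select the right basin and has no effect on the final $\mu = 0$ stage.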

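The energy difference term $\langle H \rangle - \omega$ that we typically include within $\vec{d}$ can be motivated as a useful approximation35,36 to the rigorous excited state variational principle

$$ W = \frac{\langle \Psi | (\omega - \hat{H})^2 | \Psi \rangle}{\langle \Psi | \Psi \rangle} \approx \left( \langle H \rangle - \omega \right)^2, \tag{5} $$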

which if evaluated exactly has its global minimum at the Hamiltonian eigenstate whose energy is closest to $\omega$.56,57 Of course, many other properties and functions of the wavefunction can also be useful in specifying the desired state through the vector $\vec{d}$. For example, if we knew that it should ideally be orthogonal to another nearby state $|\Phi\rangle$ and should have a dipole moment $\vec{\mu}$ (not to be confused with the weighted average parameter $\mu$ above) of about $\vec{\mu}_0$, we might use $\vec{d} = \{ \langle H \rangle - \omega, \langle \Psi | \Phi \rangle, |\vec{\mu} - \vec{\mu}_0| \}$ to guide our optimization into the desired basin of convergence, at which point $\mu$ can be reduced to zero so that, in the final stage of optimization, minimization of the energy gradient square norm brings us to the desired stationary point. It is important to recognize that the functions employed within $\vec{d}$ need not be exact, as their only purpose is to get us into the right basin of convergence, after which they have no further effect. A good example of where this flexibility can be exploited is seen in our results on ozone, where we use a simple approximation for the overlap with another state to help one of our optimizations converge correctly. Evaluating that overlap exactly would be an exercise in non-orthogonal CI (NOCI),58–60 but in this case a simple dot product between CI vectors (which neglects differences in the molecular orbitals) is free by comparison and a good enough nudge to guide the optimization to the desired stationary point in the face of a tricky near-degeneracy.
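The nature of the approximation in Eq. (5) can be made explicit by expanding the square for a normalized wavefunction. The short derivation below is an added illustration. The symbol $\sigma^2$ for the energy variance is introduced here for convenience and does not appear elsewhere in the text.

```latex
\begin{aligned}
W &= \langle (\omega - \hat{H})^2 \rangle
   = \omega^2 - 2\omega \langle \hat{H} \rangle + \langle \hat{H}^2 \rangle \\
  &= \left( \langle \hat{H} \rangle - \omega \right)^2 + \sigma^2,
  \qquad
  \sigma^2 \equiv \langle \hat{H}^2 \rangle - \langle \hat{H} \rangle^2 .
\end{aligned}
```

The approximate form therefore differs from $W$ only by the energy variance, which vanishes for an exact Hamiltonian eigenstate.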


2. Objective Function Gradient

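To minimize our objective function via gradient descent, we will need an expression for its gradient. When $\vec{d} = \{\langle H \rangle - \omega\}$, this gradient is

$$ \nabla_{\vec{v}} L_\mu = 2\mu (E - \omega) \nabla_{\vec{v}} E + (1-\mu) \nabla_{\vec{v}} |\nabla_{\vec{v}} E|^2. \tag{6} $$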

In CASSCF, the energy gradient with respect to the full variational parameter set $\nabla_{\vec{v}} E$ can be split into the energy gradient with respect to the CI parameters $\nabla_{\vec{c}} E$ and the energy gradient with respect to the orbital rotation parameters $\nabla_{\vec{x}} E$. In this work, we use the analytic expression for the CI gradient

$$ \nabla_{\vec{c}} E = \frac{\partial E}{\partial \vec{c}} = \frac{2 (H - E) \vec{c}}{\vec{c}^{\,T} \cdot \vec{c}} \tag{7} $$

where $H$ is the Hamiltonian matrix in the CI basis. For the orbital energy gradient, we apply automatic differentiation36 to the AO-to-MO integral transforms, although this is simply for convenience, and a production-level implementation would certainly implement the orbital gradient by hand.11,13,27,38,44,61
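As a sanity check on Eq. (7), the sketch below compares the analytic CI gradient of the Rayleigh quotient $E = \vec{c}^{\,T} H \vec{c} / \vec{c}^{\,T} \vec{c}$ against a numerical derivative. The 3x3 symmetric matrix and the trial vector are hypothetical stand-ins for a CI Hamiltonian and CI vector. A real calculation would never build $H$ explicitly and would use direct CI contractions instead.

```python
def matvec(H, c):
    return [sum(h * x for h, x in zip(row, c)) for row in H]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def energy(H, c):
    """Rayleigh quotient E = c^T H c / c^T c for a real symmetric H."""
    return dot(c, matvec(H, c)) / dot(c, c)

def ci_gradient(H, c):
    """Eq. (7): grad_c E = 2 (H - E) c / (c^T c)."""
    e = energy(H, c)
    hc = matvec(H, c)
    norm2 = dot(c, c)
    return [2.0 * (hc_i - e * c_i) / norm2 for hc_i, c_i in zip(hc, c)]

def numerical_gradient(H, c, h=1e-6):
    """Central finite difference of the energy, used only as a check."""
    grad = []
    for i in range(len(c)):
        plus = list(c)
        plus[i] += h
        minus = list(c)
        minus[i] -= h
        grad.append((energy(H, plus) - energy(H, minus)) / (2 * h))
    return grad

# Hypothetical 3x3 symmetric "CI Hamiltonian" and unnormalized trial vector.
H = [[-1.0, 0.2, 0.0],
     [0.2, -0.5, 0.1],
     [0.0, 0.1, 0.3]]
c = [0.8, 0.5, -0.3]

analytic = ci_gradient(H, c)
numeric = numerical_gradient(H, c)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```

Note that the gradient is orthogonal to $\vec{c}$, reflecting the fact that rescaling the CI vector does not change the energy.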

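By far the most computationally challenging term in Eq. (6) is the derivative of the squared norm of the energy gradient with respect to the variational parameters,

$$ \frac{\partial}{\partial v_j} |\nabla_{\vec{v}} E|^2 = \frac{\partial}{\partial v_j} \sum_i \left| \frac{\partial E}{\partial v_i} \right|^2 = 2 \sum_i \mathcal{H}_{ij} \frac{\partial E}{\partial v_i}. \tag{8} $$

The Hessian matrix of energy second derivatives $\mathcal{H}_{ij} \equiv \partial^2 E / \partial v_i \partial v_j$ is expensive to evaluate, and we certainly do not wish to construct it explicitly. While it is possible to use automatic differentiation to evaluate this term,36 for ease of implementation we instead turn to a central finite difference method that Hait and Head-Gordon have shown to be effective for excited state orbital optimization.38 Using a directional finite difference of the energy gradient with a chosen perturbation of $\delta\vec{v} = \lambda \nabla_{\vec{v}} E \big|_{\vec{v}=\vec{v}_0}$ yields the approximate expression

$$ \nabla_{\vec{v}} |\nabla_{\vec{v}} E|^2 = \frac{1}{\lambda} \left( \nabla_{\vec{v}} E \big|_{\vec{v}=\vec{v}_0+\delta\vec{v}} - \nabla_{\vec{v}} E \big|_{\vec{v}=\vec{v}_0-\delta\vec{v}} \right) + \mathcal{O}\!\left( \lambda^2 \left( \nabla_{\vec{v}} E \big|_{\vec{v}=\vec{v}_0} \right)^3 \right). \tag{9} $$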

This approach avoids the computationally demanding Hessian-gradient contraction in Eq. (8), replacing it with multiple evaluations of the energy gradient. Automatic differentiation – as its cost is typically 2-3 times the cost of the function – should be able to deliver a fully analytic version of this approach with zero finite difference error at a similar price, as has been achieved for ESMF. Further, a hand-implemented analytic version could be even faster. Thus, it may be worth investigating in future whether the removal of the small finite difference error leads to a significant improvement in optimization efficiency. For the present study, however, we employ Eq. (9) as is for both the orbital and CI variables together and find that it is sufficient for achieving tight energy stationary point convergence.
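The finite difference construction can be checked on a toy problem where the energy Hessian is known in closed form. The sketch below assumes a hypothetical energy $E(x, y) = \sin(x)\cos(y)$, evaluates the exact contraction of Eq. (8) with the explicit Hessian, and compares it with the central difference of the gradient along $\lambda \nabla E$ from Eq. (9) for a few values of $\lambda$. The explicit Hessian is built here only for verification.

```python
import math

def grad_E(v):
    """Gradient of the toy energy E(x, y) = sin(x) * cos(y)."""
    x, y = v
    return [math.cos(x) * math.cos(y), -math.sin(x) * math.sin(y)]

def hessian_E(v):
    """Explicit energy Hessian, built only to check the finite difference."""
    x, y = v
    return [[-math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y)],
            [-math.cos(x) * math.sin(y), -math.sin(x) * math.cos(y)]]

def exact_grad_of_sq_norm(v):
    """Eq. (8): component j is 2 * sum_i H_ij * dE/dv_i."""
    g, H = grad_E(v), hessian_E(v)
    return [2 * sum(H[i][j] * g[i] for i in range(2)) for j in range(2)]

def fd_grad_of_sq_norm(v, lam):
    """Eq. (9): central difference of grad E along delta_v = lam * grad E(v)."""
    g0 = grad_E(v)
    plus = grad_E([vi + lam * gi for vi, gi in zip(v, g0)])
    minus = grad_E([vi - lam * gi for vi, gi in zip(v, g0)])
    return [(p - m) / lam for p, m in zip(plus, minus)]

v0 = [0.4, -0.3]                      # hypothetical expansion point
exact = exact_grad_of_sq_norm(v0)
errors = []
for lam in (1e-1, 1e-2, 1e-3):
    approx = fd_grad_of_sq_norm(v0, lam)
    errors.append(max(abs(a - e) for a, e in zip(approx, exact)))
    print(f"lambda = {lam:g}: max error = {errors[-1]:.2e}")
```

Only two extra gradient evaluations are needed per objective gradient, and the error shrinks as $\lambda$ is reduced, in line with the quadratic dependence on $\lambda$ in the remainder term.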

A close inspection of Eq. (8) shows that, even if one applies naive steepest descent for minimizing the objective function, some coupling between the orbital and CI variables is present due to the energy Hessian. In practice, a quasi-Newton approach that builds up an approximation to the objective function Hessian will account for even more coupling between these variable sets. Although it is too early to tell how well this approach to coupling works as compared to second-order ground state approaches,41,43 a quasi-Newton minimization of our objective function certainly incorporates more coupling than a simple two-step optimization35 in which one goes back and forth between optimizing the CI variables with the orbitals held fixed and optimizing the orbitals with the CI variables held fixed. In each step of quasi-Newton minimization, the effects of orbital changes on the CI energy gradient and CI changes on the orbital energy gradient are taken approximately into account. The result is a dramatic improvement in the method’s ability to tightly converge the energy gradient as compared to the two-step WΓ approach that we compare to in our results below.

3. Approximate Objective Function Hessian

In this work, we use the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm62–65 to minimize the objective function. Roughly speaking, L-BFGS takes a Newton-like step using an approximate Hessian. In particular, this approximate Hessian is arrived at by using finite differences between previous iterations’ objective function gradients to improve upon some initial guess for the objective function Hessian. This initial guess can be set to the identity matrix for simplicity, but the speed of convergence can be accelerated dramatically if a better guess for the Hessian is supplied, as has been demonstrated for objective functions like ours in both the ∆SCF53 and ESMF66 contexts. Indeed, we find here as well that, although L-BFGS with an identity Hessian guess is better at achieving tight convergence than an uncoupled two-step method as discussed in the previous section, the efficiency of the optimization is greatly improved by instead employing an inexpensive approximation to the diagonal of the actual Hessian as the guess.
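To show where the initial Hessian guess enters, the sketch below implements the textbook L-BFGS two-loop recursion with a diagonal initial inverse Hessian. It is a generic illustration, not the optimizer used in this work. The function name, argument layout, and the two-variable quadratic test problem are assumptions made for the example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_direction(grad, history, inv_h0_diag):
    """Textbook L-BFGS two-loop recursion.

    grad        : current objective-function gradient
    history     : list of (s, y) pairs, oldest first, where s is a change in
                  parameters and y the corresponding change in gradient
    inv_h0_diag : diagonal of the initial *inverse* Hessian guess
                  (all ones reproduces the identity guess)
    Returns the quasi-Newton search direction.
    """
    q = list(grad)
    alphas = []
    for s, y in reversed(history):                 # newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((rho, alpha))
    r = [h * qi for h, qi in zip(inv_h0_diag, q)]  # apply the initial guess
    for (s, y), (rho, alpha) in zip(history, reversed(alphas)):  # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return [-ri for ri in r]

# Hypothetical objective f = (x1**2 + 100 * x2**2) / 2 at the point (1, 1).
gradient = [1.0, 100.0]
step_identity = lbfgs_direction(gradient, [], [1.0, 1.0])
step_diagonal = lbfgs_direction(gradient, [], [1.0, 0.01])
print(step_identity, step_diagonal)
```

With the identity guess the first step is badly scaled along the stiff coordinate, whereas the inverse of the true diagonal gives the exact Newton step to the minimum in this example. A diagonal Hessian guess would be inverted elementwise before being passed as `inv_h0_diag`.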

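Starting with the Hessian of the $\mu = 0$ objective function (i.e. the second derivatives of the energy gradient square norm),

$$ \frac{\partial^2}{\partial v_k \partial v_j} \sum_i \left| \frac{\partial E}{\partial v_i} \right|^2 = 2 \sum_i \mathcal{H}_{ij} \mathcal{H}_{ik} + 2 \sum_i \left( \frac{\partial^3 E}{\partial v_i \partial v_j \partial v_k} \right) \frac{\partial E}{\partial v_i}, \tag{10} $$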

we can anticipate that, due to its contraction with the energy gradient, the role of the third derivative tensor will become negligible as the optimization approaches an energy stationary point. Indeed, it has been observed empirically in both ∆SCF53 and ESMF66 that dropping this term entirely does not much matter, and so we neglect it here as well. In the case where $\vec{d} = \{\langle H \rangle - \omega\}$ and we now allow $\mu$ to be zero or nonzero, this leaves us with the following approximate expression for the objective function Hessian.

$$ \frac{\partial^2 L_\mu}{\partial v_j \partial v_k} \approx 2\mu \left[ (E-\omega) \mathcal{H}_{jk} + \frac{\partial E}{\partial v_j} \frac{\partial E}{\partial v_k} \right] + 2(1-\mu) \sum_i \mathcal{H}_{ij} \mathcal{H}_{ik} \tag{11} $$

When not using the identity, we will use the diagonal of Eq. (11) as the approximate objective function Hessian that we supply to L-BFGS. To make this approach affordable, we now turn to choosing an efficient approximation for the energy Hessian $\mathcal{H}$.

We approximate the energy Hessian using a diagonal form, although we make different choices for how to deal with the CI block (denoted ${}^{cc}\mathcal{H}$) and the orbital block (denoted ${}^{xx}\mathcal{H}$). In the CI block, we make no approximation beyond omitting the off-diagonal terms, leaving us with the same diagonal that is used in the Davidson algorithm.67

$$ {}^{cc}\mathcal{H}_{ii} = \frac{2 (H_{ii} - E)}{\vec{c} \cdot \vec{c}} \tag{12} $$

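For the diagonal of the orbital block, we define $\hat{E}^{-}_{pq} = (a_p^\dagger a_q - a_q^\dagger a_p)$ and arrive at the following expression.52

$$ {}^{xx}\mathcal{H}_{pq,pq} = \frac{\partial^2 E}{\partial x_{pq} \partial x_{pq}} = \langle \Psi | \left[ \hat{E}^{-}_{pq}, \left[ \hat{E}^{-}_{pq}, \hat{H} \right] \right] | \Psi \rangle \tag{13} $$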

Unlike the CI block, we now go beyond just dropping the off-diagonal terms by approximating the Hamiltonian inside the commutators with the one-electron Fock operator built from our CASSCF wavefunction’s one-body density matrix. These choices for our approximate energy Hessian diagonal, which are similar to those made in other contexts,53,66 combine with Eq. (11) to provide L-BFGS with a much better guess than the identity for the objective function Hessian. This improved guess comes at an additional computational cost that is significantly less than the energy gradient evaluation we are already doing, as it involves no two-electron AO-to-MO integral transforms and has a much simpler interaction with the CI vector.
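Once the energy Hessian is taken to be diagonal, the diagonal of Eq. (11) reduces to a simple elementwise formula, since $\sum_i \mathcal{H}_{ij} \mathcal{H}_{ij}$ collapses to $\mathcal{H}_{jj}^2$. The sketch below assumes that the first term of Eq. (11) carries the prefactor $2\mu$, as follows from differentiating the first term of Eq. (6). The function names and all numerical inputs are hypothetical, and the orbital-block diagonal of Eq. (13) is simply passed in as numbers rather than computed.

```python
def ci_hessian_diagonal(h_ci_diag, energy, norm2=1.0):
    """Eq. (12): 2 (H_ii - E) / (c . c) for each determinant i."""
    return [2.0 * (h_ii - energy) / norm2 for h_ii in h_ci_diag]

def objective_hessian_diagonal(energy, omega, mu, grad, h_diag):
    """Diagonal of Eq. (11) for a diagonal energy Hessian approximation.

    grad   : energy gradient components dE/dv_j
    h_diag : diagonal energy Hessian elements H_jj (CI and orbital blocks)
    """
    return [2.0 * mu * ((energy - omega) * h + g * g) + 2.0 * (1.0 - mu) * h * h
            for g, h in zip(grad, h_diag)]

# Hypothetical numbers: two CI parameters followed by one orbital parameter.
energy, omega = -1.0, -0.9
h_diag = ci_hessian_diagonal([-0.5, 0.3], energy) + [2.0]
grad = [0.1, 0.2, 0.05]

for mu in (0.5, 0.0):
    print(mu, objective_hessian_diagonal(energy, omega, mu, grad, h_diag))
```

At $\mu = 0$ every element is non-negative by construction, whereas for $\mu > 0$ the $(E - \omega)\mathcal{H}_{jj}$ contribution can have either sign.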

C. Optimization Procedure

The overall quasi-Newton optimization procedure for ourGVP approach to excited state CASSCF is as follows.

1. An initial orbital basis and active space are chosen andan initial guess (typically taken from a CASCI calcu-lation) for the CI coefficients is selected. The orbitalrotation coefficients are initialized as zero and a valuefor ω is estimated from the energy of the initial inputs,results from other methods, or experimental data.

2. The set of variational parameters ~v = {~c,~x} are opti-mized all together via a series of L-BFGS minimiza-tions of Lµ for decreasing values of µ . We supply ei-ther the identity or the approximate objective function

Hessian discussed in the previous section as the initialguess for the L-BFGS Hessian. The initial µ value andconvergence threshold are set to 0.5 and |∇~vL|= 10−3,respectively. Within each micro-iteration of an L-BFGSminimization, the following tasks are completed.

(a) The gradient of the objective function ∇vLµ withrespect to the CI coefficients ~c is built from theanalytical expression in Equation 7 where thecontraction of the active space Hamiltonian withthe CI coefficient vector is performed utilizingPySCF’s68 existing direct CI functions.

(b) The gradient with respect to the orbital rotationcoefficients~x is calculated using the automatic dif-ferentiation framework within TensorFlow.69 Thescaling of this task is dominated by the AO-to-MOintegral transformations.

(c) The value of the finite difference λ is set to themaximum of {10−6, |∇~vE|} at each iteration, andthe objective function (Eq. (4)) and its gradient(Eq. (6)) are built at the cost of three gradient eval-uations of both ∇~cE and ∇~xE.

(d) If the approximate objective function Hessian (Eq.(11)) is in use, then it is built using the approxi-mate energy Hessian diagonal as discussed in theprevious section.

3. After each complete L-BFGS minimization (macro-iteration), µ is decreased by 0.1 and the convergence threshold tightened by a factor of 10. Step 2 is repeated until µ = 0 and an overall CASSCF energy stationary point is located.
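The macro-iteration schedule in steps 2 and 3 can be sketched on a one-parameter toy problem. Everything in this sketch is an assumption made for illustration: the toy energy sin(x), the objective form µ(E − ω)² + (1 − µ)|dE/dx|² (the exact form of Eq. (4) is not reproduced in this section), and the use of plain gradient descent as a stand-in for L-BFGS.

```python
import math


def energy(x):
    """Toy one-parameter energy with stationary points at pi/2 + k*pi."""
    return math.sin(x)


def objective_grad(x, mu, omega):
    """d/dx of the assumed objective mu*(E - omega)^2 + (1 - mu)*(dE/dx)^2."""
    e = energy(x)
    g = math.cos(x)    # dE/dx
    h = -math.sin(x)   # d2E/dx2
    return 2.0 * mu * (e - omega) * g + 2.0 * (1.0 - mu) * g * h


def schedule(mu0=0.5, tol0=1e-3, dmu=0.1, tighten=10.0):
    """(mu, threshold) pairs: mu drops by dmu and the threshold tightens 10x."""
    n_steps = round(mu0 / dmu)
    return [(round(mu0 - k * dmu, 12), tol0 / tighten ** k)
            for k in range(n_steps + 1)]


def minimize_gvp(x0, omega, step=0.5, max_micro=10000):
    """One descent run per macro-iteration (gradient descent replaces L-BFGS)."""
    x = x0
    for mu, tol in schedule():
        for _ in range(max_micro):
            g = objective_grad(x, mu, omega)
            if abs(g) < tol:
                break
            x -= step * g
    return x


# Target the stationary point with energy near omega = 0.9 (the maximum at pi/2).
x_final = minimize_gvp(x0=1.0, omega=0.9)
```

The final macro-iteration uses µ = 0, so the steering term no longer biases the result and the converged point is a genuine stationary point of the toy energy, here a maximum that plain energy minimization would not find.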

III. RESULTS AND DISCUSSION

In the following collection of molecular examples, we aim to answer the key question of how the GVP approach compares to other SS-CASSCF methods. Is the GVP able to find the CASSCF energy stationary point that corresponds to the initial CASCI root in the face of root-flipping? How does the convergence of the GVP approach compare to other SS-CASSCF methods, with and without the approximate diagonal Hessian being provided to L-BFGS? Finally, are there situations where the GVP can succeed when other SS-CASSCF methods fail?

These questions were investigated in LiH, asymmetrically stretched O3, and MgO. The cc-pVDZ atomic orbital basis70,71 was used throughout. Both LiH and O3 used the HF orbital basis for the initial guess, while MgO used the local density approximation (LDA) orbital basis. An initial CASCI calculation was performed for each of these molecules and the targeted root’s CASCI CI vector was used as the initial guess for the CI coefficients. Values for ω were chosen using past results from other CASSCF calculations or estimated based on the initial CASCI energy orderings. The first macro-iteration of each GVP optimization performed in this study held the CI parameters fixed while converging the orbital gradient to $|\nabla_{\vec{x}} L| < 10^{-5}$, using the identity as the objective function Hessian guess. Beyond the first macro-iteration, all parameters were optimized together with the approximate diagonal Hessian guess employed for all values of µ in all optimizations in O3 and MgO, while the identity guess proved sufficient for the optimizations in LiH.

In this study, we consider a stationary point converged in our GVP optimization when $\left|\nabla_{\vec{v}} |\nabla_{\vec{v}} E|^2\right| < 10^{-7}$, $|\nabla_{\vec{c}} E| < 10^{-6}$, and $|\nabla_{\vec{x}} E| < 10^{-6}$. For each of the molecules in this study, the results of the GVP approach are compared to those of the WΓ and simple root selection (SRS) 2-step methods. In SRS, one selects the CI root to use in orbital optimization by always taking the nth root from the energy-ordered CI roots, whereas WΓ uses an approximate variational principle and the one-body density matrix to select the desired root.35 For both WΓ and SRS, neither of which has orbital-CI coupling in our implementation, we set looser convergence thresholds because this lack of coupling prevents them from converging to the same level of precision. For the change in energy, the norm of the orbital gradient, and the norm of the change in the one-electron density matrix, the WΓ thresholds were set to $10^{-7}$, $10^{-4}$, and $10^{-4}$ respectively. To check whether a loosely converged WΓ or SRS calculation corresponds to the same stationary point as the GVP, we have therefore also used our GVP approach to finalize their convergence. This finalization was never observed to alter the character of the wavefunction, even in cases where a non-negligible energy change was observed during finalization. All molecular orbital analysis was performed with the programs Gabedit72 and Molden.73
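The first convergence criterion above involves the gradient of the squared energy-gradient norm, which by the chain rule equals twice the energy Hessian applied to the energy gradient. The sketch below shows one way such a quantity can be estimated from three gradient evaluations using a central finite difference. The two-parameter toy energy, the fixed step `lam`, and the specific stencil are assumptions for illustration; this section does not state that the implementation uses exactly this form.

```python
import math


def energy_grad(v):
    """Gradient of the toy energy E(x, y) = sin(x) * cos(y) + y**2 / 2."""
    x, y = v
    return [math.cos(x) * math.cos(y), -math.sin(x) * math.sin(y) + y]


def energy_hess(v):
    """Analytic Hessian of the same toy energy, used only for checking."""
    x, y = v
    hxx = -math.sin(x) * math.cos(y)
    hxy = -math.cos(x) * math.sin(y)
    hyy = -math.sin(x) * math.cos(y) + 1.0
    return [[hxx, hxy], [hxy, hyy]]


def sq_grad_norm_gradient_fd(v, lam=1e-4):
    """Estimate grad(|grad E|^2) = 2 * H * g from three gradient evaluations."""
    g0 = energy_grad(v)                                  # evaluation 1
    v_plus = [vi + lam * gi for vi, gi in zip(v, g0)]
    v_minus = [vi - lam * gi for vi, gi in zip(v, g0)]
    g_plus = energy_grad(v_plus)                         # evaluation 2
    g_minus = energy_grad(v_minus)                       # evaluation 3
    # 2 * (g_plus - g_minus) / (2 * lam) simplifies to (g_plus - g_minus) / lam.
    return [(gp - gm) / lam for gp, gm in zip(g_plus, g_minus)]


def sq_grad_norm_gradient_exact(v):
    """Reference value 2 * H * g built from the analytic Hessian."""
    g = energy_grad(v)
    h = energy_hess(v)
    return [2.0 * sum(h[i][j] * g[j] for j in range(2)) for i in range(2)]


v = [0.3, -0.2]
fd_value = sq_grad_norm_gradient_fd(v)
exact_value = sq_grad_norm_gradient_exact(v)
```

The point of the finite-difference route is that no Hessian is ever formed: only gradient evaluations are needed, which is what keeps the cost at a small multiple of one energy gradient.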

A. LiH

The ground state of LiH (X¹Σ⁺) is ionic at its equilibrium bond length of 1.8 Å, but the first excited state (A¹Σ⁺) is mostly neutral due to a HOMO-LUMO charge transfer excitation. However, as the bond is stretched, the ground state becomes increasingly neutral while the first excited state becomes more ionic. What makes this an especially interesting molecule to study in the present context is the avoided crossing that exists between the ground and first excited states at intermediate bond lengths.74,75 The mixing of state characters in this region leads to a well known root flipping problem14,35,37,74–76 that provides a good test for our GVP approach.
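Root flipping can be illustrated with a two-state model in which two configurations of fixed character cross in energy as a parameter t changes, with t loosely playing the role of orbital relaxation. The model Hamiltonian, the coupling value, and both selector functions below are assumptions for illustration. In particular, the maximum-overlap selector is only a stand-in for a character-aware criterion and is not the WΓ criterion.

```python
import math


def eig2(a, b, c):
    """Eigenpairs of [[a, c], [c, b]] as (energy, vector), ascending in energy."""
    mean, half = 0.5 * (a + b), 0.5 * (a - b)
    r = math.hypot(half, c)
    theta = 0.5 * math.atan2(c, half)
    lower = (mean - r, (-math.sin(theta), math.cos(theta)))
    upper = (mean + r, (math.cos(theta), math.sin(theta)))
    return [lower, upper]


def select_srs(roots, n):
    """Simple root selection: always take the n-th energy-ordered root."""
    return roots[n]


def select_by_overlap(roots, ref):
    """Hypothetical selector: take the root with largest overlap with ref."""
    return max(roots, key=lambda r: abs(r[1][0] * ref[0] + r[1][1] * ref[1]))


coupling = 0.05
ts = [i / 10 for i in range(11)]
# Target: the upper root at t = 0, dominated by the second configuration.
ref = eig2(0.0, 1.0, coupling)[1][1]

srs_weights, overlap_weights = [], []
for t in ts:
    # The two configuration energies cross at t = 0.5.
    roots = eig2(t, 1.0 - t, coupling)
    # Weight of the second configuration in the selected root.
    srs_weights.append(select_srs(roots, 1)[1][1] ** 2)
    overlap_weights.append(select_by_overlap(roots, ref)[1][1] ** 2)
```

In this model the energy-ordered second root loses almost all of its original character once the crossing is passed, while the overlap-based selection switches to the other root index and keeps the targeted character.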

Using an active space of 4 electrons in 4 orbitals (Li 1s 2s 2pz, H 1s), Figure 2 demonstrates that SRS clearly suffers from the root flipping problem, causing it to struggle with convergence, either taking a comparatively large number of iterations or failing altogether. Past work35 has shown that the WΓ method is able to overcome the root flipping problem by tracking the targeted root through the optimization, producing the smooth potential energy surface seen in the top panel of Figure 2. While the dissociation curves illustrate the agreement between WΓ and the GVP approaches across all geometries, they also highlight the improvement the GVP achieves in overall convergence, in particular the magnitude of the orbital gradients, by several orders of magnitude relative to both the SRS and WΓ results. For a geometry of 2.6 Å, Table 1 shows very similar wavefunction character between the energy stationary point the GVP finds and the more loosely converged WΓ state. Both have strong overlap with the initial CASCI root and it is clear they are both describing the desired state; one is merely more tightly converged than the other. Indeed, looking at the convergence for this geometry in the middle and bottom panels of Figure 2, the GVP achieves an energy one millihartree closer to the FCI result than the other state-specific methods in a comparable number of Hamiltonian-CI vector contractions. This is especially noteworthy given that in our LiH calculations, we used the identity as the initial Hessian guess in L-BFGS, suggesting that helpful orbital-CI coupling is indeed present in the quasi-Newton approach even without the better Hessian starting guess.

Figure 2. The top panel shows potential energy surfaces for the first excited state of LiH. The middle panel shows energy convergence at a bond length of 2.6 Å relative to the GVP’s final tightly converged energy E. The bottom panel shows, again at 2.6 Å, the convergence of the norm of the energy gradient. In the middle and bottom panels, the optimization details are labeled for each macro-iteration of the GVP approach, with ω = −7.9 Eh used at all macro-iterations at 2.6 Å. The insets to the middle and bottom panels show the 2σ and 3σ natural orbitals and corresponding occupation numbers. At each geometry, SRS and WΓ converged the orbital gradient to $10^{-4}$, while the GVP converged to $10^{-7}$.

Table 1. Wavefunction character of the first excited state A¹Σ⁺ of LiH at a bond length of 2.6 Å.

Primary Excitations   Active Space Electron Configuration   Wavefunction Weight (%)
                                                            CASCI   GVP    WΓ
2σ → 3σ               1σ² 2σ 3σ                             87      83     83
2σ² → 3σ²             1σ² 3σ²                               6       6      5
2σ² → 3σ, 4σ          1σ² 3σ 4σ                             4       6      6
Aufbau                1σ² 2σ²                               3       6      6
Overlap with CASCI root:                                    1       0.96   0.99

B. Asymmetrical O3

We turn next to asymmetrically stretched ozone, which contains two excited states that are close to energetically degenerate and so are especially challenging for state-specific optimization. Indeed, at this particular geometry ($R_{O_1O_2}$ = 1.3 Å, $R_{O_2O_3}$ = 1.8 Å, ∠O1O2O3 = 120°), the 4¹A″ and 5¹A″ states can switch order with each other and even strongly remix their primary configurations depending on the size of the active space used and whether or not the orbitals are optimized state-specifically. We employ a 9-orbital, 12-electron active space and freeze the six lower energy orbitals (which are, roughly speaking, the O 1s and 2s orbitals). With this choice, we do in fact observe a root flip: SS-CASSCF optimizations starting from the 4th and 5th ¹A″ CASCI roots find two different energy stationary points, but the stationary point found when starting from the 5th CASCI root (and which is most similar in character to the 5th CASCI root) has a lower energy than the other stationary point, as displayed in Table 2.

As seen in Figure 3, the initial CASCI states (when swapped in energy ordering) have very similar natural orbital occupation patterns as the SS-CASSCF energy stationary points, but a close inspection of the data in Table 2 suggests that the story is not entirely straightforward. Indeed, although the GVP optimization starting from the 5th CASCI root converges tightly and without incident to an energy stationary point, the final non-orthogonal-CI-style overlaps between this stationary point and the two CASCI roots (Table 2) show that a non-trivial remixing has occurred. The stationary point is still dominated by the CASCI root we started from (overlap 0.87), but contains a significant amount of the other root as well (overlap 0.41).

Figure 3. Natural orbital occupation numbers for the 4¹A″ and 5¹A″ excited states of O3, calculated from the initial CASCI roots and using the WΓ and GVP approaches. The insets show the natural orbitals of each state as calculated by the GVP.

When attempting the GVP optimization starting from the 4th CASCI root, the story is even less straightforward, with our first attempt at minimizing the GVP failing to find a stationary point at all. While this difficulty eventually revealed itself to be an example of a bad initial wavefunction guess, this was not obvious until we had later found the 5¹A″ stationary point and could verify that, indeed, the CASCI guess was pretty far from the mark. So, in addition to being an interesting root flipping case, this ozone example also presented a case where the wavefunction guess and simple energy targeting were not sufficient, and instead the ability of the GVP to incorporate other properties became important.

Table 2. Wavefunction data for the 4th and 5th ¹A″ states in O3. Note that the 5th CASCI root ultimately optimizes to become the 4¹A″ state, and so its data is presented under the 4¹A″ heading in the left column, whereas the 4th CASCI root’s data is presented on the right under the 5¹A″ heading. The GVP data are for the stationary point found when starting from the CASCI root shown under the same heading.

                                    4¹A″ Wavefunction Weight (%)   5¹A″ Wavefunction Weight (%)
Primary Excitations                 CASCI      GVP                 CASCI      GVP
9a′, 10a′ → 3a″, 11a′               67.3       65.7                5.4        19.6
2a″ → 11a′                          1.8        6.0                 41.0       40.1
9a′, 2a″ → 3a″²                     1.4        0.0                 10.0       4.0
Overlap with 4th ¹A″ CASCI root     0          0.41                1          0.66
Overlap with 5th ¹A″ CASCI root     1          0.87                0          0.68
Energy (Eh)                         -224.258   -224.313            -224.265   -224.309

One property beyond energetics that we can exploit is the fact that different Hamiltonian eigenstates should be orthogonal to each other. When using state-specific optimization and an approximate ansatz, this property will not hold exactly, but should hold approximately. To help find the 5¹A″ stationary point, we therefore append an additional component to $\vec{d}$ that (approximately) measures the overlap between the wavefunction being optimized and the converged GVP 4¹A″ state. Our expanded targeting vector in our objective function is now

$$\vec{d} = \left\{ \langle H \rangle - \omega,\; \frac{\vec{b} \cdot \vec{c}}{|\vec{c}|} \right\} \tag{14}$$

in which $\vec{c}$ is the CI vector for the wavefunction being optimized and $\vec{b}$ is the normalized CI vector for the converged 4¹A″ stationary point. The new component is only an approximation to the wavefunction overlap, of course, as it does not account for differences in the shapes of the molecular orbitals in the two wavefunctions. However, we do not need it to be exact. We only need it to be good enough to push the optimization into the basin of convergence for the 5¹A″ stationary point, so that when µ goes to zero in the final stage of GVP optimization, correct convergence is achieved.
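As a minimal sketch of the two-component targeting vector in Eq. (14), the snippet below evaluates it for short made-up CI vectors. The function names, the three-element vectors, and the energy and ω values are hypothetical and chosen only for illustration, and the assumption that the targeting vector enters the objective through its squared norm is ours.

```python
import math


def targeting_vector(energy, omega, c, b):
    """Return [<H> - omega, (b . c) / |c|] for CI vectors c and normalized b."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    approx_overlap = sum(bi * ci for bi, ci in zip(b, c)) / norm_c
    return [energy - omega, approx_overlap]


def steering_term(d):
    """Squared norm of the targeting vector (assumed form of the steering term)."""
    return sum(di * di for di in d)


# Hypothetical inputs: b is the converged state to stay orthogonal to.
b = [1.0, 0.0, 0.0]
c_orthogonal = [0.0, 2.0, 0.0]   # unnormalized, orthogonal to b
c_mixed = [1.0, 1.0, 0.0]        # contains a large component of b

d_orthogonal = targeting_vector(-224.30, -224.31, c_orthogonal, b)
d_mixed = targeting_vector(-224.30, -224.31, c_mixed, b)
```

A trial vector that overlaps with the already converged state is penalized through the second component, while one that is orthogonal to it is steered by the energy component alone. Dividing by the norm of the trial vector means the penalty does not depend on how that vector happens to be normalized during the optimization.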

Using the expanded targeting vector from Eq. (14) led to a successful GVP optimization in which we again started from the 4th ¹A″ CASCI root, but this time converged successfully to an energy stationary point for the 5¹A″ state. As seen from the overlap data in Table 2, this stationary point is essentially an equal superposition of the 4th and 5th CASCI roots, revealing that the states remix strongly during state-specific orbital relaxation and that the 4th CASCI root really was not a good initial guess. In the end, the two energy stationary points that our GVP finds are made from different mixtures of the 4th and 5th CASCI roots, although with somewhat relaxed orbitals. These stationary points are substantially different from each other but not entirely orthogonal: their exact NOCI-style overlap with each other is 0.3, which is not large but is not zero either. Thus, although the GVP was successfully able to find SS-CASSCF stationary points for both states in this difficult case, the fact that the final stationary points are not as strongly orthogonal as we might like suggests that the chosen active space could do with enlargement, or at least that a non-orthogonal CI re-diagonalization of these stationary points may be worthwhile.
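The difference between a CI-vector dot product and an exact overlap that accounts for orbital shapes can be seen in a deliberately minimal one-electron model with two orbitals. This is only an illustration under stated assumptions: real NOCI overlaps between many-electron CASSCF wavefunctions require determinant-level machinery that is not shown here, and the function names and rotation angle are hypothetical.

```python
import math


def rotate(theta, vec):
    """Apply a 2x2 rotation by theta to a two-component vector."""
    cs, sn = math.cos(theta), math.sin(theta)
    return [cs * vec[0] - sn * vec[1], sn * vec[0] + cs * vec[1]]


def ci_dot(b, c):
    """Approximate overlap: dot product of coefficient vectors only."""
    return b[0] * c[0] + b[1] * c[1]


def exact_overlap(b, c, theta):
    """Exact one-electron overlap when the two orbital sets differ.

    c holds coefficients in orbital set A; b holds coefficients in orbital
    set B, whose orbitals are set A rotated by theta. Expressing b in set A
    first accounts for the difference in orbital shapes.
    """
    b_in_a = rotate(theta, b)
    return ci_dot(b_in_a, c)


b = [1.0, 0.0]
c = [1.0, 0.0]
same_orbitals = exact_overlap(b, c, 0.0)
relaxed_orbitals = exact_overlap(b, c, math.pi / 3)
naive = ci_dot(b, c)
```

When the two orbital sets coincide, the dot product is exact. Once the orbitals of one state are rotated, identical coefficient vectors no longer imply identical states, which is why the coefficient-only measure is useful for steering but is checked afterwards with an exact overlap.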

C. MgO

As our third and final example, we use the GVP to find SS-CASSCF energy stationary points corresponding to each of the eight lowest ¹A₁ CASCI roots in MgO at a bond length of 1.8 Å and with an (8o, 8e) active space. The excited states in MgO present a challenging array of multi-reference and charge transfer character,77–79 as can be seen from an inspection of Table 3 and Figures 5 and 6. Some states exhibit both behaviors at once, such as the CT2 state, which is a doubly-excited, double-charge-transfer state in which the most prominent electron configuration accounts for less than half the wavefunction. SS-CASSCF is an especially appropriate theory in this setting, being able to deal with both the strong post-CT orbital relaxation and the multi-reference character that so often comes along with double excitations. Previous work with state-averaged CASSCF has investigated the lowest excited state in MgO,80 and in principle dynamic weighting81 may be able to help in making predictions about the others, but the mix of neutral and ionic character in these states makes standard state averaging hard to recommend, and if one wishes to take dynamic weighting to its limit, one is really asking for SS-CASSCF. However, even when SS-CASSCF is the goal, the method of optimization matters a great deal, with a previous study showing that simple root selection fails to converge to the initially targeted state in state-specific optimizations of all seven of the lowest ¹A₁ excited states.35 Using a careful analysis based on NOCI overlaps, we find that, while the WΓ optimization method is more effective, it still fails to locate an appropriate stationary point for three of these seven excited states. By adding the GVP approach to our toolbox, however, we are able to find good energy stationary points for the ground state and all seven excited states.

Before getting into the state-by-state details, let us first emphasize the value of supplying L-BFGS with our approximate diagonal form for the initial objective function Hessian as opposed to the identity matrix. For this comparison, as for all the optimizations in this section, our starting point is a particular root from a CASCI calculation carried out in the LDA orbital basis (denoted as CASCI-LDA), with the active space chosen as the lowest four LDA orbitals of σ character plus the lowest four of π character, as seen in Figure 6. These active orbitals can be roughly characterized as the O 2s and 2p and the Mg 3s, off-axis 3p, and 3dz² orbitals. The Mg 1s, 2s, and 2p and the O 1s orbitals are held closed but not frozen. As seen in Figure 4, employing our Hessian approximation speeds up the optimization convergence for the V1 state by more than an order of magnitude relative to using the identity matrix. Similar speed ups were observed for other states as well. There is still clearly room for improvement, however, and so in the future it will be interesting to investigate combinations of GVP-based L-BFGS with more standard tools like Davidson CI steps and more traditional orbital optimizations.

Table 3. Wavefunction data for ¹A₁ states in MgO, listed from top to bottom in ascending order of the CASCI-LDA energies. Labels (GS, M1, etc.) are taken from a previous study.35 The data include the CASCI-LDA dipole moments µ, wavefunction weight percentages on major components (the sum of squared determinant coefficients for all determinants of the indicated character), the exact NOCI-style overlaps between the SS-CASSCF stationary points and the initial CASCI-LDA wavefunctions, and the predicted excitation energies.

                                                   Wavefunction Weight %    NOCI Overlap    Excitation E (eV)
State   Label   µ (D)   Primary Excitations        CASCI   WΓ      GVP      WΓ     GVP      CASCI   WΓ      GVP
1¹A₁    GS      -3.95   Aufbau                     76.5    81.9    81.9     0.95   0.95     0       0       0
                        6σ² → 7σ²                  12.1    10.9    10.9
2¹A₁    M1      -5.39   6σ → 7σ                    41.8    –       56.1     –      0.80     2.48    –       3.11
                        2π → 3π                    25.2    –       1.0
                        6σ² → 7σ²                  15.1    –       38.4
3¹A₁    V1      -4.88   2π → 3π                    68.4    72.2    72.2     0.98   0.98     3.70    4.88    4.88
                        6σ, 2π → 7σ, 3π            22.3    20.0    20.0
4¹A₁    V2      -5.93   6σ → 8σ                    70.5    8.2     59.6     0.35   0.96     6.46    6.60    8.25
                        6σ² → 7σ, 8σ               14.8    7.4     15.0
                        6σ, 2π → 3π, 8σ            5.3     0.0     4.3
                        2π → 3π                    3.9     2.6     9.1
                        6σ, 2π → 7σ, 3π            2.1     0.6     5.7
                        Aufbau                     0.4     48.6    1.0
                        6σ² → 7σ²                  0.4     24.4    0.5
5¹A₁    CT1     3.84    2π² → 7σ²                  62.8    68.5    68.5     0.93   0.93     7.15    6.57    6.57
                        2π² → 3π²                  13.3    4.5     4.5
6¹A₁    CT2     3.93    2π² → 7σ²                  30.2    44.1    44.1     0.91   0.91     7.62    7.30    7.30
                        6σ² → 7σ²                  16.9    14.7    14.7
                        6σ, 2π → 7σ, 3π            13.9    8.0     8.0
7¹A₁    CT4     2.33    6σ, 2π → 7σ, 3π            47.1    70.7    50.8     0.30   0.90     8.07    11.65   8.45
                        6σ², 2π → 7σ², 3π          27.0    13.1    24.1
8¹A₁    CT3     3.66    2π → 3π                    19.0    14.8    7.9      0.91   0.88     8.16    8.39    8.54
                        6σ, 2π → 7σ, 3π            17.4    23.0    31.6
                        2π² → 7σ²                  16.6    12.2    17.3
                        6σ → 7σ                    10.0    2.5     1.6
                        6σ², 2π → 7σ², 3π          8.6     9.7     9.9
                        2π² → 3π²                  5.9     5.4     6.0
                        6σ → 8σ                    0.6     9.8     0.0

Figure 4. Convergence in terms of energy (top) and energy gradient with respect to the variational parameters (bottom) vs micro-iterations for GVP optimizations of the V1 state of MgO. Convergence when L-BFGS starts with our approximate Hessian is shown with a solid line, while convergence when the identity is used instead is shown with a dashed line. Starting points for new macro-iterations are labeled. For both optimizations, the first macro-iteration (not shown) uses the identity, µ = 0.5, and freezes the CI parameters to provide some initial orbital relaxation.

Figure 5. Natural orbital occupation numbers for the first eight ¹A₁ states in MgO, optimized starting from a CASCI-LDA guess with both the WΓ and GVP approaches. From bottom to top, the states are displayed in ascending order of the CASCI-LDA energies, although note that due to orbital relaxation, this ordering is not maintained by SS-CASSCF.

Turning now to stability, we find that, with this new GVP optimization method in hand, we can now find stationary points for all eight of the lowest ¹A₁ states, as shown in Table 3. The ground state is the simplest, and indeed all optimization methods (including GVP, WΓ, SRS, and the default PySCF ground state CASSCF solver) come to the same stationary point. The lowest excited state (M1) is a more significant case, as no previous method has to our knowledge been able to locate the full (orbital + CI) energy stationary point for this state. Despite its careful root tracking approach, WΓ collapses to the ground state when trying to target the M1 state starting from the corresponding CASCI-LDA root. In contrast, GVP has no trouble with this state, finding a stationary point that, based on its NOCI overlap with the starting CASCI-LDA root, clearly corresponds to the excited state being sought. Turning to the V1 and CT2 states, both GVP and WΓ work well, arriving at the same stationary points that, again, have large overlaps with the CASCI-LDA excited states used to initiate the optimizations and define which excited state we are after.

The V2 and CT4 states both represent failures for the WΓ approach, however, which was not obvious in the previous study35 as a natural orbital occupation analysis (Figure 5) makes it appear that the stationary points arrived at are a match for the states being sought. However, NOCI overlaps, which we have now evaluated and which are a more direct measure of wavefunction similarity, show that in both V2 and CT4, WΓ converges to a stationary point that is of a very different character than the excited state in question. GVP, on the other hand, finds stationary points for these states that have large overlaps with the starting CASCI-LDA roots and so clearly match the states being sought. In CT1, we have our one example in MgO in which the simplest use of the GVP (energy targeting only) fails to find a stationary point, the optimization getting stuck at an energy gradient norm of roughly $10^{-4}$. However, WΓ works in this case, and GVP can be improved either by expanding the vector $\vec{d}$, as we did in the upper ozone state, or by improving the initial guess, which is the approach we take here. If we supply slightly better orbitals by taking them from the output of the second macro-iteration of WΓ (but still using the CASCI-LDA CI vector guess so as not to give GVP too much help) we find that the GVP optimization is able to converge to the same stationary point as found by WΓ.

The final state we are looking at, CT3, is an even more interesting case, in which WΓ and GVP find two different stationary points, both of which have strong overlap with the sought after state. The difference between these stationary points is in the 8σ orbital, which in the GVP stationary point has O 3s character but in the WΓ stationary point has Mg 3dz² character. Given their large overlaps with the initial CASCI-LDA root and their large overlap of 0.93 with each other, they both appear to be approximations of the same Hamiltonian eigenstate and thus a good example of how nonlinear wavefunction forms can have more stationary points than there are physical eigenstates. Rather than try to choose between them, we see this as a case that indicates the active space is, at least for this state, at least one orbital too small.
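Why occupation numbers alone can be a misleading fingerprint, while overlaps are more direct, can be seen in a deliberately minimal one-electron, two-orbital model. This is an illustration under strong simplifying assumptions and not an analysis of the MgO states: the function names and the mixing angle are hypothetical.

```python
import math


def natural_occupations(c):
    """Eigenvalues of the one-body density matrix gamma_ij = c_i * c_j.

    c is a normalized one-electron state expanded in two orthonormal orbitals.
    """
    g00, g01, g11 = c[0] * c[0], c[0] * c[1], c[1] * c[1]
    mean = 0.5 * (g00 + g11)
    radius = math.hypot(0.5 * (g00 - g11), g01)
    return [mean + radius, mean - radius]


def overlap(a, b):
    """Overlap of two states expanded in the same orthonormal orbitals."""
    return a[0] * b[0] + a[1] * b[1]


angle = 0.3  # arbitrary mixing angle
state_a = [math.cos(angle), math.sin(angle)]
state_b = [-math.sin(angle), math.cos(angle)]

occ_a = natural_occupations(state_a)
occ_b = natural_occupations(state_b)
ab_overlap = overlap(state_a, state_b)
```

Both states have the same occupation pattern of one fully occupied and one empty natural orbital, yet they are orthogonal to each other. Occupation numbers discard the information about which orbitals are occupied, which an overlap retains.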

As in other types of CASSCF, multiple solutions can exist when the highest energy active orbitals are only slightly occupied and it is possible to get similarly good wavefunctions when swapping one or more of them with low-lying virtual orbitals. This issue can cause multiple nearby minima in both ground state and SA-CASSCF, although it is entirely case by case whether swaps between the least occupied active orbitals and the lowest virtual orbitals move the optimization between different local minima or simply move it around within the same basin of convergence surrounding a single minimum. Our results for CT3 provide evidence that something like the multiple-minima issue can occur for excited states in SS-CASSCF, with two very similar stationary points differing by a swap between low-lying virtuals and high-lying active orbitals.

In the case of CT3, one might prefer the 3dz² stationary point on the basis that it contains only valence orbitals in its active space, but applying such logic in general is not straightforward. Indeed, all optimization methods we have tried (including the default implementation in PySCF) agree that, after state-specific optimization, the ground state active space displayed in Figure 6 contains orbitals with O 3s, 3px, and 3py character, having swapped them in for the LDA Mg 3px, 3py, and 3dz² valence orbitals that were present in the initial guess. What is essentially going on here is that, if only a subset of the active orbitals need to have significant occupation in order to capture the strong correlation effects in a given state, then, for that state, the choice for the remaining active orbitals that will give the lowest energy is whichever ones provide the best ability to capture some weak correlation, and there is no particular reason that these will be valence orbitals. In the ground state, it makes some sense for the O 3-shell orbitals to be more effective for this purpose than the unoccupied Mg valence orbitals, as the ground state concentrates the electrons on the O atom, putting a premium on orbitals that can help describe weak correlation effects in its vicinity. Another well-known example of this issue, although not in play here, is the double d-shell effect,82,83 where it is often wise to include non-valence d orbitals in the active space for transition metal compounds ahead of some orbitals that are formally valence orbitals. As in ground states or state averaging cases with multiple minima, the best approach to removing the ambiguity between CT3’s two stationary points is probably to expand the active space. By doing so, the orbitals that are competing for inclusion in the active space and leading to multiple stationary points can all be included, at which point we expect the two stationary points would merge into one. From an optimization perspective, this would amount to the two minima on the $|\nabla_{\vec{v}} E|^2$ surface joining into a single minimum with a single basin of convergence. Certainly this must happen in the limit that the active space expands CASSCF into FCI, but we suspect that in this case it will happen immediately upon allowing both the O 3s and Mg 3dz² orbitals to be in the active space simultaneously.

Figure 6. The MgO active orbitals in the LDA guess (bottom row) and the SS-CASSCF stationary points for CT2 (middle row) and the ground state (top row). Each image has the Mg atom at left in green and text indicating the orbital’s primary character.

IV. CONCLUSION

We have shown that excited-state-specific optimization of the CASSCF ansatz via the minimization of a generalized variational principle allows the desired excited state stationary points to be located and tightly converged in multiple challenging scenarios. The GVP consists of the square norm of the energy gradient along with a steering term that allows approximately known properties of the desired state to guide the optimization to its energy stationary point. The form permits a very broad variety of properties to be employed, and in this study we have used estimates for the energy and, in one particularly challenging case, rough orthogonality against another state for this purpose. By achieving state-specific optimization, the approach avoids key difficulties of state-averaged approaches, especially in terms of orbital relaxations for states of significantly different character from the ground state, such as charge transfer and doubly excited states.

In our results, we find that the GVP approach is capable of converging to the correct stationary point in excited states of LiH, ozone, and MgO in which root flipping is present. Its tighter convergence than uncoupled two-step methods produces energies in LiH that are significantly closer to FCI, and its root-targeting capabilities allow it to match the efficacy of the recently developed WΓ method in a nearly degenerate pair of states in ozone. In MgO, it was not previously possible to find the correct stationary points for three excited singlet states in the symmetric representation of the computational point group. With the addition of the GVP approach, all three of these missing stationary points have been found.

Looking forward, there are a number of promising directions worth pursuing. First, this study limited itself to using quasi-Newton optimization of the GVP objective function, which is illuminating but almost certainly not the most efficient approach given the historical dominance of the Davidson algorithm when dealing with CI coefficients. Methods that combine the flexibility and reliability of GVP minimization with the efficiency of Krylov subspace eigensolvers are thus a priority for future method development. Second, CASSCF energetics are rarely quantitative due to a lack of treatment of weak correlation effects. With the GVP approach able to provide excited state stationary points in a wider range of cases than was previously possible, it will be interesting to perform more extensive tests on what benefits this can offer to post-CASSCF weak correlation methods. Whatever these directions uncover, it is becoming increasingly clear that it is possible and often desirable to achieve fully excited-state-specific quantum chemistry in a wide variety of single-reference and multi-reference methods.

V. ACKNOWLEDGEMENTS

This work was supported by the National Science Foundation’s CAREER program under Award Number 1848012.

Calculations were performed using the Berkeley Research Computing Savio cluster and the Lawrence Berkeley National Lab Lawrencium cluster.

VI. REFERENCES

1Polívka, T.; Sundström, V. Ultrafast dynamics of carotenoid excitedstates— from solution to natural and artificial systems. Chem. Rev. 2004,104, 2021–2072.

2Brian, D.; Liu, Z.; Dunietz, B. D.; Geva, E.; Sun, X. Three-state har-monic models for photoinduced charge transfer. J. Chem. Phys. 2021, 154,174105.

3Frank, H. A.; Bautista, J. A.; Josue, J.; Pendon, Z.; Hiller, R. G.;Sharples, F. P.; Gosztola, D.; Wasielewski, M. R. Effect of the solvent envi-ronment on the spectroscopic properties and dynamics of the lowest excitedstates of carotenoids. J. Phys. Chem. B 2000, 104, 4569–4577.

4Bandara, H. D.; Burdette, S. C. Photoisomerization in different classes ofazobenzene. Chem. Soc. Rev. 2012, 41, 1809–1825.

5Polli, D.; Altoe, P.; Weingart, O.; Spillane, K. M.; Manzoni, C.; Brida, D.;Tomasello, G.; Orlandi, G.; Kukura, P.; Mathies, R. A. Conical intersectiondynamics of the primary photoisomerization event in vision. Nature 2010,467, 440–443.

6Zimmerman, G.; Chow, L.-Y.; Paik, U.-J. The photochemical isomerizationof azobenzene1. J. Am. Chem. Soc. 1958, 80, 3528–3531.

7Harrison, J. F. Electronic structure of diatomic molecules composed of afirst-row transition metal and main-group element (H-F). Chem. Rev. 2000,100, 679–716.

8Claveau, E. E.; Miliordos, E. Electronic structure of the dicationic first rowtransition metal oxides. Phys. Chem. Chem. Phys. 2021,

9Miliordos, E.; Mavridis, A. Electronic structure and bonding of the early3d-transition metal diatomic oxides and their ions: ScO, TiO, CrO, andMnO. J. Phys. Chem. A 2010, 114, 8536–8572.

10Ruedenberg, K.; Schmidt, M. W.; Gilbert, M. M.; Elbert, S. Are atoms in-trinsic to molecular electronic wavefunctions? I. The FORS model. Chem.Phys. 1982, 71, 41–49.

11Werner, H.; Knowles, P. J. A second order multiconfiguration SCF proce-dure with optimum convergence. J. Chem. Phys. 1985, 82, 5053–5063.

12Knowles, P. J.; Werner, H.-J. An efficient second-order MC SCF methodfor long configuration expansions. Chem. Phys. Lett. 1985, 115, 259–267.

13Roos, B. O. The complete active space self-consistent field method and itsapplications in electronic structure calculations. Adv. Chem. Phys. 1987, 69,399–445.

14Werner, H.; Meyer, W. A quadratically convergent MCSCF method for thesimultaneous optimization of several states. J. Chem. Phys. 1981, 74, 5794–5801.

15Bouabça, T.; Ben Amor, N.; Maynau, D.; Caffarel, M. A study of the fixed-node error in quantum Monte Carlo calculations of electronic transitions:The case of the singlet n→ π∗ (CO) transition of the acrolein. J. Chem.Phys. 2009, 130, 114107.

16Fdez. Galván, I.; Delcey, M. G.; Pedersen, T. B.; Aquilante, F.; Lindh, R.Analytical state-average complete-active-space self-consistent field nona-diabatic coupling vectors: Implementation with density-fitted two-electronintegrals and application to conical intersections. J. Chem. Theory Comput.2016, 12, 3636–3653.

17Gozem, S.; Melaccio, F.; Valentini, A.; Filatov, M.; Huix-Rotllant, M.;Ferré, N.; Frutos, L. M.; Angeli, C.; Krylov, A. I.; Granovsky, A. A. Shapeof multireference, equation-of-motion coupled-cluster, and density func-tional theory potential energy surfaces at a conical intersection. J. Chem.Theory Comput. 2014, 10, 3074–3084.

18Granovsky, A. A. Extended multi-configuration quasi-degenerate perturba-tion theory: The new approach to multi-state multi-reference perturbationtheory. J. Chem. Phys. 2011, 134, 214113.

19Malmqvist, P.-A.; Roos, B. O. The CASSCF state interaction method.Chem. Phys. Lett. 1989, 155, 189–194.

20Serrano-Andrés, L.; Merchán, M.; Lindh, R. Computation of conical in-tersections by using perturbation techniques. J. Chem. Phys. 2005, 122,104107.

21Tran, L. N.; Neuscamman, E. Improving Excited-State Potential EnergySurfaces via Optimal Orbital Shapes. J. Phys. Chem. A 2020, 124, 8273–8279.

22Lischka, H.; Dallos, M.; Shepard, R. Analytic MRCI gradient for excitedstates: formalism and application to the n− π∗ valence and n−(3s, 3p)Rydberg states of formaldehyde. Mol. Phys. 2002, 100, 1647–1658.

23Stålring, J.; Bernhardsson, A.; Lindh, R. Analytical gradients of a state average MCSCF state and a state average diagnostic. Mol. Phys. 2001, 99, 103–114.

24Bennett, M. C. High-accuracy electronic structure calculations with QMCPACK. Nat. Rev. Phys. 2021, 1–1.

25Otis, L.; Craig, I.; Neuscamman, E. A hybrid approach to excited-state-specific variational Monte Carlo and doubly excited states. J. Chem. Phys. 2020, 153, 234105.

26Pathak, S.; Busemeyer, B.; Rodrigues, J. N.; Wagner, L. K. Excited states in variational Monte Carlo using a penalty method. J. Chem. Phys. 2021, 154, 034101.

27Ye, H.-Z.; Welborn, M.; Ricke, N. D.; Van Voorhis, T. σ-SCF: A direct energy-targeting method to mean-field excited states. J. Chem. Phys. 2017, 147, 214104.

28Ye, H.-Z.; Van Voorhis, T. Half-projected σ self-consistent field for electronic excited states. J. Chem. Theory Comput. 2019, 15, 2954–2965.

29Carter-Fenk, K.; Herbert, J. M. State-targeted energy projection: A simple and robust approach to orbital relaxation of non-aufbau self-consistent field solutions. J. Chem. Theory Comput. 2020, 16, 5067–5082.

30Hait, D.; Haugen, E. A.; Yang, Z.; Oosterbaan, K. J.; Leone, S. R.; Head-Gordon, M. Accurate prediction of core-level spectra of radicals at density functional theory cost via square gradient minimization and recoupling of mixed configurations. J. Chem. Phys. 2020, 153, 134108.

31Hait, D.; Head-Gordon, M. Highly accurate prediction of core spectra of molecules at density functional theory cost: Attaining sub-electronvolt error from a restricted open-shell Kohn–Sham approach. J. Phys. Chem. Lett. 2020, 11, 775–786.

32Garner, S. M.; Neuscamman, E. A variational Monte Carlo approach for core excitations. J. Chem. Phys. 2020, 153, 144108.

33Garner, S. M.; Neuscamman, E. Core excitations with excited state mean field and perturbation theory. J. Chem. Phys. 2020, 153, 154102.

34Clune, R.; Shea, J. A. R.; Neuscamman, E. N^5-scaling excited-state-specific perturbation theory. J. Chem. Theory Comput. 2020, 16, 6132–6141.

35Tran, L. N.; Shea, J. A. R.; Neuscamman, E. Tracking excited states in wavefunction optimization using density matrices and variational principles. J. Chem. Theory Comput. 2019, 15, 4790–4803.

36Shea, J. A. R.; Neuscamman, E. Communication: A mean field platform for excited state quantum chemistry. J. Chem. Phys. 2018, 149.

37Shea, J. A. R.; Gwin, E.; Neuscamman, E. A generalized variational principle with applications to excited state mean field theory. J. Chem. Theory Comput. 2020, 16, 1526–1540.

38Hait, D.; Head-Gordon, M. Excited State Orbital Optimization via Minimizing the Square of the Gradient: General Approach and Application to Singly and Doubly Excited States via Density Functional Theory. J. Chem. Theory Comput. 2020, 16, 1699–1710.

39Lengsfield, B. H., III. General second order MCSCF theory: A density matrix directed algorithm. J. Chem. Phys. 1980, 73, 382–390.

40Jørgensen, P.; Swanstrøm, P.; Yeager, D. L. Guaranteed convergence in ground state multiconfigurational self-consistent field calculations. J. Chem. Phys. 1983, 78, 347–356.

41Kreplin, D. A.; Knowles, P. J.; Werner, H. J. Second-order MCSCF optimization revisited. I. Improved algorithms for fast and robust second-order CASSCF convergence. J. Chem. Phys. 2019, 150.

42Kreplin, D. A.; Knowles, P. J.; Werner, H. J. MCSCF optimization revisited. II. Combined first- and second-order orbital optimization for large molecules. J. Chem. Phys. 2020, 152.

43Sun, Q. M.; Yang, J.; Chan, G. K. L. A general second order complete active space self-consistent-field solver for large-scale systems. Chem. Phys. Lett. 2017, 683, 291–299.

44Zhao, L.; Neuscamman, E. Excited state mean-field theory without automatic differentiation. J. Chem. Phys. 2020, 152, 204112.

45Aquilante, F.; Pedersen, T. B.; Lindh, R.; Roos, B. O.; Merás, A. S. d.; Koch, H. Accurate ab initio density fitting for multiconfigurational self-consistent field methods. J. Chem. Phys. 2008, 129, 024113.

46Hohenstein, E. G.; Luehr, N.; Ufimtsev, I. S.; Martínez, T. J. An atomic orbital-based formulation of the complete active space self-consistent field method on graphical processing units. J. Chem. Phys. 2015, 142, 224103.

47Roos, B. O.; Taylor, P. R.; Siegbahn, P. E. A complete active space SCF method (CASSCF) using a density matrix formulated super-CI approach. Chem. Phys. 1980, 48, 157–173.

48Roos, B. O. The complete active space SCF method in a Fock-matrix-based super-CI formulation. Int. J. Quantum Chem. 1980, 18, 175–189.

49Ruedenberg, K.; Cheung, L. M.; Elbert, S. T. MCSCF optimization through combined use of natural orbitals and the Brillouin–Levy–Berthier theorem. Int. J. Quantum Chem. 1979, 16, 1069–1101.

50Siegbahn, P. E. M.; Almlöf, J.; Heiberg, A.; Roos, B. O. The complete active space SCF (CASSCF) method in a Newton–Raphson formulation with application to the HNO molecule. J. Chem. Phys. 1981, 74, 2384–2396.

51Yeager, D. L.; Jørgensen, P. Convergency studies of second and approximate second order multiconfigurational Hartree–Fock procedures. J. Chem. Phys. 1979, 71, 755–760.

52Helgaker, T.; Jørgensen, P.; Olsen, J. Molecular Electronic Structure Theory; John Wiley and Sons, Ltd: West Sussex, U.K., 2000; pp 80–82, 600–610.

53Gavnholt, J.; Olsen, T.; Engelund, M.; Schiotz, J. Delta self-consistent field method to obtain potential energy surfaces of excited molecules on surfaces. Phys. Rev. B 2008, 78.

54Barca, G. M.; Gilbert, A. T.; Gill, P. M. Simple models for difficult electronic excitations. J. Chem. Theory Comput. 2018, 14, 1501–1509.

55Gilbert, A. T.; Besley, N. A.; Gill, P. M. Self-consistent field calculations of excited states using the maximum overlap method (MOM). J. Phys. Chem. A 2008, 112, 13164–13171.

56Messmer, R. P. On a variational method for determining excited state wavefunctions. Theor. Chim. Acta 1969, 14, 319–328.

57Choi, J. H.; Lebeda, C. F.; Messmer, R. P. Variational Principle for excited states: Exact formulation and other extensions. Chem. Phys. Lett. 1970, 5, 503–506.

58Malmqvist, P. A. Calculation of transition density matrices by nonunitary orbital transformations. Int. J. Quantum Chem. 1986, 30, 479–494.

59Thom, A. J. W.; Head-Gordon, M. Hartree–Fock solutions as a quasidiabatic basis for nonorthogonal configuration interaction. J. Chem. Phys. 2009, 131, 124113.

60Sundstrom, E. J.; Head-Gordon, M. Non-orthogonal configuration interaction for the calculation of multielectron excited states. J. Chem. Phys. 2014, 140, 114103.

61Zgid, D.; Nooijen, M. The density matrix renormalization group self-consistent field method: Orbital optimization with the density matrix renormalization group method in the active space. J. Chem. Phys. 2008, 128, 144116.

62Broyden, C. G. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 1970, 6, 76–90.

63Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322.

64Goldfarb, D. A family of variable-metric methods derived by variational means. Math. Comput. 1970, 24, 23–26.

65Shanno, D. F. Conditioning of quasi-Newton methods for function minimization. Math. Comput. 1970, 24, 647–656.

66Goetz, B. V. D. Improving Wavefunction Efficiency by Tessellating Correlation Factors and Coupled State-Specific Optimization. Ph.D. thesis, University of California, Berkeley, 2021.

67Shavitt, I.; Bender, C.; Pipano, A.; Hosteny, R. The iterative calculation of several of the lowest or highest eigenvalues and corresponding eigenvectors of very large symmetric matrices. J. Comput. Phys. 1973, 11, 90–108.

68Sun, Q.; Berkelbach, T. C.; Blunt, N. S.; Booth, G. H.; Guo, S.; Li, Z.; Liu, J.; McClain, J. D.; Sayfutyarova, E. R.; Sharma, S.; Wouters, S.; Chan, G. K. PySCF: the Python-based simulations of chemistry framework. 2017; https://onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1340.

69https://www.tensorflow.org/.

70Dunning Jr, T. H. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 1989, 90, 1007–1023.

71Hehre, W. J.; Ditchfield, R.; Pople, J. A. Self-consistent molecular orbital methods. XII. Further extensions of Gaussian-type basis sets for use in molecular orbital studies of organic molecules. J. Chem. Phys. 1972, 56, 2257–2261.

72Allouche, A.-R. Gabedit: A graphical user interface for computational chemistry softwares. J. Comput. Chem. 2011, 32, 174–182.

73Schaftenaar, G.; Vlieg, E.; Vriend, G. Molden 2.0: quantum chemistry meets proteins. J. Comput.-Aided Mol. Des. 2017, 31, 789–800.

74Docken, K. K.; Hinze, J. LiH Potential Curves and Wavefunctions for X¹Σ⁺, A¹Σ⁺, B¹Π, ³Σ⁺, and ³Π. J. Chem. Phys. 1972, 57, 4928–4936.

75Pastorczak, E.; Gidopoulos, N. I.; Pernal, K. Calculation of electronic excited states of molecules using the Helmholtz free-energy minimum principle. Phys. Rev. A 2013, 87, 062501.

76Jensen, H. J. A.; Jørgensen, P.; Ågren, H. Efficient optimization of large scale MCSCF wave functions with a restricted step algorithm. J. Chem. Phys. 1987, 87, 451–466.

77Maatouk, A.; Ben Houria, A.; Yazidi, O.; Jaidane, N.; Hochlaf, M. Electronic states of MgO: Spectroscopy, predissociation, and cold atomic Mg and O production. J. Chem. Phys. 2010, 133, 144302.

78Kim, J. H.; Li, X.; Wang, L.-S.; de Clercq, H. L.; Fancher, C. A.; Thomas, O. C.; Bowen, K. H. Vibrationally resolved photoelectron spectroscopy of MgO⁻ and ZnO⁻ and the low-lying electronic states of MgO, MgO⁻, and ZnO. J. Phys. Chem. A 2001, 105, 5709–5718.

79Thümmel, H.; Klotz, R.; Peyerimhoff, S. D. The electronic structure of the MgO molecule in ground and excited states. Chem. Phys. 1989, 129, 417–430.

80Diffenderfer, R. N.; Yarkony, D. R. Use of the state-averaged MCSCF procedure: application to radiative transitions in magnesium oxide. J. Phys. Chem. 1982, 86, 5098–5105.

81Deskevich, M. P.; Nesbitt, D. J.; Werner, H.-J. Dynamically weighted multiconfiguration self-consistent field: Multistate calculations for F + H₂O → HF + OH reaction paths. J. Chem. Phys. 2004, 120, 7281–7289.

82Andersson, K.; Roos, B. O. Excitation energies in the nickel atom studied with the complete active space SCF method and second-order perturbation theory. Chem. Phys. Lett. 1992, 191, 507–514.

83Malmqvist, P. Å.; Pierloot, K.; Shahi, A. R. M.; Cramer, C. J.; Gagliardi, L. The restricted active space followed by second-order perturbation theory method: Theory and application to the study of CuO₂ and Cu₂O₂ systems. J. Chem. Phys. 2008, 128, 204109.