
Eur. Phys. J. B (2014) 87: 152. DOI: 10.1140/epjb/e2014-50070-0

Colloquium

THE EUROPEAN PHYSICAL JOURNAL B

Next generation interatomic potentials for condensed systems

Christopher Michael Handley and Jörg Behler^a

Lehrstuhl für Theoretische Chemie, Ruhr-Universität Bochum, 44780 Bochum, Germany

Received 30 January 2014 / Published online 7 July 2014. © EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2014

Abstract. The computer simulation of condensed systems is a challenging task. While electronic structure methods like density-functional theory (DFT) usually provide a good compromise between accuracy and efficiency, they are computationally very demanding and thus applicable only to systems containing up to a few hundred atoms. Unfortunately, many interesting problems require simulations to be performed on much larger systems involving thousands of atoms or more. Consequently, more efficient methods are urgently needed, and a lot of effort has been spent on the development of a large variety of potentials enabling simulations with significantly extended time and length scales. Most commonly, these potentials are based on physically motivated functional forms and thus perform very well for the applications they have been designed for. On the other hand, they are often highly system-specific and thus cannot easily be transferred from one system to another. Moreover, their numerical accuracy is restricted by the intrinsic limitations of the imposed functional forms. In recent years, several novel types of potentials have emerged, which are not based on physical considerations. Instead, they aim to reproduce a set of reference electronic structure data as accurately as possible by using very general and flexible functional forms. In this review we survey a number of these methods. While they differ in the choice of the employed mathematical functions, they all have in common that they provide high-quality potential-energy surfaces, while their efficiency is comparable to conventional empirical potentials. It has been demonstrated that in many cases these potentials now offer a very interesting new approach to study complex systems with hitherto unreached accuracy.

1 Introduction

For the computer simulation of condensed phase systems, there are two fundamental approaches to determine the required energies and forces. The first is the use of electronic structure techniques. With increasing computer resources, the applicability of these methods has been extended continuously in recent years, and many different methods are now available. In particular, density-functional theory (DFT) [1] has become a standard method in ab initio molecular dynamics (MD) simulations [2,3] of condensed systems, with many successful applications. Still, while accurate, these simulations require large amounts of CPU time, and in many cases detailed simulations are still unaffordable. The computational costs of electronic structure calculations can be notably reduced by introducing additional approximations. An example is the family of tight-binding methods [4–7], which allows for the study of much larger systems. Further coarse-graining steps building on tight binding can be made, resulting, for instance, in the bond order potentials developed by Hammerschmidt et al. [8].

The other option is to use atomistic potentials, in which the electronic degrees of freedom are not taken into account explicitly. Instead, a direct functional relation between the atomic configuration and the potential-energy

^a e-mail: joerg.behler@theochem.ruhr-uni-bochum.de

is derived, which is typically based on physical considerations and simplifications. The validity of this procedure can be deduced from quantum mechanics. Employing the Born-Oppenheimer approximation, the Hamilton operator of the electronic Schrödinger equation is fully defined by specifying the atomic positions, the chemical elements, and the total charge of the system, which is typically neutral. Since the sum of the electrostatic repulsion energy between the nuclei and of the eigenvalue of the electronic Hamilton operator yields the potential-energy of a given atomic configuration, a direct functional relation between the structure and its energy must exist, which corresponds to the potential-energy surface (PES). If the PES were known analytically, very efficient simulations could be carried out by avoiding demanding electronic structure calculations. Unfortunately, the development of atomistic potentials, or equivalently of PESs, is a very challenging task, since their multidimensional topology is governed by the full complexity of the underlying quantum mechanical many-electron problem, which is very difficult to describe by closed analytic forms in all but the most simple systems.

PESs for large-scale simulations can be constructed in many different ways. The typical approach in classical force field methods [9,10], which are frequently used in MD simulations of biochemical systems, is to construct the PES as a sum of simple low-dimensional terms


representing covalent bonds, angles and dihedral angles, and to consider non-bonded electrostatic and dispersion interactions. By using a pre-defined bonding pattern and atom types based on functional groups, most classical force fields are unable to describe chemical reactions or significant atomic rearrangements. Consequently, they cannot be used to study many problems in materials science involving, e.g., metals and semiconductors, or to address questions related to the making and breaking of bonds. Only a few methods employing the ideas of classical force fields, like the reactive force field ReaxFF [11] or empirical valence bond (EVB) potentials [12], are able to overcome these limitations to some extent.

For theoretical studies in the field of materials science, many of the simplifications used in classical force fields have to be abandoned. Instead, atomistic potentials must be able to describe a close to continuous range of atomic configurations and chemical environments very accurately. Due to the large diversity of relevant structures, from different crystal structures via all kinds of local and extended defects to amorphous solids and glasses, the distinction between bonded and non-bonded atoms is usually not possible, and suitable potentials must employ functions based on the atomic positions only.

The selection of an appropriate functional form is only the start of the creation of a PES. Typically, a number of parameters needs to be optimized in order to obtain an accurate representation of the atomic interactions. These parameters are determined by minimizing the errors of the energy, the forces and, in a few cases, the Hessian for a number of example configurations for which reference electronic structure calculations have been carried out. Alternatively, some potentials also make use of available experimental data, and a combination of experimental and theoretical data is possible as well. The identification of the optimum set of parameter values can be a laborious task, since the final potential must yield accurate results for a substantial number of different physical properties like binding energies, equilibrium geometries, elastic constants and vibrational (or phonon) frequencies.

Countless potentials have been published to date for all imaginable systems and applications. The Tersoff potential is an example of an empirical bond order potential [13,14], which has been frequently used for the investigation of semiconductors. Brenner presented an extension of the method to allow for the simulation of hydrocarbons, the so-called REBO (reactive bond order) potential [15]. Here the bond order is chosen from a selection of values, where the appropriate choice is driven by the nature of the local chemical environment of the bond. This extension of the work by Tersoff also included the addition of new interaction terms that were based upon the degree of conjugation of the bond.

Another method, which is very popular for studying metals and alloys, is the embedded atom method (EAM) by Pijper et al. [16] and Daw et al. [17]. EAM in its most basic form consists of two terms, a repulsive part and an attractive part. The repulsive term is a pairwise potential that depends only on the interatomic distance between two atoms. The second term, the attractive part, is the embedding potential. It determines the energy change when bringing a free atom into a position within the material, and thus into the electron density of all other atoms. The energy of an atom at a given place must then be determined using the embedding function, which references the electron density at that point, determined from a superposition of atomic densities of the other atoms. An extension, which also takes angular variations of the electron density into account, is the modified embedded atom method (MEAM) [18], which improves the accuracy of the approach significantly.

Related to explicit electronic structure methods is the family of tight binding approaches [4,5,7]. Tight binding assumes that the electrons of the atoms are "tightly bound", hence the name, and that the atoms and their associated electrons mainly interact with their close neighbors. Because of this assumption, atomic-like orbitals form a natural basis set for the interacting atoms, and a linear combination of these orbitals is used to construct the many-body Hamiltonian. Rather than computing the integrals describing the interaction between these orbitals, the Hamiltonian is instead constructed from parametrized functions, where each function is determined for each atom pair combination and found by fitting to quantum mechanical calculations. These functions often depend only on the parameters, the distance between atoms, and the direction cosines which describe the relative orientation and alignment of these orbitals [19–21]. Solving the generalized eigenvalue equation for these matrices then returns the eigenvalues and eigenvectors, providing further electronic properties of the system such as the band structure or the crystal field splitting of d-orbitals. The Slater-Koster integrals of tight binding [19] do have a parallel for molecular systems, Extended Hückel Theory [22]. Further molecular extensions are the Angular Overlap Model and Ligand Field Molecular Mechanics, enabling the description of d-orbitals within classical force fields [23,24].

Countless other potentials have been published in recent decades for performing computer simulations of materials, and a comprehensive overview is beyond the scope of this review. All these approaches have in common that they are typically based on some physical approximations to reduce the complexity of the potentials to a tractable level. In most cases the resulting atomistic potentials are thus very efficient, and they perform reasonably well for the intended applications, but it is very challenging to obtain high-quality energies and forces close to first-principles data. The limited numerical accuracy has its origin in the functional forms, which are often less flexible and enforce certain properties of the potential. They usually contain only a few fitting parameters, and consequently they are too inflexible to capture all the fine details of complex energy landscapes.

In this review we survey a variety of methods for constructing PESs which aim to overcome the limitations of conventional potentials by choosing very generic functional forms that are not based on physical considerations or approximations. Instead, the goal of these "mathematical potentials" is to reproduce a given set of reference data from electronic structure calculations as accurately as possible. Due to the purely mathematical nature of these potentials, they are able to describe all types of bonding, from dispersion interactions via covalent bonds to metallic systems, with similar accuracy employing the same functional form. Many of these methods, which have emerged only in recent years in the literature in the context of PESs, are often also summarized under the keyword "machine learning" potentials. They can be constructed using a variety of different functional units like Gaussians, Taylor series expansions, sigmoid-like functions and many others, and often a superposition or some other form of combination of a large number of these functions is used to express complicated multidimensional PESs. Atomistic potentials of this type are still not widely distributed, but their development has seen enormous progress in recent years, and they have now reached a level of maturity making them interesting candidates for solving problems in chemistry and condensed matter physics.

When constructing accurate atomistic potentials, a number of substantial challenges have to be met:

1. Accuracy: The energy and forces provided by the potential should be as close as possible to the data of reference electronic structure calculations. A mandatory condition to reach this goal is that the potential is sufficiently high-dimensional. Only then will it be able to capture the effects of polarization and charge transfer; the consideration of many-body effects is also most important for describing metals. Further, the potential should be transferable, since a potential is only useful if it is applicable to structures which have not been used for its construction.

2. Efficiency: Once constructed, the potential should be as fast as possible to evaluate, to enable the extension of the time and length scales of atomistic simulations clearly beyond the realm of electronic structure methods.

3. Generality: The potential should have a universal and unbiased functional form, i.e. it should describe all types of atomic interactions with the same accuracy. Further, the functional form should enable the calculation of analytic derivatives to determine the forces needed in molecular dynamics simulations.

4. Reactivity: The potential should not require the specification of fixed bonding patterns or a further discrimination of different atom types for a given element. Instead, it should be able to describe the making and breaking of bonds and arbitrary structural changes, which is equivalent to depending solely on the positions and nuclear charges of the atoms.

5. Automation: The parametrization of the potential for a specific system should require a minimum of human effort.

6. Costs: The amount of demanding electronic structure data required to construct the potential should be as small as possible.

Apart from criterion three, all these points are equally important for "physical" and "mathematical" potentials, and meeting all these requirements to full satisfaction is very difficult for any type of potential currently available.

Some further aspects of the construction of potentials need to be discussed here, which are specific to purely mathematical potentials, while they usually do not represent serious problems for conventional potentials. They concern some invariances of the PES that need to be incorporated in a proper way. First, the potential-energy of a system must be invariant with respect to rotation and translation of the system, since only the relative atomic positions are important for the energy and forces. Further, the energy of a system must not change upon interchange of the positions of any two atoms of the same chemical element. Including this permutation invariance is straightforward, e.g., in classical force fields, which express the energy as a sum of many individual low-dimensional terms, or in electronic structure methods, which incorporate this invariance by the diagonalization of the Hamiltonian. In the case of high-order many-body atomistic potentials, which are required to reach an accuracy comparable to that of first-principles methods, including this permutation invariance is one of the main conceptual challenges. In the following sections, some methods which have evolved in recent years as general-purpose potentials for a wide range of systems are highlighted, and their properties with respect to the criteria mentioned above will be discussed.

The ultimate goal of such atomistic potentials is the simulation of large periodic and non-periodic systems, ranging from large molecules, bulk liquids and solids to extended interfaces, for which electronic structure methods are computationally too demanding to use or are not even tractable.

2 General-purpose atomistic potentials

2.1 Polynomial fitting

In 2003, Brown et al. introduced a method employing polynomials as basis functions to construct PESs of molecular systems [25]. The structural description of the system starts with a full set of interatomic distances, like R_12, R_13, R_14, R_23, R_24, R_34 for the example of a tetra-atomic molecule A_4. The R_ij are then transformed to a set of Morse variables defined as

Y_ij = exp(−R_ij/γ),   (1)

with γ being a constant between 1.5 and 3 bohr. Since distances as well as Morse variables represent internal coordinates, whose values do not change with rotation and translation of the molecule, this description ensures translational and rotational invariance of the potential. A full distance matrix is, however, not invariant with respect to a permutation of like atoms. Therefore a straightforward


construction of the PES as a linear combination of polynomials [26],

E(Y_12, Y_13, Y_14, Y_23, Y_24, Y_34) = Σ_{a+b+c+d+e+f=0}^{n_max} C_{abcdef} [Y_12^a Y_13^b Y_14^c Y_23^d Y_24^e Y_34^f],   (2)

with coefficients C_{abcdef}, does not exhibit this invariance. Here a, b, c, d, e and f are non-negative integers with a sum between zero and the maximum order n_max of the polynomials to be considered. Typically, the largest exponents used are five or six.

In order to obtain a permutationally invariant basis set, the monomials are symmetrized employing a symmetry operator S,

E(Y_12, Y_13, Y_14, Y_23, Y_24, Y_34) = Σ_{a+b+c+d+e+f=0}^{n_max} D_{abcdef} S[Y_12^a Y_13^b Y_14^c Y_23^d Y_24^e Y_34^f],   (3)

which yields the same basis functions irrespective of the order of chemically equivalent atoms. All coefficients D_{abcdef} of the symmetrized permutationally invariant polynomials are fitted simultaneously, using least-squares fitting procedures, to a database of high-level ab initio calculations, which typically contains several tens of thousands of data points. The specific form of the symmetrized terms, whose number can be substantial, depends on the chemical composition of the molecule, and to date the method has been applied to systems containing up to 10 atoms. More details about the symmetrization procedure, for which efficient methods are available [26], and the resulting terms can be found elsewhere [27].
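As an illustration of equations (1)-(3), the following minimal sketch (in Python; all function names are ours and not taken from the original work) evaluates one symmetrized basis function for an A_4 molecule by averaging a monomial in the Morse variables over all 4! permutations of the like atoms:

```python
import itertools
import numpy as np

GAMMA = 2.0  # Morse range parameter in bohr; typically between 1.5 and 3

# pair ordering of the six distances: (R12, R13, R14, R23, R24, R34)
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def morse_variables(R):
    """Transform the six interatomic distances into Morse variables, Eq. (1)."""
    return np.exp(-np.asarray(R) / GAMMA)

def induced_pair_permutation(atom_perm):
    """Map a permutation of the four atoms onto the six pair indices."""
    return [PAIRS.index(tuple(sorted((atom_perm[i], atom_perm[j]))))
            for (i, j) in PAIRS]

def symmetrized_monomial(Y, exponents):
    """Symmetry operator S of Eq. (3): average the monomial
    Y12^a Y13^b ... Y34^f over all permutations of the like atoms."""
    perms = list(itertools.permutations(range(4)))
    e = np.asarray(exponents)
    return sum(np.prod(Y[induced_pair_permutation(p)] ** e)
               for p in perms) / len(perms)
```

The PES of equation (3) is then a linear combination of such symmetrized basis values, with the coefficients D_{abcdef} obtained by a least-squares fit to the reference energies.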

Although the complexity of the method currently requires a truncation of the polynomials at comparably low order and restricts the formal dimensionality to that of small molecules, the resulting potentials can still be applied to large systems like bulk water using two- and three-body terms only [28]. This is possible since in molecular systems high-order many-body interactions are usually not essential. Further examples of successful applications of the method are various charged and neutral water clusters [26,29,30], the vinyl radical [31] and many others. Thus, while the construction of PESs using polynomials is a powerful technique for the investigation of dynamics, reactions, dissociation and conformational changes for small molecules or systems composed of these, there has been no work towards implementing the technique with respect to the creation of accurate potentials that can be used to model systems like metals and semiconductors.

2.2 Gaussian process regression

Gaussian process regression (GPR) [32], also known as Kriging, is a method that models observables as a realization of an underlying statistical probabilistic model, the probability density function. The basic concept is that training examples with correlated observables, i.e. energies, also have a close correlation in their variables, i.e. the coordinates describing the atomic configuration. The assumption then is that the function which models the non-linear relationship between the variables and the observables, the PES, is smoothly varying. The underlying statistical model, the probability density function, determines the probabilities that for a given set of inputs a particular output is observed.

Since the output is known for a number of configurations from reference electronic structure calculations, the reverse question can be addressed: can the probability density function be found that best models the PES for a given number of structures and their associated energies, and that takes into account the reliability of predictions in the regions near to the known reference points?

All reference data for the GPR model come from a deterministic function which operates on a set of input coordinates, where reference point i has coordinates x^i = (x^i_1, ..., x^i_{N_dim}) and N_dim is the number of dimensions, or degrees of freedom, of the system:

E(x^i) = z(x^i) + Σ_h β_h f_h(x^i).   (4)

The f_h are some linear or non-linear functions of x, and the β_h are coefficients, where h runs over all functions that define the PES. It then follows that, because the reference data come from a deterministic source, the "errors" are not random but repeatable for each configuration. These errors can then be described as a further energy function whose values are smoothly varying and correlated. The error within the reference data, z(x^i), can be modelled by Gaussian functions, and sampled points that are similar in the input space have similar errors. Thus the energy for a data point is the sum of two varying functions, the error noise and the deterministic source function. Consequently, the energy prediction can be reformulated as

E(x^i) = z(x^i) + μ.   (5)

The error is the deviation z(x^i) of the function value from the global average μ, which is provided by the deterministic source and also includes the error due to noise in the sampled data. This is now expressed by the GPR. The required correlation is constructed from Gaussians, one for each of the known reference structures. The energy prediction then relies on these Gaussian functions, where the contribution of each function depends upon the correlation between the reference configurations and the trial configuration. Given that two similar input vectors x^i and x^j have comparable errors, this similarity can be represented by a correlation matrix C with elements

C_ij = Corr(z(x^i), z(x^j)) = exp(−Σ_{h=1}^{N_dim} θ_h |x^i_h − x^j_h|^{p_h}).   (6)

θ_h and p_h are two parameters of the model, with θ_h ≥ 0 and 1 ≤ p_h ≤ 2. Fitting the model then relies on (a) the determination of the variance σ² of the Gaussians and (b) the fitting of the parameter vectors θ and p. σ², θ and p are found by the maximization of the likelihood function L, with these parameters modulating how correlated the Gaussian process values are over the coordinate space. The likelihood function describes the probability that the parameters σ², θ and p determine the PES, and with the vector ε containing all N_ref reference energies, ε = [E(x^i), i = 1, 2, ..., N_ref]^T, we have

L(θ, p, σ²) = 1 / [(2π)^{N_ref/2} (σ²)^{N_ref/2} |C|^{1/2}] × exp[−(ε − Iμ)^T C^{−1} (ε − Iμ) / (2σ²)].   (7)

I is a column vector of ones, and σ² can be expressed with respect to θ and p, given ∂log L/∂(σ²) = 0. Thus the likelihood function is maximized with respect to θ and p, and typically it is the logarithm of the likelihood function that is maximized. In most applications the elements of p are set to 2, further simplifying the fitting of the likelihood parameters.

Fig. 1. A one-dimensional interpolation by a Gaussian process regression (GPR). The grey region represents the uncertainty of the model for a given input x, which is small near or at known reference points (black dots). The solid curve is the GPR interpolation of the PES. The likelihood is low in the grey regions with large widths, and so the sampling of new training points is directed towards these configurations.

New reference structures are added to the data set by determining the likelihood function for unseen examples. Those with the worst likelihood are then used as new reference points. The GPR is iteratively improved by adding more examples to the reference set, with the aim of improving the likelihood function for all reference and unseen structures. Ideally, a model should yield high likelihood function values for inputs within the data set while being constructed using the smallest number of Gaussians, in order to ensure simplicity and optimum generalization properties when applied to new configurations. This is particularly useful if the amount of data is limited, such as when high-level electronic structure methods are used to generate the reference data (Fig. 1).
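The mean prediction underlying equations (5) and (6) can be sketched in a few lines. The following hypothetical NumPy implementation (ours, not code from the cited works) assumes fixed hyperparameters θ and p_h = 2, whereas a full implementation would determine them by maximizing the likelihood of equation (7):

```python
import numpy as np

def correlation_matrix(X, theta):
    """C_ij = exp(-sum_h theta_h |x_h^i - x_h^j|^2), Eq. (6) with p_h = 2."""
    diff = X[:, None, :] - X[None, :, :]              # shape (N, N, Ndim)
    return np.exp(-np.einsum('ijh,h->ij', diff**2, theta))

def gpr_fit(X, E, theta, jitter=1e-10):
    """Precompute the weights alpha = C^-1 (E - mu) for the mean prediction."""
    C = correlation_matrix(X, theta) + jitter * np.eye(len(X))
    mu = E.mean()                     # crude estimate of the global average mu
    return mu, np.linalg.solve(C, E - mu)

def gpr_predict(x, X, mu, alpha, theta):
    """Mean prediction E(x) = mu + sum_i alpha_i Corr(z(x), z(x^i))."""
    c = np.exp(-np.sum(theta * (X - x)**2, axis=1))
    return mu + c @ alpha
```

Here μ is crudely estimated by the sample mean; the predictive uncertainty shown as the grey band in Figure 1 can be obtained from the same correlation matrix.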

GPRs have already been shown to be applicable to similar types of problems as neural networks, and specifically in the work of the Popelier group GPRs have been compared to neural networks for modelling environment-dependent atom-centered electrostatic multipole moments [33]. It has been found that in this application GPRs can even be superior to neural networks, with almost a 50% decrease in the electrostatic interaction energy error for the pentamer water cluster [33]. In the following years, the use of GPRs in the Popelier group has been expanded, and they have been used for the prediction of the electrostatic multipole moments of ethanol [34] and alanine [35], as well as for the simulation of hydrated sodium ions [36] and the prediction of hydrogen-bonded complexes [37].

Bartók et al. have used GPRs to construct PESs of a variety of condensed systems [38]. In this very accurate approach, the total energy is expressed as a sum of environment-dependent atomic energies and provided as a superposition of Gaussians centered at known reference points. A central aspect of the method is the use of four-dimensional spherical harmonics to provide a structural characterization which is invariant with respect to rotation and translation of the system [38–40]. Their application of GPRs has been either to model the entire PES, as for carbon, silicon, germanium, iron and GaN [38], or to provide a correction to a DFT PES [40]. In the latter application, very accurate energies for water can be obtained by first calculating the energy at the DFT-BLYP level of theory [41,42]. The GPR model is then fitted in order to correct the one- and two-body interactions using more accurate data, e.g. from coupled cluster calculations. Thus a computationally more affordable method can produce high-accuracy results without the need to explicitly compute the energies at an unaffordable high level of theory.

Gaussian functions have also been employed by Rupp et al. [43,44] to fit the atomization energies of organic molecules. For the structural description of a wide range of molecules, a Coulomb matrix has been used. While this method does not yet represent a PES suitable for atomistic simulations, as only the global minimum structure of each molecule can be described and as no forces are available, this very promising technique certainly has the capabilities to become a serious alternative approach in the years to come. A first step in this direction has been taken very recently in a first study of conformational energies of the natural product Archazolid A [45].

2.3 Modified Shepard interpolation

In 1994, Ischtwan and Collins developed a technique to construct the PESs of small molecules based on the modified Shepard interpolation (MSI) method [46]. The basic idea is to express the energy of an atomic configuration x as a sum of weighted second-order Taylor expansions centered at a set of reference points known from electronic structure calculations. Typically, a set of 3N_atom − 6 inverse interatomic distances is used to describe the atomic configuration x.

Then, for each reference point i, the energy of a configuration x in its close environment is approximated by a second-order Taylor series E_i(x) as

E_i(x) = E(x^i) + (x − x^i)^T G_i + ½ (x − x^i)^T H_i (x − x^i) + ...,   (8)


where G_i is the gradient and H_i is the matrix of second derivatives, i.e. the Hessian, at the position x^i of the reference point.

Typically, many reference points are available, and the energy prediction for an unknown configuration can be improved significantly by employing a weighted sum of the Taylor expansions centered at all reference points,

E(x) = Σ_{i=1}^{N_ref} w_i(x) · E_i(x).   (9)

The normalized weight w_i of each individual Taylor expansion is given by

w_i(x) = v_i(x) / Σ_{k=1}^{N_ref} v_k(x)   (10)

and can be determined from the unnormalized weight function

v_i(x) = 1 / |x − x^i|^p,   (11)

which approaches zero for very distant reference points and infinity for x = x^i, corresponding to a normalized weight w_i(x) = 1. The power parameter p needs to be equal to or larger than the order of the Taylor expansion. The obtained PES has a number of favorable features. It depends only on interatomic distances, and therefore it is translationally and rotationally invariant. Further, the PES is also invariant with respect to a permutation of chemically equivalent atoms.
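Equations (8)-(11) translate almost directly into code. The sketch below (our own illustration with hypothetical data structures) omits the special handling needed when x coincides exactly with a reference point, where the weight of equation (11) diverges:

```python
import numpy as np

def shepard_energy(x, refs, p=4):
    """Modified Shepard interpolation, Eqs. (8)-(11). refs is a list of
    tuples (xi, Ei, Gi, Hi): position (in inverse interatomic distances),
    energy, gradient and Hessian of reference point i."""
    taylor = np.empty(len(refs))
    v = np.empty(len(refs))
    for k, (xi, Ei, Gi, Hi) in enumerate(refs):
        d = x - xi
        taylor[k] = Ei + d @ Gi + 0.5 * d @ Hi @ d   # Taylor series, Eq. (8)
        v[k] = 1.0 / np.linalg.norm(d)**p            # unnormalized weight, Eq. (11)
    w = v / v.sum()                                  # normalized weights, Eq. (10)
    return w @ taylor                                # weighted sum, Eq. (9)
```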

The accuracy of the method crucially depends on the choice of the reference points, as the approximation of a second-order Taylor expansion of the energy is only sufficiently accurate if there is a dense set of reference structures. It has been suggested by Ischtwan and Collins [47] to start with a reasonable first set of structures, which can be chosen along a reaction path, and to improve the quality of the PES by adding more structures in chemically relevant parts of the configuration space. These points are identified by running classical molecular dynamics trajectories employing preliminary potentials, which are then iteratively refined by adding more and more points from electronic structure calculations carried out at the configurations visited in these trajectories.

The modified Shepard interpolation scheme has been applied successfully to study chemical reaction dynamics for a number of systems like NH + H2 → NH2 + H [47], OH + H2 → H2O + H [48,49] and many others [50–53]. It has also been adapted to describe molecule-surface interactions, using the dissociation of H2 at the Pt(111) surface as an example, by Crespos et al. [54,55]. To keep the dimensionality, and thus the complexity, of the system at a reasonable level, the positions of the surface atoms have been frozen, resulting in a six-dimensional PES. As the hydrogen atoms can move along the surface, an extension of equation (9) has been required to include the symmetry of the surface. The modified energy expression is

E(x) = Σ_{i=1}^{N_ref} Σ_{j=1}^{N_sym} w_ij(x) · E_i(x).   (12)

The second summation over all symmetry elements N_sym ensures that the full symmetry of the surface is taken into account exactly, without the requirement to add structures which are equivalent by symmetry to the reference set.

The modified Shepard interpolation is very appealing because of its simplicity and the accuracy of the obtained PESs, but it also has a number of drawbacks. First, to date its applicability is limited to very low-dimensional systems, and once the PES has been constructed, the system size cannot be changed, because if an atom were added, the distance to the reference points in the Taylor expansions would not be defined. Moreover, for the prediction of the energy of a new configuration, the reference set needs to be known, and a summation over all Taylor expansions is required, which makes the method more demanding for large reference sets. Further, the construction of the potential requires energies, gradients and Hessians from electronic structure calculations, which makes the determination of the reference data rather costly.

Given the similarities between Shepard interpolation and polynomial fitting in terms of applicability, it is worth comparing how much data each method requires to achieve a fit for the CH5+ system. Huang et al. made use of 20 728 training energies to fit a polynomial surface [30]. Wu et al., on the other hand, have been able to achieve a similar fit using just 50 training examples employing MSI [50,51]. However, these 50 training examples are not just energies but include forces and Hessians. In order to determine these Hessians, numerous energy calculations must be performed, resulting in 19 440 energy calculations, a number comparable to that of Huang et al.

2.4 Interpolating moving least squares

The standard implementation of interpolating moving least squares (IMLS) [56,57], which is a method derived from the modified Shepard interpolation, determines a configuration's energy using a number of linearly independent basis functions, which can be represented by a matrix. The energy is then given as the sum of M basis functions b_i of the atomic coordinates x, where each basis function is weighted by a coefficient a_i,

E(x) = Σ_{i=1}^{M} a_i(x) b_i(x).   (13)

The basis functions have either the form of a Taylor series [57] or of a many-body expansion similar to those used in high-dimensional model representation neural networks [58]. The coefficients are determined by minimization of the objective function

D[E(x)] = Σ_{i=1}^{N_ref} w_i(x) [Σ_{j=1}^{M} a_j(x) b_j(x^i) − E(x^i)]²,   (14)

which is the weighted sum of squared potential-energy errors, with w_i(x) being the weight defined by the proximity of the trial structure to a reference structure i. Hence the largest weight would be found if the trial configuration coincides with a configuration in the reference set. The method is easily expanded to include further fitting data, such as forces and the Hessians of the training points [58,59], which expands the design matrix, i.e. the matrix containing the simultaneous equations that must be solved in order to find the best combination of basis functions. The coefficients a_j are determined by solving

B^T W B a = B^T W V.   (15)

Here V is the vector of reference energies and a is the vector containing the coefficients. W is an N_ref × N_ref diagonal matrix containing the weights, where W_ij = w_i(x) δ_ij, and B is the N_ref × M design matrix.
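A minimal sketch of an IMLS evaluation following equations (13)-(15) is given below (our own hypothetical implementation; the Shepard-type weight function is one common choice, regularized here to avoid division by zero):

```python
import numpy as np

def imls_energy(x, X_ref, E_ref, basis, p=4, eps=1e-12):
    """Interpolating moving least squares, Eqs. (13)-(15).
    basis(x) returns the M basis function values b_1(x), ..., b_M(x)."""
    # weights peak where the trial point is close to a reference structure
    d = np.linalg.norm(X_ref - x, axis=1)
    W = np.diag(1.0 / (d**p + eps))
    # design matrix B: basis functions evaluated at all reference points
    B = np.array([basis(xi) for xi in X_ref])        # shape (Nref, M)
    # solve the weighted normal equations of Eq. (15)
    a = np.linalg.solve(B.T @ W @ B, B.T @ W @ E_ref)
    return basis(x) @ a                              # Eq. (13)
```

Note that the coefficients a depend on the trial point x through the weights, which is what distinguishes moving least squares from an ordinary global fit.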

Initial applications of the IMLS method focused on fitting available PESs for HN2 [60], HOOH [58,61] and HCN [58], before moving on to PESs computed using accurate ab initio methods, such as for HOOH, CH4 and HCN [59], CH2 and HCN [62], as well as the O + HCl system [63]. More recently, Dawes et al. have applied IMLS-fitted PESs to van der Waals systems, including the CO [64], NNO [65] and OCS [66] dimers, described as four-dimensional PESs, where data points have been obtained from CCSD(T) calculations. This method of data generation is suitable for IMLS, as accurate fits can be constructed from relatively small amounts of data. This level of accuracy is often required, e.g., for the prediction of rovibrational spectra.

The basis functions used by Dawes et al. take the form of numerous many-body terms that together form a high-dimensional model representation [67]. High-order terms are not included as basis functions, as they would be poorly determined in scenarios where the PES has been sparsely sampled.

In order to fit the HOOH PES, Dawes et al. used an iteratively sampled training set which eventually included almost 2000 reference points [58]. However, compared to Shepard interpolation models and neural networks, IMLS models are costly to evaluate. Dawes et al. showed that IMLS can be used as a procedure to create an accurate PES from a sparse set of reference data points. This IMLS PES is then employed to generate training points that are in turn used to train a Shepard interpolation model and a neural network.

Using two IMLS models, where one uses basis functions that are a degree higher in order than those used in the other model, can guide the sampling of the PES. Regions with a large error between the two models, which will occur between the current training points, will be the ideal locations to sample and in turn improve the model. In this way, the manner in which the PES is sampled is similar to GPRs, where the sampling is sparse while non-stochastic in nature [59]. The final model used by Dawes et al. in this example is a modification of IMLS, a local IMLS [68], which is similar to the Shepard interpolation scheme. A local IMLS computes energies for a configuration by using only those reference structures that are similar, rather than using all reference configurations. However, local IMLS is more general, since the basis functions can take any form, rather than just the Taylor series expansions used in Shepard interpolation. For the HOOH PES, a similar accuracy compared to previous work was achieved using 30% fewer reference points.

Fig. 2. A small two-dimensional feed-forward neural network (NN) containing a single hidden layer. The input nodes G_1 and G_2 provide the structural information to the NN, which then yields the energy E. Apart from the architecture of the NN, i.e. the number of layers and the number of nodes per layer, the NN output is determined by the numerical values of the weight parameters, which are shown as arrows indicating the flow of information through the NN. The functional form of this NN is given in equation (17).

2.5 Neural network potentials

Artificial neural networks (NNs) [69,70] were first introduced in 1943 to develop mathematical models of the signal processing in the brain [71], and after continuous methodical extensions in the following decades they have become an established research topic in mathematics and computer science, with many applications in various fields. In chemistry and physics, too, they have found wide use, e.g. for the analysis of spectra and pattern recognition in experimental data [72]. The use of NNs to construct PESs employing reference data from electronic structure calculations was suggested by Blank et al. [73] almost twenty years ago, and in the meantime neural network potentials (NNPs) have been reported for a number of molecules [74–76] and molecules interacting with surfaces [77–79]. A comprehensive overview of the systems that have been addressed by NNPs can be found in several recent reviews [80–82].

The basic component of almost any NNP is a feed-forward NN, and a small example corresponding to a two-dimensional PES is shown in Figure 2. It consists of artificial neurons, or nodes, which are arranged in layers. The neurons in the input layer represent the coordinates G_i defining the atomic configuration; the neuron in the output layer yields the energy of this configuration. In between the input and the output layer there are one or more so-called hidden layers, each of which contains a number of additional neurons. The purpose of the hidden layers, which do not have a physical meaning, is to define the functional form of the NN assigning the energy to the structure. The more hidden layers and the more nodes per hidden layer, the higher is the flexibility of the NN. Haykin has stated that, in the case of a NN architecture consisting of two hidden layers, the first hidden layer captures the local features of the function being modelled, while the second hidden layer captures the global features of the function [83]. All nodes in all layers are connected to the nodes in the adjacent layers by so-called weight parameters, which are the fitting parameters of the NN. In Figure 2 they are shown as arrows indicating the flow of information through the NN. The value y_i^j of a node i in layer j is then calculated as a linear combination of the values of the nodes in the previous layer, using the weight parameters as coefficients. The linear combination is then shifted by a bias weight b_i^j acting as an adjustable offset. Finally, a non-linear function, the activation function of the NN, is applied, which provides the capability of the NN to fit any real-valued function to arbitrary accuracy. This fundamental property of NNs has been proven by several groups independently [84,85] and is the theoretical basis for the applicability of NNs to construct atomistic potentials. Accordingly, y_i^j is obtained as

y_i^j = f(b_i^j + Σ_{k=1}^{N_{j−1}} a_{ki}^{j−1,j} · y_k^{j−1}),   (16)

where a_{ki}^{j−1,j} is the weight parameter connecting node k in layer j−1, containing N_{j−1} neurons, to node i in layer j. In the special case of the input layer, with superscript 0, the input coordinates are represented by a vector G = {G_i} = {y_i^0}. The complete total-energy expression of the example NN shown in Figure 2 is then given by

E = f(b_1^2 + Σ_{j=1}^{3} a_{j1}^{12} · f(b_j^1 + Σ_{i=1}^{2} a_{ij}^{01} · G_i)).   (17)
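The functional form of equation (17) is compact enough to write out explicitly. The following sketch (our own illustration with arbitrary weights; a real NNP would fit them to reference energies) evaluates the 2-3-1 network of Figure 2 using hyperbolic tangent activation functions:

```python
import numpy as np

def nn_energy(G, a01, b1, a12, b2):
    """Evaluate the feed-forward NN of Figure 2, Eq. (17).
    G: two input coordinates; a01 (2x3) and a12 (3,) are the weights,
    b1 (3,) and b2 (scalar) the bias weights; f = tanh is the activation."""
    f = np.tanh
    y1 = f(b1 + G @ a01)          # hidden-layer node values, Eq. (16)
    return f(b2 + y1 @ a12)       # output node, Eq. (17)

rng = np.random.default_rng(0)    # arbitrary weights, for demonstration only
E = nn_energy(np.array([0.3, -1.2]),
              rng.normal(size=(2, 3)), rng.normal(size=3),
              rng.normal(size=3), rng.normal())
```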

The weight parameters of the NN are typically determined by minimizing the error of a training set of electronic structure data employing gradient-based optimization algorithms; in particular, the backpropagation algorithm [86] and the global extended Kalman filter [87] have been frequently used in the context of NNPs. The target quantity is usually the energy, but forces have also been used in some cases [88,89].

NNPs constructing the potential-energy using just a single feed-forward NN have been developed for a number of systems with great success, but for a long time NNPs have been restricted to low-dimensional PESs involving only a few degrees of freedom. This limitation has a number of reasons. First of all, each additional atom in the system introduces three more degrees of freedom, which need to be provided to the NN in the input layer. Increasing the number of nodes in the NN reduces the efficiency of the NN evaluation, but it also complicates the determination of the growing number of weight parameters. Further, conventional NNPs can only be applied to systems with a fixed chemical composition, because the number of input nodes of the NN cannot be changed once the weight parameters have been determined. If an atom were added, the weight parameters for the additional input information would not be available, while upon removing an atom the values of the corresponding input nodes would be ill-defined.

The most severe challenge in constructing NNPs is the choice of a suitable set of input coordinates. Cartesian coordinates cannot be used, because their numerical values have no direct physical meaning, and only relative atomic positions are important for the potential-energy. If, for instance, a molecule were translated or rotated in space, its Cartesian coordinates would change while its internal structure, and thus its energy, would still be the same. An NNP based on Cartesian coordinates, however, would predict a different energy, since its input vector has changed. Incorporating this rotational and translational invariance by a coordinate transformation onto a suitable set of functions exhibiting these properties is a significant challenge. A very simple solution appropriate for small molecules would be to use internal coordinates like interatomic distances, angles and dihedral angles, and indeed this has been done successfully for a number of molecules, but this approach is unfeasible for large systems due to the rapidly increasing number of coordinates. Further, a full set of internal coordinates contains a lot of redundant information, and also using a full distance matrix is possible only for small systems.

Another problem, that is still present even if internal coordinates are used, concerns the invariance of the total energy with respect to the interchange of atoms of the same element. If, for instance, the order of both hydrogen atoms in a free water molecule is switched, then this change is also present in the NN input vector. Since all NN input nodes are connected to the NN by numerically different weight parameters, this exchange will modify the total energy of the system, although the atomic configuration is still the same. Including this permutation symmetry of like atoms in NNPs has been a challenge since the advent of NNPs. A solution for molecular systems based on a symmetrization of internal coordinates has been suggested in a seminal paper by Gassner et al. [76], and a similar scheme has also been developed for the dissociation of molecules at surfaces [78]. Unfortunately, both methods are only applicable to very low-dimensional systems. It should be noted, however, that also low-dimensional NNPs have been applied to study large systems by representing only specific interactions by NNs. Examples are the work of Cho et al. [90], who used NNs to include polarization in the TIP4P water model [91], and the description of three-body interactions in the solvation of Al3+ ions by Gassner et al. [76].

It was soon recognized that further methodical extensions are required to construct high-dimensional NNPs capturing all relevant many-body effects in large systems. A first step in this direction has been made by Manzhos and Carrington [92,93] by decomposing the PESs into interactions of increasing order in the spirit of a many-body expansion,

E = Σ_{i=1}^{N_atom} E_i + Σ_i Σ_{j>i} E_ij + Σ_i Σ_{j>i} Σ_{k>j} E_ijk + ...,   (18)

with the E_i, E_ij, E_ijk being one-, two- and three-body terms, and so forth.
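In code, a truncated version of equation (18) is simply a sum over single atoms, pairs and triples. In the approach of Manzhos and Carrington each term would be an individual NN; the sketch below (our own, with placeholder energy functions) only shows the bookkeeping:

```python
import itertools

def many_body_energy(atoms, e1, e2, e3):
    """Many-body expansion, Eq. (18), truncated after the three-body terms.
    e1, e2, e3 return the one-, two- and three-body energy contributions."""
    E = sum(e1(i) for i in atoms)
    E += sum(e2(i, j) for i, j in itertools.combinations(atoms, 2))
    E += sum(e3(i, j, k) for i, j, k in itertools.combinations(atoms, 3))
    return E
```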

Specifically, they have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years, the efficiency could be improved by reducing the effective dimensionality employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].

First attempts to develop high-dimensional NNPs were made by Smith and coworkers as early as 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters were not determined using electronic structure reference data, and only in 2007 was the method developed further into a genuine NNP approach [99,100], using silicon as an example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach, the energy of the system is constructed as a sum of environment-dependent atomic energy contributions E_i,

E = Σ_i E_i.   (19)

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6−10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems

like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary CuZnO system [110] and water clusters [111,112].
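As an example of the structural fingerprint used in this approach, the sketch below (our own implementation) evaluates a radial symmetry function of the type described in reference [102]: a sum of Gaussians over all neighbors of atom i, damped by a cutoff function, which is invariant under rotation, translation and permutation of like atoms:

```python
import numpy as np

def cutoff(R, Rc):
    """Cosine cutoff function: goes smoothly to zero at the cutoff radius Rc."""
    return np.where(R < Rc, 0.5 * (np.cos(np.pi * R / Rc) + 1.0), 0.0)

def radial_symmetry_function(positions, i, eta, Rs, Rc=6.0):
    """Radial fingerprint of the chemical environment of atom i (in Å)."""
    Rij = np.linalg.norm(positions - positions[i], axis=1)
    Rij = np.delete(Rij, i)                  # exclude the central atom itself
    return np.sum(np.exp(-eta * (Rij - Rs)**2) * cutoff(Rij, Rc))
```

A set of such functions with different η and R_s values, together with analogous angular functions, forms the input vector of the atomic NN.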

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113–115]. For this purpose, they have developed a method to also express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116–118]. The neural network acts by taking as input the relative positions of the atoms around the atom for which the predictions are made, and then expressing the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated equally, since they all result from the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic, because of the non-spherical nature of the atomic electron densities that are described by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change as atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used, the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava [120], symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES, the energy contribution for an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function-based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations that are very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector, and these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then in theory contain an infinite number of dimensions. The classification a(x) of a trial point x is then given by the general form of a SVM:

a(x) = \sum_{i=1}^{N_{ref}} \lambda_i y_i K(x_i, x) - w_0    (20)

The sum is over the number of support vectors, which is equal to the number of reference examples. K is the kernel that transforms the input x from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where w_0 would be the bias of the hidden layer. This means that y_i is the classification of reference example x_i and λ_i is a coefficient. The elements of K are found by

K(x_i, x) = \tanh(k_0 + k_1 \langle x_i, x \rangle)    (21)

k_0 is the bias weight of the first layer of the neural network. Thus k_1⟨x_i, x⟩ is the dot product of the vectors and so is a measure of similarity. K(x_i, x) in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to k_1⟨x_i, x⟩.
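As an illustration, equations (20) and (21) can be evaluated in a few lines. The Python sketch below is not from the original work; the kernel parameters k0 and k1 and all data structures are illustrative placeholders.

    import numpy as np

    def tanh_kernel(xi, x, k0=0.0, k1=1.0):
        """Sigmoid kernel of Eq. (21); k0 and k1 are illustrative values."""
        return np.tanh(k0 + k1 * np.dot(xi, x))

    def svm_decision(x, support_vectors, y, lam, w0, kernel=tanh_kernel):
        """Decision function a(x) of Eq. (20): a weighted sum of kernel
        similarities between the trial point x and the support vectors."""
        return sum(l * yi * kernel(xi, x)
                   for l, yi, xi in zip(lam, y, support_vectors)) - w0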

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform the non-linear transformation to the kernel, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation grows in complexity quadratically as more dimensions are added, to the point where the computation will not fit within the computer memory. A solution to this problem is the so-called "kernel trick", which accounts for the non-linear transformation into a high-dimensional space but where the original dot product matrix is still used. The transformation is then applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.
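The equivalence underlying the kernel trick is easy to verify numerically. In the sketch below (a hypothetical example, not from the reviewed works), the explicit degree-2 feature map phi produces d^2 features, yet its dot product equals the squared dot product of the original d-dimensional vectors, so the transformation never has to be carried out explicitly:

    import itertools
    import numpy as np

    def phi(x):
        """Explicit degree-2 feature map: all pairwise products x_i * x_j."""
        return np.array([xi * xj for xi, xj in itertools.product(x, repeat=2)])

    x = np.array([1.0, 2.0, 3.0])
    z = np.array([0.5, -1.0, 2.0])

    explicit = np.dot(phi(x), phi(z))  # dot product in the d**2-dimensional space
    implicit = np.dot(x, z) ** 2       # kernel trick: only d-dimensional work
    assert np.isclose(explicit, implicit)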

The main tasks to be solved when using SVMs are finding the linear separator, so that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that yields the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, like in many other fitting methods, cross validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

w \cdot X - b = 0    (22)

where X is a set of reference points that define the linear separator and w is the normal vector to the linear separator. Furthermore, the margins of the separation can be defined by

w \cdot X - b = -1    (23)

and

w \cdot X - b = 1    (24)

as shown in Figure 3. The width of the separation is given by 2/||w||. The objective of a SVM is then to maximize this separation. The constraint of the linear separation can be rewritten as

y_i (w \cdot x_i - b) \ge 1    (25)

where x_i is a reference training point that must be correctly classified and y_i is the classification function, i.e. the output is either 1 or −1. This can be modified to allow for soft margins, and so allow for a degree of misclassification:

y_i (w \cdot x_i - b) \ge 1 - \xi_i    (26)


Fig. 3. In the initial two-dimensional space the data points (stars) fall into two groups – those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible. Those within the circle have classification '1' and those outside have classification '−1'. The linear separation is defined by a vector with a gradient of w and an intercept of b. The linear separation is found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

ξ_i is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function J:

\min J(w, \xi) = \frac{1}{2} w^T w + c \sum_{i=1}^{N_{ref}} \xi_i    (27)

where c is a parameter modulating the influence of the slack function.
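For readers who want to experiment, soft-margin classification of the kind sketched in Figure 3 is available in standard libraries. The snippet below uses scikit-learn (an independent implementation, not used in the works reviewed here), whose parameter C plays the role of c in equation (27); the data set is an assumed toy example:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)  # circular classes, cf. Fig. 3

    # Large C penalizes slack (misclassification) heavily; small C softens the margin.
    clf = SVC(kernel="rbf", C=10.0).fit(X, y)
    print(clf.n_support_)  # number of support vectors found per class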

For regression purposes, like the construction of PESs, the objective is the reverse problem, where the reference data points should lie as close as possible to the linear separator. This is the basis of least-squares support vector machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

\min J(w, b, e) = \frac{\nu}{2} w^T w + \frac{\zeta}{2} \sum_{i=1}^{N_{ref}} e_i^2    (28)

The slack values are now replaced by the sum of squared errors e_i of the reference points, and ν and ζ are parameters that determine the smoothness of the fit. Both SVMs and GPRs are similar in that they can be adjusted by the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of function can adjust the sensitivity of the fitting method [128].
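In the dual formulation, minimizing equation (28) reduces to solving a single linear system. The following numpy sketch uses a standard LS-SVM formulation, not code from reference [127]; the ratio gamma stands in for the regularization balance of ν and ζ, and sigma is an assumed Gaussian kernel width:

    import numpy as np

    def rbf(A, B, sigma=1.0):
        """Gaussian kernel matrix between two sets of points."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma**2))

    def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
        """Solve the linear system arising from the LS-SVM dual of Eq. (28)."""
        n = len(X)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
        sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
        return sol[0], sol[1:]  # bias b and dual coefficients alpha

    def lssvm_predict(Xnew, X, b, alpha, sigma=1.0):
        return rbf(Xnew, X, sigma) @ alpha + b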

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometries, containing carbon, hydrogen, oxygen, nitrogen and fluorine. Therefore, while not describing the PES for arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

Fig. 4. The two parent genetic programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an '×'. At the ends of the branches are the input nodes I1–I4, which provide information about the system. At each node on the way up the tree an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.


2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, the tree structure having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e. a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they provide human-readable and sometimes even physically meaningful functions, in contrast to most methods discussed above.


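A minimal sketch of such a tree representation and the branch exchange of Figure 4, assuming for brevity that crossover only happens at the children of the root, might look as follows in Python:

    import operator
    import random

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def evaluate(node, inputs):
        """A tree is either an input label 'I0'..'I3' or a tuple (op, left, right)."""
        if isinstance(node, str):                      # leaf: input node
            return inputs[int(node[1:])]
        op, left, right = node                         # internal arithmetic node
        return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

    def crossover(a, b):
        """Exchange one randomly chosen branch between two parent trees."""
        side = random.choice([1, 2])
        a, b = list(a), list(b)
        a[side], b[side] = b[side], a[side]
        return tuple(a), tuple(b)

    parent1 = ("+", ("*", "I0", "I1"), "I2")
    parent2 = ("-", "I3", ("+", "I1", "I2"))
    child1, child2 = crossover(parent1, parent2)
    print(evaluate(child1, [1.0, 2.0, 3.0, 4.0]))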

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher level of GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon fitness [133]. This allows the training procedure to retain individuals so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e. they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, like in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope to find the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.


An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings, two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations, and from the combined population remove the poorly performing bit strings. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. A sketch of these steps is given below. GAs have found wide use for many problems in chemistry and physics, such as searching conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
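The sketch below is a rough, assumption-laden illustration of these steps (population sizes and mutation rate are illustrative, and the fitness function is left to the caller):

    import random

    def crossover(p1, p2):
        """One-point interchange of two parent bit strings."""
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def mutate(bits, rate=0.01):
        """Flip each bit with a small probability."""
        return "".join("10"[int(b)] if random.random() < rate else b for b in bits)

    def next_generation(population, fitness, n_parents=10):
        """Select the fittest, breed daughters, and keep the best of both."""
        parents = sorted(population, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(children) < len(population):
            c1, c2 = crossover(*random.sample(parents, 2))
            children += [mutate(c1), mutate(c2)]
        # combine parents and daughters, discarding the poor performers
        return sorted(parents + children, key=fitness, reverse=True)[:len(population)]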

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work


of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu use a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore, there is the need to rerun the fitting procedure from different starting points, i.e. different positions on the PES fitting error surface, in order to not get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148] and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH3NO. In the work of Handley and Deeth the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g. in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g. polynomial fitting, the modified Shepard interpolation and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.


All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance, as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties rather concerned the geometrical description of the atomic configurations than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e. describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time when ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time when more program packages will become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction


of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartok, M.J. Gillan, F.R. Manby, G. Csanyi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Muller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Muller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 1986)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1998)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)


Classical force fields [9,10] typically employ simple functional forms, with harmonic terms representing covalent bonds, angles and dihedral angles, and consider non-bonded electrostatic and dispersion interactions. By using a pre-defined bonding pattern and atom types based on functional groups, most classical force fields are unable to describe chemical reactions or significant atomic rearrangements. Consequently, they cannot be used to study many problems in materials science involving, e.g., metals and semiconductors, or to address questions related to the making and breaking of bonds. Only a few methods employing the ideas of classical force fields, like the reactive force field ReaxFF [11] or empirical valence bond (EVB) potentials [12], are able to overcome these limitations to some extent.

For theoretical studies in the field of materials science, many of the simplifications used in classical force fields have to be abandoned. Instead, atomistic potentials must be able to describe a close-to-continuous range of atomic configurations and chemical environments very accurately. Due to the large diversity of relevant structures, from different crystal structures via all kinds of local and extended defects to amorphous solids and glasses, the distinction between bonded and non-bonded atoms is usually not possible, and suitable potentials must employ functions based on the atomic positions only.

The selection of an appropriate functional form is only the start of the creation of a PES. Typically, a number of parameters needs to be optimized in order to obtain an accurate representation of the atomic interactions. These parameters are determined by minimizing the errors of the energy, the forces and, in a few cases, the Hessian, for a number of example configurations for which reference electronic structure calculations have been carried out. Alternatively, some potentials also make use of available experimental data, and a combination of experimental and theoretical data is possible as well. The identification of the optimum set of parameter values can be a laborious task, since the final potential must yield accurate results for a substantial number of different physical properties like binding energies, equilibrium geometries, elastic constants and vibrational (or phonon) frequencies.

Countless potentials have been published to date for all imaginable systems and applications. The Tersoff potential is an example of an empirical bond order potential [13,14], which has been frequently used for the investigation of semiconductors. Brenner presented an extension of the method to allow for the simulation of hydrocarbons, the so-called REBO (reactive bond order) potential [15]. Here the bond order is chosen from a selection of values, where the appropriate choice is driven by the nature of the local chemical environment of the bond. This extension of the work by Tersoff also included the addition of new interaction terms that were based upon the degree of conjugation of the bond.

Another method, which is very popular for studying metals and alloys, is the embedded atom method (EAM) by Pijper et al. [16] and Daw et al. [17]. EAM in its most basic form consists of two terms, a repulsive part and an attractive part. The repulsive term is a pairwise potential that depends only on the interatomic distance between two atoms. The second term, the attractive part, is the embedding potential. It determines the energy change when bringing a free atom into a position within the material and thus into the electron density of all other atoms. The energy of an atom at a given place must then be determined using the embedding function, which references the electron density at that point, determined from a superposition of atomic densities of the other atoms. An extension which also takes angular variations of the electron density into account is the modified embedded atom method (MEAM) [18], which improves the accuracy of the approach significantly.
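In its basic form described above, the EAM energy is E = Σ_i F(ρ_i) + ½ Σ_{i≠j} φ(r_ij). The functional forms in the Python sketch below are purely illustrative stand-ins, not a fitted EAM parametrization:

    import numpy as np

    def eam_energy(positions, phi, rho, F):
        """E = sum_i F(rho_i) + 1/2 sum_{i != j} phi(r_ij)."""
        energy = 0.0
        for i, ri in enumerate(positions):
            rho_i = 0.0
            for j, rj in enumerate(positions):
                if i == j:
                    continue
                r = np.linalg.norm(ri - rj)
                rho_i += rho(r)              # superposition of atomic densities
                energy += 0.5 * phi(r)       # repulsive pairwise term
            energy += F(rho_i)               # embedding term
        return energy

    # Illustrative (unfitted) functional forms:
    phi = lambda r: np.exp(-2.0 * r) / r
    rho = lambda r: np.exp(-1.5 * r)
    F   = lambda d: -np.sqrt(d)              # square-root embedding function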

Related to explicit electronic structure methods is the family of tight binding approaches [4,5,7]. Tight binding assumes that the electrons of the atoms are "tightly bound" – hence the name – and that the atoms and their associated electrons mainly interact with their close neighbors. Because of this assumption, atomic-like orbitals form a natural basis set for the interacting atoms, and a linear combination of these orbitals is used to construct the many-body Hamiltonian. Rather than computing the integrals describing the interaction between these orbitals, the Hamiltonian is instead constructed from parametrized functions, where each function is determined for each atom pair combination and found by fitting to quantum mechanical calculations. These functions often depend only on the parameters, the distance between atoms, and the direction cosines which describe the relative orientation and alignment of these orbitals [19–21]. Solving the generalized eigenvalue equation for these matrices then returns the eigenvalues and eigenvectors, providing further electronic properties of the system, such as the band structure or the crystal field splitting of d-orbitals. The Slater-Koster integrals of tight binding [19] do have a parallel for molecular systems – Extended Hückel Theory [22]. Further molecular extensions are the Angular Overlap Model and Ligand Field Molecular Mechanics, enabling the description of d-orbitals within classical force fields [23,24].
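The essential machinery – a parametrized Hamiltonian whose eigenvalue problem replaces the explicit orbital integrals – can be illustrated for the simplest case of two atoms carrying one s orbital each (all parameter values below are invented for illustration; with an orthogonal basis the generalized eigenvalue problem reduces to an ordinary one):

    import numpy as np

    eps, t0, q, r = -5.0, -1.0, 1.0, 2.5    # on-site energy, hopping prefactor,
                                            # decay constant, interatomic distance
    t = t0 * np.exp(-q * r)                 # parametrized distance-dependent hopping

    H = np.array([[eps, t],
                  [t,  eps]])               # two-atom, one-orbital Hamiltonian

    eigvals, eigvecs = np.linalg.eigh(H)    # bonding and antibonding levels eps +/- t
    print(eigvals)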

Countless other potentials have been published in recent decades for performing computer simulations of materials, and a comprehensive overview is beyond the scope of this review. All these approaches have in common that they are typically based on some physical approximations to reduce the complexity of the potentials to a tractable level. In most cases the resulting atomistic potentials are thus very efficient, and they perform reasonably well for the intended applications, but it is very challenging to obtain high-quality energies and forces close to first-principles data. The limited numerical accuracy has its origin in the functional forms, which are often less flexible and enforce certain properties of the potential. They usually contain only a few fitting parameters, and consequently they are too inflexible to capture all the fine details of complex energy landscapes.

In this review we will survey a variety of methods for constructing PESs which aim to overcome the limitations of conventional potentials by choosing very generic functional forms that are not based on physical considerations


or approximations. Instead, the goal of these "mathematical potentials" is to reproduce a given set of reference data from electronic structure calculations as accurately as possible. Due to the purely mathematical nature of these potentials, they are able to describe all types of bonding, from dispersion interactions via covalent bonds to metallic systems, with similar accuracy, employing the same functional form. Many of these methods, which have emerged only in recent years in the literature in the context of PESs, are often also summarized under the keyword "machine learning" potentials. They can be constructed using a variety of different functional units, like Gaussians, Taylor series expansions, sigmoid-like functions and many others, and often a superposition or some other form of combination of a large number of these functions is used to express complicated multidimensional PESs. Atomistic potentials of this type are still not widely distributed, but their development has seen enormous progress in recent years, and they have now reached a level of maturity making them interesting candidates for solving problems in chemistry and condensed matter physics.

When constructing accurate atomistic potentials, a number of substantial challenges have to be met:

1. Accuracy – The energy and forces provided by the potential should be as close as possible to the data of reference electronic structure calculations. A mandatory condition to reach this goal is that the potential is sufficiently high-dimensional. Only then will it be able to capture the effects of polarization and charge transfer, and the consideration of many-body effects is also most important for describing metals. Further, the potential should be transferable, since a potential is only useful if it is applicable to structures which have not been used for its construction.

2. Efficiency – Once constructed, the potential should be as fast as possible to evaluate, to enable the extension of the time and length scales of atomistic simulations clearly beyond the realm of electronic structure methods.

3. Generality – The potential should have a universal and unbiased functional form, i.e. it should describe all types of atomic interactions with the same accuracy. Further, the functional form should enable the calculation of analytic derivatives to determine forces needed in molecular dynamics simulations.

4. Reactivity – The potential should not require the specification of fixed bonding patterns or a further discrimination of different atom types for a given element. Instead, it should be able to describe the making and breaking of bonds and arbitrary structural changes, which is equivalent to depending solely on the positions and nuclear charges of the atoms.

5. Automation – The parametrization of the potential for a specific system should require a minimum of human effort.

6. Costs – The amount of demanding electronic structure data required to construct the potential should be as small as possible.

Apart from criterion three, all these points are equally important for "physical" and "mathematical" potentials, and meeting all these requirements to full satisfaction is very difficult to achieve for any type of potential currently available.

Some further aspects of the construction of potentials need to be discussed here, which are specific to purely mathematical potentials, while they usually do not represent serious problems for conventional potentials. They concern some invariances of the PES that need to be incorporated in a proper way. First, the potential energy of a system must be invariant with respect to rotation and translation of the system, since only the relative atomic positions are important for the energy and forces. Further, the energy of a system must not change upon interchange of the positions of any two atoms of the same chemical element. Including this permutation invariance is straightforward, e.g., in classical force fields, which express the energy as a sum of many individual low-dimensional terms, or in electronic structure methods, which incorporate this invariance by the diagonalization of the Hamiltonian. In the case of high-order many-body atomistic potentials, which are required to reach an accuracy comparable to that of first-principles methods, including this permutation invariance is one of the main conceptual challenges. In the following sections, some methods which have evolved in recent years as general-purpose potentials for a wide range of systems are highlighted, and their properties with respect to the criteria mentioned above will be discussed.

The ultimate goal of such atomistic potentials is the simulation of large periodic and non-periodic systems, ranging from large molecules, bulk liquids and solids to extended interfaces, for which electronic structure methods are computationally too demanding to use or are not even tractable.

2 General-purpose atomistic potentials

2.1 Polynomial fitting

In 2003, Brown et al. introduced a method employing polynomials as basis functions to construct PESs of molecular systems [25]. The structural description of the system starts with a full set of interatomic distances, like R12, R13, R14, R23, R24, R34 for the example of a tetra-atomic molecule A4. The R_ij are then transformed to a set of Morse variables defined as

Y_{ij} = \exp(-R_{ij}/\gamma)    (1)

with γ being a constant between 1.5 and 3 Bohr. Since distances as well as Morse variables represent internal coordinates, whose values do not change with rotation and translation of the molecule, this description ensures translational and rotational invariance of the potential. A full distance matrix is, however, not invariant with respect to a permutation of like atoms. Therefore, a straightforward


construction of the PES as a linear combination of polynomials [26],

E(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}) = \sum_{a+b+c+d+e+f=0}^{n_{max}} C_{abcdef} \, [Y_{12}^a Y_{13}^b Y_{14}^c Y_{23}^d Y_{24}^e Y_{34}^f]    (2)

with coefficients C_{abcdef}, does not exhibit this invariance. Here a, b, c, d, e and f are non-negative integers with a sum between zero and the maximum order n_{max} of the polynomials to be considered. Typically, the largest exponents used are five or six.

In order to obtain a permutationally invariant basis set, the monomials are symmetrized employing a symmetry operator S:

E(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}) = \sum_{a+b+c+d+e+f=0}^{n_{max}} D_{abcdef} \, S[Y_{12}^a Y_{13}^b Y_{14}^c Y_{23}^d Y_{24}^e Y_{34}^f]    (3)

which yields the same basis functions irrespective of the order of chemically equivalent atoms. All coefficients D_{abcdef} of the symmetrized permutationally invariant polynomials are fitted simultaneously, using least-squares fitting procedures, to a database of high-level ab initio calculations, which typically contains several tens of thousands of data points. The specific form of the symmetrized terms, whose number can be substantial, depends on the chemical composition of the molecule, and to date the method has been applied to systems containing up to 10 atoms. More details about the symmetrization procedure, for which efficient methods are available [26], and the resulting terms can be found elsewhere [27]. A minimal sketch of the symmetrization is given below.
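The following Python sketch illustrates equations (1) and (3) for the A4 example, using a naive average over all 4! atom permutations rather than the efficient methods of reference [26]:

    import itertools
    import numpy as np

    PAIRS = list(itertools.combinations(range(4), 2))  # the 6 atom pairs of A4

    def morse_variables(R, gamma=2.0):
        """Y_ij = exp(-R_ij / gamma), Eq. (1); R is a 4x4 distance matrix."""
        return {p: np.exp(-R[p] / gamma) for p in PAIRS}

    def symmetrized_monomial(Y, exponents):
        """Apply S of Eq. (3): average Y12^a ... Y34^f over atom permutations."""
        perms = list(itertools.permutations(range(4)))
        total = 0.0
        for sigma in perms:
            term = 1.0
            for (i, j), e in zip(PAIRS, exponents):
                p = tuple(sorted((sigma[i], sigma[j])))  # permuted pair index
                term *= Y[p] ** e
            total += term
        return total / len(perms)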

Although the complexity of the method currently requires a truncation of the order of the polynomials at comparably low order, and restricts the formal dimensionality to that of small molecules, the resulting potentials can still be applied to large systems like bulk water using two- and three-body terms only [28]. This is possible since in molecular systems high-order many-body interactions are usually not essential. Further examples of successful applications of the method are various charged and neutral water clusters [26,29,30], the vinyl radical [31] and many others. Thus, while the construction of PESs using polynomials is a powerful technique for the investigation of dynamics, reactions, dissociation and conformational changes for small molecules or systems composed of these, there has been no work towards implementing the technique with respect to the creation of accurate potentials that can be used to model systems like metals and semiconductors.

2.2 Gaussian process regression

Gaussian process regression (GPR) [32], also known as Kriging, is a method that models observables as a realization of an underlying statistical probabilistic model – the probability density function. The basic concept is that training examples with correlated observables, i.e. energies, also have a close correlation in their variables, i.e. the coordinates describing the atomic configuration. The assumption then is that the function which models the non-linear relationship between the variables and the observables, the PES, is smoothly varying. The underlying statistical model, the probability density function, determines the probabilities that for a given set of inputs a particular output is observed.

Since the output is known for a number of configurations from reference electronic structure calculations, the reverse question can be addressed: can the probability density function be found that best models the PES for a given number of structures and their associated energies, and takes into account the reliability of predictions in the regions near to the known reference points?

All reference data for the GPR model come from a deterministic function, which operates on a set of input coordinates, where reference point i has coordinates x^i = (x^i_1, ..., x^i_{N_{dim}}) and N_{dim} is the number of dimensions or degrees of freedom of the system:

E(x^i) = z(x^i) + \sum_h \beta_h f_h(x^i)    (4)

The f_h are some linear or non-linear functions of x, and the β_h are coefficients, where h runs over all functions that define the PES. It then follows that, because the reference data come from a deterministic source, the "errors" are not random but repeatable for each configuration. These errors can then be described as a further energy function, whose values are smoothly varying and correlated. The error within the reference data, z(x^i), can be modelled by Gaussian functions, and sampled points that are similar in the input space have similar errors. Thus the energy for a data point is the sum of two varying functions – the error noise and the deterministic source function. Consequently, the energy prediction can be reformulated as

E(x^i) = z(x^i) + \mu    (5)

The error is the deviation z(x^i) of the function value from the global average μ, which is provided by the deterministic source and also includes the error due to noise in the sampled data. This is now expressed by the GPR. The required correlation is constructed from Gaussians, one for each of the known reference structures. The energy prediction then relies on these Gaussian functions, where the contribution by each function depends upon the correlation between the reference configurations and the trial configuration. Given that two similar input vectors x^i and x^j have comparable errors, this similarity can be represented by a correlation matrix C with elements

C_{ij} = \mathrm{Corr}(z(x^i), z(x^j)) = \exp\left(-\sum_{h=1}^{N_{dim}} \theta_h |x^i_h - x^j_h|^{p_h}\right)    (6)

θ_h and p_h are two parameters of the model, with θ_h ≥ 0 and 1 ≤ p_h ≤ 2. Fitting the model then relies on (a) the determination of the variance σ² of the Gaussians and (b)


Fig. 1. A one-dimensional interpolation E(x) by a Gaussian process regression (GPR). The grey region represents the uncertainty of the model for a given input x, which is small near or at known reference points (black dots). The solid curve is the GPR interpolation of the PES. The likelihood is then low in those grey regions with large widths, and so sampling of new training points is directed towards these configurations.

the fitting of the parameter vectors θ and p. Then σ², θ and p are found by the maximization of the likelihood function L, with these parameters modulating how correlated the Gaussian process values are over the coordinate space. The likelihood function describes the probability that the parameters σ², θ and p determine the PES, and with the vector ε containing all N_{ref} reference energies, ε = [E(x^i), i = 1, 2, ..., N_{ref}]^T, we have

L(\theta, p, \sigma^2) = \frac{1}{(2\pi)^{N_{ref}/2} (\sigma^2)^{N_{ref}/2} |C|^{1/2}} \times \exp\left[-\frac{(\varepsilon - I\mu)^T C^{-1} (\varepsilon - I\mu)}{2\sigma^2}\right]    (7)

I is a column vector of 1's, and σ² can be expressed with respect to θ and p. Thus the likelihood function is maximized with respect to θ and p, and typically it is the log of the likelihood function that is maximized, given ∂ log L / ∂(σ²) = 0. In most applications the elements of p are set to 2, further simplifying the fitting of the likelihood parameters.

New reference structures are added to the data set by determining the likelihood function for unseen examples. Those with the worst likelihood are then used as new reference points. The GPR is iteratively improved by adding more examples to the reference set, with the aim to improve the likelihood function for all reference and unseen structures. Ideally, a model should yield high likelihood function values for inputs within the data set, but should be constructed using the least number of Gaussians, in order to ensure simplicity and optimum generalization properties when applied to new configurations. This is particularly useful if the amount of data is limited, such as in the case of using high-level electronic structure methods to generate the reference data (Fig. 1).
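A compressed sketch of the resulting predictor, using the correlation matrix of equation (6) with fixed hyperparameters and the arithmetic mean of the reference energies as a crude stand-in for the global average μ, could read:

    import numpy as np

    def corr_matrix(X, theta, p=2.0):
        """C_ij = exp(-sum_h theta_h |x_ih - x_jh|^p_h), Eq. (6), with all p_h = p."""
        diff = np.abs(X[:, None, :] - X[None, :, :]) ** p
        return np.exp(-(diff * theta).sum(-1))

    def gpr_predict(xnew, X, E, theta, p=2.0):
        """Kriging mean prediction mu + r^T C^{-1} (E - mu) for a trial point."""
        C = corr_matrix(X, theta, p)
        r = np.exp(-((np.abs(X - xnew) ** p) * theta).sum(-1))  # corr. with xnew
        mu = E.mean()                      # crude stand-in for the global mean
        return mu + r @ np.linalg.solve(C, E - mu)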

GPRs have already been shown to be applicable to similar types of problems as neural networks, and specifically in the work of the Popelier group GPRs have been compared to neural networks for modelling environment-dependent atom-centered electrostatic multipole moments [33]. It has been found that in this application GPRs can be even superior to neural networks, with almost a 50% decrease in the electrostatic interaction energy error for the pentamer water cluster [33].

In the following years, the use of GPRs in the Popelier group has been expanded, covering the prediction of the electrostatic multipole moments for ethanol [34] and alanine [35], as well as the simulation of hydrated sodium ions [36] and the prediction of hydrogen-bonded complexes [37].

Bartok et al. have used GPRs to construct PESs of a variety of condensed systems [38]. In this very accurate approach the total energy is expressed as a sum of environment-dependent atomic energies and provided as a superposition of Gaussians centered at known reference points. A central aspect of the method is the use of four-dimensional spherical harmonics to provide a structural characterization which is invariant with respect to rotation and translation of the system [38–40]. Their application of GPRs has been either to model the entire PES, like for carbon, silicon, germanium, iron and GaN [38], or to provide a correction to a DFT PES [40]. In the latter application, very accurate energies for water can be obtained by first calculating the energy at the DFT-BLYP level of theory [41,42]. The GPR model is then fitted in order to correct the one- and two-body interactions using more accurate data, e.g. from coupled cluster calculations. Thus a computationally more affordable method can produce high-accuracy results without the need to explicitly compute the energies from an unaffordable high level of theory.

Gaussian functions have also been employed by Rupp et al. [43,44] to fit the atomization energies of organic molecules. For the structural description of a wide range of molecules a Coulomb matrix has been used. While this method does not yet represent a PES suitable for atomistic simulations, as only the global minimum structure of each molecule can be described and as no forces are available, this very promising technique certainly has the capabilities to become a serious alternative approach in the years to come. A first step in this direction has been taken very recently in a first study of conformational energies of the natural product Archazolid A [45].

2.3 Modified Shepard interpolation

In 1994, Ischtwan and Collins developed a technique to construct the PESs of small molecules based on the modified Shepard interpolation (MSI) method [46]. The basic idea is to express the energy of an atomic configuration x as a sum of weighted second-order Taylor expansions centered at a set of reference points known from electronic structure calculations. Typically, a set of 3N_{atom} − 6 inverse interatomic distances is used to describe the atomic configuration x.

Then, for each reference point i, the energy of a configuration x in its close environment is approximated by a second-order Taylor series E_i(x) as

E_i(x) = E(x^i) + (x - x^i)^T G_i + \frac{1}{2} (x - x^i)^T H_i (x - x^i) + \ldots    (8)


where G_i is the gradient and H_i is the matrix of second derivatives, i.e. the Hessian, at the position x^i of the reference point.

Typically, many reference points are available, and the energy prediction for an unknown configuration can be improved significantly by employing a weighted sum of the Taylor expansions centered at all reference points:

E(x) = \sum_{i=1}^{N_{ref}} w_i(x) \, E_i(x)    (9)

The normalized weight w_i of each individual Taylor expansion is given by

w_i(x) = \frac{v_i(x)}{\sum_{k=1}^{N_{ref}} v_k(x)}    (10)

and can be determined from the unnormalized weight function

v_i(x) = \frac{1}{|x - x^i|^p}    (11)

which approaches zero for very distant reference points and infinity for x = x^i, corresponding to a normalized weight w_i(x) = 1. The power parameter p needs to be equal to or larger than the order of the Taylor expansion. The obtained PES has a number of favorable features. It depends only on interatomic distances, and therefore it is translationally and rotationally invariant. Further, the PES is also invariant with respect to a permutation of chemically equivalent atoms. A minimal sketch of the interpolation defined by equations (8)–(11) is given below.
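The sketch assumes each reference point is supplied with its energy, gradient and Hessian in inverse-distance coordinates; the singular case x = x^i is left unhandled for brevity:

    import numpy as np

    def msi_energy(x, refs, p=4):
        """Shepard-weighted sum of second-order Taylor expansions.

        Each reference is a tuple (xi, Ei, Gi, Hi) of position, energy,
        gradient and Hessian (the case x == xi needs special handling)."""
        taylor, v = [], []
        for xi, Ei, Gi, Hi in refs:
            d = x - xi
            taylor.append(Ei + d @ Gi + 0.5 * d @ Hi @ d)   # Eq. (8)
            v.append(1.0 / np.linalg.norm(d) ** p)          # Eq. (11)
        w = np.array(v) / np.sum(v)                         # Eq. (10)
        return float(w @ np.array(taylor))                  # Eq. (9)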

The accuracy of the method crucially depends on the choice of the reference points, as the approximation of a second-order Taylor expansion of the energy is only sufficiently accurate if there is a dense set of reference structures. It has been suggested by Ischtwan and Collins [47] to start with a reasonable first set of structures, which can be chosen along a reaction path, and to improve the quality of the PES by adding more structures in chemically relevant parts of the configuration space. These points are identified by running classical molecular dynamics trajectories employing preliminary potentials, which are then iteratively refined by adding more and more points from electronic structure calculations carried out at the configurations visited in these trajectories.

The modified Shepard interpolation scheme has been applied successfully to study chemical reaction dynamics for a number of systems like NH + H2 → NH2 + H [47], OH + H2 → H2O + H [48,49] and many others [50–53]. It has also been adapted to describe molecule-surface interactions, using the dissociation of H2 at the Pt(111) surface as an example, by Crespos et al. [54,55]. To keep the dimensionality, and thus the complexity of the system, at a reasonable level, the positions of the surface atoms have been frozen, resulting in a six-dimensional PES. As the hydrogen atoms can move along the surface, an extension of equation (9) has been required to include the symmetry of the surface. The modified energy expression is

\[
E(\mathbf{x}) = \sum_{i=1}^{N_{\mathrm{ref}}} \sum_{j=1}^{N_{\mathrm{sym}}} w_{ij}(\mathbf{x}) \cdot E_i(\mathbf{x}).
\tag{12}
\]

The second summation over all N_sym symmetry elements ensures that the full symmetry of the surface is taken into account exactly, without the requirement to add structures which are equivalent by symmetry to the reference set.

The modified Shepard interpolation is very appealing because of its simplicity and the accuracy of the obtained PESs, but it also has a number of drawbacks. First, to date its applicability is limited to very low-dimensional systems, and once the PES has been constructed the system size cannot be changed, because if an atom were added the distance to the reference points in the Taylor expansions would not be defined. Moreover, for the prediction of the energy of a new configuration the reference set needs to be known, and a summation over all Taylor expansions is required, which makes the method more demanding for large reference sets. Further, the construction of the potential requires energies, gradients and Hessians from electronic structure calculations, which makes the determination of the reference data rather costly.

Given the similarities between Shepard interpolation and polynomial fitting in terms of applicability, it is worth comparing these methods in terms of how much data is required to achieve a fit for the CH5+ system. Huang et al. made use of 20,728 training energies to fit a polynomial surface [30]. Wu et al., on the other hand, have been able to achieve a similar fit using just 50 training examples employing MSI [50,51]. However, these 50 training examples are not just energies but include forces and Hessians. In order to determine these Hessians, numerous energy calculations must be performed, resulting in 19,440 energy calculations, a number comparable to that of Huang et al.

2.4 Interpolating moving least squares

The standard implementation of interpolating moving least squares (IMLS) [56,57], which is a method derived from the modified Shepard interpolation, determines a configuration's energy using a number of linearly independent basis functions, which can be represented by a matrix. The energy is then given as the sum of M basis functions b_i of the atomic coordinates x, where each basis function is weighted by a coefficient a_i,

\[
E(\mathbf{x}) = \sum_{i=1}^{M} a_i(\mathbf{x})\, b_i(\mathbf{x}).
\tag{13}
\]

The basis functions have either the form of a Taylor series [57] or a many-body expansion similar to those used in high-dimensional model representation neural networks [58]. The coefficients are determined by minimization of the objective function

\[
D[E(\mathbf{x})] = \sum_{i=1}^{N_{\mathrm{ref}}} w_i(\mathbf{x}) \left[ \sum_{j=1}^{M} a_j(\mathbf{x})\, b_j(\mathbf{x}_i) - E(\mathbf{x}_i) \right]^2,
\tag{14}
\]

which is the weighted sum of squared potential-energy errors, with w_i(x) being the weight defined by the proximity of the trial structure to a reference structure i.


Hence the largest weight would be found if the trial configuration coincides with a configuration in the reference set. The method is easily expanded to include further fitting data, such as forces and the Hessians of the training points [58,59], which expands the design matrix, i.e. the matrix containing the simultaneous equations that must be solved in order to find the best combination of basis functions. The coefficients a_j are determined by solving

\[
\mathbf{B}^T \mathbf{W} \mathbf{B}\, \mathbf{a} = \mathbf{B}^T \mathbf{W} \mathbf{V}.
\tag{15}
\]

Here V is the vector of reference energies and a is the vector containing the coefficients. W is an N_ref × N_ref diagonal matrix containing the weights, where W_ij = w_i(x)δ_ij, and B is the N_ref × M design matrix.
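A compact sketch of one IMLS evaluation is given below: for a trial configuration x, the weighted normal equations (15) are solved for the local coefficients a(x), which are then combined with the basis functions according to equation (13). The basis and weight functions are passed in as generic callables, since the method leaves their precise form open.

```python
import numpy as np

def imls_energy(x, X_ref, E_ref, basis, weight):
    """One IMLS energy evaluation, equations (13)-(15).

    basis(x)       : returns the length-M vector of basis values b_j(x)
    weight(x, x_i) : scalar weight, largest when x is close to x_i
    """
    B = np.array([basis(x_i) for x_i in X_ref])      # N_ref x M design matrix
    W = np.diag([weight(x, x_i) for x_i in X_ref])   # diagonal weight matrix
    # normal equations B^T W B a = B^T W V, eq. (15)
    a = np.linalg.solve(B.T @ W @ B, B.T @ W @ np.asarray(E_ref))
    return float(basis(x) @ a)                       # E(x), eq. (13)
```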

Initial applications of the IMLS method focused on fitting available PESs for HN2 [60], HOOH [58,61] and HCN [58], before moving on to PESs computed using accurate ab initio methods, such as for HOOH, CH4 and HCN [59], CH2 and HCN [62], as well as the O + HCl system [63]. More recently, Dawes et al. have applied IMLS-fitted PESs to van der Waals systems, including the CO [64], NNO [65] and OCS [66] dimers, described as four-dimensional PESs, where data points have been obtained from CCSD(T) calculations. This method of data generation is suitable for IMLS, as accurate fits can be constructed from relatively small amounts of data. This level of accuracy is often required, e.g. for the prediction of rovibrational spectra.

The basis functions used by Dawes et al. take the form of numerous many-body terms that together form a high-dimensional model representation [67]. High-order terms are not included as basis functions, as they would be poorly determined in scenarios where the PES has been sparsely sampled.

In order to perform the fitting of the HOOH PES, Dawes et al. used an iteratively sampled training set, which eventually included almost 2000 reference points [58]. However, compared to Shepard interpolation models and neural networks, IMLS models are costly to evaluate. Dawes et al. showed that IMLS can be used as a procedure to create an accurate PES from a sparse set of reference data points. This IMLS PES is then employed to generate training points that are in turn used to train a Shepard interpolation model and a neural network.

Using two IMLS models, where one uses basis functions that are an order of degree higher than those used in the other model, can guide the sampling of the PES. Regions with a large error between the two models, which will occur between the current training points, will be the ideal locations to sample and in turn improve the model. In this way the manner in which the PES is sampled is similar to GPRs, where the sampling is sparse while non-stochastic in nature [59]. The final model used by Dawes et al. in this example is a modification of IMLS, a local-IMLS [68], which is similar to the Shepard interpolation scheme. Local-IMLS computes energies for a configuration by using only the reference structures that are similar, rather than using all reference configurations. However, local-IMLS is more general, since the basis functions can

Fig. 2. A small two-dimensional feed-forward neural network (NN) containing a single hidden layer. The input nodes G1 and G2 provide the structural information to the NN, which then yields the energy E. Apart from the architecture of the NN, i.e. the number of layers and the number of nodes per layer, the NN output is determined by the numerical values of the weight parameters, which are shown as arrows indicating the flow of information through the NN. The functional form of this NN is given in equation (17).

take any form, rather than just the Taylor series expansions used in Shepard interpolation. For the HOOH PES a similar accuracy compared to previous work was achieved using 30% fewer reference points.

2.5 Neural network potentials

Artificial neural networks (NNs) [69,70] were first introduced in 1943 to develop mathematical models of the signal processing in the brain [71], and after continuous methodical extensions in the following decades they have now become an established research topic in mathematics and computer science, with many applications in various fields. Also in chemistry and physics they have found wide use, e.g. for the analysis of spectra and pattern recognition in experimental data [72]. The use of NNs to construct PESs employing reference data from electronic structure calculations was suggested by Blank et al. [73] almost twenty years ago, and in the meantime neural network potentials (NNPs) have been reported for a number of molecules [74–76] and molecules interacting with surfaces [77–79]. A comprehensive overview of the systems that have been addressed by NNPs can be found in several recent reviews [80–82].

The basic component of almost any NNP is a feed-forward NN, and a small example corresponding to a two-dimensional PES is shown in Figure 2. It consists of artificial neurons, or nodes, which are arranged in layers. The neurons in the input layer represent the coordinates G_i defining the atomic configuration; the neuron in the output layer yields the energy of this configuration. In between the input and the output layer there are one or more so-called hidden layers, each of which contains a number of additional neurons. The purpose of


the hidden layers, which do not have a physical meaning, is to define the functional form of the NN assigning the energy to the structure. The more hidden layers and the more nodes per hidden layer, the higher is the flexibility of the NN. Haykin has stated that, in the case of a NN architecture consisting of two hidden layers, the first hidden layer captures the local features of the function being modelled, while the second hidden layer captures the global features of the function [83]. All nodes in all layers are connected to the nodes in the adjacent layers by so-called weight parameters, which are the fitting parameters of the NN. In Figure 2 they are shown as arrows indicating the flow of information through the NN. The value y_i^j of a node i in layer j is then calculated as a linear combination of the values of the nodes in the previous layer, using the weight parameters as coefficients. The linear combination is then shifted by a bias weight b_i^j acting as an adjustable offset. Finally, a non-linear function, the activation function of the NN, is applied, which provides the capability of the NN to fit any real-valued function to arbitrary accuracy. This fundamental property of NNs has been proven by several groups independently [84,85] and is the theoretical basis for the applicability of NNs to construct atomistic potentials. Accordingly, y_i^j is then obtained as

\[
y_i^j = f\!\left( b_i^j + \sum_{k=1}^{N_{j-1}} a_{ki}^{j-1,j} \cdot y_k^{j-1} \right),
\tag{16}
\]

where a_{ki}^{j-1,j} is the weight parameter connecting node k in layer j-1, containing N_{j-1} neurons, to node i in layer j. In the special case of the input layer, with superscript 0, the input coordinates are represented by a vector G = {G_i} = {y_i^0}. The complete total energy expression of the example NN shown in Figure 2 is then given by

\[
E = f\!\left( b_1^2 + \sum_{j=1}^{3} a_{j1}^{12} \cdot f\!\left( b_j^1 + \sum_{i=1}^{2} a_{ij}^{01} \cdot G_i \right) \right).
\tag{17}
\]

The weight parameters of the NN are typically determined by minimizing the error of a training set of electronic structure data employing gradient-based optimization algorithms; in particular, the backpropagation algorithm [86] and the global extended Kalman filter [87] have been frequently used in the context of NNPs. The target quantity is usually the energy, but forces have also been used in some cases [88,89].
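For illustration, the network of Figure 2 can be evaluated in a few lines; the sketch below implements equation (17) directly, with the hyperbolic tangent as an assumed activation function and the weight arrays as free parameters to be fitted.

```python
import numpy as np

def nn_energy(G, a01, b1, a12, b2, f=np.tanh):
    """Feed-forward NN of Figure 2, equation (17).

    G   : (2,)   input coordinates G_1, G_2
    a01 : (2, 3) weights connecting the input to the hidden layer
    b1  : (3,)   bias weights of the three hidden nodes
    a12 : (3,)   weights connecting the hidden layer to the output
    b2  : float  bias weight of the output node
    """
    y1 = f(b1 + G @ a01)       # hidden-layer node values, equation (16)
    return f(b2 + y1 @ a12)    # output node: the energy E
```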

NNPs constructing the potential energy using just a single feed-forward NN have been developed for a number of systems with great success, but for a long time NNPs have been restricted to low-dimensional PESs involving only a few degrees of freedom. This limitation has a number of reasons. First of all, each additional atom in the system introduces three more degrees of freedom, which need to be provided to the NN in the input layer. Increasing the number of nodes in the NN reduces the efficiency of the NN evaluation, but it also complicates the determination of the growing number of weight parameters. Further,

conventional NNPs can only be applied to systems with a fixed chemical composition, because the number of input nodes of the NN cannot be changed once the weight parameters have been determined. If an atom were added, the weight parameters for the additional input information would not be available, while upon removing an atom the values of the corresponding input nodes would be ill-defined.

The most severe challenge in constructing NNPs is the choice of a suitable set of input coordinates. Cartesian coordinates cannot be used, because their numerical values have no direct physical meaning, and only relative atomic positions are important for the potential energy. If, for instance, a molecule were translated or rotated in space, its Cartesian coordinates would change while its internal structure, and thus its energy, remained the same. An NNP based on Cartesian coordinates, however, would predict a different energy, since its input vector has changed. Incorporating this rotational and translational invariance by a coordinate transformation onto a suitable set of functions exhibiting these properties is a significant challenge. A very simple solution appropriate for small molecules would be to use internal coordinates like interatomic distances, angles and dihedral angles, and indeed this has been done successfully for a number of molecules, but this approach is unfeasible for large systems due to the rapidly increasing number of coordinates. Further, a full set of internal coordinates contains a lot of redundant information, and also using a full distance matrix is possible only for small systems.

Another problem that is still present even if internal coordinates are used concerns the invariance of the total energy with respect to the interchange of atoms of the same element. If, for instance, the order of both hydrogen atoms in a free water molecule is switched, then this change is also present in the NN input vector. Since all NN input nodes are connected to the NN by numerically different weight parameters, this exchange will modify the total energy of the system, although the atomic configuration is still the same. Including this permutation symmetry of like atoms in NNPs has been a challenge since the advent of NNPs. A solution for molecular systems based on a symmetrization of internal coordinates has been suggested in a seminal paper by Gassner et al. [76], and a similar scheme has also been developed for the dissociation of molecules at surfaces [78]. Unfortunately, both methods are only applicable to very low-dimensional systems. It should be noted, however, that also low-dimensional NNPs have been applied to study large systems by representing only specific interactions by NNs. Examples are the work of Cho et al. [90], who used NNs to include polarization in the TIP4P water model [91], and the description of three-body interactions in the solvation of Al3+ ions by Gassner et al. [76].

It was soon recognized that further methodical extensions are required to construct high-dimensional NNPs capturing all relevant many-body effects in large systems. A first step in this direction has been made by Manzhos and Carrington [92,93] by decomposing the PESs into


interactions of increasing order in the spirit of a many-body expansion,

\[
E = \sum_{i=1}^{N_{\mathrm{atom}}} E_i + \sum_i \sum_{j>i} E_{ij} + \sum_i \sum_{j>i} \sum_{k>j} E_{ijk} + \dots,
\tag{18}
\]

with the E_i, E_ij, E_ijk being one-, two- and three-body terms, and so forth.

Specifically, they have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years the efficiency could be improved by reducing the effective dimensionality employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].
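As a schematic illustration of equation (18), the truncated expansion below sums one-, two- and three-body contributions supplied as generic callables; in the approach of Manzhos and Carrington each such term would be represented by an individual low-dimensional NN.

```python
from itertools import combinations

def many_body_energy(atoms, e1, e2, e3):
    """Many-body expansion of equation (18), truncated at third order.

    atoms      : list of atom indices (or atomic coordinates)
    e1, e2, e3 : callables returning one-, two- and three-body energies
    """
    E = sum(e1(i) for i in atoms)
    E += sum(e2(i, j) for i, j in combinations(atoms, 2))
    E += sum(e3(i, j, k) for i, j, k in combinations(atoms, 3))
    return E
```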

First attempts to develop high-dimensional NNPs were made by Smith and coworkers already in 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters have not been determined using electronic structure reference data, and only in 2007 was the method further developed into a genuine NNP approach [99,100], using silicon as example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach the energy of the system is constructed as a sum of environment-dependent atomic energy contributions E_i,

\[
E = \sum_i E_i.
\tag{19}
\]

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6-10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems

like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary CuZnO system [110] and water clusters [111,112].
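The structure of this approach is easy to sketch. The example below uses a single radial symmetry function of the type introduced in reference [102] (a real potential employs many such functions per atom, and one trained NN per element); the Gaussian radial part and the smooth cutoff follow the published functional forms, while the atomic NN is passed in as a generic callable, and the parameter values eta and r_c are illustrative choices.

```python
import numpy as np

def radial_symmetry_function(R, i, eta=0.5, r_c=6.0):
    """One radial symmetry function G_i describing the chemical
    environment of atom i within the cutoff radius r_c (Angstrom)."""
    G = 0.0
    for j in range(len(R)):
        if j == i:
            continue
        r = np.linalg.norm(R[i] - R[j])
        if r < r_c:
            f_c = 0.5 * (np.cos(np.pi * r / r_c) + 1.0)  # smooth cutoff
            G += np.exp(-eta * r**2) * f_c
    return G

def total_energy(R, atomic_nn):
    """High-dimensional NNP energy, equation (19): a sum of
    environment-dependent atomic energy contributions."""
    return sum(atomic_nn(radial_symmetry_function(R, i))
               for i in range(len(R)))
```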

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113–115]. For this purpose they have developed a method to express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116–118]. The neural network acts by taking as input the relative positions of the atoms around the atom for which the predictions are made, and then expressing the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated equally, since they all are a result of the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic because of the non-spherical nature of the atomic electron densities that are described by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change as atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava, symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network [120]. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES the energy contribution of an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function-based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector; these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then, in theory, contain an infinite number of dimensions. The classification a(x) of a trial point x is then given by the general form of a SVM,

\[
a(\mathbf{x}) = \sum_{i=1}^{N_{\mathrm{ref}}} \lambda_i\, y_i\, K(\mathbf{x}_i,\mathbf{x}) - w_0.
\tag{20}
\]

The sum is over the number of support vectors, which is equal to the number of reference examples. K is the kernel that transforms the input x from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where w_0 would be the bias of the hidden layer. This means that y_i is the classification of reference example x_i, and λ_i is a coefficient. The elements of K are found by

\[
K(\mathbf{x}_i,\mathbf{x}) = \tanh\left( k_0 + k_1 \langle \mathbf{x}_i,\mathbf{x} \rangle \right).
\tag{21}
\]

Here k_0 is the bias weight of the first layer of the neural network, and k_1⟨x_i, x⟩ is the scaled dot product of the two vectors and so is a measure of similarity. K(x_i, x) in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to k_0 + k_1⟨x_i, x⟩.

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform the non-linear transformation of the kernel, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation grows in complexity quadratically as more dimensions are added, to the point where it will not fit within the computer memory. A solution to this problem is the so-called "kernel trick", which accounts for the non-linear transformation into a high-dimensional space but where the original dot-product matrix is still used. The transformation is then applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.
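The decision function of equations (20) and (21) is compact enough to spell out. The sketch below assumes the coefficients λ_i and the bias w_0 have already been determined during training, and that the kernel parameters k_0 and k_1 are given.

```python
import numpy as np

def svm_classify(x, X_sv, y_sv, lam, w0, k0=0.0, k1=1.0):
    """SVM decision function, equations (20)-(21).

    X_sv : (N, D) support vectors;  y_sv : their labels (+1 or -1)
    lam  : (N,)   coefficients lambda_i;  w0 : bias
    """
    # kernel trick: the similarity in the high-dimensional feature
    # space is computed from dot products in the original space
    K = np.tanh(k0 + k1 * (X_sv @ x))
    return np.sign(np.sum(lam * y_sv * K) - w0)
```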

The main tasks to be solved when using SVMs are finding the linear separator such that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that recasts the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, like in many other fitting methods, cross-validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = 0,
\tag{22}
\]

where X is a set of reference points that define the linear separator and w is the normal vector to the linear separator. Furthermore, the margins of the separation can be defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = -1
\tag{23}
\]

and

\[
\mathbf{w} \cdot \mathbf{X} - b = 1,
\tag{24}
\]

as shown in Figure 3. The width of the separation is given by 2/||w||. The objective of a SVM is then to maximize this separation. The constraint of the linear separation can be rewritten as

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1,
\tag{25}
\]

where x_i is a reference training point that must be correctly classified and y_i is the classification function, i.e. the output is either 1 or -1. This can be modified to allow for soft margins, and so allow for a degree of misclassification,

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1 - \xi_i.
\tag{26}
\]



Fig. 3. In the initial two-dimensional space the data points (stars) fall into two groups: those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible. Those within the circle have classification '1' and those outside have classification '-1'. The linear separation is defined by a vector with a gradient of w and an intercept of b. The linear separation is found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

Here ξ_i is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function J,

\[
\min J(\mathbf{w},\boldsymbol{\xi}) = \frac{1}{2}\,\mathbf{w}^T\mathbf{w} + c \sum_{i=1}^{N_{\mathrm{ref}}} \xi_i,
\tag{27}
\]

where c is a parameter modulating the influence of the slack function.

For regression purposes, like the construction of PESs, the objective is the reverse problem, where the reference data points should lie as close as possible to a linear separation. This is the basis of least-squares support vector machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

\[
\min J(\mathbf{w},b,\mathbf{e}) = \frac{\nu}{2}\,\mathbf{w}^T\mathbf{w} + \frac{\zeta}{2} \sum_{i=1}^{N_{\mathrm{ref}}} e_i^2.
\tag{28}
\]

The slack values are now replaced by the sum of squared errors e_i of the reference points, and ν and ζ are parameters that determine the smoothness of the fit. Both SVMs and GPRs are similar in that they can both be adjusted by the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of functions can adjust the sensitivity of the fitting method [128].
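One attractive feature of LS-SVMs is that the minimization of equation (28) reduces to a single linear system in the dual variables. The sketch below follows the standard LS-SVM regression formulation with a Gaussian kernel, an assumption on our part rather than the exact setup of reference [127]; the single regularization parameter gamma plays the role of the ratio ζ/ν.

```python
import numpy as np

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Train an LS-SVM regressor; returns a prediction function."""
    n = len(y)
    d2 = (np.sum(X**2, axis=1)[:, None] + np.sum(X**2, axis=1)[None, :]
          - 2.0 * X @ X.T)
    K = np.exp(-0.5 * d2 / sigma**2)               # Gaussian kernel matrix
    # dual problem: one linear system for the bias b and coefficients alpha
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)),  K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]

    def predict(x):
        k = np.exp(-0.5 * np.sum((X - x)**2, axis=1) / sigma**2)
        return b + alpha @ k
    return predict
```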

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometry containing carbon, hydrogen, oxygen, nitrogen and fluorine. Therefore, while not describing the PES for


Fig. 4. The two parent genetic programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an '×'. At the end of the branches are the input nodes I1-I4, which provide information about the system. At each node on the way up the tree an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will take place. This means that the tree structure can be very different for the daughter solutions, with the tree structure having been


extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e. a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they can provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
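A toy version of such an expression tree and of the branch exchange of Figure 4 can be written in a few lines; the tuple-based tree representation and the protected division are illustrative choices, not taken from any of the cited implementations.

```python
def eval_tree(node, inputs):
    """Evaluate a GP expression tree: a leaf is an input index,
    an inner node is a tuple (operator, left_child, right_child)."""
    if isinstance(node, int):
        return inputs[node]                  # input node I1, I2, ...
    op, left, right = node
    a, b = eval_tree(left, inputs), eval_tree(right, inputs)
    return {'+': a + b, '-': a - b, '*': a * b,
            '/': a / b if abs(b) > 1e-12 else 1.0}[op]  # protected division

def crossover(parent1, parent2):
    """Exchange branches between two parent trees (cf. the marks in
    Fig. 4); for brevity, the left branch of each root is swapped."""
    op1, l1, r1 = parent1
    op2, l2, r2 = parent2
    return (op1, l2, r1), (op2, l1, r2)

# example: the tree (I1 * I2) + I3 evaluated for I1=1.0, I2=2.0, I3=0.5
energy = eval_tree(('+', ('*', 0, 1), 2), [1.0, 2.0, 0.5])
```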

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined, and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher level of GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon its fitness [133]. This allows the training procedure to retain individuals so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e. they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for MSI, IMLS and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and

permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, like in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best-performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1, or vice versa. The final stage is to combine the daughter and parent populations and remove the poorly performing bit strings from the combined population. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as searching the conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
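The crossover and mutation steps described above amount to only a few lines of code. The sketch below shows a single-point crossover of two parent bit strings followed by random bit flips, with the mutation rate p_mut as an assumed, illustrative parameter.

```python
import random

def crossover_and_mutate(parent1, parent2, p_mut=0.01):
    """One GA step: single-point crossover of two parent bit strings,
    followed by random bit-flip mutations of the daughters."""
    point = random.randrange(1, len(parent1))          # crossover position
    d1 = parent1[:point] + parent2[point:]
    d2 = parent2[:point] + parent1[point:]

    def mutate(s):
        return ''.join(b if random.random() > p_mut else str(1 - int(b))
                       for b in s)
    return mutate(d1), mutate(d2)

# example: 16-bit strings encoding two 8-bit potential parameters
daughters = crossover_and_mutate('1010110010101100', '0101001101010011')
```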

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work


of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore there is the need to rerun the fitting procedure from different starting points, i.e. different positions on the PES fitting error surface, in order to not get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters which were determined to be important for the description of CH3NO. In the work of Handley and Deeth the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g. in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-

surface interactions, e.g. polynomial fitting, the modified Shepard interpolation and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e. describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore it is only a question of time until ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time until more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction


of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartok, M.J. Gillan, F.R. Manby, G. Csanyi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 1986)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79, 926 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks (IJCNN 1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS 2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1998)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)

  • Introduction
  • General-purpose atomistic potentials
  • Discussion
  • Conclusions
  • References
Page 3: Next generation interatomic potentials for condensed systems · Next generation interatomic potentials for condensed systems ... all imaginable systems and applications. The Tersoff

Eur Phys J B (2014) 87 152 Page 3 of 16

or approximations Instead the goal of these ldquomathemati-cal potentialsrdquo is to reproduce a given set of reference datafrom electronic structure calculations as accurately as pos-sible Due to the purely mathematical nature of these po-tentials they are able to describe all types of bonding fromdispersion interactions via covalent bonds to metallic sys-tems with similar accuracy employing the same functionalform Many of these methods which have emerged only inrecent years in the literature in the context of PESs areoften also summarized under the keyword ldquomachine learn-ingrdquo potentials They can be constructed using a varietyof different functional units like Gaussians Taylor seriesexpansions sigmoid-like functions and many others andoften a superposition or some other form of combinationof a large number of these functions is used to expresscomplicated multidimensional PESs Atomistic potentialsof this type are still not widely distributed but their de-velopment has seen enormous progress in recent years andthey have now reached a level of maturity making theminteresting candidates for solving problems in chemistryand condensed matter physics

When constructing accurate atomistic potentials anumber of substantial challenges have to be met

1 Accuracy ndash The energy and forces provided by the po-tential should be as close as possible to the data ofreference electronic structure calculations A manda-tory condition to reach this goal is that the potentialis sufficiently high-dimensional Only then it will beable to capture the effects of polarization and chargetransfer and the consideration of many-body effectsis also most important for describing metals Furtherthe potential should be transferable since a potentialis only useful if it is applicable to structures whichhave not been used for its construction

2 Efficiency ndash Once constructed the potential should beas fast as possible to evaluate to enable the exten-sion of the time and length scales of atomistic simula-tions clearly beyond the realm of electronic structuremethods

3 Generality ndash The potential should have a universal andunbiased functional form ie it should describe alltypes of atomic interactions with the same accuracyFurther the functional form should enable the calcula-tion of analytic derivatives to determine forces neededin molecular dynamics simulations

4 Reactivity ndash The potential should not require the speci-fication of fixed bonding patterns or a further discrim-ination of different atom types for a given elementInstead it should be able to describe the making andbreaking of bonds and arbitrary structural changeswhich is equivalent to depending solely on the positionsand nuclear charges of the atoms

5. Automation – The parametrization of the potential for a specific system should require a minimum of human effort.

6. Costs – The amount of demanding electronic structure data required to construct the potential should be as small as possible.

Apart from criterion three, all these points are equally important for “physical” and “mathematical” potentials, and meeting all these requirements to full satisfaction is very difficult for any type of potential currently available.

Some further aspects of the construction of potentials need to be discussed here, which are specific to purely mathematical potentials, while they usually do not represent serious problems for conventional potentials. They concern some invariances of the PES that need to be incorporated in a proper way. First, the potential-energy of a system must be invariant with respect to rotation and translation of the system, since only the relative atomic positions are important for the energy and forces. Further, the energy of a system must not change upon interchange of the positions of any two atoms of the same chemical element. Including this permutation invariance is straightforward, e.g., in classical force fields, which express the energy as a sum of many individual low-dimensional terms, or in electronic structure methods, which incorporate this invariance by the diagonalization of the Hamiltonian. In case of high-order many-body atomistic potentials, which are required to reach an accuracy comparable to that of first-principles methods, including this permutation invariance is one of the main conceptual challenges. In the following sections, some methods which have evolved in recent years as general-purpose potentials for a wide range of systems are highlighted, and their properties with respect to the criteria mentioned above will be discussed.

The ultimate goal of such atomistic potentials is the simulation of large periodic and non-periodic systems, ranging from large molecules, bulk liquids, and solids to extended interfaces, for which electronic structure methods are computationally too demanding to use or are not even tractable.

2 General-purpose atomistic potentials

2.1 Polynomial fitting

In 2003, Brown et al. introduced a method employing polynomials as basis functions to construct PESs of molecular systems [25]. The structural description of the system starts with a full set of interatomic distances, like $R_{12}, R_{13}, R_{14}, R_{23}, R_{24}, R_{34}$ for the example of a tetra-atomic molecule A$_4$. The $R_{ij}$ are then transformed to a set of Morse variables defined as

$$Y_{ij} = \exp\left(-R_{ij}/\gamma\right) \qquad (1)$$

with $\gamma$ being a constant between 1.5 and 3 Bohr. Since distances as well as Morse variables represent internal coordinates, whose values do not change with rotation and translation of the molecule, this description ensures translational and rotational invariance of the potential. A full distance matrix is, however, not invariant with respect to a permutation of like atoms. Therefore, a straightforward


construction of the PES as a linear combination of polynomials [26],

$$E\left(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}\right) = \sum_{a+b+c+d+e+f=0}^{n_{\max}} C_{abcdef}\, \left[ Y_{12}^{a}\, Y_{13}^{b}\, Y_{14}^{c}\, Y_{23}^{d}\, Y_{24}^{e}\, Y_{34}^{f} \right], \qquad (2)$$

with coefficients $C_{abcdef}$, does not exhibit this invariance. Here $a$, $b$, $c$, $d$, $e$, and $f$ are non-negative integers with a sum between zero and the maximum order $n_{\max}$ of the polynomials to be considered. Typically the largest exponents used are five or six.

In order to obtain a permutationally invariant basis set, the monomials are symmetrized employing a symmetry operator $S$:

$$E\left(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}\right) = \sum_{a+b+c+d+e+f=0}^{n_{\max}} D_{abcdef}\, S\!\left[ Y_{12}^{a}\, Y_{13}^{b}\, Y_{14}^{c}\, Y_{23}^{d}\, Y_{24}^{e}\, Y_{34}^{f} \right], \qquad (3)$$

which yields the same basis functions irrespective of the order of chemically equivalent atoms. All coefficients $D_{abcdef}$ of the symmetrized, permutationally invariant polynomials are fitted simultaneously, using least-squares fitting procedures, to a database of high-level ab initio calculations, which typically contains several tens of thousands of data points. The specific form of the symmetrized terms, whose number can be substantial, depends on the chemical composition of the molecule, and to date the method has been applied to systems containing up to 10 atoms. More details about the symmetrization procedure, for which efficient methods are available [26], and the resulting terms can be found elsewhere [27].
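To make equations (1)–(3) concrete, the following minimal Python sketch (all function names are hypothetical; this is an illustration, not code from references [25–27]) builds the Morse variables of an A$_4$ molecule and applies the symmetry operator $S$ by brute-force averaging of a monomial over all $4!$ permutations of the like atoms. Efficient implementations avoid this explicit averaging [26].

```python
import itertools

import numpy as np

GAMMA = 2.0  # Morse range parameter gamma in Bohr (eq. (1)); typically 1.5-3

PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # the six pairs of A4


def morse_variables(R):
    """Transform a 4x4 interatomic distance matrix into the six Morse
    variables Y_ij of equation (1)."""
    return {p: np.exp(-R[p] / GAMMA) for p in PAIRS}


def symmetrized_monomial(Y, expo):
    """Symmetry operator S of equation (3): average the monomial
    Y_12^a Y_13^b Y_14^c Y_23^d Y_24^e Y_34^f over all 4! permutations
    of the four identical atoms."""
    perms = list(itertools.permutations(range(4)))
    total = 0.0
    for perm in perms:
        term = 1.0
        for (i, j), e in zip(PAIRS, expo):
            key = tuple(sorted((perm[i], perm[j])))
            term *= Y[key] ** e
        total += term
    return total / len(perms)


# Example: the symmetrized basis function for (a,b,c,d,e,f) = (2,1,0,0,0,0),
# given a 4x4 symmetric distance matrix R of a reference structure:
# value = symmetrized_monomial(morse_variables(R), (2, 1, 0, 0, 0, 0))
```

With the symmetrized basis values tabulated for all reference structures, the coefficients $D_{abcdef}$ follow from an ordinary linear least-squares fit to the reference energies.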

Although the complexity of the method currently requires a truncation of the order of the polynomials at comparably low values and restricts the formal dimensionality to that of small molecules, the resulting potentials can still be applied to large systems like bulk water using two- and three-body terms only [28]. This is possible since in molecular systems high-order many-body interactions are usually not essential. Further examples for successful applications of the method are various charged and neutral water clusters [26,29,30], the vinyl radical [31], and many others. Thus, while the construction of PESs using polynomials is a powerful technique for the investigation of dynamics, reactions, dissociation, and conformational changes for small molecules or systems composed of these, there has been no work towards implementing the technique with respect to the creation of accurate potentials that can be used to model systems like metals and semiconductors.

2.2 Gaussian process regression

Gaussian process regression (GPR) [32], also known as Kriging, is a method that models observables as a realization of an underlying statistical probabilistic model – the probability density function. The basic concept is that training examples with correlated observables, i.e., energies, also have a close correlation in their variables, i.e., the coordinates describing the atomic configuration. The assumption then is that the function which models the non-linear relationship between the variables and the observables, the PES, is smoothly varying. The underlying statistical model, the probability density function, determines the probabilities that for a given set of inputs a particular output is observed.

Since the output is known for a number of configurations from reference electronic structure calculations, the reverse question can be addressed: can the probability density function be found that best models the PES for a given number of structures and their associated energies, and takes into account the reliability of predictions in the regions near to the known reference points?

All reference data for the GPR model come from a deterministic function which operates on a set of input coordinates, where reference point $i$ has coordinates $\mathbf{x}^i = (x_1^i, \ldots, x_{N_{\rm dim}}^i)$ and $N_{\rm dim}$ is the number of dimensions, or degrees of freedom, of the system:

$$E(\mathbf{x}^i) = z(\mathbf{x}^i) + \sum_h \beta_h f_h(\mathbf{x}^i). \qquad (4)$$

The $f_h$ are some linear or non-linear functions of $\mathbf{x}$, and the $\beta_h$ are coefficients, where $h$ runs over all functions that define the PES. It then follows that, because the reference data come from a deterministic source, the “errors” are not random but repeatable for each configuration. These errors can then be described as a further energy function whose values are smoothly varying and correlated. The error within the reference data, $z(\mathbf{x}^i)$, can be modelled by Gaussian functions, and sampled points that are similar in the input space have similar errors. Thus the energy for a data point is the sum of two varying functions – the error noise and the deterministic source function. Consequently, the energy prediction can be reformulated as

$$E(\mathbf{x}^i) = z(\mathbf{x}^i) + \mu. \qquad (5)$$

The error is the deviation $z(\mathbf{x}^i)$ of the function value from the global average $\mu$, which is provided by the deterministic source and also includes the error due to noise in the sampled data. This is now expressed by the GPR. The required correlation is constructed from Gaussians, one for each of the known reference structures. The energy prediction then relies on these Gaussian functions, where the contribution by each function depends upon the correlation between the reference configurations and the trial configuration. Given that two similar input vectors $\mathbf{x}^i$ and $\mathbf{x}^j$ have comparable errors, this similarity can be represented by a correlation matrix $C$ with elements

$$C_{ij} = \mathrm{Corr}\left(z(\mathbf{x}^i), z(\mathbf{x}^j)\right) = \exp\left(-\sum_{h=1}^{N_{\rm dim}} \theta_h \left| x_h^i - x_h^j \right|^{p_h}\right). \qquad (6)$$

$\theta_h$ and $p_h$ are two parameters of the model, with $\theta_h \ge 0$ and $1 \le p_h \le 2$. Fitting the model then relies on (a) the determination of the variance $\sigma^2$ of the Gaussians and (b)


Fig. 1. A one-dimensional interpolation by Gaussian process regression (GPR). The grey region represents the uncertainty of the model for a given input $x$, which is small near or at known reference points (black dots). The solid curve is the GPR interpolation of the PES. The likelihood is then low in those grey regions with large widths, and so sampling of new training points is directed towards these configurations.

the fitting of the parameter vectors $\boldsymbol{\theta}$ and $\mathbf{p}$. Then $\sigma^2$, $\boldsymbol{\theta}$, and $\mathbf{p}$ are found by the maximization of the likelihood function $L$, with these parameters modulating how correlated the Gaussian process values are over the coordinate space. The likelihood function describes the probability that the parameters $\sigma^2$, $\boldsymbol{\theta}$, and $\mathbf{p}$ determine the PES, and with the vector $\boldsymbol{\varepsilon}$ containing all $N_{\rm ref}$ reference energies, $\boldsymbol{\varepsilon} = [E(\mathbf{x}^i),\, i = 1, 2, \ldots, N_{\rm ref}]^T$, we have

$$L(\boldsymbol{\theta}, \mathbf{p}, \sigma^2) = \frac{1}{(2\pi)^{N_{\rm ref}/2}\, (\sigma^2)^{N_{\rm ref}/2}\, |C|^{1/2}}\, \exp\left[ -\frac{(\boldsymbol{\varepsilon} - \mathbf{I}\mu)^T C^{-1} (\boldsymbol{\varepsilon} - \mathbf{I}\mu)}{2\sigma^2} \right]. \qquad (7)$$

$\mathbf{I}$ is a column vector of ones, and $\sigma^2$ can be expressed with respect to $\boldsymbol{\theta}$ and $\mathbf{p}$. Thus the likelihood function is maximized with respect to $\boldsymbol{\theta}$ and $\mathbf{p}$, and typically it is the logarithm of the likelihood function that is maximized, given $\partial \log L / \partial(\sigma^2) = 0$. In most applications the elements of $\mathbf{p}$ are set to 2, further simplifying the fitting of the likelihood parameters.

New reference structures are added to the data set by determining the likelihood function for unseen examples. Those with the worst likelihood are then used as new reference points. The GPR is iteratively improved by adding more examples to the reference set, with the aim to improve the likelihood function for all reference and unseen structures. Ideally, a model should yield high likelihood function values for inputs within the data set, but should be constructed using the smallest number of Gaussians, in order to ensure simplicity and optimum generalization properties when applied to new configurations. This is particularly useful if the amount of data is limited, such as in the case of using high-level electronic structure methods to generate the reference data (Fig. 1).
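For illustration, the mean prediction of equations (5)–(7) can be written as a short Kriging sketch. This assumes noise-free reference data, all $p_h = 2$, and hyperparameters $\boldsymbol{\theta}$ already obtained from the likelihood maximization described above; all names are hypothetical.

```python
import numpy as np


def correlation_matrix(X, theta):
    """C_ij of equation (6) with all p_h = 2; X has shape (N_ref, N_dim)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.exp(-np.einsum('ijh,h->ij', diff**2, theta))


def kriging_predict(X, E, theta, x_new):
    """Mean prediction of the GPR/Kriging model of equations (5)-(7)."""
    C_inv = np.linalg.inv(correlation_matrix(X, theta))
    one = np.ones(len(E))
    # global mean mu of equation (5), estimated from the reference data
    mu = (one @ C_inv @ E) / (one @ C_inv @ one)
    # correlations between the trial point and all reference points
    c = np.exp(-np.sum(theta * (X - x_new)**2, axis=1))
    return mu + c @ C_inv @ (E - mu * one)
```

Because the correlation matrix of all reference points must be stored and inverted, the cost of such a predictor grows with the size of the reference set, which is one of the practical limitations discussed later for this family of methods.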

GPRs have already been shown to be applicable to similar types of problems as neural networks, and specifically in the work of the Popelier group GPRs have been compared to neural networks for modelling environment-dependent atom-centered electrostatic multipole moments [33]. It has been found that in this application GPRs can be even superior to neural networks, with almost a 50% decrease in the electrostatic interaction energy error for the pentamer water cluster [33].

In the following years the use of GPRs in the Popelier group has been expanded, and they have been used for the prediction of the electrostatic multipole moments of ethanol [34] and alanine [35], as well as the simulation of hydrated sodium ions [36] and the prediction of hydrogen-bonded complexes [37].

Bartók et al. have used GPRs to construct PESs of a variety of condensed systems [38]. In this very accurate approach, the total energy is expressed as a sum of environment-dependent atomic energies and provided as a superposition of Gaussians centered at known reference points. A central aspect of the method is the use of four-dimensional spherical harmonics to provide a structural characterization which is invariant with respect to rotation and translation of the system [38–40]. Their application of GPRs has been either to model the entire PES, like for carbon, silicon, germanium, iron, and GaN [38], or to provide a correction to a DFT PES [40]. In the latter application, very accurate energies for water can be obtained by first calculating the energy at the DFT-BLYP level of theory [41,42]. The GPR model is then fitted in order to correct the one- and two-body interactions using more accurate data, e.g., from coupled cluster calculations. Thus a more computationally affordable method can produce high-accuracy results without the need to explicitly compute the energies from an unaffordable high level of theory.

Gaussian functions have also been employed by Rupp et al. [43,44] to fit the atomization energies of organic molecules. For the structural description of a wide range of molecules, a Coulomb matrix has been used. While this method does not yet represent a PES suitable for atomistic simulations, as only the global minimum structure of each molecule can be described and as no forces are available, this very promising technique certainly has the capabilities to become a serious alternative approach in the years to come. A first step in this direction has been taken very recently in a first study of conformational energies of the natural product Archazolid A [45].

2.3 Modified Shepard interpolation

In 1994, Ischtwan and Collins developed a technique to construct the PESs of small molecules based on the modified Shepard interpolation (MSI) method [46]. The basic idea is to express the energy of an atomic configuration $\mathbf{x}$ as a sum of weighted second-order Taylor expansions centered at a set of reference points known from electronic structure calculations. Typically, a set of $3N_{\rm atom} - 6$ inverse interatomic distances is used to describe the atomic configuration $\mathbf{x}$.

Then, for each reference point $i$, the energy of a configuration $\mathbf{x}$ in its close environment is approximated by a second-order Taylor series $E_i(\mathbf{x})$ as

$$E_i(\mathbf{x}) = E(\mathbf{x}^i) + \left(\mathbf{x} - \mathbf{x}^i\right)^T \mathbf{G}^i + \frac{1}{2} \left(\mathbf{x} - \mathbf{x}^i\right)^T \mathbf{H}^i \left(\mathbf{x} - \mathbf{x}^i\right) + \ldots \qquad (8)$$


where $\mathbf{G}^i$ is the gradient and $\mathbf{H}^i$ is the matrix of second derivatives, i.e., the Hessian, at the position $\mathbf{x}^i$ of the reference point.

Typically, many reference points are available, and the energy prediction for an unknown configuration can be improved significantly by employing a weighted sum of the Taylor expansions centered at all reference points:

$$E(\mathbf{x}) = \sum_{i=1}^{N_{\rm ref}} w_i(\mathbf{x}) \cdot E_i(\mathbf{x}). \qquad (9)$$

The normalized weight $w_i$ of each individual Taylor expansion is given by

$$w_i(\mathbf{x}) = \frac{v_i(\mathbf{x})}{\sum_{k=1}^{N_{\rm ref}} v_k(\mathbf{x})} \qquad (10)$$

and can be determined from the unnormalized weight function

$$v_i(\mathbf{x}) = \frac{1}{\left|\mathbf{x} - \mathbf{x}^i\right|^p}, \qquad (11)$$

which approaches zero for very distant reference points and infinity for $\mathbf{x} = \mathbf{x}^i$, corresponding to a normalized weight $w_i(\mathbf{x}) = 1$. The power parameter $p$ needs to be equal to or larger than the order of the Taylor expansion. The obtained PES has a number of favorable features. It depends only on interatomic distances, and therefore it is translationally and rotationally invariant. Further, the PES is also invariant with respect to a permutation of chemically equivalent atoms.
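Equations (8)–(11) translate almost directly into code. The following hedged sketch (hypothetical names; the special case $\mathbf{x} = \mathbf{x}^i$, where the weight diverges and the reference energy should be returned directly, is omitted for brevity) evaluates an MSI energy from a list of reference points with energies, gradients, and Hessians.

```python
import numpy as np


def msi_energy(x, references, p=4):
    """Modified Shepard interpolation, equations (8)-(11). `references` is
    a list of (x_i, E_i, G_i, H_i) tuples holding the position, energy,
    gradient, and Hessian of each reference point."""
    taylor, weights = [], []
    for x_i, E_i, G_i, H_i in references:
        d = x - x_i
        # second-order Taylor expansion around reference point i, eq. (8)
        taylor.append(E_i + d @ G_i + 0.5 * d @ H_i @ d)
        # unnormalized inverse-distance weight, eq. (11)
        weights.append(1.0 / np.linalg.norm(d)**p)
    w = np.asarray(weights)
    return (w / w.sum()) @ np.asarray(taylor)   # eqs (9) and (10)
```

Note that every prediction loops over the full reference set, which is the origin of the growing evaluation cost for large reference sets mentioned below.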

The accuracy of the method crucially depends on the choice of the reference points, as the approximation of a second-order Taylor expansion of the energy is only sufficiently accurate if there is a dense set of reference structures. It has been suggested by Ischtwan and Collins [47] to start with a reasonable first set of structures, which can be chosen along a reaction path, and to improve the quality of the PES by adding more structures in chemically relevant parts of the configuration space. These points are identified by running classical molecular dynamics trajectories employing preliminary potentials, which are then iteratively refined by adding more and more points from electronic structure calculations carried out at the configurations visited in these trajectories.

The modified Shepard interpolation scheme has been applied successfully to study chemical reaction dynamics for a number of systems like NH + H$_2$ → NH$_2$ + H [47], OH + H$_2$ → H$_2$O + H [48,49], and many others [50–53]. It has also been adapted to describe molecule-surface interactions, using the dissociation of H$_2$ at the Pt(111) surface as an example, by Crespos et al. [54,55]. To keep the dimensionality, and thus the complexity of the system, at a reasonable level, the positions of the surface atoms have been frozen, resulting in a six-dimensional PES. As the hydrogen atoms can move along the surface, an extension of equation (9) has been required to include the symmetry of the surface. The modified energy expression is

$$E(\mathbf{x}) = \sum_{i=1}^{N_{\rm ref}} \sum_{j=1}^{N_{\rm sym}} w_{ij}(\mathbf{x}) \cdot E_i(\mathbf{x}). \qquad (12)$$

The second summation, over all $N_{\rm sym}$ symmetry elements, ensures that the full symmetry of the surface is taken into account exactly, without the requirement to add structures which are equivalent by symmetry to the reference set.

The modified Shepard interpolation is very appealing because of its simplicity and the accuracy of the obtained PESs, but it also has a number of drawbacks. First, to date its applicability is limited to very low-dimensional systems, and once the PES has been constructed the system size cannot be changed, because if an atom were added, the distances to the reference points in the Taylor expansions would not be defined. Moreover, for the prediction of the energy of a new configuration, the reference set needs to be known and a summation over all Taylor expansions is required, which makes the method more demanding for large reference sets. Further, the construction of the potential requires energies, gradients, and Hessians from electronic structure calculations, which makes the determination of the reference data rather costly.

Given the similarities between Shepard interpolation and polynomial fitting in terms of applicability, it is worth comparing these methods in terms of how much data is required to achieve a fit for the CH$_5^+$ system. Huang et al. made use of 20 728 training energies to fit a polynomial surface [30]. Wu et al., however, have been able to achieve a similar fit using just 50 training examples employing MSI [50,51]. These 50 training examples, though, are not just energies but include forces and Hessians. In order to determine these Hessians, numerous energy calculations must be performed, resulting in 19 440 energy calculations – a number comparable to that of Huang et al.

2.4 Interpolating moving least squares

The standard implementation of interpolating moving least squares (IMLS) [56,57], which is a method derived from the modified Shepard interpolation, determines a configuration's energy using a number of linearly independent basis functions, which can be represented by a matrix. The energy is then given as the sum of $M$ basis functions $b_i$ of the atomic coordinates $\mathbf{x}$, where each basis function is weighted by a coefficient $a_i$:

$$E(\mathbf{x}) = \sum_{i=1}^{M} a_i(\mathbf{x})\, b_i(\mathbf{x}). \qquad (13)$$

The basis functions have either the form of a Taylor series [57] or of a many-body expansion similar to those used in high-dimensional model representation neural networks [58]. The coefficients are determined by minimization of the objective function

$$D[E(\mathbf{x})] = \sum_{i=1}^{N_{\rm ref}} w_i(\mathbf{x}) \left[ \sum_{j=1}^{M} a_j(\mathbf{x})\, b_j(\mathbf{x}) - E(\mathbf{x}^i) \right]^2, \qquad (14)$$

which is the weighted sum of squared potential-energy errors, with $w_i(\mathbf{x})$ being the weight defined by the proximity of the trial structure to a reference structure $i$.


Hence, the largest weight would be found if the trial configuration coincides with a configuration in the reference set. The method is easily expanded to include further fitting data, such as forces and the Hessians of the training points [58,59], which expands the design matrix – the matrix containing the simultaneous equations that must be solved in order to find the best combination of basis functions. The coefficients $a_j$ are determined by solving

$$\mathbf{B}^T \mathbf{W} \mathbf{B}\, \mathbf{a} = \mathbf{B}^T \mathbf{W} \mathbf{V}. \qquad (15)$$

Here $\mathbf{V}$ is the vector of reference energies and $\mathbf{a}$ is the vector containing the coefficients. $\mathbf{W}$ is an $N_{\rm ref} \times N_{\rm ref}$ diagonal matrix containing the weights, with $W_{ij} = w_i(\mathbf{x})\delta_{ij}$, and $\mathbf{B}$ is the $N_{\rm ref} \times M$ design matrix.
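A minimal sketch of the solution of equation (15) with NumPy could look as follows (hypothetical names; note that, since the weights depend on $\mathbf{x}$, this "moving" system has to be re-solved for every trial structure, and production codes exploit the diagonal structure of $\mathbf{W}$ as done here).

```python
import numpy as np


def imls_coefficients(B, V, w):
    """Solve the weighted normal equations B^T W B a = B^T W V of eq. (15).
    B: (N_ref x M) design matrix, V: N_ref reference energies,
    w: the x-dependent weights w_i(x)."""
    BtW = B.T * w                 # equivalent to B^T W for diagonal W
    return np.linalg.solve(BtW @ B, BtW @ V)
```

The predicted energy of equation (13) is then simply the dot product of the returned coefficients with the basis-function values at the trial structure.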

Initial applications of the IMLS method focused on fitting available PESs for HN$_2$ [60], HOOH [58,61], and HCN [58], before moving on to PESs computed using accurate ab initio methods, such as for HOOH, CH$_4$, and HCN [59], CH$_2$ and HCN [62], as well as the O + HCl system [63]. More recently, Dawes et al. have applied IMLS-fitted PESs to van der Waals systems, including the CO [64], NNO [65], and OCS [66] dimers, described as four-dimensional PESs, where data points have been obtained from CCSD(T) calculations. This method of data generation is suitable for IMLS, as accurate fits can be constructed from relatively small amounts of data. This level of accuracy is often required, e.g., for the prediction of rovibrational spectra.

The basis functions used by Dawes et al. take the form of numerous many-body terms that together form a high-dimensional model representation [67]. High-order terms are not included as basis functions, as they would be poorly determined in scenarios where the PES has been sparsely sampled.

In order to perform the fitting of the HOOH PES, Dawes et al. used an iteratively sampled training set, which eventually included almost 2000 reference points [58]. However, compared to Shepard interpolation models and neural networks, IMLS models are costly to evaluate. Dawes et al. showed that IMLS can be used as a procedure to create an accurate PES from a sparse set of reference data points. This IMLS PES is then employed to generate training points that are in turn used to train a Shepard interpolation model and a neural network.

Using two IMLS models, where one uses basis functions that are an order of degree higher than those used in the other model, can guide the sampling of the PES. Regions with a large error between the two models, which will occur between the current training points, will be the ideal locations to sample and in turn improve the model. In this way the manner in which the PES is sampled is similar to GPRs, where the sampling is sparse while non-stochastic in nature [59]. The final model used by Dawes et al. in this example is a modification of IMLS, a local-IMLS [68], which is similar to the Shepard interpolation scheme. Local-IMLS computes energies for a configuration by using only the reference structures that are similar, rather than using all reference configurations. However, local-IMLS is more general, since the basis functions can take any form rather than just the Taylor series expansions used in Shepard interpolation. For the HOOH PES, a similar accuracy compared to previous work was achieved using 30% fewer reference points.

Fig. 2. A small two-dimensional feed-forward neural network (NN) containing a single hidden layer. The input nodes $G_1$ and $G_2$ provide the structural information to the NN, which then yields the energy $E$. Apart from the architecture of the NN, i.e., the number of layers and the number of nodes per layer, the NN output is determined by the numerical values of the weight parameters, which are shown as arrows indicating the flow of information through the NN. The functional form of this NN is given in equation (17).

2.5 Neural network potentials

Artificial neural networks (NNs) [69,70] were first introduced in 1943 to develop mathematical models of the signal processing in the brain [71], and after continuous methodical extensions in the following decades they have now become an established research topic in mathematics and computer science, with many applications in various fields. Also in chemistry and physics they have found wide use, e.g., for the analysis of spectra and pattern recognition in experimental data [72]. The use of NNs to construct PESs employing reference data from electronic structure calculations was suggested by Blank et al. [73] almost twenty years ago, and in the meantime neural network potentials (NNPs) have been reported for a number of molecules [74–76] and molecules interacting with surfaces [77–79]. A comprehensive overview of the systems that have been addressed by NNPs can be found in several recent reviews [80–82].

The basic component of almost any NNP is a feed-forward NN, and a small example corresponding to a two-dimensional PES is shown in Figure 2. It consists of artificial neurons, or nodes, which are arranged in layers. The neurons in the input layer represent the coordinates $G_i$ defining the atomic configuration; the neuron in the output layer yields the energy of this configuration. In between the input and the output layer there are one or more so-called hidden layers, each of which contains a number of additional neurons. The purpose of


the hidden layers, which do not have a physical meaning, is to define the functional form of the NN assigning the energy to the structure. The more hidden layers and the more nodes per hidden layer, the higher is the flexibility of the NN. Haykin has stated that, in the case of an NN architecture consisting of two hidden layers, the first hidden layer captures the local features of the function being modelled, while the second hidden layer captures the global features of the function [83]. All nodes in all layers are connected to the nodes in the adjacent layers by so-called weight parameters, which are the fitting parameters of the NN. In Figure 2 they are shown as arrows indicating the flow of information through the NN. The value $y_i^j$ of a node $i$ in layer $j$ is then calculated as a linear combination of the values of the nodes in the previous layer, using the weight parameters as coefficients. The linear combination is then shifted by a bias weight $b_i^j$ acting as an adjustable offset. Finally, a non-linear function, the activation function of the NN, is applied, which provides the capability of the NN to fit any real-valued function to arbitrary accuracy. This fundamental property of NNs has been proven by several groups independently [84,85] and is the theoretical basis for the applicability of NNs to construct atomistic potentials. Accordingly, $y_i^j$ is then obtained as

$$y_i^j = f\left( b_i^j + \sum_{k=1}^{N_{j-1}} a_{ki}^{j-1,j} \cdot y_k^{j-1} \right), \qquad (16)$$

where $a_{ki}^{j-1,j}$ is the weight parameter connecting node $k$ in layer $j-1$, containing $N_{j-1}$ neurons, to node $i$ in layer $j$. In the special case of the input layer, with superscript 0, the input coordinates are represented by the vector $\mathbf{G} = \{G_i\} = \{y_i^0\}$. The complete total-energy expression of the example NN shown in Figure 2 is then given by

$$E = f\left( b_1^2 + \sum_{j=1}^{3} a_{j1}^{12} \cdot f\left( b_j^1 + \sum_{i=1}^{2} a_{ij}^{01} \cdot G_i \right) \right). \qquad (17)$$

The weight parameters of the NN are typically determined by minimizing the error of a training set of electronic structure data employing gradient-based optimization algorithms; in particular, the backpropagation algorithm [86] and the global extended Kalman filter [87] have been frequently used in the context of NNPs. The target quantity is usually the energy, but forces have also been used in some cases [88,89].

NNPs constructing the potential energy using just a single feed-forward NN have been developed for a number of systems with great success, but for a long time NNPs have been restricted to low-dimensional PESs involving only a few degrees of freedom. This limitation has a number of reasons. First of all, each additional atom in the system introduces three more degrees of freedom, which need to be provided to the NN in the input layer. Increasing the number of nodes in the NN reduces the efficiency of the NN evaluation, but it also complicates the determination of the growing number of weight parameters. Further,

conventional NNPs can only be applied to systems with a fixed chemical composition, because the number of input nodes of the NN cannot be changed once the weight parameters have been determined. If an atom were added, the weight parameters for the additional input information would not be available, while upon removing an atom the values of the corresponding input nodes would be ill-defined.

The most severe challenge of constructing NNPs is the choice of a suitable set of input coordinates. Cartesian coordinates cannot be used, because their numerical values have no direct physical meaning, and only relative atomic positions are important for the potential energy. If, for instance, a molecule were translated or rotated in space, its Cartesian coordinates would change while its internal structure, and thus its energy, are still the same. An NNP based on Cartesian coordinates, however, would predict a different energy, since its input vector has changed. Incorporating this rotational and translational invariance by a coordinate transformation onto a suitable set of functions exhibiting these properties is a significant challenge. A very simple solution appropriate for small molecules would be to use internal coordinates like interatomic distances, angles, and dihedral angles, and indeed this has been done successfully for a number of molecules, but this approach is unfeasible for large systems due to the rapidly increasing number of coordinates. Further, a full set of internal coordinates contains a lot of redundant information, and also using a full distance matrix is possible only for small systems.

Another problem that is still present even if internal coordinates are used concerns the invariance of the total energy with respect to the interchange of atoms of the same element. If, for instance, the order of the two hydrogen atoms in a free water molecule is switched, then this change is also present in the NN input vector. Since all NN input nodes are connected to the NN by numerically different weight parameters, this exchange will modify the total energy of the system, although the atomic configuration is still the same. Including this permutation symmetry of like atoms into NNPs has been a challenge since the advent of NNPs. A solution for molecular systems based on a symmetrization of internal coordinates has been suggested in a seminal paper by Gassner et al. [76], and a similar scheme has also been developed for the dissociation of molecules at surfaces [78]. Unfortunately, both methods are only applicable to very low-dimensional systems. It should be noted, however, that also low-dimensional NNPs have been applied to study large systems by representing only specific interactions by NNs. Examples are the work of Cho et al. [90], who used NNs to include polarization in the TIP4P water model [91], and the description of three-body interactions in the solvation of Al$^{3+}$ ions by Gassner et al. [76].

It has soon been recognized that further methodical extensions are required to construct high-dimensional NNPs capturing all relevant many-body effects in large systems. A first step in this direction has been made by Manzhos and Carrington [92,93] by decomposing the PESs into


interactions of increasing order, in the spirit of a many-body expansion:

$$E = \sum_{i=1}^{N_{\rm atom}} E_i + \sum_i \sum_{j>i} E_{ij} + \sum_i \sum_{j>i} \sum_{k>j} E_{ijk} + \ldots \qquad (18)$$

with the $E_i$, $E_{ij}$, $E_{ijk}$ being one-, two-, and three-body terms, and so forth.

Specifically, they have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years the efficiency could be improved by reducing the effective dimensionality employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].

First attempts to develop high-dimensional NNPs were made by Smith and coworkers already in 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters have not been determined using electronic structure reference data, and only in 2007 was the method further developed into a genuine NNP approach [99,100], using silicon as example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach the energy of the system is constructed as a sum of environment-dependent atomic energy contributions $E_i$:

$$E = \sum_i E_i. \qquad (19)$$

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6–10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems

like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary CuZnO system [110], and water clusters [111,112].
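As an illustration of this construction, the following sketch evaluates equation (19) for a toy system using a single radial symmetry function per atom, together with a cosine-type cutoff function of the kind proposed in reference [102]; the atomic NN itself is passed in as a callable, and all names are hypothetical (real implementations use many symmetry functions per atom and element-specific NNs).

```python
import numpy as np


def f_cutoff(r, r_c=6.0):
    """Smooth cutoff confining the atomic environment to a sphere of
    radius r_c (here 6 Angstrom, within the 6-10 Angstrom range above)."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)


def radial_symmetry_function(r_ij, eta=0.5):
    """One radial fingerprint of an atomic environment: a sum of Gaussians
    of all neighbor distances r_ij, damped by the cutoff function."""
    return np.sum(np.exp(-eta * r_ij**2) * f_cutoff(r_ij))


def total_energy(neighbor_distances, atomic_nn):
    """Equation (19): the total energy as a sum of atomic energies, each
    returned by an atomic NN acting on the symmetry-function value(s)."""
    return sum(atomic_nn(radial_symmetry_function(r_i))
               for r_i in neighbor_distances)
```

Because each atomic energy depends only on the environment inside the cutoff sphere, the evaluation cost scales linearly with the number of atoms, which is what makes simulations of thousands of atoms feasible.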

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113–115]. For this purpose they have developed a method to also express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116–118]. The neural network acts by taking as input the relative positions of the atoms about the atom for which the predictions are made, and then expressing the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer, and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated equally, since they all are a result of the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic because of the non-spherical nature of the atomic electron densities that are described by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change as atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava [120], symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES, the energy contribution for an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function-based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations that are very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector, and these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then in theory contain an infinite number of dimensions. The classification $a(\mathbf{x})$ of a trial point $\mathbf{x}$ is then given by the general form of a SVM:

$$a(\mathbf{x}) = \sum_{i=1}^{N_{\rm ref}} \lambda_i\, y_i\, K(\mathbf{x}^i, \mathbf{x}) - w_0. \qquad (20)$$

The sum is over the number of support vectors, which is equal to the number of reference examples. $K$ is the kernel that transforms the input $\mathbf{x}$ from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where $w_0$ would be the bias of the hidden layer. This means that $y_i$ is the classification of reference example $\mathbf{x}^i$ and $\lambda_i$ is a coefficient. The elements of $K$ are found by

$$K(\mathbf{x}^i, \mathbf{x}) = \tanh\left(k_0 + k_1 \langle \mathbf{x}^i, \mathbf{x} \rangle\right). \qquad (21)$$

$k_0$ is the bias weight of the first layer of the neural network, and $\langle \mathbf{x}^i, \mathbf{x} \rangle$ is the dot product of the two vectors and so is a measure of similarity. $K(\mathbf{x}^i, \mathbf{x})$ in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to $k_1 \langle \mathbf{x}^i, \mathbf{x} \rangle$.

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform the non-linear transformation of the kernel, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation will grow in complexity quadratically as more dimensions are added, to the point where the computation will not fit within the computer memory. A solution to this problem is the so-called “kernel trick”, which accounts for the non-linear transformation into a high-dimensional space but where the original dot-product matrix is still used. The transformation is then applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.
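A minimal sketch of equations (20) and (21) is given below (hypothetical names; the coefficients $\lambda_i$ and the offset $w_0$ are assumed to come from a previously solved SVM training problem).

```python
import numpy as np


def sigmoid_kernel(x_i, x, k0=1.0, k1=0.1):
    """The kernel of equation (21), mimicking one hidden layer of an NN."""
    return np.tanh(k0 + k1 * np.dot(x_i, x))


def svm_classify(x, support, labels, lam, w0):
    """Decision function of equation (20): a weighted sum over the kernel
    evaluations between the trial point and the reference points."""
    a = sum(l * y * sigmoid_kernel(x_i, x)
            for l, y, x_i in zip(lam, labels, support)) - w0
    return np.sign(a)
```

Swapping `sigmoid_kernel` for a Gaussian or polynomial kernel changes the implicit feature space without changing the decision function, which is exactly the flexibility of the kernel trick described above.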

The main tasks to be solved when using SVMs are finding the linear separator such that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that recasts the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, like in many other fitting methods, cross validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

$$\mathbf{w} \cdot \mathbf{X} - b = 0, \qquad (22)$$

where $\mathbf{X}$ is a set of reference points that define the linear separator and $\mathbf{w}$ is the normal vector to the linear separator. Furthermore, the margins of the separation can be defined by

$$\mathbf{w} \cdot \mathbf{X} - b = -1 \qquad (23)$$

and

$$\mathbf{w} \cdot \mathbf{X} - b = 1, \qquad (24)$$

as shown in Figure 3. The width of the separation is given by $2/\|\mathbf{w}\|$. The objective of a SVM is then to maximize this separation. The constraint of the linear separation can be rewritten as

$$y_i \left( \mathbf{w} \cdot \mathbf{x}^i - b \right) \ge 1, \qquad (25)$$

where $\mathbf{x}^i$ is a reference training point that must be correctly classified and $y_i$ is the classification function, i.e., the output is then either 1 or −1. This can be modified to allow for soft margins, and so allow for a degree of misclassification:

$$y_i \left( \mathbf{w} \cdot \mathbf{x}^i - b \right) \ge 1 - \xi_i. \qquad (26)$$


Fig. 3. In the initial two-dimensional space, the data points (stars) fall into two groups – those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible. Those within the circle have classification ‘1’ and those outside have classification ‘−1’. The linear separation is defined by a vector with a gradient of $\mathbf{w}$ and an intercept of $b$, and is found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

$\xi_i$ is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function $J$:

$$\min J(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2} \mathbf{w}^T \mathbf{w} + c \sum_{i=1}^{N_{\rm ref}} \xi_i, \qquad (27)$$

where $c$ is a parameter modulating the influence of the slack function.

For regression purposes, like the construction of PESs, the objective is the reverse problem, where the reference data points should lie as close as possible to a linear separator. This is the basis of least-squares support vector machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

$$\min J(\mathbf{w}, b, \mathbf{e}) = \frac{\nu}{2} \mathbf{w}^T \mathbf{w} + \frac{\zeta}{2} \sum_{i=1}^{N_{\rm ref}} e_i^2. \qquad (28)$$

The slack values are now replaced by the sum of squared errors $e_i$ of the reference points, and $\nu$ and $\zeta$ are parameters that determine the smoothness of the fit. SVMs and GPRs are similar in that they can both be adjusted by the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of functions can adjust the sensitivity of the fitting method [128].
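For completeness, a hedged sketch of how such a regression model can be fitted: in the standard dual formulation of least-squares SVMs, which is consistent with equation (28) up to the ratio of $\nu$ and $\zeta$, the coefficients follow from a single linear system. The names and the `gamma` parameter below are illustrative.

```python
import numpy as np


def lssvm_fit(K, y, gamma=10.0):
    """Solve the LS-SVM dual linear system for the kernel matrix K of the
    reference points and their target values y; gamma plays the role of
    the regularization ratio zeta/nu in equation (28)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                    # constraint row: sum of the alphas
    A[1:, 0] = 1.0                    # bias column
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return b, alpha   # prediction: E(x) = sum_i alpha_i K(x_i, x) + b
```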

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometry, containing carbon, hydrogen, oxygen, nitrogen, and fluorine. Therefore, while not describing the PES for arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

Fig. 4. The two parent genetic programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an ‘×’. At the ends of the branches are the input nodes I$_1$–I$_4$, which provide information about the system. At each node on the way up the tree, an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output: the energy prediction. In this manner, complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, with the tree structure having been


extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e., a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they can provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
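A minimal sketch of the tree representation and crossover of Figure 4 is shown below (hypothetical names; real GP codes use richer function sets, depth-aware subtree selection, and mutation operators).

```python
import random

# Internal nodes are ['op', left, right] lists; leaves are integer indices
# into the input vector (the nodes I1-I4 of Fig. 4).
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}


def evaluate(tree, inputs):
    """Evaluate a GP tree bottom-up, as in Figure 4."""
    if isinstance(tree, int):              # leaf: an input node I_k
        return inputs[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))


def crossover(p1, p2):
    """Minimal subtree exchange: swap one randomly chosen branch of each
    parent root, producing two daughters of possibly different depth."""
    d1, d2 = list(p1), list(p2)
    i, j = random.choice([1, 2]), random.choice([1, 2])
    d1[i], d2[j] = p2[j], p1[i]
    return d1, d2


# parent = ['+', ['*', 0, 1], 2]           # represents I1*I2 + I3
# print(evaluate(parent, [1.0, 2.0, 3.0])) # -> 5.0
```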

Genetic programming has been applied to simple PESs such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined, and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher-level GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP where the GP population is ranked based upon its fitness [133]. This allows the training procedure to retain individuals so that, while convergence is achieved, it is not at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are “non-physical”, i.e., they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS, and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and

permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, like in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope to find the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best-performing members of the population are then used to generate the next “daughter population”. This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two “parent” bit strings, two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a “mutation”. A mutation is where a random bit in the daughter bit string is switched from 0 to 1, or vice versa. The final stage is to combine the daughter and parent populations and to remove the poorly performing bit strings from the combined population. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as searching the conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
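The three basic GA operations described above – crossover, mutation, and the decoding of a bit string into a parameter confined to its thresholds – are simple to sketch (hypothetical names; real applications add fitness evaluation, selection, and convergence checks):

```python
import random


def crossover(parent1, parent2):
    """Single-point crossover: the two daughter bit strings inherit
    complementary segments of the two parents."""
    cut = random.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


def mutate(bits, rate=0.01):
    """Flip each bit with a small probability."""
    return [b ^ 1 if random.random() < rate else b for b in bits]


def decode(bits, low, high):
    """Map a segment of the bit string onto a parameter value confined
    to its allowed interval [low, high]."""
    n = int(''.join(map(str, bits)), 2)
    return low + (high - low) * n / (2**len(bits) - 1)
```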

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar$_2$, to both ab initio energies and to spectral data [141]. Similarly, in the work


of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H$_2^+$

and Li$_2$. Xu and Liu use a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution, and that this solution represents the optimization of the objective function to a local minimum. Therefore, there is the need to rerun the fitting procedure from different starting points, i.e., different positions on the PES fitting error surface, in order to not get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH$_3$NO. In the work of Handley and Deeth, the aim has been to parameterize the ligand field molecular mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these “next-generation potentials” has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-

surface interactions, e.g., polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time until ideas spread and concepts from one approach are used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays off quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time until more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffman, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartók, M.J. Gillan, F.R. Manby, G. Csányi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79, 926 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martoňák, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks (IJCNN) (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)


construction of the PES as a linear combination of polynomials [26]

\[
E(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}) = \sum_{a+b+c+d+e+f=0}^{n_\text{max}} C_{abcdef}\, Y_{12}^{a} Y_{13}^{b} Y_{14}^{c} Y_{23}^{d} Y_{24}^{e} Y_{34}^{f} \tag{2}
\]

with coefficients C_{abcdef} does not exhibit this invariance. Here a, b, c, d, e, and f are non-negative integers with a sum between zero and the maximum order n_max of the polynomials to be considered. Typically the largest exponents used are five or six.

In order to obtain a permutationally invariant basis set, the monomials are symmetrized employing a symmetry operator S:

\[
E(Y_{12}, Y_{13}, Y_{14}, Y_{23}, Y_{24}, Y_{34}) = \sum_{a+b+c+d+e+f=0}^{n_\text{max}} D_{abcdef}\, S\!\left[Y_{12}^{a} Y_{13}^{b} Y_{14}^{c} Y_{23}^{d} Y_{24}^{e} Y_{34}^{f}\right] \tag{3}
\]

which yields the same basis functions irrespective of the order of chemically equivalent atoms. All coefficients D_{abcdef} of the symmetrized permutationally invariant polynomials are fitted simultaneously, using least-squares fitting procedures, to a database of high-level ab initio calculations, which typically contains several tens of thousands of data points. The specific form of the symmetrized terms, whose number can be substantial, depends on the chemical composition of the molecule, and to date the method has been applied to systems containing up to 10 atoms. More details about the symmetrization procedure, for which efficient methods are available [26], and the resulting terms can be found elsewhere [27].
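To make the symmetrization and fitting steps concrete, the following minimal Python sketch builds symmetrized monomials for a system of four like atoms by brute-force averaging over all atom permutations and then fits the coefficients by linear least squares. It is only an illustration of equations (2) and (3): the variable names, the random placeholder data, and the low maximum order are our own choices, and production codes [26] use far more efficient symmetrization schemes and deduplicate equivalent terms.

```python
import itertools
import numpy as np

# Pair indices of a four-atom system: (0,1), (0,2), ..., (2,3)
PAIRS = list(itertools.combinations(range(4), 2))

def symmetrized_monomial(y, exponents):
    """Symmetry operator S of eq. (3): average the monomial
    y12^a y13^b ... y34^f over all permutations of the four like atoms."""
    total = 0.0
    for perm in itertools.permutations(range(4)):
        term = 1.0
        for (i, j), e in zip(PAIRS, exponents):
            pi, pj = sorted((perm[i], perm[j]))   # relabelled pair
            term *= y[PAIRS.index((pi, pj))] ** e
        total += term
    return total / 24.0                            # 4! permutations

def basis(y, nmax=3):
    """All symmetrized monomials with total degree <= nmax, cf. eq. (2)."""
    return np.array([symmetrized_monomial(y, exps)
                     for exps in itertools.product(range(nmax + 1), repeat=6)
                     if sum(exps) <= nmax])

# Least-squares fit of the coefficients D_abcdef to reference energies.
# Rows of Y are the six pair variables (e.g. Morse variables) per structure.
Y = np.random.rand(200, 6)                 # placeholder reference geometries
E = np.random.rand(200)                    # placeholder reference energies
B = np.array([basis(y) for y in Y])        # design matrix
D, *_ = np.linalg.lstsq(B, E, rcond=None)  # fitted coefficients
```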

Although the complexity of the method currently re-quires a truncation of the order of the polynomials at com-parably low order and restricts the formal dimensionalityto that of small molecules the resulting potentials can stillbe applied to large systems like bulk water using two- andthree-body terms only [28] This is possible since in molec-ular systems high-order many-body interactions are usu-ally not essential Further examples for successful applica-tions of the method are various charged and neutral waterclusters [262930] the vinyl radical [31] and many othersThus while the construction of PESs using polynomialsis a powerful technique for the investigation of dynam-ics reactions dissociation and conformational changes forsmall molecules or systems composed of these there hasbeen no work towards implementing the technique withrespect to the creation of accurate potentials that can beused to model systems like metals and semiconductors

2.2 Gaussian process regression

Gaussian process regression (GPR) [32], also known as Kriging, is a method that models observables as a realization of an underlying statistical probabilistic model, the probability density function. The basic concept is that training examples with correlated observables, i.e., energies, also have a close correlation in their variables, i.e., the coordinates describing the atomic configuration. The assumption then is that the function which models the non-linear relationship between the variables and the observables, the PES, is smoothly varying. The underlying statistical model, the probability density function, determines the probabilities that for a given set of inputs a particular output is observed.

Since the output is known for a number of configurations from reference electronic structure calculations, the reverse question can be addressed: can the probability density function be found that best models the PES for a given number of structures and their associated energies, and takes into account the reliability of predictions in the regions near to the known reference points?

All reference data for the GPR model come from a deterministic function which operates on a set of input coordinates, where reference point i has coordinates x^i = (x^i_1, ..., x^i_{N_dim}) and N_dim is the number of dimensions or degrees of freedom of the system:

\[
E(\mathbf{x}^i) = z(\mathbf{x}^i) + \sum_h \beta_h f_h(\mathbf{x}^i) \tag{4}
\]

The f_h are some linear or non-linear functions of x, and the β_h are coefficients, where h runs over all functions that define the PES. It then follows that, because the reference data come from a deterministic source, the "errors" are not random but repeatable for each configuration. These errors can then be described as a further energy function whose values are smoothly varying and correlated. The error within the reference data, z(x^i), can be modelled by Gaussian functions, and sampled points that are similar in the input space have similar errors. Thus the energy for a data point is the sum of two varying functions, the error noise and the deterministic source function. Consequently the energy prediction can be reformulated as

\[
E(\mathbf{x}^i) = z(\mathbf{x}^i) + \mu \tag{5}
\]

The error is the deviation z(x^i) of the function value from the global average μ, which is provided by the deterministic source and also includes the error due to noise in the sampled data. This is now expressed by the GPR. The required correlation is constructed from Gaussians, one for each of the known reference structures. The energy prediction then relies on these Gaussian functions, where the contribution by each function depends upon the correlation between the reference configurations and the trial configuration. Given that two similar input vectors x^i and x^j have comparable errors, this similarity can be represented by a correlation matrix C with elements

\[
C_{ij} = \mathrm{Corr}\left(z(\mathbf{x}^i), z(\mathbf{x}^j)\right) = \exp\left(-\sum_{h=1}^{N_\text{dim}} \theta_h \left|x_h^i - x_h^j\right|^{p_h}\right) \tag{6}
\]

θ_h and p_h are two parameters of the model, with θ_h ≥ 0 and 1 ≤ p_h ≤ 2. Fitting the model then relies on (a) the determination of the variance σ² of the Gaussians and (b) the fitting of the parameter vectors θ and p.


Fig. 1. A one-dimensional interpolation by a Gaussian process regression (GPR). The grey region represents the uncertainty of the model for a given input x, which is small near or at known reference points (black dots). The solid curve is the GPR interpolation of the PES. The likelihood is then low in those grey regions with large widths, and so sampling of new training points is directed towards these configurations.

Then σ², θ, and p are found by the maximization of the likelihood function L, with these parameters modulating how correlated the Gaussian process values are over the coordinate space. The likelihood function describes the probability that the parameters σ², θ, and p determine the PES, and with the vector ε containing all N_ref reference energies, ε = [E(x^i), i = 1, 2, ..., N_ref]^T, we have

\[
L\left(\boldsymbol{\theta}, \mathbf{p}, \sigma^2\right) = \frac{1}{(2\pi)^{N_\text{ref}/2} \left(\sigma^2\right)^{N_\text{ref}/2} |\mathbf{C}|^{1/2}} \exp\left[-\frac{(\boldsymbol{\varepsilon} - \mathbf{I}\mu)^T \mathbf{C}^{-1} (\boldsymbol{\varepsilon} - \mathbf{I}\mu)}{2\sigma^2}\right] \tag{7}
\]

I is a column vector of 1's, and σ² can be expressed with respect to θ and p, given ∂ log L / ∂(σ²) = 0. Thus the likelihood function is maximized with respect to θ and p, and typically it is the log of the likelihood function that is maximized. In most applications the elements of p are set to 2, further simplifying the fitting of the likelihood parameters.

New reference structures are added to the data set by determining the likelihood function for unseen examples. Those with the worst likelihood are then used as new reference points. The GPR is iteratively improved by adding more examples to the reference set, with the aim to improve the likelihood function for all reference and unseen structures. Ideally a model should yield high likelihood function values for inputs within the data set, but should be constructed using the least amount of Gaussians, in order to ensure simplicity and optimum generalization properties when applied to new configurations. This is particularly useful if the amount of data is limited, such as in the case of using high-level electronic structure methods to generate the reference data (Fig. 1).
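As an illustration of how equations (5) and (6) turn into a working predictor, the short Python sketch below implements a simple Kriging model with the Gaussian correlation of equation (6) for fixed p_h = 2 and fixed θ, i.e., without the likelihood maximization of equation (7). The toy one-dimensional "PES", the hyperparameter values, and the constant-mean estimator are our own illustrative assumptions.

```python
import numpy as np

def corr_matrix(X1, X2, theta):
    """Gaussian correlation of eq. (6) with all p_h = 2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2 * theta).sum(axis=-1)
    return np.exp(-d2)

def kriging_predict(X, E, Xnew, theta, nugget=1e-10):
    """Constant-mean Kriging: predict mean and variance at Xnew."""
    C = corr_matrix(X, X, theta) + nugget * np.eye(len(X))
    Cinv = np.linalg.inv(C)
    one = np.ones(len(X))
    mu = one @ Cinv @ E / (one @ Cinv @ one)   # global average mu, eq. (5)
    r = corr_matrix(Xnew, X, theta)            # trial-reference correlations
    mean = mu + r @ Cinv @ (E - mu)
    s2 = (E - mu) @ Cinv @ (E - mu) / len(X)   # process variance estimate
    var = s2 * (1.0 - np.einsum('ij,jk,ik->i', r, Cinv, r))
    return mean, var

# toy one-dimensional "PES": E(x) = x sin(x), eight reference points
X = np.linspace(0.0, 10.0, 8)[:, None]
E = (X * np.sin(X)).ravel()
Xnew = np.linspace(0.0, 10.0, 200)[:, None]
mean, var = kriging_predict(X, E, Xnew, theta=np.array([1.0]))
```

The predicted variance plays the role of the grey uncertainty band in Figure 1: new training points would be placed where it is largest.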

GPRs have already been shown to be applicable to similar types of problems as neural networks, and specifically in the work of the Popelier group GPRs have been compared to neural networks for modelling environment-dependent atom-centered electrostatic multipole moments [33]. It has been found that in this application GPRs can be even superior to neural networks, with almost a 50% decrease in the electrostatic interaction energy error for the pentamer water cluster [33]. In the following years the use of GPRs in the Popelier group has been expanded and used for the prediction of the electrostatic multipole moments for ethanol [34] and alanine [35], as well as the simulation of hydrated sodium ions [36] and the prediction of hydrogen-bonded complexes [37].

Bartók et al. have used GPRs to construct PESs of a variety of condensed systems [38]. In this very accurate approach the total energy is expressed as a sum of environment-dependent atomic energies and provided as a superposition of Gaussians centered at known reference points. A central aspect of the method is the use of four-dimensional spherical harmonics to provide a structural characterization which is invariant with respect to rotation and translation of the system [38–40]. Their application of GPRs has been either to model the entire PES, like for carbon, silicon, germanium, iron, and GaN [38], or to provide a correction to a DFT PES [40]. In the latter application very accurate energies for water can be obtained by first calculating the energy at the DFT-BLYP level of theory [41,42]. The GPR model is then fitted in order to correct the one- and two-body interactions using more accurate data, e.g., from coupled cluster calculations. Thus a more computationally affordable method can produce high-accuracy results without the need to explicitly compute the energies at an unaffordable high level of theory.

Gaussian functions have also been employed by Rupp et al. [43,44] to fit the atomization energies of organic molecules. For the structural description of a wide range of molecules a Coulomb matrix has been used. While this method does not yet represent a PES suitable for atomistic simulations, as only the global minimum structure of each molecule can be described and as no forces are available, this very promising technique certainly has the capabilities to become a serious alternative approach in the years to come. A first step in this direction has been taken very recently in a first study of conformational energies of the natural product Archazolid A [45].

2.3 Modified Shepard interpolation

In 1994 Ischtwan and Collins developed a technique to construct the PESs of small molecules based on the modified Shepard interpolation (MSI) method [46]. The basic idea is to express the energy of an atomic configuration x as a sum of weighted second-order Taylor expansions centered at a set of reference points known from electronic structure calculations. Typically a set of 3N_atom − 6 inverse interatomic distances is used to describe the atomic configuration x.

Then, for each reference point i, the energy of a configuration x in its close environment is approximated by a second-order Taylor series E_i(x) as

\[
E_i(\mathbf{x}) = E(\mathbf{x}^i) + (\mathbf{x} - \mathbf{x}^i)^T \mathbf{G}^i + \frac{1}{2} (\mathbf{x} - \mathbf{x}^i)^T \mathbf{H}^i (\mathbf{x} - \mathbf{x}^i) + \ldots \tag{8}
\]


where G^i is the gradient and H^i is the matrix of second derivatives, i.e., the Hessian, at the position x^i of the reference point.

Typically many reference points are available, and the energy prediction for an unknown configuration can be improved significantly by employing a weighted sum of the Taylor expansions centered at all reference points:

\[
E(\mathbf{x}) = \sum_{i=1}^{N_\text{ref}} w_i(\mathbf{x}) \cdot E_i(\mathbf{x}) \tag{9}
\]

The normalized weight w_i of each individual Taylor expansion is given by

\[
w_i(\mathbf{x}) = \frac{v_i(\mathbf{x})}{\sum_{k=1}^{N_\text{ref}} v_k(\mathbf{x})} \tag{10}
\]

and can be determined from the unnormalized weight function

\[
v_i(\mathbf{x}) = \frac{1}{|\mathbf{x} - \mathbf{x}^i|^p} \tag{11}
\]

which approaches zero for very distant reference points and infinity for x = x^i, corresponding to a normalized weight w_i(x) = 1. The power parameter p needs to be equal to or larger than the order of the Taylor expansion. The obtained PES has a number of favorable features. It depends only on interatomic distances, and therefore it is translationally and rotationally invariant. Further, the PES is also invariant with respect to a permutation of chemically equivalent atoms.
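The interpolation of equations (8)–(11) is compact enough to state directly in code. The following Python sketch evaluates the weighted Taylor-expansion sum for a trial configuration; the reference data layout, the choice p = 4, and the small guard against division by zero at a reference point are our own illustrative choices.

```python
import numpy as np

def shepard_energy(x, refs, p=4):
    """Modified Shepard interpolation, eqs. (8)-(11).
    refs: list of tuples (x_i, E_i, G_i, H_i) holding the position,
    energy, gradient, and Hessian of each reference point."""
    num, den = 0.0, 0.0
    for xi, Ei, Gi, Hi in refs:
        dx = x - xi
        Ti = Ei + Gi @ dx + 0.5 * dx @ Hi @ dx         # Taylor series, eq. (8)
        vi = 1.0 / (np.linalg.norm(dx) ** p + 1e-300)  # weight, eq. (11)
        num += vi * Ti                                  # sums of eqs. (9)-(10)
        den += vi
    return num / den

# toy one-dimensional test on E(x) = x^2 (gradient 2x, Hessian 2):
refs = [(np.array([x0]), x0 ** 2, np.array([2 * x0]), np.array([[2.0]]))
        for x0 in (-1.0, 0.5, 2.0)]
print(shepard_energy(np.array([1.0]), refs))  # ~1.0, exact for a quadratic
```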

The accuracy of the method crucially depends on the choice of the reference points, as the approximation of a second-order Taylor expansion of the energy is only sufficiently accurate if there is a dense set of reference structures. It has been suggested by Ischtwan and Collins [47] to start with a reasonable first set of structures, which can be chosen along a reaction path, and to improve the quality of the PES by adding more structures in chemically relevant parts of the configuration space. These points are identified by running classical molecular dynamics trajectories employing preliminary potentials, which are then iteratively refined by adding more and more points from electronic structure calculations carried out at the configurations visited in these trajectories.

The modified Shepard interpolation scheme has been applied successfully to study chemical reaction dynamics for a number of systems, like NH + H2 → NH2 + H [47], OH + H2 → H2O + H [48,49], and many others [50–53]. It has also been adapted to describe molecule-surface interactions, using the dissociation of H2 at the Pt(111) surface as an example, by Crespos et al. [54,55]. To keep the dimensionality, and thus the complexity, of the system at a reasonable level, the positions of the surface atoms have been frozen, resulting in a six-dimensional PES. As the hydrogen atoms can move along the surface, an extension of equation (9) has been required to include the symmetry of the surface. The modified energy expression is

\[
E(\mathbf{x}) = \sum_{i=1}^{N_\text{ref}} \sum_{j=1}^{N_\text{sym}} w_{ij}(\mathbf{x}) \cdot E_i(\mathbf{x}) \tag{12}
\]

The second summation, over all N_sym symmetry elements, ensures that the full symmetry of the surface is taken into account exactly, without the requirement to add structures which are equivalent by symmetry into the reference set.

The modified Shepard interpolation is very appealing because of its simplicity and the accuracy of the obtained PESs, but it also has a number of drawbacks. First, to date its applicability is limited to very low-dimensional systems, and once the PES has been constructed the system size cannot be changed, because if an atom were added the distance to the reference points in the Taylor expansions would not be defined. Moreover, for the prediction of the energy of a new configuration the reference set needs to be known, and a summation over all Taylor expansions is required, which makes the method more demanding for large reference sets. Further, the construction of the potential requires energies, gradients, and Hessians from electronic structure calculations, which makes the determination of the reference data rather costly.

Given the similarities between Shepard interpolation and polynomial fitting in terms of applicability, it is worth comparing these methods in terms of how much data is required to achieve a fit for the CH5+ system. Huang et al. made use of 20 728 training energies to fit a polynomial surface [30]. Wu et al., in contrast, have been able to achieve a similar fit using just 50 training examples employing MSI [50,51]. However, these 50 training examples are not just energies but include forces and Hessians. In order to determine these Hessians, numerous energy calculations must be performed, resulting in 19 440 energy calculations, a number comparable to that of Huang et al.

2.4 Interpolating moving least squares

The standard implementation of interpolating moving least squares (IMLS) [56,57], a method derived from the modified Shepard interpolation, determines a configuration's energy using a number of linearly independent basis functions, which can be represented by a matrix. The energy is then given as the sum of M basis functions b_i of the atomic coordinates x, where each basis function is weighted by a coefficient a_i:

\[
E(\mathbf{x}) = \sum_{i=1}^{M} a_i(\mathbf{x}) b_i(\mathbf{x}) \tag{13}
\]

The basis functions have either the form of a Taylor series [57] or a many-body expansion similar to those used in high-dimensional model representation neural networks [58]. The coefficients are determined by minimization of the objective function

\[
D[E(\mathbf{x})] = \sum_{i=1}^{N_\text{ref}} w_i(\mathbf{x}) \left[ \sum_{j=1}^{M} a_j(\mathbf{x}) b_j(\mathbf{x}^i) - E(\mathbf{x}^i) \right]^2 \tag{14}
\]

which is the weighted sum of squared potential-energy errors, with w_i(x) being the weight defined by the proximity of the trial structure to a reference structure i.


Hence the largest weight would be found if the trial configuration coincides with a configuration in the reference set. The method is easily expanded to include further fitting data, such as forces and the Hessians of the training points [58,59], which expands the design matrix, the matrix containing the simultaneous equations that must be solved in order to find the best combination of basis functions. The coefficients a_j are determined by solving

\[
\mathbf{B}^T \mathbf{W} \mathbf{B} \mathbf{a} = \mathbf{B}^T \mathbf{W} \mathbf{V} \tag{15}
\]

Here V is the vector of reference energies and a is the vector containing the coefficients. W is an N_ref × N_ref diagonal matrix containing the weights, where W_ij = w_i(x) δ_ij, and B is the N_ref × M design matrix.
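A direct transcription of equations (13)–(15) into Python looks as follows. The sketch solves the weighted normal equations at each trial point; the simple inverse-distance-power weight, the one-dimensional quadratic basis, and the toy sine data are our own illustrative assumptions, not the basis sets used by Dawes et al.

```python
import numpy as np

def imls_energy(x, X_ref, E_ref, basis, p=4):
    """Interpolating moving least squares, eqs. (13)-(15):
    solve B^T W B a = B^T W V at the trial point x, then E = a . b(x)."""
    B = np.array([basis(xi) for xi in X_ref])            # design matrix B
    w = 1.0 / (np.linalg.norm(X_ref - x, axis=1) ** p + 1e-300)
    W = np.diag(w)                                       # diagonal weights W
    a = np.linalg.solve(B.T @ W @ B, B.T @ W @ E_ref)    # eq. (15)
    return a @ basis(x)                                  # eq. (13)

# quadratic basis in one dimension as a simple illustrative choice
basis = lambda x: np.array([1.0, x[0], x[0] ** 2])
X_ref = np.linspace(-2.0, 2.0, 9)[:, None]
E_ref = np.sin(X_ref).ravel()
print(imls_energy(np.array([0.3]), X_ref, E_ref, basis))
```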

Initial applications of the IMLS method focused on fitting available PESs for HN2 [60], HOOH [58,61], and HCN [58], before moving on to PESs computed using accurate ab initio methods, such as for HOOH, CH4, and HCN [59], CH2 and HCN [62], as well as the O + HCl system [63]. More recently Dawes et al. have applied IMLS-fitted PESs to van der Waals systems, including the CO [64], NNO [65], and OCS [66] dimers, described as four-dimensional PESs, where data points have been obtained from CCSD(T) calculations. This method of data generation is suitable for IMLS, as accurate fits can be constructed from relatively small amounts of data. This level of accuracy is often required, e.g., for the prediction of rovibrational spectra.

The basis functions used by Dawes et al. take the form of numerous many-body terms that together form a high-dimensional model representation [67]. High-order terms are not included as basis functions, as they would be poorly determined in scenarios where the PES has been sparsely sampled.

In order to perform the fitting of the HOOH PES, Dawes et al. used an iteratively sampled training set, which eventually included almost 2000 reference points [58]. However, compared to Shepard interpolation models and neural networks, IMLS models are costly to evaluate. Dawes et al. showed that IMLS can be used as a procedure to create an accurate PES from a sparse set of reference data points. This IMLS PES is then employed to generate training points that are in turn used to train a Shepard interpolation model and a neural network.

Using two IMLS models, where one uses basis functions that are an order of degree higher than those used in the other model, can guide the sampling of the PES. Regions with a large error between the two models, which will occur between the current training points, will be the ideal locations to sample and in turn improve the model. In this way the manner in which the PES is sampled is similar to GPRs, where the sampling is sparse while non-stochastic in nature [59]. The final model used by Dawes et al. in this example is a modification of IMLS, a local IMLS [68], which is similar to the Shepard interpolation scheme. Local IMLS computes energies for a configuration by using only the reference structures that are similar, rather than all reference configurations. However, local IMLS is more general, since the basis functions can take any form, rather than just the Taylor series expansions used in Shepard interpolation. For the HOOH PES a similar accuracy compared to previous work was achieved using 30% fewer reference points.

Fig. 2. A small two-dimensional feed-forward neural network (NN) containing a single hidden layer. The input nodes G1 and G2 provide the structural information to the NN, which then yields the energy E. Apart from the architecture of the NN, i.e., the number of layers and the number of nodes per layer, the NN output is determined by the numerical values of the weight parameters, which are shown as arrows indicating the flow of information through the NN. The functional form of this NN is given in equation (17).

2.5 Neural network potentials

Artificial neural networks (NNs) [69,70] were first introduced in 1943 to develop mathematical models of the signal processing in the brain [71], and after continuous methodical extensions in the following decades they have now become an established research topic in mathematics and computer science, with many applications in various fields. Also in chemistry and physics they have found wide use, e.g., for the analysis of spectra and pattern recognition in experimental data [72]. The use of NNs to construct PESs employing reference data from electronic structure calculations was suggested by Blank et al. [73] almost twenty years ago, and in the meantime neural network potentials (NNPs) have been reported for a number of molecules [74–76] and molecules interacting with surfaces [77–79]. A comprehensive overview of the systems that have been addressed by NNPs can be found in several recent reviews [80–82].

The basic component of almost any NNP is a feed-forward NN, and a small example corresponding to a two-dimensional PES is shown in Figure 2. It consists of artificial neurons or nodes, which are arranged in layers. The neurons in the input layer represent the coordinates G_i defining the atomic configuration; the neuron in the output layer yields the energy of this configuration. In between the input and the output layer there are one or more so-called hidden layers, each of which contains a number of additional neurons. The purpose of the hidden layers, which do not have a physical meaning, is to define the functional form of the NN assigning the energy to the structure. The more hidden layers and the more nodes per hidden layer, the higher is the flexibility of the NN. Haykin has stated that, in the case of a NN architecture consisting of two hidden layers, the first hidden layer captures the local features of the function being modelled, while the second hidden layer captures the global features [83]. All nodes in all layers are connected to the nodes in the adjacent layers by so-called weight parameters, which are the fitting parameters of the NN. In Figure 2 they are shown as arrows indicating the flow of information through the NN. The value y^j_i of a node i in layer j is then calculated as a linear combination of the values of the nodes in the previous layer, using the weight parameters as coefficients. The linear combination is then shifted by a bias weight b^j_i acting as an adjustable offset. Finally, a non-linear function, the activation function of the NN, is applied, which provides the capability of the NN to fit any real-valued function to arbitrary accuracy. This fundamental property of NNs has been proven by several groups independently [84,85] and is the theoretical basis for the applicability of NNs to construct atomistic potentials. Accordingly, y^j_i is obtained as

\[
y_i^j = f\left(b_i^j + \sum_{k=1}^{N_{j-1}} a_{ki}^{j-1,j} \cdot y_k^{j-1}\right) \tag{16}
\]

where a_{ki}^{j-1,j} is the weight parameter connecting node k in layer j−1, containing N_{j−1} neurons, to node i in layer j. In the special case of the input layer, carrying superscript 0, the input coordinates are represented by a vector G = {G_i} = {y_i^0}. The complete total energy expression of the example NN shown in Figure 2 is then given by

\[
E = f\left(b_1^2 + \sum_{j=1}^{3} a_{j1}^{12} \cdot f\left(b_j^1 + \sum_{i=1}^{2} a_{ij}^{01} \cdot G_i\right)\right) \tag{17}
\]

The weight parameters of the NN are typically determined by minimizing the error of a training set of electronic structure data employing gradient-based optimization algorithms; in particular the backpropagation algorithm [86] and the global extended Kalman filter [87] have been frequently used in the context of NNPs. The target quantity is usually the energy, but forces have also been used in some cases [88,89].
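For readers who prefer code to index notation, the following Python snippet evaluates the two-input, three-hidden-node network of Figure 2 exactly as written in equation (17), using the hyperbolic tangent as the activation function f. The random weight values are placeholders; in a real NNP they would come from the training procedure described above.

```python
import numpy as np

def nn_energy(G, a01, b1, a12, b2):
    """Feed-forward NN of Fig. 2, eq. (17): two input nodes G1, G2,
    one hidden layer with three nodes, tanh activation f."""
    y1 = np.tanh(b1 + a01.T @ G)    # hidden-layer node values, eq. (16)
    return np.tanh(b2 + a12 @ y1)   # output node: the energy E

rng = np.random.default_rng(0)
a01 = rng.normal(size=(2, 3))       # input->hidden weights a^{01}_{ij}
b1 = rng.normal(size=3)             # hidden bias weights b^1_j
a12 = rng.normal(size=3)            # hidden->output weights a^{12}_{j1}
b2 = rng.normal()                   # output bias weight b^2_1
print(nn_energy(np.array([0.5, -1.2]), a01, b1, a12, b2))
```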

NNPs constructing the potential energy using just a single feed-forward NN have been developed for a number of systems with great success, but for a long time NNPs have been restricted to low-dimensional PESs involving only a few degrees of freedom. This limitation has a number of reasons. First of all, each additional atom in the system introduces three more degrees of freedom, which need to be provided to the NN in the input layer. Increasing the number of nodes in the NN reduces the efficiency of the NN evaluation, but it also complicates the determination of the growing number of weight parameters. Further, conventional NNPs can only be applied to systems with a fixed chemical composition, because the number of input nodes of the NN cannot be changed once the weight parameters have been determined. If an atom were added, the weight parameters for the additional input information would not be available, while upon removing an atom the values of the corresponding input nodes would be ill-defined.

The most severe challenge of constructing NNPs is the choice of a suitable set of input coordinates. Cartesian coordinates cannot be used, because their numerical values have no direct physical meaning, and only relative atomic positions are important for the potential energy. If, for instance, a molecule were translated or rotated in space, its Cartesian coordinates would change while its internal structure, and thus its energy, would still be the same. An NNP based on Cartesian coordinates, however, would predict a different energy, since its input vector has changed. Incorporating this rotational and translational invariance by a coordinate transformation onto a suitable set of functions exhibiting these properties is a significant challenge. A very simple solution appropriate for small molecules would be to use internal coordinates like interatomic distances, angles, and dihedral angles, and indeed this has been done successfully for a number of molecules, but this approach is unfeasible for large systems due to the rapidly increasing number of coordinates. Further, a full set of internal coordinates contains a lot of redundant information, and also using a full distance matrix is possible only for small systems.

Another problem, which is still present even if internal coordinates are used, concerns the invariance of the total energy with respect to the interchange of atoms of the same element. If, for instance, the order of the two hydrogen atoms in a free water molecule is switched, then this change is also present in the NN input vector. Since all NN input nodes are connected to the NN by numerically different weight parameters, this exchange will modify the total energy of the system, although the atomic configuration is still the same. Including this permutation symmetry of like atoms in NNPs has been a challenge since their advent. A solution for molecular systems, based on a symmetrization of internal coordinates, has been suggested in a seminal paper by Gassner et al. [76], and a similar scheme has also been developed for the dissociation of molecules at surfaces [78]. Unfortunately, both methods are only applicable to very low-dimensional systems. It should be noted, however, that also low-dimensional NNPs have been applied to study large systems, by representing only specific interactions by NNs. Examples are the work of Cho et al. [90], who used NNs to include polarization in the TIP4P water model [91], and the description of three-body interactions in the solvation of Al3+ ions by Gassner et al. [76].

It was soon recognized that further methodical extensions are required to construct high-dimensional NNPs capturing all relevant many-body effects in large systems. A first step in this direction has been made by Manzhos and Carrington [92,93] by decomposing the PESs into interactions of increasing order, in the spirit of a many-body expansion:

\[
E = \sum_{i=1}^{N_\text{atom}} E_i + \sum_i \sum_{j>i} E_{ij} + \sum_i \sum_{j>i} \sum_{k>j} E_{ijk} + \ldots \tag{18}
\]

with the E_i, E_ij, E_ijk being one-, two-, and three-body terms, and so forth.

Specifically, they have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years the efficiency could be improved by reducing the effective dimensionality employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].

First attempts to develop high-dimensional NNPs were made by Smith and coworkers already in 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters have not been determined using electronic structure reference data, and only in 2007 the method has been further developed into a genuine NNP approach [99,100], using silicon as example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach the energy of the system is constructed as a sum of environment-dependent atomic energy contributions E_i:

\[
E = \sum_i E_i \tag{19}
\]

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6–10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary CuZnO system [110], and water clusters [111,112].
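A stripped-down sketch of this architecture is given below: each atom's environment is encoded in a few radial symmetry functions with a cutoff, and the same atomic NN maps each fingerprint onto an atomic energy, summed according to equation (19). The particular cutoff function, the η values, and the use of a single chemical element are simplifying assumptions for illustration; angular symmetry functions and per-element networks [101,102] are omitted.

```python
import numpy as np

def fc(r, rc=6.0):
    """A common cosine-type cutoff confining the fingerprint to the
    local chemical environment within the cutoff radius rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_sf(r_ij, etas, rc=6.0):
    """Radial symmetry functions G = sum_j exp(-eta * r_ij^2) * fc(r_ij)."""
    return np.array([(np.exp(-eta * r_ij ** 2) * fc(r_ij, rc)).sum()
                     for eta in etas])

def total_energy(dists, atomic_nn, etas=(0.5, 1.0, 2.0)):
    """Eq. (19): sum of environment-dependent atomic energies, each
    predicted by the same atomic NN from the atom's fingerprint."""
    E = 0.0
    for i in range(len(dists)):
        r_ij = np.delete(dists[i], i)   # distances to all neighbours
        E += atomic_nn(radial_sf(r_ij, etas))
    return E

# any callable mapping a fingerprint vector to an energy works here,
# e.g. the small feed-forward NN sketched earlier in this section
dists = np.array([[0.0, 2.1, 2.9], [2.1, 0.0, 2.5], [2.9, 2.5, 0.0]])
print(total_energy(dists, atomic_nn=lambda G: G.sum()))  # dummy atomic NN
```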

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113–115]. For this purpose they have developed a method to also express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116–118]. The neural network acts by taking as input the relative positions of atoms about the atom for which the predictions are made, and then expressing the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer, and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated equally, since they all are a result of the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic, because of the non-spherical nature of the atomic electron densities that are described by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change as atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava, symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network [120]. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES, the energy contribution for an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function-based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector, and these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then in theory contain an infinite number of dimensions. The classification a(x) of a trial point x is then given by the general form of a SVM:

\[
a(\mathbf{x}) = \sum_{i=1}^{N_\text{ref}} \lambda_i y_i K(\mathbf{x}^i, \mathbf{x}) - w_0 \tag{20}
\]

The sum is over the number of support vectors, which is equal to the number of reference examples. K is the kernel that transforms the input x from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where w_0 would be the bias of the hidden layer. This means that y_i is the classification of reference example x^i and λ_i is a coefficient. The elements of K are found by

\[
K(\mathbf{x}^i, \mathbf{x}) = \tanh\left(k_0 + k_1 \langle \mathbf{x}^i, \mathbf{x} \rangle\right) \tag{21}
\]

k_0 is the bias weight of the first layer of the neural network. The term k_1⟨x^i, x⟩ is a scaled dot product of the vectors and so is a measure of similarity. K(x^i, x) in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to k_0 + k_1⟨x^i, x⟩.

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform the non-linear transformation to the kernel, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation will grow in complexity quadratically as more dimensions are added, to the point where the computation will not fit within the computer memory. A solution to this problem is the so-called "kernel trick", which accounts for the non-linear transformation into a high-dimensional space but where the original dot-product matrix is still used. The transformation is then applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.

The main tasks to be solved when using SVMs are finding the linear separator so that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that yields the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, like in many other fitting methods, cross validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = 0 \tag{22}
\]

where X is a set of reference points that define the linear separator and w is the normal vector to the linear separator. Furthermore, the margins of the separation can be defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = -1 \tag{23}
\]

and

\[
\mathbf{w} \cdot \mathbf{X} - b = 1 \tag{24}
\]

as shown in Figure 3. The width of the separation is given by 2/‖w‖. The objective of a SVM is then to maximize this separation. The constraint of the linear separation can be rewritten as

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}^i - b \right) \ge 1 \tag{25}
\]

where x^i is a reference training point that must be correctly classified and y_i is the classification function, i.e., the output is either 1 or −1. This can be modified to allow for soft margins, and so allow for a degree of misclassification:

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}^i - b \right) \ge 1 - \xi_i \tag{26}
\]


Fig. 3. In the initial two-dimensional space the data points (stars) fall into two groups: those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible. Those within the circle have classification '1' and those outside have classification '−1'. The linear separation is defined by a vector with a gradient of w and an intercept of b. It is found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

ξ_i is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function J:

\[
\min J(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2} \mathbf{w}^T \mathbf{w} + c \sum_{i=1}^{N_\text{ref}} \xi_i \tag{27}
\]

where c is a parameter modulating the influence of the slack function.

For regression purposes, like the construction of PESs, the objective is the reverse problem, where the reference data points should lie as close as possible to a linear separation. This is the basis of least-squares support vector machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

\[
\min J(\mathbf{w}, b, \mathbf{e}) = \frac{\nu}{2} \mathbf{w}^T \mathbf{w} + \frac{\zeta}{2} \sum_{i=1}^{N_\text{ref}} e_i^2 \tag{28}
\]

The slack values are now replaced by the sum of squared errors e_i of the reference points, and ν and ζ are parameters that determine the smoothness of the fit. Both SVMs and GPRs are similar in that they can be adjusted by the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of functions can adjust the sensitivity of the fitting method [128].
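In the least-squares variant the whole optimization collapses to a single linear system in the dual variables, which makes a compact sketch possible. The Python code below fits an LS-SVM regressor with a Gaussian kernel and predicts at new points; the standard dual formulation with regularization parameter gamma, the kernel width sigma, and the toy sine data are illustrative assumptions and not taken from reference [127].

```python
import numpy as np

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """LS-SVM regression: the slack minimization of eq. (28) reduces
    to one linear system in the dual variables (alpha, b)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian (RBF) kernel
    n = len(X)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                        # bias b, coefficients alpha

def lssvm_predict(Xnew, X, alpha, b, sigma=1.0):
    d2 = ((Xnew[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ alpha + b

X = np.linspace(0.0, 6.0, 25)[:, None]            # toy 1D reference data
b, alpha = lssvm_fit(X, np.sin(X).ravel())
print(lssvm_predict(np.array([[1.5]]), X, alpha, b))  # ~sin(1.5)
```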

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometry containing carbon, hydrogen, oxygen, nitrogen, and fluorine. Therefore, while not describing the PES for arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

Fig. 4. The two parent genetic programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection is marked by an '×' where the branches will be exchanged. At the end of the branches are the input nodes I1–I4, which provide information about the system. At each node on the way up the tree an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, with the tree structure having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e., a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they can provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined, and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher level of GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon fitness [133]. This allows the training procedure to retain individuals, so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e., they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures error-free in the final potentials. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, as in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus, a bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings, two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and remove the poorly performing bit strings from the combined population. The final population is then a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as searching conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
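The following minimal Python sketch illustrates this bit-string scheme. Everything in it is an illustrative assumption rather than any published GA: two parameters are encoded in 16 bits each and scaled into a fixed interval (the thresholds mentioned above), the fitness is a toy quadratic, and only the best ten members are mated in each generation.

```python
import random

BITS, N_PARAM, BOUNDS = 16, 2, (-5.0, 5.0)

def decode(bits):
    """Map each 16-bit slice of the string onto a parameter within the bounds."""
    lo, hi = BOUNDS
    params = []
    for k in range(N_PARAM):
        chunk = bits[k * BITS:(k + 1) * BITS]
        x = int(''.join(map(str, chunk)), 2) / (2**BITS - 1)
        params.append(lo + x * (hi - lo))
    return params

def fitness(bits):
    """Toy objective to minimize: squared distance to the target (1.2, -3.4)."""
    a, b = decode(bits)
    return (a - 1.2)**2 + (b + 3.4)**2

def crossover(p1, p2):
    """Interchange two parent strings at a randomly selected position."""
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(bits, rate=0.01):
    """Flip random bits from 0 to 1 or vice versa."""
    return [b ^ 1 if random.random() < rate else b for b in bits]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(BITS * N_PARAM)] for _ in range(40)]
for generation in range(100):
    pop.sort(key=fitness)                    # assess all members
    daughters = []
    while len(daughters) < len(pop):
        p1, p2 = random.sample(pop[:10], 2)  # mate the best performers
        for child in crossover(p1, p2):
            daughters.append(mutate(child))
    # combine parents and daughters, discard the poorly performing strings
    pop = sorted(pop + daughters, key=fitness)[:len(pop)]
print(decode(pop[0]), fitness(pop[0]))
```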

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore, there is the need to rerun the fitting procedure from different starting points, i.e., different positions on the PES fitting error surface, in order not to get trapped in local minima.
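As an illustration of such a fitness function, the sketch below computes a weighted sum of squared errors between a model potential and reference energies. The Morse form and the synthetic reference data are stand-ins chosen for brevity, not the extended-Rydberg fits of references [141-143]; in practice the GA would be rerun from several different starting populations, as just discussed.

```python
import numpy as np

def morse(r, de, a, re):
    # Illustrative model potential; any parametrized form could be used here.
    return de * (1.0 - np.exp(-a * (r - re)))**2 - de

r_ref = np.linspace(1.5, 6.0, 25)        # reference geometries (assumed)
e_ref = morse(r_ref, 0.2, 1.1, 2.3)      # synthetic stand-in for ab initio data
weights = np.exp(-e_ref / 0.05)          # e.g. emphasize low-energy structures

def fitness(params):
    """Weighted sum of squared errors a GA individual would be scored by."""
    de, a, re = params
    return np.sum(weights * (morse(r_ref, de, a, re) - e_ref)**2)

print(fitness([0.18, 1.0, 2.4]))         # score of one trial parameter set
```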

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important to the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for determination of the influence of d-orbital electrons on the PES. Here, a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g., polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time until ideas spread and concepts from one approach are used in other contexts to establish even more powerful techniques.

At the present stage, it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays off quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again, it is only a question of time until more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be 3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartók, M.J. Gillan, F.R. Manby, G. Csányi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79, 926 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martoňák, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theor. Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theor. Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theor. Comput. 8, 194 (2012)

Eur Phys J B (2014) 87 152 Page 5 of 16

xi

E(xi)

Fig 1 A one-dimensional interpolation by a Gaussian ProcessRegression (GPR) The grey region represents the uncertaintyof the model for a given input x which is small near or atknown reference points (black dots) The solid curve is theGPR interpolation of the PES The likelihood is then low inthose grey regions with large widths and so sampling of newtraining points is directed towards these configurations

the fitting of the parameter vectors of θ and p Then σ2θ and p are found by the maximization of the likelihoodfunction L with these parameters modulating how corre-lated the Gaussian process values are over the coordinatespace The likelihood function describes the probabilitythat the parameters σ2 θ and p determine the PESand with the vector ε containing all Nref reference energiesε = [E(xi) i = 1 2 Nref ]T we have

L(θp σ2

)=

1

(2π)Nref2 (σ2)Nref2 |C|12

times exp[minus (ε minus Iμ)T Cminus1(ε minus Iμ)

2σ2

] (7)

I is a column vector of 1rsquos and σ2 can be expressed withrespect to θ and p Thus the likelihood function is maxi-mized with respect to θ and p and typically it is the log ofthe likelihood function that is maximized given partlogL

part(σ2) = 0In most applications the elements of p are set to 2 furthersimplifying the fitting of the likelihood parameters

New reference structures are added to the data set bydetermining the likelihood function for unseen examplesThose with the worst likelihood are then used as new ref-erence points The GPR is iteratively improved by addingmore examples to the reference set with the aim to im-prove the likelihood function for all reference and unseenstructures Ideally a model should yield high likelihoodfunction values for inputs within the data set but whichhas been constructed using the least amount of Gaussiansin order to ensure simplicity and optimum generalizationproperties when applied to new configurations This is par-ticularly useful if the amount of data is limited such as inthe case of using high-level electronic structure methodsto generate the reference data (Fig 1)

GPRs have already been shown to be applicableto similar types of problems as neural networks andspecifically in the work of the Popelier group GPRshave been compared to neural networks for modellingenvironment-dependent atom-centered electrostatic mul-tipole moments [33] It has been found that in this ap-plication GPRs can be even superior to neural networkswith almost a 50 decrease in the electrostatic inter-action energy error for the pentamer water cluster [33]

In the following years the use of GPRs in the Popeliergroup has been expanded and been used for the predic-tion of the electrostatic multipole moments for ethanol [34]and alanine [35] as well as the simulation of hydratedsodium ions [36] and the prediction hydrogen bondedcomplexes [37]

Bartok et al have used GPRs to construct PESs ofa variety of condensed systems [38] In this very accu-rate approach the total energy is expressed as a sum ofenvironment-dependent atomic energies and provided asa superposition of Gaussians centered at known referencepoints A central aspect of the method is the use of four-dimensional spherical harmonics to provide a structuralcharacterization which is invariant with respect to rota-tion and translation of the system [38ndash40] Their applica-tion of GPRs has been either to model the entire PES likefor carbon silicon germanium iron and GaN [38] or toprovide a correction to a DFT PES [40] In the latter ap-plication very accurate energies for water can be obtainedby first calculating the energy at the DFT-BLYP level oftheory [4142] The GPR model is then fitted in order tocorrect the one and two-body interactions using more ac-curate data eg from coupled cluster calculations Thusa more computationally affordable method can producehigh accuracy results without having the need to explic-itly compute the energies from an unaffordable high levelof theory

Gaussian functions have also been employed by Ruppet al [4344] to fit the atomization energies of organicmolecules For the structural description of a wide rangeof molecules a Coulomb matrix has been used While thismethod does not yet represent a PES suitable for atomisticsimulations as only the global minimum structure of eachmolecule can be described and as no forces are availablethis very promising technique has certainly the capabilitiesto become a serious alternative approach in the years tocome A first step in this direction has been taken veryrecently in a first study of conformational energies of thenatural product Archazolid A [45]

23 Modified Shepard interpolation

In 1994 Ischtwan and Collins developed a technique toconstruct the PESs of small molecules based on the mod-ified Shepard interpolation (MSI) method [46] The basicidea is to express the energy of an atomic configuration xas a sum of weighted second order Taylor expansions cen-tered at a set of reference points known from electronicstructure calculations Typically a set of 3Natom minus 6 in-verse interatomic distances is used to describe the atomicconfiguration x

Then for each reference point i the energy of a con-figuration x in its close environment is approximated bya second order Taylor series Ei(x) as

Ei (x) = E(xi)

+(x minus xi

)TGi

+12(x minus xi

)THi

(x minus xi

)+ (8)

Page 6 of 16 Eur Phys J B (2014) 87 152

where Gi is the gradient and Hi is the matrix of secondderivatives ie the Hessian at the position xi of the ref-erence point

Typically many reference points are available and theenergy prediction for an unknown configuration can beimproved significantly by employing a weighted sum ofthe Taylor expansions centered at all reference points

E (x) =Nrefsum

i=1

wi (x) middot Ei (x) (9)

The normalized weight wi of each individual Taylorexpansion is given by

wi(x) =vi (x)

sumNrefk=1 vk (x)

(10)

and can be determined from the unnormalized weightfunction

vi (x) =1

|x minus xi|p (11)

which approaches zero for very distant reference pointsand infinity for x = xi corresponding to a normalizedweight wi(x) = 1 The power parameter p needs to beequal or larger than the order of the Taylor expansionThe obtained PES has a number of favorable features Itdepends only on interatomic distances and therefore itis translationally and rotationally invariant Further thePES is also invariant with respect to a permutation ofchemically equivalent atoms

The accuracy of the method crucially depends on thechoice of the reference points as the approximation of asecond order Taylor expansion of the energy is only suffi-ciently accurate if there is a dense set of reference struc-tures It has been suggested by Ischtwan and Collins [47]to start with a reasonable first set of structures which canbe chosen along a reaction path and to improve the qual-ity of the PES by adding more structures in chemicallyrelevant parts of the configuration space These pointsare identified by running classical molecular dynamicstrajectories employing preliminary potentials which arethen iteratively refined by adding more and more pointsfrom electronic structure calculations carried out at theconfigurations visited in these trajectories

The modified Shepard interpolation scheme has beenapplied successfully to study chemical reaction dynamicsfor a number of systems like NH + H2 rarr NH2 + H [47]OH + H2 rarr H2O + H [4849] and many others [50ndash53] Ithas also been adapted to describe molecule-surface inter-actions using the dissociation of H2 at the Pt(111) surfaceas an example by Crespos et al [5455] To keep the di-mensionality and thus the complexity of the system at areasonable level the positions of the surface atoms havebeen frozen resulting in a six-dimensional PES As the hy-drogen atoms can move along the surface an extension ofequation (12) has been required to include the symmetryof the surface The modified energy expression is

E (x) =Nrefsum

i=1

Nsymsum

j=1

wij (x) middot Ei (x) (12)

The purpose of the second summation over all symmetryelements Nsym ensures that the full symmetry of the sur-face is taken into account exactly without the requirementto add structures which are equivalent by symmetry intothe reference set

The modified Shepard interpolation is very appealingbecause of its simplicity and the accuracy of the obtainedPESs but it has also a number of drawbacks First to dateits applicability is limited to very low-dimensional systemsand once the PES has been constructed the system sizecannot be changed because if an atom would be addedthe distance to the reference points in the Taylor expan-sions would not be defined Moreover for the prediction ofthe energy of a new configuration the reference set needs tobe known and a summation of all Taylor expansions is re-quired which makes the method more demanding for largereference sets Further the construction of the potentialrequires energies gradients and Hessians from electronicstructure calculations which makes the determination ofthe reference data rather costly

Given the similarities between Shepard interpolationand polynomial fitting in terms of applicability it is worthcomparing these methods in terms of how much data isrequired to achieve a fit for the CH+

5 system Huang et almade use of 20 728 training energies to fit a polynomialsurface [30] However Wu et al have been able to achievea similar fit using just 50 training examples employingMSI [5051] However these 50 training examples are notjust energies but include forces and Hessians In order todetermine these Hessians numerous energy calculationsmust be performed resulting in 19 440 energy calculationsndash a comparable number to that of Huang et al

24 Interpolating moving least squares

The standard implementation of interpolating movingleast squares (IMLS) [5657] which is a method derivedfrom the modified Shepard interpolation determines aconfigurationrsquos energy using a number of linearly inde-pendent basis functions which can be represented by amatrix The energy is then given as the sum of M basisfunctions bi of the atomic coordinates x where each basisfunction is weighted by a coefficient ai

E(x) =Msum

i=1

ai(x)bi(x) (13)

The basis functions have either the form of a Taylorseries [57] or a many-body expansion similar to thoseused in high-dimensional model representation neu-ral networks [58] The coefficients are determined byminimization of the objective function

D[E(x)] =Nrefsum

i=1

wi(x)

⎣Msum

j=1

aj(x)bj(x) minus E(xi)

⎦2

(14)

which is the weighted sum of squared potential energy er-rors with wi(x) being the weight defined by the prox-imity of the trial structure to a reference structure i

Eur Phys J B (2014) 87 152 Page 7 of 16

Hence the largest weight would be found if the trial con-figuration coincides with a configuration in the referenceset The method is easily expanded to include further fit-ting data such as forces and the Hessians of the train-ing points [5859] which expands the design matrix ndash thematrix containing the simultaneous equations that mustbe solved in order to find the best combination of basisfunctions The coefficients aj are detemined by solving

BTWBa = BTWV (15)

Here V is the vector of reference energies and a isthe transpose of the vector containing the coefficientsW is a NreftimesNref diagonal matrix containing the weightswhere Wij = wi(x)δij B is the Nref times M design matrix

Initial applications of the IMLS method focused onfitting available PESs for HN2 [60] HOOH [5861] andHCN [58] before moving on to PESs computed using ac-curate ab initio methods such as for HOOH CH4 andHCN [59] CH2 and HCN [62] as well as the O+ HClsystem [63] More recently Dawes et al have appliedIMLS fitted PES to van der Waals systems including theCO [64] NNO [65] and OCS [66] dimers described asfour-dimensional PESs where data points have been ob-tained from CCSD(T) calculations This method of datageneration is suitable for IMLS as accurate fits can beconstructed from relatively small amounts of data Thislevel of accuracy is often required eg for the predictionof rovibrational spectra

The basis functions used by Dawes et al take the formof numerous many-body terms that together form a high-dimensional model representation [67] High order termsare not included as basis functions as they would be poorlydetermined in scenarios where the PES has been sparselysampled

In order to perform the fitting of the HOOH PESDawes et al used and iteratively sampled trainingset which eventually included almost 2000 referencepoints [58] However compared to Shepard interpolationmodels and neural networks IMLS models are costly toevaluate Dawes et al showed that IMLS can be used as aprocedure to create an accurate PES from a sparse set ofreference data points This IMLS PES is then employedto generate training points that are in turn used to traina Shepard interpolation model and a neural network

Using two IMLS models where one uses basis func-tions that are an order of degree higher than those usedin the other model can guide the sampling of the PESRegions with a large error between the two models whichwill occur between the current training points will be theideal locations to sample and in turn improve the modelIn this way the manner in which the PES is sampled issimilar to GRPs where the sampling is sparse while non-stochastic in nature [59] The final model used by Daweset al in this example is a modification of IMLS a local-IMLS [68] which is similar to the Shepard interpolationscheme Local-IMLS computes energies for a configura-tion by using the reference structures that are similarrather than using all reference configurations Howeverlocal-IMLS is more general since the basis functions can

Fig 2 A small two-dimensional feed-forward Neural Network(NN) containing a single hidden layer The input nodes G1 andG2 provide the structural information to the NN which thenyields the energy E Apart from the architecture of the NNie the number of layer and the number of nodes per layerthe NN output is determined by the numerical values of theweight parameters which are shown as arrows indicating theflow of information through the NN The functional form ofthis NN is given in equation (17)

take any form rather than just the Taylor series expan-sions used in Shepard interpolation For the HOOH PES asimilar accuracy compared to previous work was achievedusing 30 less reference points

25 Neural network potentials

Artificial neural networks (NNs) [6970] have first been in-troduced already in 1943 to develop mathematical modelsof the signal processing in the brain [71] and after contin-uous methodical extensions in the following decades theyhave now become an established research topic in math-ematics and computer science with many applications invarious fields Also in chemistry and physics they havefound wide use eg for the analysis of spectra and pat-tern recognition in experimental data [72] The use of NNsto construct PESs employing reference data from elec-tronic structure calculations has been suggested by Blanket al [73] almost twenty years ago and in the meantimeneural network potentials (NNPs) have been reported fora number of molecules [74ndash76] and molecules interactingwith surfaces [77ndash79] A comprehensive overview aboutthe systems that have been addressed by NNPs can befound in several recent reviews [80ndash82]

The basic component of almost any NNP is a feed-forward NN and a small example corresponding to atwo-dimensional PES is shown in Figure 2 It consistsof artificial neurons or nodes which are arranged inlayers The neurons in the input layer represent thecoordinates Gi defining the atomic configuration the neu-ron in the output layer yields the energy of this configu-ration In between the input and the output layer thereare one or more so-called hidden layers each of whichcontains a number of additional neurons The purpose of

Page 8 of 16 Eur Phys J B (2014) 87 152

the hidden layers which do not have a physical meaningis to define the functional form of the NN assigning theenergy to the structure The more hidden layers and themore nodes per hidden layer the higher is the flexibilityof the NN Haykin has stated that in the case of a NNarchitecture consisting of two hidden layers that the firsthidden layers capture the local features of the function be-ing modelled while the second hidden layer capture theglobal features of the function [83] All nodes in all lay-ers are connected to the nodes in the adjacent layers byso-called weight parameters which are the fitting param-eters of the NN In Figure 2 they are shown as arrowsindicating the flow of information through the NN Thevalue yj

i of a node i in layer j is then calculated as a lin-ear combination of the values of the nodes in the previouslayer using the weight parameters as coefficients The lin-ear combination is then shifted by a bias weight bj

i actingas an adjustable offset Finally a non-linear function theactivation function of the NN is applied which providesthe capability of the NN to fit any real-valued functionto arbitrary accuracy This fundamental property of NNshas been proven by several groups independently [8485]and is the theoretical basis for the applicability of NNsto construct atomistic potentials Accordingly yj

i is thenobtained as

yji = f

⎝bji +

Njminus1sum

k=1

ajminus1jki

⎠ (16)

where ajminus1jki is the weight parameter connecting node k

in layer jminus1 containing Njminus1 neurons to node i in layer jIn the special case of the input layer with superscript 0the input coordinates are represented by a vector G =Gi =

y0

i

The complete total energy expression of the

example NN shown in Figure 2 is then given by

E = f

⎝b21 +

3sum

j=1

a12j1 middot f

(b1j +

2sum

i=1

a01ij middot Gi

)⎞

⎠ (17)

The weight parameters of the NN are typically determinedby minimizing the error of a training set of electronicstructure data employing gradient-based optimization al-gorithms and in particular the backpropagation algo-rithm [86] and the global extended Kalman filter [87] havebeen frequently used in the context of NNPs The targetquantity is usually the energy but also forces have beenused in some cases [8889]

NNPs constructing the potential-energy using just asingle feed-forward NN have been developed for a numberof systems with great success but for a long time NNPshave been restricted to low-dimensional PESs involvingonly a few degrees of freedom This limitation has a num-ber of reasons First of all each additional atom in thesystem introduces three more degrees of freedom whichneed to be provided to the NN in the input layer Increas-ing the number of nodes in the NN reduces the efficiency ofthe NN evaluation but it also complicates the determina-tion of the growing number of weight parameters Further

conventional NNPs can only be applied to systems witha fixed chemical composition because the number of in-put nodes of the NN cannot be changed once the weightparameters have been determined If an atom would beadded the weight parameters for the additional input in-formation would not be available while upon removing anatom the values of the corresponding input nodes wouldbe ill-defined

The most severe challenge of constructing NNPs is thechoice of a suitable set of input coordinates Cartesian co-ordinates cannot be used because their numerical valueshave no direct physical meaning and only relative atomicpositions are important for the potential-energy If for in-stance a molecule would be translated or rotated in spaceits Cartesian coordinates would change while its internalstructure and thus its energy are still the same An NNPbased on Cartesian coordinates however would predict adifferent energy since its input vector has changed Incor-porating this rotational and translational invariance by acoordinate transformation onto a suitable set of functionsexhibiting these properties is a significant challenge Avery simple solution appropriate for small molecules wouldbe to use internal coordinates like interatomic distancesangles and dihedral angles and indeed this has been donesuccessfully for a number of molecules but this approachis unfeasible for large systems due to the rapidly increas-ing number of coordinates Further a full set of internalcoordinates contains a lot of redundant information andalso using a full-distance matrix is possible only for smallsystems

Another problem that is still present even if internalcoordinates are used concerns the invariance of the totalenergy with respect to the interchange of atoms of thesame element If for instance the order of both hydro-gen atoms in a free water molecule is switched then thischange is also present in the NN input vector Since allNN input nodes are connected to the NN by numericallydifferent weight parameters this exchange will modify thetotal energy of the system although the atomic configura-tion is still the same Including this permutation symme-try of like atoms into NNPs has been a challenge since theadvent of NNPs A solution for molecular systems basedon a symmetrization of internal coordinates has been sug-gested in a seminal paper by Gassner et al [76] for molecu-lar systems and a similar scheme has also been developedfor the dissociation of molecules at surfaces [78] Unfor-tunately both methods are only applicable to very low-dimensional systems It should be noted however thatalso low-dimensional NNPs have been applied to studylarge systems by representing only specific interactionsby NNs Examples are the work of Cho et al [90] whoused NNs to include polarization in the TIP4P watermodel [91] and the description of three-body interactionsin the solvation of Al3+ ions by Gassner et al [76]

It has soon been recognized that further methodical ex-tensions are required to construct high-dimensional NNPscapturing all relevant many-body effects in large systemsA first step in this direction has been made by Manzhosand Carrington [9293] by decomposing the PESs into

Eur Phys J B (2014) 87 152 Page 9 of 16

interactions of increasing order in the spirit of a many-body expansion

E =Natomsum

i=1

Ei +sum

i

sum

jgti

Eij +sum

i

sum

jgti

sum

kgtj

Eijk + (18)

with the Ei Eij Eijk being one- two- three-body termsand so forth

Specifically they have used the high-dimensionalmodel representation of Li et al [67] to reduce the com-plexity of the multidimensional PES by employing a sumof mode terms each of which is represented by an in-dividual NN and depends only on a small subset of thecoordinates Since the number of terms and consequentlyalso the number of NNs that need to be evaluated growsrapidly with system size and with the maximum order thatis considered to date the method has only been appliedto comparably small molecular systems In the followingyears the efficiency could be improved by reducing theeffective dimensionality employing optimized redundantcoordinates [9495] This approach is still the most sys-tematic way to construct NNPs and very accurate PESscan be obtained A similar method based upon many-bodyexpansions has also been suggested by Malshe et al [96]

First attempts to develop high-dimensional NNPs havebeen made by Smith and coworkers already in 1999 byintroducing NNs of variable size and by decomposing con-densed systems into a number of atomic chains [9798] Inthis early work the NN parameters have not been deter-mined using electronic structure reference data and onlyin 2007 the method has been further developed to a gen-uine NNP approach [99100] using silicon as example sys-tem Surprisingly no further applications or extensions ofthis method have been reported to date

A high-dimensional NNP method which has been ap-plied to a manifold of systems has been developed byBehler and Parrinello in 2007 [101] In this approachthe energy of the system is constructed as a sum ofenvironment-dependent atomic energy contributions Ei

E =sum

i

Ei (19)

Each atomic energy contribution is obtained from an in-dividual atomic NN The input for these NNs is formedby a vector of symmetry functions [102] which provide astructural fingerprint of the chemical environments up to acutoff radius of about 6minus10 A The functional form of thesymmetry functions ensures that the potential is transla-tionally and rotationally invariant and due to the summa-tion in equation (19) the potential is also invariant withrespect to permutations of like atoms Consequently thisapproach overcomes all conceptual limitations of conven-tional low-dimensional NNPs For multicomponent sys-tems long-range electrostatic interactions can be includedby an explicit electrostatic term employing environment-dependent charges which are constructed using a secondset of atomic NNs [103104] This high-dimensional NNPwhich is applicable to very large systems containing thou-sands of atoms has been applied to a number of systems

like silicon [105] carbon [106] sodium [107] copper [108]GeTe [109] ZnO [104] the ternary CuZnO system [110]and water clusters [111112]

NNs have also been employed by Popelier and cowork-ers to improve the description of electrostatic interactionsin classical force fields [33113ndash115] For this purpose theyhave developed a method to also express higher order elec-trostatic multipole moments of atoms as a function of thechemical environment The reference multipole momentsused to train the NNs are obtained from DFT calcula-tions and a partitioning of the self-consistent electron den-sity employing the concept of quantum chemical topology(QCT) [116ndash118] The neural network acts by taking theinput the relative position of atoms about the atom forwhich the predictions are made and then expressing themultipole moments up to high orders within the atomiclocal frame The advantage of this method is that all elec-trostatic effects such as Coulombic interactions chargetransfer and polarization which traditionally are repre-sented in force fields using different interaction terms areinstead treated equally since they all are a result of thepredicted multipoles Furthermore this also means that allelectrostatic interactions are more realistic because of thenon-spherical nature of the atomic electron densities thatare described by the multipoles Polarization is then a nat-ural result of the prediction of these multipoles as theychange as atoms move around and atomic electron densi-ties deform It has been demonstrated that the accuracy ofelectrostatic interactions in classical force fields can be sig-nificantly improved but if high-order multipoles are usedthe number of NNs to be evaluated can be substantial

Recently NNPs have also been constructed using in-put coordinates which have been used before in othertypes of mathematical potentials An illustrative exam-ple is the use of permutationally invariant polynomialsby Li et al [119] which have been used very success-fully by Bowman and coworkers for a number of molec-ular systems (cf Sect 21) On the other hand in thework of Fournier and Slava symmetry functions similarto those introduced by Behler [102] are used for an atom-istic potential but here a many-body expansion is em-ployed rather than a neural network [120] Using the sym-metry functions as the dimensions of the descriptor spacea clustering procedure can be used to define the atomtypes Within the PES the energy contribution for anatom is then determined from the linear combination ofeach atom type In a similar manner a symmetry func-tion based neural network can also be used for structureanalysis in computer simulations [121]

NNs and GPRs show similar limitations namely theirinability to extrapolate accurate energies for atomic con-figurations being very different from the structures in-cluded in the training set Boths methods are thus relianton the sampling of the training points to guide the fittingof the function [122] GPRs model smoothly varying func-tions in order to perform regression and interpolation andtypically the kernals used in most GPR models performpoorly for extrapolation though there are attempts toaddress this issue [123]

Page 10 of 16 Eur Phys J B (2014) 87 152

26 Support vector machines

Support vector machines (SVMs) [124] are a machinelearning method that has found much use in the fields ofbioinformatics and cheminformatics [125126] mainly forclassification problems However SVMs can also be usedfor regression and so are suitable for PES fitting thoughexamples of such applications of SVMs are still rare

In general SVMs aim to define a linear separation ofdata points in feature space by finding a vector that yieldsthe largest linear separation width between two groups ofpoints with different properties The width of this separa-tion is given by the reference points that lie at the margineither side of the separating vector and these referencepoints are called support vectors A balance needs to befound between finding the maximum separation and er-rors which arise from those cases where data points lieon the wrong side of the separation vector and so areincorrectly classified

Unfortunately a linear separation is often impossiblein the original input coordinate feature space This prob-lem can be solved by transforming the input coordinatesand recasting them in a higher-dimensional space wherea linear separation becomes possible Depending on theway this non-linear transformation is performed the lin-ear separation vector found in this high-dimensional spacecan then be transformed back into a non-linear separatorin the original coordinate space

There are many ways to perform the transformationfrom the original input space into the higher dimensionalfeature space One possibility is to define the dimensions ofthe higher-dimensional space as combinations of the orig-inal coordinates This feature space could then in theorycontain an infinite number of dimensions The classifica-tion a(x) of a trial point xi is then given by the generalform of a SVM

a(x) =Nrefsum

i=1

λiyiK(xix) minus w0 (20)

The sum is over the number of support vectors whichis equal to the number of reference examples K is thekernel that transforms the input x from the original rep-resentation to the new higher order representation andcompares the similarity between the trial point and eachof the reference points

Equation (20) can also be viewed as a representation ofa neural network by a SVM where w0 would be the bias ofthe hidden layer This means that yi is the classification ofreference example xi and λi is a coefficient The elementsof K are found by

K(xix) = tanh(k0 + k1〈xix〉) (21)

k0 is the bias weight of the first layer of the neural networkThus k1〈xix〉 is the dot product of the vectors and sois a measure of similarity K(xix) in effect representsthe action of the first layer of hidden nodes of a neuralnetwork as it applies a hyperbolic tangent function tok1〈xix〉

A SVM is more general than a neural network sincedifferent kernels allow for other non-linear transformationsto be used Without the non-linear transformation eachelement of the kernel can be viewed as a measure of sim-ilarity between the reference points and the trial pointsIf we perform the non-linear transformation to the kernelwe have two approaches The first is to explicitly computethe new variables of the vector with respect to the newhigh-dimensional space While this is trivial if the num-ber of dimensions is small the computation will grow incomplexity quadratically as more dimensions are addedto the point where the computation will not fit withinthe computer memory A solution to this problem is theso-called ldquokernel trickrdquo which accounts for the non-lineartransformation into a high-dimensional space but wherethe original dot product matrix is still used The trans-formation is then applied after finding the dot productsThus the non-linear transformation is implicit and sim-plifies the whole process since this kernel is integral tothen finding the linear separation vector

The main task to be solved when using SVMs is findingthe linear separator so that the distance of separation ismaximized the errors in classification are reduced andidentifying the appropriate non-linear transformation orkernel function that yields the reference data in a newhigh-dimensional feature space where the separation canbe performed

Since there is an infinite number of possible non-lineartransformations and the use of multiple kernels can allowfor multiple classifications to be separated it can hap-pen that the flexibility offered by SVMs results in overfit-ting and a loss of generalization capabilities For this rea-son like in many other fitting methods cross validationis applied during fitting

The linear separation in high-dimensional space is thendefined by

w middot Xminus b = 0 (22)

where X is a set of reference points that define the linearseparator and w is the normal vector to the linear sepa-rator Furthermore the margins of the separation can bedefined by

w middot Xminus b = minus1 (23)

and

w middot Xminus b = 1 (24)

as shown in Figure 3The width of the separation is given by 2

w The ob-jective of a SVM is then to maximize this separation Theconstraint of the linear separation can be rewritten as

yi

(w middot xi minus b

) ge 1 (25)

where xi is reference training point that must be correctlyclassified and yi is the classification function ie the out-put is then either 1 or minus1 This can be modified to allow forsoft margins and so allow for a degree of misclassification

yi

(w middot xi minus b

) ge 1 minus ξi (26)

Eur Phys J B (2014) 87 152 Page 11 of 16

wX-b=1wX-b=0wX-b=-1

x1x1

x2 x2

Fig 3 In the initial two-dimensional space that data points(stars) fall into two groups ndash those within the circle and thoseoutside of it Within this space there is no way to perform alinear separation A kernel allows to recast the data points intoa new space where a linear separation is possible Those withinthe circle have classification lsquo1rsquo and those outside have classi-fication lsquominus1rsquo The linear separation is defined by a vector witha gradient of w and an intercept of b The linear separation isfound as vector that gives the widest separation between thetwo classes This width is determined by two data points thesupport vectors

ξi is called the slack variable a measure of the degreeof misclassification This is a penalty that must be mini-mized as part of the objective function J

min J(w ξ) =12wT w + c

Nrefsum

i=1

ξi (27)

where c is a parameter modulating the influence of theslack function

For regression purposes like the construction of PESsthe objective is the reverse problem where the referencedata points should lie as close as possible to a linear sepa-ration This is the basis of Least-Squares Support VectorMachines (LS-SVMs) and their applications to PESs inthe work of Balabin and Lomakina [127]

LS-SVMs require the minimization of

min J(w b e) =ν

2wT w +

ζ

2

Nrefsum

i=1

e2i (28)

The slack values are now replaced by the sum of squarederrors ei of the reference points and ν and ζ are pa-rameters that determine the smoothness of the fit BothSVMs and GPRs are similar in that they can both be ad-justed by the choice of the kernal function used Gaussiansare frequently used as they are computationally simple toimplement but other choices of functions can adjust thesensitivity of the fitting method [128]

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometries containing carbon, hydrogen, oxygen, nitrogen and fluorine. Therefore, while not describing the PES for

Fig. 4. The two parent Genetic Programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an ‘×’. At the end of the branches are the input nodes I1–I4, which provide information about the system. At each node on the way up the tree an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.
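The flavour of the Balabin–Lomakina approach can be conveyed in a few lines of scikit-learn (a hypothetical illustration: the numbers below are invented, and SVR stands in for the LS-SVM variant actually used): a kernel regressor learns the map from cheap small-basis energies to converged large-basis ones.

```python
import numpy as np
from sklearn.svm import SVR

# Invented small-basis / large-basis energy pairs (hartree) for
# four molecules; the real study used 208 CHONF molecules.
E_small = np.array([[-76.01], [-40.20], [-56.19], [-100.02]])
E_large = np.array([-76.06, -40.25, -56.23, -100.09])

model = SVR(kernel="rbf", C=100.0, epsilon=1e-3).fit(E_small, E_large)
print(model.predict(np.array([[-75.98]])))  # estimated converged energy
```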

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner complex functions can be represented as a combination of simpler arithmetic functions.

Genetic Programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, the tree structure having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e., a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they can provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
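The mechanics of tree evaluation and subtree crossover can be sketched in a few lines of Python (our minimal illustration; the node representation and operator set are our own choices, not a published implementation):

```python
import copy
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, inputs):
    # A leaf is an input label such as "I1"; an internal node
    # [op, left, right] combines the values passed up from below.
    if isinstance(tree, str):
        return inputs[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

def all_paths(tree, path=()):
    # Paths to every node, so a crossover point can be drawn at random.
    paths = [path]
    if isinstance(tree, list):
        paths += all_paths(tree[1], path + (1,))
        paths += all_paths(tree[2], path + (2,))
    return paths

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_node(tree, path, new):
    if not path:                       # replacing the whole tree
        return new
    get_node(tree, path[:-1])[path[-1]] = new
    return tree

def crossover(a, b):
    # Daughters start as copies of the parents; one randomly chosen
    # branch of each is exchanged (the positions marked in Fig. 4),
    # so daughters may be extended or pruned relative to the parents.
    da, db = copy.deepcopy(a), copy.deepcopy(b)
    pa, pb = random.choice(all_paths(da)), random.choice(all_paths(db))
    sa, sb = copy.deepcopy(get_node(da, pa)), copy.deepcopy(get_node(db, pb))
    return set_node(da, pa, sb), set_node(db, pb, sa)

parent1 = ["+", ["*", "I1", "I2"], "I3"]
parent2 = ["-", "I4", ["+", "I1", "I3"]]
child1, child2 = crossover(parent1, parent2)
print(evaluate(child1, {"I1": 1.0, "I2": 2.0, "I3": 3.0, "I4": 4.0}))
```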

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here, three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined, and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher-level GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon its fitness [133]. This allows the training procedure to retain individuals so that, while convergence is achieved, it is not at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms, which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are “non-physical”, i.e., they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this data base is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, like in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope to find the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus, the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next “daughter population”. This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two “parent” bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a “mutation”. A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and from the combined population remove the poorly performing bit strings. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as the searching of conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
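A compact Python sketch of this loop (our toy example: a 16-bit string encoding a single parameter within fixed thresholds, with a made-up target value) reads:

```python
import random

def decode(bits, lo=0.0, hi=2.0):
    # Map the bit string to a parameter value inside fixed thresholds.
    return lo + (hi - lo) * int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1)

def fitness(bits):
    # Toy objective: recover the (made-up) target parameter 1.2345.
    return -(decode(bits) - 1.2345) ** 2

def generation(population):
    population.sort(key=fitness, reverse=True)
    parents = population[: len(population) // 2]   # best performers
    children = []
    while len(children) < len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))          # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.05:                 # occasional mutation
            i = random.randrange(len(child))
            child[i] ^= 1                          # flip 0 <-> 1
        children.append(child)
    # Merge parents and daughters, discard the poorest performers.
    return sorted(parents + children, key=fitness, reverse=True)[: len(population)]

population = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
for _ in range(100):
    population = generation(population)
print(decode(population[0]))  # converges towards 1.2345
```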

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore, there is the need to rerun the fitting procedure from different starting points, i.e., different positions on the PES fitting error surface, in order to not get trapped in local minima.
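The fitness function referred to in these studies typically has the generic weighted least-squares form (our notation, summarizing the description above):

$$ F(\mathbf{p}) = \sum_{i=1}^{N_{\mathrm{ref}}} w_i \left[ E_i^{\mathrm{fit}}(\mathbf{p}) - E_i^{\mathrm{ref}} \right]^2, $$

where $\mathbf{p}$ collects the potential parameters, $E_i^{\mathrm{ref}}$ are the reference data (ab initio energies or, as in reference [141], spectroscopic observables) and the weights $w_i$ set their relative importance.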

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148] and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here, a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these “next-generation potentials” has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g., polynomial fitting, the modified Shepard interpolation and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time until ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups, which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time until more program packages will become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartok, M.J. Gillan, F.R. Manby, G. Csanyi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Muller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Muller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 1986)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks (IJCNN) (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1998)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)

A SVM is more general than a neural network sincedifferent kernels allow for other non-linear transformationsto be used Without the non-linear transformation eachelement of the kernel can be viewed as a measure of sim-ilarity between the reference points and the trial pointsIf we perform the non-linear transformation to the kernelwe have two approaches The first is to explicitly computethe new variables of the vector with respect to the newhigh-dimensional space While this is trivial if the num-ber of dimensions is small the computation will grow incomplexity quadratically as more dimensions are addedto the point where the computation will not fit withinthe computer memory A solution to this problem is theso-called ldquokernel trickrdquo which accounts for the non-lineartransformation into a high-dimensional space but wherethe original dot product matrix is still used The trans-formation is then applied after finding the dot productsThus the non-linear transformation is implicit and sim-plifies the whole process since this kernel is integral tothen finding the linear separation vector

The main task to be solved when using SVMs is findingthe linear separator so that the distance of separation ismaximized the errors in classification are reduced andidentifying the appropriate non-linear transformation orkernel function that yields the reference data in a newhigh-dimensional feature space where the separation canbe performed

Since there is an infinite number of possible non-lineartransformations and the use of multiple kernels can allowfor multiple classifications to be separated it can hap-pen that the flexibility offered by SVMs results in overfit-ting and a loss of generalization capabilities For this rea-son like in many other fitting methods cross validationis applied during fitting

The linear separation in high-dimensional space is thendefined by

w middot Xminus b = 0 (22)

where X is a set of reference points that define the linearseparator and w is the normal vector to the linear sepa-rator Furthermore the margins of the separation can bedefined by

w middot Xminus b = minus1 (23)

and

w middot Xminus b = 1 (24)

as shown in Figure 3The width of the separation is given by 2

w The ob-jective of a SVM is then to maximize this separation Theconstraint of the linear separation can be rewritten as

yi

(w middot xi minus b

) ge 1 (25)

where xi is reference training point that must be correctlyclassified and yi is the classification function ie the out-put is then either 1 or minus1 This can be modified to allow forsoft margins and so allow for a degree of misclassification

yi

(w middot xi minus b

) ge 1 minus ξi (26)

Eur Phys J B (2014) 87 152 Page 11 of 16

wX-b=1wX-b=0wX-b=-1

x1x1

x2 x2

Fig 3 In the initial two-dimensional space that data points(stars) fall into two groups ndash those within the circle and thoseoutside of it Within this space there is no way to perform alinear separation A kernel allows to recast the data points intoa new space where a linear separation is possible Those withinthe circle have classification lsquo1rsquo and those outside have classi-fication lsquominus1rsquo The linear separation is defined by a vector witha gradient of w and an intercept of b The linear separation isfound as vector that gives the widest separation between thetwo classes This width is determined by two data points thesupport vectors

ξi is called the slack variable a measure of the degreeof misclassification This is a penalty that must be mini-mized as part of the objective function J

min J(w ξ) =12wT w + c

Nrefsum

i=1

ξi (27)

where c is a parameter modulating the influence of theslack function

For regression purposes like the construction of PESsthe objective is the reverse problem where the referencedata points should lie as close as possible to a linear sepa-ration This is the basis of Least-Squares Support VectorMachines (LS-SVMs) and their applications to PESs inthe work of Balabin and Lomakina [127]

LS-SVMs require the minimization of

min J(w b e) =ν

2wT w +

ζ

2

Nrefsum

i=1

e2i (28)

The slack values are now replaced by the sum of squarederrors ei of the reference points and ν and ζ are pa-rameters that determine the smoothness of the fit BothSVMs and GPRs are similar in that they can both be ad-justed by the choice of the kernal function used Gaussiansare frequently used as they are computationally simple toimplement but other choices of functions can adjust thesensitivity of the fitting method [128]

Specifically Balabin and Lomakina used LS-SVMs topredict energies of calculations with converged basis setsemploying electronic structure data obtained using smallerbasis sets for 208 different molecules in their minimumgeometry containing carbon hydrogen oxygen nitrogenand fluorine Therefore while not describing the PES for

X

X+

I2 I3 I4I1

-

divideX

I2 I3 I4I1

-

X

divide+

I2 I3 I4I1

-

XX

I2 I3 I4I1

-

Fig 4 The two parent Genetic Programming (GP) trees atthe top have been randomly partnered and at the same po-sition on each tree the connection is marked by an lsquotimesrsquo wherethe branches will be exchanged At the end of the branches arethe input nodes I1-I4 which provide information about thesystem At each node on the way up the tree an arithmetic op-eration is performed that combines the values passed up fromthe lower level This process continues until the last node hasbeen reached where the output is determined

arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed up the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner complex functions can be represented as a combination of simpler arithmetic functions.

Genetic Programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure for each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, with the tree structure having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e. a change in a function used in the tree structure, or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they can provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
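The tree manipulation described above is easy to state in code. The following minimal sketch evaluates an expression tree bottom-up and performs the subtree crossover of Figure 4; the operator set, the nested-tuple representation and the function names are illustrative choices, not a specific published implementation.

```python
import operator
import random

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, inputs):
    # A leaf is an input label such as 'I1'; an internal node is a tuple
    # (op, left, right) combining the two values passed up from below.
    if isinstance(tree, str):
        return inputs[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

def all_paths(tree, path=()):
    # Enumerate the positions of all nodes so a crossover point can be drawn.
    paths = [path]
    if not isinstance(tree, str):
        paths += all_paths(tree[1], path + (1,))
        paths += all_paths(tree[2], path + (2,))
    return paths

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(node[path[0]], path[1:], new)
    return tuple(node)

def crossover(parent_a, parent_b):
    # Exchange randomly selected branches (the points marked 'x' in Fig. 4);
    # the daughters may end up deeper (extended) or shallower (pruned).
    pa = random.choice(all_paths(parent_a))
    pb = random.choice(all_paths(parent_b))
    return (replace_subtree(parent_a, pa, get_subtree(parent_b, pb)),
            replace_subtree(parent_b, pb, get_subtree(parent_a, pa)))

# Example: E = (I1 * I2) + I3
tree = ('+', ('*', 'I1', 'I2'), 'I3')
print(evaluate(tree, {'I1': 2.0, 'I2': 3.0, 'I3': 1.0}))  # 7.0
```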

Genetic programming has been applied to simple PESs such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher-level GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP where the GP population is ranked based upon its fitness [133]. This allows the training procedure to retain individuals so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e. they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and

permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, like in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and, from the combined population, remove the poorly performing bit strings. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as searching the conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
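A minimal sketch of one such GA generation is given below. The bit-string decoding, ranking-based selection, single-point crossover and pool truncation follow the scheme just described, while the population handling, mutation rate and function names are illustrative assumptions.

```python
import random

def decode(bits, lo, hi):
    # Map a bit string to a parameter value confined to given thresholds.
    return lo + (hi - lo) * int(bits, 2) / (2 ** len(bits) - 1)

def next_generation(population, fitness, mutation_rate=0.01):
    """One GA step: select parents, interchange at a random bit position,
    mutate the daughters, then discard the poorly performing strings."""
    ranked = sorted(population, key=fitness)        # lower error = fitter
    parents = ranked[:max(2, len(population) // 2)]
    daughters = []
    while len(daughters) < len(population):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))           # interchange point
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            # Mutation: flip a random bit from 0 to 1 or vice versa.
            child = ''.join(bit if random.random() > mutation_rate
                            else '10'[int(bit)] for bit in child)
            daughters.append(child)
    combined = population + daughters               # parents plus daughters
    return sorted(combined, key=fitness)[:len(population)]
```

For potential fitting, fitness(bits) would decode the bit string into potential parameters and return a weighted sum of squared errors against the reference data, as in the applications discussed next.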

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work


of Da Cunha et al. [142], 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu use a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution, and that this solution represents the optimization of the objective function to a local minimum. Therefore there is the need to rerun the fitting procedure from different starting points, i.e. different positions on the PES fitting error surface, in order to not get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g. in the case of NNPs. The reason for these high costs is that the evaluation of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-

surface interactions, e.g. polynomial fitting, the modified Shepard interpolation and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e. describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore it is only a question of time until ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage, it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time until more program packages will become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction


of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartók, M.J. Gillan, F.R. Manby, G. Csányi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martoňák, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snasel, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)


surface interactions eg polynomial fitting the modifiedShepard interpolation and IMLS Other methods havefocussed on condensed systems and are able to capturehigh-order many-body interactions as needed for the de-scription of metals like NNPs and GPRs The use of fur-ther methods like support vector machines and geneticprogramming for the construction of potentials is still lessevolved but this will certainly change in the near future

All of these potentials have now overcome the con-ceptual problems related to incorporating the rotationaland translational invariance as well as the permutationsymmetry of the energy with respect to the exchange oflike atoms Most of the underlying difficulties rather con-cerned the geometrical description of the atomic config-urations than the fitting process itself A lot of interest-ing solutions have been developed for this purpose whichshould in principle be transferable from one method toanother The current combinations of these quite modu-lar components ie describing the atomic configurationand assigning the energy have been mainly motivated bythe intended applications Therefore it is only a questionof time when ideas will spread and concepts from one ap-proach will be used in other contexts to establish evenmore powerful techniques

At the present stage it has been demonstrated formany examples covering a wide range of systems fromsmall molecules to solids containing thousands of atomsthat a very high accuracy of typically only a few meVper atom with respect to a given reference method canbe reached However due to the non-physical functionalform large reference data sets are required to constructthese potentials making their development computation-ally more demanding than for most conventional poten-tials Still this effort pays back quickly in large-scale appli-cations which are simply impossible otherwise Howeverthe number of applications which clearly go beyond theproof-of-principle level by addressing and solving physicalproblems that cannot be studied by other means is stillmoderate Most of these applications can be related to afew very active research groups which are involved in thedevelopment of these methods and have the required ex-perience This is because presently the complexity of themethods and of the associated fitting machinery is a ma-jor barrier towards their wider use but again it is only aquestion of time when more program packages will becomegenerally available

In summary tremendous progress has been made inthe development of atomistic potentials A very high ac-curacy can now be achieved and it can be anticipatedthat these methods will be further extended and becomeestablished tools in materials science Some challenges re-main like the construction of potentials for systems con-taining a very large number of chemical elements whichare difficult to describe geometrically and rather costly tosample when constructing the reference sets due to thesize of their configuration space On the other hand todate only a tiny fraction of the knowledge about machinelearning methods which is available in the computer sci-ence community has been transferred to the construction

Page 14 of 16 Eur Phys J B (2014) 87 152

of atomistic potentials Consequently many exciting newdevelopments are to be expected in the years to come

Financial support by the DFG (Emmy Noether groupBe32643-1) is gratefully acknowledged

References

1 RG Parr W Yang Density Functional Theory of Atomsand Molecules (Oxford University Press 1989)

2 R Car M Parrinello Phys Rev Lett 55 2471 (1985)3 D Marx J Hutter Ab Initio Molecular Dynamics Basic

Theory and Advanced Methods (Cambridge UniversityPress 2009)

4 AP Sutton MW Finnis DG Pettifor Y Ohta JPhys C 21 35 (1988)

5 CM Goringe DR Bowler E Hernandez Rep ProgPhys 60 1447 (1997)

6 M Elstner Theor Chem Acc 116 316 (2006)7 L Colombo Comput Mater Sci 12 278 (1998)8 T Hammerschmidt R Drautz DG Pettifor Int J

Mater Res 100 1479 (2009)9 N Allinger in Advances in Physical Organic Chemistry

(Academic Press 1976) Vol 13 pp 1ndash8210 BR Brooks RE Bruccoleri BD Olafson DJ States

S Swaminathan M Karplus J Comput Chem 4 187(1983)

11 ACT van Duin S Dasgupta F Lorant WA GoddardIII J Phys Chem A 105 9396 (2001)

12 A Warshel RM Weiss J Am Chem Soc 102 6218(1980)

13 J Tersoff Phys Rev Lett 56 632 (1986)14 J Tersoff Phys Rev B 37 6991 (1988)15 DW Brenner Phys Rev B 42 9458 (1990)16 E Pijper G-J Kroes RA Olsen EJ Baerends J

Chem Phys 117 5885 (2002)17 MS Daw SM Foiles MI Baskes Mater Sci Rep 9

251 (1993)18 MI Baskes Phys Rev Lett 59 2666 (1987)19 JC Slater GF Koster Phys Rev 94 1498 (1954)20 Th Frauenheim G Seifert M Elsterner Z Hajnal G

Jungnickel D Porezag S Suhai R Scholz Phys StatSol B 217 41 (2000)

21 DA Papaconstantopoulos MJ Mehl J PhysCondens Matter 15 R413 (2003)

22 R Hoffman J Chem Phys 39 1397 (1963)23 RJ Deeth Coord Chem Rev 212 11 (2001)24 RJ Deeth A Anastasi C Diedrich K Randell Coord

Chem Rev 253 795 (2009)25 A Brown BJ Braams KM Christoffel Z Jin JM

Bowman J Chem Phys 119 8790 (2003)26 Z Xie JM Bowman J Chem Theory Comput 6 26

(2010)27 BJ Braams JM Bowman Int Rev Phys Chem 28

577 (2009)28 Y Wang BC Shepler BJ Braams JM Bowman J

Chem Phys 131 054511 (2009)29 X Huang BJ Braams JM Bowman J Phys Chem

A 110 445 (2006)

30 X Huang BJ Braams JM Bowman J Chem Phys122 044308 (2005)

31 AR Sharma BJ Braams S Carter BC Shepler JMBowman J Chem Phys 130 174301 (2009)

32 CE Rasmussen CKI Williams Gaussian Processes forMachine Learning (MIT Press 2006)

33 CM Handley GI Hawe DB Kell PLA PopelierPhys Chem Chem Phys 11 6365 (2009)

34 MJL Mills PLA Popelier Comput Theor Chem975 42 (2011)

35 MJL Mills PLA Popelier Theor Chem Acc 131 1(2012)

36 MJL Mills GI Hawe CM Handley PLA PopelierPhys Chem Chem Phys 15 18249 (2013)

37 TJ Hughes SM Kandathil PLA PopelierSpectrochim Acta A Mol Biomol Spectrosc inpress (2013) DOI101016jsaa201310059

38 AP Bartok MC Payne R Kondor G Csanyi PhysRev B 87 184115 (2013)

39 AP Bartok MC Payne R Kondor G Csanyi PhysRev Lett 104 136403 (2010)

40 AP Bartok MJ Gillan FR Manby G Csanyi PhysRev B 88 054104 (2013)

41 C Lee W Yang RG Parr Phys Rev B 37 785 (1998)42 AD Becke Phys Rev A 38 3098 (1998)43 M Rupp A Tkatchenko K-R Muller OA von

Lilienfeld Phys Rev Lett 108 058301 (2012)44 K Hansen G Montavon F Biegler S Fazli M Rupp

M Scheffler OA von Lilienfeld A Tkatchenko KRMuller J Chem Theory Comput 9 3404 (2013)

45 M Rupp MR Bauer R Wilcken A Lange MReutlinger FM Boeckler G Schneider PLoS ComputBiol 10 e1003400 (2014)

46 P Lancaster K Salkauskas Curve and Surface FittingAn Introduction (Academic Press 1986)

47 J Ischtwan MA Collins J Chem Phys 100 8080(1994)

48 MJT Jordan KC Thompson MA Collins J ChemPhys 103 9669 (1995)

49 MJT Jordan MA Collins J Chem Phys 104 4600(1996)

50 T Wu H-J Werner U Manthe Science 306 2227(2004)

51 T Wu H-J Werner U Manthe J Chem Phys 124164307 (2006)

52 T Takata T Taketsugu K Hiaro MS Gordon JChem Phys 109 4281 (1998)

53 CR Evenhuis U Manthe J Chem Phys 129 024104(2008)

54 C Crespos MA Collins E Pijper GJ Kroes ChemPhys Lett 376 566 (2003)

55 C Crespos MA Collins E Pijper GJ Kroes J ChemPhys 120 2392 (2004)

56 DH McLain Comput J 17 318 (1974)57 T Ishida GC Schatz Chem Phys Lett 314 369

(1999)58 R Dawes DL Thompson Y Guo AF Wagner M

Minkoff J Chem Phys 126 184108 (2007)59 R Dawes DL Thompson AF Wagner M Minkoff J

Chem Phys 128 084107 (2008)60 GG Maisuradze DL Thompson AF Wagner M

Minkoff J Chem Phys 119 10002 (2003)

Eur Phys J B (2014) 87 152 Page 15 of 16

61 Y Guo A Kawano DL Thompson AF Wagner MMinkoff J Chem Phys 121 5091 (2004)

62 R Dawes AF Wagner DL Thompson J Phys ChemA 113 4709 (2009)

63 JP Camden R Dawes DL Thompson J Phys ChemA 113 4626 (2009)

64 R Dawes X-G Wang T Carrington J Phys ChemA 117 7612 (2013)

65 R Dawes X-G Wang AW Jasper T Carrington JChem Phys 133 134304 (2010)

66 J Brown X-G Wang R Dawes T Carrington JChem Phys 136 134306 (2012)

67 G Li J Hu S-W Wang PG Georgopoulos JSchoendorf H Rabitz J Phys Chem A 110 2474(2006)

68 A Kawano IV Tokmakov DL Thompson AFWagner M Minkoff J Chem Phys 124 054105 (2006)

69 CM Bishop Neural Networks for Pattern Recognition(Oxford University Press 1996)

70 S Haykin Neural Networks and Learning Machines(Pearson Education 1986)

71 W McCulloch W Pitts Bull Math Biophys 5 115(1943)

72 J Gasteiger J Zupan Angew Chem Int Ed 32 503(1993)

73 TB Blank SD Brown AW Calhoun DJ Doren JChem Phys 103 4129 (1995)

74 E Tafeit W Estelberger R Horejsi R Moeller KOettl K Vrecko G Reibnegger J Mol Graphics 1412 (1996)

75 FV Prudente PH Acioli JJ Soares Neto J ChemPhys 109 8801 (1998)

76 H Gassner M Probst A Lauenstein K HermanssonJ Phys Chem A 102 4596 (1998)

77 S Lorenz A Groszlig M Scheffler Chem Phys Lett 395210 (2004)

78 J Behler S Lorenz K Reuter J Chem Phys 127014705 (2007)

79 J Ludwig DG Vlachos J Chem Phys 127 154716(2007)

80 CM Handley PLA Popelier J Phys Chem A 1143371 (2010)

81 J Behler Phys Chem Chem Phys 13 17930 (2011)82 J Behler J Phys Condens Matter 26 183001 (2014)83 S Haykin Neural Networks A Comprehensive

Foundation (Macmillan College Publishing Company1994)

84 G Cybenko Math Control Signals Systems 2 303 (1989)85 K Hornik M Stinchcombe H White Neural Netw 2

359 (1989)86 DE Rumelhart GE Hinton RJ Williams Nature

323 533 (1986)87 TB Blank SD Brown J Chemometrics 8 391 (1994)88 A Pukrittayakamee M Malshe M Hagan LM Raff

R Narulkar S Bukkapatnum R Komanduri J ChemPhys 130 134101 (2009)

89 N Artrith J Behler Phys Rev B 85 045439 (2012)90 K-H Cho K-H No HA Scheraga J Mol Struct 641

77 (2002)91 WL Jorgensen J Chandrasekhar JD Madura RW

Impey J Chem Phys 79 962 (1983)92 S Manzhos T Carrington J Chem Phys 125 84109

(2006)

93 S Manzhos T Carrington J Chem Phys 125 194105(2006)

94 S Manzhos T Carrington J Chem Phys 127 014103(2007)

95 S Manzhos T Carrington J Chem Phys 129 224104(2008)

96 M Malshe R Narulkar LM Raff M Hagan SBukkapatnam PM Agrawal R Komanduri J ChemPhys 130 184101 (2009)

97 S Hobday R Smith J BelBruno Modelling SimulMater Sci Eng 7 397 (1999)

98 S Hobday R Smith J BelBruno Nucl InstrumMethods Phys Res B 153 247 (1999)

99 A Bholoa SD Kenny R Smith Nucl InstrumMethods Phys Res B 255 1 (2007)

100 E Sanville A Bholoa R Smith SD Kenny J PhysCondens Matter 20 285219 (2008)

101 J Behler M Parrinello Phys Rev Lett 98 146401(2007)

102 J Behler J Chem Phys 134 074106 (2011)103 N Artrith T Morawietz J Behler Phys Rev B 83

153101 (2011)104 T Morawietz V Sharma J Behler J Chem Phys 136

064103 (2012)105 J Behler R Martonak D Donadio M Parrinello Phys

Rev Lett 100 185501 (2008)106 RZ Khaliullin H Eshet TD Kuhne J Behler M

Parrinello Phys Rev B 81 100103 (2010)107 H Eshet RZ Khaliullin TD Kuhne J Behler M

Parrinello Phys Rev B 81 184107 (2010)108 N Artrith J Behler Phys Rev B 85 045439 (2012)109 GC Sosso G Miceli S Caravati J Behler M

Bernasconi Phys Rev B 85 174103 (2012)110 N Artrith B Hiller J Behler Phys Stat Sol B 250

1191 (2013)111 T Morawietz J Behler J Phys Chem A 117 7356

(2013)112 T Morawietz J Behler Z Phys Chem 227 1559 (2013)113 S Houlding SY Liem PLA Popelier Int J Quant

Chem 107 2817 (2007)114 MG Darley CM Handley PLA Popelier J Chem

Theor Comput 4 1435 (2008)115 CM Handley PLA Popelier J Chem Theor Chem

5 1474 (2009)116 PLA Popelier Atoms in Molecules An Introduction

(Pearson Education 2000)117 PLA Popelier FM Aicken ChemPhysChem 4 824

(2003)118 PLA Popelier AG Bremond Int J Quant Chem

109 2542 (2009)119 J Li B Jiang H Guo J Chem Phys 139 204103

(2013)120 R Fournier O Slava J Chem Phys 139 234110 (2013)121 P Geiger C Dellago J Chem Phys 139 164105 (2013)122 PJ Haley D Soloway in International Joint Conference

on Neural Networks IJCNN 1992 Vol 4 pp 25ndash30123 AG Wilson E Gilboa A Nehorai JP Cunningham

arXiv13105288v3 (2013)124 VN Vapnik The Nature of Statistical Learning Theory

(Springer-Verlag 1995)125 C Cortes V Vapnik Mach Learn 20 273 (1995)126 ZR Yang Brief Bioinform 5 328 (2004)

Page 16 of 16 Eur Phys J B (2014) 87 152

127 RM Balabin EI Lomakina Phys Chem Chem Phys13 11710 (2011)

128 W Chu SS Keerthi CJ Ong IEEE Trans NeuralNetworks 15 29 (2004)

129 A Vitek M Stachon P Kromer V Snael inInternational Conference on Intelligent Networking andCollaborative Systems (INCoS) 2013 pp 121ndash126

130 JR Koza Genetic Programming On the Programing ofComputers by Means of Natural Selection (MIT Press1992)

131 DE Makarov H Metiu J Chem Phys 108 590 (1998)132 MA Bellucci DF Coker J Chem Phys 135 044115

(2011)133 MW Brown AP Thompson PA Schultz J Chem

Phys 132 024108 (2010)134 E Zitzler L Thiele IEEE Trans Evol Comput 3 257

(1999)135 DE Goldberg Genetic Algorithms in Search

Optimization and Machine Learning (Addison-Wesley1998)

136 B Hartke Struct Bond 110 33 (2004)137 AR Oganov CW Glass J Chem Phys 124 244704

(2006)138 F Koskowski B Hartke J Comput Chem 26 1169

(2005)

139 W Leo Meerts M Schmitt Int Rev Phys Chem 25353 (2006)

140 MH Hennessy AM Kelley Phys Chem Chem Phys6 1085 (2004)

141 JMC Marques FV Prudente FB Pereira MMAlmeida MM Maniero CE Fellows J Phys B 41085103 (2008)

142 WF Da Cunha LF Roncaratti R Gargano GMESilva Int J Quant Chem 106 2650 (2006)

143 LF Roncaratti R Gargano GME Silva J MolStruct (Theochem) 769 47 (2006)

144 YG Xu GR Liu J Micromech Microeng 13 254(2003)

145 GLW Hart V Blum MJ Walorski A Zunger NatMater 4 391 (2005)

146 V Blum GLW Hart MJ Walorski A Zunger PhysRev B 72 165113 (2005)

147 P Pahari S Chaturvedi J Mol Model 18 1049 (2012)148 HR Larsson ACT van Duin BJ Hartke Comput

Chem 34 2178 (2013)149 CM Handley RJ Deeth J Chem Theor Comput 8

194 (2012)

  • Introduction
  • General-purpose atomistic potentials
  • Discussion
  • Conclusions
  • References
Page 8: Next generation interatomic potentials for condensed systems · Next generation interatomic potentials for condensed systems ... all imaginable systems and applications. The Tersoff

Page 8 of 16 Eur Phys J B (2014) 87 152

The role of the hidden layers, which do not have a physical meaning, is to define the functional form of the NN assigning the energy to the structure. The more hidden layers and the more nodes per hidden layer, the higher is the flexibility of the NN. Haykin has stated that, in the case of a NN architecture consisting of two hidden layers, the first hidden layer captures the local features of the function being modelled, while the second hidden layer captures the global features of the function [83]. All nodes in all layers are connected to the nodes in the adjacent layers by so-called weight parameters, which are the fitting parameters of the NN. In Figure 2 they are shown as arrows indicating the flow of information through the NN. The value y_i^j of a node i in layer j is then calculated as a linear combination of the values of the nodes in the previous layer, using the weight parameters as coefficients. The linear combination is then shifted by a bias weight b_i^j acting as an adjustable offset. Finally, a non-linear function, the activation function of the NN, is applied, which provides the capability of the NN to fit any real-valued function to arbitrary accuracy. This fundamental property of NNs has been proven by several groups independently [84,85] and is the theoretical basis for the applicability of NNs to construct atomistic potentials. Accordingly, y_i^j is obtained as

\[ y_i^j = f\left( b_i^j + \sum_{k=1}^{N_{j-1}} a_{ki}^{j-1,j} \cdot y_k^{j-1} \right) \qquad (16) \]

where a_{ki}^{j-1,j} is the weight parameter connecting node k in layer j-1, containing N_{j-1} neurons, to node i in layer j. In the special case of the input layer, with superscript 0, the input coordinates are represented by a vector G with components G_i = y_i^0. The complete total energy expression of the example NN shown in Figure 2 is then given by

\[ E = f\left( b_1^2 + \sum_{j=1}^{3} a_{j1}^{12} \cdot f\left( b_j^1 + \sum_{i=1}^{2} a_{ij}^{01} \cdot G_i \right) \right) \qquad (17) \]

The weight parameters of the NN are typically determined by minimizing the error for a training set of electronic structure data employing gradient-based optimization algorithms; in particular, the backpropagation algorithm [86] and the global extended Kalman filter [87] have been frequently used in the context of NNPs. The target quantity is usually the energy, but forces have also been used in some cases [88,89].
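
To make equations (16) and (17) concrete, here is a minimal Python sketch evaluating the example NN of equation (17) (two input coordinates, one hidden layer of three nodes, a hyperbolic tangent activation). The weight values below are random placeholders standing in for fitted parameters; the architecture follows the equation, not any particular published NNP.

```python
import numpy as np

def nn_energy(G, a01, b1, a12, b2, f=np.tanh):
    """Evaluate the feed-forward NN of Eq. (17): two inputs G,
    one hidden layer with three nodes, one output node (the energy)."""
    y1 = f(b1 + a01.T @ G)   # hidden-layer node values, Eq. (16)
    return f(b2 + a12 @ y1)  # output node

# Arbitrary placeholder parameters (in a real NNP these are fitted
# to electronic structure reference data).
a01 = np.random.uniform(-1, 1, size=(2, 3))  # weights input -> hidden
b1  = np.random.uniform(-1, 1, size=3)       # hidden bias weights
a12 = np.random.uniform(-1, 1, size=3)       # weights hidden -> output
b2  = 0.1                                    # output bias weight

G = np.array([0.5, -0.2])  # input vector of structural coordinates
print(nn_energy(G, a01, b1, a12, b2))
```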

NNPs constructing the potential energy using just a single feed-forward NN have been developed for a number of systems with great success, but for a long time NNPs have been restricted to low-dimensional PESs involving only a few degrees of freedom. This limitation has a number of reasons. First of all, each additional atom in the system introduces three more degrees of freedom, which need to be provided to the NN in the input layer. Increasing the number of nodes in the NN reduces the efficiency of the NN evaluation, but it also complicates the determination of the growing number of weight parameters. Further, conventional NNPs can only be applied to systems with a fixed chemical composition, because the number of input nodes of the NN cannot be changed once the weight parameters have been determined. If an atom were added, the weight parameters for the additional input information would not be available, while upon removing an atom the values of the corresponding input nodes would be ill-defined.

The most severe challenge of constructing NNPs is the choice of a suitable set of input coordinates. Cartesian coordinates cannot be used, because their numerical values have no direct physical meaning, and only relative atomic positions are important for the potential energy. If, for instance, a molecule were translated or rotated in space, its Cartesian coordinates would change while its internal structure, and thus its energy, would remain the same. An NNP based on Cartesian coordinates, however, would predict a different energy, since its input vector has changed. Incorporating this rotational and translational invariance by a coordinate transformation onto a suitable set of functions exhibiting these properties is a significant challenge. A very simple solution appropriate for small molecules would be to use internal coordinates like interatomic distances, angles, and dihedral angles, and indeed this has been done successfully for a number of molecules, but this approach is unfeasible for large systems due to the rapidly increasing number of coordinates. Further, a full set of internal coordinates contains a lot of redundant information, and also using a full distance matrix is possible only for small systems.

Another problem, which is still present even if internal coordinates are used, concerns the invariance of the total energy with respect to the interchange of atoms of the same element. If, for instance, the order of the two hydrogen atoms in a free water molecule is switched, then this change is also present in the NN input vector. Since all NN input nodes are connected to the NN by numerically different weight parameters, this exchange will modify the total energy of the system, although the atomic configuration is still the same. Including this permutation symmetry of like atoms into NNPs has been a challenge since the advent of NNPs. A solution for molecular systems, based on a symmetrization of internal coordinates, has been suggested in a seminal paper by Gassner et al. [76], and a similar scheme has also been developed for the dissociation of molecules at surfaces [78]. Unfortunately, both methods are only applicable to very low-dimensional systems. It should be noted, however, that also low-dimensional NNPs have been applied to study large systems by representing only specific interactions by NNs. Examples are the work of Cho et al. [90], who used NNs to include polarization in the TIP4P water model [91], and the description of three-body interactions in the solvation of Al3+ ions by Gassner et al. [76].

It has soon been recognized that further methodical extensions are required to construct high-dimensional NNPs capturing all relevant many-body effects in large systems. A first step in this direction has been made by Manzhos and Carrington [92,93] by decomposing the PESs into interactions of increasing order in the spirit of a many-body expansion,

\[ E = \sum_{i=1}^{N_{\mathrm{atom}}} E_i + \sum_i \sum_{j>i} E_{ij} + \sum_i \sum_{j>i} \sum_{k>j} E_{ijk} + \ldots \qquad (18) \]

with the E_i, E_ij, E_ijk being one-, two-, and three-body terms, and so forth.
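
As an illustration, the following minimal sketch evaluates such an expansion truncated after the three-body terms; the one_body, two_body, and three_body functions are hypothetical placeholders for whatever fitted models (e.g. individual NNs) supply the terms.

```python
from itertools import combinations
import numpy as np

def many_body_energy(positions, one_body, two_body, three_body):
    """Truncated many-body expansion, Eq. (18):
    E = sum_i E_i + sum_{i<j} E_ij + sum_{i<j<k} E_ijk."""
    n = len(positions)
    E = sum(one_body(positions[i]) for i in range(n))
    E += sum(two_body(positions[i], positions[j])
             for i, j in combinations(range(n), 2))
    E += sum(three_body(positions[i], positions[j], positions[k])
             for i, j, k in combinations(range(n), 3))
    return E

# Toy placeholder terms: a constant atomic energy and simple
# distance-dependent pair and (here vanishing) triple contributions.
one_body = lambda r: -1.0
two_body = lambda r1, r2: 0.1 / np.linalg.norm(r1 - r2)
three_body = lambda r1, r2, r3: 0.0

positions = np.random.rand(5, 3)  # five atoms in a unit box
print(many_body_energy(positions, one_body, two_body, three_body))
```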

Specifically, Manzhos and Carrington have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years the efficiency could be improved by reducing the effective dimensionality employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].

First attempts to develop high-dimensional NNPs have been made by Smith and coworkers already in 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters have not been determined using electronic structure reference data, and only in 2007 the method has been further developed into a genuine NNP approach [99,100], using silicon as an example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach the energy of the system is constructed as a sum of environment-dependent atomic energy contributions E_i,

\[ E = \sum_i E_i . \qquad (19) \]

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6-10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary CuZnO system [110], and water clusters [111,112].
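
To illustrate the structure of this approach, here is a minimal sketch assuming only the simplest radial symmetry functions (sums of Gaussians of the neighbour distances damped by a cosine cutoff, in the spirit of Ref. [102]); atomic_nn is a hypothetical stand-in for the fitted atomic NNs.

```python
import numpy as np

def cutoff(r, rc=6.0):
    """Cosine cutoff function: smoothly zero beyond the cutoff radius rc."""
    return np.where(r < rc, 0.5 * (np.cos(np.pi * r / rc) + 1.0), 0.0)

def radial_fingerprint(positions, i, etas=(0.5, 1.0, 2.0), rc=6.0):
    """Radial symmetry-function vector for atom i: invariant under
    translation, rotation, and permutation of the neighbours."""
    rij = np.linalg.norm(np.delete(positions, i, axis=0) - positions[i],
                         axis=1)
    return np.array([np.sum(np.exp(-eta * rij**2) * cutoff(rij, rc))
                     for eta in etas])

def total_energy(positions, atomic_nn):
    """Eq. (19): total energy as a sum of atomic contributions, each
    predicted from the fingerprint of the local chemical environment."""
    return sum(atomic_nn(radial_fingerprint(positions, i))
               for i in range(len(positions)))

# Hypothetical stand-in for a fitted atomic NN: any smooth function
# of the fingerprint vector suffices for this illustration.
atomic_nn = lambda G: float(np.tanh(G).sum())

positions = np.random.rand(10, 3) * 5.0  # ten atoms in a 5 Å box
print(total_energy(positions, atomic_nn))
```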

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113-115]. For this purpose they have developed a method to also express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116-118]. The neural network takes as input the relative positions of the atoms around the atom for which the predictions are made, and then expresses the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer, and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated equally, since they all result from the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic, because of the non-spherical nature of the atomic electron densities that are described by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change when atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used, the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava, symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network [120]. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES, the energy contribution for an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression, and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector; these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then in theory contain an infinite number of dimensions. The classification a(x) of a trial point x is then given by the general form of a SVM,

\[ a(\mathbf{x}) = \sum_{i=1}^{N_{\mathrm{ref}}} \lambda_i y_i K(\mathbf{x}_i, \mathbf{x}) - w_0 . \qquad (20) \]

The sum is over the number of support vectors, which is equal to the number of reference examples. K is the kernel that transforms the input x from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where w_0 would be the bias of the hidden layer. In this picture, y_i is the classification of reference example x_i and λ_i is a coefficient. The elements of K are found by

\[ K(\mathbf{x}_i, \mathbf{x}) = \tanh\left( k_0 + k_1 \langle \mathbf{x}_i, \mathbf{x} \rangle \right) . \qquad (21) \]

Here k_0 is the bias weight of the first layer of the neural network, and the dot product ⟨x_i, x⟩ is a measure of similarity between the two vectors. K(x_i, x) in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to k_1⟨x_i, x⟩.
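
For concreteness, a minimal sketch of the decision function of equations (20) and (21); the coefficients λ_i and the reference data below are arbitrary placeholders, since in a real SVM they result from solving the training problem discussed next.

```python
import numpy as np

def sigmoid_kernel(xi, x, k0=0.0, k1=1.0):
    """Eq. (21): tanh kernel comparing reference point xi to trial point x."""
    return np.tanh(k0 + k1 * np.dot(xi, x))

def svm_classify(x, X_ref, y_ref, lam, w0, kernel=sigmoid_kernel):
    """Eq. (20): a(x) = sum_i lambda_i y_i K(x_i, x) - w0."""
    a = sum(l * y * kernel(xi, x)
            for l, y, xi in zip(lam, y_ref, X_ref)) - w0
    return np.sign(a)  # classification: +1 or -1

# Placeholder reference data and (untrained) coefficients.
X_ref = np.random.rand(20, 2)               # reference points
y_ref = np.where(X_ref[:, 0] > 0.5, 1, -1)  # their class labels
lam = np.full(20, 0.05)                     # would be fitted in a real SVM
print(svm_classify(np.array([0.8, 0.3]), X_ref, y_ref, lam, w0=0.0))
```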

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform a non-linear transformation in the kernel, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation grows in complexity quadratically as more dimensions are added, to the point where it will not fit within the computer memory. A solution to this problem is the so-called "kernel trick", which accounts for the non-linear transformation into a high-dimensional space, but where the original dot-product matrix is still used: the transformation is applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.

The main tasks to be solved when using SVMs are finding the linear separator such that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that recasts the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, as in many other fitting methods, cross validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

\[ \mathbf{w} \cdot \mathbf{X} - b = 0 , \qquad (22) \]

where X is a set of reference points that define the linear separator and w is the normal vector to the linear separator. Furthermore, the margins of the separation can be defined by

\[ \mathbf{w} \cdot \mathbf{X} - b = -1 \qquad (23) \]

and

\[ \mathbf{w} \cdot \mathbf{X} - b = 1 , \qquad (24) \]

as shown in Figure 3. The width of the separation is given by 2/||w||, and the objective of a SVM is then to maximize this separation.
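
This width follows directly from the margin definitions (23) and (24); a short derivation is added here since the text quotes the result without proof. For two support vectors x_+ and x_- lying on the two margin hyperplanes,

\[ \mathbf{w} \cdot \mathbf{x}_+ - b = 1 , \quad \mathbf{w} \cdot \mathbf{x}_- - b = -1 \;\Rightarrow\; \frac{\mathbf{w}}{\|\mathbf{w}\|} \cdot \left( \mathbf{x}_+ - \mathbf{x}_- \right) = \frac{2}{\|\mathbf{w}\|} , \]

i.e. the perpendicular distance between the margins, measured along the unit normal w/||w||, is 2/||w||.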

The constraint of the linear separation can be rewritten as

\[ y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1 , \qquad (25) \]

where x_i is a reference training point that must be correctly classified and y_i is the classification function, i.e. the output is either 1 or -1. This can be modified to allow for soft margins, and so for a degree of misclassification:

\[ y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1 - \xi_i . \qquad (26) \]


Fig. 3. In the initial two-dimensional space the data points (stars) fall into two groups - those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible: those within the circle have classification '1' and those outside have classification '-1'. The linear separation is defined by a vector with a gradient of w and an intercept of b, found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

Here ξ_i is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function J:

\[ \min J(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2} \mathbf{w}^T \mathbf{w} + c \sum_{i=1}^{N_{\mathrm{ref}}} \xi_i , \qquad (27) \]

where c is a parameter modulating the influence of the slack function.

For regression purposes, like the construction of PESs, the objective is the reverse problem, where the reference data points should lie as close as possible to the linear separator. This is the basis of Least-Squares Support Vector Machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

\[ \min J(\mathbf{w}, b, \mathbf{e}) = \frac{\nu}{2} \mathbf{w}^T \mathbf{w} + \frac{\zeta}{2} \sum_{i=1}^{N_{\mathrm{ref}}} e_i^2 . \qquad (28) \]

The slack values are now replaced by the sum of squared errors e_i of the reference points, and ν and ζ are parameters that determine the smoothness of the fit. Both SVMs and GPRs are similar in that they can both be adjusted by the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of functions can adjust the sensitivity of the fitting method [128].
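
As an illustration of the regression variant, a minimal sketch assuming a Gaussian kernel; dropping the bias term from equation (28) reduces the dual problem to a single linear system, which makes this sketch essentially kernel ridge regression rather than the full LS-SVM of reference [127].

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_fit(X, y, zeta=100.0):
    """Simplified LS-SVM regression: minimizing Eq. (28) without the
    bias term reduces to solving (K + I/zeta) alpha = y."""
    K = gaussian_kernel(X, X)
    return np.linalg.solve(K + np.eye(len(X)) / zeta, y)

def lssvm_predict(x, X, alpha):
    """Predicted value at a trial point x from the fitted coefficients."""
    return gaussian_kernel(x[None, :], X)[0] @ alpha

# Toy 1D example: fit a Morse-like curve from a few reference energies.
X = np.linspace(0.8, 3.0, 15)[:, None]
y = (1.0 - np.exp(-(X[:, 0] - 1.5))) ** 2 - 1.0
alpha = lssvm_fit(X, y)
print(lssvm_predict(np.array([1.2]), X, alpha))  # interpolated energy
```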

Specifically, Balabin and Lomakina used LS-SVMs to predict energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometry containing carbon, hydrogen, oxygen, nitrogen, and fluorine. Therefore, while not describing the PES for arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

Fig. 4. The two parent Genetic Programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an 'x'. At the ends of the branches are the input nodes I1-I4, which provide information about the system. At each node on the way up the tree, an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

2.7 Genetic programming - function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similarly to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed up the tree to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner, complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structure can be very different for the daughter solutions, with the tree structure having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e. a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
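
A minimal sketch of the tree representation, evaluation, and subtree crossover described above; the operator set (including a protected division) and the random choice of crossover points are illustrative assumptions, not the scheme of any cited work.

```python
import copy
import operator
import random

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
       '/': lambda a, b: a / b if abs(b) > 1e-9 else 1.0}  # protected division

def evaluate(tree, inputs):
    """A tree is either a leaf 'In' (the n-th input value) or a list
    [op, left, right]; values are combined bottom-up as in Fig. 4."""
    if isinstance(tree, str):
        return inputs[int(tree[1:])]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

def random_node(tree):
    """Descend to a random internal node; return it with a child slot."""
    slot = random.choice([1, 2])
    if isinstance(tree[slot], str) or random.random() < 0.5:
        return tree, slot
    return random_node(tree[slot])

def crossover(parent1, parent2):
    """Swap randomly chosen branches of two (copied) parent trees."""
    d1, d2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    n1, s1 = random_node(d1)
    n2, s2 = random_node(d2)
    n1[s1], n2[s2] = n2[s2], n1[s1]
    return d1, d2

p1 = ['+', ['*', 'I0', 'I1'], 'I2']        # (I0 * I1) + I2
p2 = ['-', 'I3', ['/', 'I1', 'I0']]        # I3 - (I1 / I0)
d1, d2 = crossover(p1, p2)
print(evaluate(d1, [1.0, 2.0, 3.0, 4.0]))  # output of a daughter tree
```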

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher-level GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon fitness [133]. This allows the training procedure to retain individuals, so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e. they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS, and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, as in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation", where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and, from the combined population, remove the poorly performing bit strings. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. GAs have found wide use for many problems in chemistry and physics, such as searching conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
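
As an illustration of this bit-string scheme, a minimal sketch in Python; the decoding of bit chunks to a parameter range, the toy fitness target, and all rates and population sizes are illustrative assumptions rather than any of the cited implementations.

```python
import random

N_PARAMS, BITS = 3, 10  # three parameters, encoded with 10 bits each
LO, HI = -5.0, 5.0      # allowed range (threshold) for each parameter

def decode(bits):
    """Map each 10-bit chunk of the bit string to a parameter in [LO, HI]."""
    vals = []
    for p in range(N_PARAMS):
        chunk = bits[p * BITS:(p + 1) * BITS]
        x = int(''.join(map(str, chunk)), 2) / (2**BITS - 1)
        vals.append(LO + x * (HI - LO))
    return vals

def fitness(bits):
    """Toy objective: negative squared error against target parameters."""
    target = [1.0, -2.0, 0.5]
    return -sum((v - t) ** 2 for v, t in zip(decode(bits), target))

def crossover(a, b):
    cut = random.randrange(1, len(a))  # random interchange point
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rate=0.01):
    return [bit ^ 1 if random.random() < rate else bit for bit in bits]

pop = [[random.randint(0, 1) for _ in range(N_PARAMS * BITS)]
       for _ in range(40)]
for generation in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:20]               # best performers survive
    children = []
    while len(children) < 20:
        c1, c2 = crossover(*random.sample(parents, 2))
        children += [mutate(c1), mutate(c2)]
    pop = parents + children         # combined population
print(decode(max(pop, key=fitness)))  # best parameter set found
```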

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work of Da Cunha et al. [142] 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore there is the need to rerun the fitting procedure from different starting points, i.e. different positions on the PES fitting error surface, in order to not get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g. in the case of NNPs. The reason for these high costs is that the evaluation of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g. polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e. describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore it is only a question of time before ideas spread and concepts from one approach are used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time before more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1-82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffman, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartok, M.J. Gillan, F.R. Manby, G. Csanyi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Muller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Muller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theor. Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theor. Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN (1992), Vol. 4, pp. 25-30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121-126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theor. Comput. 8, 194 (2012)

(2003)118 PLA Popelier AG Bremond Int J Quant Chem

109 2542 (2009)119 J Li B Jiang H Guo J Chem Phys 139 204103

(2013)120 R Fournier O Slava J Chem Phys 139 234110 (2013)121 P Geiger C Dellago J Chem Phys 139 164105 (2013)122 PJ Haley D Soloway in International Joint Conference

on Neural Networks IJCNN 1992 Vol 4 pp 25ndash30123 AG Wilson E Gilboa A Nehorai JP Cunningham

arXiv13105288v3 (2013)124 VN Vapnik The Nature of Statistical Learning Theory

(Springer-Verlag 1995)125 C Cortes V Vapnik Mach Learn 20 273 (1995)126 ZR Yang Brief Bioinform 5 328 (2004)

Page 16 of 16 Eur Phys J B (2014) 87 152

127 RM Balabin EI Lomakina Phys Chem Chem Phys13 11710 (2011)

128 W Chu SS Keerthi CJ Ong IEEE Trans NeuralNetworks 15 29 (2004)

129 A Vitek M Stachon P Kromer V Snael inInternational Conference on Intelligent Networking andCollaborative Systems (INCoS) 2013 pp 121ndash126

130 JR Koza Genetic Programming On the Programing ofComputers by Means of Natural Selection (MIT Press1992)

131 DE Makarov H Metiu J Chem Phys 108 590 (1998)132 MA Bellucci DF Coker J Chem Phys 135 044115

(2011)133 MW Brown AP Thompson PA Schultz J Chem

Phys 132 024108 (2010)134 E Zitzler L Thiele IEEE Trans Evol Comput 3 257

(1999)135 DE Goldberg Genetic Algorithms in Search

Optimization and Machine Learning (Addison-Wesley1998)

136 B Hartke Struct Bond 110 33 (2004)137 AR Oganov CW Glass J Chem Phys 124 244704

(2006)138 F Koskowski B Hartke J Comput Chem 26 1169

(2005)

139 W Leo Meerts M Schmitt Int Rev Phys Chem 25353 (2006)

140 MH Hennessy AM Kelley Phys Chem Chem Phys6 1085 (2004)

141 JMC Marques FV Prudente FB Pereira MMAlmeida MM Maniero CE Fellows J Phys B 41085103 (2008)

142 WF Da Cunha LF Roncaratti R Gargano GMESilva Int J Quant Chem 106 2650 (2006)

143 LF Roncaratti R Gargano GME Silva J MolStruct (Theochem) 769 47 (2006)

144 YG Xu GR Liu J Micromech Microeng 13 254(2003)

145 GLW Hart V Blum MJ Walorski A Zunger NatMater 4 391 (2005)

146 V Blum GLW Hart MJ Walorski A Zunger PhysRev B 72 165113 (2005)

147 P Pahari S Chaturvedi J Mol Model 18 1049 (2012)148 HR Larsson ACT van Duin BJ Hartke Comput

Chem 34 2178 (2013)149 CM Handley RJ Deeth J Chem Theor Comput 8

194 (2012)

  • Introduction
  • General-purpose atomistic potentials
  • Discussion
  • Conclusions
  • References
Page 9: Next generation interatomic potentials for condensed systems · Next generation interatomic potentials for condensed systems ... all imaginable systems and applications. The Tersoff

Eur Phys J B (2014) 87 152 Page 9 of 16

interactions of increasing order in the spirit of a many-body expansion,

\[
E = \sum_{i=1}^{N_{\mathrm{atom}}} E_i + \sum_i \sum_{j>i} E_{ij} + \sum_i \sum_{j>i} \sum_{k>j} E_{ijk} + \cdots \qquad (18)
\]

with the $E_i$, $E_{ij}$, $E_{ijk}$ being one-, two-, and three-body terms, and so forth.

Specifically, they have used the high-dimensional model representation of Li et al. [67] to reduce the complexity of the multidimensional PES by employing a sum of mode terms, each of which is represented by an individual NN and depends only on a small subset of the coordinates. Since the number of terms, and consequently also the number of NNs that need to be evaluated, grows rapidly with system size and with the maximum order that is considered, to date the method has only been applied to comparably small molecular systems. In the following years the efficiency could be improved by reducing the effective dimensionality, employing optimized redundant coordinates [94,95]. This approach is still the most systematic way to construct NNPs, and very accurate PESs can be obtained. A similar method based upon many-body expansions has also been suggested by Malshe et al. [96].
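To make the structure of equation (18) concrete, here is a minimal sketch of an expansion truncated after the three-body term. It is only an illustration of the bookkeeping, not the HDMR mode-term NNs discussed above: the two- and three-body functions below are hypothetical placeholders standing in for whatever fitted terms a real potential would use.

```python
import itertools
import numpy as np

def two_body(r_ij):
    """Illustrative pair term (a Lennard-Jones-like placeholder)."""
    return 4.0 * ((1.0 / r_ij) ** 12 - (1.0 / r_ij) ** 6)

def three_body(r_ij, r_ik, r_jk):
    """Illustrative three-body placeholder correction."""
    return 0.01 * np.exp(-(r_ij + r_ik + r_jk))

def many_body_energy(positions, e_one=0.0):
    """Equation (18) truncated after the three-body term."""
    n = len(positions)
    r = lambda i, j: np.linalg.norm(positions[i] - positions[j])
    energy = n * e_one                                   # one-body terms E_i
    for i, j in itertools.combinations(range(n), 2):     # pair terms E_ij
        energy += two_body(r(i, j))
    for i, j, k in itertools.combinations(range(n), 3):  # triple terms E_ijk
        energy += three_body(r(i, j), r(i, k), r(j, k))
    return energy

positions = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.1, 0.0]])
print(many_body_energy(positions))
```

The combinatorial loops make explicit why the number of terms, and hence of NNs in the HDMR approach, grows so rapidly with system size and expansion order.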

First attempts to develop high-dimensional NNPs were made by Smith and coworkers as early as 1999, by introducing NNs of variable size and by decomposing condensed systems into a number of atomic chains [97,98]. In this early work the NN parameters were not determined using electronic structure reference data, and only in 2007 was the method developed further into a genuine NNP approach [99,100], using silicon as an example system. Surprisingly, no further applications or extensions of this method have been reported to date.

A high-dimensional NNP method which has been applied to a manifold of systems has been developed by Behler and Parrinello in 2007 [101]. In this approach the energy of the system is constructed as a sum of environment-dependent atomic energy contributions $E_i$,

\[
E = \sum_i E_i. \qquad (19)
\]

Each atomic energy contribution is obtained from an individual atomic NN. The input for these NNs is formed by a vector of symmetry functions [102], which provide a structural fingerprint of the chemical environment up to a cutoff radius of about 6–10 Å. The functional form of the symmetry functions ensures that the potential is translationally and rotationally invariant, and due to the summation in equation (19) the potential is also invariant with respect to permutations of like atoms. Consequently, this approach overcomes all conceptual limitations of conventional low-dimensional NNPs. For multicomponent systems, long-range electrostatic interactions can be included by an explicit electrostatic term employing environment-dependent charges, which are constructed using a second set of atomic NNs [103,104]. This high-dimensional NNP, which is applicable to very large systems containing thousands of atoms, has been applied to a number of systems like silicon [105], carbon [106], sodium [107], copper [108], GeTe [109], ZnO [104], the ternary Cu/ZnO system [110], and water clusters [111,112].
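To illustrate how equation (19) and the symmetry-function input work together, here is a minimal sketch. It assumes a single chemical element, uses only radial symmetry functions of the type introduced in reference [102] together with Behler's cosine cutoff, and evaluates the atomic NNs with random, untrained weights; a production NNP would also include angular symmetry functions, element-specific networks, and weights fitted to reference data.

```python
import numpy as np

def cutoff(r, r_c=6.0):
    """Cosine cutoff function f_c(r); goes smoothly to zero at r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_symmetry_functions(positions, etas, r_c=6.0):
    """Radial fingerprints G_i = sum_j exp(-eta r_ij^2) f_c(r_ij) per atom."""
    n = len(positions)
    G = np.zeros((n, len(etas)))
    for i in range(n):
        r_ij = np.linalg.norm(positions - positions[i], axis=1)
        r_ij = np.delete(r_ij, i)  # exclude the self-distance r_ii = 0
        for k, eta in enumerate(etas):
            G[i, k] = np.sum(np.exp(-eta * r_ij ** 2) * cutoff(r_ij, r_c))
    return G

def atomic_nn(G_i, W1, b1, W2, b2):
    """One small feed-forward atomic NN: fingerprint -> atomic energy E_i."""
    hidden = np.tanh(G_i @ W1 + b1)
    return float(hidden @ W2 + b2)

def total_energy(positions, etas, params):
    """Equation (19): sum of environment-dependent atomic energies."""
    G = radial_symmetry_functions(positions, etas)
    return sum(atomic_nn(G[i], *params) for i in range(len(positions)))

# toy example with random (untrained) weights
rng = np.random.default_rng(0)
etas = [0.05, 0.5, 2.0]
params = (rng.normal(size=(3, 5)), rng.normal(size=5),
          rng.normal(size=5), rng.normal())
positions = rng.uniform(0.0, 4.0, size=(8, 3))
print(total_energy(positions, etas, params))
```

Because every atom sees only distances within the cutoff sphere, and equation (19) sums identical per-atom networks, the resulting energy is invariant under translation, rotation, and exchange of like atoms, exactly as described above.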

NNs have also been employed by Popelier and coworkers to improve the description of electrostatic interactions in classical force fields [33,113–115]. For this purpose they have developed a method to also express higher-order electrostatic multipole moments of atoms as a function of the chemical environment. The reference multipole moments used to train the NNs are obtained from DFT calculations and a partitioning of the self-consistent electron density employing the concept of quantum chemical topology (QCT) [116–118]. The neural network takes as input the positions of the surrounding atoms relative to the atom for which the predictions are made, and then expresses the multipole moments up to high orders within the atomic local frame. The advantage of this method is that all electrostatic effects, such as Coulombic interactions, charge transfer, and polarization, which traditionally are represented in force fields using different interaction terms, are instead treated on an equal footing, since they all result from the predicted multipoles. Furthermore, this also means that all electrostatic interactions are more realistic, because the non-spherical nature of the atomic electron densities is captured by the multipoles. Polarization is then a natural result of the prediction of these multipoles, as they change when atoms move around and atomic electron densities deform. It has been demonstrated that the accuracy of electrostatic interactions in classical force fields can be significantly improved, but if high-order multipoles are used the number of NNs to be evaluated can be substantial.

Recently, NNPs have also been constructed using input coordinates which have been used before in other types of mathematical potentials. An illustrative example is the use of permutationally invariant polynomials by Li et al. [119], which have been used very successfully by Bowman and coworkers for a number of molecular systems (cf. Sect. 2.1). On the other hand, in the work of Fournier and Slava, symmetry functions similar to those introduced by Behler [102] are used for an atomistic potential, but here a many-body expansion is employed rather than a neural network [120]. Using the symmetry functions as the dimensions of the descriptor space, a clustering procedure can be used to define the atom types. Within the PES, the energy contribution for an atom is then determined from the linear combination of each atom type. In a similar manner, a symmetry-function-based neural network can also be used for structure analysis in computer simulations [121].

NNs and GPRs show similar limitations, namely their inability to extrapolate accurate energies for atomic configurations very different from the structures included in the training set. Both methods are thus reliant on the sampling of the training points to guide the fitting of the function [122]. GPRs model smoothly varying functions in order to perform regression and interpolation, and typically the kernels used in most GPR models perform poorly for extrapolation, though there are attempts to address this issue [123].


2.6 Support vector machines

Support vector machines (SVMs) [124] are a machine learning method that has found much use in the fields of bioinformatics and cheminformatics [125,126], mainly for classification problems. However, SVMs can also be used for regression and so are suitable for PES fitting, though examples of such applications of SVMs are still rare.

In general, SVMs aim to define a linear separation of data points in feature space by finding a vector that yields the largest linear separation width between two groups of points with different properties. The width of this separation is given by the reference points that lie at the margin on either side of the separating vector, and these reference points are called support vectors. A balance needs to be found between finding the maximum separation and the errors which arise from those cases where data points lie on the wrong side of the separation vector and so are incorrectly classified.

Unfortunately, a linear separation is often impossible in the original input coordinate feature space. This problem can be solved by transforming the input coordinates and recasting them in a higher-dimensional space where a linear separation becomes possible. Depending on the way this non-linear transformation is performed, the linear separation vector found in this high-dimensional space can then be transformed back into a non-linear separator in the original coordinate space.

There are many ways to perform the transformation from the original input space into the higher-dimensional feature space. One possibility is to define the dimensions of the higher-dimensional space as combinations of the original coordinates. This feature space could then, in theory, contain an infinite number of dimensions. The classification $a(\mathbf{x})$ of a trial point $\mathbf{x}$ is then given by the general form of a SVM,

\[
a(\mathbf{x}) = \sum_{i=1}^{N_{\mathrm{ref}}} \lambda_i y_i K(\mathbf{x}_i, \mathbf{x}) - w_0. \qquad (20)
\]

The sum is over the number of support vectors, which is equal to the number of reference examples. $K$ is the kernel that transforms the input $\mathbf{x}$ from the original representation to the new higher-order representation and compares the similarity between the trial point and each of the reference points.

Equation (20) can also be viewed as a representation of a neural network by a SVM, where $w_0$ would be the bias of the hidden layer. In this picture $y_i$ is the classification of reference example $\mathbf{x}_i$ and $\lambda_i$ is a coefficient. The elements of $K$ are found by

\[
K(\mathbf{x}_i, \mathbf{x}) = \tanh\left(k_0 + k_1 \langle \mathbf{x}_i, \mathbf{x} \rangle\right), \qquad (21)
\]

where $k_0$ is the bias weight of the first layer of the neural network. The term $k_1 \langle \mathbf{x}_i, \mathbf{x} \rangle$ contains the dot product of the two vectors and so is a measure of similarity. $K(\mathbf{x}_i, \mathbf{x})$ in effect represents the action of the first layer of hidden nodes of a neural network, as it applies a hyperbolic tangent function to $k_1 \langle \mathbf{x}_i, \mathbf{x} \rangle$.
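As a concrete illustration of equations (20) and (21), a minimal sketch follows. The reference points, labels, coefficients $\lambda_i$, and bias $w_0$ below are hand-picked stand-ins for quantities that would come out of the SVM training optimization, which is not shown here.

```python
import numpy as np

def tanh_kernel(x_i, x, k0=0.0, k1=1.0):
    """Sigmoid kernel of equation (21): tanh(k0 + k1 <x_i, x>)."""
    return np.tanh(k0 + k1 * np.dot(x_i, x))

def svm_decision(x, X_ref, y_ref, lam, w0):
    """Equation (20); the sign of a(x) gives the predicted class."""
    a = sum(l * y * tanh_kernel(x_i, x)
            for l, y, x_i in zip(lam, y_ref, X_ref)) - w0
    return np.sign(a)

# illustrative, hand-picked values in place of trained ones
X_ref = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_ref = np.array([1.0, -1.0])
lam = np.array([0.5, 0.5])
print(svm_decision(np.array([0.8, 0.2]), X_ref, y_ref, lam, w0=0.0))
```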

A SVM is more general than a neural network, since different kernels allow for other non-linear transformations to be used. Without the non-linear transformation, each element of the kernel can be viewed as a measure of similarity between the reference points and the trial points. If we perform the non-linear transformation, we have two approaches. The first is to explicitly compute the new variables of the vector with respect to the new high-dimensional space. While this is trivial if the number of dimensions is small, the computation grows quadratically in complexity as more dimensions are added, to the point where it no longer fits within the computer memory. A solution to this problem is the so-called "kernel trick", which accounts for the non-linear transformation into a high-dimensional space while the original dot product matrix is still used. The transformation is then applied after finding the dot products. Thus the non-linear transformation is implicit, which simplifies the whole process, since this kernel is integral to then finding the linear separation vector.

The main tasks to be solved when using SVMs are finding the linear separator such that the distance of separation is maximized and the errors in classification are reduced, and identifying the appropriate non-linear transformation, or kernel function, that recasts the reference data in a new high-dimensional feature space where the separation can be performed.

Since there is an infinite number of possible non-linear transformations, and the use of multiple kernels can allow for multiple classifications to be separated, it can happen that the flexibility offered by SVMs results in overfitting and a loss of generalization capabilities. For this reason, as in many other fitting methods, cross validation is applied during fitting.

The linear separation in high-dimensional space is then defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = 0, \qquad (22)
\]

where $\mathbf{X}$ is a set of reference points that define the linear separator and $\mathbf{w}$ is the normal vector to the linear separator. Furthermore, the margins of the separation are defined by

\[
\mathbf{w} \cdot \mathbf{X} - b = -1 \qquad (23)
\]

and

\[
\mathbf{w} \cdot \mathbf{X} - b = 1, \qquad (24)
\]

as shown in Figure 3. The width of the separation is given by $2/\|\mathbf{w}\|$. The objective of a SVM is then to maximize this separation. The constraint of the linear separation can be rewritten as

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1, \qquad (25)
\]

where $\mathbf{x}_i$ is a reference training point that must be correctly classified and $y_i$ is the classification function, i.e., the output is either $1$ or $-1$. This can be modified to allow for soft margins, and so permit a degree of misclassification,

\[
y_i \left( \mathbf{w} \cdot \mathbf{x}_i - b \right) \ge 1 - \xi_i. \qquad (26)
\]
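The separation width $2/\|\mathbf{w}\|$ quoted above follows directly from the margin hyperplanes (23) and (24); a one-line derivation:

\[
d = \frac{(\mathbf{w} \cdot \mathbf{x}_+ - b) - (\mathbf{w} \cdot \mathbf{x}_- - b)}{\|\mathbf{w}\|} = \frac{1 - (-1)}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|},
\]

where $\mathbf{x}_+$ and $\mathbf{x}_-$ are support vectors lying on the two margins, and the projection onto the unit normal $\mathbf{w}/\|\mathbf{w}\|$ measures the perpendicular distance between them.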


Fig. 3. In the initial two-dimensional space (axes $x_1$, $x_2$), the data points (stars) fall into two groups: those within the circle and those outside of it. Within this space there is no way to perform a linear separation. A kernel allows the data points to be recast into a new space where a linear separation is possible. Those within the circle have classification '1' and those outside have classification '−1'. The linear separation ($\mathbf{w} \cdot \mathbf{X} - b = 0$, with margins at $\mathbf{w} \cdot \mathbf{X} - b = \pm 1$) is defined by a vector with a gradient of $\mathbf{w}$ and an intercept of $b$, and is found as the vector that gives the widest separation between the two classes. This width is determined by two data points, the support vectors.

$\xi_i$ is called the slack variable, a measure of the degree of misclassification. This is a penalty that must be minimized as part of the objective function $J$,

\[
\min J(\mathbf{w}, \xi) = \frac{1}{2} \mathbf{w}^T \mathbf{w} + c \sum_{i=1}^{N_{\mathrm{ref}}} \xi_i, \qquad (27)
\]

where $c$ is a parameter modulating the influence of the slack function.

For regression purposes, like the construction of PESs, the objective is the reverse problem: the reference data points should lie as close as possible to the linear separator. This is the basis of least-squares support vector machines (LS-SVMs) and their application to PESs in the work of Balabin and Lomakina [127].

LS-SVMs require the minimization of

\[
\min J(\mathbf{w}, b, e) = \frac{\nu}{2} \mathbf{w}^T \mathbf{w} + \frac{\zeta}{2} \sum_{i=1}^{N_{\mathrm{ref}}} e_i^2. \qquad (28)
\]

The slack values are now replaced by the squared errors $e_i$ of the reference points, and $\nu$ and $\zeta$ are parameters that determine the smoothness of the fit. SVMs and GPRs are similar in that both can be adjusted through the choice of the kernel function used. Gaussians are frequently used, as they are computationally simple to implement, but other choices of functions can adjust the sensitivity of the fitting method [128].
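In the dual formulation standard in the LS-SVM literature (not code from reference [127]), minimizing equation (28) reduces to solving a single linear system for the expansion coefficients and the bias; a minimal sketch with a Gaussian kernel, fitting a one-dimensional Morse-like toy curve. The regularization parameter gamma below plays the role of the ratio of the two penalty weights in equation (28).

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix K_ij = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual linear system for coefficients alpha and bias b."""
    n = len(X)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma  # ridge term from the squared-error penalty
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]             # bias b, coefficients alpha

def lssvm_predict(X_new, X, alpha, b, sigma=1.0):
    """f(x) = sum_i alpha_i K(x_i, x) + b."""
    return rbf_kernel(X_new, X, sigma) @ alpha + b

# one-dimensional toy "PES": fit a Morse-like curve from a few samples
X = np.linspace(0.8, 4.0, 25)[:, None]
y = (1.0 - np.exp(-1.5 * (X[:, 0] - 1.5))) ** 2
b, alpha = lssvm_fit(X, y)
print(lssvm_predict(np.array([[2.0]]), X, alpha, b))
```

Replacing the slack variables by squared errors is what turns the SVM's inequality-constrained optimization into this plain linear algebra problem, which is why LS-SVMs are comparatively cheap to fit.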

Specifically, Balabin and Lomakina used LS-SVMs to predict the energies of calculations with converged basis sets, employing electronic structure data obtained using smaller basis sets, for 208 different molecules in their minimum geometries containing carbon, hydrogen, oxygen, nitrogen, and fluorine. Therefore, while not describing the PES for arbitrary configurations, this approach is still an interesting example of machine learning methods applied to energy predictions, and an extension to the representation of PESs would be possible. In comparison to neural networks, the LS-SVM model required fewer reference points and provided slightly better results. Very recently, SVMs have been used for the first time to construct a continuous PES by Vitek et al. [129], who studied water clusters containing up to six molecules.

Fig. 4. The two parent Genetic Programming (GP) trees at the top have been randomly partnered, and at the same position on each tree the connection where the branches will be exchanged is marked by an '×'. At the end of the branches are the input nodes I1–I4, which provide information about the system. At each node on the way up the tree an arithmetic operation is performed that combines the values passed up from the lower level. This process continues until the last node has been reached, where the output is determined.

2.7 Genetic programming – function searching

Genetic programming (GP) uses populations of solutions, analyzes them for their success, and employs, similar to genetic algorithms, survival-of-the-fittest concepts as well as mating of solutions and random mutations to create new populations containing better solutions [130]. It is based upon the idea of a tree structure (cf. Fig. 4). At the very base of the tree are the input nodes, which feed in information about the configuration of the atomic system. As one moves up the tree, two inputs from a previous level are combined, returning an output which is fed upwards to the next node, and so forth, until in the uppermost node a final arithmetic procedure is performed and returns the output, the energy prediction. In this manner, complex functions can be represented as a combination of simpler arithmetic functions.

Genetic programming in the above manner then relies on the randomization of the functions in all members of the population. New daughter solutions are generated by randomly performing interchanges between two of the best members of the parent population. The interchange occurs by randomly selecting where in the tree structure of each parent the interchange will occur. This means that the tree structures of the daughter solutions can be very different, having been extended or pruned. Randomly, the daughter solutions may also undergo mutations, i.e., a change in a function used in the tree structure or a change in a parameter. Many other types of crossover of tree structures can also be used. Depending on the choice of functions, GPs can have the advantage that they provide human-readable, and sometimes even physically meaningful, functions, in contrast to most methods discussed above.
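A minimal sketch of this tree evaluation and subtree crossover, under simplifying assumptions: trees are nested tuples, leaves are integer indices into the input vector (the nodes I1–I4 of Fig. 4), and only safe operators are included (division is omitted to avoid divide-by-zero in a toy example).

```python
import random
import operator

# a GP individual is a nested tuple (op, left, right) or an input index
OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(tree, inputs):
    """Recursively evaluate a tree bottom-up, as described for Figure 4."""
    if isinstance(tree, int):                 # leaf: input node I_k
        return inputs[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, inputs), evaluate(right, inputs))

def all_paths(tree, path=()):
    """Collect paths to every node so a crossover point can be drawn."""
    paths = [path]
    if not isinstance(tree, int):
        _, left, right = tree
        paths += all_paths(left, path + (1,))
        paths += all_paths(right, path + (2,))
    return paths

def get(tree, path):
    for idx in path:
        tree = tree[idx]
    return tree

def replace(tree, path, sub):
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = replace(node[path[0]], path[1:], sub)
    return tuple(node)

def crossover(a, b):
    """Swap randomly chosen subtrees of two parents (the '×' in Fig. 4)."""
    pa = random.choice(all_paths(a))
    pb = random.choice(all_paths(b))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))

parent1 = ('+', ('*', 0, 1), 2)        # (I1 * I2) + I3
parent2 = ('-', 3, ('+', 0, 2))        # I4 - (I1 + I3)
child1, child2 = crossover(parent1, parent2)
print(child1, evaluate(child1, [1.0, 2.0, 3.0, 4.0]))
```

Because the exchanged subtrees may have different depths, the daughters can indeed be longer or shorter than either parent, which is the extension and pruning described above.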

Genetic programming has been applied to simple PESs, such as that of a water molecule [131], though more complex formulations of GPs have been used by Bellucci and Coker [132] to represent an empirical valence bond potential formulation for determining the energy change in reactions. Here, three PESs have to be found: one for the products, one for the reactants, and one that relates the two surfaces and describes the reaction. Bellucci and Coker build on the concept of directed GPs, where the functional form of the PES is partially defined and the GP search modifies and improves this first potential. For example, the GP searches for the best function that performs a non-linear transformation on the inputs before they are passed to the main predefined function. The aim is to find the best Morse and Gaussian functions that together describe the reactions. They go beyond traditional GP in the use of multiple GPs, where a higher-level GP modifies the probabilities of the search functions that direct the lower-level GP fitting the PES functions.

Brown et al. proposed a hierarchical approach to GP, where the GP population is ranked based upon fitness [133]. This allows the training procedure to retain individuals, so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134].

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e., they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS, and GPR methods, making them more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, as in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found which yield reliable potentials.

An interesting alternative approach to finding the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best-performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and, from the combined population, remove the poorly performing bit strings. The final population should then be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set. A minimal sketch of this bit-string machinery is given below. GAs have found wide use for many problems in chemistry and physics, such as searching conformation space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
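The sketch below encodes a single parameter per 16-bit string and optimizes a toy fitness; it is only an illustration of the crossover, mutation, and selection steps described above, with a hypothetical target value, whereas real potential fits encode many parameters per string and evaluate an expensive error sum as the fitness.

```python
import random

def decode(bits, lo=-5.0, hi=5.0):
    """Map a bit string to a real parameter within fixed thresholds."""
    return lo + int(''.join(map(str, bits)), 2) / (2 ** len(bits) - 1) * (hi - lo)

def fitness(bits):
    """Toy objective: closeness of the decoded parameter to a target value."""
    return -(decode(bits) - 1.234) ** 2

def crossover(a, b):
    """Single-point interchange of two parent bit strings."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rate=0.05):
    """Flip random bits from 0 to 1 or vice versa."""
    return [1 - b if random.random() < rate else b for b in bits]

pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
for generation in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                      # keep the best performers
    children = []
    while len(children) < 10:
        c1, c2 = crossover(*random.sample(parents, 2))
        children += [mutate(c1), mutate(c2)]
    pop = parents + children[:10]           # combine populations, drop the rest

print(decode(max(pop, key=fitness)))        # should approach 1.234
```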

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work of Da Cunha et al. [142] 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore, there is the need to rerun the fitting procedure from different starting points, i.e., different positions on the PES fitting error surface, in order not to get trapped in local minima.

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters which were determined to be important for the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here, a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the evaluation of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g., polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focused on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance, as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time until ideas spread and concepts from one approach are used in other contexts to establish even more powerful techniques.

At the present stage, it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be traced to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods, and of the associated fitting machinery, is a major barrier towards their wider use, but again it is only a question of time until more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be 3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartók, M.J. Gillan, F.R. Manby, G. Csányi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 1986)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Syst. 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martoňák, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks (IJCNN) (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1998)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)

  • Introduction
  • General-purpose atomistic potentials
  • Discussion
  • Conclusions
  • References
Page 10: Next generation interatomic potentials for condensed systems · Next generation interatomic potentials for condensed systems ... all imaginable systems and applications. The Tersoff

Page 10 of 16 Eur Phys J B (2014) 87 152

26 Support vector machines

Support vector machines (SVMs) [124] are a machinelearning method that has found much use in the fields ofbioinformatics and cheminformatics [125126] mainly forclassification problems However SVMs can also be usedfor regression and so are suitable for PES fitting thoughexamples of such applications of SVMs are still rare

In general SVMs aim to define a linear separation ofdata points in feature space by finding a vector that yieldsthe largest linear separation width between two groups ofpoints with different properties The width of this separa-tion is given by the reference points that lie at the margineither side of the separating vector and these referencepoints are called support vectors A balance needs to befound between finding the maximum separation and er-rors which arise from those cases where data points lieon the wrong side of the separation vector and so areincorrectly classified

Unfortunately a linear separation is often impossiblein the original input coordinate feature space This prob-lem can be solved by transforming the input coordinatesand recasting them in a higher-dimensional space wherea linear separation becomes possible Depending on theway this non-linear transformation is performed the lin-ear separation vector found in this high-dimensional spacecan then be transformed back into a non-linear separatorin the original coordinate space

There are many ways to perform the transformationfrom the original input space into the higher dimensionalfeature space One possibility is to define the dimensions ofthe higher-dimensional space as combinations of the orig-inal coordinates This feature space could then in theorycontain an infinite number of dimensions The classifica-tion a(x) of a trial point xi is then given by the generalform of a SVM

a(x) =Nrefsum

i=1

λiyiK(xix) minus w0 (20)

The sum is over the number of support vectors whichis equal to the number of reference examples K is thekernel that transforms the input x from the original rep-resentation to the new higher order representation andcompares the similarity between the trial point and eachof the reference points

Equation (20) can also be viewed as a representation ofa neural network by a SVM where w0 would be the bias ofthe hidden layer This means that yi is the classification ofreference example xi and λi is a coefficient The elementsof K are found by

K(xix) = tanh(k0 + k1〈xix〉) (21)

k0 is the bias weight of the first layer of the neural networkThus k1〈xix〉 is the dot product of the vectors and sois a measure of similarity K(xix) in effect representsthe action of the first layer of hidden nodes of a neuralnetwork as it applies a hyperbolic tangent function tok1〈xix〉

A SVM is more general than a neural network sincedifferent kernels allow for other non-linear transformationsto be used Without the non-linear transformation eachelement of the kernel can be viewed as a measure of sim-ilarity between the reference points and the trial pointsIf we perform the non-linear transformation to the kernelwe have two approaches The first is to explicitly computethe new variables of the vector with respect to the newhigh-dimensional space While this is trivial if the num-ber of dimensions is small the computation will grow incomplexity quadratically as more dimensions are addedto the point where the computation will not fit withinthe computer memory A solution to this problem is theso-called ldquokernel trickrdquo which accounts for the non-lineartransformation into a high-dimensional space but wherethe original dot product matrix is still used The trans-formation is then applied after finding the dot productsThus the non-linear transformation is implicit and sim-plifies the whole process since this kernel is integral tothen finding the linear separation vector

The main task to be solved when using SVMs is findingthe linear separator so that the distance of separation ismaximized the errors in classification are reduced andidentifying the appropriate non-linear transformation orkernel function that yields the reference data in a newhigh-dimensional feature space where the separation canbe performed

Since there is an infinite number of possible non-lineartransformations and the use of multiple kernels can allowfor multiple classifications to be separated it can hap-pen that the flexibility offered by SVMs results in overfit-ting and a loss of generalization capabilities For this rea-son like in many other fitting methods cross validationis applied during fitting

The linear separation in high-dimensional space is thendefined by

w middot Xminus b = 0 (22)

where X is a set of reference points that define the linearseparator and w is the normal vector to the linear sepa-rator Furthermore the margins of the separation can bedefined by

w middot Xminus b = minus1 (23)

and

w middot Xminus b = 1 (24)

as shown in Figure 3The width of the separation is given by 2

w The ob-jective of a SVM is then to maximize this separation Theconstraint of the linear separation can be rewritten as

yi

(w middot xi minus b

) ge 1 (25)

where xi is reference training point that must be correctlyclassified and yi is the classification function ie the out-put is then either 1 or minus1 This can be modified to allow forsoft margins and so allow for a degree of misclassification

yi

(w middot xi minus b

) ge 1 minus ξi (26)

Eur Phys J B (2014) 87 152 Page 11 of 16

wX-b=1wX-b=0wX-b=-1

x1x1

x2 x2

Fig 3 In the initial two-dimensional space that data points(stars) fall into two groups ndash those within the circle and thoseoutside of it Within this space there is no way to perform alinear separation A kernel allows to recast the data points intoa new space where a linear separation is possible Those withinthe circle have classification lsquo1rsquo and those outside have classi-fication lsquominus1rsquo The linear separation is defined by a vector witha gradient of w and an intercept of b The linear separation isfound as vector that gives the widest separation between thetwo classes This width is determined by two data points thesupport vectors

ξi is called the slack variable a measure of the degreeof misclassification This is a penalty that must be mini-mized as part of the objective function J

min J(w ξ) =12wT w + c

Nrefsum

i=1

ξi (27)

where c is a parameter modulating the influence of theslack function

For regression purposes like the construction of PESsthe objective is the reverse problem where the referencedata points should lie as close as possible to a linear sepa-ration This is the basis of Least-Squares Support VectorMachines (LS-SVMs) and their applications to PESs inthe work of Balabin and Lomakina [127]

LS-SVMs require the minimization of

min J(w b e) =ν

2wT w +

ζ

2

Nrefsum

i=1

e2i (28)

The slack values are now replaced by the sum of squarederrors ei of the reference points and ν and ζ are pa-rameters that determine the smoothness of the fit BothSVMs and GPRs are similar in that they can both be ad-justed by the choice of the kernal function used Gaussiansare frequently used as they are computationally simple toimplement but other choices of functions can adjust thesensitivity of the fitting method [128]

Specifically Balabin and Lomakina used LS-SVMs topredict energies of calculations with converged basis setsemploying electronic structure data obtained using smallerbasis sets for 208 different molecules in their minimumgeometry containing carbon hydrogen oxygen nitrogenand fluorine Therefore while not describing the PES for

X

X+

I2 I3 I4I1

-

divideX

I2 I3 I4I1

-

X

divide+

I2 I3 I4I1

-

XX

I2 I3 I4I1

-

Fig 4 The two parent Genetic Programming (GP) trees atthe top have been randomly partnered and at the same po-sition on each tree the connection is marked by an lsquotimesrsquo wherethe branches will be exchanged At the end of the branches arethe input nodes I1-I4 which provide information about thesystem At each node on the way up the tree an arithmetic op-eration is performed that combines the values passed up fromthe lower level This process continues until the last node hasbeen reached where the output is determined

arbitrary configurations this approach is still an interest-ing example of machine learning methods applied to en-ergy predictions and an extension to the representationof PESs would be possible In comparison to neural net-works the LS-SVM model required less reference pointsand provided slightly better results Very recently SVMshave been used for the first time to construct a continu-ous PES by Vitek et al [129] who studied water clusterscontaining up to six molecules

27 Genetic programming ndash function searching

Genetic programming (GP) uses populations of solutionsanalyzes them for their success and employs similar to ge-netic algorithms survival of the fittest concepts as well asmating of solutions and random mutations to create newpopulations containing better solutions [130] It is basedupon the idea of a tree structure (cf Fig 4) At the verybase of the tree are the input nodes which feed in infor-mation about the configuration of the atomic system Asone moves up the tree two inputs from a previous levelare combined returning an output which is fed upwardsthe tree to the next node and so forth until in the upper-most node a final arithmetic procedure is performed andreturns the output the energy prediction In this mannercomplex functions can be represented as a combination ofsimpler arithmetic functions

Genetic Programming in the above manner then relieson the randomization of the functions in all members ofthe population New daughter solutions are generated byrandomly performing interchanges between two membersof the best members of the parent population The in-terchange occurs by randomly selecting where in the treestructure for each parent the interchange will occur Thismeans that the tree structure can be very different for thedaughter solutions with the tree structure having been

Page 12 of 16 Eur Phys J B (2014) 87 152

extended or pruned Randomly the daughter solutionsmay also undergo mutations ie a change in a functionused in the tree structure or a change in a parameterMany other types of crossover of tree structures can alsobe used Depending on the choice of functions GPs canhave the advantage that they can provide human read-able and sometimes even physically meaningful functionsin contrast to most methods discussed above

Genetic programming has been applied to simplePESs such as that of a water molecule [131] though morecomplex formulations of GPs have been used by Bellucciand Coker [132] to represent an empirical valence bondpotential formulation for determining the energy changein reactions Here three PESs have to be found one ofthe products one for the reactants and one that relatesthe two surfaces and describes the reaction Bellucci andCoker build on the concept of directed GPs where thefunctional form of the PES is partially defined and theGP search modifies and improves this first potential Forexample the GP searches for the best function that per-forms a non-linear transformation on the inputs beforethey are passed to the main predefined function The aimis to find the best Morse and Gaussian functions that to-gether describe the reactions They go beyond traditionalGP in the use of multiple GPs where a higher level ofGP modifies the probabilities of the search functions thatdirect the lower level GP fitting the PES functions

Brown et al. proposed a hierarchical approach to GP where the members of the GP population are ranked based upon their fitness [133]. This allows the training procedure to retain individuals so that convergence is not achieved at the cost of the diversity of the solutions. Maintaining a diverse set of solutions for the PES fitting allows for the generation of better approximations to the global minimum of the target function, rather than just converging to a local minimum. The ranking of solutions is similar to the concept of multi-objective genetic algorithms and the use of Pareto ranking [134], sketched below.
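For reference, Pareto ranking in this multi-objective sense assigns rank 0 to all solutions that are not dominated by any other, removes them, and repeats on the remainder; a minimal sketch for minimization (illustrative only, not the algorithm of Ref. [133]) is:

    def dominates(a, b):
        """True if objective vector `a` is no worse than `b` in every
        component and strictly better in at least one (minimization)."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_ranks(objectives):
        """Assign each solution the index of its Pareto front."""
        ranks, remaining, front = {}, set(range(len(objectives))), 0
        while remaining:
            nondominated = {i for i in remaining
                            if not any(dominates(objectives[j], objectives[i])
                                       for j in remaining if j != i)}
            for i in nondominated:
                ranks[i] = front
            remaining -= nondominated
            front += 1
        return ranks

    # Two objectives per candidate, e.g. (energy error, force error):
    # candidates 0 and 1 are mutually non-dominated (front 0), while
    # candidate 2 is dominated by candidate 0 (front 1).
    print(pareto_ranks([(1.0, 2.0), (2.0, 1.0), (2.5, 2.5)]))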

3 Discussion

All the potentials discussed in this review have in common that they employ very general functional forms which are not based on physical considerations. Therefore, they are equally applicable to all types of bonding, and in this sense they are "non-physical", i.e., they are completely unbiased. In order to describe PESs correctly, the information about the topology of the PESs needs to be provided in the form of usually very large sets of reference electronic structure calculations, and the presented potentials differ in the manner in which this information is stored. For some of the potentials it is required that this database is still available when the energy of a new configuration is to be predicted. This is the case for the MSI, IMLS, and GPRs, making these methods more demanding with growing reference data set size. On the other hand, these methods reproduce the energies of the reference structures in the final potentials error-free. Methods like NNPs and permutation invariant polynomials transform the information contained in the reference data set into a substantial number of parameters, and the determination of a suitable set of parameter values can be a demanding task as far as computing time is concerned. Still, in contrast to many conventional physical potentials, there is usually no manual trial-and-error component in the fitting process. Often, as in the case of NNPs, gradient-based optimization algorithms are used, but they bear the risk of getting trapped in local minima of the high-dimensional parameter space, and there is usually no hope of finding the global minimum. Fortunately, in most cases sufficiently accurate local minima in parameter space can be found, which yield reliable potentials.
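In practice such fits are therefore commonly restarted from several random initializations, keeping the best local minimum found. A toy multi-start sketch (hypothetical one-parameter loss, not any specific potential-fitting code):

    import numpy as np
    from scipy.optimize import minimize

    def loss(p):
        # Toy non-convex fitting error with several local minima; a real
        # application would evaluate the error of the potential here.
        return np.sin(3.0 * p[0]) + (p[0] - 0.5) ** 2

    best = None
    for seed in range(10):                  # several random starting points
        rng = np.random.default_rng(seed)
        x0 = rng.uniform(-3.0, 3.0, size=1)
        res = minimize(loss, x0)            # gradient-based local optimizer
        if best is None or res.fun < best.fun:
            best = res                      # keep the best local minimum

    print(best.x, best.fun)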

An interesting alternative approach to find the parameters, which has been used a lot in the context of potential development, but not for the potentials above, is to employ genetic algorithms (GAs) [135]. Like neural networks, they represent a class of algorithms inspired by biology and belong to the group of machine learning techniques. GAs work by representing the parameters determining a target value as a bit string. This bit string is randomized and used to generate an initial population of bit strings that represent a range of values for the parameters. Thus the bit string represents not one but a series of parameter values, where each member of the population has different values for these parameters, but each parameter can only vary within given thresholds. Each member of the initial population is assessed for its performance. The best performing members of the population are then used to generate the next "daughter population". This is done by randomly selecting where along the bit string two population members undergo an interchange. The result is that from two "parent" bit strings two daughter bit strings are obtained. The central idea is that one of the daughter bit strings will retain the best parameters of both parents. In order to allow for exploration of the parameter space, the daughters may also randomly undergo a "mutation". A mutation is where a random bit in the daughter bit string is switched from 0 to 1 or vice versa. The final stage is to combine the daughter and parent populations and to remove the poorly performing bit strings from the combined population. The final population then should be a combination of the previous population and the newer daughter population. From this new population the process can be started again, and over a number of generations the population should converge on a bit string that gives the best parameter set; one generation of this cycle is sketched below. GAs have found wide use for many problems in chemistry and physics, such as searching conformational space [136,137] and protein folding [138], as well as the assignment of spectral information [139,140].
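One generation of such a bit-string GA might look as follows in Python (a toy fitness and hypothetical settings; a real application would decode the bits into parameter values and evaluate the fitting error):

    import random

    BITS = 16            # bits per string (all parameters concatenated)
    POP, ELITE = 20, 10  # population size and number of surviving parents

    def fitness(bits):
        return sum(bits)                 # toy objective: maximize 1-bits

    def crossover(a, b):
        """Single-point crossover: swap the tails at a random cut."""
        cut = random.randrange(1, BITS)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]

    def mutate(bits, rate=0.05):
        """Flip each bit with a small probability."""
        return [1 - x if random.random() < rate else x for x in bits]

    population = [[random.randint(0, 1) for _ in range(BITS)]
                  for _ in range(POP)]

    for generation in range(50):
        parents = sorted(population, key=fitness, reverse=True)[:ELITE]
        daughters = []
        while len(daughters) < POP:
            a, b = random.sample(parents, 2)
            for child in crossover(a, b):
                daughters.append(mutate(child))
        # Combine parents and daughters, then drop the worst performers.
        population = sorted(parents + daughters, key=fitness,
                            reverse=True)[:POP]

    print(max(fitness(p) for p in population))  # approaches BITS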

GAs have also been used in many cases for the determination of parameters in atomistic potentials. The aim is to find a set of parameters that minimizes a fitness function, where the fitness is given as a sum of weighted squared errors. For instance, the work of Marques et al. aims to simultaneously fit the potential, in their case the extended-Rydberg potential for NaLi and Ar2, to both ab initio energies and to spectral data [141]. Similarly, in the work of Da Cunha et al. [142] 77 parameters have been fitted for the Na + HF PES, and Roncaratti et al. [143] used GAs to find the PESs of H2+ and Li2. Xu and Liu used a GA to fit the EAM model for bulk nickel, where six parameters have been varied [144]. Rather than fitting parameters, a GA can also be used to identify energetically relevant terms from a large number of possible interactions, which has been suggested for the cluster expansion method in the approach taken by Hart et al. [145,146]. The danger in all of these cases is that the GA will yield a single solution and that this solution represents the optimization of the objective function to a local minimum. Therefore there is the need to rerun the fitting procedure from different starting points, i.e., different positions on the PES fitting error surface, in order not to get trapped in local minima; a weighted-squared-error fitness of the kind used in these studies is sketched below.
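The fitness function itself can be sketched as follows (hypothetical names throughout; the Morse curve merely stands in for whichever potential is being fitted):

    import numpy as np

    def fitness(params, model, configs, e_ref, weights):
        """Weighted sum of squared errors between model and reference
        energies; model(params, config) is assumed to return the energy
        of one configuration for the trial parameter set."""
        e_model = np.array([model(params, c) for c in configs])
        return np.sum(weights * (e_model - e_ref) ** 2)

    # Toy example: a Morse curve V(r) = D*(1 - exp(-a*(r - r0)))**2.
    def morse(params, r):
        D, a, r0 = params
        return D * (1.0 - np.exp(-a * (r - r0))) ** 2

    r = np.linspace(0.8, 3.0, 25)
    e_ref = morse((1.0, 1.5, 1.2), r)    # synthetic "reference" energies
    w = np.ones_like(r)                  # uniform weights in this example
    print(fitness((0.9, 1.4, 1.1), morse, r, e_ref, w))  # > 0, to be minimized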

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important for the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for the determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed on the population [134].

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodological extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the evaluation of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve with conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g., polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions, as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore it is only a question of time before ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage it has been demonstrated, for many examples covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays off quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier to their wider use, but again it is only a question of time before more program packages become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets, due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be 3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartók, M.C. Payne, R. Kondor, G. Csányi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartók, M.J. Gillan, F.R. Manby, G. Csányi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Müller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Müller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, J. Chem. Phys. 79, 962 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kühne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN (1992), Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Krömer, V. Snášel, in International Conference on Intelligent Networking and Collaborative Systems (INCoS) (2013), pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)


Fig 4 The two parent Genetic Programming (GP) trees atthe top have been randomly partnered and at the same po-sition on each tree the connection is marked by an lsquotimesrsquo wherethe branches will be exchanged At the end of the branches arethe input nodes I1-I4 which provide information about thesystem At each node on the way up the tree an arithmetic op-eration is performed that combines the values passed up fromthe lower level This process continues until the last node hasbeen reached where the output is determined

arbitrary configurations this approach is still an interest-ing example of machine learning methods applied to en-ergy predictions and an extension to the representationof PESs would be possible In comparison to neural net-works the LS-SVM model required less reference pointsand provided slightly better results Very recently SVMshave been used for the first time to construct a continu-ous PES by Vitek et al [129] who studied water clusterscontaining up to six molecules

27 Genetic programming ndash function searching

Genetic programming (GP) uses populations of solutionsanalyzes them for their success and employs similar to ge-netic algorithms survival of the fittest concepts as well asmating of solutions and random mutations to create newpopulations containing better solutions [130] It is basedupon the idea of a tree structure (cf Fig 4) At the verybase of the tree are the input nodes which feed in infor-mation about the configuration of the atomic system Asone moves up the tree two inputs from a previous levelare combined returning an output which is fed upwardsthe tree to the next node and so forth until in the upper-most node a final arithmetic procedure is performed andreturns the output the energy prediction In this mannercomplex functions can be represented as a combination ofsimpler arithmetic functions

Genetic Programming in the above manner then relieson the randomization of the functions in all members ofthe population New daughter solutions are generated byrandomly performing interchanges between two membersof the best members of the parent population The in-terchange occurs by randomly selecting where in the treestructure for each parent the interchange will occur Thismeans that the tree structure can be very different for thedaughter solutions with the tree structure having been

Page 12 of 16 Eur Phys J B (2014) 87 152

extended or pruned Randomly the daughter solutionsmay also undergo mutations ie a change in a functionused in the tree structure or a change in a parameterMany other types of crossover of tree structures can alsobe used Depending on the choice of functions GPs canhave the advantage that they can provide human read-able and sometimes even physically meaningful functionsin contrast to most methods discussed above

Genetic programming has been applied to simplePESs such as that of a water molecule [131] though morecomplex formulations of GPs have been used by Bellucciand Coker [132] to represent an empirical valence bondpotential formulation for determining the energy changein reactions Here three PESs have to be found one ofthe products one for the reactants and one that relatesthe two surfaces and describes the reaction Bellucci andCoker build on the concept of directed GPs where thefunctional form of the PES is partially defined and theGP search modifies and improves this first potential Forexample the GP searches for the best function that per-forms a non-linear transformation on the inputs beforethey are passed to the main predefined function The aimis to find the best Morse and Gaussian functions that to-gether describe the reactions They go beyond traditionalGP in the use of multiple GPs where a higher level ofGP modifies the probabilities of the search functions thatdirect the lower level GP fitting the PES functions

Brown et al proposed a hierachical approach to GPwhere the GP population is ranked based upon their fit-ness [133] This allows for the training procedure to retainindividuals so that while convergence is achieved it is notat the cost of the diversity of the solutions Maintaining adiverse set of solutions for the PES fitting allows for thegeneration of better approximations to the global mini-mum of the target function rather than just convergingto a local minimum The ranking of solutions is similar tothe concept of multi-objective genetic algorithms and theuse of Pareto ranking [134]

3 Discussion

All the potentials discussed in this review have in commonthat they are employing very general functional formswhich are not based on physical considerations There-fore they are equally applicable to all types of bondingand in this sense they are ldquonon-physicalrdquo ie they arecompletely unbiased In order to describe PESs correctlythe information about the topology of the PESs needs tobe provided in form of usually very large sets of referenceelectronic structure calculations and the presented poten-tials differ in the manner this information is stored Forsome of the potentials it is required that this data baseis still available when the energy of a new configurationis to be predicted This is the case for the MSI IMLSand GPRs making these methods more demanding withgrowing reference data set size On the other hand thesemethods reproduce the energies of the reference structuresin the final potentials error-free Methods like NNPs and

permutation invariant polynomials transform the informa-tion contained in the reference data set into a substantialnumber of parameters and the determination of a suitableset of parameter values can be a demanding task as far ascomputing time is concerned Still in contrast to manyconventional physical potentials there is usually no man-ual trial-and-error component in the fitting process Oftenlike in case of NNPs gradient-based optimization algo-rithms are used but they bear the risk of getting trappedin local minima of the high-dimensional parameter spaceand there is usually no hope to find the global minimumFortunately in most cases sufficiently accurate local min-ima in parameter space can be found which yield reliablepotentials

An interesting alternative approach to find the param-eters which has been used a lot in the context of potentialdevelopment but not for the potentials above is to em-ploy genetic algorithms (GAs) [135] Like neural networksthey represent a class of algorithms inspired by biologyand belong to the group of machine learning techniquesGAs work by representing the parameters determining atarget value as a bit string This bit string is randomizedand used to generate an initial population of bit stringsthat represent a range of values for the parameters Thusthe bit string represents not one but a series of parametervalues where each member of the population has differ-ent values for these parameters but each parameter canonly vary within given thresholds Each member of the ini-tial population is assessed for its performance The bestperforming members of the population are then used togenerate the next ldquodaughter populationrdquo This is done byrandomly selecting where along the bit string two popula-tion members undergo an interchange The result is thatfrom two ldquoparentrdquo bit strings two daughter bit strings areobtained The central idea is that one of the daughter bitstrings will retain the best parameters of both parents Inorder to allow for exploration of the parameter space thedaughters may also randomly undergo a ldquomutationrdquo Amutation is where a random bit in the daughter bit stringis switched from 0 to 1 or vice versa The final stage is tocombine the daughter and parent populations and fromthe combined population remove the poorly performingbit strings The final population then should be a combi-nation of the previous population and the newer daughterpopulation From this new population the process can bestarted again and over a number of generations the pop-ulation should converge on a bit string that gives the bestparameter set GAs have found wide use for many prob-lems in chemistry and physics such as the searching theconformation space [136137] and protein folding [138] aswell as the assignment of spectral information [139140]

GAs have also been used in many cases for the determi-nation of parameters in atomistic potentials The aim is tofind a set of parameters that minimizes a fitness functionwhere the fitness is given as a sum of weighted squarederrors For instance the work of Marques et al aims to si-multaneously fit the potential in their case the extended-Rydberg potential for NaLi and Ar2 to both ab initioenergies and to spectral data [141] Similarly in the work

Eur Phys J B (2014) 87 152 Page 13 of 16

of Da Cunha et al [142] 77 parameters have been fitted forthe Na + HF PES and Roncaratti et al [143] used GAs tofind the PESs of H+

2 and Li2 Xu and Liu use a GA to fitthe EAM model for bulk nickel where six parameters havebeen varied [144] Rather than fitting parameters a GAcan also be used to identify energetically relevant termsfrom a large number of possible interactions which hasbeen suggested for the cluster expansion method in theapproach taken by Hart et al [145146] The danger in allof these cases is that the GA will yield a single solutionand that this solution represents the optimization of theobjective function to a local minimum Therefore thereis the need to rerun the fitting procedure from differentstarting points ie different positions on the PES fittingerror surface in order to not get trapped in local minima

Parameter fitting for force fields has also been investi-gated by Pahari and Chaturvedi [147] Larsson et al [148]and Handley and Deeth [149] In the work of Larssonet al GAs are used to automate the exploration of themultidimensional parameter space of the ReaxFF forcefield where 67 parameters are to be optimized for theprediction of the PES of SiOH Pahari and Chaturvediperform a GA fitting of 51 of the ReaxFF parameterswhich were determined to be important to the descriptionof CH3NO In the work of Handley and Deeth the aim hasbeen to parameterize the Ligand Field Molecular Mechan-ics (LFMM) force field for the simulation of iron aminecomplexes [149] LFMM augments standard force fieldswith terms that allow for determination of the influenceof d-orbital electrons on the PES Here a multi-objectiveGA approach has been implemented where Pareto frontranking is performed upon the population [134]

GAs have not yet been employed to find parametersfor the potentials reviewed above and certainly furthermethodical extensions are required to reduce the costs ofusing GAs for determining thousands of parameters egin case of NNPs The reason for these high costs is that thedetermination of the fitness function requires performinggradient-based optimizations of the parameters for eachmember of the population in each generation since mu-tations or crossovers for the generation of new populationmembers typically results in non-competitive parametersets that need to be optimized to find the closest localminimum

4 Conclusions

A variety of methods has now become available to con-struct general-purpose potentials using bias-free and flexi-ble functional forms enabling the description of all types ofinteractions on an equal footing These developments havebeen driven by the need for highly accurate potentials pro-viding a quality close to first-principles methods which isvery difficult to achieve by conventional atomistic poten-tials based on physical approximations The selection ofthe specific functional forms of these ldquonext-generation po-tentialsrdquo has been guided by the physical problems to bestudied and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-

surface interactions eg polynomial fitting the modifiedShepard interpolation and IMLS Other methods havefocussed on condensed systems and are able to capturehigh-order many-body interactions as needed for the de-scription of metals like NNPs and GPRs The use of fur-ther methods like support vector machines and geneticprogramming for the construction of potentials is still lessevolved but this will certainly change in the near future

All of these potentials have now overcome the con-ceptual problems related to incorporating the rotationaland translational invariance as well as the permutationsymmetry of the energy with respect to the exchange oflike atoms Most of the underlying difficulties rather con-cerned the geometrical description of the atomic config-urations than the fitting process itself A lot of interest-ing solutions have been developed for this purpose whichshould in principle be transferable from one method toanother The current combinations of these quite modu-lar components ie describing the atomic configurationand assigning the energy have been mainly motivated bythe intended applications Therefore it is only a questionof time when ideas will spread and concepts from one ap-proach will be used in other contexts to establish evenmore powerful techniques

At the present stage it has been demonstrated formany examples covering a wide range of systems fromsmall molecules to solids containing thousands of atomsthat a very high accuracy of typically only a few meVper atom with respect to a given reference method canbe reached However due to the non-physical functionalform large reference data sets are required to constructthese potentials making their development computation-ally more demanding than for most conventional poten-tials Still this effort pays back quickly in large-scale appli-cations which are simply impossible otherwise Howeverthe number of applications which clearly go beyond theproof-of-principle level by addressing and solving physicalproblems that cannot be studied by other means is stillmoderate Most of these applications can be related to afew very active research groups which are involved in thedevelopment of these methods and have the required ex-perience This is because presently the complexity of themethods and of the associated fitting machinery is a ma-jor barrier towards their wider use but again it is only aquestion of time when more program packages will becomegenerally available

In summary tremendous progress has been made inthe development of atomistic potentials A very high ac-curacy can now be achieved and it can be anticipatedthat these methods will be further extended and becomeestablished tools in materials science Some challenges re-main like the construction of potentials for systems con-taining a very large number of chemical elements whichare difficult to describe geometrically and rather costly tosample when constructing the reference sets due to thesize of their configuration space On the other hand todate only a tiny fraction of the knowledge about machinelearning methods which is available in the computer sci-ence community has been transferred to the construction

Page 14 of 16 Eur Phys J B (2014) 87 152

of atomistic potentials Consequently many exciting newdevelopments are to be expected in the years to come

Financial support by the DFG (Emmy Noether groupBe32643-1) is gratefully acknowledged

References

1 RG Parr W Yang Density Functional Theory of Atomsand Molecules (Oxford University Press 1989)

2 R Car M Parrinello Phys Rev Lett 55 2471 (1985)3 D Marx J Hutter Ab Initio Molecular Dynamics Basic

Theory and Advanced Methods (Cambridge UniversityPress 2009)

4 AP Sutton MW Finnis DG Pettifor Y Ohta JPhys C 21 35 (1988)

5 CM Goringe DR Bowler E Hernandez Rep ProgPhys 60 1447 (1997)

6 M Elstner Theor Chem Acc 116 316 (2006)7 L Colombo Comput Mater Sci 12 278 (1998)8 T Hammerschmidt R Drautz DG Pettifor Int J

Mater Res 100 1479 (2009)9 N Allinger in Advances in Physical Organic Chemistry

(Academic Press 1976) Vol 13 pp 1ndash8210 BR Brooks RE Bruccoleri BD Olafson DJ States

S Swaminathan M Karplus J Comput Chem 4 187(1983)

11 ACT van Duin S Dasgupta F Lorant WA GoddardIII J Phys Chem A 105 9396 (2001)

12 A Warshel RM Weiss J Am Chem Soc 102 6218(1980)

13 J Tersoff Phys Rev Lett 56 632 (1986)14 J Tersoff Phys Rev B 37 6991 (1988)15 DW Brenner Phys Rev B 42 9458 (1990)16 E Pijper G-J Kroes RA Olsen EJ Baerends J

Chem Phys 117 5885 (2002)17 MS Daw SM Foiles MI Baskes Mater Sci Rep 9

251 (1993)18 MI Baskes Phys Rev Lett 59 2666 (1987)19 JC Slater GF Koster Phys Rev 94 1498 (1954)20 Th Frauenheim G Seifert M Elsterner Z Hajnal G

Jungnickel D Porezag S Suhai R Scholz Phys StatSol B 217 41 (2000)

21 DA Papaconstantopoulos MJ Mehl J PhysCondens Matter 15 R413 (2003)

22 R Hoffman J Chem Phys 39 1397 (1963)23 RJ Deeth Coord Chem Rev 212 11 (2001)24 RJ Deeth A Anastasi C Diedrich K Randell Coord

Chem Rev 253 795 (2009)25 A Brown BJ Braams KM Christoffel Z Jin JM

Bowman J Chem Phys 119 8790 (2003)26 Z Xie JM Bowman J Chem Theory Comput 6 26

(2010)27 BJ Braams JM Bowman Int Rev Phys Chem 28

577 (2009)28 Y Wang BC Shepler BJ Braams JM Bowman J

Chem Phys 131 054511 (2009)29 X Huang BJ Braams JM Bowman J Phys Chem

A 110 445 (2006)

30 X Huang BJ Braams JM Bowman J Chem Phys122 044308 (2005)

31 AR Sharma BJ Braams S Carter BC Shepler JMBowman J Chem Phys 130 174301 (2009)

32 CE Rasmussen CKI Williams Gaussian Processes forMachine Learning (MIT Press 2006)

33 CM Handley GI Hawe DB Kell PLA PopelierPhys Chem Chem Phys 11 6365 (2009)

34 MJL Mills PLA Popelier Comput Theor Chem975 42 (2011)

35 MJL Mills PLA Popelier Theor Chem Acc 131 1(2012)

36 MJL Mills GI Hawe CM Handley PLA PopelierPhys Chem Chem Phys 15 18249 (2013)

37 TJ Hughes SM Kandathil PLA PopelierSpectrochim Acta A Mol Biomol Spectrosc inpress (2013) DOI101016jsaa201310059

38 AP Bartok MC Payne R Kondor G Csanyi PhysRev B 87 184115 (2013)

39 AP Bartok MC Payne R Kondor G Csanyi PhysRev Lett 104 136403 (2010)

40 AP Bartok MJ Gillan FR Manby G Csanyi PhysRev B 88 054104 (2013)

41 C Lee W Yang RG Parr Phys Rev B 37 785 (1998)42 AD Becke Phys Rev A 38 3098 (1998)43 M Rupp A Tkatchenko K-R Muller OA von

Lilienfeld Phys Rev Lett 108 058301 (2012)44 K Hansen G Montavon F Biegler S Fazli M Rupp

M Scheffler OA von Lilienfeld A Tkatchenko KRMuller J Chem Theory Comput 9 3404 (2013)

45 M Rupp MR Bauer R Wilcken A Lange MReutlinger FM Boeckler G Schneider PLoS ComputBiol 10 e1003400 (2014)

46 P Lancaster K Salkauskas Curve and Surface FittingAn Introduction (Academic Press 1986)

47 J Ischtwan MA Collins J Chem Phys 100 8080(1994)

48 MJT Jordan KC Thompson MA Collins J ChemPhys 103 9669 (1995)

49 MJT Jordan MA Collins J Chem Phys 104 4600(1996)

50 T Wu H-J Werner U Manthe Science 306 2227(2004)

51 T Wu H-J Werner U Manthe J Chem Phys 124164307 (2006)

52 T Takata T Taketsugu K Hiaro MS Gordon JChem Phys 109 4281 (1998)

53 CR Evenhuis U Manthe J Chem Phys 129 024104(2008)

54 C Crespos MA Collins E Pijper GJ Kroes ChemPhys Lett 376 566 (2003)

55 C Crespos MA Collins E Pijper GJ Kroes J ChemPhys 120 2392 (2004)

56 DH McLain Comput J 17 318 (1974)57 T Ishida GC Schatz Chem Phys Lett 314 369

(1999)58 R Dawes DL Thompson Y Guo AF Wagner M

Minkoff J Chem Phys 126 184108 (2007)59 R Dawes DL Thompson AF Wagner M Minkoff J

Chem Phys 128 084107 (2008)60 GG Maisuradze DL Thompson AF Wagner M

Minkoff J Chem Phys 119 10002 (2003)

Eur Phys J B (2014) 87 152 Page 15 of 16

61 Y Guo A Kawano DL Thompson AF Wagner MMinkoff J Chem Phys 121 5091 (2004)

62 R Dawes AF Wagner DL Thompson J Phys ChemA 113 4709 (2009)

63 JP Camden R Dawes DL Thompson J Phys ChemA 113 4626 (2009)

64 R Dawes X-G Wang T Carrington J Phys ChemA 117 7612 (2013)

65 R Dawes X-G Wang AW Jasper T Carrington JChem Phys 133 134304 (2010)

66 J Brown X-G Wang R Dawes T Carrington JChem Phys 136 134306 (2012)

67 G Li J Hu S-W Wang PG Georgopoulos JSchoendorf H Rabitz J Phys Chem A 110 2474(2006)

68 A Kawano IV Tokmakov DL Thompson AFWagner M Minkoff J Chem Phys 124 054105 (2006)

69 CM Bishop Neural Networks for Pattern Recognition(Oxford University Press 1996)

70 S Haykin Neural Networks and Learning Machines(Pearson Education 1986)

71 W McCulloch W Pitts Bull Math Biophys 5 115(1943)

72 J Gasteiger J Zupan Angew Chem Int Ed 32 503(1993)

73 TB Blank SD Brown AW Calhoun DJ Doren JChem Phys 103 4129 (1995)

74 E Tafeit W Estelberger R Horejsi R Moeller KOettl K Vrecko G Reibnegger J Mol Graphics 1412 (1996)

75 FV Prudente PH Acioli JJ Soares Neto J ChemPhys 109 8801 (1998)

76 H Gassner M Probst A Lauenstein K HermanssonJ Phys Chem A 102 4596 (1998)

77 S Lorenz A Groszlig M Scheffler Chem Phys Lett 395210 (2004)

78 J Behler S Lorenz K Reuter J Chem Phys 127014705 (2007)

79 J Ludwig DG Vlachos J Chem Phys 127 154716(2007)

80 CM Handley PLA Popelier J Phys Chem A 1143371 (2010)

81 J Behler Phys Chem Chem Phys 13 17930 (2011)82 J Behler J Phys Condens Matter 26 183001 (2014)83 S Haykin Neural Networks A Comprehensive

Foundation (Macmillan College Publishing Company1994)

84 G Cybenko Math Control Signals Systems 2 303 (1989)85 K Hornik M Stinchcombe H White Neural Netw 2

359 (1989)86 DE Rumelhart GE Hinton RJ Williams Nature

323 533 (1986)87 TB Blank SD Brown J Chemometrics 8 391 (1994)88 A Pukrittayakamee M Malshe M Hagan LM Raff

R Narulkar S Bukkapatnum R Komanduri J ChemPhys 130 134101 (2009)

89 N Artrith J Behler Phys Rev B 85 045439 (2012)90 K-H Cho K-H No HA Scheraga J Mol Struct 641

77 (2002)91 WL Jorgensen J Chandrasekhar JD Madura RW

Impey J Chem Phys 79 962 (1983)92 S Manzhos T Carrington J Chem Phys 125 84109

(2006)

93 S Manzhos T Carrington J Chem Phys 125 194105(2006)

94 S Manzhos T Carrington J Chem Phys 127 014103(2007)

95 S Manzhos T Carrington J Chem Phys 129 224104(2008)

96 M Malshe R Narulkar LM Raff M Hagan SBukkapatnam PM Agrawal R Komanduri J ChemPhys 130 184101 (2009)

97 S Hobday R Smith J BelBruno Modelling SimulMater Sci Eng 7 397 (1999)

98 S Hobday R Smith J BelBruno Nucl InstrumMethods Phys Res B 153 247 (1999)

99 A Bholoa SD Kenny R Smith Nucl InstrumMethods Phys Res B 255 1 (2007)

100 E Sanville A Bholoa R Smith SD Kenny J PhysCondens Matter 20 285219 (2008)

101 J Behler M Parrinello Phys Rev Lett 98 146401(2007)

102 J Behler J Chem Phys 134 074106 (2011)103 N Artrith T Morawietz J Behler Phys Rev B 83

153101 (2011)104 T Morawietz V Sharma J Behler J Chem Phys 136

064103 (2012)105 J Behler R Martonak D Donadio M Parrinello Phys

Rev Lett 100 185501 (2008)106 RZ Khaliullin H Eshet TD Kuhne J Behler M

Parrinello Phys Rev B 81 100103 (2010)107 H Eshet RZ Khaliullin TD Kuhne J Behler M

Parrinello Phys Rev B 81 184107 (2010)108 N Artrith J Behler Phys Rev B 85 045439 (2012)109 GC Sosso G Miceli S Caravati J Behler M

Bernasconi Phys Rev B 85 174103 (2012)110 N Artrith B Hiller J Behler Phys Stat Sol B 250

1191 (2013)111 T Morawietz J Behler J Phys Chem A 117 7356

(2013)112 T Morawietz J Behler Z Phys Chem 227 1559 (2013)113 S Houlding SY Liem PLA Popelier Int J Quant

Chem 107 2817 (2007)114 MG Darley CM Handley PLA Popelier J Chem

Theor Comput 4 1435 (2008)115 CM Handley PLA Popelier J Chem Theor Chem

5 1474 (2009)116 PLA Popelier Atoms in Molecules An Introduction

(Pearson Education 2000)117 PLA Popelier FM Aicken ChemPhysChem 4 824

(2003)118 PLA Popelier AG Bremond Int J Quant Chem

109 2542 (2009)119 J Li B Jiang H Guo J Chem Phys 139 204103

(2013)120 R Fournier O Slava J Chem Phys 139 234110 (2013)121 P Geiger C Dellago J Chem Phys 139 164105 (2013)122 PJ Haley D Soloway in International Joint Conference

on Neural Networks IJCNN 1992 Vol 4 pp 25ndash30123 AG Wilson E Gilboa A Nehorai JP Cunningham

arXiv13105288v3 (2013)124 VN Vapnik The Nature of Statistical Learning Theory

(Springer-Verlag 1995)125 C Cortes V Vapnik Mach Learn 20 273 (1995)126 ZR Yang Brief Bioinform 5 328 (2004)

Page 16 of 16 Eur Phys J B (2014) 87 152

127 RM Balabin EI Lomakina Phys Chem Chem Phys13 11710 (2011)

128 W Chu SS Keerthi CJ Ong IEEE Trans NeuralNetworks 15 29 (2004)

129 A Vitek M Stachon P Kromer V Snael inInternational Conference on Intelligent Networking andCollaborative Systems (INCoS) 2013 pp 121ndash126

130 JR Koza Genetic Programming On the Programing ofComputers by Means of Natural Selection (MIT Press1992)

131 DE Makarov H Metiu J Chem Phys 108 590 (1998)132 MA Bellucci DF Coker J Chem Phys 135 044115

(2011)133 MW Brown AP Thompson PA Schultz J Chem

Phys 132 024108 (2010)134 E Zitzler L Thiele IEEE Trans Evol Comput 3 257

(1999)135 DE Goldberg Genetic Algorithms in Search

Optimization and Machine Learning (Addison-Wesley1998)

136 B Hartke Struct Bond 110 33 (2004)137 AR Oganov CW Glass J Chem Phys 124 244704

(2006)138 F Koskowski B Hartke J Comput Chem 26 1169

(2005)

139 W Leo Meerts M Schmitt Int Rev Phys Chem 25353 (2006)

140 MH Hennessy AM Kelley Phys Chem Chem Phys6 1085 (2004)

141 JMC Marques FV Prudente FB Pereira MMAlmeida MM Maniero CE Fellows J Phys B 41085103 (2008)

142 WF Da Cunha LF Roncaratti R Gargano GMESilva Int J Quant Chem 106 2650 (2006)

143 LF Roncaratti R Gargano GME Silva J MolStruct (Theochem) 769 47 (2006)

144 YG Xu GR Liu J Micromech Microeng 13 254(2003)

145 GLW Hart V Blum MJ Walorski A Zunger NatMater 4 391 (2005)

146 V Blum GLW Hart MJ Walorski A Zunger PhysRev B 72 165113 (2005)

147 P Pahari S Chaturvedi J Mol Model 18 1049 (2012)148 HR Larsson ACT van Duin BJ Hartke Comput

Chem 34 2178 (2013)149 CM Handley RJ Deeth J Chem Theor Comput 8

194 (2012)

  • Introduction
  • General-purpose atomistic potentials
  • Discussion
  • Conclusions
  • References
Page 12: Next generation interatomic potentials for condensed systems · Next generation interatomic potentials for condensed systems ... all imaginable systems and applications. The Tersoff

Page 12 of 16 Eur Phys J B (2014) 87 152

extended or pruned Randomly the daughter solutionsmay also undergo mutations ie a change in a functionused in the tree structure or a change in a parameterMany other types of crossover of tree structures can alsobe used Depending on the choice of functions GPs canhave the advantage that they can provide human read-able and sometimes even physically meaningful functionsin contrast to most methods discussed above

Genetic programming has been applied to simplePESs such as that of a water molecule [131] though morecomplex formulations of GPs have been used by Bellucciand Coker [132] to represent an empirical valence bondpotential formulation for determining the energy changein reactions Here three PESs have to be found one ofthe products one for the reactants and one that relatesthe two surfaces and describes the reaction Bellucci andCoker build on the concept of directed GPs where thefunctional form of the PES is partially defined and theGP search modifies and improves this first potential Forexample the GP searches for the best function that per-forms a non-linear transformation on the inputs beforethey are passed to the main predefined function The aimis to find the best Morse and Gaussian functions that to-gether describe the reactions They go beyond traditionalGP in the use of multiple GPs where a higher level ofGP modifies the probabilities of the search functions thatdirect the lower level GP fitting the PES functions

Brown et al proposed a hierachical approach to GPwhere the GP population is ranked based upon their fit-ness [133] This allows for the training procedure to retainindividuals so that while convergence is achieved it is notat the cost of the diversity of the solutions Maintaining adiverse set of solutions for the PES fitting allows for thegeneration of better approximations to the global mini-mum of the target function rather than just convergingto a local minimum The ranking of solutions is similar tothe concept of multi-objective genetic algorithms and theuse of Pareto ranking [134]

3 Discussion

All the potentials discussed in this review have in commonthat they are employing very general functional formswhich are not based on physical considerations There-fore they are equally applicable to all types of bondingand in this sense they are ldquonon-physicalrdquo ie they arecompletely unbiased In order to describe PESs correctlythe information about the topology of the PESs needs tobe provided in form of usually very large sets of referenceelectronic structure calculations and the presented poten-tials differ in the manner this information is stored Forsome of the potentials it is required that this data baseis still available when the energy of a new configurationis to be predicted This is the case for the MSI IMLSand GPRs making these methods more demanding withgrowing reference data set size On the other hand thesemethods reproduce the energies of the reference structuresin the final potentials error-free Methods like NNPs and

permutation invariant polynomials transform the informa-tion contained in the reference data set into a substantialnumber of parameters and the determination of a suitableset of parameter values can be a demanding task as far ascomputing time is concerned Still in contrast to manyconventional physical potentials there is usually no man-ual trial-and-error component in the fitting process Oftenlike in case of NNPs gradient-based optimization algo-rithms are used but they bear the risk of getting trappedin local minima of the high-dimensional parameter spaceand there is usually no hope to find the global minimumFortunately in most cases sufficiently accurate local min-ima in parameter space can be found which yield reliablepotentials

An interesting alternative approach to find the param-eters which has been used a lot in the context of potentialdevelopment but not for the potentials above is to em-ploy genetic algorithms (GAs) [135] Like neural networksthey represent a class of algorithms inspired by biologyand belong to the group of machine learning techniquesGAs work by representing the parameters determining atarget value as a bit string This bit string is randomizedand used to generate an initial population of bit stringsthat represent a range of values for the parameters Thusthe bit string represents not one but a series of parametervalues where each member of the population has differ-ent values for these parameters but each parameter canonly vary within given thresholds Each member of the ini-tial population is assessed for its performance The bestperforming members of the population are then used togenerate the next ldquodaughter populationrdquo This is done byrandomly selecting where along the bit string two popula-tion members undergo an interchange The result is thatfrom two ldquoparentrdquo bit strings two daughter bit strings areobtained The central idea is that one of the daughter bitstrings will retain the best parameters of both parents Inorder to allow for exploration of the parameter space thedaughters may also randomly undergo a ldquomutationrdquo Amutation is where a random bit in the daughter bit stringis switched from 0 to 1 or vice versa The final stage is tocombine the daughter and parent populations and fromthe combined population remove the poorly performingbit strings The final population then should be a combi-nation of the previous population and the newer daughterpopulation From this new population the process can bestarted again and over a number of generations the pop-ulation should converge on a bit string that gives the bestparameter set GAs have found wide use for many prob-lems in chemistry and physics such as the searching theconformation space [136137] and protein folding [138] aswell as the assignment of spectral information [139140]

GAs have also been used in many cases for the determi-nation of parameters in atomistic potentials The aim is tofind a set of parameters that minimizes a fitness functionwhere the fitness is given as a sum of weighted squarederrors For instance the work of Marques et al aims to si-multaneously fit the potential in their case the extended-Rydberg potential for NaLi and Ar2 to both ab initioenergies and to spectral data [141] Similarly in the work

Eur Phys J B (2014) 87 152 Page 13 of 16

of Da Cunha et al [142] 77 parameters have been fitted forthe Na + HF PES and Roncaratti et al [143] used GAs tofind the PESs of H+

2 and Li2 Xu and Liu use a GA to fitthe EAM model for bulk nickel where six parameters havebeen varied [144] Rather than fitting parameters a GAcan also be used to identify energetically relevant termsfrom a large number of possible interactions which hasbeen suggested for the cluster expansion method in theapproach taken by Hart et al [145146] The danger in allof these cases is that the GA will yield a single solutionand that this solution represents the optimization of theobjective function to a local minimum Therefore thereis the need to rerun the fitting procedure from differentstarting points ie different positions on the PES fittingerror surface in order to not get trapped in local minima

Parameter fitting for force fields has also been investigated by Pahari and Chaturvedi [147], Larsson et al. [148], and Handley and Deeth [149]. In the work of Larsson et al., GAs are used to automate the exploration of the multidimensional parameter space of the ReaxFF force field, where 67 parameters are to be optimized for the prediction of the PES of SiOH. Pahari and Chaturvedi perform a GA fitting of 51 of the ReaxFF parameters, which were determined to be important to the description of CH3NO. In the work of Handley and Deeth, the aim has been to parameterize the Ligand Field Molecular Mechanics (LFMM) force field for the simulation of iron amine complexes [149]. LFMM augments standard force fields with terms that allow for determination of the influence of d-orbital electrons on the PES. Here a multi-objective GA approach has been implemented, where Pareto front ranking is performed upon the population [134].
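The core of Pareto front ranking is the dominance test between population members. The sketch below extracts the first (non-dominated) front for a population scored on two objectives; full ranking schemes such as that of Zitzler and Thiele [134] build on repeated applications of this idea. The error values are invented for illustration.

```python
import numpy as np

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly
    # better in at least one (all objectives are minimized here).
    return np.all(a <= b) and np.any(a < b)

def pareto_front(objectives):
    # Return indices of the non-dominated members (rank-1 Pareto front).
    n = len(objectives)
    return [i for i in range(n)
            if not any(dominates(objectives[j], objectives[i])
                       for j in range(n) if j != i)]

# Each row: (error vs. ab initio energies, error vs. experimental data)
# for one population member; values made up for illustration.
errs = np.array([[0.2, 0.9], [0.4, 0.4], [0.9, 0.1], [0.5, 0.8]])
print(pareto_front(errs))  # -> [0, 1, 2]; member 3 is dominated by member 1
```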

GAs have not yet been employed to find parameters for the potentials reviewed above, and certainly further methodical extensions are required to reduce the costs of using GAs for determining thousands of parameters, e.g., in the case of NNPs. The reason for these high costs is that the determination of the fitness function requires performing gradient-based optimizations of the parameters for each member of the population in each generation, since mutations or crossovers for the generation of new population members typically result in non-competitive parameter sets that need to be optimized to find the closest local minimum.
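Schematically, the hybrid ("memetic") scheme implied here wraps a gradient-based relaxation inside every fitness evaluation, so the total effort scales as the number of generations times the population size times the cost of one local optimization. The sketch below uses a toy cost function; nothing in it corresponds to an actual NNP fit.

```python
import numpy as np
from scipy.optimize import minimize

def memetic_fitness(member, cost):
    # Relax the GA member to its nearest local minimum before scoring it;
    # this inner gradient-based loop is what makes the scheme expensive.
    res = minimize(cost, member, method="BFGS")
    return res.fun, res.x

def cost(p):
    # Toy cost function with many local minima (illustration only).
    return float(np.sum(p ** 2 + np.sin(5.0 * p)))

score, relaxed = memetic_fitness(np.array([0.8, -0.3]), cost)
print(score, relaxed)
# Total effort ~ N_generations * N_population * cost(one local optimization).
```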

4 Conclusions

A variety of methods has now become available to construct general-purpose potentials using bias-free and flexible functional forms, enabling the description of all types of interactions on an equal footing. These developments have been driven by the need for highly accurate potentials providing a quality close to first-principles methods, which is very difficult to achieve by conventional atomistic potentials based on physical approximations. The selection of the specific functional forms of these "next-generation potentials" has been guided by the physical problems to be studied, and there are methods aiming primarily at low-dimensional systems like small molecules and molecule-surface interactions, e.g., polynomial fitting, the modified Shepard interpolation, and IMLS. Other methods have focussed on condensed systems and are able to capture high-order many-body interactions as needed for the description of metals, like NNPs and GPRs. The use of further methods like support vector machines and genetic programming for the construction of potentials is still less evolved, but this will certainly change in the near future.

All of these potentials have now overcome the conceptual problems related to incorporating the rotational and translational invariance as well as the permutation symmetry of the energy with respect to the exchange of like atoms. Most of the underlying difficulties concerned the geometrical description of the atomic configurations rather than the fitting process itself. A lot of interesting solutions have been developed for this purpose, which should in principle be transferable from one method to another. The current combinations of these quite modular components, i.e., describing the atomic configuration and assigning the energy, have been mainly motivated by the intended applications. Therefore, it is only a question of time until ideas will spread and concepts from one approach will be used in other contexts to establish even more powerful techniques.

At the present stage, it has been demonstrated for many examples, covering a wide range of systems from small molecules to solids containing thousands of atoms, that a very high accuracy of typically only a few meV per atom with respect to a given reference method can be reached. However, due to the non-physical functional form, large reference data sets are required to construct these potentials, making their development computationally more demanding than for most conventional potentials. Still, this effort pays back quickly in large-scale applications, which are simply impossible otherwise. However, the number of applications which clearly go beyond the proof-of-principle level, by addressing and solving physical problems that cannot be studied by other means, is still moderate. Most of these applications can be related to a few very active research groups which are involved in the development of these methods and have the required experience. This is because presently the complexity of the methods and of the associated fitting machinery is a major barrier towards their wider use, but again it is only a question of time until more program packages will become generally available.

In summary, tremendous progress has been made in the development of atomistic potentials. A very high accuracy can now be achieved, and it can be anticipated that these methods will be further extended and become established tools in materials science. Some challenges remain, like the construction of potentials for systems containing a very large number of chemical elements, which are difficult to describe geometrically and rather costly to sample when constructing the reference sets due to the size of their configuration space. On the other hand, to date only a tiny fraction of the knowledge about machine learning methods which is available in the computer science community has been transferred to the construction of atomistic potentials. Consequently, many exciting new developments are to be expected in the years to come.

Financial support by the DFG (Emmy Noether group Be3264/3-1) is gratefully acknowledged.

References

1. R.G. Parr, W. Yang, Density Functional Theory of Atoms and Molecules (Oxford University Press, 1989)
2. R. Car, M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985)
3. D. Marx, J. Hutter, Ab Initio Molecular Dynamics: Basic Theory and Advanced Methods (Cambridge University Press, 2009)
4. A.P. Sutton, M.W. Finnis, D.G. Pettifor, Y. Ohta, J. Phys. C 21, 35 (1988)
5. C.M. Goringe, D.R. Bowler, E. Hernandez, Rep. Prog. Phys. 60, 1447 (1997)
6. M. Elstner, Theor. Chem. Acc. 116, 316 (2006)
7. L. Colombo, Comput. Mater. Sci. 12, 278 (1998)
8. T. Hammerschmidt, R. Drautz, D.G. Pettifor, Int. J. Mater. Res. 100, 1479 (2009)
9. N. Allinger, in Advances in Physical Organic Chemistry (Academic Press, 1976), Vol. 13, pp. 1–82
10. B.R. Brooks, R.E. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, J. Comput. Chem. 4, 187 (1983)
11. A.C.T. van Duin, S. Dasgupta, F. Lorant, W.A. Goddard III, J. Phys. Chem. A 105, 9396 (2001)
12. A. Warshel, R.M. Weiss, J. Am. Chem. Soc. 102, 6218 (1980)
13. J. Tersoff, Phys. Rev. Lett. 56, 632 (1986)
14. J. Tersoff, Phys. Rev. B 37, 6991 (1988)
15. D.W. Brenner, Phys. Rev. B 42, 9458 (1990)
16. E. Pijper, G.-J. Kroes, R.A. Olsen, E.J. Baerends, J. Chem. Phys. 117, 5885 (2002)
17. M.S. Daw, S.M. Foiles, M.I. Baskes, Mater. Sci. Rep. 9, 251 (1993)
18. M.I. Baskes, Phys. Rev. Lett. 59, 2666 (1987)
19. J.C. Slater, G.F. Koster, Phys. Rev. 94, 1498 (1954)
20. Th. Frauenheim, G. Seifert, M. Elstner, Z. Hajnal, G. Jungnickel, D. Porezag, S. Suhai, R. Scholz, Phys. Stat. Sol. B 217, 41 (2000)
21. D.A. Papaconstantopoulos, M.J. Mehl, J. Phys.: Condens. Matter 15, R413 (2003)
22. R. Hoffmann, J. Chem. Phys. 39, 1397 (1963)
23. R.J. Deeth, Coord. Chem. Rev. 212, 11 (2001)
24. R.J. Deeth, A. Anastasi, C. Diedrich, K. Randell, Coord. Chem. Rev. 253, 795 (2009)
25. A. Brown, B.J. Braams, K.M. Christoffel, Z. Jin, J.M. Bowman, J. Chem. Phys. 119, 8790 (2003)
26. Z. Xie, J.M. Bowman, J. Chem. Theory Comput. 6, 26 (2010)
27. B.J. Braams, J.M. Bowman, Int. Rev. Phys. Chem. 28, 577 (2009)
28. Y. Wang, B.C. Shepler, B.J. Braams, J.M. Bowman, J. Chem. Phys. 131, 054511 (2009)
29. X. Huang, B.J. Braams, J.M. Bowman, J. Phys. Chem. A 110, 445 (2006)
30. X. Huang, B.J. Braams, J.M. Bowman, J. Chem. Phys. 122, 044308 (2005)
31. A.R. Sharma, B.J. Braams, S. Carter, B.C. Shepler, J.M. Bowman, J. Chem. Phys. 130, 174301 (2009)
32. C.E. Rasmussen, C.K.I. Williams, Gaussian Processes for Machine Learning (MIT Press, 2006)
33. C.M. Handley, G.I. Hawe, D.B. Kell, P.L.A. Popelier, Phys. Chem. Chem. Phys. 11, 6365 (2009)
34. M.J.L. Mills, P.L.A. Popelier, Comput. Theor. Chem. 975, 42 (2011)
35. M.J.L. Mills, P.L.A. Popelier, Theor. Chem. Acc. 131, 1 (2012)
36. M.J.L. Mills, G.I. Hawe, C.M. Handley, P.L.A. Popelier, Phys. Chem. Chem. Phys. 15, 18249 (2013)
37. T.J. Hughes, S.M. Kandathil, P.L.A. Popelier, Spectrochim. Acta A: Mol. Biomol. Spectrosc., in press (2013), DOI: 10.1016/j.saa.2013.10.059
38. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. B 87, 184115 (2013)
39. A.P. Bartok, M.C. Payne, R. Kondor, G. Csanyi, Phys. Rev. Lett. 104, 136403 (2010)
40. A.P. Bartok, M.J. Gillan, F.R. Manby, G. Csanyi, Phys. Rev. B 88, 054104 (2013)
41. C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37, 785 (1988)
42. A.D. Becke, Phys. Rev. A 38, 3098 (1988)
43. M. Rupp, A. Tkatchenko, K.-R. Muller, O.A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012)
44. K. Hansen, G. Montavon, F. Biegler, S. Fazli, M. Rupp, M. Scheffler, O.A. von Lilienfeld, A. Tkatchenko, K.-R. Muller, J. Chem. Theory Comput. 9, 3404 (2013)
45. M. Rupp, M.R. Bauer, R. Wilcken, A. Lange, M. Reutlinger, F.M. Boeckler, G. Schneider, PLoS Comput. Biol. 10, e1003400 (2014)
46. P. Lancaster, K. Salkauskas, Curve and Surface Fitting: An Introduction (Academic Press, 1986)
47. J. Ischtwan, M.A. Collins, J. Chem. Phys. 100, 8080 (1994)
48. M.J.T. Jordan, K.C. Thompson, M.A. Collins, J. Chem. Phys. 103, 9669 (1995)
49. M.J.T. Jordan, M.A. Collins, J. Chem. Phys. 104, 4600 (1996)
50. T. Wu, H.-J. Werner, U. Manthe, Science 306, 2227 (2004)
51. T. Wu, H.-J. Werner, U. Manthe, J. Chem. Phys. 124, 164307 (2006)
52. T. Takata, T. Taketsugu, K. Hirao, M.S. Gordon, J. Chem. Phys. 109, 4281 (1998)
53. C.R. Evenhuis, U. Manthe, J. Chem. Phys. 129, 024104 (2008)
54. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, Chem. Phys. Lett. 376, 566 (2003)
55. C. Crespos, M.A. Collins, E. Pijper, G.J. Kroes, J. Chem. Phys. 120, 2392 (2004)
56. D.H. McLain, Comput. J. 17, 318 (1974)
57. T. Ishida, G.C. Schatz, Chem. Phys. Lett. 314, 369 (1999)
58. R. Dawes, D.L. Thompson, Y. Guo, A.F. Wagner, M. Minkoff, J. Chem. Phys. 126, 184108 (2007)
59. R. Dawes, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 128, 084107 (2008)
60. G.G. Maisuradze, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 119, 10002 (2003)
61. Y. Guo, A. Kawano, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 121, 5091 (2004)
62. R. Dawes, A.F. Wagner, D.L. Thompson, J. Phys. Chem. A 113, 4709 (2009)
63. J.P. Camden, R. Dawes, D.L. Thompson, J. Phys. Chem. A 113, 4626 (2009)
64. R. Dawes, X.-G. Wang, T. Carrington, J. Phys. Chem. A 117, 7612 (2013)
65. R. Dawes, X.-G. Wang, A.W. Jasper, T. Carrington, J. Chem. Phys. 133, 134304 (2010)
66. J. Brown, X.-G. Wang, R. Dawes, T. Carrington, J. Chem. Phys. 136, 134306 (2012)
67. G. Li, J. Hu, S.-W. Wang, P.G. Georgopoulos, J. Schoendorf, H. Rabitz, J. Phys. Chem. A 110, 2474 (2006)
68. A. Kawano, I.V. Tokmakov, D.L. Thompson, A.F. Wagner, M. Minkoff, J. Chem. Phys. 124, 054105 (2006)
69. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, 1996)
70. S. Haykin, Neural Networks and Learning Machines (Pearson Education, 2009)
71. W. McCulloch, W. Pitts, Bull. Math. Biophys. 5, 115 (1943)
72. J. Gasteiger, J. Zupan, Angew. Chem. Int. Ed. 32, 503 (1993)
73. T.B. Blank, S.D. Brown, A.W. Calhoun, D.J. Doren, J. Chem. Phys. 103, 4129 (1995)
74. E. Tafeit, W. Estelberger, R. Horejsi, R. Moeller, K. Oettl, K. Vrecko, G. Reibnegger, J. Mol. Graphics 14, 12 (1996)
75. F.V. Prudente, P.H. Acioli, J.J. Soares Neto, J. Chem. Phys. 109, 8801 (1998)
76. H. Gassner, M. Probst, A. Lauenstein, K. Hermansson, J. Phys. Chem. A 102, 4596 (1998)
77. S. Lorenz, A. Groß, M. Scheffler, Chem. Phys. Lett. 395, 210 (2004)
78. J. Behler, S. Lorenz, K. Reuter, J. Chem. Phys. 127, 014705 (2007)
79. J. Ludwig, D.G. Vlachos, J. Chem. Phys. 127, 154716 (2007)
80. C.M. Handley, P.L.A. Popelier, J. Phys. Chem. A 114, 3371 (2010)
81. J. Behler, Phys. Chem. Chem. Phys. 13, 17930 (2011)
82. J. Behler, J. Phys.: Condens. Matter 26, 183001 (2014)
83. S. Haykin, Neural Networks: A Comprehensive Foundation (Macmillan College Publishing Company, 1994)
84. G. Cybenko, Math. Control Signals Systems 2, 303 (1989)
85. K. Hornik, M. Stinchcombe, H. White, Neural Netw. 2, 359 (1989)
86. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Nature 323, 533 (1986)
87. T.B. Blank, S.D. Brown, J. Chemometrics 8, 391 (1994)
88. A. Pukrittayakamee, M. Malshe, M. Hagan, L.M. Raff, R. Narulkar, S. Bukkapatnum, R. Komanduri, J. Chem. Phys. 130, 134101 (2009)
89. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
90. K.-H. Cho, K.-H. No, H.A. Scheraga, J. Mol. Struct. 641, 77 (2002)
91. W.L. Jorgensen, J. Chandrasekhar, J.D. Madura, R.W. Impey, M.L. Klein, J. Chem. Phys. 79, 926 (1983)
92. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 084109 (2006)
93. S. Manzhos, T. Carrington, J. Chem. Phys. 125, 194105 (2006)
94. S. Manzhos, T. Carrington, J. Chem. Phys. 127, 014103 (2007)
95. S. Manzhos, T. Carrington, J. Chem. Phys. 129, 224104 (2008)
96. M. Malshe, R. Narulkar, L.M. Raff, M. Hagan, S. Bukkapatnam, P.M. Agrawal, R. Komanduri, J. Chem. Phys. 130, 184101 (2009)
97. S. Hobday, R. Smith, J. BelBruno, Modelling Simul. Mater. Sci. Eng. 7, 397 (1999)
98. S. Hobday, R. Smith, J. BelBruno, Nucl. Instrum. Methods Phys. Res. B 153, 247 (1999)
99. A. Bholoa, S.D. Kenny, R. Smith, Nucl. Instrum. Methods Phys. Res. B 255, 1 (2007)
100. E. Sanville, A. Bholoa, R. Smith, S.D. Kenny, J. Phys.: Condens. Matter 20, 285219 (2008)
101. J. Behler, M. Parrinello, Phys. Rev. Lett. 98, 146401 (2007)
102. J. Behler, J. Chem. Phys. 134, 074106 (2011)
103. N. Artrith, T. Morawietz, J. Behler, Phys. Rev. B 83, 153101 (2011)
104. T. Morawietz, V. Sharma, J. Behler, J. Chem. Phys. 136, 064103 (2012)
105. J. Behler, R. Martonak, D. Donadio, M. Parrinello, Phys. Rev. Lett. 100, 185501 (2008)
106. R.Z. Khaliullin, H. Eshet, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 100103 (2010)
107. H. Eshet, R.Z. Khaliullin, T.D. Kuhne, J. Behler, M. Parrinello, Phys. Rev. B 81, 184107 (2010)
108. N. Artrith, J. Behler, Phys. Rev. B 85, 045439 (2012)
109. G.C. Sosso, G. Miceli, S. Caravati, J. Behler, M. Bernasconi, Phys. Rev. B 85, 174103 (2012)
110. N. Artrith, B. Hiller, J. Behler, Phys. Stat. Sol. B 250, 1191 (2013)
111. T. Morawietz, J. Behler, J. Phys. Chem. A 117, 7356 (2013)
112. T. Morawietz, J. Behler, Z. Phys. Chem. 227, 1559 (2013)
113. S. Houlding, S.Y. Liem, P.L.A. Popelier, Int. J. Quant. Chem. 107, 2817 (2007)
114. M.G. Darley, C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 4, 1435 (2008)
115. C.M. Handley, P.L.A. Popelier, J. Chem. Theory Comput. 5, 1474 (2009)
116. P.L.A. Popelier, Atoms in Molecules: An Introduction (Pearson Education, 2000)
117. P.L.A. Popelier, F.M. Aicken, ChemPhysChem 4, 824 (2003)
118. P.L.A. Popelier, A.G. Bremond, Int. J. Quant. Chem. 109, 2542 (2009)
119. J. Li, B. Jiang, H. Guo, J. Chem. Phys. 139, 204103 (2013)
120. R. Fournier, O. Slava, J. Chem. Phys. 139, 234110 (2013)
121. P. Geiger, C. Dellago, J. Chem. Phys. 139, 164105 (2013)
122. P.J. Haley, D. Soloway, in International Joint Conference on Neural Networks, IJCNN 1992, Vol. 4, pp. 25–30
123. A.G. Wilson, E. Gilboa, A. Nehorai, J.P. Cunningham, arXiv:1310.5288v3 (2013)
124. V.N. Vapnik, The Nature of Statistical Learning Theory (Springer-Verlag, 1995)
125. C. Cortes, V. Vapnik, Mach. Learn. 20, 273 (1995)
126. Z.R. Yang, Brief. Bioinform. 5, 328 (2004)
127. R.M. Balabin, E.I. Lomakina, Phys. Chem. Chem. Phys. 13, 11710 (2011)
128. W. Chu, S.S. Keerthi, C.J. Ong, IEEE Trans. Neural Networks 15, 29 (2004)
129. A. Vitek, M. Stachon, P. Kromer, V. Snael, in International Conference on Intelligent Networking and Collaborative Systems (INCoS), 2013, pp. 121–126
130. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992)
131. D.E. Makarov, H. Metiu, J. Chem. Phys. 108, 590 (1998)
132. M.A. Bellucci, D.F. Coker, J. Chem. Phys. 135, 044115 (2011)
133. M.W. Brown, A.P. Thompson, P.A. Schultz, J. Chem. Phys. 132, 024108 (2010)
134. E. Zitzler, L. Thiele, IEEE Trans. Evol. Comput. 3, 257 (1999)
135. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, 1989)
136. B. Hartke, Struct. Bond. 110, 33 (2004)
137. A.R. Oganov, C.W. Glass, J. Chem. Phys. 124, 244704 (2006)
138. F. Koskowski, B. Hartke, J. Comput. Chem. 26, 1169 (2005)
139. W. Leo Meerts, M. Schmitt, Int. Rev. Phys. Chem. 25, 353 (2006)
140. M.H. Hennessy, A.M. Kelley, Phys. Chem. Chem. Phys. 6, 1085 (2004)
141. J.M.C. Marques, F.V. Prudente, F.B. Pereira, M.M. Almeida, M.M. Maniero, C.E. Fellows, J. Phys. B 41, 085103 (2008)
142. W.F. Da Cunha, L.F. Roncaratti, R. Gargano, G.M.E. Silva, Int. J. Quant. Chem. 106, 2650 (2006)
143. L.F. Roncaratti, R. Gargano, G.M.E. Silva, J. Mol. Struct. (Theochem) 769, 47 (2006)
144. Y.G. Xu, G.R. Liu, J. Micromech. Microeng. 13, 254 (2003)
145. G.L.W. Hart, V. Blum, M.J. Walorski, A. Zunger, Nat. Mater. 4, 391 (2005)
146. V. Blum, G.L.W. Hart, M.J. Walorski, A. Zunger, Phys. Rev. B 72, 165113 (2005)
147. P. Pahari, S. Chaturvedi, J. Mol. Model. 18, 1049 (2012)
148. H.R. Larsson, A.C.T. van Duin, B. Hartke, J. Comput. Chem. 34, 2178 (2013)
149. C.M. Handley, R.J. Deeth, J. Chem. Theory Comput. 8, 194 (2012)
