
Neural Network Potentials: A Concise Overview of Methods
Emir Kocer,1, a) Tsz Wai Ko,2, a) and Jörg Behler3, b)
1) Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany; email: [email protected]; ORCID: 0000-0003-4207-8220
2) Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany; email: [email protected]; ORCID: 0000-0002-0802-9559
3) Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany; email: [email protected]; ORCID: 0000-0002-1220-1542

In the past two decades, machine learning potentials (MLP) have reached a level of maturity that now enables applications to large-scale atomistic simulations of a wide range of systems in chemistry, physics and materials science. Different machine learning algorithms have been used with great success in the construction of these MLPs. In this review, we discuss an important group of MLPs relying on artificial neural networks to establish a mapping from the atomic structure to the potential energy. In spite of this common feature, there are important conceptual differences, which concern the dimensionality of the systems, the inclusion of long-range electrostatic interactions, global phenomena like non-local charge transfer, and the type of descriptor used to represent the atomic structure, which can either be predefined or learnable. A concise overview is given along with a discussion of the open challenges in the field.

Keywords: neural network potentials, machine learning, atomistic simulations, molecular dynamics, potential energy surfaces

I. INTRODUCTION AND SCOPE OF THIS REVIEW

Machine learning (ML) plays an increasingly important role in many aspects of life1,2. The ability to provide accurate predictions after a training process makes modern ML algorithms a very attractive tool for applications in all fields of science in which the analysis, classification and interpretation of data is important3–8. Consequently, data science and machine learning are now called the “fourth paradigm of science”9 along with the three established paradigms of empirical experimental studies, theoretical models, and computer simulations.

A particularly fruitful combination of machine learning and computer simulations that has emerged in the past two decades is the representation of the multidimensional potential energy surface (PES) by machine learning potentials (MLP)10–20. The PES is of central importance for reaching an atomic-level understanding of any type of system, from small molecules to bulk materials, as it contains all the information about the stable and metastable structures, the atomic forces driving the dynamics at finite temperatures, the transition states and barriers governing reactions and structural transitions, and also the atomic vibrations. For a long time, computer simulations in chemistry, molecular biology and materials science had to rely either on computationally demanding electronic structure calculations like density-functional theory (DFT) or on efficient but significantly less accurate empirical potentials or force fields based on physical intuition and approximations. Modern MLPs bridge this gap by learning the shape of the PES from reference data obtained from high-level electronic

a) These authors contributed equally to this paper.
b) Corresponding author: [email protected]

structure calculations. The resulting analytic MLPs represent the atomic interactions, commonly as a function of the atomic positions and nuclear charges, and can then be used in large-scale simulations like molecular dynamics (MD) many orders of magnitude faster than the underlying electronic structure calculations without a significant loss in accuracy. MLPs thus represent an important tool to meet the continuously increasing need for computer simulations of more and more complex systems.

Since the advent of the first MLP in 199521, many different types of methods have been proposed, like neural network potentials (NNP)22–25, Gaussian approximation potentials (GAP)26,27, kernel-based approaches like gradient domain ML (GDML)28, spectral neighbor analysis potentials (SNAP)29,30, moment tensor potentials (MTP)31, the atomic cluster expansion (ACE)32, atomic permutationally invariant polynomials (aPIP)33 and support vector machines (SVM)34. MLPs offer many advantages: a very flexible functional form that allows the available reference data to be represented with very high accuracy35, generality that makes them suitable for all types of bonding and atomic interactions, from covalent bonds via metallic bonding to dispersion interactions, and a well-defined analytic form enabling the consistent computation of energies and gradient-based properties like forces and the stress tensor. The main disadvantage of MLPs is the lack of a physical functional form and the resulting limited transferability beyond the structural diversity of the underlying training data, making a careful validation of the obtained potentials an essential step. For the same reason, the construction of MLPs is computationally very demanding, since large training sets are required to ensure that all required information about the topology of the PES can be learned from the reference data.

The applicability of current MLPs is very diverse in many respects, e.g., concerning the dimensionality of the systems, the range of the atomic interactions that can



be described, and the possible inclusion of physical laws and concepts. Recently, a classification scheme for MLPs into four generations has been proposed36–38, which we will also adopt in this review. First-generation MLPs can provide accurate PESs for low-dimensional systems like small molecules. Second-generation MLPs make use of the locality of a major part of the atomic interactions and employ the concept of atomic energies, making these potentials applicable to high-dimensional systems containing thousands of atoms. Third-generation MLPs overcome this locality approximation by explicitly including, in addition, long-range interactions like electrostatics based on Coulomb's law without truncation. Still, they rely on atomic charges depending on the local chemical environment. Finally, fourth-generation potentials are able to capture phenomena like non-local charge transfer by taking into account the global structure and charge distribution of the system.

In this review, we will focus on MLPs based on neural networks (NN). NNPs not only have the longest history, but also by far the largest diversity in terms of methods and concepts of all classes of MLPs. We will restrict our discussion to methods providing continuous PESs for arbitrary atomic positions suitable for molecular dynamics simulations, while we note that many other interesting NN-based approaches have been proposed and successfully tested to predict, e.g., atomization energies of large sets of organic molecules in their global minimum geometries39 and a variety of properties of materials40–42. Further, we do not include methods used for predicting atomic partial charges or multipoles unless they are used to explicitly compute the electrostatic energy contribution to the PES. Our aim is to present the main methods and concepts, while applications are only briefly mentioned. The interested reader is referred to many reviews providing comprehensive lists of such applications10,11,15,43.

Even with this restricted scope, giving a comprehensive overview of all available methods in this rapidly expanding field is an impossible endeavor. Figure 1 shows a schematic timeline of the historic evolution of NNPs using representative examples. The presented methods are classified based on the types of descriptors of the atomic configurations, which can be predefined or learnable, the dimensionality of the systems that can be studied, and the inclusion or absence of long-range electrostatic interactions. These criteria will provide the central guidance for the discussion of the individual methods in the following sections. In addition, the evolution of NNPs is put into the context of other important MLP methods proposed to date, which are given in the right part of Fig. 1 but are not discussed here in more detail.

II. THE FIRST GENERATION: LOW-DIMENSIONAL SYSTEMS

In 1995, Doren and coworkers published the first machine learning potential trained to electronic structure data, enabling simulations at strongly reduced costs21. They used a feed-forward neural network to construct a mapping from the atomic positions to the potential energy for the adsorption of H2 on a cluster model of the Si(100) surface. In this pioneering work, many technical problems have been investigated and solved, which nowadays are considered as textbook knowledge about MLPs, and the advantages and limitations of machine learning for representing PESs have been thoroughly discussed. In the following decade, the applicability of neural networks has further been explored by several groups for numerous molecular systems44–47. Another branch of applications addressed adsorption and gas-surface dynamics22,48–50. The construction of NNPs for condensed systems like liquids and bulk materials was out of reach at that time, but several attempts have been made to augment empirical potentials suitable for larger systems by neural networks51–53.

In addition to these applications, very promising methodological advances have been made, resulting, e.g., in very accurate potentials based on a sum of products, optimized coordinates and a high-dimensional model representation proposed by Manzhos and Carrington Jr., which reached spectroscopic accuracy54–56. In 2009, Malshe et al. proposed to use a set of neural networks for the different terms in a many-body expansion57. Finally, in 2013 Jiang, Li and Guo used permutationally invariant polynomials (PIP), originally developed by Braams, Bowman and coworkers for the construction of PESs with full permutation invariance58,59, as input coordinates for constructing molecular PIP-NNs60,61.

In spite of the diversity of methods and systems that have been studied in the initial decade, the common feature of all these first-generation MLPs, which exclusively relied on neural networks, is the small number of degrees of freedom that could be taken into account explicitly, limiting applications to small molecules in vacuum, molecules interacting with frozen surfaces, or the selection of specific low-dimensional additive terms, often in combination with conventional empirical potentials or force fields. A summary of these applications can be found in two early reviews10,11.

III. THE SECOND GENERATION: LOCAL METHODS FOR HIGH-DIMENSIONAL SYSTEMS

A. Overview

Many important advances have been made with first-generation NNPs, but their use remained limited to rather low-dimensional systems, which include the degrees of freedom of only a few atoms.


FIG. 1. Timeline for the evolution of neural network potentials. NNPs using predefined descriptors are shown on the left, NNPs with learnable descriptors (message passing neural networks) on the right. The frame width and background color label the dimensionality and the absence or inclusion of long-range electrostatic interactions in third (3G) and fourth (4G) generation potentials. Some other important types of MLPs not discussed in this review are given with references in grey on the right.

The main conceptual problem preventing the application of MLPs to high-dimensional systems containing thousands of atoms has been the lack of suitable structural descriptors including the mandatory invariances of the total energy with respect to translation, rotation and permutation, i.e. the order, of chemically equivalent atoms in the system. Further, the use of a single neural network to express the global energy of the system, which is a central component of many early NNPs, did not allow the application to systems containing variable numbers of atoms, as the dimensionality of the system is related to the number of input neurons, which cannot be changed after the training of the NN.

Many empirical potentials in the field of materials science construct the total energy as a sum of atomic energies,


$$E_{\mathrm{total}} = \sum_{i=1}^{N_{\mathrm{atoms}}} E_i \qquad (1)$$

and also most classical force fields used, e.g., in biomolecular chemistry can be cast into this form. For these types of potentials, incorporating all invariances into the potential is straightforward, since they consist of very simple low-dimensional terms, which are additive and rely on translationally and rotationally invariant internal coordinates like interatomic distances, angles, and dihedral angles. This is different in the case of MLPs, which have a more complex multidimensional functional form allowing a very high numerical accuracy to be reached, and which require an ordered vector of input coordinates lacking intrinsic permutation invariance. Consequently, in the first decade of MLP development it has not been possible to make use of Eq. 1, and the search for suitable descriptors taking all invariances into account has been a frustrating challenge with only a few special-purpose solutions for low-dimensional systems52,62.

This problem has now been solved, and from today's perspective it is hard to imagine the difficult situation of early MLP developers, because a huge variety of different descriptors for high-dimensional systems is now readily available63,64. The resulting very powerful second generation of MLPs based on Eq. 1 is applicable to systems of in principle arbitrary size, and many combinations of the two main ingredients of modern MLPs, the descriptors of the atomic environments and the ML algorithm connecting the descriptor values to energies, have been explored.
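To make the structure of Eq. 1 concrete, the following minimal Python sketch sums per-element atomic network outputs over precomputed environment descriptors. The network stand-ins and descriptor shapes are purely illustrative and not tied to any particular NNP package.

```python
import numpy as np

def total_energy(descriptors, elements, atomic_nets):
    """Second-generation energy expression (Eq. 1): the total energy is the
    sum of atomic energies, each predicted by a per-element atomic network
    applied to the descriptor vector of the corresponding atomic environment."""
    energy = 0.0
    for G_i, element in zip(descriptors, elements):
        energy += atomic_nets[element](G_i)  # E_i = NN_element(G_i)
    return energy

# Illustrative stand-ins for trained atomic networks (one per element).
atomic_nets = {
    "H": lambda G: -0.5 + 0.01 * np.sum(G),
    "O": lambda G: -2.0 + 0.02 * np.sum(G),
}
descriptors = [np.random.rand(8) for _ in range(3)]  # one descriptor vector per atom
print(total_energy(descriptors, ["O", "H", "H"], atomic_nets))
```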

Two major classes of second-generation NNPs can be distinguished depending on the type of descriptor. The first high-dimensional NNPs that became available made use of descriptors with predefined functional forms, which contain a few parameters defining the spatial shape. In this respect, they share some similarities with basis functions used in electronic structure calculations. Sometimes these parameters are even optimized during the fitting process65,66. Due to their simplicity and efficiency, such predefined descriptors are still dominantly used in applications of current MLPs. In addition, in recent years a second class of second-generation NNPs has received increasing attention, which is based on the automatic learning of descriptors from structural information using message passing neural networks. This is not to be confused with the optimization of the parameters of the predefined descriptors, since no ad-hoc assumptions about the functional form are made. Both approaches will be presented in the subsequent sections and the most important examples will be briefly discussed.

B. Predefined Descriptors

1. High-Dimensional Neural Network Potentials

Using Eq. 1 to construct the energy of the system implies the partitioning of the total energy into atomic contributions, and these contributions can often, to a good approximation, be expressed as a function of the local atomic environment, if the spatial cutoff for describing the environment is chosen large enough to include all energetically relevant atomic interactions. The first MLPs making use of this locality approximation are the high-dimensional neural network potentials (HDNNP) proposed by Behler and Parrinello in 200723. In this approach, which for the first time made it possible to perform large-scale simulations driven by a MLP, each atomic energy is calculated as the output of an individual atomic neural network (s. Fig. 2a,b). To ensure permutation invariance, the architecture and weight parameters of all atomic neural networks for atoms of the same element are constrained to be the same. Alternatively, the method can be viewed as using one separate atomic neural network per element, which is evaluated as many times as atoms of the respective element are present in the system. The training of these atom-based second-generation HDNNPs (2G-HDNNPs) is done using energies and forces. The partitioning into atomic energies is carried out automatically in the fitting process, allowing the direct use of total energies. Once trained, the HDNNP can then be applied to simulations of systems of variable size by adapting the number of atomic neural networks accordingly, resulting in O(N) scaling of the computational costs of the method.

The key step for the introduction of HDNNPs has been the development of a novel type of descriptor, atom-centered symmetry functions (ACSF)23,67,68, which for the first time allowed structural fingerprints of the local environments to be constructed with exact translational, rotational and permutational invariance. Since all structurally equivalent atomic environments give rise to the same symmetry function vector, these invariances are also preserved in the atomic energy outputs of the neural networks.

The local atomic environments are defined by a cutoff function

$$f_{\mathrm{cut}}(R_{ij}) = \begin{cases} \tfrac{1}{2}\left[\cos\left(\dfrac{\pi R_{ij}}{R_c}\right) + 1\right] & \text{if } R_{ij} \le R_c \\[4pt] 0 & \text{if } R_{ij} > R_c \end{cases} \qquad (2)$$

which decays smoothly to zero in value and slope at the cutoff radius Rc; many alternative cutoff functions are available69,70. Rij is the distance between the reference atom i and any neighboring atom j. The cutoff function is a central ingredient of ACSFs and many other similar descriptors, which describe the positions of all neighboring atoms inside the resulting cutoff spheres. Many types of ACSFs have been proposed in the literature to characterize the radial and angular distribution


FIG. 2. a) General structure of second-generation NNPs with predefined descriptors. The total (short-ranged) energy is the sum of atomic or pair energies provided as outputs of individual atomic or pair neural networks. Starting from the Cartesian coordinates of the atoms {Ri}, different types of transformations to input descriptors characterizing the local atomic environments are performed for b) atom- or c) pair-based HDNNPs, d) DeepMD and e) embedded atom NNs. Details of these methods are discussed in Sections III B 1, III B 2, III B 3 and III B 4, respectively.

of neighbors67. The most commonly used radial ACSF has the form

$$G_i^{\mathrm{radial}} = \sum_{j \in R_c} e^{-\eta (R_{ij} - R_s)^2} \cdot f_c(R_{ij}) \qquad (3)$$

with the hyperparameters η and Rs defining the shape of a Gaussian sphere around the central atom. Due to the summation over all neighboring atoms, the value of G_i^radial can be interpreted as a continuous coordination number, and a combination of different parameter sets can be used to obtain information about the radial distribution of neighbors. In a similar way, the angular distribution of neighbors can be probed by the angular function

$$G_i^{\mathrm{angular}} = 2^{1-\zeta} \sum_{j \in R_c} \sum_{k \neq j \in R_c} (1 + \lambda \cos\theta_{ijk})^{\zeta} \cdot e^{-\eta\left(R_{ij}^2 + R_{ik}^2 + R_{jk}^2\right)} \cdot f_c(R_{ij}) f_c(R_{ik}) f_c(R_{jk}) \qquad (4)$$

with θijk being the angle centered at atom i and enclosed by Rij and Rik, and the hyperparameters ζ, λ and η. For multi-element systems, ACSFs for each pair of elements (radial) or each triple of elements (angular) are constructed, resulting in a combinatorial increase in the number of symmetry functions

if many elements are present. A detailed discussion of the properties of ACSFs is beyond the scope of this review and can be found in Ref. 67. The typical error of energies and forces obtained from HDNNPs is around 1 meV and 0.1 eV/Å, respectively, and in the past decade, HDNNPs have been reported for a variety of systems43, from bulk materials via aqueous solutions to interfaces. Also very generally applicable parameterizations of HDNNPs for a broad range of organic molecules, like ANI71, which employs a modified angular symmetry function, have been published.
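As an illustration of Eqs. 2-4, the following Python sketch evaluates the cosine cutoff together with one radial and one angular symmetry function for a single central atom. Parameter values, the example geometry and the plain nested loops are illustrative only; production codes vectorize these sums.

```python
import numpy as np

def f_cut(r, r_c):
    """Cosine cutoff of Eq. 2: smooth decay to zero in value and slope at r_c."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_acsf(r_ij, eta, r_s, r_c):
    """Radial symmetry function of Eq. 3; r_ij are distances to all neighbours."""
    r_ij = np.asarray(r_ij, dtype=float)
    return np.sum(np.exp(-eta * (r_ij - r_s) ** 2) * f_cut(r_ij, r_c))

def angular_acsf(rel_pos, eta, zeta, lam, r_c):
    """Angular symmetry function of Eq. 4; rel_pos holds neighbour positions
    relative to the central atom, and the double sum runs over ordered pairs."""
    rel_pos = np.asarray(rel_pos, dtype=float)
    dists = np.linalg.norm(rel_pos, axis=1)
    G = 0.0
    for j in range(len(rel_pos)):
        for k in range(len(rel_pos)):
            if j == k:
                continue
            r_ij, r_ik = dists[j], dists[k]
            r_jk = np.linalg.norm(rel_pos[j] - rel_pos[k])
            cos_theta = np.dot(rel_pos[j], rel_pos[k]) / (r_ij * r_ik)
            G += ((1.0 + lam * cos_theta) ** zeta
                  * np.exp(-eta * (r_ij**2 + r_ik**2 + r_jk**2))
                  * f_cut(r_ij, r_c) * f_cut(r_ik, r_c) * f_cut(r_jk, r_c))
    return 2.0 ** (1.0 - zeta) * G

# Example: a water-like environment around an O atom (distances in Angstrom).
neighbours = np.array([[0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
print(radial_acsf(np.linalg.norm(neighbours, axis=1), eta=1.0, r_s=0.0, r_c=6.0))
print(angular_acsf(neighbours, eta=0.1, zeta=2.0, lam=1.0, r_c=6.0))
```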

Some attempts have been made in recent years to overcome the combinatorial increase in the number of ACSFs for multi-element systems. In 2018, Gastegger et al. introduced weighted atom-centered symmetry functions (wACSF)65. In wACSFs, the ACSFs sharing the same parameters are merged for all element combinations using element-dependent weighting prefactors, making the input dimensionality of the element-specific atomic neural networks independent of the number of elements in the system. In a recent study, Liu and Kitchin used wACSFs and proposed to also combine the atomic NNs into one single universal atomic NN with multiple outputs, one for each chemical species in a system72. All elements thus share the same connecting weights in the single atomic NN, while the weights connecting the output neurons remain element-specific. This is based on the assumption of a roughly linear relation between a universal feature vector generated by the atomic NN and the element-specific atomic energy, which has been investigated for bulk elements. A similar method has also been proposed by Profitt and Pearson focusing on molecular systems73. Finally, the extraction of the relevant information from a large pool of ACSFs can also be done automatically, e.g. by CUR74,75 or using convolutional neural networks76.
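The following sketch illustrates the idea behind element-weighted symmetry functions: neighbours of all elements contribute to a single radial function, scaled by an element-dependent weight. Using the atomic number as the weight is an assumption made here for illustration, not a prescription of the method.

```python
import numpy as np

def f_cut(r, r_c):
    r = np.asarray(r, dtype=float)
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def weighted_radial_acsf(r_ij, z_j, eta, r_s, r_c, weight=lambda z: float(z)):
    """Element-weighted radial symmetry function in the spirit of wACSFs:
    one descriptor value covers neighbours of all elements, each scaled by an
    element-dependent weight (here simply the atomic number)."""
    r_ij = np.asarray(r_ij, dtype=float)
    w = np.array([weight(z) for z in z_j])
    return np.sum(w * np.exp(-eta * (r_ij - r_s) ** 2) * f_cut(r_ij, r_c))

# One descriptor value for a mixed O/H environment; no per-element-pair functions needed.
print(weighted_radial_acsf([0.96, 0.97, 1.8], [1, 1, 8], eta=1.0, r_s=0.0, r_c=6.0))
```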

2. Pair-Based HDNNPs

The sum over atomic energies in Eq. 1 is the most frequently used energy expression in high-dimensional MLPs. However, it can equivalently be written as a sum of atom pair energies directly reflecting the atomic interactions as

$$E = \sum_{i=1}^{N_{\mathrm{atoms}}} E_i = \frac{1}{2} \sum_{i=1}^{N_{\mathrm{atoms}}} \sum_{j \neq i}^{N_{\mathrm{atoms}}} E_{ij} \qquad (5)$$

In 2012, Behler and coworkers77 suggested to construct the environment-dependent pair energies using pair symmetry functions, establishing a new class of descriptors characterizing the combined environments of both atoms in each pair up to a cutoff radius with full translational, rotational and permutational symmetry. The resulting structure of the high-dimensional pair NNP is very similar to conventional atom-based HDNNPs, with atom pairs up to a maximum interatomic distance replacing the atoms as central structural entities. For each element combination there is one distinct NN to be trained, which is evaluated as many times as atom pairs of the respective type are present in the system (s. Fig. 2c). The method has been shown to be as accurate as the atom-based approach, but because of the larger number of atom pairs compared to atoms the computational costs are increased, as there is a larger number of NNs to be evaluated.

An alternative method to express the energy as a sum of bond energies has been proposed by Parkhill and coworkers in 201778. In this bonds-in-molecule neural network (BIM-NN), a modified version of the bag-of-bonds descriptor79 has been employed, using an element-specific distance cutoff to take only covalent bonds into account. The descriptor consists of the bond length as well as information about the directly connected bonds. The resulting pair energies have been interpreted as chemical bond energies. So far the method has only been applied to isolated molecules in vacuum.

A further related approach is AP-Net, which has recently been introduced by Sherrill and coworkers80. AP-Net has been developed focusing on non-covalent interactions by representing physically meaningful interaction energies derived from symmetry-adapted perturbation theory, which allows all element combinations to be represented by a single neural network for a given type of physical interaction.

3. Deep Potential Molecular Dynamics

Deep Potential Molecular Dynamics (DeepMD) developed by E and coworkers81 is a force-extended version of the Deep Potential method82, which includes energy training only. Like in HDNNPs, the total energy is written as a sum of environment-dependent atomic energies (Eq. 1). For describing the positions of the neighboring atoms, a local atomic frame based on the two closest neighboring atoms is defined. All neighboring atoms are then sorted by element and inverse distance, and for each neighbor j a descriptor vector

$$\mathbf{D}_{ij} = \{D_{ij}^{0}, D_{ij}^{1}, D_{ij}^{2}, D_{ij}^{3}\} = \left\{\frac{1}{R_{ij}}, \frac{x_{ij}}{R_{ij}^{2}}, \frac{y_{ij}}{R_{ij}^{2}}, \frac{z_{ij}}{R_{ij}^{2}}\right\} \qquad (6)$$

with xij, yij and zij being the components of the connecting vector Rij is defined. Beyond a predefined number of closest neighbors inside the cutoff radius, just the first component 1/Rij is used, corresponding to radial information only. The descriptor vectors of all neighboring atoms are then provided jointly as {Dij} to atomic neural networks to yield the atomic energies Ei as a function of the geometry of all atom pairs relative to the reference frame (s. Fig. 2d). An advantage of DeepMD is the absence of any tunable hyperparameters in the descriptors, which depend only on the atomic positions, reducing the complexity of the potential. Drawbacks are the simple functional forms of the descriptors without explicit many-body information, which therefore needs to be established by the atomic NNs, and the presence of small discontinuities in the forces at the cutoff radius as no smooth cutoff function is applied. The latter limitation has been overcome recently by introducing scalar weight functions in the Deep Potential Smooth Edition (DeepMD-SE)83, in which the atomic neural networks are split into encoding networks determining self-adapted descriptors by mapping the structure to multiple outputs and fitting networks, which are both optimized in the training process.
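A minimal sketch of the descriptor construction of Eq. 6 is given below. The element-wise sorting within species and the construction of the local reference frame are omitted, so this only illustrates the distance sorting and the full versus radial-only split of the descriptor components; it is not the DeepMD implementation itself.

```python
import numpy as np

def deepmd_style_descriptors(r_i, neighbours, n_full=16):
    """For one atom: sort neighbours by distance, give the closest n_full
    neighbours the full four-component vector (1/R, x/R^2, y/R^2, z/R^2) of
    Eq. 6 and the remaining neighbours only the radial component 1/R."""
    vecs = np.asarray(neighbours, float) - np.asarray(r_i, float)
    dists = np.linalg.norm(vecs, axis=1)
    order = np.argsort(dists)
    descriptors = []
    for rank, j in enumerate(order):
        R = dists[j]
        if rank < n_full:
            descriptors.append([1.0 / R, *(vecs[j] / R**2)])
        else:
            descriptors.append([1.0 / R, 0.0, 0.0, 0.0])  # radial information only
    return np.array(descriptors)

print(deepmd_style_descriptors([0.0, 0.0, 0.0], [[1.0, 0, 0], [0, 2.0, 0]], n_full=1))
```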

4. Embedded Atom Neural Network Potentials

In 2019, Zhang, Hu and Jiang proposed the embedded atom neural network potential (EANN)66, which has been inspired by the embedded atom method (EAM)84, an empirical potential frequently used in materials science in which each atomic energy is given as an embedding energy arising from the interaction with the density of the surrounding atoms. In EANN, the scalar density value at the position of each atom used in EAM is replaced by a vector of Gaussian-type-orbital (GTO)-based densities, which are used as input for atomic neural networks yielding the atomic energy contributions (s. Fig. 2e). The expansion coefficients of the GTOs forming angular momentum-specific elements of the density vector are optimized during the training process like in a linear basis set expansion. The density vector of each atom is then used as input vector for the respective atomic NN yielding the atomic energy. Only neighboring atoms up to a cutoff contribute to the density, and a sufficient orbital overlap must be ensured to obtain accurate potentials. A transformation to an angular basis makes the method computationally very efficient, as three-body descriptors do not need to be computed explicitly.
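The following sketch illustrates the embedded-density idea under simplifying assumptions: each density component is a sum of Gaussian contributions from the neighbours, with coefficients and exponents playing the role of the trainable expansion parameters. The angular-momentum channels of the actual EANN descriptor are omitted.

```python
import numpy as np

def embedded_density_descriptor(center, neighbours, coeffs, alphas, r_c=6.0):
    """Vector of Gaussian density contributions at one atom: one component per
    (coefficient, exponent) pair, summed over all neighbours within the cutoff.
    A deliberately simplified stand-in for the GTO-based EANN density vector."""
    dists = np.linalg.norm(np.asarray(neighbours, float) - np.asarray(center, float), axis=1)
    dists = dists[dists < r_c]
    return np.array([c * np.sum(np.exp(-a * dists**2)) for c, a in zip(coeffs, alphas)])

rho = embedded_density_descriptor([0, 0, 0], [[1.0, 0, 0], [0, 1.5, 0]],
                                  coeffs=[1.0, 0.5], alphas=[0.5, 2.0])
print(rho)  # density vector used as input of the atomic NN
```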

C. Learnable Descriptors

1. Overview: Message Passing Neural Networks

All NNPs discussed so far have in common that the descriptor functions for the atomic structure are predefined and often contain some hyperparameters, which may be manually selected or adapted during the training process. The purpose of these descriptors is to provide local structural fingerprints of the atomic environments as input for atomic neural networks. Thus, the only necessary information is the atomic structure, and a vector of function values needs to be calculated to characterize each environment as uniquely as possible with as few functions as possible.

In the context of ML applications in molecular modelling, the idea of replacing predefined “static” descriptors by learned “dynamic” ones using structural information was first introduced by Duvenaud and coworkers85. Inspired by the extended-connectivity circular fingerprints (ECFP)86, they treated molecules as graph networks and employed a convolutional layer to process these graphical networks to predict their final feature vectors, i.e., the descriptors. Although they did not exploit these methods to construct PESs, this work paved the way for a new class of MLP, for which Gilmer et al. in 2017 coined the name “message passing neural networks” (MPNN)87. All MPNN methods have in common that predefined descriptors are replaced by automatically determined descriptors learned from the geometric structure.

In MPNNs, a molecule is considered as a three-dimensional graph consisting of nodes (vertices) and connections (edges), which are associated with the atoms defined by their nuclear charges {Zi} and bonds, or more generally interatomic distances {Rij}, respectively. Each atom is described by a feature vector, which is iteratively updated in an interaction block in a message passing phase using message functions processing information about neighboring atoms, like distances and their respective current feature vectors. The various flavors of MPNNs differ in the types of message functions that iteratively embed atoms into their chemical environments yielding the messages interacting with the atomic feature vectors, the atomic update functions (vertex update functions) describing these interactions, and the distance of neighbors included. Once the message passing phase has been completed after a predefined number of steps, the resulting atomic feature vectors, which now contain the information about the atom in its environment, are used in a readout phase to predict the target property. In the context of NNPs, the atomic feature vectors are typically passed to atomic neural networks in this step to yield atomic energy contributions summing up to the total energy.
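This generic scheme can be summarized in a few lines of Python. All callables (embedding, message, update and readout functions) are placeholders for trainable networks, and the explicit double loop over atoms is for clarity only.

```python
import numpy as np

def mpnn_energy(Z, R, embed, message, update, readout, r_c=6.0, T=3):
    """Generic message passing scheme: element-specific initial feature vectors
    are refined for T steps using messages from neighbours within r_c, then an
    atomic readout network maps each feature vector to an atomic energy (Eq. 1)."""
    n = len(Z)
    c = [embed(z) for z in Z]                        # initial feature vectors c_i(Z_i)
    for _ in range(T):                               # message passing phase
        new_c = []
        for i in range(n):
            m = np.zeros_like(c[i])
            for j in range(n):
                r_ij = np.linalg.norm(np.asarray(R[i]) - np.asarray(R[j]))
                if i != j and r_ij < r_c:
                    m += message(c[i], c[j], r_ij)   # message v_ij from neighbour j
            new_c.append(update(c[i], m))            # vertex update, e.g. c_i + sum_j v_ij
        c = new_c
    return sum(readout(ci) for ci in c)              # readout phase: sum of atomic energies
```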

2. Deep Tensor Neural Networks (DTNN)

The first MLP employing message passing neural networks has been the deep tensor neural network (DTNN) proposed by Schütt and coworkers in 201788, which has been designed to describe the properties of small organic molecules. Starting from the nuclear charges and a complete distance matrix D, the pairwise interatomic distances Dij are first expanded in a Gaussian basis yielding a coefficient vector dij for each atomic pair (s. Fig. 3a). Further, each atom i is represented by a multidimensional feature vector ci, starting with an initial random nuclear charge-specific vector ci(Zi). The atomic feature vectors are then iteratively refined in an interaction block using message vectors vij reflecting the interactions of atom i with all other atoms j by

$$\mathbf{c}_i^{t+1} = \mathbf{c}_i^{t} + \sum_{j \neq i}^{N_{\mathrm{atoms}}} \mathbf{v}_{ij} \qquad (7)$$

In total, T = 3 of these global refinement steps are carried out in DTNN88, after which the accuracy has been found to saturate for the investigated systems. The vij contain the non-linear coupling of the neighboring atomic feature vectors and the corresponding interatomic distance vectors and are computed in a tensor layer,

$$\mathbf{v}_{ij} = \tanh\left[W^{fc}\left(\left(W^{cf}\mathbf{c}_j + b^{f_1}\right) \circ \left(W^{df}\mathbf{d}_{ij} + b^{f_2}\right)\right)\right] \qquad (8)$$

where ◦ is the element-wise multiplication and W^{fc}, W^{cf}, W^{df}, b^{f_1} and b^{f_2} are weight and bias matrices to be trained. Consequently, the atomic feature vectors start from the state of free atoms and are then iteratively refined using information about the distances and feature vectors of all other atoms. Once this refinement step through convolutional mapping is complete, the feature vectors are provided to an embedding layer containing atomic NNs yielding atomic energies as output entering Eq. 1.

The accuracy of DTNNs has been demonstrated in compositional space for the atomization energies of optimized molecules, in configurational space for the isomers


FIG. 3. Schematic structure of the message passing neural networks DTNN88, SchNet89, HIP-NN90, AIMNet91 and PhysNet25 discussed in this review in Sections III C 2, III C 3, III C 4, III C 5 and IV, respectively.

of composition C7O2H10, and in conformational space for molecular dynamics trajectories of small molecules in vacuum88. Due to the use of a complete set of interatomic distances, DTNNs represent a global method applicable to small molecules, but in principle a spatial cutoff could be introduced for the message vectors to extend the method to a second-generation potential.
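A compact sketch of the DTNN refinement step of Eqs. 7 and 8 is shown below. The weight matrices, bias vectors and the Gaussian expansion parameters are illustrative stand-ins for the trained quantities.

```python
import numpy as np

def gaussian_expansion(r, centers, gamma=10.0):
    """Expand an interatomic distance in a Gaussian basis (the d_ij of DTNN)."""
    return np.exp(-gamma * (r - np.asarray(centers)) ** 2)

def dtnn_message(c_j, d_ij, W_fc, W_cf, W_df, b1, b2):
    """Message v_ij of Eq. 8: element-wise product of the transformed neighbour
    feature vector and the transformed distance expansion, passed through tanh."""
    return np.tanh(W_fc @ ((W_cf @ c_j + b1) * (W_df @ d_ij + b2)))

def dtnn_update(c_i, messages):
    """Refinement step of Eq. 7: c_i^(t+1) = c_i^t + sum_j v_ij."""
    return c_i + np.sum(messages, axis=0)

# Toy dimensions: feature size 4, hidden size 3, 5 Gaussian basis functions.
rng = np.random.default_rng(0)
W_fc, W_cf, W_df = rng.normal(size=(4, 3)), rng.normal(size=(3, 4)), rng.normal(size=(3, 5))
b1, b2 = np.zeros(3), np.zeros(3)
c_i, c_j = rng.normal(size=4), rng.normal(size=4)
d_ij = gaussian_expansion(1.4, centers=np.linspace(0.0, 4.0, 5))
v_ij = dtnn_message(c_j, d_ij, W_fc, W_cf, W_df, b1, b2)
print(dtnn_update(c_i, [v_ij]))
```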

3. SchNet

Shortly after the introduction of DTNNs, Schütt et al. proposed an improved MPNN called SchNet24,89 (s. Fig. 3b), which shares many properties with the original DTNN approach. In SchNet, so-called continuous-filter convolutional layers (CFCL) are used to represent the atomic environments; these are continuous generalizations of the discrete convolutional networks typically used in image analysis applications, since in contrast to pixel grids in images atomic positions are continuous quantities. In this way the coefficient vector of the Gaussian expansion of the interatomic distances can be replaced by the atomic positions. After the atomic feature vectors have been randomly initialized based on their species, they are distributed to a set of N interaction blocks in which CFCLs act on each local atomic neighborhood through a message passing phase, where they iteratively refine the structural representations by processing atomic feature vectors. For the interatomic distances a cutoff is applied, making the method applicable to systems with large numbers of atoms, including periodic systems like crystals. Like in DTNN, the parameters of the atomic neural network layers are shared for all atoms, and different elements are distinguished based on their

features. After the refinement of the atomic feature vectors in the interaction blocks, they are used as input of atomic neural networks yielding the atomic energy contributions, which are added to obtain the total energy of the system.

4. Hierarchically Interacting Particle Neural Network

A NNP combining the concepts of MPNNs and conventional many-body expansions is the “Hierarchically Interacting Particle Neural Network” (HIP-NN) proposed by Lubbers, Smith and Barros in 201890. Going beyond Eq. 1, the atomic energies Ei are further decomposed into contributions of order n,

$$E_i = \sum_{n=0}^{N_{\mathrm{body}}} E_i^{n} \qquad (9)$$

which are learned simultaneously and consistently by a single hierarchical NN consisting of many layers and intermediate interaction blocks (s. Fig. 3c). The basis of the structural description are the nuclear charges and those pairwise interatomic distances which are below a predefined cutoff distance. The term $E^0 = \sum_i E_i^0$ is computed to minimize the least squares error of the training data using nuclear charge information only. The first interaction block corresponds to the first message passing step, in which information between atoms within a cutoff radius is exchanged. The atomic feature vectors are then learned through a sequence of independent atomic on-site layers, the last of which corresponds to a linear regression on the learned atomic descriptor vectors and yields $E^1 = \sum_i E_i^1$. Then, the second interaction block represents another message passing step, followed by atomic


on-site layers to yield features to compute E2, and so forth. In HIP-NN, this sequence is terminated after the n = 2 term. The total energy is then given according to Eq. 9 as the sum of all computed terms. As it is assumed that the energy contributions decay with increasing order, the magnitude of the different terms can be used to estimate the uncertainty of the HIP-NN prediction.
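The hierarchical energy decomposition of Eq. 9 and the resulting magnitude-based uncertainty heuristic can be sketched as follows; the per-order contributions would in practice come from the on-site layers of the trained network, and the ratio used here is only one simple way to quantify the weight of the higher-order terms.

```python
import numpy as np

def hipnn_total_energy(order_contributions):
    """Eq. 9: each atom's energy is the sum of hierarchical contributions E_i^n
    (n = 0, 1, 2 in HIP-NN); order_contributions[n][i] holds E_i^n."""
    return sum(np.sum(E_n) for E_n in order_contributions)

def higher_order_weight(order_contributions):
    """Heuristic uncertainty measure: relative magnitude of the n >= 1 terms;
    large values signal that the hierarchical expansion converges slowly."""
    totals = np.array([np.sum(np.abs(E_n)) for E_n in order_contributions])
    return totals[1:].sum() / max(totals.sum(), 1e-12)

E_by_order = [np.array([-10.0, -9.5]), np.array([-0.4, -0.3]), np.array([0.02, 0.01])]
print(hipnn_total_energy(E_by_order), higher_order_weight(E_by_order))
```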

5. AIMNet

Another NNP with learnable descriptors called AIMNet has been proposed by Isayev and coworkers in 201991 (s. Fig. 3d). Starting from the atomic positions and nuclear charges, the atomic feature vectors are constructed by combining two components: the element-independent geometric description by atomic environment vectors consisting of modified ACSFs of the ANI HDNNP71 employing a cutoff, and the initial feature vectors containing the information about the nuclear charges. Therefore, like in DTNN and SchNet, the dimensionality of the resulting atomic feature vectors does not depend on the number of chemical elements in the system. In addition to the atomic feature vectors, also atom-pair feature vectors are used to describe the bonds in the system.

In contrast to other MPNNs, in AIMNet an embedding layer can be used directly after the first feature vectors have been constructed, because the latter already contain information about the atomic environments via the ACSFs right from the start and do not rely on a preparatory message passing step. Like, e.g., in SchNet, the effective cutoff of atomic interactions can be increased by multiple repeated update steps of the feature vectors in interaction blocks. Finally, an atom-in-molecule (AIM) layer then allows the atomic properties to be computed.

Apart from the environment-dependent atomic energies yielding the total energy, other properties like atomic partial charges can be provided by AIMNet, which can be redistributed within the interaction range covered by the sequence of message passing steps. Typically, in AIMNet the AIM layer is calculated three times, corresponding to two message passing updates of the feature vectors. Long-range interactions like electrostatics are not explicitly included beyond the distance covered by these message passing steps. A recent extension, AIMNet-ME16, can also be trained on systems in different global charge states by an additional message-passing stack for the spins and charges in the system.

IV. THE THIRD GENERATION: LONG-RANGE INTERACTIONS

Second-generation NNPs, as well as second-generation MLPs in general, have been applied very successfully to a wide range of systems, and they represent today's workhorse methods in machine learning-based atomistic simulations. Still, the underlying locality approximation and the neglect of interactions beyond the cutoff radius can be expected to result in notable errors for systems in which long-range interactions are important. The most prominent example of such long-range interactions is electrostatics, but also comparably weak dispersion interactions can amount to decisive energy contributions for large systems92.

A basic prerequisite for considering long-range electrostatic interactions is the availability of atomic partial charges. Starting with the pioneering work of Popelier and coworkers93,94, who explored the capabilities of NNs and other machine learning methods to represent electrostatic mono- and multipoles for the improvement of electrostatics in classical force fields, many groups have proposed to represent atomic properties like charges and multipoles by machine learning95–101. However, to date only a few NNPs contain long-range electrostatic interactions by explicitly computing the Coulomb interaction without truncation. These potentials define the third generation of NNPs.

The first MLP of the third generation has been the 3G-HDNNP introduced in 2011102,103. This method is an extension of the second-generation HDNNPs proposed by Behler and Parrinello (s. Sec. III B 1): in addition to the atomic NNs yielding the short-range atomic energies, a second set of atomic NNs is introduced to predict environment-dependent atomic charges, which are then employed in an Ewald summation104 to compute the electrostatic energy Eelec. Often the same ACSF vector Gi is used for describing the atomic environment. The total energy expression of this method is given by

$$E_{\mathrm{total}} = E_{\mathrm{short}} + E_{\mathrm{elec}} = \sum_{i=1}^{N_{\mathrm{atoms}}} E_i\left(\{G_i\}\right) + \sum_{i>j}^{N_{\mathrm{atoms}}} \frac{q_i\left(\{G_i\}\right)\, q_j\left(\{G_j\}\right)}{R_{ij}} \qquad (10)$$

Since the short-range atomic energies can in principle describe all types of atomic interactions including electrostatics up to the cutoff radius, double counting of energy terms has to be avoided by a sequential training process. First, the atomic neural networks representing the charges are trained using reference charges from electronic structure calculations. Then, the electrostatic energies and forces are computed and removed from the reference energies and forces to obtain the target properties needed for training the short-range energies. This two-step procedure is also required since the computation of electrostatic forces resulting from environment-dependent charges requires the derivatives of the atomic charges with respect to the atomic positions, which can be provided by the charge NNs but are not accessible in electronic structure calculations. Another third-generation HDNNP containing long-range electrostatics and also dispersion interactions, using Grimme's D2 method105, is the Tensormol approach introduced in 2018 by Yao et al.106. In this method, the charges are


trained by NNs to reproduce reference dipole moments obtained in electronic structure calculations.
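The third-generation energy expression of Eq. 10 can be sketched as follows. A plain pairwise Coulomb sum over NN-predicted charges is used for clarity, whereas periodic systems would require an Ewald summation instead; the network stand-ins are illustrative placeholders.

```python
import numpy as np

def third_generation_energy(G, R, short_nets, charge_nets, elements):
    """Sketch of Eq. 10: E_total = sum_i E_i(G_i) + sum_{i>j} q_i q_j / R_ij,
    with both the short-range atomic energies and the environment-dependent
    charges predicted from the same descriptor vectors G_i."""
    n = len(elements)
    E_short = sum(short_nets[elements[i]](G[i]) for i in range(n))
    q = [charge_nets[elements[i]](G[i]) for i in range(n)]
    E_elec = 0.0
    for i in range(n):
        for j in range(i):
            E_elec += q[i] * q[j] / np.linalg.norm(np.asarray(R[i]) - np.asarray(R[j]))
    return E_short + E_elec
```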

In addition to HDNNPs relying on predefined descriptors, also MPNNs (s. Section III C) including long-range interactions have been reported. The most important example is PhysNet introduced by Unke and Meuwly in 201925. This method is closely related to SchNet with two important modifications. First, the information flow in the interaction block is enhanced by pre-activation residual layers that enable a higher level of expressiveness, and distance-based attention masks are used as messenger functions to increase the efficiency of the learning cycle. The second major component is the explicit treatment of long-range interactions and electrostatics. In PhysNet, atomic coefficient vectors pass through one single refinement box that contains many interaction blocks and are then fed into atomic neural networks to predict the atomic energies and charges, which are in turn used to compute the long-range electrostatic energy. An important feature of this approach is the simultaneous prediction of both atomic energies and charges by the same NN, increasing the computational efficiency of the method.

In spite of these advances, third-generation NNPs are rarely used, as in many condensed systems long-range electrostatic interactions are efficiently screened, contributing only very little to the atomic interactions beyond the typical cutoff radii of 6-10 Å. In addition, the explicit computation of the Ewald sum, as well as the introduction of additional NNs in several methods, substantially increases the computational costs with very small improvements in accuracy.

V. THE FOURTH GENERATION: NON-LOCAL INTERACTIONS

A. Overview: Non-Local Dependencies

Even though long-range electrostatic interactions are included in third-generation NNPs, the underlying charges are assumed to be local in that they only depend on the positions of the neighboring atoms up to a cutoff radius. While this is a reasonable assumption for many systems, the locality approximation breaks down for systems in which the atomic partial charges depend on structural features outside the atomic cutoff spheres. These can be distant functional groups in molecules or even ion substitution, doping and defects in solid materials. Another typical situation involving non-local modifications of the electronic structure is a change in the global charge state of the system, e.g. by ionization, protonation or deprotonation. In these situations the first three generations of NNPs, which implicitly assume a single fixed total charge, are unable to represent the PES reliably, since it is impossible to represent the atomic partial charges as a function of the local environment only.

Potentials which are able to capture these non-local or even global dependencies for high-dimensional systems define fourth-generation NNPs. We note that the terminology in the literature is not consistent, because often there is no clear distinction between long-range interactions and non-local interactions. In this review, long-range interactions refer to electrostatics, and also van der Waals interactions, which are not truncated but depend on local properties, like charges, and hence can be described by third-generation NNPs. Non-local interactions arise from long-range or even global dependencies in the electronic structure and require fourth-generation NNPs for a qualitatively correct description. In recent years, a few NNPs of the fourth generation have been proposed (cf. Fig. 4), which will be discussed in the following sections.

B. Methods

1. Charge Equilibration Neural Network Technique

The first MLP of the fourth generation has been the charge equilibration via neural network technique (CENT) proposed by Ghasemi et al. in 2015107. The central idea of CENT is to employ a charge equilibration step109, which is also used in several advanced force fields and allows the redistribution of the electrons over the whole system to minimize the electrostatic energy. The total energy of CENT is based on a second-order Taylor expansion in the atomic partial charges qi and the classical electrostatic energy of the charge density and is given by

$$E_{\mathrm{CENT}} = \sum_{i=1}^{N_{\mathrm{atoms}}} \left( E_{i,0} + \chi_i q_i + \frac{1}{2}\left(\eta_i + \frac{2\gamma_{ii}}{\sqrt{\pi}}\right) q_i^2 \right) + \sum_{i>j}^{N_{\mathrm{atoms}}} \frac{q_i q_j\, \mathrm{erf}(\gamma_{ij} R_{ij})}{R_{ij}} \qquad (11)$$

The Ei,0 are the energies of the free atoms, the coefficients of the linear terms are the atomic electronegativities χi, and the ηi represent the element-dependent hardness values. The atomic electronegativities are expressed by individual atomic NNs as a function of the local atomic environments described by ACSFs (see Fig. 4a). The parameters γij depend on the width of the atomic Gaussian charge densities.

The minimization of this energy expression results in a set of linear equations, which in addition contains a constraint for the total charge of the system. This system needs to be solved to yield the equilibrated atomic partial charges. The weights of the electronegativity NNs are optimized during the training process such that the error of ECENT with respect to DFT reference energies is minimized. Like all fourth-generation MLPs, the CENT method allows the construction of potentials which are simultaneously applicable to several global charge states, and due to the charge-based energy expression Eq. 11 the


FIG. 4. Schematic structure of fourth-generation neural network potentials. In the CENT method107 the atomic electronegativities χi are expressed by atomic NNs as a function of the local atomic environments described by atom-centered symmetry functions (ACSF). They are then used in a charge equilibration (Qeq) step to provide globally dependent charges used to minimize the energy in Eq. 11. In the Becke population neural network (BpopNN)108 a charge-dependent SOAP descriptor is used to determine the atomic energies. Atomic charges are also used to compute electrostatic and intra-atomic energy terms. When used in predictions, the atomic charges are determined self-consistently (SCF-q) by minimizing the total energy, as indicated by the dotted line. In fourth-generation high-dimensional neural network potentials (4G-HDNNP)38 the atomic charges are determined in a charge equilibration step like in CENT, but the training is done using reference charges, as indicated by the dashed line. Apart from the calculation of the electrostatic energy, the charges are also used as additional non-local input information for the atomic energies Ei provided by a second set of atomic neural networks to yield the short-range energy. The total energy is the sum of the short-range and the electrostatic energy.

method works best for systems with predominantly ionic bonding.
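The charge equilibration step at the heart of CENT (and of the 4G-HDNNPs discussed below) amounts to solving a constrained linear system obtained by minimizing Eq. 11 under a fixed total charge. A minimal sketch, assuming precomputed NN electronegativities, hardnesses and Gaussian width parameters, is given below; real implementations use Ewald summation for periodic systems.

```python
import numpy as np
from scipy.special import erf

def equilibrate_charges(chi, eta, gamma, R, q_tot=0.0):
    """Solve dE/dq_i + lambda = 0 with sum_i q_i = q_tot for the energy of
    Eq. 11: the hardness matrix has eta_i + 2*gamma_ii/sqrt(pi) on the diagonal
    and erf(gamma_ij R_ij)/R_ij off the diagonal; a Lagrange multiplier enforces
    the total-charge constraint."""
    n = len(chi)
    A = np.zeros((n + 1, n + 1))
    for i in range(n):
        A[i, i] = eta[i] + 2.0 * gamma[i, i] / np.sqrt(np.pi)
        for j in range(n):
            if i != j:
                r = np.linalg.norm(np.asarray(R[i]) - np.asarray(R[j]))
                A[i, j] = erf(gamma[i, j] * r) / r
    A[:n, n] = 1.0   # constraint column (Lagrange multiplier)
    A[n, :n] = 1.0   # constraint row: sum of charges equals q_tot
    b = np.concatenate([-np.asarray(chi, float), [q_tot]])
    return np.linalg.solve(A, b)[:n]  # equilibrated atomic charges

chi = [0.3, -0.2]                    # NN-predicted electronegativities (illustrative)
eta = [1.0, 1.2]
gamma = np.full((2, 2), 0.8)
R = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
print(equilibrate_charges(chi, eta, gamma, R, q_tot=0.0))
```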

2. Becke Population Neural Networks

Another fourth-generation NNP is the Becke population neural network (BpopNN) introduced in 2020 by Xie, Persson and Small108. In this method, for the chosen global charge of the system the atomic populations, i.e., the partial charges, are adapted self-consistently in a process called SCF-q (see Figure 4b). Starting from an initial guess of the charge distribution, the total energy of the system is minimized with respect to the atomic populations and calculated as

$$E_{\mathrm{BpopNN}} = \sum_{i=1}^{N_{\mathrm{atoms}}} E_i(D_i) + E_{\mathrm{intra}} + E_{\mathrm{elec}} \qquad (12)$$

where the Ei represent the energy contributions of atom i determined by atomic neural networks as a function of both the local atomic environments and the atomic populations, which are provided as input in the form of modified SOAP110 descriptors Di. The intra-atomic energy Eintra is an element-specific quadratic function of the atomic partial charge. The electrostatic energy Eelec contains a damping function at very short interatomic distances facilitating the representation of the atomic short-range energies.

In the training stage, the reference atomic populations, total energies and atomic forces are obtained from constrained DFT calculations111 for various charge distributions and states. Then, the atomic NN weights and element-pair dependent parameters κ(zi, zj) are adjusted such that the errors of the total energy, the atomic forces and the population gradients are minimized with respect to the DFT data.


3. Fourth-Generation High-Dimensional Neural Network Potentials

Very recently, a fourth generation of HDNNPs (4G-HDNNP) has been proposed36,38 combining the advantages of the CENT method (s. Sec. V B 1), i.e., the ability to describe non-local charge transfer, global dependencies in the electronic structure and also the resulting electrostatic interactions, and of second-generation HDNNPs (s. Sec. III B 1), which provide a very accurate description of local bonding. The schematic structure of a 4G-HDNNP is shown in Fig. 4c. Like in CENT, the electrostatic energy is computed from charges obtained in a charge equilibration process based on environment-dependent electronegativities, which are represented by atomic NNs. In contrast to CENT, in the training process the electronegativities are not adjusted to yield charges minimizing the total energy, but the NN weights are determined to reproduce a set of reference Hirshfeld charges obtained in DFT calculations (dashed red line in Fig. 4c). The long-range energy is then calculated either using Coulomb's law or, in case of periodic systems, using an Ewald sum104.

The second energy contribution, the short-range energy, is given as a sum of atomic energies like in second-generation HDNNPs, but the atomic partial charges of the respective atoms are used as additional input neurons to provide information about the local electronic structure. For the same reasons as in third-generation HDNNPs, the short-range atomic NNs are trained in a second step after the electrostatic NNs, which is also required since the atomic charges used as additional input for the short-range atomic NNs need to be available.
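
A minimal sketch of this short-range part, assuming hypothetical per-element NN callables and precomputed descriptors, shows how the equilibrated charge of each atom is simply appended to its descriptor vector before being passed to the atomic NN:

import numpy as np

def short_range_energy(descriptors, charges, elements, atomic_nns):
    # Sum of atomic energies E_i from element-specific NNs that receive the
    # descriptor G_i with the equilibrated charge q_i appended as an extra input.
    total = 0.0
    for i, element in enumerate(elements):
        nn_input = np.concatenate([descriptors[i], [charges[i]]])
        total += atomic_nns[element](nn_input)
    return total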

The resulting 4G-HDNNPs are very generally applicable to many types of systems, from organic molecules to ionic solids38. Like all fourth-generation potentials, they can be constructed to simultaneously describe different global charge states of a given system.

VI. DISCUSSION AND OUTLOOK

Starting with the applicability to high-dimensional systems in second-generation potentials, MLPs for atomistic simulations have attracted significant attention, resulting in a rapid development of this field, which is now very advanced but has not yet reached full maturity. While initially applications in materials science have been the main driving force for these developments23, MLPs are now explored and applied in many different fields, and a lot of effort is nowadays spent, e.g., on the construction of broadly applicable potentials for a wide range of organic molecules25,71,88,112.

The development of MLPs is full of challenges, some of which have been solved while others require further research. The development of descriptors for high-dimensional systems considering all symmetries and invariances exactly, which in the initial years has been a formidable challenge, can now be considered as solved, and there are two alternative approaches based on predefined or learnable descriptors. Both are equally suited for high-quality potentials, but since the first message-passing networks have been introduced only a few years ago, most applications published to date rely on predefined descriptors like ACSFs67, SOAP110 and many others63. Still, a yet unsolved problem is the combinatorial growth in the number of predefined descriptors with an increasing number of elements in the system. First promising steps, like combined descriptors65 or even combined atomic NNs72,73, have been proposed, and also many MPNNs make use of combined feature vectors24,88,91, but the generality of these methods remains to be explored.

Another challenge, which is addressed by many groups, is the efficient construction of the reference data sets, which is of crucial importance for obtaining reliable potentials. Since the computationally most expensive part of constructing NNPs is the generation of the reference data by accurate electronic structure calculations, the data sets should be as small as possible, while at the same time they should cover a broad range of structures to obtain transferable potentials. The concept of active learning, which has been known in the ML field for a long time113 and was first used by Artrith and Behler for the construction of a HDNNP for copper114, is now broadly applied to establish an almost automatic framework for constructing reference data. The central idea of active learning is to identify important missing structures without an explicit comparison to costly electronic structure calculations, but to use an estimate for the uncertainty of a prediction to select new data points. This can, e.g., be done by comparing the energies or forces obtained from different NNPs trained to the same data set, and many different variants for the iterative construction of the reference data by active learning have been proposed to date115–122.
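
A common variant of this idea, query by committee, can be sketched as follows: several NNPs trained on the same reference set predict the energies of candidate structures, and only structures with a large committee disagreement are passed on to new electronic structure calculations. The prediction interface, the committee size and the threshold value below are illustrative assumptions rather than settings from any particular published workflow.

import numpy as np

def select_by_committee(candidates, committee, threshold=0.005):
    # Return indices of candidate structures whose per-atom energy spread
    # across the committee exceeds the threshold (e.g. in eV/atom).
    selected = []
    for index, structure in enumerate(candidates):
        predictions = np.array([model.predict_energy(structure) for model in committee])
        spread = (predictions.max() - predictions.min()) / len(structure)
        if spread > threshold:
            selected.append(index)
    return selected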

Another approach to improve the quality of MLPs with only a moderate amount of high-level data is ∆-learning123–125. Here, assuming that the difference between two PESs obtained with different electronic structure methods is smooth and thus relatively straightforward to represent by ML, a baseline potential is obtained using an approximate method, and the difference to the high-level method is learned by the NNP.
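
A schematic sketch of ∆-learning, with the baseline and high-level methods represented by hypothetical callables, makes the two-step structure explicit:

def make_delta_targets(structures, baseline_energy, high_level_energy):
    # Training targets for the NNP are the differences between the two methods.
    return [high_level_energy(s) - baseline_energy(s) for s in structures]

def delta_predict(structure, baseline_energy, delta_nnp):
    # Prediction combines the cheap baseline with the learned correction.
    return baseline_energy(structure) + delta_nnp.predict(structure)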

Apart from the optimized selection of the underlying data74, another main strategy to generate more accurate and transferable potentials is to include the underlying physical laws. An example is the inclusion of long-range electrostatic interactions in third- and fourth-generation NNPs. While most fourth-generation NNPs employ predefined descriptors, also novel types of message passing methods are just emerging that aim to describe non-local effects97,98,126. Apart from electrostatics, also dispersion interactions, which are weak but can be important in large systems, have been included beyond the local atomic environments in NNPs25,106. More recently, information beyond charges, such as atomic spin moments126,127 for magnetic systems and external electric fields128 to study molecular spectra, has also been considered.

A new direction of the field is to combine NNPs with information from electronic structure methods of different levels of complexity. There are several examples, like the combination of NNs with density-functional tight binding129 or Huckel theory130, and the representation of symmetry-adapted atomic-orbital features131 by message passing techniques. Furthermore, highly accurate molecular electronic wavefunctions can be learned by NNs, which include the representation of electrons in the molecule132,133. This trend of including or generating more information at the electronic structure or quantum mechanical level can be expected to continue in the years to come.

In summary, NNPs have become an indispensable tool for atomistic simulations in many fields of chemistry, molecular biology and materials science. They have proven to be particularly useful for large-scale molecular dynamics simulations in extensive sampling problems, which require long simulations of extended systems. Since they are able to describe all types of bonding with the same level of accuracy, they are even applicable to systems that are difficult to describe by conventional types of potentials. Still, a main limitation of NNPs, and MLPs in general, is the restricted transferability beyond the underlying training set, but it can be anticipated that future generations of potentials will allow this limitation to be overcome by the inclusion of physical knowledge and laws while maintaining the general applicability, which is a major advantage of the family of MLPs.

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS

Financial support by the Deutsche Forschungsgemeinschaft (DFG) is gratefully acknowledged (BE3264/13-1, project number 411538199).

1C. Editors, Machine Learning: An Overview of Artificial Intel-ligence (CreateSpace Independent Publishing Platform, 2018).

2E. Alpaydin, Introduction to Machine Learning (MIT press,2020).

3A. C. Mater and M. L. Coote, “Deep learning in chemistry,” J.Chem. Inf. Model. 59, 2545–2559 (2019).

4P. Larranaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano,I. Inza, J. A. Lozano, R. Armananzas, G. Santafe, A. Perez, andV. Robles, “Machine learning in bioinformatics,” Brief. Bioin-form. 7, 86–112 (2005).

5O. A. von Lilienfeld and K. Burke, “Retrospective on a decadeof machine learning for chemical discovery,” Nat. Commun. 11,4895 (2020).

6P. Mamoshina, A. Vieira, E. Putin, and A. Zhavoronkov, “Ap-plications of deep learning in biomedicine,” Molecular Pharma-ceutics 13, 1445–1454 (2016).

7M. W. Libbrecht and W. S. Noble, “Machine learning applica-tions in genetics and genomics,” Nat. Rev. Genet. 16, 321–332(2015).

8B. Selvaratnam and R. T. Koodali, “Machine learning in exper-imental materials chemistry,” Catal. Today 371, 77–84 (2021).

9A. Agrawal and A. Choudhary, “Perspective: Materials infor-matics and big data: Realization of the “fourth paradigm” ofscience in materials science,” APL Materials 4, 053208 (2016).

10C. M. Handley and P. L. A. Popelier, “Potential energy surfacesfitted by artificial neural networks,” J. Phys. Chem. A 114,3371–3383 (2010).

11J. Behler, “Neural network potential-energy surfaces in chem-istry: a tool for large-scale simulations,” Phys. Chem. Chem.Phys. 13, 17930–17955 (2011).

12J. Behler, “Perspective: Machine learning potentials for atom-istic simulations,” J. Chem. Phys. 145, 170901 (2016).

13P. O. Dral, “Quantum chemistry in the age of machine learning,”J. Phys. Chem. Lett. 11, 2336–2347 (2020).

14V. L. Deringer, M. A. Caro, and G. Csanyi, “Machine learninginteratomic potentials as emerging tools for materials science,”Adv. Mater. 31, 1902765 (2019).

15F. Noe, A. Tkatchenko, K.-R. Muller, and C. Clementi,“Machine learning for molecular simulation,” Ann. Rev. Phys.Chem. 71, 361–390 (2020).

16T. Zubatiuk and O. Isayev, “Development of multimodal ma-chine learning potentials: Toward a physics-aware artificial in-telligence,” Acc. Chem. Res. 54, 1575–1585 (2021).

17J. Zhang, Y.-K. Lei, Z. Zhang, J. Chang, M. Li, X. Han, L. Yang,Y. I. Yang, and Y. Q. Gao, “A perspective on deep learning formolecular modeling and simulations,” J. Phys. Chem. A 124,6745–6763 (2020).

18O. T. Unke, S. Chmiela, H. E. Sauceda, M. Gastegger,I. Poltavsky, K. T. Schutt, A. Tkatchenko, and K.-R.Muller, “Machine learning force fields,” Chem. Rev. (2021),10.1021/acs.chemrev.0c01111.

19P. Friederich, F. Hase, J. Proppe, and A. Aspuru-Guzik,“Machine-learned potentials for next-generation matter simu-lations,” Nat. Mater 20, 750–761 (2021).

20A. Grisafi, J. Nigam, and M. Ceriotti, “Multi-scale approachfor the prediction of atomic scale properties,” Chem. Sci. 12,2078–2090 (2021).

21T. B. Blank, S. D. Brown, A. W. Calhoun, and D. J. Doren,“Neural network models of potential energy surfaces,” J. Chem.Phys. 103, 4129–4137 (1995).

22S. Lorenz, A. Groß, and M. Scheffler, “Representing high-dimensional potential-energy surfaces for reactions at surfacesby neural networks,” Chem. Phys. Lett. 395, 210–215 (2004).

23J. Behler and M. Parrinello, “Generalized neural-network repre-sentation of high-dimensional potential-energy surfaces,” Phys.Rev. Lett. 98, 146401 (2007).

24K. T. Schutt, H. E. Sauceda, P.-J. Kindermans, A. Tkatchenko,and K.-R. Muller, “SchNet - a deep learning architecture formolecules and materials,” J. Chem. Phys. 148, 241722 (2018).

25O. T. Unke and M. Meuwly, “PhysNet: A neural networkfor predicting energies, forces, dipole moments, and partialcharges,” J. Chem. Theory Comput. 15, 3678–3693 (2019).

26A. P. Bartok, M. C. Payne, R. Kondor, and G. Csanyi, “Gaus-sian approximation potentials: The accuracy of quantum me-chanics, without the electrons,” Phys. Rev. Lett. 104, 136403(2010).

27A. P. Bartok and G. Csanyi, “Gaussian approximation poten-tials: A brief tutorial introduction,” Int. J. Quant. Chem. 115,1051–1057 (2015).

28S. Chmiela, H. E. Sauceda, I. Poltavsky, K.-R. Muller, andA. Tkatchenko, “sGDML: Constructing accurate and data ef-ficient molecular force fields using machine learning,” Comp.Phys. Comm. 240, 38–45 (2019).


29A. P. Thompson, L. P. Swiler, C. R. Trott, S. M. Foiles, andG. J. Tucker, “Spectral neighbor analysis method for auto-mated generation of quantum-accurate interatomic potentials,”J. Comp. Phys. 285, 316–330 (2015).

30M. A. Wood and A. P. Thompson, “Extending the accuracyof the SNAP interatomic potential form,” J. Chem. Phys. 148,241721 (2018).

31A. V. Shapeev, “Moment tensor potentials: A class of system-atically improvable interatomic potentials,” Multiscale Model.Simul. 14, 1153–1173 (2016).

32R. Drautz, “Atomic cluster expansion for accurate and transfer-able interatomic potentials,” Phys. Rev. B 99, 014104 (2019).

33A. Allen, G. Dusson, C. Ortner, and G. Csanyi, “Atomic per-mutationally invariant polynomials for fitting molecular forcefields,” Mach. Learn. Sci. Techn. 2, 025017 (2021).

34R. M. Balabin and E. I. Lomakina, “Support vector machineregression (LS-SVM)-an alternative to artificial neural networks(ANNs) for the analysis of quantum chemistry data?” Phys.Chem. Chem. Phys. 13, 11710 (2011).

35G. Cybenko, “Approximation by superpositions of a sigmoidalfunction,” Math. Control Sign. Systems 2, 303–314 (1989).

36T. W. Ko, J. A. Finkler, S. Goedecker, and J. Behler, “General-purpose machine learning potentials capturing nonlocal chargetransfer,” Acc. Chem. Res. 54, 808–817 (2021).

37J. Behler, “Four generations of high-dimensional neural networkpotentials,” Chem. Rev., DOI:10.1021/acs.chemrev.0c00868(2021), 10.1021/acs.chemrev.0c00868.

38T. W. Ko, J. A. Finkler, S. Goedecker, and J. Behler, “A fourth-generation high-dimensional neural network potential with ac-curate electrostatics including non-local charge transfer,” Nat.Commun. 12, 398 (2021).

39M. Rupp, A. Tkatchenko, K.-R. Muller, and O. A. von Lilien-feld, “Fast and accurate modeling of molecular atomization en-ergies with machine learning,” Phys. Rev. Lett. 108, 058301(2012).

40M. Zeng, J. N. Kumar, Z. Zeng, R. Savitha, V. R. Chan-drasekhar, and K. Hippalgaonkar, “Graph convolutional neu-ral networks for polymers property prediction. arxiv:1811.06231[cond-mat.mtrl-sci],” (2018).

41P. Omprakash, B. Manikandan, A. Sandeep, R. Shrivastava,P. Viswesh, and D. B. Panemangalore, “Graph representationallearning for bandgap prediction in varied perovskite crystals,”Comput. Mater. Sci. 196, 110530 (2021).

42L. A. Miccio and G. A. Schwartz, “From chemical structureto quantitative polymer properties prediction through convolu-tional neural networks,” Polymer 193, 122341 (2020).

43J. Behler, “First principles neural network potentials for reactivesimulations of large molecular and condensed systems,” Angew.Chem. Int. Ed. 56, 12828 (2017).

44D. F. R. Brown, M. N. Gibbs, and D. C. Clary, “Combiningab initio computations, neural networks, and diffusion MonteCarlo: An efficient method to treat weakly bound molecules,”J. Chem. Phys. 105, 7597–7604 (1996).

45K. T. No, B. H. Chang, S. Y. Kim, M. S. Jhon, and H. A.Scheraga, “Description of the potential energy surface of thewater dimer with an artificial neural network,” Chem. Phys.Lett. 271, 152–156 (1997).

46A. C. P. Bittencourt, F. V. Prudente, and J. D. M. Vianna, “The fitting of potential energy and transition moment functions using neural networks: transition probabilities in OH (A2Σ+ to X2Π),” Chem. Phys. 297, 153–161 (2004).

47H. M. Lee and L. M. Raff, “Cis → trans, trans → cis isomerizations and N-O bond dissociation of nitrous acid (HONO) on an ab initio potential surface obtained by novelty sampling and feed-forward neural network fitting,” J. Chem. Phys. 128, 194310 (2008).

48J. Behler, K. Reuter, and M. Scheffler, “Nonadiabatic effectsin the dissociation of oxygen molecules at the Al(111) surface,”Phys. Rev. B 77, 115421 (2008).

49I. Goikoetxea, J. Beltran, J. Meyer, J. I. Juaristi, M. Alducin,and K. Reuter, “Non-adiabatic effects during the dissociativeadsorption of O2 at Ag(111)? a first-principles divide and con-quer study,” New J. Phys. 14, 013050 (2012).

50S. Manzhos and K. Yamashita, “A model for the dissociative ad-sorption of N2O on Cu(100) using a continuous potential energysurface,” Surf. Sci. 604, 554–560 (2010).

51K.-W. Cho, K. T. No, and H. A. Scheraga, “A polarizable forcefield for water using an artificial neural network,” J. Mol. Struct.641, 77–91 (2002).

52H. Gassner, M. Probst, A. Lauenstein, and K. Hermansson,“Representation of intermolecular potential functions by neuralnetworks,” J. Phys. Chem. A 102, 4596–4605 (1998).

53S. Hobday, R. Smith, and J. Belbruno, “Applications of neuralnetworks to fitting interatomic potential functions,” ModellingSimul. Mater. Sci. Eng. 7, 397–412 (1999).

54S. Manzhos and T. Carrington, Jr, “Using neural networks torepresent potential surfaces as sums of products,” J. Chem.Phys. 125, 194105 (2006).

55S. Manzhos and T. Carrington, Jr, “Using redundant co-ordinates to represent potential energy surfaces with lower-dimensional functions,” J. Chem. Phys. 127, 014103 (2007).

56S. Manzhos and T. Carrington, Jr, “Using neural networks, opti-mized coordinates, and high-dimensional model representationsto obtain a vinyl bromide potential surface,” J. Chem. Phys.129, 224104 (2008).

57M. Malshe, R. Narulkar, L. M. Raff, M. Hagan, S. Bukkap-atnam, P. M. Agrawal, and R. Komanduri, “Development ofgeneralized potential-energy surfaces using many-body expan-sions, neural networks, and moiety energy approximations,” J.Chem. Phys. 130, 184102 (2009).

58A. Brown, B. J. Braams, K. Christoffel, Z. Jin, and J. M. Bowman, “Classical and quasiclassical spectral analysis of CH5+ using an ab initio potential energy surface,” J. Chem. Phys. 119, 8790–8793 (2003).

59B. J. Braams and J. M. Bowman, “Permutationally invariant po-tential energy surfaces in high dimensionality,” Int. Rev. Phys.Chem. 28, 577–606 (2009).

60B. Jiang and H. Guo, “Permutation invariant polynomial neuralnetwork approach to fitting potential energy surfaces,” J. Chem.Phys. 139, 054112 (2013).

61J. Li, B. Jiang, and H. Guo, “Permutation invariant polynomialneural network approach to fitting potential energy surfaces. ii.four-atom systems,” J. Chem. Phys. 139, 204103 (2013).

62J. Behler, S. Lorenz, and K. Reuter, “Representing molecule-surface interactions with symmetry-adapted neural networks,”J. Chem. Phys. 127, 014705 (2007).

63M. F. Langer, A. Goessmann, and M. Rupp, “Representa-tions of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. arxiv:2003.12081[physics.comp-ph],” (2020).

64L. Himanen, M. O. J. Jagera, E. V. Morooka, F. F. Canova,Y. S.Ranawat, D. Z. Gao, P. Rinke, and A. S. Foster, “Dscribe:Library of descriptors for machine learning in materials science,”Comp. Phys. Comm. 247, 106949 (2020).

65M. Gastegger, L. Schwiedrzik, M. Bittermann, F. Berzsenyi,and P. Marquetand, “WACSF - weighted atom-centered sym-metry functions as descriptors in machine learning potentials,”J. Chem. Phys. 148, 241709 (2018).

66Y. Zhang, C. Hu, and B. Jiang, “Embedded atom neural net-work potentials: efficient and accurate machine learning witha physically inspired representation,” J. Phys. Chem. Lett. 10,4962–4967 (2019).

67J. Behler, “Atom-centered symmetry functions for constructinghigh-dimensional neural network potentials,” J. Chem. Phys.134, 074106 (2011).

68J. Behler, “Constructing high-dimensional neural network po-tentials: A tutorial review,” Int. J. Quantum Chem. 115, 1032–1050 (2015).


69A. Singraber, T. Morawietz, J. Behler, and C. Dellago, “Par-allel multi-stream training of high-dimensional neural networkpotentials,” J. Chem. Theory Comput. 15, 3075–3092 (2019).

70A. Khorshidi and A. A. Peterson, “Amp: A modular approachto machine learning in atomistic simulations,” Comp. Phys.Comm. 207, 310–324 (2016).

71J. S. Smith, O. Isayev, and A. E. Roitberg, “ANI-1: An exten-sible neural network potential with DFT accuracy at force fieldcomputational cost,” Chem. Sci. 8, 3192–3203 (2017).

72M. Liu and J. R. Kitchin, “SingleNN: A modified Behler-Parrinello neural network with shared weights for atomistic sim-ulations with transferability,” J. Phys. Chem. C 124, 17811–17818 (2020).

73T. A. Profitt and J. K. Pearson, “A shared-weight neural net-work architecture for predicting molecular properties,” Phys.Chem. Chem. Phys. 21, 26175 (2019).

74G. Imbalzano, A. Anelli, D. Giofre, S. Klees, J. Behler, andM. Ceriotti, “Automatic selection of atomic fingerprints and ref-erence configurations for machine-learning potentials,” J. Chem.Phys. 148, 241730 (2018).

75M. W. Mahoney and P. Drineas, “CUR matrix decompositionsfor improved data analysis,” PNAS 106, 697–702 (2009).

76B. Selvaratnam, R. T. Koodali, and P. Miro, “Application ofsymmetry functions to large chemical spaces using a convolu-tional neural network,” J. Chem. Inf. Model. 160, 1928–1935(2020).

77Jovan Jose K.V., N. Artrith, and J. Behler, “Construction ofhigh-dimensional neural network potentials using environment-dependent atom pairs,” J. Chem. Phys. 136, 194111 (2012).

78K. Yao, J. E. Herr, S. N. Brown, and J. Parkhill, “Intrinsicbond energies from a bonds-in-molecules neural network,” J.Phys. Chem. Lett. 8, 2689–2694 (2017).

79K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A.von Lilienfeld, K.-R. Muller, and A. Tkatchenko, “Machinelearning predictions of molecular properties: Accurate many-body potentials and nonlocality in chemical space,” J. Phys.Chem. Lett. 6, 2326–2331 (2015).

80Z. L. Glick, D. P. Metcalf, A. Koutsoukas, S. A. Spronk, D. L.Cheney, and C. D. Sherrill, “AP-Net: An atomic-pairwise neu-ral network for smooth and transferable interaction potentials,”J. Chem. Phys. 153, 044112 (2020).

81L. Zhang, J. Han, H. Wang, R. Car, and W. E, “Deep potentialmolecular dynamics: A scalable model with the accuracy ofquantum mechanics,” Phys. Rev. Lett. 120, 143001 (2018).

82J. Han, L. Zhang, R. Car, and W. E, “Deep potential: a gen-eral representation of a many-body potential energy surface,”Commun. Comput. Phys. 23, 629–639 (2018).

83L. Zhang, J. Han, H. Wang, W. A. Saidi, R. Car, and W. E, “End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems,” Proceedings of the 32nd International Conference on Neural Information Processing Systems, 4441–4451 (2018).

84M. Daw, S. Foiles, and M. Baskes, “The embedded-atommethod: A review of theory and applications,” Mater. Sci. Rep.9, 251 (1993).

85D. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre,R. Gomez-Bombarelli, T. Hirzel, A. Aspuru-Guzik, andR. P. Adams, “Convolutional networks on graphs for learningmolecular fingerprints,” Proceedings of the 28th InternationalConference on Neural Information Processing Systems 2,2224–2232 (2015).

86D. Rogers and M. Hahn, “Extended-connectivity fingerprints,”J. Chem. Inf. Model. 50, 742–754 (2010).

87J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E.Dahl, “Neural message passing for quantum chemistry,” Pro-ceedings of the 34th International Conference on Machine Learn-ing 70, 1263–1272 (2017).

88K. T. Schutt, F. Arbabzadah, S. Chmiela, K. R. Muller, andA. Tkatchenko, “Quantum-chemical insights from deep tensorneural networks,” Nat. Commun. 8, 13890 (2017).

89K. Schutt, P.-J. Kindermans, H. E. S. Felix, S. Chmiela,A. Tkatchenko, and K.-R. Muller, “SchNet: A continuous-filter convolutional neural network for modeling quantum inter-actions,” Advances in Neural Information Processing Systems30 (NIPS 2017) , 991–1001 (2017).

90N. Lubbers, J. S. Smith, and K. Barros, “Hierarchical modelingof molecular energies using a deep neural network,” J. Chem.Phys. 148, 241715 (2018).

91R. Zubatyuk, J. S. Smith, J. Leszczynski, and O. Isayev, “Ac-curate and transferable multitask prediction of chemical prop-erties with an atoms-in-molecules neural network,” Sci. Adv. 5,eaav6490 (2019).

92L. Kronik and A. Tkatchenko, “Understanding molecular crys-tals with dispersion-inclusive density functional theory: Pair-wise corrections and beyond,” Acc. Chem. Res. 47, 3208–3216(2014).

93S. Houlding, S. Y. Liem, and P. L. A. Popelier, “A polarizablehigh-rank quantum topological electrostatic potential developedusing neural networks: Molecular dynamics simulations on thehydrogen fluoride dimer,” Int. J. Quantum Chem. 107, 2817–2827 (2007).

94C. M. Handley and P. L. A. Popelier, “Dynamically polarizablewater potential based on multipole moments trained by machinelearning,” J. Chem. Theory Comput. 5, 1474–1489 (2009).

95B. Nebgen, N. Lubbers, J. S. Smith, A. Sifain, A. Lokhov,O. Isayev, A. Roitberg, K. Barros, and S. Tretiaka, “Trans-ferable dynamic molecular charge assignment using deep neuralnetworks,” J. Chem. Theory Comput. 14, 4687–4698 (2018).

96M. Gastegger, J. Behler, and P. Marquetand, “Machine learn-ing molecular dynamics for the simulation of infrared spectra,”Chem. Sci. 8, 6924 (2017).

97R. Zubatyuk, J. S. Smith, B. T. Nebgen, S. Tretiak,and O. Isayev, “Teaching a neural network to attachand detach electrons from molecules. chemrxiv 12725276.”https://doi.org/10.26434/chemrxiv.12725276.v1 (2020).

98D. P. Metcalf, A. Jiang, S. A. Spronk, D. L. Cheney, and C. D. Sherrill, “Electron-passing neural networks for atomic charge prediction in systems with arbitrary molecular charge,” J. Chem. Inf. Model. 61, 115–122 (2021).

99B. Cuevas-Zuvirıa and L. F. Pacios, “Machine learning of an-alytical electron density in large molecules through message-passing,” J. Chem. Inf. Model. (2021).

100Y. Wang, J. Fass, C. D. Stern, K. Luo, and J. Chodera,“Graph nets for partial charge prediction. arxiv:1909.07903[physics.comp-ph],” (2019).

101J. Wang, D. Cao, C. Tang, L. Xu, Q. He, B. Yang, X. Chen,H. Sun, and T. Hou, “Deepatomiccharge: a new graph con-volutional network-based architecture for accurate prediction ofatomic charges,” Brief. Bioinform. 22, bbaa183 (2021).

102N. Artrith, T. Morawietz, and J. Behler, “High-dimensionalneural-network potentials for multicomponent systems: Appli-cations to zinc oxide,” Phys. Rev. B 83, 153101 (2011).

103T. Morawietz, V. Sharma, and J. Behler, “A neural net-work potential-energy surface for the water dimer based onenvironment-dependent atomic energies and charges,” J. Chem.Phys. 136, 064103 (2012).

104P. P. Ewald, “Die Berechnung optischer und elektrostatischerGitterpotentiale,” Ann. Phys. 64, 253–287 (1921).

105S. Grimme, “Semiempirical gga-type density functional con-structed with a long-range dispersion correction,” J. Comput.Chem 27, 1787–1799 (2006).

106K. Yao, J. E. Herr, D. W. Toth, R. Mckintyre, and J. Parkhill,“The TensorMol-0.1 model chemistry: A neural network aug-mented with long-range physics,” Chem. Sci. 9, 2261–2269(2018).

107S. A. Ghasemi, A. Hofstetter, S. Saha, and S. Goedecker, “In-teratomic potentials for ionic systems with density functionalaccuracy based on charge densities obtained by a neural net-work,” Phys. Rev. B 92, 045131 (2015).


108X. Xie, K. A. Persson, and D. W. Small, “Incorporating elec-tronic information into machine learning potential energy sur-faces via approaching the ground-state electronic energy as afunction of atom-based electronic populations,” J. Chem. The-ory Comput. 16, 4256–4270 (2020).

109A. K. Rappe and W. A. Goddard, III, “Charge equilibrationfor molecular dynamics simulations,” J. Phys. Chem. 95, 3358(1991).

110A. P. Bartok, R. Kondor, and G. Csanyi, “On representingchemical environments,” Phys. Rev. B 87, 184115 (2013).

111B. Kaduk, T. Kowalczyk, and T. V. Voorhis, “Constraineddensity functional theory,” Chem. Rev. 112, 321–370 (2011).

112C. Devereux, J. Smith, K. Davis, K. Barros, R. Zubatyuk,O. Isayev, and A. Roitberg, “Extending the applicability of theANI deep learning molecular potential to sulfur and halogens,”ChemRxiv.11819268.v1 (2020).

113H. S. Seung, M. Opper, and H. Sompolinsky, “Query by com-mittee,” Proceedings of the 5th annual workshop on computa-tional learning theory , 287–294 (1992).

114N. Artrith and J. Behler, “High-dimensional neural network po-tentials for metal surfaces: A prototype study for copper,” Phys.Rev. B 85, 045439 (2012).

115E. V. Podryabinkin and A. V. Shapeev, “Active learning of lin-early parametrized interatomic potentials,” Comp. Mater. Sci.140, 171–180 (2017).

116T. D. Loeffler, S. Manna, T. K. Patra, H. Chan, B. Narayanan,and S. Sankaranarayanan, “Active learning a neural networkmodel for gold clusters & bulk from sparse first principles train-ing data,” Chem. Cat. Chem. 12, 4796–4806 (2020).

117L. Zhang, D.-Y. Lin, H. Wang, R. Car, and W. E, “Activelearning of uniformly accurate interatomic potentials for mate-rials simulation,” Phys. Rev. Mater. 3, 023804 (2019).

118G. Sivaraman, A. N. Krishnamoorthy, M. Baur, C. Holm,M. Stan, G. Csanyi, C. Benmore, and A. Vazquez-Mayagoitia,“Machine-learned interatomic potentials by active learning:amorphous and liquid hafnium dioxide,” NPJ Comput. Mater.6, 1–8 (2020).

119J. S. Smith, B. Nebgen, N. Lubbers, O. Isayev, and A. E.Roitberg, “Less is more: Sampling chemical space with activelearning,” J. Chem. Phys. 148, 241733 (2018).

120C. Schran, J. Behler, and D. Marx, “Automated fitting of neu-ral network potentials at coupled cluster accuracy: Protonatedwater clusters as testing ground,” J. Chem. Theory Comput.16, 88–99 (2020).

121Q. Lin, L. Zhang, Y. Zhang, and B. Jiang, “Searching configura-tions in uncertainty space: Active learning of high-dimensionalneural network reactive potentials,” J. Chem. Theory Comput.(2021).

122N. Bernstein, G. Csanyi, and V. L. Deringer, “De novo ex-ploration and self-guided learning of potential-energy surfaces,”NPJ Comput. Mater. 5, 1 (2019).

123G. Sun and P. Sautet, “Toward fast and reliable potential en-ergy surfaces for metallic Pt clusters by hierarchical delta neuralnetworks,” J. Chem. Theory Comput. 15, 5614–5627 (2019).

124R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, “Big data meets quantum chemistry approximations: The delta-machine learning approach,” J. Chem. Theory Comput. 11, 2087–2096 (2015).

125P. O. Dral, A. Owens, A. Dral, and G. Csanyi, “Hierarchicalmachine learning of potential energy surfaces,” J. Chem. Phys.152, 204110 (2020).

126O. T. Unke, S. Chmiela, M. Gastegger, K. T. Schutt, H. E.Sauceda, and K.-R. Muller, “SpookyNet: Learning forcefields with electronic degrees of freedom and nonlocal effects.arxiv:2105.00304 [physics.chem-ph],” (2021).

127M. Eckhoff and J. Behler, “High-dimensional neural networkpotentials for magnetic systems using spin-dependent atom-centered symmetry functions. arxiv:2104. [physics.comp-ph],”(2021).

128M. Gastegger, K. T. Schutt, and K.-R. Muller, “Machinelearning of solvent effects on molecular spectra and reactions.arxiv:2010.14942 [physics.chem-ph],” (2020).

129H. Li, C. Collins, M. Tanha, G. J. Gordon, and D. J. Yaron,“A density functional tight binding layer for deep learning ofchemical hamiltonians,” J. Chem. Theory Comput. 14, 5764(2018).

130T. Zubatyuk, B. Nebgen, N. Lubbers, J. S. Smith, R. Zubatyuk,G. Zhou, C. Koh, K. Barros, O. Isayev, and S. Tretiak, “Ma-chine learned Huckel theory: Interfacing physics and deep neuralnetworks. arxiv:1909.12963v1 [cond-mat.dis-nn],” (2019).

131Z. Qiao, M. Welborn, A. Anandkumar, F. R. Manby, and T. F.Miller, III, “OrbNet: Deep learning for quantum chemistry us-ing symmetry-adapted atomic-orbital features,” J. Chem. Phys.153, 124111 (2020).

132D. Pfau, J. S. Spencer, Alexander, G. D. G. Matthews, andW. M. C. Foulkes, “Ab initio solution of the many-electronSchrodinger equation with deep neural networks,” Phys. Rev.Res. 2, 033429 (2020).

133J. Hermann, Z. Schatzle, and F. Noe, “Deep-neural-networksolution of the electronic Schrodinger equation,” Nat. Chem.12, 891–897 (2020).