Transcript of arXiv:2111.01112v1 [cond-mat.dis-nn] 1 Nov 2021

Application of the Variational Autoencoder to Detect the Critical Points of the Anisotropic Ising Model

Anshumitra Baul, Department of Physics and Astronomy, Louisiana State University, Baton Rouge, Louisiana 70803, USA

Nicholas Walker, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, California 94720, USA

Juana Moreno and Ka-Ming Tam, Department of Physics and Astronomy, Louisiana State University, Baton Rouge, Louisiana 70803, USA and Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA

(Dated: November 2, 2021)

We generalize the previous study on the application of variational autoencoders to the two-dimensional Ising model to a system with anisotropy. Due to the self-duality property of the system, the critical points can be located exactly for the entire range of anisotropic coupling. This presents an excellent test bed for the validity of using a variational autoencoder to characterize an anisotropic classical model. We reproduce the phase diagram for a wide range of anisotropic couplings and temperatures via a variational autoencoder without the explicit construction of an order parameter. Considering that the partition function of (d+1)-dimensional anisotropic models can be mapped to that of the d-dimensional quantum spin models, the present study provides numerical evidence that a variational autoencoder can be applied to analyze quantum systems via Quantum Monte Carlo.

I. INTRODUCTION

Machine learning (ML) has become an indispensable tool to reach the boundaries of scientific understanding in the age of big data. An overflow of information is being analyzed using ML to quantify patterns in a large variety of fields, including social networking, object and image recognition, advertising, finance, engineering, medicine, biological physics, and astrophysics, among others [1].

Machine learning is a data modeling approach that employs algorithms favoring strategies driven by statistical analysis and based on pattern extraction and inference. ML algorithms, such as deep learning, provide new advances in understanding physical data. Opportunities for scientific investigations are being devised particularly in numerical studies, which naturally involve large data sets and complex systems, where explicit algorithms are often challenging. A concerted effort to address large data problems in statistical mechanics and many-body physics using the ML approach is emerging [2–9]. The foundation of ML is deeply connected with statistical physics, and hence it is fruitful to combine ML techniques with numerical methods that involve the prediction of phase transition regions. Scaling and renormalization are the core principles for understanding macroscopic phenomena from microscopic properties. The way forward for machines to learn from large data sets would incorporate conceptually similar principles [9, 10].

Changes in the macroscopic properties of a physical system occur at phase transitions, which often involve a symmetry breaking process [11]. The theory of phase transitions and symmetry breaking was formulated by Landau as a phenomenological model, and was later derived from microscopic principles using the renormalization group [12]. Phases can be identified by an order parameter, which is zero in the disordered phase and finite in the ordered phase. The order parameter is determined by symmetry considerations of the underlying Hamiltonian. There are states of matter where the order parameter can only be defined in a complicated non-local way. These systems include topological states such as quantum spin liquids [13]. A major goal of the ML approach in complex statistical mechanics models or in strongly correlated systems is to detect the phase transitions from the data itself without explicitly constructing any order parameter [2].

The development of artificial neural networks to detect phase transitions is a major advance in the area of ML applications in physics. In earlier works, artificial neural networks were based on supervised learning algorithms [2, 9]. Labeled data is used to train the supervised learning algorithm, from which the algorithm learns to assign labels to the input data points [14, 15]. Apart from supervised learning, another major type of ML is unsupervised learning, for which the input data has no label. Conventional unsupervised learning algorithms, such as principal component analysis [16], find structure in unlabeled data without involving any artificial neural network. Here, the data is classified into clusters, and labels can then be assigned to the data points accordingly [16].

The autoencoder is a new direction for utilizing artificial neural networks in unsupervised machine learning. The earliest versions of the autoencoder were used for dimensional reduction of data before feeding the output into other ML algorithms [17, 18]. An autoencoder is built from an encoding artificial neural network, which outputs a latent representation of the given data, followed by a decoding neural network that tries to accurately reconstruct the input data from the latent representation [19, 20].


A major shortcoming of the autoencoder is the possibility of sharp changes in the latent representation with respect to small differences in the input data. Ideally, the latent representation should be a smooth function of the input data. The variational autoencoder (VAE) expresses the latent representation in terms of probability distributions instead of a fixed set of numbers [21, 22]. This probabilistic latent representation allows a smooth latent representation. Since 2013, VAEs have developed into one of the most successful unsupervised learning algorithms [21], showing promising results in both encoding and reconstructing data [21–23].

VAEs have recently been successfully applied to detect phase transitions in classical spin models [23–25]. The input data sets are generated by the Monte Carlo method, and then unsupervised machine learning, such as the VAE, is used for deciphering and distinguishing the different physics contained in the input data sets. After the successful application to classical models, a natural question arises: whether such an approach remains viable for quantum models. In particular, can a VAE distinguish different quantum phases and transition regions based on data obtained from quantum Monte Carlo? Recently, various models from statistical mechanics, in particular the Ising model and the Potts model, have been investigated [26–28]. In this work, we investigate a rather simple quantum model, the one-dimensional transverse field Ising model (TFIM), to address the capability and the limitations of the autoencoder. The critical line of the model can be calculated analytically due to its self-dual property [29–31], which makes the TFIM an excellent test bed to address various aspects of VAEs in quantum models.

This paper is organized as follows. In Section II, we briefly describe the transverse field Ising model (TFIM) and the Suzuki-Trotter formulation that maps it to the anisotropic Ising model. In Section III, the Monte Carlo method and the VAE are presented. The results from the VAE are described in Section IV. We conclude and discuss the implications and possible future applications of the method developed in this study in Section V. The self-duality of the anisotropic two-dimensional Ising model and the details of the VAE are discussed in the appendices.

II. TRANSVERSE FIELD ISING MODEL

A. Model

We consider an Ising model in a transverse field [32–34]. The Hamiltonian is given as

H = -\sum_{\langle i,j \rangle} J_{ij}\, \sigma^z_i \sigma^z_j - \Gamma \sum_i \sigma^x_i, \qquad (1)

where \sigma^\alpha (\alpha = x, y, z) are the Pauli matrices, which obey the commutation relation [\sigma^\alpha_i, \sigma^\beta_j] = 2\iota\,\delta_{ij}\,\epsilon_{\alpha\beta\gamma}\,\sigma^\gamma_i. J_{ij} is the coupling between the spins at sites i and j; only nearest-neighbor coupling is considered in this study. \Gamma is the transverse field applied in the x-direction, and \iota is the imaginary unit. \sigma^z has eigenvalues \pm 1, and its eigenvectors are symbolically denoted by |\uparrow\rangle and |\downarrow\rangle, that is,

|\uparrow\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad (2)

and

|\downarrow\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \qquad (3)

The order parameter is given by the average magnetization m = \sum_i \langle \sigma^z_i \rangle / N (N being the total number of sites), which characterizes the phase transition between the paramagnetic and ferromagnetic phases. Unless specified otherwise, we consider only the one-dimensional TFIM with coupling limited to the nearest neighbors.
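As an aside, a minimal sketch of Eq. (1) in code (our own illustration, not part of the paper's workflow; the chain length, coupling, and field value are arbitrary example values) builds the TFIM Hamiltonian for a small chain with NumPy Kronecker products and diagonalizes it exactly:

import numpy as np

# Pauli matrices; sigma^z has eigenvalues +1 and -1
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
id2 = np.eye(2)

def site_op(op, i, n):
    """Embed a single-site operator at site i of an n-site chain."""
    out = np.array([[1.0]])
    for k in range(n):
        out = np.kron(out, op if k == i else id2)
    return out

def tfim_hamiltonian(n, J=1.0, gamma=0.5, periodic=True):
    """H = -J sum_i sz_i sz_{i+1} - gamma sum_i sx_i, cf. Eq. (1) with nearest neighbors."""
    H = np.zeros((2**n, 2**n))
    for i in range(n if periodic else n - 1):
        H -= J * site_op(sz, i, n) @ site_op(sz, (i + 1) % n, n)
    for i in range(n):
        H -= gamma * site_op(sx, i, n)
    return H

# Ground-state energy per site of a 6-site chain (illustrative values)
H = tfim_hamiltonian(6, J=1.0, gamma=0.5)
print(np.linalg.eigvalsh(H)[0] / 6)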

B. Suzuki-Trotter Formalism

Instead of working with the quantum spins directly, we follow the standard procedure of mapping a d-dimensional quantum Hamiltonian into a (d+1)-dimensional effective classical Hamiltonian by the Suzuki-Trotter transformation [35–37], and apply it to the TFIM. We first define the longitudinal spin coupling term and the transverse field term as follows:

H_0 \equiv -\sum_{\langle i,j \rangle} J_{ij}\, \sigma^z_i \sigma^z_j, \qquad (4)

V \equiv -\Gamma \sum_i \sigma^x_i,

H = H_0 + V.

The partition function of H reads

Z = \mathrm{Tr}\,\big(e^{-\beta(H_0+V)}\big), \qquad (5)

where \beta is the inverse temperature. The Trotter formula gives

\exp(A_1 + A_2) = \lim_{M\to\infty} \big[\exp(A_1/M)\,\exp(A_2/M)\big]^M \qquad (6)

when [A_1, A_2] \neq 0. Using the Trotter formula, we have

Z = \sum_i \lim_{M\to\infty} \langle s_i | \big[\exp(-\beta H_0/M)\,\exp(-\beta V/M)\big]^M | s_i \rangle, \qquad (7)

where s_i is the i-th spin configuration of the whole system, and the summation runs over all 2^N possible configurations denoted by i. M sets the number of Trotter slices, i.e., the number of insertions of the identity operator. The identity operator formed from the spin eigenstates is given as

I = \sum_i^{2^N} |s_{i,k}\rangle\langle s_{i,k}| = \sum_{\sigma_{i,k}=\pm 1} |\sigma_{1,k}, \ldots, \sigma_{N,k}\rangle\langle \sigma_{1,k}, \ldots, \sigma_{N,k}|, \qquad (8)


where k = 1, 2, \ldots, M. Hence, Z is a product of M exponentials:

Z = \lim_{M\to\infty} \mathrm{Tr} \prod_{k=1}^{M} \langle \sigma_{1,k}, \ldots, \sigma_{N,k} | \exp\!\Big(-\frac{\beta H_0}{M}\Big) \exp\!\Big(-\frac{\beta V}{M}\Big) | \sigma_{1,k+1}, \ldots, \sigma_{N,k+1} \rangle. \qquad (9)

Applying periodic boundary conditions implies \sigma_{N+1,p} = \sigma_{1,p}. After evaluating the expression of the partition function,

Z = C^{NM/2}\, \mathrm{Tr}_\sigma \exp\!\big(-\beta H_{\mathrm{eff}}[\{\sigma\}]\big), \qquad (11)

where C = \frac{1}{2}\sinh\frac{2\beta\Gamma}{M} and the effective classical Hamiltonian is

H_{\mathrm{eff}}(\{\sigma\}) = \sum_{\langle i,j \rangle}^{N} \sum_{k=1}^{M} \Big[ -\frac{J_{ij}}{M}\, \sigma_{i,k}\sigma_{j,k} - \frac{\delta_{ij}}{2\beta} \ln\!\Big(\coth\frac{\beta\Gamma}{M}\Big)\, \sigma_{i,k}\sigma_{i,k+1} \Big]. \qquad (12)

The \sigma_{i,k} involved are the eigenvalues of \sigma^z, and hence there is no non-commuting part in H_{\mathrm{eff}}. The effective Hamiltonian describes a system of spins on a (d+1)-dimensional lattice, with one extra label k for each spin variable. Each single quantum spin variable \sigma_i of the original Hamiltonian is replaced by an array of M classical spins \sigma_{i,k}. Hence, the partition function of the 1D quantum Ising model is mapped to that of the 2D classical Ising model. The new (time-like) dimension along which these classical spins are spaced is called the Trotter dimension.
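To make Eq. (12) concrete, the following small sketch (our own, with illustrative parameter values) evaluates the two dimensionless couplings of the classical (d+1)-dimensional model: the in-slice coupling \beta J/M and the Trotter-direction coupling (1/2) ln coth(\beta\Gamma/M).

import numpy as np

def effective_couplings(beta, J, gamma, M):
    """Dimensionless couplings of beta * H_eff in Eq. (12):
    beta*H_eff = -sum_{i,k} [ k_space * s_{i,k} s_{i+1,k} + k_trotter * s_{i,k} s_{i,k+1} ]."""
    k_space = beta * J / M                                      # coupling within a time slice
    k_trotter = 0.5 * np.log(1.0 / np.tanh(beta * gamma / M))   # coupling along the Trotter dimension
    return k_space, k_trotter

# The Trotter-direction coupling stiffens as the transverse field weakens.
for gamma in (0.1, 0.5, 1.0, 2.0):
    print(gamma, effective_couplings(beta=1.0, J=1.0, gamma=gamma, M=64))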

In this paper we assume the model has only nearest-neighbor coupling on a square lattice. The couplings along the x-direction and y-direction are denoted as J_x and J_y, respectively. We also define K_x = 1/(\beta J_x) and K_y = 1/(\beta J_y). We set N = M in this study.

For the two-dimensional classical Ising model, the critical points can be obtained due to the self-dual property. The details of the dual transformation are given in Appendix A.

III. METHODOLOGY

A. Monte Carlo Sampling

The spin configurations are generated using the single-spin-flip Metropolis algorithm. In a single spin flip, we first flip the spin of a single lattice site and then calculate the change in energy \Delta E. The resulting change in energy is used to evaluate the Metropolis criterion \exp(-\Delta E/T). If a uniform random number drawn from [0, 1) is smaller than or equal to the Metropolis criterion, the flipped configuration is accepted as the new configuration. The code for the simulation is written in Python using the NumPy library [38–41]. We note that using a generative neural network, instead of the Monte Carlo method, for sampling has also been proposed recently for 1D quantum spin models [42, 43].
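A minimal single-spin-flip Metropolis sweep for an anisotropic square-lattice Ising model is sketched below (our own NumPy illustration, not the paper's production code; note that the couplings are expressed here as \beta J_x and \beta J_y, i.e., the inverse of the K_x, K_y used in the figures).

import numpy as np

rng = np.random.default_rng(0)

def metropolis_sweep(spins, beta_Jx, beta_Jy):
    """One sweep of single-spin-flip Metropolis updates on an L x L lattice of +/-1 spins."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # Energy change (in units of k_B T) from flipping spin (i, j), periodic boundaries
        dE = 2.0 * spins[i, j] * (
            beta_Jx * (spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            + beta_Jy * (spins[(i + 1) % L, j] + spins[(i - 1) % L, j])
        )
        if rng.random() <= np.exp(-dE):   # Metropolis criterion
            spins[i, j] *= -1
    return spins

# Example: equilibrate one configuration and measure its magnetization.
L = 32
spins = rng.choice(np.array([-1, 1]), size=(L, L))
for _ in range(200):
    metropolis_sweep(spins, beta_Jx=0.45, beta_Jy=0.45)
print("magnetization per site:", spins.mean())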

B. Variational Autoencoder

The variational autoencoder (VAE) belongs to the category of generative models: new data can be produced by learning the distribution of the input data [21, 44]. We use an encoder-decoder architecture where the encoder maps the input to a latent representation in terms of some chosen distributions. The latent distribution is mapped back to reconstruct the input using the decoder. The latent representation of a well-trained model can be used to generate new samples resembling the original training data [45].

In this study, we only consider a multi-dimensional Gaussian distribution for the latent space representation. We briefly explain the main concepts of the VAE in the following.

Encoder: The encoder is a neural network that maps the higher-dimensional input data into a lower-dimensional latent space. For example, an image of the spin configuration of size 32 × 32 = 1024 lattice points can be converted into a vector of dimension 8; in this sense the encoder is a method for dimensionality reduction. The neural network representing the encoder maps each sample to a distribution and is therefore also called the probabilistic encoder of the VAE. The encoder is denoted as q_\phi(z|x), a distribution that maps an input sample x (the Ising spin configurations) to a latent representation z (the parameters of the multi-dimensional Gaussian distribution). \phi is the set of learnable parameters of the neural network that are varied to produce outputs from the encoder.

Latent space: The latent space is the output of the encoder network and the input for the decoder network. For the VAE, the latent space is represented by a multi-dimensional Gaussian distribution. Since a Gaussian distribution is completely specified by its means and standard deviations, the encoder output has twice the dimension of the multi-dimensional Gaussian distribution. An encoded sample is denoted by z. The latent space is regularized: deviations from the prior multi-dimensional Gaussian distribution are penalized through the Kullback-Leibler (KL) divergence term [46].

Decoder: As shown in Fig. 1, the decoder of the VAE converts the compressed samples in the latent space back to the original input samples [47]. It is represented as p_\theta(x|z), a distribution that produces reconstructed samples x' conditioned on latent representations z. \theta is the set of learnable parameters of the neural network that are varied to produce different outputs from the decoder. The input to the decoder is provided by sampling from the latent distribution obtained from the encoder.
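To make the sampling step explicit, a latent vector z fed to the decoder can be drawn from the Gaussian specified by the encoder outputs; a minimal sketch (our own, with an illustrative latent dimension) is:

import numpy as np

rng = np.random.default_rng()

def sample_latent(mu, sigma):
    """Draw z ~ N(mu, diag(sigma^2)); this reparameterized sample is the decoder input."""
    return mu + sigma * rng.standard_normal(mu.shape)

# Example: an 8-dimensional latent space, as in the 32x32 -> 8 compression above.
z = sample_latent(np.zeros(8), np.ones(8))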

[Figure 1 schematic: input data → encoder → latent space (multi-dimensional Gaussian) → variables sampled from the latent space → decoder → reconstructed input data.]

FIG. 1. Diagram depicting the structure of the VAE. The left-hand side is the encoding part and the right-hand side is the decoding part. x is the input Ising spin configuration, q_\phi(z|x) is the encoder neural network, \mu and \sigma are the means and standard deviations of the latent space distribution z, p_\theta(x'|z) is the decoder neural network, and x' is the reconstructed Ising spin configuration.

Loss functions in the VAE: The VAE contains two sets of trainable parameters, \phi and \theta, for the neural networks of the encoder and decoder, respectively. They are trained by minimizing the loss function. The loss function of the VAE consists of two terms. The first measures the "similarity" between the inputs and the reconstructed outputs. The second measures the difference between the designated prior distribution, chosen to be a multi-dimensional Gaussian distribution, and the actual distribution of the compressed inputs.

For the first term of the loss function, the standard reconstruction loss measures the error between the samples generated by the decoder and the original input samples. In our study it is measured by the binary cross-entropy between the encoder input and decoder output. This is expressed as L_{RC} = -\mathbb{E}_{z\sim q_\phi(z|x)}[\log p_\theta(x|z)]. The expectation \mathbb{E} is over the latent representations z with respect to the encoder's distribution.

The second term is the Kullback-Leibler divergence (KLD) of the latent representations. The KLD measures the divergence between the chosen latent prior p(z) and the approximate distribution produced by the encoder, q_\phi(z|x). It is defined as

L_{KLD} = D_{KL}\big[q_\phi(z|x)\,\|\,p(z)\big] = -\sum_z q_\phi(z|x) \log\!\Big(\frac{p(z)}{q_\phi(z|x)}\Big). \qquad (13)

It is minimized so that the latent representation produced by the encoder, q_\phi(z|x), resembles the latent representation given by p(z).

The total loss is the sum of the reconstruction loss and the KLD [48],

L(\phi, \theta; x, z) = -\mathbb{E}_{z\sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big] + D_{KL}\big[q_\phi(z|x)\,\|\,p(z)\big] = L_{RC} + L_{KLD}. \qquad (14)

The two losses of the VAE are optimized simultaneously, so that the latent state describing an observation follows a distribution close to the prior while still deviating enough to capture the salient features of the input.

The linear combination of the reconstruction loss and the KLD is often referred to as the variational lower bound, or evidence lower bound, loss function, as both the reconstruction loss and the KLD are non-negative. Minimizing the loss, \min_{\theta,\phi} L(\theta, \phi; x, z), maximizes the lower bound on the probability of generating new samples [45].
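As a concrete illustration of Eq. (14) for binary spin data and a diagonal Gaussian latent space (our own sketch; it assumes the encoder outputs latent means and log-variances, and that spins are mapped from {-1, +1} to {0, 1}):

import numpy as np

def vae_loss(x, x_recon, mu, log_var, eps=1e-7):
    """Per-sample VAE loss of Eq. (14): binary cross-entropy plus Gaussian KLD."""
    x_recon = np.clip(x_recon, eps, 1.0 - eps)
    # Reconstruction term L_RC: -log p_theta(x|z) for Bernoulli outputs
    l_rc = -np.sum(x * np.log(x_recon) + (1.0 - x) * np.log(1.0 - x_recon))
    # L_KLD between N(mu, sigma^2) and the standard-normal prior N(0, 1), in closed form
    l_kld = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return l_rc + l_kld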

C. β-total correlation VAE

A further refinement of the VAE can be achieved by decomposing the KLD into three parts, describing the index-code mutual information, the total correlation, and the dimension-wise Kullback-Leibler divergence, which are weighted by parameters often denoted as \alpha, \beta, and \gamma, respectively. \beta is the most important one for obtaining good results [48, 49].

A \beta-total correlation VAE (\beta-TCVAE) is defined as a VAE with \alpha = \gamma = 1, while \beta is tuned as a parameter; the details are given in Appendix B. It is well suited for representation learning of patterns [49]. We fix the parameters of the decomposition to \alpha = \gamma = 1 and \beta = 8 in this study; these values were found to be good for finding the phase diagram of the 2D isotropic Ising model [25, 50].

Our goal is to map the raw Ising spin configurations to a reduced set of descriptors that discriminate between the samples using the criterion inferred by the \beta-total correlation VAE [25, 50]. The encoder and decoder are implemented as deep convolutional neural networks (CNN) to preserve the spatially dependent 2-dimensional structure of the Ising spin configurations [51]. A scaled exponential linear unit (SELU) activation function is used in each convolution layer. The output of the final convolution layer is flattened and fed into two 2-dimensional dense layers. This output is then used as the input for the decoder CNN and reshaped to match the structure of the output of the final convolution layer of the encoder CNN. Hence, the decoder network is simply the mirror of the encoder network, with transposed convolution layers in place of standard convolution layers [50–52]. The final output layer of the decoder network, which uses a sigmoid activation function, reproduces the original input configurations fed to the encoder network. The loss consists of the reconstruction loss and the \beta-total correlation KLD (\beta-TCKLD) term with \alpha = \gamma = 1 and \beta = 8 [46, 50]. We employ minibatch stratified sampling on the given data while training.

To optimize the loss, Nesterov-accelerated Adaptive Moment Estimation (Nadam) was used, which efficiently minimizes the loss during the training of the \beta-TCVAE model [50, 53]. The default parameters provided by the Keras library and a learning rate of 0.00001 were chosen. The training is performed over 100 epochs with a batch size of 33 × 33 = 1089, for both lattice sizes N = 64 and N = 128, with 1024 samples per phase point. The reduced descriptors of the 2D Ising spin configurations are given by the latent variables [50]. The \beta-TCVAE model used in this work was implemented using the Keras ML library with TensorFlow as the supporting backend [54, 55].
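For orientation, a plain VAE-style encoder/decoder along the lines described above can be sketched in Keras as follows. This is our own illustrative sketch: the layer widths, strides, and latent dimension are assumptions, and the \beta-total-correlation weighting of the KLD as well as the minibatch stratified sampling are omitted for brevity.

import tensorflow as tf
from tensorflow.keras import layers, Model

L, latent_dim = 32, 8   # illustrative lattice size and latent dimension

# Encoder: SELU convolutions, flattened into latent means and log-variances.
enc_in = layers.Input(shape=(L, L, 1))
h = layers.Conv2D(32, 3, strides=2, padding="same", activation="selu")(enc_in)
h = layers.Conv2D(64, 3, strides=2, padding="same", activation="selu")(h)
h = layers.Flatten()(h)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

def sample_z(args):
    """Reparameterized draw from the diagonal Gaussian q_phi(z|x)."""
    mu, log_var = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

z = layers.Lambda(sample_z)([z_mean, z_log_var])
encoder = Model(enc_in, [z_mean, z_log_var, z], name="encoder")

# Decoder: mirror of the encoder, with transposed convolutions and a sigmoid output.
dec_in = layers.Input(shape=(latent_dim,))
h = layers.Dense((L // 4) * (L // 4) * 64, activation="selu")(dec_in)
h = layers.Reshape((L // 4, L // 4, 64))(h)
h = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="selu")(h)
h = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="selu")(h)
dec_out = layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(h)
decoder = Model(dec_in, dec_out, name="decoder")

# The paper trains with the Nadam optimizer at a learning rate of 1e-5 for 100 epochs;
# the custom beta-TC loss would be added on top of this architecture.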

D. Principal Component Analysis on the Latent Space

Principal component analysis (PCA) is applied, independently, to the latent means and standard deviations of the Ising spin configurations obtained after fitting the \beta-TCVAE, to produce linear transformations of the Gaussian parameters that discriminate between the samples; we use the scikit-learn package [16, 56]. The PCA performs an orthogonal transformation into a new basis set of linearly uncorrelated features, the principal components. Each principal component captures the largest possible variance across the sample space under an orthogonality constraint [16].

The latent representation characterizes the structure of the Ising configurations, but the principal components of the latent representation show greater discrimination between the different structural characteristics of the configurations compared to the raw latent variables [50]. The rationale for using the PCA is to provide a more compact representation which characterizes the different phases of the Ising model. As we will show in the results section, the first and second dominant components already provide good signals to distinguish the different phases of the anisotropic Ising model.
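The PCA step itself is a one-liner with scikit-learn; in the sketch below the arrays latent_means and latent_stds are hypothetical collections of encoder outputs over all sampled (K_x, K_y) points.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical arrays of shape (n_samples, latent_dim) from the trained encoder.
latent_means = np.load("latent_means.npy")   # placeholder path
latent_stds = np.load("latent_stds.npy")     # placeholder path

# Fit PCA independently on the latent means and on the latent standard deviations.
nu = PCA(n_components=2).fit_transform(latent_means)   # nu[:, 1] plays the role of nu_1
tau = PCA(n_components=2).fit_transform(latent_stds)   # tau[:, 0], tau[:, 1] play the roles of tau_0, tau_1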

IV. RESULTS

As explained in the previous section, for the VAE models the samples are drawn from a multi-dimensional Gaussian distribution parametrized by vectors of means and standard deviations, denoted as \mu_i and \sigma_i respectively, where i is the index of the dimension of the distribution. All of the plots in this paper were generated with the Matplotlib package using a perceptually uniform colormap [57]. In each plot, the color of each square pixel represents the ensemble average value of a principal component of the latent means or variances of the multi-dimensional Gaussian distribution. We study two different system sizes, N = 64 and 128. We focus on the first two principal components of the latent variance and the second principal component of the latent mean, denoted as \tau_0, \tau_1, and \nu_1, respectively. The first principal component of the latent mean, \nu_0, does not capture a clear distinction between the ferromagnetic and paramagnetic phases of the system.

\nu_1 is plotted in Fig. 2. It is apparent that \nu_1 resembles the magnetization, m, of the snapshots of the Ising spin configurations. We note that none of the latent variables is expected to take the same value as any physical quantity, such as the magnetization. Nonetheless, the plot of \nu_1 clearly discriminates the ferromagnetic from the paramagnetic phase, separating them along the phase transition line. The phase transition line in white corresponds to the analytical solution, Eq. (A14) in the appendix, calculated in the thermodynamic limit. Since the magnetization can be seen as the order parameter of the 2-dimensional Ising model, a reasonable representation of the order parameter can apparently be extracted from the VAE. We note that, since the simulations and the VAE are performed on finite-size systems, true symmetry breaking does not occur; this is the reason for the seemingly random values of \nu_1 in the ferromagnetic phase. Usually, the amplitude of the magnetization is considered in finite-size simulations; as we will show, other latent variables from the VAE have a structure similar to the amplitude of the magnetization. Since the magnetization is a linear feature of the Ising spin configuration, a simpler linear model would be sufficient for extracting it.

The first principal component of the latent variance, \tau_0, is plotted in Fig. 3. The white line is the analytical solution for the phase transition points. One can see that the value of \tau_0 remains very small in the upper right region of the figures. This can be considered a reflection of the small changes in the amplitude of the energy or of the magnetization in the paramagnetic phase. Once the system approaches the critical line from the upper right region of the figure, the value of \tau_0 increases sharply. This behavior is again consistent with the amplitude of the energy or magnetization.

In particular, we consider the isotropic limit (K_x = K_y), that is, the classical (\Gamma = 0) limit. The critical point is given by K_c (= K_x = K_y) = 2/\ln(1+\sqrt{2}) \approx 2.269 [58]. From Fig. 3, we see a sharp change in \tau_0 around the value of K_c. This result is consistent with prior published works for the isotropic Ising model on a square lattice [23–25, 50].

The paramagnetic samples are essentially noise due to entropic contributions, and therefore they are easy to discriminate from the rest of the samples using a \beta-TCVAE model. Indeed, the samples with \nu_1 values corresponding to nearly zero magnetization and rather high values of \tau_0 resemble Gaussian noise with no notable order preference. An interesting question is whether a quantity can be found that resembles a response function, such as the susceptibility or the heat capacity.

We plot the second principal component of the latent variance, \tau_1, in Fig. 4. The phase transition line in the thermodynamic limit is again included for comparison.


FIG. 2. The second principal component of the latent mean, \nu_1, with respect to the different K_x and K_y (1/(\beta J_x) and 1/(\beta J_y)) for the 2D square lattice. The left panel and right panel are for the system sizes N = M = 64 and N = M = 128, respectively.

FIG. 3. The first principal component of the latent variance, \tau_0, with respect to the different K_x and K_y (1/(\beta J_x) and 1/(\beta J_y)) for the 2D square lattice. The left panel and right panel are for the system sizes N = M = 64 and N = M = 128, respectively.

The values of \tau_1 are much larger along the analytical critical line than for other combinations of K_x and K_y. Thus \tau_1 bears a strong resemblance to the magnetic susceptibility.

From the plots of \tau_0, \tau_1, and \nu_1, we find that the VAE is capable of generating quantities which resemble the magnetization, the amplitude of the magnetization/energy, and the magnetic susceptibility/heat capacity, respectively. From these quantities we can draw the phase transition line between the paramagnetic and the ferromagnetic phase.

The quality of the fit of the Ising spin configurations to the present VAE model can be quantified by the values of the loss functions.


FIG. 4. The second principal component of the latent variance, \tau_1, with respect to the different K_x and K_y (1/(\beta J_x) and 1/(\beta J_y)) for the 2D square lattice. The left panel and right panel are for the system sizes N = M = 64 and N = M = 128, respectively.

FIG. 5. The \beta-TCVAE loss and the reconstruction loss settle to about 0.5, and the latent loss settles close to 0, over 100 epochs. The left panel and right panel are for the square lattice of sizes N = M = 64 and N = M = 128, respectively.

The three losses shown in Fig. 5 are the total \beta-TCVAE loss, the reconstruction loss, and the latent loss. The reconstruction loss converges to about 0.5. The latent loss is obtained from the \beta-TCKLD term and converges quickly to a value near 0. The total \beta-TCVAE loss for the 2-dimensional anisotropic Ising model settles quickly around 0.5 for both lattice sizes N = 64 and N = 128.


V. DISCUSSION AND CONCLUSIONS

We use the \beta-TCVAE to extract structural information from raw Ising spin configurations. It exposes interesting derived descriptors of the configurations that are used to identify the second-order phase transition line among other regions. The analysis here provides an interpretation of the extracted sample space as represented by the latent variables. This is done by studying the behavior of the latent variable mappings of the Ising spin configurations with respect to the anisotropic couplings (J_x, J_y) and the associated temperatures.

We find that \nu_1, the second principal component of the latent mean, reflects the magnetization of the 2-dimensional anisotropic Ising model, and hence can be interpreted as an indicator of the magnetization exhibited by the configurations. In contrast, \tau_0 and \tau_1, the first two principal components of the latent variance, can be seen as indicators of the amplitude of the magnetization or energy and of the magnetic susceptibility or heat capacity. Thus both \tau_0 and \tau_1 provide a reasonable estimate of the second-order phase transition line.

Since the (d+1)-dimensional anisotropic Ising model is equivalent to the d-dimensional quantum spin system through the Suzuki-Trotter transformation, this method can be trivially extended to other 1D quantum systems [59].

Moreover, methods for strongly correlated systems, such as the dynamical mean field theory (DMFT) and its cluster generalizations, the dynamical cluster approximation (DCA) and the cellular dynamical mean field theory (CDMFT), have a very similar data structure in their Hirsch-Fye impurity solver [60–62]. It is notoriously difficult to obtain the putative quantum critical point from the paramagnetic solution of the Hubbard model, as there is no simple quantity to track the transition [63, 64]. Therefore, the method developed here could become an important tool for analyzing data from the DMFT, DCA, and CDMFT.

There are many opportunities for further developing this method, not only by investigating more complex systems, but also by introducing improvements which are beyond the scope of this work. Finite-size scaling turns out to be an important approach to address the limitations of solving finite-sized systems when investigating regions near critical phenomena [65, 66]. Establishing a correspondence between the VAE encodings of different system sizes is challenging, as different VAE structures need specific training for each system size, which in turn demands different hyperparameters and numbers of training iterations to obtain an equivalent analysis and hence comparable answers [65]. Numerical difficulties tend to arise when performing a finite-size scaling analysis, because the variation of the predicted properties with respect to the system size is hard to isolate from the systematic variation due to the different neural networks, trained with different hyperparameters, that are used to extract the specific macroscopic properties. Nevertheless, this direction would play a significant role in improving the VAE characterization of critical phenomena.

Another interesting direction is to use the generative adversarial network (GAN) [67] instead of the VAE as the generative model; promising results have been obtained for the isotropic 2D Ising model [68].

VI. ACKNOWLEDGMENT

This manuscript is based upon work supported by NSF DMR-1728457. This work used the high performance computational resources provided by the Louisiana Optical Network Initiative (http://www.loni.org) and HPC@LSU computing. NW's research work at Louisiana State University was supported by NSF DMR-1728457. JM is partially supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Award Number DE-SC0017861.

Appendix A: Self-Duality of the Two-Dimensional Ising Model

In this appendix, we summarize the derivation of the critical points of the anisotropic 2-dimensional Ising model via the self-duality property of the model [29, 69, 70]. In the following we closely follow the lecture notes by Muramatsu [70].

Considering the partition function of the Ising model (where K = 1/(\beta J)),

Z = \sum_{\{S_j\}} e^{K\sum_{\langle j,l \rangle} S_j S_l} = \sum_{\{S_j\}} \prod_{\langle j,l \rangle} e^{K S_j S_l} = \sum_{\{S_j\}} \prod_{\langle j,l \rangle} \sum_{r=0}^{1} C_r(K)\,(S_j S_l)^r, \qquad (A1)

where C_0(K) = \cosh K and C_1(K) = \sinh K. In this simple transformation, a new Z_2 variable r is introduced for each bond \langle j,l \rangle. We label the new variable as r_\mu with \mu \equiv (i, \langle i,j \rangle), labelling it with the site i from which the bond \langle i,j \rangle emanates. The partition function thus follows

Z = \sum_{\{S_j\}} \sum_{\{r_\mu\}} \prod_{\langle j,l \rangle} C_{r_\mu}(K) \prod_i S_i^{\sum_{\langle i,j \rangle} r_\mu}. \qquad (A2)

Grouping all products of spins on site i together, the exponent \sum_{\langle i,j \rangle} r_\mu contains all four contributions resulting from the bonds connected to site i. Further, we perform explicitly the sum over all spin configurations,

Z = \sum_{\{r_\mu\}} \prod_{\langle j,l \rangle} C_{r_\mu}(K) \prod_i \sum_{S_i=\pm 1} S_i^{\sum_{\langle i,j \rangle} r_\mu} = \sum_{\{r_\mu\}} \prod_{\langle j,l \rangle} C_{r_\mu}(K) \prod_i 2\,\delta\big[\mathrm{mod}_2\big(\textstyle\sum_{\langle i,j \rangle} r_\mu\big)\big]. \qquad (A3)


We define a dual lattice, where the vertices of the dual lattice are set at the centers of the plaquettes of the original lattice. Many configurations give vanishing contributions due to the presence of the Kronecker delta. We define new Z_2 variables \sigma_i = \pm 1 on the sites of the dual lattice, and associate with each link of the original lattice a pair of \sigma_i's (e.g. on the sites i and j of the dual lattice). Therefore, the variable r_\mu is expressed as

r_\mu = \frac{1}{2}(1 - \sigma_i \sigma_j), \qquad (A4)

where the sites i and j on the dual lattice are those between which the link carrying r_\mu crosses. Since the sum of r_\mu runs over the four nearest-neighbor bonds of a site i, we have

\sum_{\langle i,j \rangle} r_\mu = 2 - \frac{1}{2}(\sigma_1\sigma_2 + \sigma_2\sigma_3 + \sigma_3\sigma_4 + \sigma_4\sigma_1). \qquad (A5)

There are 2^4 possible configurations of the four variables \sigma_1, \ldots, \sigma_4. They can be grouped into four cases, and all cases lead to an even number for the sum of r_\mu over the nearest-neighbor bonds, so this choice of variables satisfies the \delta-function. The partition function becomes

Z = \frac{1}{2}\,2^{N} \sum_{\{\sigma_i\}} \prod_{\langle j,l \rangle} C_{[(1-\sigma_i\sigma_j)/2]}(K), \qquad (A6)

where N is the number of sites on the lattice and the product is now over the bonds of the dual lattice. This expression of the partition function shows that the weight for each configuration of the \sigma's is given by the coefficients C(K). Hence, we bring C(K) into a form that resembles a Boltzmann weight:

C_r(K) = \cosh(K)\big[1 + r(\tanh(K) - 1)\big] \qquad (A7)
       = \cosh(K)\exp\big(\ln[1 + r(\tanh(K) - 1)]\big)
       = \cosh(K)\exp\big(r\ln\tanh(K)\big)
       = \cosh(K)\exp\big[\tfrac{1}{2}(1 - \sigma_i\sigma_j)\ln\tanh(K)\big]
       = \big[\cosh(K)\sinh(K)\big]^{1/2}\exp\big(-\tfrac{1}{2}\ln\tanh(K)\,\sigma_i\sigma_j\big).

The partition function becomes

Z = \frac{1}{2}\big(\sinh 2\tilde{K}\big)^{-N} \sum_{\{\sigma_i\}} \exp\!\Big(\tilde{K}\sum_{\langle j,l \rangle}\sigma_j\sigma_l\Big). \qquad (A8)

There are 2N bonds, and we define the new coupling constant

\tilde{K} \equiv -\frac{1}{2}\ln\tanh(K), \qquad (A9)

which is the coupling of the Ising model on the dual lattice. The Ising model is self-dual since the duality transformation brings it into itself. We consider the free energy per site

f(K) = -\frac{1}{N}\ln Z. \qquad (A10)

According to the relation between the partition functions of the original and the dual model, we can write

f(K) = \ln\sinh(2\tilde{K}) + f(\tilde{K}). \qquad (A11)

This is a strong constraint on the free energy. Since \sinh(2\tilde{K}) is an analytic function, a singularity in f(K) corresponds to a singularity in f(\tilde{K}). \tilde{K}(K) is a monotonic function of K, hence it holds that \tilde{K}_c = K_c. We have

K_c = \frac{1}{2}\ln(1 + \sqrt{2}). \qquad (A12)

The self-duality has allowed us to calculate the exact value of the critical temperature of the 2-dimensional isotropic Ising model (K_x = K_y), where K_x = 1/(\beta J_x) and K_y = 1/(\beta J_y). We now generalize the result from the isotropic case to the anisotropic one, i.e., to couplings K_x \neq K_y in the respective directions. The anisotropic case is as follows:

\tilde{K}_y \equiv -\frac{1}{2}\ln\tanh(K_x), \qquad \tilde{K}_x \equiv -\frac{1}{2}\ln\tanh(K_y). \qquad (A13)

Given K_x and K_y there is only one critical point; combining the two equations above gives the following condition for the critical line separating the ordered from the disordered phase of the anisotropic Ising model:

\sinh(2K_{xc}) \sinh(2K_{yc}) = 1. \qquad (A14)
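As a quick numerical check (our own, not from the paper; here K denotes the coupling as used in this appendix), Eq. (A14) can be solved for K_{yc} given K_{xc}, and it reduces to Eq. (A12) in the isotropic limit:

import numpy as np

def kyc_from_kxc(kxc):
    """Solve sinh(2 K_xc) sinh(2 K_yc) = 1, Eq. (A14), for K_yc."""
    return 0.5 * np.arcsinh(1.0 / np.sinh(2.0 * kxc))

# Isotropic check: K_xc = K_yc reproduces K_c = (1/2) ln(1 + sqrt(2)) ~ 0.4407.
kc = 0.5 * np.log(1.0 + np.sqrt(2.0))
print(np.isclose(kyc_from_kxc(kc), kc))   # True

# A few points along the anisotropic critical line.
for kx in (0.2, 0.44, 0.8, 1.5):
    print(kx, kyc_from_kxc(kx))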

Appendix B: β-TCVAE Loss

The total loss of the VAE is given by

L(\phi, \theta; x, z) = L_{RC} + L_{KLD}, \qquad (B1)

where the reconstruction error (RC) and the Kullback-Leibler divergence (KLD) are defined as

L_{RC} = -\mathbb{E}_{z\sim q_\phi(z|x)}\big[\log p_\theta(x|z)\big] \qquad (B2)

and

L_{KLD} = D_{KL}\big[q_\phi(z|x)\,\|\,p(z)\big], \qquad (B3)

respectively. If the prior distribution of the latent representation is Gaussian, the VAE provides disentangled factors in the latent representation, which means the significant dimensions of the latent space are largely independent of each other. In the \beta-total correlation VAE (\beta-TCVAE), we try to improve the disentanglement of factors in the representation by decomposing the KLD term and applying tuning parameters to its parts independently [49, 50].


Each training sample is identified with a unique integer index n \in \{1, 2, \ldots, N\} and assigned a uniform random variable in this decomposition. The aggregated posterior, q_\phi(z) = \sum_n q_\phi(z|n)p(n), captures the aggregate structure of the latent variables under the distribution of the input, where q_\phi(z|n) = q_\phi(z|x_n) and q_\phi(z, n) = q_\phi(z|n)p(n) = \frac{1}{N} q_\phi(z|n). The decomposition is given as

I(z; x) + D_{KL}\Big[q_\phi(z)\,\Big\|\,\prod_j q_\phi(z_j)\Big] + \sum_j D_{KL}\big[q_\phi(z_j)\,\|\,p(z_j)\big]. \qquad (B4)

The first term is the index-code mutual information (MI), I(z;x) = D_{KL}[q_\phi(z,n)\,\|\,q_\phi(z)p(n)], between the input and the latent variable, which is based on the empirical input distribution q_\phi(z, n). The second term is a measure of the dependence between the latent variables and is called the total correlation (TC). It is important to produce representations which penalize the total correlation and hence force the model towards discovering statistically disentangled factors in the input distribution. The third term prevents the individual latent variables in the representation from deviating far from their priors; it is called the dimension-wise KLD [50]. After adding the tuning parameters to the decomposition, the \beta-TC modified KLD term becomes

L_{\beta\mathrm{-TC}} = \alpha I(z; x) + \beta D_{KL}\Big[q_\phi(z)\,\Big\|\,\prod_j q_\phi(z_j)\Big] + \gamma \sum_j D_{KL}\big[q_\phi(z_j)\,\|\,p(z_j)\big]. \qquad (B5)

Empirical evidence shows that modulating only the \beta parameter has the greatest effect on the disentanglement of the latent representation [49].

[1] M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, B. C. V. Esesn, A. A. S. Awwal, and V. K. Asari, arXiv:1803.01164 (2018).
[2] J. Carrasquilla and R. G. Melko, Nat. Phys. 13, 431–434 (2017).
[3] G. Pilania, J. E. Gubernatis, and T. Lookman, Phys. Rev. B 91, 214302 (2015).
[4] N. Walker, K.-M. Tam, B. Novak, and M. Jarrell, Phys. Rev. E 98, 053305 (2018).
[5] L. Wang, Phys. Rev. B 94, 195105 (2016).
[6] W. Hu, R. R. P. Singh, and R. T. Scalettar, Phys. Rev. E 95, 062122 (2017).
[7] S. J. Wetzel and M. Scherzer, Phys. Rev. B 96, 184410 (2017).
[8] L. Huang and L. Wang, Phys. Rev. B 95, 035105 (2017).
[9] G. Torlai and R. G. Melko, Phys. Rev. B 94, 165134 (2016).
[10] P. Mehta and D. J. Schwab, arXiv:1410.3831 (2014).
[11] L. D. Landau, Phys. Z. Sowjet. 11, 26 (1937).
[12] M. E. Fisher, Rev. Mod. Phys. 46, 597 (1974).
[13] L. Savary and L. Balents, Rep. Prog. Phys. 80, 016502 (2016).
[14] J. Schmidhuber, Neural Networks 61, 85 (2015).
[15] T. O. Ayodele, New Advances in Machine Learning 3, 19 (2010).
[16] K. Pearson, London Edinburgh Philos. Mag. J. Sci. 2, 559 (1901).
[17] Y. Wang, H. Yao, and S. Zhao, Neurocomputing 184, 232 (2016).
[18] J. Wang, H. He, and D. V. Prokhorov, Procedia Comput. Sci. 13, 120 (2012).
[19] J. Almotiri, K. Elleithy, and A. Elleithy, in 2017 IEEE Long Island Systems, Applications and Technology Conference (LISAT) (IEEE, 2017) pp. 1–5.
[20] E. Plaut, arXiv:1804.10253 (2018).
[21] D. P. Kingma and M. Welling, arXiv:1312.6114 (2014).
[22] D. P. Kingma and M. Welling, arXiv:1906.02691 (2019).
[23] S. J. Wetzel, Phys. Rev. E 96, 022140 (2017).
[24] C. Alexandrou, A. Athenodorou, C. Chrysostomou, and S. Paul, Eur. Phys. J. B 93, 10.1140/epjb/e2020-100506-5 (2020).
[25] N. Walker, K.-M. Tam, and M. Jarrell, Sci. Rep. 10 (2020).
[26] D. Kim and D.-H. Kim, Phys. Rev. E 98, 022138 (2018).
[27] A. Morningstar and R. G. Melko, J. Mach. Learn. Res. 18, 17 (2018).
[28] K. Shiina, H. Mori, Y. Okabe, and H. K. Lee, Sci. Rep. 10 (2020).
[29] H. A. Kramers and G. H. Wannier, Phys. Rev. 60, 263 (1941).
[30] G. H. Wannier, Rev. Mod. Phys. 17, 50 (1945).
[31] R. J. Baxter, Exactly Solved Models in Statistical Mechanics (Elsevier, 2016).
[32] P. Pfeuty, Ann. Phys. (N. Y.) 57, 79 (1970).
[33] R. B. Stinchcombe, J. Phys. C: Solid State Phys. 6, 2459 (1973).
[34] P. M. Chaikin and T. C. Lubensky, Principles of Condensed Matter Physics (Cambridge University Press, 1995).
[35] M. Suzuki, Quantum Monte Carlo Methods in Equilibrium and Nonequilibrium Systems, Vol. 74 (Springer Science & Business Media, 2012).
[36] M. Suzuki, Prog. Theor. Phys. 46, 1337 (1971).
[37] B. K. Chakrabarti and A. Das, in Quantum Annealing and Other Optimization Methods (Springer, 2005) pp. 1–36.
[38] G. Van Rossum and F. L. Drake Jr, Python Reference Manual (Centrum voor Wiskunde en Informatica, Amsterdam, 1995).
[39] S. van der Walt, S. C. Colbert, and G. Varoquaux, Comput. Sci. Eng. 13, 22 (2011).
[40] D. D. Team, Dask: Library for Dynamic Task Scheduling (2016).
[41] S. K. Lam, A. Pitrou, and S. Seibert, in Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM '15 (ACM, New York, NY, USA, 2015) pp. 7:1–7:6.
[42] K. A. Nicoli, S. Nakajima, N. Strodthoff, W. Samek, K.-R. Müller, and P. Kessel, Phys. Rev. E 101 (2020).
[43] J. Vielhaben and N. Strodthoff, Phys. Rev. E 103 (2021).
[44] C. Doersch, arXiv:1606.05908 (2016).
[45] A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther, in International Conference on Machine Learning (PMLR, 2016) pp. 1558–1566.
[46] S. Kullback and R. A. Leibler, Ann. Math. Stat. 22, 79 (1951).
[47] L. Weng, lilianweng.github.io/lil-log (2018).
[48] C. P. Burgess, I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner, arXiv:1804.03599 (2018).
[49] R. T. Chen, X. Li, R. Grosse, and D. Duvenaud, arXiv:1802.04942 (2018).
[50] N. Walker, Identifying Structure Transitions Using Machine Learning Methods, Louisiana State University thesis (2020).
[51] W. Zhang, K. Itoh, J. Tanida, and Y. Ichioka, Appl. Opt. 29, 4790 (1990).
[52] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter, arXiv:1706.02515 (2017).
[53] S. Ruder, arXiv:1609.04747 (2016).
[54] L. Banarescu, C. Bonial, S. Cai, M. Georgescu, and K. Griffitt, Networks 25, 211 (2016).
[55] F. Chollet et al., Keras (2015).
[56] L. van der Maaten and G. Hinton, J. Mach. Learn. Res. 9, 2579 (2008).
[57] J. D. Hunter, Comput. Sci. Eng. 9, 90 (2007).
[58] L. Onsager, Phys. Rev. 65, 117 (1944).
[59] H. Betsuyaku, Prog. Theor. Phys. 73, 319 (1985).
[60] T. Maier, M. Jarrell, T. Pruschke, and M. H. Hettler, Rev. Mod. Phys. 77, 1027 (2005).
[61] A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg, Rev. Mod. Phys. 68, 13 (1996).
[62] J. E. Hirsch and R. M. Fye, Phys. Rev. Lett. 56, 2521 (1986).
[63] N. Vidhyadhiraja, A. Macridin, C. Sen, M. Jarrell, and M. Ma, Phys. Rev. Lett. 102, 206407 (2009).
[64] S. Kellar and K.-M. Tam, arXiv:2008.12324 [cond-mat.str-el] (2020).
[65] J. Cardy, Finite-Size Scaling, Current Physics (North-Holland, 1988).
[66] M. E. Fisher and M. N. Barber, Phys. Rev. Lett. 28, 1516 (1972).
[67] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative Adversarial Networks, arXiv:1406.2661 [stat.ML] (2014).
[68] N. Walker and K.-M. Tam, Mach. Learn.: Sci. Technol. 2, 025001 (2021).
[69] J. B. Kogut, Rev. Mod. Phys. 51, 659 (1979).
[70] A. Muramatsu, The Ising Model, Duality, and Transfer Matrix - Lattice Gauge Theory, lecture notes (2009).