
Sum-Product Logic: Integrating Probabilistic Circuits into DeepProbLog

Arseny Skryagin¹, Karl Stelzner¹, Alejandro Molina¹, Fabrizio Ventola¹, Zhongjie Yu¹, Kristian Kersting¹

Abstract

We introduce Sum-Product Logic (SPLog), a deep probabilistic logic programming language that incorporates learning through predicates encoded as probabilistic circuits, specifically sum-product networks. We show how existing inference and learning techniques can be adapted for the new language. Our empirical illustrations demonstrate the benefits of supporting symbolic and deep representations, both neural and probabilistic circuit ones, for inference and (deep) learning from examples. To the best of our knowledge, this work is the first to propose a framework in which deep neural networks, probabilistic circuits, and expressive probabilistic-logical modeling and reasoning are integrated.

1. Introduction

In the past three years, many deep probabilistic programming languages (DPPLs) have been proposed, e.g., Edward (Tran et al., 2017), Pyro (Bingham et al., 2018), and Turing (Ge et al., 2018), among others. The main focus of these works was to leverage the expressive power of deep neural networks within probabilistic programming systems. In particular, DeepProbLog (Manhaeve et al., 2018) targets similar goals in the relational setting by allowing probabilistic predicates to be specified as (conditional) distributions defined via deep neural networks. It further supports end-to-end training, program induction, and (sub-)symbolic representation as well as reasoning.

However, in this setting, deep models are used purely as (unfaithful) conditional density estimators. This limits the types of inferences that are possible in DeepProbLog: lacking any model for the inputs of the neural network, missing values cannot be inferred or sampled. A possible remedy for these issues may be to use deep generative models such

¹CS Department, TU Darmstadt, Darmstadt, Germany. Correspondence to: Arseny Skryagin <[email protected]>.

Working Notes of the ICML 2020 Workshop on “Bridge Between Perception and Reasoning: Graph Neural Networks and Beyond”. Copyright 2020 by the author(s).

as variational autoencoders (Kingma & Welling, 2014) or normalizing flows (Rezende & Mohamed, 2015). While these models specify joint probability distributions, inference within them is highly intractable or limited to a restricted set of queries. If the goal of the overall system is to answer a variety of probabilistic queries, the cost of computing them within these models can become prohibitive.

Sum-Product Networks (SPNs) (Poon & Domingos, 2011) are probabilistic graphical models that allow us to perform inference, marginalization, and sampling in time linear in the size of the model. This combination of expressiveness and speed makes SPNs an ideal candidate for our purposes.

Consequently, we propose a novel deep probabilistic programming language, called Sum-Product Logic (SPLog). Its main components are sum-product predicates, which encode predicates as SPNs so that all conditional and marginal queries on these predicates can be answered exactly in linear time.

2. Sum-Product Logic (Programming)

ProbLog. The ProbLog language (Kimmig et al., 2007) allows its users to write probabilistic logic programs, in which some logical facts are annotated with probabilities. It uses sentential decision diagrams (SDDs) (Darwiche, 2011) to represent propositional knowledge bases. In particular, one can use annotated disjunctions (ADs) to assign logical rules a probability: p :: a(~x) :- b_1, ..., b_m. This example statement indicates that if the atoms b_1, ..., b_m are true, then the predicate a(~x) is implied with probability p. This mechanism opened the door to further possibilities, beyond just specifying constant probabilities. One approach is to delegate the choice of probability to an external model which assigns it based on the predicate's arguments ~x. This is the approach taken in Hybrid ProbLog (Gutmann et al., 2010): here, primitive continuous distributions such as Gaussians are used to specify distributions p(~X), enabling the definition of flexible predicates such as p(~X = ~x) :: a(~x) :- b_1, ..., b_m. To employ more flexible distributions, DeepProbLog (Manhaeve et al., 2018) uses deep neural networks to provide conditional probabilities for predicates. In this case, predicates of the form p(~Y = ~y | ~X = ~x) :: a(~x, ~y) :- b_1, ..., b_m are specified, where p(~Y = ~y | ~X = ~x) is the probability the neural network assigns to


the output ~y given the input ~x. By implementing automatic differentiation in ProbLog, the network can be trained jointly with the ProbLog model.

However, deep neural networks are not generally guaranteed to encode calibrated distributions with complex dependency structures, nor do they encode joint distributions over inputs and outputs. To alleviate this problem, we now show how, analogously, ProbLog programs may refer to SPNs as external probabilistic models. This gives rise to the SPLog¹ framework.

SPNs. Sum-Product Networks (SPNs) (Poon & Domingos, 2011) are deep mixture models represented via a rooted directed acyclic graph with a recursively defined structure. There are three types of nodes (the root being one of them): sum nodes, which represent a mixture over distributions and whose weighted edges to their children act as the mixing weights; product nodes, whose edges to their children represent decompositions via context-specific independencies; and, finally, leaf nodes, which represent parametric distributions (e.g., Gaussian, Poisson, etc.). To compute probabilities with an SPN, we simply compute the values of the nodes starting from the leaves. Since each leaf is a univariate distribution, one sets the evidence on those distributions, obtains the probabilities, and then evaluates the network bottom-up. At product nodes, we multiply the values of the children; at sum nodes, we sum the weighted values of the children. The value at the root indicates the probability of the given configuration. To compute marginals, i.e., the probability of partial configurations, it is necessary to sum out one or more variables. This can be achieved by setting the probability at the leaves for those variables to 1 and then proceeding as before. Conditional probabilities can then be computed as the ratio of partial configurations. To compute MPE states, we replace sum nodes by max nodes and evaluate the graph first with a bottom-up pass, but instead of weighted sums we pass along the weighted maximum value. Finally, in a top-down pass, we select the paths that lead to the maximum value, finding the MPE states of the fired leaves. In contrast to other probabilistic graphical models, SPNs support exact tractable inference, i.e., inference complexity is linear in the size of the graph.
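
To make this evaluation scheme concrete, the following toy Python sketch (our illustration, not part of SPLog; all leaf parameters and mixture weights are made up) evaluates a small SPN over two binary variables bottom-up. Marginalising a variable amounts to setting its leaf values to 1, and a conditional probability is obtained as the ratio of two such evaluations.

def bernoulli_leaf(p_one):
    # Leaf over one binary variable: returns its probability for the given
    # value, or 1.0 if the variable is marginalised (value is None).
    return lambda x: 1.0 if x is None else (p_one if x == 1 else 1.0 - p_one)

# Leaves over X1 and X2 for the two mixture components.
l1a, l1b = bernoulli_leaf(0.8), bernoulli_leaf(0.3)
l2a, l2b = bernoulli_leaf(0.6), bernoulli_leaf(0.1)

def spn(x1, x2):
    # Two product nodes over disjoint scopes, combined by a weighted sum node (the root).
    prod1 = l1a(x1) * l2a(x2)
    prod2 = l1b(x1) * l2b(x2)
    return 0.7 * prod1 + 0.3 * prod2

p_joint    = spn(1, 0)              # P(X1=1, X2=0)
p_marginal = spn(1, None)           # P(X1=1): X2 summed out by setting its leaves to 1
p_cond     = p_joint / p_marginal   # P(X2=0 | X1=1) as a ratio of the two evaluations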

SPLog. An SPLog program is a ProbLog program that is extended with a set of ground sum-product annotated disjunctions (spADs) of the form

spn(m_a, ~Q, ~E = ~e) :: a(~e, ~q_1); ...; a(~e, ~q_n) :- b_1, ..., b_m,

where the b_i are atoms, ~E = ~e is a vector of random variables representing the evidence given as input to the SPN for predicate a, ~Q is the set of random variables being queried, and ~q_1, ..., ~q_n are vectors of its realisations. m_a is the identifier of an SPN model which specifies a probability distribution over the set of variables ~X, where ~Q ⊆ ~X, ~E ⊆ ~X, and ~E ∩ ~Q = ∅. We use the notation spn(m_a, ~Q, ~E = ~e) to indicate that this set of logical rules is true with the probabilities p_{m_a}(~Q = ~q_i | ~E = ~e), which the SPN assigns to the query realisations given the evidence. It is clear that SPLog directly inherits its semantics, and to a large extent also its inference mechanism, from (Deep)ProbLog.

¹Code is available at https://github.com/askrix/Sum-Product-Logic

Algorithm 1 SPLog - training procedure
  ground = ProbLog.ground(q)
  sdd = SDD.create_from(ground)
  ~x = ProbLog.extract_parameters(model)
  p, ∂p/∂x = ProbLog.solve(sdd, ~x)
  optimizer.backward(-1/(p + ε) · ∂p/∂x)

The approach to jointly train the parameters of probabilistic facts and sum-product networks in an SPLog program is the following. Similar to (Manhaeve et al., 2018), we use the learning from entailment (i.e., learning from queries) setting. That is, for a given SPLog program with parameters X and a set Q of pairs (q, p), where q is a query and p its desired success probability, we compute, for a loss function L,

argmin_{~x} 1/|Q| ∑_{(q,p)∈Q} L(P_{X=~x}(q), p).

Alg. 1 describes the training loop given an SPLog program, its SPN models, and a query. Specifically, to compute the gradient of the SPN parameters with respect to the loss function (step 4), we apply automatic differentiation both to the ProbLog program and to the SPNs. For the former, the algebraic extension of ProbLog (aProbLog) (Kimmig et al., 2011) is used. It associates with each probabilistic fact a value from an arbitrary commutative semiring. aProbLog uses a labeling function that explicitly associates values from the chosen semiring with both facts and their negations, combining these using semiring addition ⊕ and multiplication ⊗ on the SDD. For SPLog, we use the gradient semiring, whose elements are tuples of the form (p, ∂p/∂x), where p is a probability and ∂p/∂x is the partial derivative of the probability p with respect to a parameter x. We write t(p_i) :: f_i for the learnable probability of a probabilistic fact. All this forms the basis for forward-mode automatic differentiation within ProbLog, and it results in the partial derivatives of the loss with respect to the SPN probabilities, ∂L/∂p_{m_a}(~Q = ~q_i | ~E = ~e). For all technical details we refer to (Manhaeve et al., 2018). To compute the gradients within the SPN, we can use backward-mode automatic differentiation as implemented in standard deep learning frameworks. This provides the partial derivatives of the SPN outputs with respect to its parameters θ, i.e., ∂p_{m_a}(~Q = ~q_i | ~E = ~e)/∂θ. By multiplying the two sets of gradients, we obtain the partial derivatives of the loss with respect to the SPN parameters, ∂L/∂θ, which we use for gradient descent.
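
As a rough PyTorch-style sketch (our illustration, not the released implementation; spn_module, evidence, dL_dp, and optimizer are hypothetical), the chaining of the two gradient halves can be expressed as follows: the ProbLog side delivers ∂L/∂p via the gradient semiring, and backpropagating that vector through a differentiable SPN forward pass yields ∂L/∂θ.

import torch

def splog_training_step(spn_module, evidence, dL_dp, optimizer):
    # Forward pass: the SPN computes the conditional probabilities
    # p_{m_a}(~Q = ~q_i | ~E = ~e) for all query realisations.
    p = spn_module(evidence)   # tensor of shape (n,), differentiable w.r.t. theta
    optimizer.zero_grad()
    # Chain rule: dL/dtheta = sum_i (dL/dp_i) * (dp_i/dtheta).
    p.backward(gradient=dL_dp)
    optimizer.step()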

3. Empirical Illustrations

Our intention here is to investigate the benefits of SPLog. To this aim, we implemented everything in PyTorch, based on the SPFlow library (Molina et al., 2019) and the DeepProbLog code (Manhaeve et al., 2018), and ran two experiments on the MNIST data set.

Tractable inference of probabilistic circuits within ProbLog. To illustrate the benefit of combining ProbLog and SPNs, we revisited the MNIST Addition experiment from (Manhaeve et al., 2018) using SPNs instead of deep neural networks. The idea behind the experiment is to perform the addition of two digits represented by two randomly chosen images from the MNIST data set. One query of the program contains the indices of the two MNIST images as the inputs and the resulting sum as the label. The corresponding SPLog program is:

spn(mnist, [X], Y, [0,1,2,...,9]) :: dig(X,Y).
add(X,Y,Z) :- dig(X,X2), dig(Y,Y2), Z is X2+Y2.

That is, we specify the spAD as spn(m_dig, ~X, ~Y) :: dig(~x, 0); ...; dig(~x, 9), where ~X = x_1, ..., x_k is a vector of ground terms representing the inputs of the SPN for predicate dig, while 0, ..., 9 are the possible classes of the MNIST data set. For instance, we specify the spAD for an image of the digit 5 as follows:

spn(m_dig, 5, [0, ..., 9]) :: dig(5, 0); ...; dig(5, 9).

Because SPNs can handle incomplete data, we utilized this advantage by making the task more difficult. Namely, the task was performed with only partially available data, i.e., the right half of the pixels was removed from each image. The results show that the model converges well, as can be seen in Figure 1.
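
The mechanism behind this is plain SPN marginalisation. As a minimal SPFlow sketch (with a made-up structure and parameters, not the experiment's actual model), variables whose entries are set to NaN in the data matrix are marginalised during likelihood computation:

import numpy as np
from spn.structure.Base import Sum, Product, assign_ids, rebuild_scopes_bottom_up
from spn.structure.leaves.parametric.Parametric import Gaussian
from spn.algorithms.Inference import log_likelihood

# A tiny SPN over two variables, standing in for an "observed" and a "missing" pixel.
p1 = Product(children=[Gaussian(mean=0.2, stdev=0.1, scope=0),
                       Gaussian(mean=0.8, stdev=0.1, scope=1)])
p2 = Product(children=[Gaussian(mean=0.7, stdev=0.2, scope=0),
                       Gaussian(mean=0.1, stdev=0.2, scope=1)])
root = Sum(weights=[0.6, 0.4], children=[p1, p2])
assign_ids(root)
rebuild_scopes_bottom_up(root)

full    = np.array([[0.25, 0.75]])    # both variables observed
partial = np.array([[0.25, np.nan]])  # second variable missing and thus marginalised
print(log_likelihood(root, full), log_likelihood(root, partial))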

End-to-end learning across deep neural networks, probabilistic circuits and ProbLog. Even though SPNs resolve the issues of DeepProbLog we are addressing, the idea of SPLog is not to replace DNNs. On the contrary, DNNs play an important role in our framework. Using a variational auto-encoder (VAE) based on a feed-forward DNN, we ran the MNIST addition experiment in the following fashion: the original images were replaced by the codes produced by the encoder network of the VAE, which was trained jointly within our framework. The results are summarized in Fig. 1.

Probabilistic Auto-Encoder. Furthermore, instead of modeling only the code, P(C_i | code(X)), for ProbLog, the distribution of the original image and its code can be jointly modeled with SPNs, i.e., P(C_i | X, code(X)). This results in a novel auto-encoder equipped with tractable probabilistic models, the Probabilistic Auto-Encoder (PAE). Next to the AE, the PAE comprises two SPNs, one modeling the joint distribution of the original image and its code, and the other modeling the code and its reconstructions. By adding to the loss the lower bound of the KL divergence (KLD) between these two SPNs, computed via Gauss-Hermite quadrature under the mean-field assumption (Hershey & Olsen, 2007), we keep the distributions of the two SPNs close. Compared to a VAE, the PAE provides several relaxations. Firstly, the isotropic multivariate Gaussian assumption made on the code distribution can be extended to a mixture of Gaussians. Secondly, conditional samples of the code given an original image can be acquired by one bottom-up pass and another top-down pass on the SPN, without the re-parameterization trick. Last but not least, both encoder and decoder have a probabilistic density measure, indicating how likely the code or a reconstruction is to fall within the training data domain. The main advantage of the PAE lies in the utilization of the SPNs: even if the given data set contains missing and/or noisy data, one can train the PAE by sampling the code from the first SPN. The initial results are summarized in Fig. 2. As in our first experiment, we removed the right half of the image pixels. Only the MPE-completed images (second column) in rows 4, 7, and 8 admit the shape of a 0, whereas the reconstructions from the MPE-sampled code (third column) yield better results.
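
For orientation, a loose sketch of how such a training objective could be assembled is given below. It reflects our reading of the text rather than the released code; encoder, decoder, spn_img_code, spn_code_rec, and kld_lower_bound are all hypothetical components, the last one standing for the Gauss-Hermite/mean-field approximation of the KLD between the two SPNs.

import torch
import torch.nn.functional as F

def pae_loss(x, encoder, decoder, spn_img_code, spn_code_rec, kld_lower_bound):
    code  = encoder(x)               # deterministic code, no re-parameterization trick
    x_rec = decoder(code)
    recon = F.mse_loss(x_rec, x)     # auto-encoder reconstruction term
    # Negative log-likelihoods of the two SPNs (assumed to return log-densities).
    nll_img = -spn_img_code(torch.cat([x, code], dim=1)).mean()
    nll_rec = -spn_code_rec(torch.cat([code, x_rec], dim=1)).mean()
    # Lower bound of the KLD between the two SPN distributions keeps them close.
    kld = kld_lower_bound(spn_img_code, spn_code_rec)
    return recon + nll_img + nll_rec + kld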

4. Conclusion and Future Work

Triggered by the re-emerging research area of Hybrid AI, the computational and mathematical modeling of complex AI systems, in this work we sketched SPLog, a novel deep probabilistic programming language (DPPL) that combines deep probabilistic models (SPNs) with neural probabilistic logic programming (DeepProbLog). Specifically, SPLog incorporates the advantages of SPNs into ProbLog and thereby facilitates the PAE. It has shown good performance in solving challenging tasks on the MNIST data set. This paves the way towards a unifying deep programming language for System 2 approaches (Bengio, 2019).

The performed empirical illustrations open the door to further advancements, e.g., "reverse" Neural-Symbolic Visual Question Answering (NS-VQA) (Yi et al., 2018). Since one can handle non-probabilistic, i.e., true/false, facts within ProbLog, one can consider the following example. After the MNIST addition model is trained as stated in Section 3, we can ask for the visual solution of a given query: we provide an SPLog program that takes as input query the result of the addition of two single digits.


Figure 1. Loss and accuracy of the addition experiment for (left) codes of full images and (right) partial images. The highest accuracy values for the joint model (left) are 74.31% for train (sky blue) and 75.03% for test (olive). For the marginal model (right), we get 64.76% for train and 66.03% for test. Note that the accuracy in this task involves the prediction of two images, indicating a classification accuracy of more than 80% per image (best viewed in color).

Figure 2. Initial results of the PAE. The first column shows the "marginalized" images, the second the image completions via MPE, the third the reconstructions by the decoder from the MPE-sampled code, and the remaining columns the reconstructions by the decoder from conditionally sampled codes.

Utilizing (conditional) sampling, the framework retrieves the images for all possible solutions, in accordance with the commutativity of +. The output of such a program, given the query result(10), will then be

result(10) = 1 + 9 = 2 + 8 = ... = 5 + 5

Einsum Networks (EiNets) (Peharz et al., 2020) advance the scalability of probabilistic circuits, a term introduced by Van den Broeck et al. (2019) as the umbrella term for tractable probabilistic models. EiNets are an important stepping stone towards (sub-)symbolic reasoning on "big" data sets like CelebA (Liu et al., 2015) and SVHN (Netzer et al., 2011) (format 1). We envision their integration into our framework as an interesting direction for scaling up SPLog.

Acknowledgements

The authors would like to thank Steven Lang for very valuable discussions. AS and KK acknowledge support by the German Federal Ministry of Education and Research and the Hessian Ministry of Science and the Arts within the National Research Center for Applied Cybersecurity ATHENE. FV and KK acknowledge the support by the German Research Foundation (DFG) as part of the Research Training Group Adaptive Preparation of Information from Heterogeneous Sources (AIPHES) under grant No. GRK 1994/1. KK also acknowledges the support of the Rhine-Main Universities Network for "Deep Continuous-Discrete Machine Learning" (DeCoDeML) as well as the discussion within the AI lighthouse BMWi project SPAICER, 01MK20015E. ZY and KK acknowledge the funding of the German Federal Ministry of Education and Research (BMBF) project "MADESI", 01IS18043B.

References

Bengio, Y. From System 1 Deep Learning to System 2 Deep Learning. Invited talk, NeurIPS, 2019.

Bingham, E., Chen, J. P., Jankowiak, M., Obermeyer, F., Pradhan, N., Karaletsos, T., Singh, R., Szerlip, P., Horsfall, P., and Goodman, N. D. Pyro: Deep Universal Probabilistic Programming. Journal of Machine Learning Research, 2018.

Darwiche, A. SDD: A New Canonical Representation of Propositional Knowledge Bases. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI), pp. 819–826, 2011.

Ge, H., Xu, K., and Ghahramani, Z. Turing: a language for flexible probabilistic inference. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pp. 1682–1690, 2018. URL http://proceedings.mlr.press/v84/ge18b.html.

Gutmann, B., Jaeger, M., and De Raedt, L. Extending ProbLog with Continuous Distributions. In Inductive Logic Programming, 2010.

Hershey, J. R. and Olsen, P. A. Approximating the Kullback-Leibler divergence between Gaussian mixture models. In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, pp. IV-317–IV-320, 2007.

Kimmig, A., De Raedt, L., and Toivonen, H. ProbLog: A probabilistic Prolog and its application in link discovery. In IJCAI, pp. 2462–2467, 2007.

Kimmig, A., Van den Broeck, G., and De Raedt, L. An Algebraic Prolog for Reasoning about Possible Worlds. In AAAI, 2011.

Kingma, D. P. and Welling, M. Auto-Encoding VariationalBayes. In ICLR, 2014.

Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.

Manhaeve, R., Dumancic, S., Kimmig, A., Demeester, T., and De Raedt, L. DeepProbLog: Neural Probabilistic Logic Programming. In NeurIPS, pp. 3753–3763, 2018.

Molina, A., Vergari, A., Stelzner, K., Peharz, R., Subramani, P., Di Mauro, N., Poupart, P., and Kersting, K. SPFlow: An Easy and Extensible Library for Deep Probabilistic Learning using Sum-Product Networks. arXiv:1901.03704, 2019.

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., and Ng, A. Reading digits in natural images with unsupervised feature learning. NIPS, 2011.

Peharz, R., Lang, S., Vergari, A., Stelzner, K., Molina, A., Trapp, M., Van den Broeck, G., Kersting, K., and Ghahramani, Z. Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits. In ICML, 2020.

Poon, H. and Domingos, P. Sum-Product Networks: A New Deep Architecture. In UAI, pp. 337–346, 2011.

Rezende, D. J. and Mohamed, S. Variational Inference with Normalizing Flows. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, JMLR: W&CP volume 37, 2015.

Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., and Blei, D. M. Deep Probabilistic Programming. In ICLR, 2017.

Van den Broeck, G., Di Mauro, N., and Vergari, A. Tractable Probabilistic Models: Representations, algorithms, learning, and applications. Tutorial at UAI, July 2019.

Yi, K., Wu, J., Gan, C., Torralba, A., Kohli, P., and Tenenbaum, J. B. Neural-symbolic VQA: Disentangling reasoning from vision and language understanding. In Advances in Neural Information Processing Systems, pp. 1039–1050, 2018.