1
Learning the Structure of Markov Logic
Networks
Stanley Kok & Pedro Domingos
Dept. of Computer Science and Eng.
University of Washington
2
Overview
Motivation
Background
Structure Learning Algorithm
Experiments
Future Work & Conclusion
3
Motivation
Statistical Relational Learning (SRL) combines the benefits of:
Statistical Learning: uses probability to handle uncertainty in a robust and principled way
Relational Learning: models domains with multiple relations
4
Motivation
Many SRL approaches combine a logical language and Bayesian networks, e.g. Probabilistic Relational Models [Friedman et al., 1999]
The need to avoid cycles in Bayesian networks causes many difficulties [Taskar et al., 2002]
Started using Markov networks instead
6
Motivation
Relational Markov Networks [Taskar et al., 2002]: conjunctive database queries + Markov networks; require space exponential in the size of their cliques
Markov Logic Networks [Richardson & Domingos, 2004]: first-order logic + Markov networks; compactly represent large cliques; did not learn structure (used an external ILP system)
This paper develops a fast algorithm that learns MLN structure: the most powerful SRL learner to date
7
Overview
Motivation
Background
Structure Learning Algorithm
Experiments
Future Work & Conclusion
8
Markov Logic Networks
A first-order KB is a set of hard constraints: violate even one formula, and a world has zero probability
MLNs soften the constraints: it is OK to violate formulas, and the fewer formulas a world violates, the more probable it is
Each formula is given a weight that reflects how strong a constraint it is
9
MLN Definition
A Markov Logic Network (MLN) is a set of pairs (F, w) where
F is a formula in first-order logic
w is a real number
Together with a finite set of constants, it defines a Markov network with
One node for each grounding of each predicate in the MLN
One feature for each grounding of each formula F in the MLN, with the corresponding weight w
10
Ground Markov Network
Student(STAN)
Professor(PEDRO)
AdvisedBy(STAN,PEDRO)
Professor(STAN)
Student(PEDRO)
AdvisedBy(PEDRO,STAN)
AdvisedBy(STAN,STAN)
AdvisedBy(PEDRO,PEDRO)
AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)    (weight 2.7)
constants: STAN, PEDRO
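A minimal Python sketch (not the authors' code) of the grounding this slide illustrates: one node per ground predicate and one weighted feature per ground formula, over the constants STAN and PEDRO. The data structures and helper names are illustrative assumptions.

```python
from itertools import product

constants = ["STAN", "PEDRO"]

# One node per grounding of each predicate.
nodes = (
    [("Student", (c,)) for c in constants]
    + [("Professor", (c,)) for c in constants]
    + [("AdvisedBy", sp) for sp in product(constants, repeat=2)]
)

def formula(world, s, p):
    """Truth of AdvisedBy(s,p) => Student(s) ^ Professor(p) in a world."""
    return (not world[("AdvisedBy", (s, p))]) or (
        world[("Student", (s,))] and world[("Professor", (p,))]
    )

# One feature per grounding of the formula, each with the slide's weight 2.7.
features = [(2.7, s, p) for s, p in product(constants, repeat=2)]

# Example world: everything false except the two ground predicates shown true.
world = {n: False for n in nodes}
world[("Student", ("STAN",))] = True
world[("Professor", ("PEDRO",))] = True

print(len(nodes), "nodes;", len(features), "ground features")
print("true groundings:", sum(formula(world, s, p) for _, s, p in features))
```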
11–15
MLN Model

P_w(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

x: vector of value assignments to ground predicates
Z: partition function; sums over all possible value assignments to ground predicates
w_i: weight of ith formula
n_i(x): # of true groundings of ith formula
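To make the pieces above concrete, here is a hedged brute-force sketch that evaluates this distribution on a toy domain by enumerating all worlds; this is only illustrative (real MLN inference never enumerates worlds), and the single lambda stands in for one formula's true-grounding count n_1.

```python
import math
from itertools import product

weights = [2.7]
counts = [lambda x: int((not x[0]) or x[1])]   # one ground implication x0 => x1

worlds = list(product([False, True], repeat=2))

def score(x):
    """Unnormalized log-linear score: sum_i w_i * n_i(x)."""
    return math.exp(sum(w * n(x) for w, n in zip(weights, counts)))

Z = sum(score(x) for x in worlds)   # partition function: sum over all worlds
for x in worlds:
    print(x, score(x) / Z)          # probabilities sum to 1
```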
16–18
MLN Weight Learning
Likelihood is a concave function of the weights
Quasi-Newton methods find the optimal weights, e.g. L-BFGS [Liu & Nocedal, 1989]
SLOW: computing the likelihood (and its gradient) involves #P-complete counting problems
19–20
MLN Weight Learning
R&D used pseudo-likelihood [Besag, 1975]:

PL_w(x) = \prod_l P_w(X_l = x_l \mid MB_x(X_l))

i.e., each ground predicate X_l is conditioned on its Markov blanket, so no partition function is needed
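A hedged Python sketch of this idea (illustrative names, not the authors' code): each conditional needs only a two-point normalization over flipping one ground predicate, so Z never appears. `score(world)` is any unnormalized log-linear scorer such as the one in the sketch above.

```python
import math

def conditional_prob(score, world, l):
    """P(X_l = world[l] | all other ground predicates)."""
    flipped = dict(world)
    flipped[l] = not world[l]
    a = math.exp(score(world))     # unnormalized score of the actual world
    b = math.exp(score(flipped))   # ... and of the world with X_l flipped
    return a / (a + b)

def pseudo_log_likelihood(score, world):
    return sum(math.log(conditional_prob(score, world, l)) for l in world)
```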
21
MLN Structure Learning
R&D "learned" MLN structure in two disjoint steps:
Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN [De Raedt & Dehaspe, 1997])
Learn clause weights by optimizing pseudo-likelihood
Unlikely to give the best results, because CLAUDIEN:
finds clauses that hold with some accuracy/frequency in the data
doesn't find clauses that maximize the data's (pseudo-)likelihood
22
Overview
Motivation
Background
Structure Learning Algorithm
Experiments
Future Work & Conclusion
23
MLN Structure Learning
This paper develops an algorithm that:
Learns first-order clauses by directly optimizing pseudo-likelihood
Is fast enough to be practical
Performs better than R&D, pure ILP, purely KB and purely probabilistic approaches
24
Structure Learning Algorithm
High-level algorithm:
REPEAT
  MLN ← MLN ∪ FindBestClauses(MLN)
UNTIL FindBestClauses(MLN) returns NULL

FindBestClauses(MLN)
  Create candidate clauses
  FOR EACH candidate clause c
    Compute increase in evaluation measure of adding c to MLN
  RETURN k clauses with greatest increase
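The slide's top-level loop, rendered directly in Python under the assumption that an MLN is represented as a set of weighted clauses; `find_best_clauses` (supplied by the caller) would do the candidate generation and scoring sketched later.

```python
def learn_structure(mln, find_best_clauses):
    while True:
        best = find_best_clauses(mln)   # k clauses with the greatest gain
        if not best:                    # "returns NULL": nothing improves
            return mln
        mln = mln | set(best)           # MLN <- MLN U FindBestClauses(MLN)
```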
25
Structure Learning
Evaluation measure
Clause construction operators
Search strategies
Speedup techniques
26
Evaluation Measure
R&D used pseudo-log-likelihood
This gives undue weight to predicates with a large # of groundings
27–30
Evaluation Measure
Weighted pseudo-log-likelihood (WPLL):

WPLL_w(x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k}))

c_r: weight given to predicate r
the inner sum is over the g_r groundings of predicate r
each log term is a CLL: the conditional log-likelihood of one ground predicate given its Markov blanket
Plus a Gaussian weight prior and a structure prior
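A hedged sketch of the WPLL computation, reusing the two-point conditional from the pseudo-likelihood sketch above. Setting c_r = 1/g_r follows the paper's remedy for predicates with many groundings; `score` and `predicate_of` are illustrative stand-ins.

```python
import math
from collections import defaultdict

def wpll(score, world, predicate_of):
    by_predicate = defaultdict(list)
    for l in world:
        by_predicate[predicate_of(l)].append(l)
    total = 0.0
    for r, groundings in by_predicate.items():
        c_r = 1.0 / len(groundings)          # weight given to predicate r
        for l in groundings:                 # sum over groundings of r
            flipped = dict(world)
            flipped[l] = not world[l]
            a = math.exp(score(world))
            b = math.exp(score(flipped))
            total += c_r * math.log(a / (a + b))   # CLL of one grounding
    return total
```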
31
Clause Construction Operators
Add a literal (negative/positive)
Remove a literal
Flip signs of literals
Limit # of distinct variables to restrict the search space
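One plausible encoding of these operators (an assumption, not the authors' representation): a clause as a frozenset of literals, each literal a (sign, predicate, args) triple.

```python
# e.g. (False, "Student", ("s",)) stands for the negative literal !Student(s)
def add_literal(clause, literal):
    return clause | {literal}

def remove_literal(clause, literal):
    return clause - {literal}

def flip_signs(clause, literals):
    """Flip the signs of a chosen subset of the clause's literals."""
    flipped = {(not sign, pred, args) for (sign, pred, args) in literals}
    return (clause - frozenset(literals)) | flipped

def num_distinct_vars(clause):
    return len({v for (_, _, args) in clause for v in args})

def within_var_limit(clause, max_vars=4):   # restricts the search space
    return num_distinct_vars(clause) <= max_vars
```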
32
Beam Search
Same as that used in ILP & rule induction
Repeatedly find the single best clause
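A generic beam-search sketch consistent with this slide; `children` would apply the clause construction operators, `gain` the WPLL increase, and the beam width is an illustrative parameter, not the paper's setting.

```python
def beam_search(initial, children, gain, beam=5):
    frontier = [initial]
    best = initial
    while frontier:
        candidates = {c for f in frontier for c in children(f)}
        improving = [c for c in candidates if gain(c) > gain(best)]
        frontier = sorted(improving, key=gain, reverse=True)[:beam]
        if frontier:
            best = frontier[0]   # single best clause found so far
    return best
```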
33
Shortest-First Search (SFS)
1. Start from empty or hand-coded MLN
2. FOR L ← 1 TO MAX_LENGTH
3.   Apply each literal addition & deletion to each clause to create clauses of length L
4.   Repeatedly add K best clauses of length L to the MLN until no clause of length L improves WPLL
Similar to Della Pietra et al. (1997), McCallum (2003)
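The same procedure as a hedged Python sketch; `make_length_l` and `wpll_gain` are assumed helpers standing in for step 3 and the WPLL evaluation respectively.

```python
def shortest_first_search(mln, make_length_l, wpll_gain, max_length=4, k=2):
    for L in range(1, max_length + 1):
        while True:
            candidates = make_length_l(mln, L)   # literal additions/deletions
            best = sorted(candidates, key=lambda c: wpll_gain(mln, c),
                          reverse=True)[:k]
            best = [c for c in best if wpll_gain(mln, c) > 0]
            if not best:
                break                  # no length-L clause improves WPLL
            mln = mln | set(best)      # repeatedly add the K best clauses
    return mln
```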
34–37
Speedup Techniques
FindBestClauses(MLN)
  Create candidate clauses            ← NOT THAT FAST (many candidates)
  FOR EACH candidate clause c         ← SLOW (many candidates)
    Compute increase in WPLL (using L-BFGS) of adding c to MLN
                                      ← SLOW (many CLLs; each CLL involves a #P-complete problem)
  RETURN k clauses with greatest increase
38–43
Speedup Techniques
Clause Sampling
Predicate Sampling
Avoid Redundancy
Loose Convergence Thresholds
Ignore Unrelated Clauses
Weight Thresholding
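As an example of the flavor of these techniques, here is a hedged sketch of clause sampling, i.e., estimating a clause's number of true groundings from a uniform subsample instead of exact counting; the function and parameter names are illustrative assumptions, not the paper's interface.

```python
import random

def estimate_true_groundings(groundings, is_true, sample_size=1000):
    if len(groundings) <= sample_size:
        return sum(1 for g in groundings if is_true(g))   # exact count
    sample = random.sample(groundings, sample_size)
    frac_true = sum(1 for g in sample if is_true(g)) / sample_size
    return frac_true * len(groundings)                    # scaled estimate
```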
44
Overview
Motivation
Background
Structure Learning Algorithm
Experiments
Future Work & Conclusion
45
Experiments
UW-CSE domain
22 predicates, e.g., AdvisedBy(X,Y), Student(X), etc.
10 types, e.g., Person, Course, Quarter, etc.
# ground predicates ≈ 4 million
# true ground predicates ≈ 3000
Handcrafted KB with 94 formulas, e.g.:
  Each student has at most one advisor
  If a student is an author of a paper, so is her advisor
Cora domain
Computer science research papers
Collective deduplication of author, venue, title
46–49
Systems
MLN(SLB): structure learning with beam search
MLN(SLS): structure learning with SFS
KB: hand-coded KB    CL: CLAUDIEN    FO: FOIL    AL: Aleph
MLN(KB), MLN(CL), MLN(FO), MLN(AL): clauses from the above with learned MLN weights
NB: Naïve Bayes    BN: Bayesian networks
50
Methodology
UW-CSE domain
DB divided into 5 areas: AI, Graphics, Languages, Systems, Theory
Leave-one-out testing by area
Measured:
  average CLL of the ground predicates
  average area under the precision-recall curve of the ground predicates (AUC)
51–54
UW-CSE results (CLL and AUC bar charts; higher is better for both):

System      CLL      AUC
MLN(SLS)   -0.061   0.533
MLN(SLB)   -0.088   0.472
MLN(CL)    -0.151   0.306
MLN(FO)    -0.208   0.140
MLN(AL)    -0.223   0.148
MLN(KB)    -0.142   0.429
CL         -0.574   0.170
FO         -0.661   0.131
AL         -0.579   0.117
KB         -0.812   0.266
55
UW-CSE results vs. purely probabilistic learners (CLL and AUC bar charts):

System      CLL      AUC
MLN(SLS)   -0.061   0.533
MLN(SLB)   -0.088   0.472
NB         -0.370   0.390
BN         -0.166   0.397
56
Timing
MLN(SLS) on UW-CSE
Cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
Without speedups: did not finish in 24 hrs
With speedups: 5.3 hrs
57
Lesion Study
Disable one speedup technique at a time; SFS; UW-CSE (one fold); runtime in hours:

all speedups                     4.0
no clause sampling              21.6
no predicate sampling            8.4
don't avoid redundancy           6.5
no loose convergence threshold   4.1
no weight thresholding          24.8
58
Overview
Motivation
Background
Structure Learning Algorithm
Experiments
Future Work & Conclusion
59
Future Work
Speed up counting of # true groundings of clauses
Probabilistically bound the loss in accuracy due to subsampling
Probabilistic predicate discovery
60
Conclusion
Markov logic networks: a powerful combination of first-order logic and probability
Richardson & Domingos (2004) did not learn MLN structure
We develop an algorithm that automatically learns both first-order clauses and their weights
We develop speedup techniques to make our algorithm fast enough to be practical
We show experimentally that our algorithm outperforms:
  Richardson & Domingos
  pure ILP
  purely KB approaches
  purely probabilistic approaches
(For software, email: koks@cs.washington.edu)