Markov Logic: A Representation Language for Natural Language Semantics
Markov Logic: A Representation Language for Natural Language Semantics
Pedro Domingos, Dept. of Computer Science & Engineering, University of Washington
(Based on joint work with Stanley Kok, Matt Richardson, and Parag Singla)
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Motivation
Natural language is characterized by:
- Complex relational structure
- High uncertainty (ambiguity, imperfect knowledge)
First-order logic handles relational structure; probability handles uncertainty. Let's combine the two.
Markov Logic [Richardson & Domingos, 2006]
- Syntax: first-order logic + weights
- Semantics: templates for Markov nets
- Inference: weighted satisfiability + MCMC
- Learning: voted perceptron + ILP
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Markov Networks
Undirected graphical models, with potential functions Φ_c defined over the cliques c:

P(X) = (1/Z) ∏_c Φ_c(X_c),    Z = Σ_X ∏_c Φ_c(X_c)

[Figure: example network over nodes A, B, C, D, with potentials such as
Φ(A,B) = 3.7 if A ∧ B; 2.1 if A ∧ ¬B; 0.7 otherwise
Φ(B,C,D) = 2.3 if B ∧ C ∧ D; 5.1 otherwise]
Markov Networks
Undirected graphical models. In log-linear form, each clique potential becomes a weighted binary feature, where w_i is the weight of feature i and f_i is feature i:

P(X) = (1/Z) exp( Σ_i w_i f_i(X) ),    Z = Σ_X exp( Σ_i w_i f_i(X) )

[Figure: same network over A, B, C, D, with features such as
f(A,B) = 1 if A ∧ B, 0 otherwise
f(B,C,D) = 1 if B ∧ C ∧ D, 0 otherwise]
First-Order Logic
- Constants, variables, functions, predicates. E.g.: Anna, X, mother_of(X), friends(X, Y)
- Grounding: replace all variables by constants. E.g.: friends(Anna, Bob)
- World (model, interpretation): assignment of truth values to all ground predicates
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Markov Logic Networks
A logical KB is a set of hard constraints on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (higher weight ⇒ stronger constraint):

P(world) ∝ exp( Σ weights of formulas it satisfies )
Definition
A Markov Logic Network (MLN) is a set of pairs (F, w), where
- F is a formula in first-order logic
- w is a real number
Together with a set of constants, it defines a Markov network with:
- One node for each grounding of each predicate in the MLN
- One feature for each grounding of each formula F in the MLN, with the corresponding weight w
Example: Friends & Smokers

1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Suppose we have two constants: Anna (A) and Bob (B).
[Figure: ground network with nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B)]
Example: Friends & Smokers

1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Suppose we have two constants: Anna (A) and Bob (B).
[Figure: full ground network, adding Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B) and the edges induced by the ground formulas]
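As a concrete sketch, grounding the two formulas over the constants Anna and Bob gives 2 + 4 = 6 ground formulas, and a world's unnormalized log-probability is the sum of the weights of the ground formulas it satisfies. This is an illustrative hand-grounding (assuming weights of 1.5 for the smoking-cancer rule and 1.1 for the friends rule), not the Alchemy implementation.

```python
CONSTANTS = ["Anna", "Bob"]
W_SMOKES_CANCER = 1.5  # weight of Smokes(x) => Cancer(x)
W_FRIENDS = 1.1        # weight of Friends(x,y) => (Smokes(x) <=> Smokes(y))

def world_score(smokes, cancer, friends):
    """Sum of weights of satisfied ground formulas (unnormalized log-prob).
    smokes/cancer: dict person -> bool; friends: dict (p, q) -> bool."""
    total = 0.0
    for x in CONSTANTS:  # groundings of Smokes(x) => Cancer(x)
        if (not smokes[x]) or cancer[x]:
            total += W_SMOKES_CANCER
    for x in CONSTANTS:  # groundings of Friends(x,y) => (Smokes(x) <=> Smokes(y))
        for y in CONSTANTS:
            if (not friends[(x, y)]) or (smokes[x] == smokes[y]):
                total += W_FRIENDS
    return total

# A world where Anna smokes, Bob doesn't, and everyone is friends violates
# one grounding of the cancer rule and two of the friends rule, so it
# scores lower than a world satisfying every ground formula.
friends_all = {(x, y): True for x in CONSTANTS for y in CONSTANTS}
inconsistent = world_score({"Anna": True, "Bob": False},
                           {"Anna": False, "Bob": False}, friends_all)
consistent = world_score({"Anna": True, "Bob": True},
                         {"Anna": True, "Bob": True}, friends_all)
```

Violating a ground formula costs its weight but does not zero out the world's probability, which is exactly the "soft constraint" reading of the KB.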
More on MLNs
- An MLN is a template for ground Markov nets
- Typed variables and constants greatly reduce the size of the ground Markov net
- Functions, existential quantifiers, etc.
- An MLN without variables = a Markov network (subsumes graphical models)
Relation to First-Order Logic
- Infinite weights ⇒ first-order logic
- Satisfiable KB with positive weights ⇒ satisfying assignments = modes of the distribution
- MLNs allow contradictions between formulas
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
MPE/MAP Inference
Find the most likely truth values of non-evidence ground atoms given the evidence.
Apply a weighted satisfiability solver (maximizes the sum of weights of satisfied clauses).
MaxWalkSat algorithm [Kautz et al., 1997]:
- Start with a random truth assignment
- With probability p, flip the atom that maximizes the weight sum; else flip a random atom in an unsatisfied clause
- Repeat n times
- Restart m times
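The steps above can be sketched as follows. The clause encoding (a weight plus a list of (atom, required-value) literals) is an assumption for illustration, and restarts are omitted to keep the sketch short.

```python
import random

def maxwalksat(clauses, n_atoms, p=0.5, max_flips=1000, seed=0):
    """Sketch of MaxWalkSat. Each clause is (weight, literals), where a
    literal (i, v) is satisfied when atom i has truth value v; a clause
    is satisfied when any of its literals is."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_atoms)]

    def sat(lits):
        return any(assign[i] == v for i, v in lits)

    def total():  # sum of weights of satisfied clauses
        return sum(w for w, lits in clauses if sat(lits))

    best_weight, best_assign = total(), assign[:]
    for _ in range(max_flips):
        unsat = [lits for w, lits in clauses if not sat(lits)]
        if not unsat:
            break  # every clause satisfied: cannot do better
        lits = rng.choice(unsat)
        if rng.random() < p:
            # greedily flip the atom in the clause that maximizes total weight
            def weight_if_flipped(j):
                assign[j] = not assign[j]
                w = total()
                assign[j] = not assign[j]
                return w
            i = max((j for j, _ in lits), key=weight_if_flipped)
        else:
            i = rng.choice(lits)[0]  # random walk: flip a random atom
        assign[i] = not assign[i]
        if total() > best_weight:
            best_weight, best_assign = total(), assign[:]
    return best_assign, best_weight
```

For example, with clauses [(2.0, [(0, True)]), (1.0, [(1, False)])] the sketch finds the assignment [True, False] with total satisfied weight 3.0.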
Conditional Inference
P(Formula | MLN, C) = ?  MCMC: sample worlds, check whether the formula holds.
P(Formula1 | Formula2, MLN, C) = ?  If Formula2 is a conjunction of ground atoms:
- First construct the minimal subset of the network necessary to answer the query (a generalization of KBMC)
- Then apply MCMC (or another inference method)
Ground Network Construction
- Initialize the Markov net to contain all query predicates
- For each node in the network, add the node's Markov blanket to the network; remove any evidence nodes
- Repeat until done
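A minimal sketch of this construction, assuming the Markov blanket relation is given as a precomputed adjacency map (in an MLN it would be derived from shared ground clauses). Rather than adding evidence nodes and then removing them, this variant simply never adds them, which is equivalent for deciding which non-evidence atoms to instantiate.

```python
def build_ground_network(query_atoms, markov_blanket, evidence):
    """Expand outward from the query atoms, adding each node's Markov
    blanket, but stop at evidence nodes: they are conditioned on, so the
    network need not grow past them."""
    network = set(query_atoms)
    frontier = list(query_atoms)
    while frontier:
        atom = frontier.pop()
        for neighbor in markov_blanket.get(atom, ()):
            if neighbor in evidence or neighbor in network:
                continue
            network.add(neighbor)
            frontier.append(neighbor)
    return network
```

For a chain a - b - c - d with evidence {c}, a query on {a} only needs {a, b}: the evidence node c screens off d.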
Probabilistic Inference Recall
Exact inference is #P-complete. Conditioning on the Markov blanket is easy, and Gibbs sampling exploits this:

P(X) = (1/Z) exp( Σ_i w_i f_i(X) ),    Z = Σ_X exp( Σ_i w_i f_i(X) )

P(x | MB(x)) = exp( Σ_i w_i f_i(x) ) / ( exp( Σ_i w_i f_i(x=0) ) + exp( Σ_i w_i f_i(x=1) ) )
Markov Chain Monte Carlo
Gibbs sampler:
1. Start with an initial assignment to the nodes
2. One node at a time, sample the node given the others
3. Repeat
4. Use the samples to compute P(X)
Apply to the ground network. Initialization: MaxWalkSat. Can use multiple chains.
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Learning
- Data is a relational database
- Closed world assumption (if not: EM)
- Learning parameters (weights): generatively via pseudo-likelihood; discriminatively via voted perceptron + MaxWalkSat
- Learning structure: a generalization of feature induction in Markov nets; learn and/or modify clauses; inductive logic programming with pseudo-likelihood as the objective function
Generative Weight Learning
Maximize likelihood (or posterior) using gradient ascent:

∂/∂w_i log P(X) = f_i(X) − E_Y[ f_i(Y) ]

i.e., the feature count according to the data minus the feature count according to the model. Requires inference at each step (slow!).
Pseudo-Likelihood [Besag, 1975]
Likelihood of each variable given its Markov blanket in the data:

PL(X) = ∏_x P(x | MB(x))

Does not require inference at each step; widely used.

Optimization

∂/∂w_i log PL(X) = Σ_x [ nsat_i(x) − P(x=0 | MB(x)) · nsat_i(x=0) − P(x=1 | MB(x)) · nsat_i(x=1) ]

where nsat_i(x=v) is the number of satisfied groundings of clause i in the training data when x takes value v.
- Most terms are not affected by changes in the weights; after initial setup, each iteration takes O(# ground predicates × # first-order clauses)
- Parameters are tied over groundings of the same clause
- Maximize using L-BFGS [Liu & Nocedal, 1989]
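The pseudo-likelihood itself is cheap to compute: each factor P(x | MB(x)) only compares the scores of the two states that differ in x. Below is a sketch for a generic log-linear model (not the optimized clause-counting version the slide describes):

```python
import math

def log_pseudo_likelihood(features, weights, x):
    """log PL(X) = sum_j log P(x_j | MB(x_j)). Each conditional needs only
    the full scores with x_j = 0 and x_j = 1, all other variables fixed."""
    def score(state):
        return sum(w * f(state) for w, f in zip(weights, features))
    total = 0.0
    for j in range(len(x)):
        x0, x1 = list(x), list(x)
        x0[j], x1[j] = False, True
        s0, s1 = score(x0), score(x1)
        s_actual = s1 if x[j] else s0
        # log P(x_j | rest) = s_actual - log(exp(s0) + exp(s1))
        total += s_actual - math.log(math.exp(s0) + math.exp(s1))
    return total
```

Because no partition function over all worlds appears, the gradient of this objective can be evaluated without inference, which is the whole appeal.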
Discriminative Weight Learning
Gradient of the conditional log-likelihood:

∂/∂w_i log P(y | x) = n_i(x, y) − E_w[ n_i(x, y) ]

n_i(x, y) is the # of true groundings of formula i in the DB. Computing the expected # of true groundings is slow, so approximate the expected count by the MAP count.
Voted Perceptron [Collins, 2002]
Used for discriminative training of HMMs:
- Expected count in the gradient approximated by the count in the MAP state
- MAP state found using the Viterbi algorithm
- Weights averaged over all iterations

initialize w_i = 0
for t = 1 to T do
    find the MAP configuration using Viterbi
    w_i ← w_i + η · (training count − MAP count)
end for
Voted Perceptron for MLNs [Singla & Domingos, 2004]
An HMM is a special case of an MLN:
- Expected count in the gradient approximated by the count in the MAP state
- MAP state found using MaxWalkSat
- Weights averaged over all iterations

initialize w_i = 0
for t = 1 to T do
    find the MAP configuration using MaxWalkSat
    w_i ← w_i + η · (training count − MAP count)
end for
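The update loop above can be sketched as below. Here map_counts_fn is a stand-in for the MaxWalkSat (or Viterbi) call that returns clause counts in the MAP state under the current weights, and eta is a hypothetical learning rate.

```python
def voted_perceptron(training_counts, map_counts_fn, n_weights, eta=0.1, T=100):
    """Sketch of the voted perceptron for weight learning.
    training_counts[i]: # true groundings of clause i in the data.
    map_counts_fn(w): clause counts in the MAP state under weights w
    (in the real algorithm this calls MaxWalkSat). Returns the weights
    averaged over all iterations, as the slide describes."""
    w = [0.0] * n_weights
    w_sum = [0.0] * n_weights
    for _ in range(T):
        map_counts = map_counts_fn(w)
        for i in range(n_weights):
            # gradient step: training count minus MAP-state count
            w[i] += eta * (training_counts[i] - map_counts[i])
            w_sum[i] += w[i]
    return [s / T for s in w_sum]
```

Averaging the weights over iterations (rather than returning the final vector) is what makes the perceptron "voted", and damps the oscillation the MAP approximation can cause.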
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Applications to Date
- Entity resolution (Cora, BibServ)
- Information extraction for biology (won the LLL-2005 competition)
- Probabilistic Cyc
- Link prediction
- Topic propagation in scientific communities
- Etc.
Entity Resolution
Most logical systems make the unique names assumption. What if we don't? Equality predicate: Same(A,B), or A = B. Equality axioms:
- Reflexivity, symmetry, transitivity
- For every unary predicate P: x1 = x2 ⇒ (P(x1) ⇔ P(x2))
- For every binary predicate R: x1 = x2 ∧ y1 = y2 ⇒ (R(x1,y1) ⇔ R(x2,y2))
- Etc.
But in Markov logic these are soft and learnable. Can also introduce the reverse direction:
R(x1,y1) ∧ R(x2,y2) ∧ x1 = x2 ⇒ y1 = y2
Surprisingly, this is all that's needed.
Example: Citation Matching
Markov Logic Formulation: Predicates
- Are two bibliography records the same? SameBib(b1,b2)
- Are two field values the same? SameAuthor(a1,a2), SameTitle(t1,t2), SameVenue(v1,v2)
- How similar are two field strings? Predicates for ranges of cosine TF-IDF score:
  TitleTFIDF.0(t1,t2) is true iff TF-IDF(t1,t2) = 0
  TitleTFIDF.2(t1,t2) is true iff 0 < TF-IDF(t1,t2) < 0.2
  Etc.
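A hedged sketch of how such similarity predicates might be computed: cosine similarity over TF-IDF vectors, binned into range predicates. The binning convention (naming bins by their upper edge, as in TitleTFIDF.2) is inferred from the slide, and the idf table is assumed precomputed from the corpus.

```python
import math
from collections import Counter

def tfidf_cosine(s1, s2, idf):
    """Cosine similarity between the TF-IDF vectors of two field strings.
    `idf` maps word -> inverse document frequency (assumed precomputed)."""
    def vec(s):
        tf = Counter(s.lower().split())
        return {w: c * idf.get(w, 0.0) for w, c in tf.items()}
    v1, v2 = vec(s1), vec(s2)
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def tfidf_bin(score, step=0.2):
    """Map a score to a range-predicate label: 0 -> '.0',
    (0, 0.2) -> '.2', [0.2, 0.4) -> '.4', and so on."""
    if score == 0.0:
        return "TitleTFIDF.0"
    upper = min(int(score / step) + 1, round(1 / step))
    return f"TitleTFIDF.{upper * 2}"
```

Each evidence atom in the ground network is then one of these discretized predicates rather than a raw real-valued score.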
Markov Logic Formulation: Formulas
- Unit clauses (defaults): ¬SameBib(b1,b2)
- Two fields are the same ⇒ corresponding bib. records are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameAuthor(a1,a2) ⇒ SameBib(b1,b2)
- Two bib. records are the same ⇒ corresponding fields are the same:
  Author(b1,a1) ∧ Author(b2,a2) ∧ SameBib(b1,b2) ⇒ SameAuthor(a1,a2)
- High similarity score ⇒ two fields are the same:
  TitleTFIDF.8(t1,t2) ⇒ SameTitle(t1,t2)
- Transitive closure (not incorporated in the experiments):
  SameBib(b1,b2) ∧ SameBib(b2,b3) ⇒ SameBib(b1,b3)
In total: 25 predicates, 46 first-order clauses.
What Does This Buy You?
- Objects are matched collectively
- Multiple types matched simultaneously
- Constraints are soft, and strengths can be learned from data
- Easy to add further knowledge
- Constraints can be refined from data
- Standard approach still embedded
Example: Subset of a Bibliography Database

Record | Title                            | Author          | Venue
B1     | Object Identification using CRFs | Linda Stewart   | PKDD 04
B2     | Object Identification using CRFs | Linda Stewart   | 8th PKDD
B3     | Learning Boolean Formulas        | Bill Johnson    | PKDD 04
B4     | Learning of Boolean Formulas     | William Johnson | 8th PKDD
Standard Approach [Fellegi & Sunter, 1969]
[Figure: each record-match node (b1=b2?, b3=b4?) is connected only to its own field-similarity evidence nodes for Author, Title, and Venue, e.g. Sim(Linda Stewart, Linda Stewart), Sim(Object Identification using CRFs, Object Identification using CRFs), Sim(PKDD 04, 8th PKDD), Sim(Bill Johnson, William Johnson), Sim(Learning Boolean Formulas, Learning of Boolean Formulas)]
What's Missing?
[Figure: the same network as above, with the two separate copies of Sim(PKDD 04, 8th PKDD) highlighted]
If from b1=b2 you infer that "PKDD 04" is the same as "8th PKDD", how can you use that to help figure out whether b3=b4?
Merging the Evidence Nodes
[Figure: the two identical venue-similarity evidence nodes Sim(PKDD 04, 8th PKDD) are merged into a single node shared by b1=b2? and b3=b4?]
Still does not solve the problem. Why?
Introducing Field-Match Nodes
[Figure: full representation in the Collective Model. Field-match nodes (b1.A=b2.A?, b1.T=b2.T?, b1.V=b2.V?, b3.A=b4.A?, b3.T=b4.T?, b3.V=b4.V?) sit between the record-match nodes (b1=b2?, b3=b4?) and the field-similarity evidence nodes; the shared venue node Sim(PKDD 04, 8th PKDD) now connects to both b1.V=b2.V? and b3.V=b4.V?]
Flow of Information
[Figure: animation over the collective model showing inference about b1=b2 flowing through the venue field-match nodes and the shared Sim(PKDD 04, 8th PKDD) node to influence b3=b4]
Experiments
Databases:
- Cora [McCallum et al., IRJ, 2000]: 1295 records, 132 papers
- BibServ.org [Richardson & Domingos, ISWC-03]: 21,805 records, unknown # of papers
Goal: de-duplicate bib. records, authors, and venues.
Pre-processing: form canopies [McCallum et al., KDD-00].
Compared with naïve Bayes (the standard method), etc. Measured area under the precision-recall curve (AUC). Our approach wins across the board.
Results: Matching Venues on Cora
[Bar chart of AUC by system: MLN(VP): 0.771, MLN(ML): 0.061, MLN(PL): 0.342, KB: 0.096, CL: 0.047, NB: 0.339, BN: 0.339]
Overview
Motivation, Background, Representation, Inference, Learning, Applications, Discussion
Relation to Other Approaches

Representation | Logical language    | Probabilistic language
Markov logic   | First-order logic   | Markov nets
RMNs           | Conjunctive queries | Markov nets
PRMs           | Frame systems       | Bayes nets
KBMC           | Horn clauses        | Bayes nets
SLPs           | Horn clauses        | Bayes nets
Going Further
- First-order logic is not enough
- We can "Markovize" other representations in the same way
- Lots to do
Summary
- NLP involves relational structure and uncertainty
- Markov logic combines first-order logic and probabilistic graphical models
- Syntax: first-order logic + weights
- Semantics: templates for Markov networks
- Inference: MaxWalkSat + KBMC + MCMC
- Learning: voted perceptron + PL + ILP
- Applications to date: entity resolution, IE, etc.
- Software: Alchemy, http://www.cs.washington.edu/ai/alchemy