Extending Bayesian Logic Programs for Plan Recognition and Machine Reading
Sindhu V. Raghavan
Advisor: Raymond Mooney
PhD Proposal, May 12th, 2011
Plan Recognition in Intelligent User Interfaces

$ cd my-dir
$ cp test1.txt test-dir
$ rm test1.txt

What task is the user performing? move-file
Which files and directories are involved? test1.txt and test-dir
Can the task be performed more efficiently?
Data is relational in nature - several files and directories and several relations between them
Characteristics of Real World Data

Relational or structured data
– Several entities in the domain
– Several relations between entities
– Do not always follow the i.i.d. assumption

Presence of noise or uncertainty
– Uncertainty in the types of entities
– Uncertainty in the relations

Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.
Statistical Relational Learning (SRL)

Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007]
– Combines strengths of both first-order logic and probabilistic models

SRL formalisms
– Stochastic Logic Programs (SLPs) [Muggleton, 1996]
– Probabilistic Relational Models (PRMs) [Friedman et al., 1999]
– Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
– Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
Bayesian Logic Programs (BLPs)

Integrate first-order logic and Bayesian networks

Why BLPs?
– Efficient grounding mechanism that includes only those variables that are relevant to the query
– Easy to extend by incorporating any type of logical inference to construct networks
– Well suited for capturing causal relations in data
Objectives

BLPs for Plan Recognition
– Plan recognition involves predicting the top-level plan of an agent based on its actions

BLPs for Machine Reading
– Machine reading involves automatic extraction of knowledge from natural language text
Outline

– Motivation
– Background
  o First-order logic
  o Logical abduction
  o Bayesian Logic Programs (BLPs)
– Completed Work
  o Part 1 – Extending BLPs for Plan Recognition
  o Part 2 – Extending BLPs for Machine Reading
– Proposed Work
– Conclusions
First-order Logic

Terms
– Constants – individual entities like anna, bob
– Variables – placeholders for objects like X, Y

Predicates
– Relations over entities like worksFor, capitalOf

Literal – a predicate or its negation applied to terms
– Atom – positive literal like worksFor(X,Y)
– Ground literal – literal with no variables like worksFor(anna,bob)

Clause – disjunction of literals
– Horn clause has at most one positive literal
– Definite clause has exactly one positive literal
First-order Logic

Quantifiers
– Universal quantification (∀) – true for all objects in the domain
– Existential quantification (∃) – true for some objects in the domain

Logical Inference
– Forward Chaining – for every implication p → q, if p is true, then q is concluded to be true
– Backward Chaining – for a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
Logical Abduction

Abduction – the process of finding the best explanation for a set of observations

Given
– Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic
– Observations, O, in the form of atomic facts in first-order logic

Find
– A hypothesis, H, a set of assumptions (atomic facts) that logically entail the observations given the theory: B ∪ H ⊨ O
– The best explanation is the one with the fewest assumptions
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]

Set of Bayesian clauses a | a1, a2, ..., an
– Definite clauses that are universally quantified
– Head of the clause – a
– Body of the clause – a1, a2, ..., an
– Range-restricted, i.e. variables(head) ⊆ variables(body)
– Associated conditional probability table (CPT)
  o P(head | body)

Bayesian predicates a, a1, a2, ..., an have finite domains
– Combining rule like noisy-or for mapping multiple CPTs into a single CPT
Logical Inference in BLPs – SLD Resolution

BLP
lives(james,yorkshire).
lives(stefan,freiburg).
neighborhood(james).
tornado(yorkshire).
burglary(X) | neighborhood(X).
alarm(X) | burglary(X).
alarm(X) | lives(X,Y), tornado(Y).

Query
alarm(james)

Proof (built up step by step in the talk)
alarm(james)
– burglary(james)
  o neighborhood(james)
– lives(james,Y), tornado(Y)
  o lives(james,yorkshire)
  o tornado(yorkshire)

Example from Ngo and Haddawy, 1997
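The SLD derivation above can be sketched in a few lines of Python. This is a hypothetical toy prover, not the actual BLP engine; for this tiny knowledge base we skip standardizing rule variables apart, which a real prover must do.

```python
# Toy SLD resolution over the alarm example (Ngo and Haddawy, 1997).
# Terms starting with an uppercase letter are variables.

FACTS = [
    ("lives", "james", "yorkshire"),
    ("lives", "stefan", "freiburg"),
    ("neighborhood", "james"),
    ("tornado", "yorkshire"),
]

# (head, body) pairs -- written "head | body" in BLP notation.
RULES = [
    (("burglary", "X"), [("neighborhood", "X")]),
    (("alarm", "X"), [("burglary", "X")]),
    (("alarm", "X"), [("lives", "X", "Y"), ("tornado", "Y")]),
]

def is_var(t):
    return t[:1].isupper()

def substitute(atom, theta):
    return tuple(theta.get(t, t) for t in atom)

def unify(a, b, theta):
    """Unify two atoms under substitution theta; return extended theta or None."""
    if len(a) != len(b) or a[0] != b[0]:
        return None
    theta = dict(theta)
    for x, y in zip(a[1:], b[1:]):
        x, y = theta.get(x, x), theta.get(y, y)
        if x == y:
            continue
        if is_var(x):
            theta[x] = y
        elif is_var(y):
            theta[y] = x
        else:
            return None
    return theta

def prove(goals, theta, proof):
    """Yield (theta, atoms used) for every SLD proof of the goal list."""
    if not goals:
        yield theta, proof
        return
    goal, rest = substitute(goals[0], theta), goals[1:]
    for fact in FACTS:
        t2 = unify(goal, fact, theta)
        if t2 is not None:
            yield from prove(rest, t2, proof + [fact])
    for head, body in RULES:
        t2 = unify(goal, head, theta)  # NOTE: no standardizing-apart here
        if t2 is not None:
            yield from prove(body + rest, t2, proof + [substitute(head, t2)])

proofs = list(prove([("alarm", "james")], {}, []))
```

Querying `alarm(james)` yields exactly the two proofs shown on the slide: one through `burglary(james)` and one through `lives(james,yorkshire)` and `tornado(yorkshire)`.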
Bayesian Network Construction

alarm(james)
burglary(james)
neighborhood(james)
lives(james,yorkshire)
tornado(yorkshire)

– Each ground atom becomes a node (random variable) in the Bayesian network
– Edges are added from ground atoms in the body of a clause to the ground atom in the head
– Specify probabilistic parameters using the CPTs associated with Bayesian clauses
– Use a combining rule to combine multiple CPTs into a single CPT
– Ground atoms that appear in no proof, like lives(stefan,freiburg), are excluded from the network
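The construction steps above can be sketched directly: take the ground clauses from the two proofs, make each ground atom a node, and add an edge from every body atom to its head. This is a toy sketch with hypothetical data structures, not the BLP implementation; note that alarm(james) ends up with parents from two different clauses, which is exactly where the combining rule is needed.

```python
# Ground clauses collected from the two SLD proofs of alarm(james).
ground_clauses = [
    ("burglary(james)", ["neighborhood(james)"]),
    ("alarm(james)", ["burglary(james)"]),
    ("alarm(james)", ["lives(james,yorkshire)", "tornado(yorkshire)"]),
]

nodes = set()
edges = set()
for head, body in ground_clauses:
    nodes.add(head)
    for atom in body:
        nodes.add(atom)
        edges.add((atom, head))  # directed edge: body atom -> head atom

# Parent sets determine the CPT shape of each node; alarm(james) has
# three parents contributed by two clauses, so its CPTs must be merged
# by a combining rule such as noisy-or.
parents = {n: sorted(a for a, h in edges if h == n) for n in nodes}
```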
Probabilistic Inference and Learning

Probabilistic inference
– Marginal probability given evidence
– Most Probable Explanation (MPE) given evidence

Learning [Kersting and De Raedt, 2008]
– Parameters
  o Expectation Maximization
  o Gradient-ascent based learning
– Structure
  o Hill-climbing search through the space of possible structures
Part 1: Extending BLPs for Plan Recognition
[Raghavan and Mooney, 2010]
Plan Recognition

Predict an agent's top-level plans based on the observed actions
– Abductive reasoning involving inference of cause from effect

Applications
– Story understanding
  o Recognize a character's motives or plans based on its actions to answer questions about the story
– Strategic planning
  o Predict other agents' plans so as to work co-operatively
– Intelligent user interfaces
  o Predict the task that the user is performing so as to provide valuable tips to perform the task more efficiently
Related Work

First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992]
– Knowledge base of plans and actions
– Default reasoning or logical abduction to predict the best plan based on the observed actions
– Unable to handle uncertainty in data or estimate likelihood of alternative plans

Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005]
– Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models
– Unable to handle relational/structured data
Related Work

Markov Logic based approaches [Kate and Mooney, 2009; Singla and Mooney, 2011]
– Logical inference in MLNs is deductive in nature
– MLN-PC [Kate and Mooney, 2009]
  o Adds reverse implications to handle abductive reasoning
  o Does not scale to large domains
– MLN-HC [Singla and Mooney, 2011]
  o Improves over MLN-PC by adding hidden causes
  o Still does not scale to large domains
– MLN-HCAM [Singla and Mooney, 2011] uses logical abduction to constrain grounding
  o Incorporates ideas from the BLP approach for plan recognition [Raghavan and Mooney, 2010]
  o MLN-HCAM performs better than MLN-PC and MLN-HC
BLPs for Plan Recognition

Why BLPs?
– Directed model captures causal relationships well
– Efficient grounding process results in smaller networks, unlike in MLNs

SLD resolution is deductive inference, used for predicting observed actions from top-level plans
Plan recognition is abductive in nature and involves predicting the top-level plan from observed actions

BLPs cannot be used as is for plan recognition
Extending BLPs for Plan Recognition

BLPs + Logical Abduction → BALPs
BALPs – Bayesian Abductive Logic Programs
Logical Abduction in BALPs

Given
– A set of observation literals O = {O1, O2, ..., On} and a knowledge base KB

Compute a set of abductive proofs of O using Stickel's abduction algorithm [Stickel, 1988]
– Backchain on each Oi until it is proved or assumed
– A literal is said to be proved if it unifies with a fact or the head of some rule in KB; otherwise it is said to be assumed

Construct a Bayesian network using the resulting set of proofs, as in BLPs
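The backchain-or-assume loop can be sketched on the cp/rm example that follows. This is hypothetical helper code, not Stickel's full algorithm (which handles general unification and nested backchaining); it shows the core idea that a new subgoal is matched against existing assumptions before a fresh assumption is made.

```python
# Rules written (head, body): "head | body" in BLP notation.
# Terms starting with an uppercase letter are variables.
RULES = [
    (("cp", "F", "D"), ("copy-file", "F", "D")),
    (("cp", "F", "D"), ("move-file", "F", "D")),
    (("rm", "F"), ("move-file", "F", "D")),
    (("rm", "F"), ("remove-file", "F")),
]

def is_var(t):
    return t[:1].isupper()

def compatible(lit, other):
    """Two literals match if predicates agree and arguments unify trivially."""
    if lit[0] != other[0] or len(lit) != len(other):
        return False
    return all(is_var(x) or is_var(y) or x == y
               for x, y in zip(lit[1:], other[1:]))

def explain(observations):
    """Backchain on each observation; assume plan literals not yet assumed."""
    assumptions = set()
    for obs in observations:
        for head, body in RULES:
            if head[0] != obs[0] or len(head) != len(obs):
                continue
            theta = dict(zip(head[1:], obs[1:]))  # bind head variables
            lit = (body[0],) + tuple(theta.get(t, t) for t in body[1:])
            # match against an existing assumption before assuming anew
            if not any(compatible(lit, a) for a in assumptions):
                assumptions.add(lit)
    return assumptions

plans = explain([("cp", "test1.txt", "mydir"), ("rm", "test1.txt")])
```

On the two observed actions this produces exactly three assumptions, one per candidate plan; the rm observation re-uses the earlier move-file assumption instead of creating a second one.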
Example – Intelligent User Interfaces

Top-level plan predicates
– copy-file, move-file, remove-file

Action predicates
– cp, rm

Knowledge Base (KB)
cp(filename,destdir) | copy-file(filename,destdir)
cp(filename,destdir) | move-file(filename,destdir)
rm(filename) | move-file(filename,destdir)
rm(filename) | remove-file(filename)

Observed actions
cp(Test1.txt, Mydir)
rm(Test1.txt)
Abductive Inference

Step 1: backchain on cp(Test1.txt,Mydir) using cp(filename,destdir) | copy-file(filename,destdir)
– copy-file(Test1.txt,Mydir) – assumed literal

Step 2: backchain on cp(Test1.txt,Mydir) using cp(filename,destdir) | move-file(filename,destdir)
– move-file(Test1.txt,Mydir) – assumed literal

Step 3: backchain on rm(Test1.txt) using rm(filename) | move-file(filename,destdir)
– move-file(Test1.txt,Mydir) – matches existing assumption

Step 4: backchain on rm(Test1.txt) using rm(filename) | remove-file(filename)
– remove-file(Test1.txt) – assumed literal
Structure of Bayesian Network

copy-file(Test1.txt,Mydir)    move-file(Test1.txt,Mydir)    remove-file(Test1.txt)
cp(Test1.txt,Mydir)           rm(Test1.txt)

Edges run from copy-file and move-file to cp(Test1.txt,Mydir), and from move-file and remove-file to rm(Test1.txt)
Probabilistic Inference

Specifying probabilistic parameters
– Noisy-and
  o Specifies the CPT for combining the evidence from conjuncts in the body of the clause
– Noisy-or
  o Specifies the CPT for combining the evidence from disjunctive contributions from different ground clauses with the same head
  o Models "explaining away"
– Noisy-and and noisy-or models reduce the number of parameters learned from data
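A noisy-or CPT entry needs only one parameter per contributing cause: the head fails only if every active cause independently fails. A minimal sketch, with illustrative (not learned) θ values:

```python
def noisy_or(active_params):
    """P(head = true | active causes) = 1 - prod_i (1 - theta_i)."""
    p_all_fail = 1.0
    for theta in active_params:
        p_all_fail *= 1.0 - theta
    return 1.0 - p_all_fail

# e.g. rm(Test1.txt) explained by both move-file and remove-file,
# with hypothetical per-cause parameters 0.8 and 0.6:
p = noisy_or([0.8, 0.6])  # 1 - 0.2 * 0.4 = 0.92
```

With no active cause the head is false with certainty, and each additional parent adds one parameter rather than doubling the CPT, which is the linear-parameter property noted below.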
Probabilistic Inference

copy-file(Test1.txt,Mydir)    move-file(Test1.txt,Mydir)    remove-file(Test1.txt)
cp(Test1.txt,Mydir)           rm(Test1.txt)

With full CPTs, cp and rm each need 4 parameters (one per joint state of their two parents); with noisy-or, each needs only 2 (θ1, θ2 for cp and θ3, θ4 for rm)

Noisy models require parameters linear in the number of parents
Probabilistic Inference

Most Probable Explanation (MPE)
– For multiple plans, compute the MPE, the most likely combination of truth values to all unknown literals given this evidence

Marginal Probability
– For single top-level plan prediction, compute the marginal probability for all instances of the plan predicate and pick the instance with maximum probability

When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used
Probabilistic Inference

copy-file(Test1.txt,Mydir)    move-file(Test1.txt,Mydir)    remove-file(Test1.txt)
cp(Test1.txt,Mydir)           rm(Test1.txt)

Evidence: the observed actions cp(Test1.txt,Mydir) and rm(Test1.txt), each combined with noisy-or
Query variables: the top-level plan literals
MPE: move-file(Test1.txt,Mydir) = TRUE; copy-file(Test1.txt,Mydir) = FALSE; remove-file(Test1.txt) = FALSE
Parameter Learning

Learn noisy-or/noisy-and parameters using the EM algorithm adapted for BLPs [Kersting and De Raedt, 2008]

Partial observability
– In the plan recognition domain, data is partially observable
– Evidence is present only for observed actions and top-level plans; sub-goals, noisy-or, and noisy-and nodes are not observed

Simplify the learning problem
– Learn noisy-or parameters only
– Use logical-and instead of noisy-and to combine evidence from conjuncts in the body of a clause
Experimental Evaluation

– Monroe (strategic planning)
– Linux (intelligent user interfaces)
– Story Understanding
Monroe and Linux [Blaylock and Allen, 2005]

Task
– Monroe involves recognizing top-level plans in an emergency response domain (artificially generated using an HTN planner)
– Linux involves recognizing top-level plans based on Linux commands
– Single correct plan in each example

Data

         No. examples | Avg. observations/example | Top-level plan predicates | Observed action predicates
Monroe   1000         | 10.19                     | 10                        | 30
Linux    457          | 6.1                       | 19                        | 43
Monroe and Linux

Methodology
– Manually encoded the knowledge base
– Learned noisy-or parameters using EM
– Computed marginal probability for plan instances

Systems compared
– BALPs
– MLN-HCAM [Singla and Mooney, 2011]
  o MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues
– Blaylock and Allen's system [Blaylock and Allen, 2005]

Performance metric
– Convergence score – measures the fraction of examples for which the plan schema was predicted correctly
Results on Monroe

[Bar chart: convergence scores of BALPs, MLN-HCAM, and Blaylock & Allen; 94.2*]
* – Differences are statistically significant wrt BALPs
Results on Linux

[Bar chart: convergence scores of BALPs, MLN-HCAM, and Blaylock & Allen; 36.1*]
* – Differences are statistically significant wrt BALPs
Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]

Task
– Recognize characters' top-level plans based on actions described in narrative text
– Multiple top-level plans in each example

Data
– 25 examples in the development set and 25 examples in the test set
– 12.6 observations per example
– 8 top-level plan predicates
Story Understanding

Methodology
– Knowledge base was created for ACCEL [Ng and Mooney, 1992]
– Parameters set manually
  o Insufficient number of examples in the development set to learn parameters
– Computed MPE to get the best set of plans

Systems compared
– BALPs
– MLN-HCAM [Singla and Mooney, 2011]
  o Best performing MLN model
– ACCEL-Simplicity [Ng and Mooney, 1992]
– ACCEL-Coherence [Ng and Mooney, 1992]
  o Specific to Story Understanding
Results on Story Understanding

[Bar chart comparing BALPs, MLN-HCAM, ACCEL-Simplicity, and ACCEL-Coherence]
* – Differences are statistically significant wrt BALPs
Summary of BALPs for Plan Recognition

– Extend BLPs for plan recognition by employing logical abduction to construct Bayesian networks
– Automatic learning of model parameters using EM
– Empirical results on all benchmark datasets demonstrate advantages over existing methods
Part 2: Extending BLPs for Machine Reading
Machine Reading

Machine reading involves automatic extraction of knowledge from natural language text

Information extraction (IE) systems extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al., 2008]

Extracted information can be used for answering queries automatically
Limitations of IE Systems

Extract only information that is stated explicitly in text
– Natural language text is not necessarily complete
– Commonsense information is not explicitly stated in text
– Well known facts are omitted from the text

Missing information cannot be inferred from text
– The human brain performs deeper inference using commonsense knowledge
– IE systems have no access to commonsense knowledge

Errors in the extraction process result in some facts not being extracted
Our Objective

Improve performance of IE
– Learn general knowledge or commonsense information in the form of rules using the facts extracted by the IE system
– Infer additional facts that are not stated explicitly in text using the learned rules
Example

"Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor."

Extracted facts
nationState(malaysian)
Person(mahathir-mohamad)
isLedBy(malaysian,mahathir-mohamad)

Query: isLedBy(X,Y)
– Answerable from the extracted facts: X = malaysian, Y = mahathir-mohamad

Query: citizenOf(mahathir-mohamad,Y)
– Y = ? – not answerable from the extracted facts alone, even though the text implies it
Related Work

Learning propositional rules [Nahm and Mooney, 2000]
– Learn propositional rules from the output of an IE system on computer-related job postings
– Perform deductive inference to infer new facts

Learning first-order rules [Carlson et al., 2010; Schoenmackers et al., 2010; Doppa et al., 2010]
– Carlson et al. and Doppa et al. modify existing rule learners like FOIL and FARMER to learn probabilistic rules
  o Perform deductive inference to infer additional facts
– Schoenmackers et al. develop a new first-order rule learner based on statistical relevance
  o Use an MLN based approach to infer additional facts
Our Approach

– Use an off-the-shelf IE system to extract facts
– Learn commonsense knowledge from the extracted facts in the form of first-order rules
– Infer additional facts based on the learned rules using BLPs
  o Pure logical deduction results in inferring a large number of facts
  o Inference using BLPs is probabilistic in nature, i.e. inferred facts are assigned probabilities; the probabilities can be used to separate high-confidence facts from the rest
  o Efficient grounding mechanism in BLPs enables our approach to scale to natural language text
Extending BLPs for Machine Reading

SLD resolution does not result in inference of new facts
– Given a query, SLD resolution generates deductive proofs that prove the query

Employ forward chaining to infer additional facts
– Forward chain on extracted facts to infer new facts
– Use the deductive proofs from forward chaining to construct a Bayesian network for probabilistic inference
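The forward-chaining loop can be sketched as follows, using the citizenOf rule and the extracted facts from the running example. This is hypothetical helper code that hard-codes the single rule; the actual system applies all learned rules to fixpoint.

```python
# Extracted facts, as predicate tuples.
FACTS = {
    ("nationState", "malaysian"),
    ("person", "mahathir-mohamad"),
    ("isLedBy", "malaysian", "mahathir-mohamad"),
}

# Learned rule: nationState(B) ^ isLedBy(B, A) -> citizenOf(A, B)
def apply_rule(facts):
    new = set()
    for f1 in facts:
        if f1[0] != "nationState":
            continue
        b = f1[1]
        for f2 in facts:
            if f2[0] == "isLedBy" and f2[1] == b:
                new.add(("citizenOf", f2[2], b))
    return new

# Forward chain to fixpoint: stop when no new fact is derived.
facts = set(FACTS)
while True:
    derived = apply_rule(facts) - facts
    if not derived:
        break
    facts |= derived
```

The derivation used for each inferred fact (here, the ground rule instance that produced citizenOf) is exactly what gets turned into Bayesian-network structure in the next step.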
Learning First-order Rules

"Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA……."

Extracted facts
nationState(USA)
Person(BarackObama)
isLedBy(USA,BarackObama)
hasBirthPlace(BarackObama,USA)
citizenOf(BarackObama,USA)

Learning Patterns from Extracted Facts
nationState(USA) ∧ isLedBy(USA,BarackObama) → citizenOf(BarackObama,USA)
nationState(USA) ∧ isLedBy(USA,BarackObama) → hasBirthPlace(BarackObama,USA)
hasBirthPlace(BarackObama,USA) → citizenOf(BarackObama,USA)
...
Generalizing Patterns to Rules

nationState(Y) ∧ isLedBy(Y,X) → citizenOf(X,Y)
nationState(Y) ∧ isLedBy(Y,X) → hasBirthPlace(X,Y)
hasBirthPlace(X,Y) → citizenOf(X,Y)
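The generalization step above can be sketched as consistent variabilization: each constant is replaced by the same fresh variable everywhere it occurs in the pattern. A toy sketch (hypothetical helper, not the actual rule learner; the variable names differ from the slide only by renaming):

```python
def generalize(body_atoms, head_atom):
    """Lift a ground pattern to a first-order rule by replacing constants
    with variables, consistently across body and head."""
    mapping = {}  # constant -> variable, shared across the whole rule

    def var_for(const):
        if const not in mapping:
            # X, Y, Z, ... (fine for sketches with <= 3 distinct constants)
            mapping[const] = chr(ord("X") + len(mapping))
        return mapping[const]

    def lift(atom):
        return (atom[0],) + tuple(var_for(c) for c in atom[1:])

    return [lift(a) for a in body_atoms], lift(head_atom)

body, head = generalize(
    [("nationState", "USA"), ("isLedBy", "USA", "BarackObama")],
    ("citizenOf", "BarackObama", "USA"),
)
# body: nationState(X) ^ isLedBy(X,Y)   head: citizenOf(Y,X)
```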
Inductive Logic Programming (ILP) [Muggleton, 1992]

Inputs to the ILP rule learner

Target relation
citizenOf(X,Y)

Positive instances
citizenOf(BarackObama,USA)
citizenOf(GeorgeBush,USA)
citizenOf(IndiraGandhi,India)
...

Negative instances
citizenOf(BarackObama,India)
citizenOf(GeorgeBush,India)
citizenOf(IndiraGandhi,USA)
...

KB
hasBirthPlace(BarackObama,USA)
person(BarackObama)
nationState(USA)
nationState(India)
...

Output rules
nationState(Y) ∧ isLedBy(Y,X) → citizenOf(X,Y)
...
Example

"Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor."

Extracted facts
nationState(malaysian)
Person(mahathir-mohamad)
isLedBy(malaysian,mahathir-mohamad)

Learned rule
nationState(B) ∧ isLedBy(B,A) → citizenOf(A,B)

Inference of Additional Facts

nationState(malaysian) ∧ isLedBy(malaysian,mahathir-mohamad)
→ citizenOf(mahathir-mohamad,malaysian)
Experimental Evaluation

Data
– DARPA's intelligence community (IC) data set
– Consists of news articles on politics, terrorism, and other international events
– 2000 documents, split into training and test in the ratio 9:1
– Approximately 30,000 sentences

Information Extraction
– Information extraction using SIRE, IBM's information extraction system [Florian et al., 2004]
Experimental Evaluation

Learning first-order rules using LIME [McCreath and Sharma, 1998]
– Learns rules from only positive instances
– Handles noise in data

Models
– BLP-Pos-Only
  o Learn rules using only positive instances
– BLP-Pos-Neg
  o Learn rules using both positive and negative instances
  o Apply the closed world assumption to automatically generate negative instances
– BLP-All
  o Includes rules learned from BLP-Pos-Only and BLP-Pos-Neg
Experimental Evaluation

Specifying probabilistic parameters
– Manually specified parameters
– Logical-and model to combine evidence from conjuncts in the body of the clause
– Set noisy-or parameters to 0.9
– Set priors to maximum likelihood estimates
Experimental Evaluation

Methodology
– Learned rules for 13 relations including hasCitizenship, hasMember, attendedSchool
– Baseline
  o Perform pure logical deduction to infer all possible additional facts
– BLP-0.9
  o Facts inferred using the BLP approach with marginal probability greater than or equal to 0.9
– BLP-0.95
  o Facts inferred using the BLP approach with marginal probability greater than or equal to 0.95
Performance Evaluation

Precision
– Evaluate whether the inferred facts are correct, i.e. whether they can be inferred from the document
  o No ground truth information available
  o Perform manual evaluation
  o Sample 20 documents randomly from the test set and evaluate inferred facts manually for all models
Precision (preliminary results)

Model                               | BLP-All | BLP-Pos-Neg | BLP-Pos-Only
Baseline [no. facts inferred]       | 3813    | 75          | 1849
Baseline precision                  | 24.94   | 16.00       | 31.80
BLP-0.9 [no. facts inferred]        | 1208    | 66          | 1158
BLP-0.9 precision                   | 34.68   | 13.63       | 35.49
BLP-0.95 [no. facts inferred]       | 56      | 1           | 0
BLP-0.95 precision                  | 91.07   | 100         | Nil
Performance Evaluation

Estimated recall
– For each target relation, eliminate instances of it from the extracted facts in the test set
– Try to infer the eliminated instances correctly based on the remaining facts using the BLP approach
  o For each threshold point (0.1 to 1), compute the fraction of eliminated instances that were inferred correctly
  o True recall cannot be calculated because of the lack of ground truth
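A sketch of the estimated-recall computation described above: the fraction of deliberately eliminated facts that the model re-infers with marginal probability at or above a confidence threshold. All fact names and probability values here are made up for illustration.

```python
def estimated_recall(eliminated, inferred, threshold):
    """eliminated: set of held-out facts; inferred: {fact: marginal prob}.
    A held-out fact counts as recovered if it is re-inferred with
    probability >= threshold; facts never inferred count as misses."""
    if not eliminated:
        return 0.0
    hits = sum(1 for f in eliminated if inferred.get(f, 0.0) >= threshold)
    return hits / len(eliminated)

# Hypothetical held-out facts and re-inferred marginals:
eliminated = {"citizenOf(a,b)", "citizenOf(c,d)", "hasMember(e,f)", "hasMember(g,h)"}
inferred = {"citizenOf(a,b)": 0.97, "citizenOf(c,d)": 0.6, "hasMember(e,f)": 0.92}

r = estimated_recall(eliminated, inferred, 0.9)  # 2 of 4 re-inferred -> 0.5
```

Sweeping the threshold from 0.1 to 1 traces out the recall curves plotted on the next slide.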
Estimated Recall (preliminary results)

[Line plot: estimated recall vs. confidence threshold]
First-order Rules from LIME

politicalParty(A) ∧ employs(A,B) → hasMemberPerson(A,B)
building(B) ∧ eventLocation(A,B) ∧ bombing(A) → thingPhysicallyDamaged(A,B)
employs(A,B) → hasMember(A,B)
citizenOf(A,B) → hasCitizenship(A,B)
nationState(B) ∧ person(A) ∧ employs(B,A) → hasBirthPlace(A,B)
Proposed Work

Learning parameters automatically from data
– EM or gradient-ascent based algorithm adapted for BLPs by Kersting and De Raedt [2008]

Evaluation
– Human evaluation of inferred facts using Amazon Mechanical Turk [Callison-Burch and Dredze, 2010]
– Use inferred facts to answer queries constructed for evaluation in the DARPA-sponsored machine reading project
– Comparison to existing approaches like MLNs

Develop a structure learner for incomplete and uncertain data
Future Extensions

Multiple predicate learning
– Learn rules for multiple relations at the same time

Discriminative parameter learning for BLPs
– EM and gradient-ascent based algorithms for learning BLP parameters optimize the likelihood of the data
– No algorithm that learns parameters discriminatively for BLPs has been developed to date

Comparison of BALPs to other probabilistic logics for plan recognition
– Comparison to PRISM [Sato, 1995], Poole's Horn Abduction [Poole, 1993], and Abductive Stochastic Logic Programs [Tamaddoni-Nezhad et al., 2006]
Conclusions

Extended BLPs to abductive plan recognition
- Enhanced BLPs with logical abduction, resulting in Bayesian Abductive Logic Programs (BALPs)
- BALPs outperform state-of-the-art approaches to plan recognition on several benchmark data sets

Extended BLPs to machine reading
- Preliminary results on the IC data set are encouraging
- Perform an extensive evaluation in the immediate future
- Develop a first-order rule learner that learns from incomplete and uncertain data
98
Questions
99
Backup
100
Timeline

- Additional experiments comparing MLN-HCAM and BALPs for plan recognition [Target completion – August 2011]
  o Use SampleSearch to perform inference in MLN-HCAM
- Evaluation of BLPs for machine reading [Target completion – November 2011]
  o Learn parameters automatically
  o Evaluation using Amazon Mechanical Turk
  o Evaluation of BLPs for answering queries
  o Comparison of BLPs with MLNs for machine reading
- Developing a structure learner from positive and uncertain examples [Target completion – May 2012]
- Proposed defense and graduation [August 2012]

101
Completeness in First-order Logic
Completeness – if a sentence is entailed by a KB, then it is possible to find a proof of it

Entailment in first-order logic is semidecidable, i.e., it is not always possible to determine whether a sentence is entailed by a KB

Resolution is refutation-complete in first-order logic
- If a set of sentences is unsatisfiable, then it is possible to derive a contradiction
102
Forward chaining
For every implication p → q, if p is true, then q is concluded to be true
- Results in the addition of a new fact to the KB

Efficient, but incomplete

Inference can explode, and forward chaining may never terminate
- Addition of new facts might result in more rules being satisfied

It is data-driven, not goal-driven
- Might result in irrelevant conclusions
103
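The data-driven loop above can be sketched for the propositional case (illustrative only; real first-order forward chaining must also unify variables):

```python
def forward_chain(facts, rules):
    """Fire every rule (body, head) whose body is satisfied; repeat to fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(p in known for p in body):
                known.add(head)  # a new fact may satisfy further rules
                changed = True
    return known
```

Because the loop is driven by the facts rather than by a query, it derives everything it can, including conclusions irrelevant to any particular goal.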
Backward chaining
For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p

Efficient, but not complete
- May never terminate; might get stuck in an infinite loop
- Worst case is exponential

Goal-driven
104
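A propositional sketch of the goal-driven procedure. The `visited` set is one simple way to cut the infinite loops mentioned above; Prolog-style systems work on first-order literals instead:

```python
def backward_chain(query, facts, rules, visited=frozenset()):
    """Try to prove query from facts and (body, head) rules, depth-first."""
    if query in facts:
        return True
    if query in visited:  # already trying to prove this goal: cut the cycle
        return False
    return any(
        all(backward_chain(p, facts, rules, visited | {query}) for p in body)
        for body, head in rules
        if head == query
    )
```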
Herbrand Model Semantics
Herbrand universe
- All constants in the domain

Herbrand base
- All ground atoms over the Herbrand universe

Herbrand interpretation
- A set of ground atoms from the Herbrand base that are true

Herbrand model
- A Herbrand interpretation that satisfies all clauses in the knowledge base
105
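For a function-free language these sets are finite and easy to enumerate. A small sketch with made-up constants and predicates:

```python
from itertools import product

constants = ["john", "mary", "alice"]        # Herbrand universe
predicates = {"father": 1, "parent": 1}      # predicate name -> arity

# Herbrand base: every ground atom over the universe
herbrand_base = {
    (pred, args)
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
}

# A Herbrand interpretation is any subset of the base; a Herbrand model is
# an interpretation that additionally satisfies every clause in the KB.
interpretation = {("father", ("john",)), ("parent", ("john",))}
```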
Advantages of SRL models over vanilla probabilistic models
Compactly represent domain knowledge in first-order logic
Employ logical inference to construct ground networks
Enables parameter sharing
106
Parameter sharing in SRL

107

[Diagram: three ground instances – father(john) → parent(john), father(mary) → parent(mary), father(alice) → parent(alice) – each with its own parameter θ1, θ2, θ3, combined at a dummy node]
Parameter sharing in SRL

father(X) → parent(X)

108

[Diagram: the same three ground instances, all sharing the single parameter θ specified by the first-order rule]
Noisy-and Model

Several causes ci have to occur simultaneously for event e to occur
- ci fails to trigger e with probability pi
- inh accounts for some unknown cause due to which e fails to trigger

P(e) = (1 − inh) Πi (1 − pi), with P(e) = 0 if any ci = 0
109
Noisy-or Model

Several causes ci can each cause event e to occur
- ci independently triggers e with probability pi
- leak accounts for some unknown cause due to which e is triggered

P(e) = 1 − (1 − leak) Πi (1 − pi)^ci

Models explaining away
- If there are several causes of an event, and there is evidence for one of the causes, then the probability that the other causes caused the event goes down
110
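Both CPDs are cheap to compute. A sketch under the parameterization above (`c` holds the 0/1 cause values, `p` the per-cause probabilities; the exact formulas are this sketch's reading of the model, not lifted verbatim from the BLP implementation):

```python
def noisy_or(c, p, leak=0.0):
    """P(e=1): e fails only if the leak and every active cause all fail."""
    prod = 1.0
    for ci, pi in zip(c, p):
        prod *= (1.0 - pi) ** ci   # inactive causes contribute factor 1
    return 1.0 - (1.0 - leak) * prod

def noisy_and(c, p, inh=0.0):
    """P(e=1): every cause must be active, and none of them may fail."""
    prod = 1.0
    for ci, pi in zip(c, p):
        if not ci:
            return 0.0             # a required cause is absent
        prod *= (1.0 - pi)         # pi is cause i's failure probability
    return (1.0 - inh) * prod
```

With p = [0.2, 0.5] and both causes active, noisy_or gives 1 − 0.8 · 0.5 = 0.6; observing that one cause is present lowers the posterior contribution needed from the others, which is the explaining-away effect.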
Noisy-and And Noisy-or Models

111

[Network: ground Bayesian network over alarm(james), burglary(james), neighborhood(james), lives(james,yorkshire), and tornado(yorkshire)]
Noisy-and And Noisy-or Models

112

[Network: the same ground network with intermediate nodes dummy1 and dummy2; each rule's influence is combined with noisy/logical-and, and the dummy nodes feed alarm(james) through a noisy-or]
Inference in BLPs [Kersting and De Raedt, 2001]

Logical inference
- Given a BLP and a query, SLD resolution is used to construct proofs for the query

Bayesian network construction
- Each ground atom is a random variable
- Edges are added from the ground atoms in the body to the ground atom in the head
- CPTs are specified by the conditional probability distribution of the corresponding clause
- P(X1, ..., Xn) = Πi P(Xi | Pa(Xi))

Probabilistic inference
- Marginal probability given evidence
- Most Probable Explanation (MPE) given evidence

113
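The factored joint over the constructed ground network can be evaluated directly from the chain rule above. A toy sketch with hand-built CPTs (names and numbers are illustrative, not from the proposal's experiments):

```python
def joint_prob(assignment, parents, cpts):
    """P(X) = prod_i P(Xi | Pa(Xi)) for a full assignment of the network.
    cpts[var] maps (tuple of parent values, value of var) -> probability."""
    prob = 1.0
    for var, val in assignment.items():
        pa_vals = tuple(assignment[pa] for pa in parents[var])
        prob *= cpts[var][(pa_vals, val)]
    return prob
```

For a two-node network burglary → alarm with P(burglary) = 0.1 and P(alarm | burglary) = 0.95, the joint P(burglary, alarm) is 0.1 · 0.95 = 0.095.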
Learning in BLPs [Kersting and De Raedt, 2008]

Parameter learning
- Expectation Maximization
- Gradient-ascent based learning
- Both approaches optimize likelihood

Structure learning
- Hill-climbing search through the space of possible structures
- Initial structure obtained from CLAUDIEN [De Raedt and Dehaspe, 1997]
- Learns from only positive examples
114
Expectation Maximization for BLPs/BALPs

- Perform logical inference to construct a ground Bayesian network for each example
- Let r denote a rule, X a node, and Pa(X) the parents of X
- E step: compute the expected counts for each rule from the ground networks; the inner sum is over all groundings of rule r
- M step: re-estimate the parameters of rule r from these expected counts

From the SRL tutorial at ECML 07

115
Decomposable Combining Rules
Express different influences using separate nodes
These nodes can be combined using a deterministic function
116
Combining Rules

117

[Network: the alarm(james) example with intermediate nodes dummy1 and dummy2, each built with logical-and and aggregated into alarm(james) with a noisy-or]
Decomposable Combining Rules

118

[Network: as above, but the noisy-or is decomposed into separate nodes dummy1-new and dummy2-new, which are then combined with a deterministic logical-or]
BLPs vs. PLPs
Differences in representation
- In BLPs, Bayesian atoms take values from a finite set; in PLPs, each atom is logical in nature and can only take the values true or false
- Instead of having neighborhood(x) = bad as in BLPs, PLPs have neighborhood(x,bad)
- To compute the probability of a query like alarm(james), PLPs have to construct one proof tree over all possible values of all predicates
- Inference is cumbersome

BLPs subsume PLPs
119
BLPs vs. Poole's Horn Abduction
Differences in representation
- For example, if P(x) and R(x) are two competing hypotheses, then either P(x) or R(x) could be true
- The prior probabilities of P(x) and R(x) must sum to 1
- Restrictions of this kind are not present in BLPs
- PLPs, and hence BLPs, are more flexible and have a richer representation
120
BLPs vs. PRMs
BLPs subsume PRMs
- PRMs use entity-relationship models to represent knowledge, and they use a KBMC-like construction to build a ground Bayesian network
- Each attribute becomes a random variable in the ground network, and relations over the entities are logical constraints
- In BLPs, each attribute becomes a Bayesian atom and relations become logical atoms
- Aggregator functions can be transformed into combining rules
121
BLPs vs. RBNs
BLPs subsume RBNs
- In RBNs, each node in the BN is a predicate, and probability formulae are used to specify probabilities
- Combining rules can be used to represent these probability formulae in BLPs
122
BALPs vs. BLPs
Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks
Unlike BLPs, BALPs use logical abduction instead of deduction to construct the network
123
Monroe [Blaylock and Allen, 2005]
Task
- Recognize top-level plans in an emergency response domain (artificially generated using an HTN planner)
- Plans include set-up-shelter, clear-road-wreck, provide-medical-attention
- Single correct plan in each example
- Domain consists of several entities and sub-goals
- Tests the ability to scale to large domains

Data
- Contains 1000 examples
124
Monroe
Methodology
- Knowledge base constructed based on the domain knowledge encoded in the planner
- Learn noisy-or parameters using EM
- Compute the marginal probability for instances of top-level plans and pick the one with the highest marginal probability

Systems compared
  o BALPs
  o MLN-HCAM [Singla and Mooney, 2011]
  o Blaylock and Allen's system [Blaylock and Allen, 2005]

Convergence score – measures the fraction of examples for which the plan schema was predicted correctly
125
Learning Results - Monroe
126
MW MW-Start Rand-Start
Conv Score 98.4 98.4 98.4
Acc-100 79.16 79.16 79.86
Acc-75 46.06 44.63 44.73
Acc-50 20.67 20.26 19.7
Acc-25 7.2 7.33 10.46
Linux [Blaylock and Allen, 2005]
Task
- Recognize top-level plans based on Linux commands
- Human users were asked to perform tasks in Linux, and their commands were recorded
- Top-level plans include find-file-by-ext, remove-file-by-ext, copy-file, move-file
- Single correct plan in each example
- Tests the ability to handle noise in the data
  o Users indicate success even when they have not achieved the task correctly
  o Some top-level plans like find-file-by-ext and find-file-by-name have identical actions

Data
- Contains 457 examples
127
Linux
Methodology
- Knowledge base constructed based on knowledge of Linux commands
- Learn noisy-or parameters using EM
- Compute the marginal probability for instances of top-level plans and pick the one with the highest marginal probability

Systems compared
  o BALPs
  o MLN-HCAM [Singla and Mooney, 2011]
  o Blaylock and Allen's system [Blaylock and Allen, 2005]

Convergence score – measures the fraction of examples for which the plan schema was predicted correctly
128
Learning Results - Linux
129
[Chart: accuracy vs. degree of partial observability]

            MW      MW-Start   Rand-Start
Conv Score  39.82   46.6       41.57
Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]
Task
- Recognize characters' top-level plans based on actions described in narrative text
- Logical representation of action literals provided
- Top-level plans include shopping, robbing, restaurant dining, partying
- Multiple top-level plans in each example
- Tests the ability to predict multiple plans

Data
- 25 development examples
- 25 test examples
130
Story Understanding

Methodology
- Knowledge base constructed for ACCEL by Ng and Mooney [1992]
- Insufficient number of examples to learn parameters
  o Noisy-or parameters set to 0.9
  o Noisy-and parameters set to 0.9
  o Priors tuned on the development set
- Compute the MPE to get the best set of plans

Systems compared
  o BALPs
  o MLN-HCAM [Singla and Mooney, 2011]
  o ACCEL-Simplicity [Ng and Mooney, 1992]
  o ACCEL-Coherence [Ng and Mooney, 1992] – specific to Story Understanding
131
Other Applications of BALPs
- Medical diagnosis
- Textual entailment
- Computational biology
  o Inferring gene relations based on the output of micro-array experiments
- Any application that requires abductive reasoning
132
ACCEL [Ng and Mooney, 1992]

First-order logic based system for plan recognition

Simplicity metric selects explanations that make the fewest assumptions

Coherence metric selects explanations that connect the maximum number of observations
- Measures explanatory coherence
- Specific to text interpretation
133
System by Blaylock and Allen [2005]

Statistical n-gram models to predict plans based on observed actions

Performs plan recognition in two phases
- Predicts the plan schema first
- Predicts arguments based on the predicted schema
134
Machine Reading
Machine reading involves the automatic extraction of knowledge from natural language text

Approaches to machine reading
- Extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al., 2008]
- Extract commonsense knowledge about the domain [Nahm and Mooney, 2000; Carlson et al., 2010; Schoenmackers et al., 2010; Doppa et al., 2010]
135
Natural Language Text

Natural language text is not necessarily complete
- Commonsense information is not explicitly stated in text
- Well-known facts are omitted from the text

Missing information has to be inferred from text
- The human brain performs deeper inference using commonsense knowledge
- IE systems are limited to extracting information stated explicitly in text; they have no access to commonsense knowledge

136
IC ontology
57 entities, 79 relations
137
Estimated Precision (preliminary results)

Model                     BLP-ALL             BLP-Pos-Neg     BLP-Pos-Only
Baseline
  No. facts inferred      38,151              1,510           18,501
  Estimated precision     24.94 (951/3813)*   16.00 (12/75)*  31.80 (588/1849)*
BLP-0.9
  No. facts inferred      12,823              880             11,717
  Estimated precision     34.68 (419/1208)*   13.63 (9/66)*   35.49 (411/1158)*
BLP-0.95
  No. of facts inferred   826                 119             49
  Estimated precision     91.07 (51/56)*      100 (1/1)*      Nil (0/0)*

Total facts extracted – 12,254
* x out of y inferred facts were correct; these inferred facts are from the 20 documents randomly sampled from the test set for manual evaluation.

140
Estimated Recall

141

"Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA……."

Extracted facts
  nationState(USA)
  person(BarackObama)
  hasBirthPlace(BarackObama,USA)
  citizenOf(BarackObama,USA)
  isLedBy(USA,BarackObama)

Rule
  hasBirthPlace(A,B) → citizenOf(A,B)
Estimated Recall for citizenOf

142

Extracted facts
  nationState(USA)
  person(BarackObama)
  hasBirthPlace(BarackObama,USA)
  citizenOf(BarackObama,USA)   [eliminated for evaluation]
  isLedBy(USA,BarackObama)

hasBirthPlace(A,B) → citizenOf(A,B)

hasBirthPlace(BarackObama,USA) → citizenOf(BarackObama,USA) is inferred

Estimated Recall – 1/1
Estimated Recall for hasBirthPlace

143

Extracted facts
  nationState(USA)
  person(BarackObama)
  hasBirthPlace(BarackObama,USA)   [eliminated for evaluation]
  citizenOf(BarackObama,USA)
  isLedBy(USA,BarackObama)

hasBirthPlace(A,B) → citizenOf(A,B)

No rule infers hasBirthPlace, so the eliminated instance cannot be recovered

Estimated Recall – 0/1
Proposed Work

Learn first-order rules from incomplete and uncertain data
- Facts extracted from natural language text are incomplete
  o Existing rule/structure learners assume that data is complete
- Extractions from the IE system have associated confidence scores
  o Existing structure learners cannot use these extraction probabilities
- Absence of negative instances
  o Most rule learners require both positive and negative instances
  o The closed world assumption does not hold for machine reading

144