Statistical Relational Learning Pedro Domingos Dept. Computer Science & Eng. University of...

Statistical Relational Learning

Pedro Domingos

Dept. Computer Science & Eng.

University of Washington

Overview

Motivation
Some approaches
Markov logic
Application: Information extraction
Challenges and open problems

Motivation

Most learners only apply to i.i.d. vectors

But we need to do learning and (uncertain) inference over arbitrary structures: trees, graphs, class hierarchies, relational databases, etc.

All these can be expressed in first-order logic

Let’s add learning and uncertain inference to first-order logic

Some Approaches

Probabilistic logic [Nilsson, 1986]
Statistics and beliefs [Halpern, 1990]
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
Probabilistic relational models [Friedman et al., 1999]
Relational Markov networks [Taskar et al., 2002]
Markov logic [Richardson & Domingos, 2004]
Bayesian logic [Milch et al., 2005]
Etc.

Markov Logic

Logical formulas are hard constraints on the possible states of the world

Let’s make them soft constraints: when a state violates a formula, it becomes less probable, not impossible

Give each formula a weight (higher weight ⇒ stronger constraint)

More precisely: consider each grounding of a formula

Example: Friends & Smokers

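$$1.5 \qquad \forall x \;\; \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$$

$$1.1 \qquad \forall x, y \;\; \mathit{Friends}(x, y) \Rightarrow \big(\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y)\big)$$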

[Figure: ground Markov network over the atoms Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

Two constants: Anna (A) and Bob (B)
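Grounding can be made concrete with a short sketch. It enumerates every grounding of the two example formulas over the constants A and B. The string templates and the helper name `groundings` are illustrative assumptions, not part of Alchemy, and the weights 1.5 and 1.1 are taken as read from the example slide.

```python
from itertools import product

constants = ["A", "B"]  # Anna and Bob

# (weight, variables, template) for each first-order formula.
# Weights are assumed to be those shown on the example slide.
formulas = [
    (1.5, ("x",), "Smokes({x}) => Cancer({x})"),
    (1.1, ("x", "y"), "Friends({x},{y}) => (Smokes({x}) <=> Smokes({y}))"),
]


def groundings(variables, template, constants):
    """Yield one ground formula per assignment of constants to variables."""
    for values in product(constants, repeat=len(variables)):
        yield template.format(**dict(zip(variables, values)))


# Every ground formula inherits the weight of its first-order formula.
ground_formulas = [
    (weight, ground)
    for weight, variables, template in formulas
    for ground in groundings(variables, template, constants)
]

for weight, ground in ground_formulas:
    print(weight, ground)
```

With two constants, the one-variable formula has 2 groundings and the two-variable formula has 4, so the ground network is built from 6 weighted ground formulas.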

Markov Logic (Contd.)

Probability of a state x:

$$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i \, n_i(x) \right)$$

where $w_i$ is the weight of formula i and $n_i(x)$ is the no. of true groundings of formula i in x

Most discrete statistical models are special cases (e.g., Bayes nets, HMMs, etc.)

First-order logic is infinite-weight limit
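For a domain as small as Friends & Smokers with two constants, the probability of a state can be computed by brute force. The sketch below enumerates all 2^8 states, counts the true groundings of each formula, and normalizes. The weights 1.5 and 1.1 are assumed from the example slide, and all function names are illustrative.

```python
from itertools import product
from math import exp

constants = ["A", "B"]
W_CANCER, W_FRIENDS = 1.5, 1.1  # assumed weights from the example slide

# The 8 ground atoms of the two-constant domain.
atoms = (
    [("Smokes", c) for c in constants]
    + [("Cancer", c) for c in constants]
    + [("Friends", a, b) for a in constants for b in constants]
)


def counts(world):
    """Return (n_1, n_2): the number of true groundings of each formula."""
    n_cancer = sum(
        (not world[("Smokes", x)]) or world[("Cancer", x)] for x in constants
    )
    n_friends = sum(
        (not world[("Friends", x, y)])
        or (world[("Smokes", x)] == world[("Smokes", y)])
        for x in constants
        for y in constants
    )
    return n_cancer, n_friends


def score(world):
    """Unnormalized probability: exp of the weighted grounding counts."""
    n_cancer, n_friends = counts(world)
    return exp(W_CANCER * n_cancer + W_FRIENDS * n_friends)


# Enumerate every truth assignment to the atoms and sum the scores to get Z.
worlds = [
    dict(zip(atoms, values))
    for values in product([False, True], repeat=len(atoms))
]
Z = sum(score(w) for w in worlds)


def probability(world):
    return score(world) / Z
```

A state that violates one grounding of the first formula (Anna smokes but does not have cancer) is less probable than the all-false state by a factor of exp(1.5), not impossible. Enumeration is only feasible for toy domains, since the number of states doubles with every ground atom.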

Key Ingredients

Logical inference: Satisfiability testing

Probabilistic inference: Markov chain Monte Carlo

Inductive logic programming: Search with clause refinement operators

Statistical learning: Weight optimization by conjugate gradient
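As an illustration of the probabilistic inference ingredient, here is a minimal sketch of plain Gibbs sampling, one simple MCMC scheme, on the two-constant Friends & Smokers network. It is not claimed to be the sampler Alchemy uses. The weights 1.5 and 1.1 are assumed from the example slide, and `gibbs_marginal` is a hypothetical helper name.

```python
import random
from math import exp

constants = ["A", "B"]
W_CANCER, W_FRIENDS = 1.5, 1.1  # assumed weights from the example slide

atoms = (
    [("Smokes", c) for c in constants]
    + [("Cancer", c) for c in constants]
    + [("Friends", a, b) for a in constants for b in constants]
)


def log_score(world):
    """Weighted count of true groundings (log of the unnormalized probability)."""
    n_cancer = sum(
        (not world[("Smokes", x)]) or world[("Cancer", x)] for x in constants
    )
    n_friends = sum(
        (not world[("Friends", x, y)])
        or (world[("Smokes", x)] == world[("Smokes", y)])
        for x in constants
        for y in constants
    )
    return W_CANCER * n_cancer + W_FRIENDS * n_friends


def gibbs_marginal(query, evidence, num_samples=5000, burn_in=500, seed=0):
    """Estimate P(query = True | evidence) by Gibbs sampling."""
    rng = random.Random(seed)
    # Evidence atoms are clamped; all other atoms start at random values.
    world = {
        a: (evidence[a] if a in evidence else rng.random() < 0.5) for a in atoms
    }
    free = [a for a in atoms if a not in evidence]
    hits = 0
    for step in range(burn_in + num_samples):
        for atom in free:
            # Resample the atom from its conditional given all other atoms.
            world[atom] = True
            s_true = log_score(world)
            world[atom] = False
            s_false = log_score(world)
            p_true = 1.0 / (1.0 + exp(s_false - s_true))
            world[atom] = rng.random() < p_true
        if step >= burn_in:
            hits += world[query]
    return hits / num_samples


p_smoker = gibbs_marginal(("Cancer", "A"), {("Smokes", "A"): True})
p_nonsmoker = gibbs_marginal(("Cancer", "A"), {("Smokes", "A"): False})
```

In this model Cancer(A) appears in a single ground formula, so the exact answers are available for comparison: exp(1.5) / (1 + exp(1.5)), roughly 0.82, when Anna smokes, and 0.5 when she does not. The sampled estimates should approach these values as the number of samples grows.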

Alchemy

Open-source software available at: alchemy.cs.washington.edu
A new kind of programming language
Write formulas, learn weights, do inference
Haven’t we seen this before?
Yes, but without learning and uncertain inference

Example: Information Extraction

Parag Singla and Pedro Domingos, “Memory-Efficient Inference in Relational Domains” (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, Sound and Efficient Inference with Probabilistic and Deterministic Dependencies”, in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.

Segmentation

[Figure: the example citations, with each token labeled as Author, Title, or Venue]

Entity Resolution





State of the Art

Segmentation
  HMM (or CRF) to assign each token to a field

Entity resolution
  Logistic regression to predict same field/citation
  Transitive closure

Alchemy implementation: Seven formulas
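The transitive closure step of the pipeline approach can be sketched with a union-find structure: pairwise "same citation" predictions are merged into clusters, so that a match between C1 and C2 and a match between C2 and C3 also put C1 and C3 together. The function name and the example pairs are illustrative assumptions, not results from the talk.

```python
def transitive_closure(items, matched_pairs):
    """Group items into clusters implied by pairwise match decisions."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links to the cluster representative,
        # shortening the path along the way (path halving).
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    # Merge the clusters of every matched pair.
    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


citations = ["C1", "C2", "C3", "C4"]
predicted_matches = [("C1", "C2"), ("C2", "C3")]  # hypothetical classifier output
clusters = transitive_closure(citations, predicted_matches)
print(clusters)  # [['C1', 'C2', 'C3'], ['C4']]
```

In the pipeline, closure is a hard post-processing step applied after classification. In the Markov logic version, transitivity is instead written as weighted formulas and handled jointly with the other evidence.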

Types and Predicates

Types (optional):
  token = {Parag, Singla, and, Pedro, ...}
  field = {Author, Title, Venue, ...}
  citation = {C1, C2, ...}
  position = {0, 1, 2, ...}

Input predicate:
  Token(token, position, citation)

Output predicates:
  InField(position, field, citation)
  SameField(field, citation, citation)
  SameCit(citation, citation)

Formulas

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i’,c’) ^ InField(i’,+f,c’) => SameField(+f,c,c’)
SameField(+f,c,c’) <=> SameCit(c,c’)
SameField(f,c,c’) ^ SameField(f,c’,c”) => SameField(f,c,c”)
SameCit(c,c’) ^ SameCit(c’,c”) => SameCit(c,c”)

Formulas (segmentation rules, with a period test added to the sequence rule)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(“.”,i,c) <=> InField(i+1,+f,c)
f != f’ => (!InField(i,+f,c) v !InField(i,+f’,c))


Results: Segmentation on Cora

[Figure: precision (y-axis) vs. recall (x-axis), both from 0 to 1, with one curve per model: Tokens; Tokens + Sequence; Tok. + Seq. + Period; Tok. + Seq. + P. + Comma]

Results: Matching Venues on Cora

[Figure: precision (y-axis) vs. recall (x-axis), both from 0 to 1, with one curve per model: Similarity; Sim. + Relations; Sim. + Transitivity; Sim. + Rel. + Trans.]

Challenges and Open Problems

Scaling up learning and inference
Model design (aka knowledge engineering)
Generalizing across domain sizes
Continuous distributions
Relational data streams
Relational decision theory
Statistical predicate invention
Experiment design
