PhD in Computer Science and Engineering, Bologna, April 2016
Machine Learning
Marco Lippi
Marco Lippi Machine Learning 1 / 73
Statistical Relational Learning
Structured and relational data
In real-world problems, machine learning very often has to deal with highly structured and relational data.
Why do we need SRL?
Statistical learning assumes examples to be independent and identically distributed (i.i.d.)
Traditional logic approaches (ILP) can handle relations but assume no noise or uncertainty in data
Many application domains need both uncertainty handling and knowledge representation!
Why do we need SRL?
General observations
Improve performance on certain tasks
Improve interpretability of results
May be computationally more expensive
Learning could be much harder
Structured and relational data: tasks
Collective classification
Structured and relational data: tasks
Node classification
User profiles in a social network
Gene functions in a regulatory network
Congestions in a transportation network
Service requests in p2p networks
Fault diagnosis in sensor networks
Hypertext categorization on the Internet
. . .
Structured and relational data: tasks
Node classification
use attributes of each node
use attributes of neighbor nodes
use features coming from the graph structure
use labels of other nodes
In social networks, two important phenomena can occur:
homophily → a link between individuals is correlated with those individuals being similar in nature
co-citation regularity → similar individuals tend to be related/connected to the same things
Structured and relational data: tasks
Node classification
Iterative approaches:
1 label a few nodes, given existing labels and features
2 train using the newly labeled examples
3 repeat the procedure until all nodes are labeled
Random walk approaches:
the probability that a node v receives label y is the probability that a random walk starting at v ends at a node labeled with y
could be solved by label propagation or graph regularization
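As an illustration, the label-propagation idea can be sketched in a few lines of Python; the toy path graph, seed labels, and iteration count below are invented for the example:

```python
import numpy as np

# Toy undirected path graph 0-1-2-3-4 with two labeled seed nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n, num_classes = 5, 2
seeds = {0: 0, 4: 1}                       # node -> known class

A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

# F[i] holds the current label distribution of node i (uniform at start)
F = np.full((n, num_classes), 1.0 / num_classes)
for node, y in seeds.items():
    F[node] = np.eye(num_classes)[y]

for _ in range(50):
    F = A @ F                              # propagate from neighbors
    F /= F.sum(axis=1, keepdims=True)      # renormalize each row
    for node, y in seeds.items():          # clamp the seed labels
        F[node] = np.eye(num_classes)[y]

pred = F.argmax(axis=1)                    # most likely label per node
```

Nodes closer to seed 0 end up with class 0, nodes closer to seed 4 with class 1, mirroring the random-walk intuition above.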
Structured and relational data: tasks
Edge classification
Classify properties of edges
viral marketing: influence of a user on his neighbors
argumentation mining: support/attack relationships
social networks: types of interactions between users
In social networks there are two main theories:
balance → the friend of my friend is my friend, the enemy of my friend is my enemy
status → a positive edge between vi and vj means vj has a higher status than vi
Structured and relational data: tasks
Link prediction (or Link mining)
Structured and relational data: tasks
Link prediction
Friendship in a social network
Recommendation in a customer-product network
Interaction in a biological network
Link congestion in a transportation network
Link congestion in a p2p network
. . .
Structured and relational data: tasks
Networks are very often dynamic:
nodes may change over time
links may change over time
node properties may change over time
edge properties may change over time
Shall we predict the evolution of the network? E.g., given current links, which ones are likely to change?
Structured and relational data: tasks
Entity resolution
[Image from Entity Resolution Tutorial, VLDB2012, Lise Getoor]
Structured and relational data: tasks
Entity resolution
Same paper in a bibliography corpus
IP aliases over the Internet
User matching across social networks
Name disambiguation in a collection of documents
Multiple records across databases
Structured and relational data: tasks
Group detection
Structured and relational data: tasks
Group detection
Communities in mobile networks
Terrorist cells in social networks
Correlation in international flight databases
Document ranking and hypertext connectivity
Information retrieval and document clustering
. . .
Structured and relational data: tasks
Frequent subgraph discovery
Structured and relational data: tasks
Structure learning
Structured and relational data: tasks
Predicate invention
[Image from Khan et al., ILP 1998]
Statistical Relational Learning (SRL)
The “relational revolution” of SRL lies at the intersection of Artificial Intelligence, Logic, and Statistics
Logic
→ powerful and expressive formalism
→ describes a domain in terms of quantified logic formulae
→ allows one to easily include background knowledge
Probabilistic graphical models
→ represent dependencies between random variables
→ naturally handle uncertainty in data
→ e.g., Bayesian Networks, Markov Random Fields
Logic
Propositional logic
each proposition is associated to a symbol
symbols have an associated truth value (true/false)
logical operators (connectives) and inference at the propositional level
Example:
P = If it rains, Alice carries an umbrella
Q = It rains
R = Alice carries an umbrella
Can we infer (entail) R from P and Q?
Logic
First-Order Logic (FOL)
propositional logic is not expressive enough
FOL describes domains with objects, properties, and relations
it employs connectives as well as quantifiers
Constants: Alice, Italy, Snoopy, . . .
Variables: x, y, z, . . .
Functions: father_of, . . .
Predicates: blonde, friends, . . .
Atoms: blonde(Alice), friends(x, father_of(y))
Literals: blonde(Alice), ¬blonde(Alice)
Formulas: ∀x,y,z friends(x,y) ∧ friends(y,z) => friends(x,z)
First-Order Logic (FOL)
Inference in first-order logic is semi-decidable
Remark
Semi-decidability of logic entailment KB |= F implies that there is an algorithm which can always tell correctly when F is entailed by KB, but can provide either a negative answer or no answer at all in the case that F is not entailed by KB.
Inductive Logic Programming
A research area that started in the 1970s
Logic programming (Prolog)
Key ideas:
describe the domain of interest in terms of logic facts
background knowledge is also encoded in the logic database
learn concepts (rules) directly from data
try to entail all positive examples and none of the negatives
Inductive Logic Programming
Example: FOIL – First Order Inductive Learner
General-to-specific learner
Starts from an empty clause
Iteratively adds literals
Exploits information gain
Example:
strawberry(x)
strawberry(x) :- red(x)
strawberry(x) :- red(x), smaller_than_apple(x)
strawberry(x) :- red(x), smaller_than_apple(x), bigger_than_ribes(x)
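The greedy general-to-specific loop can be made concrete with a small Python sketch over propositional toy facts; the example fruits and candidate literals are invented, and real FOIL works over first-order literals with variable bindings:

```python
import math

# Toy examples: (set of true body literals, is_strawberry). Invented data.
examples = [
    ({"red", "smaller_than_apple"}, True),
    ({"red", "smaller_than_apple"}, True),
    ({"red"}, False),                     # e.g., an apple
    ({"smaller_than_apple"}, False),      # e.g., a blueberry
]
candidates = {"red", "smaller_than_apple", "bigger_than_ribes"}

def info(pos, neg):
    # FOIL's information value of a set with pos/neg examples
    return -math.log2(pos / (pos + neg))

def foil_gain(literal, covered):
    pos = sum(y for _, y in covered)
    neg = len(covered) - pos
    after = [(f, y) for f, y in covered if literal in f]
    after_pos = sum(y for _, y in after)
    if after_pos == 0:
        return float("-inf")              # literal kills all positives
    return after_pos * (info(pos, neg) - info(after_pos, len(after) - after_pos))

# General-to-specific: start from the empty body, greedily add the
# highest-gain literal until no negative example is covered.
body, covered = [], examples
while any(not y for _, y in covered):
    best = max(sorted(candidates - set(body)),
               key=lambda l: foil_gain(l, covered))
    body.append(best)
    covered = [(f, y) for f, y in covered if best in f]
```

On this data the loop learns the body red, smaller_than_apple, mirroring the clause refinement shown above.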
Graphical models
A graphical model is a probabilistic model where a graph encodes the conditional dependence between random variables
The model can be either directed or undirected:
directed → Bayesian networks
undirected → Markov networks
Bayesian networks
The network structure is a directed acyclic graph
It represents the joint probability of random variables
Nodes have associated local conditional probability tables
[Image from Russel & Norvig]
Bayesian networks
Given variables $X_1, \ldots, X_n$, the joint probability is computed as:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(x_i \mid pa_i)$$
where $pa_i$ are the parents of node $i$
Use Bayes' theorem to infer the values of unobserved variables
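The chain-rule factorization can be sketched in Python on an invented two-node network Rain → WetGrass with made-up probability tables:

```python
# P(Rain) and P(WetGrass | Rain): invented local conditional tables.
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    # chain rule: each factor is a lookup in a local conditional table
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# sanity check: the joint sums to 1 over all assignments
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
```

For instance, joint(True, True) = 0.2 · 0.9 = 0.18; larger networks work the same way, with one factor per node.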
Markov networks (or Markov random fields)
The network structure is an undirected graph
It represents the joint probability of random variables
Random variables have to satisfy Markov properties
Each clique has an associated potential function
$$P(X = x) = \frac{1}{Z} \prod_{C \in cl(G)} \phi_C(x_C)$$
[Image from washington.edu]
Markov networks (or Markov random fields)
Markov properties:
1 Any two non-adjacent variables are conditionally independent given all other variables
2 A variable is conditionally independent of all other variables given its neighbors
3 Any two subsets of variables are conditionally independent given a separating subset
[Image from washington.edu]
Markov networks (or Markov random fields)
Any Markov network can be written as a log-linear model:
$$P(X = x) = \frac{1}{Z} \exp\Big(\sum_k w_k^{\top} f(x_{\{k\}})\Big)$$
Remark
A Markov network can represent dependencies which a Bayesian network cannot (e.g., cyclic dependencies), and vice versa (e.g., induced dependencies)
Inference in graphical models
Probabilistic inference: inferring the posterior distribution of unobserved variables, given observed ones
Exact methods (#P-complete):
variable elimination
clique tree propagation
recursive conditioning
. . .
Approximate methods:
loopy belief propagation
Markov Chain Monte Carlo
importance sampling
. . .
Inference in graphical models
Example: Markov Chain Monte Carlo with Gibbs sampling
1 state ← random truth assignment
2 for i ← 1 to numSamples
3 for each variable x
4 sample x according to P(x |neighbors(x))
5 state ← state with new value of x
6 P(F )← fraction of states where F is true
[Slide from washington.edu]
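The pseudocode above can be turned into runnable Python for an invented two-variable model with a single potential rewarding agreement, P(x) ∝ exp(w · 1[x₀ = x₁]); the query F is the event x₀ = x₁, and the weight and sample count are arbitrary:

```python
import math
import random

random.seed(0)
w = 1.0                                    # invented coupling weight
state = {0: random.random() < 0.5, 1: random.random() < 0.5}

def p_true(var, state):
    # P(x_var = True | the other variable), from unnormalized potentials
    other = state[1 - var]
    num = math.exp(w * (other is True))
    den = num + math.exp(w * (other is False))
    return num / den

num_samples, agree = 10_000, 0
for _ in range(num_samples):
    for var in (0, 1):                     # resample each variable in turn
        state[var] = random.random() < p_true(var, state)
    agree += state[0] == state[1]

p_agree = agree / num_samples              # Monte Carlo estimate of P(F)
```

The estimate should approach the exact value e/(e+1) ≈ 0.73, computable here by hand since the model has only four states.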
Statistical Relational Learning (SRL)
The SRL alphabet soup:
Stochastic Logic Programs (SLPs) [Muggleton, 1996]
Relational Bayesian Networks (RBNs) [Jaeger, 1997]
Bayesian Logic Programs (BLPs) [Kersting & De Raedt, 2001]
Probabilistic Relational Models (PRMs) [Friedman et al., 2001]
Relational Dependency Networks (RDNs) [Neville & Jensen, 2007]
Problog [De Raedt et al., 2007]
Type Extension Trees (TETs) [Jaeger & al., 2013]
Learning From Constraints (LFC) [Gori & al., 2014]
Markov Logic Networks (MLNs) [Domingos & Richardson, 2006]
. . .
Markov Logic Networks
Markov Logic Networks [Domingos & Richardson, 2006]
An MLN is a set of FOL formulae with attached weights
An example
1.2 Friends(x,y) ∧ WatchedMovie(x,m) => WatchedMovie(y,m)
2.3 Friends(x,y) ∧ Friends(y,z) => Friends(x,z)
0.8 LikedMovie(x,m) ∧ Friends(x,y) => LikedMovie(y,m)
. . . . . .
The higher the weight of a clause → the lower the probability of a world violating that clause
What is a world or Herbrand interpretation?
→ A truth assignment to all ground predicates
Markov Logic Networks
Together with a (finite) set of (unique and possibly typed) constants, an MLN defines a Markov network which contains:
1 a binary node for each predicate grounding in the MLN, with value 0/1 if the atom is false/true
2 an edge between two nodes appearing together in (at least) one formula of the MLN
3 a feature for each formula grounding in the MLN, whose value is 0/1 if the formula is false/true, and whose weight is the weight of the formula
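A sketch of steps 1 and 2 (node and edge construction) in Python; the predicates, formulas, and constants are invented, and formulas are represented abstractly as lists of (predicate, argument-variables) literals rather than parsed FOL:

```python
import itertools

constants = ["Alice", "Bob"]
# Invented formulas: Smokes(x) => Cancer(x);
# Friends(x,y) ∧ Smokes(x) => Smokes(y).
formulas = [
    [("Smokes", ("x",)), ("Cancer", ("x",))],
    [("Friends", ("x", "y")), ("Smokes", ("x",)), ("Smokes", ("y",))],
]

def variables_of(formula):
    seen = []
    for _, args in formula:
        for a in args:
            if a not in seen:
                seen.append(a)
    return seen

nodes, edges = set(), set()
for formula in formulas:
    vs = variables_of(formula)
    for binding in itertools.product(constants, repeat=len(vs)):
        env = dict(zip(vs, binding))
        # ground every literal of the formula under this binding
        ground = {(pred, tuple(env[a] for a in args)) for pred, args in formula}
        nodes |= ground                    # one node per ground atom
        for a, b in itertools.combinations(sorted(ground), 2):
            edges.add((a, b))              # atoms co-occurring in a grounding
```

With two constants this yields 8 ground atoms and 9 edges; the O(N^arity) growth of the grounding is what makes inference expensive later on.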
Markov Logic Networks
Set of constants
people = {Alice, Bob, Carl, David}
movie = {BladeRunner, ForrestGump, PulpFiction, TheMatrix}
Friends(Alice,Bob)
Friends(Bob,Carl)
WatchedMovie(Alice,TheMatrix)
WatchedMovie(Bob,TheMatrix)
Link with graphical models
The semantics of MLNs induces a probability distribution over all possible worlds. Indicating with X the set of random variables represented in the model, we have:
$$P(X = x) = \frac{\exp\big(\sum_{F_i \in F} w_i n_i(x)\big)}{Z}$$
where $n_i(x)$ is the number of true groundings of formula $i$ in world $x$.
The definition is similar to the joint probability distribution induced by a Markov network and expressed with a log-linear model:
$$P(X = x) = \frac{\exp\big(\sum_j w_j f_j(x)\big)}{Z}$$
Discriminative setting
Typically, some atoms are always observed (evidence X), while others are unknown at prediction time (query Y)
EVIDENCE
Friends(Alice,Bob)
Friends(Bob,Carl)
WatchedMovie(Alice,PulpFiction)
WatchedMovie(David,BladeRunner)
. . .
QUERY
LikedMovie(Alice,BladeRunner)?
LikedMovie(Alice,PulpFiction)?
LikedMovie(Bob,BladeRunner)?
LikedMovie(Bob,TheMatrix)?
. . .
$$P(Y = y \mid X = x) = \frac{\exp\big(\sum_{F_i \in F_y} w_i n_i(x, y)\big)}{Z_x}$$
Discriminative setting
Three main MLN tasks:
1 Inference
2 Parameter Learning
3 Structure Learning
Popularity of Markov Logic:
→ Alchemy, an open-source software package
Inference
In the discriminative setting, inference corresponds to finding the most likely interpretation (MAP – Maximum A Posteriori):
$$y^* = \arg\max_y P(Y = y \mid X = x)$$
#P-complete problem → approximate algorithms
MaxWalkSAT [Kautz et al., 1996], stochastic local search
→ corresponds to minimizing the sum of weights of unsatisfied clauses
Note: there are also algorithms for estimating the probability of query atoms being true
Inference: MaxWalkSAT [Kautz et al., 1996]
Main ideas:
start with a random truth value assignment
flip the atom giving the highest improvement (greedy)
can get stuck in local minima
sometimes perform a random flip
stochastic algorithm (many runs often needed)
need to build the whole ground network!
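These ideas can be sketched compactly in Python on invented weighted propositional clauses; real MaxWalkSAT runs on the ground clauses of the MLN, and the clause set, walk probability, and iteration budget here are arbitrary:

```python
import random

# Each clause: (weight, [(variable, required_truth_value), ...]).
clauses = [
    (2.0, [("a", True)]),
    (1.0, [("a", False), ("b", True)]),
    (1.5, [("b", False), ("c", True)]),
]
variables = ["a", "b", "c"]

def cost(assign):
    # total weight of unsatisfied clauses (the quantity to minimize)
    return sum(w for w, lits in clauses
               if not any(assign[v] == t for v, t in lits))

def flip_cost(assign, var):
    trial = dict(assign, **{var: not assign[var]})
    return cost(trial)

random.seed(1)
assign = {v: random.random() < 0.5 for v in variables}
best = dict(assign)
for _ in range(1000):
    unsat = [lits for w, lits in clauses
             if not any(assign[v] == t for v, t in lits)]
    if not unsat:
        break                               # everything satisfied
    lits = random.choice(unsat)             # pick an unsatisfied clause
    if random.random() < 0.2:               # occasional random flip
        var = random.choice(lits)[0]
    else:                                   # greedy: cheapest flip
        var = min((v for v, _ in lits), key=lambda v: flip_cost(assign, v))
    assign[var] = not assign[var]
    if cost(assign) < cost(best):
        best = dict(assign)
```

On this toy instance the optimum a = b = c = true has cost 0, and the greedy flips find it quickly; the random-walk step is what lets the search escape local minima on harder instances.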
Inference: MaxWalkSAT [Kautz et al., 1996]
Main drawback
Memory explosion: with N constants and c the highest clause arity, the ground network requires O(N^c) memory
One possible solution
Exploit sparseness by lazily grounding clauses (LazySAT). . . but still not enough!
Lifted inference
Key ideas:
exploit symmetries
reason at first-order level
reason about groups of objects
scalable inference
Inference
In many cases, one would prefer to have a probability distribution over query atoms, rather than just inferring the most likely state.
Several attempts:
Markov Chain Monte Carlo over the Markov network induced by the Markov Logic Network and the given set of constants
Markov Chain SAT (MC-SAT), combining Markov Chain Monte Carlo and WalkSAT
Parameter Learning
Maximize the conditional likelihood of query predicates given evidence ones: this requires inference as a subroutine!
$$\frac{\partial}{\partial w_i} \log P(Y = y \mid X = x) = n_i - E_w[n_i]$$
Several different algorithms can be adopted to address the task:
Voted Perceptron
Contrastive Divergence
Diagonal Newton
(Preconditioned) Scaled Conjugate Gradient
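As a worked sketch of the gradient above, consider a single invented formula over two Boolean atoms in the generative (non-discriminative) case, so the expectation E_w[n_i] can be enumerated exactly:

```python
import itertools
import math

def n_true(world):
    # true groundings of the toy formula s => c (a single grounding)
    s, c = world
    return int((not s) or c)

worlds = list(itertools.product([False, True], repeat=2))
observed = (True, True)                  # invented training world
w, lr = 0.0, 0.5

for _ in range(100):                     # gradient ascent on log-likelihood
    Z = sum(math.exp(w * n_true(x)) for x in worlds)
    expected = sum(n_true(x) * math.exp(w * n_true(x)) for x in worlds) / Z
    w += lr * (n_true(observed) - expected)   # gradient = n_i - E_w[n_i]
```

Since the observed world attains the maximum count here, the gradient stays positive and w grows slowly without bound — one motivation for regularization or pseudo-likelihood in practice.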
Structure Learning
Directly infer the rules from the data!
A classic task for Inductive Logic Programming (ILP), which for MLNs can be addressed together with parameter learning
Modified ILP algorithms (e.g., Aleph)
Bottom-Up Clause Learning
Iterated Local Search
Structural Motifs
Still an open problem !
Structure Learning
Modified ILP algorithms
Two-step approach:
1 learn a set of clauses with FOIL
2 just learn the weights with Alchemy
It works quite well in practice, but it still relies on a non-probabilistic ILP framework!
Jointly learning clauses and weights should be better!
Structure Learning
Bottom-Up Clause Learning, Iterated Local Search, Structural Motifs
Start with unit clauses (single literal)
Add/Remove literal
Measure evaluation function (e.g. pseudo-likelihood)
Learn weights by counting groundings
Applications
Hypertext Classification
Topic(page,topic)
HasWord(page,word)
Link(page,page)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ∧ Link(p,q) => Topic(q,t)
Applications
Information Retrieval
InQuery(word)
HasWord(page,word)
Link(page,page)
Relevant(page)
HasWord(p,+w) ∧ InQuery(w) => Relevant(p)
Relevant(p) ∧ Link(p,q) => Relevant(q)
Applications
Entity Resolution
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r1) ∧ HasToken(+t,+f,r2) => SameField(f,r1,r2)
SameField(f,r1,r2) => SameRecord(r1,r2)
SameRecord(r1,r2) ∧ SameRecord(r2,r3) => SameRecord(r1,r3)
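The transitivity rule, applied as a hard rule for illustration (in the MLN it is soft and weighted), amounts to a closure over SameRecord facts:

```python
# Hard-logic illustration of
#   SameRecord(r1,r2) ∧ SameRecord(r2,r3) => SameRecord(r1,r3)
# record identifiers are hypothetical.
def transitive_closure(pairs):
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closed):
            for (c, d) in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

same = transitive_closure({("r1", "r2"), ("r2", "r3")})  # derives ("r1", "r3")
```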
Applications: a chemical OCR
From images to structured data. . .
IsOxygen(A1)
IsCarbon(A2)
IsCarbon(A3)
DoubleBond(A1,A2)
SingleBond(A1,A3)
. . .
http://mlocsr.dinfo.unifi.it
System Architecture
Two-stage architecture
1 Image Processing level
2 Markov Logic level
Main idea:
→ extract low-level features
→ translate them into Markov logic rules
→ infer the final structure
Original input image
Output of Image Processing stage
Output of Markov Logic stage
Applications: opponent modeling
Opponent modeling in games
Games have always been a great opportunity for AI
incomplete information
uncertainty in data
a lot of background knowledge
A nice domain also for SRL
Applications: opponent modeling
Opponent modeling in games
Understand an adversary's move given her past behavior:
describe the domain in terms of logic predicates
represent past played matches
try to learn strategies through probabilistic rules
Handling continuous features
In many cases it is necessary to use continuous features
classic machine learning classifiers naturally employ them
they are often crucial in order to build accurate predictors
→ How to integrate them with first-order logic?
Markov logic networks with grounding-specific weights
Solution: grounding-specific weights [Lippi & Frasconi, 2009]
Instead of having one weight per first-order logic rule, we allow different weights for different ground clauses.
Introducing vectors of features:

Node(N12,$Features_N12) ∧ Node(N23,$Features_N23) ⇒ Link(N12,N23)
Node(N18,$Features_N18) ∧ Node(N25,$Features_N25) ⇒ Link(N18,N25)
The two rules will have different weights, computed by using as feature vectors the "constants" marked with the dollar symbol ($).
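A minimal sketch of the idea, assuming a linear weight function (in the actual approach a neural network can play this role): each grounding's weight is computed from its feature vector and shared parameters θ.

```python
# Hypothetical sketch: each ground clause carries a feature vector
# (the "$Features" constants), and its weight is a learned function of
# that vector -- here a simple dot product with shared parameters theta.
theta = [0.5, -0.2, 1.0]

def grounding_weight(features, theta):
    # linear weight function; a neural network could replace this
    return sum(t * f for t, f in zip(theta, features))

w1 = grounding_weight([1.0, 0.0, 2.0], theta)  # weight for one grounding
w2 = grounding_weight([0.0, 1.0, 0.0], theta)  # a different grounding, different weight
```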
Markov logic networks with grounding-specific weights
Re-parametrize the MLN by computing each weight $w_i$ as a function of the variables of each specific grounding $c_{ij}$:

Standard MLN:

$$P(Y = y \mid X = x) = \frac{\exp\left( \sum_{F_i \in F_y} w_i \, n_i(x, y) \right)}{Z_x}$$

MLNs with grounding-specific weights:

$$P(Y = y \mid X = x) = \frac{\exp\left( \sum_{F_i \in F_y} \sum_j w_i(c_{ij}, \theta_i) \, n_{ij}(x, y) \right)}{Z_x}$$

Instead of having one weight per first-order logic rule, we allow different weights for different ground clauses.
Markov logic networks with grounding-specific weights
The weights $w_i(c_{ij}, \theta_i)$ can be computed in several ways, e.g. using neural networks that take as input an encoding of the grounding $c_{ij}$ (in principle, any feature can be used!)
Inference algorithms do not change.
Learning algorithms can implement gradient descent:

$$\frac{\partial P(y \mid x)}{\partial \theta_k} = \frac{\partial P(y \mid x)}{\partial w_i} \cdot \frac{\partial w_i}{\partial \theta_k}$$

where the first term is computed by MLN inference and the second one is computed by backpropagation.
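The chain rule can be checked numerically on a toy re-parametrization (w(θ) = θ² and an unnormalized exp(w·n) score; all names and values here are illustrative, not the actual model):

```python
import math

# Toy numerical check of the chain rule: dP/dθ = dP/dw · dw/dθ
N_COUNT = 3.0  # toy grounding count n(x, y)

def w(theta):
    # toy weight function of the parameters
    return theta ** 2

def dP_dw(w_val):
    # derivative of the unnormalized score exp(w * n) w.r.t. w
    return N_COUNT * math.exp(w_val * N_COUNT)

def dw_dtheta(theta):
    # derivative of the weight function w.r.t. theta
    return 2.0 * theta

theta = 0.5
grad = dP_dw(w(theta)) * dw_dtheta(theta)  # chain-rule product
```

A finite-difference check confirms the product of the two terms equals the full derivative, which is exactly what lets MLN inference and backpropagation be composed.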
Other frameworks
Stochastic Logic Programs (SLPs) [Muggleton, 1996]
Generalization of:
Hidden Markov Models
Probabilistic Context-Free Grammars
Based on the concept of stochastic clause:
λ : A ← B1, . . . , Bn

The weight λ represents the probability that the clause is used in a derivation, provided that its head is true
If some rules have no λ, the SLP is called impure, and it can allowboth stochastic and deterministic derivations
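A derivation step can be simulated by drawing one clause for a head predicate according to its λ. The clauses below are hypothetical strings for illustration, not a real SLP engine:

```python
import random

# Toy sketch of stochastic clause selection in an SLP: the clauses for
# one head predicate carry probabilities λ summing to 1, and a
# derivation step draws one of them accordingly.
clauses = [
    (0.4, "s(X) :- a(X)"),
    (0.6, "s(X) :- b(X)"),
]

def sample_clause(clauses, rng):
    # inverse-CDF sampling over the λ values
    r = rng.random()
    acc = 0.0
    for lam, clause in clauses:
        acc += lam
        if r < acc:
            return clause
    return clauses[-1][1]  # guard against floating-point round-off

rng = random.Random(0)
picks = [sample_clause(clauses, rng) for _ in range(2000)]
```

Over many draws, the second clause is chosen roughly 60% of the time, matching its λ.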
Relational Bayesian Networks (RBNs) [Jaeger, 1997]
Directed acyclic graph:
each node corresponds to a relation r in the domain
attached probability formula Fr (x1, . . . , xn) over the symbolsin the parent nodes of r
Semantics: for every relation r, the RBN associates a probability distribution over interpretations of r, which is defined on the interpretations of pa(r) within the underlying Bayesian network
Probabilistic Relational Models (PRMs) [Friedman et al., 2001]
Probability distribution over a relational database:
description of the relational schema of the domain
probabilistic dependencies among its objects
Given a set of ground objects, a PRM specifies a joint probabilitydistribution over all instantiations of a relational schema.
Probabilities on unseen variables, given partial instantiations
[Image from mit.edu]
Problog [De Raedt et al., 2007]
Probabilistic extension of Prolog
logic clauses with associated probability
probability of each clause is independent of the others
distribution over logic programs
P(L|T ) =∏cj∈L
pj∏cj /∈L
(1− pj)
Initially designed to answer probabilistic queries
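The distribution above can be computed directly for a toy program (the two probabilistic facts are invented for illustration):

```python
import itertools

# Direct computation of P(L|T) = Π_{c_j ∈ L} p_j · Π_{c_j ∉ L} (1 - p_j)
# for a toy ProbLog-style program: each probabilistic clause is
# included in the subprogram L independently with its probability.
program = {"0.7::edge(a,b)": 0.7, "0.4::edge(b,c)": 0.4}

def subprogram_prob(program, chosen):
    p = 1.0
    for clause, pj in program.items():
        p *= pj if clause in chosen else (1.0 - pj)
    return p

p_one = subprogram_prob(program, {"0.7::edge(a,b)"})  # 0.7 * (1 - 0.4)

# sanity check: probabilities over all 2^n subprograms sum to one
names = list(program)
total = sum(subprogram_prob(program, set(s))
            for k in range(len(names) + 1)
            for s in itertools.combinations(names, k))
```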
Type Extension Trees (TETs) [Jaeger & al., 2013]
Representation formalism for complex combinatorial features inrelational domains.
A TET is a tree which represents a feature or a set of features:
nodes contain literals, or conjunctions of literals
edges are labeled with sets of variables (possibly empty)
TETs are syntactically closely related to predicate logic formulae,where subtrees correspond to sub-formulae.
TETs have been used for feature discovery and for classification
Learning From Constraints (LFC) [Gori & al., 2014]
Combining kernel machines and first-order logic
Translate logic clauses (background knowledge) into a semantic regularization term, in addition to classic regularization
$$\min_f \sum_{i=1}^{N} V(f) + R(f) + S(f)$$
S(f) can encode generic constraints
logic rules transformed through p-norms
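As a rough illustration (not the exact LFC translation, which relies on t-norms and p-norms), a rule A(x) ⇒ B(x) over fuzzy truth values in [0,1] can be turned into a differentiable penalty:

```python
# Hypothetical sketch of a semantic regularization term S(f):
# the rule A(x) => B(x) is given a continuous truth degree via a
# product-t-norm surrogate, and the penalty accumulates violations.
def implies(a, b):
    # truth degree of a => b: equals 1 when a is false or b is true
    return 1.0 - a * (1.0 - b)

def constraint_penalty(pairs):
    # S(f): total violation of the rule over (A(x), B(x)) predictions
    return sum(1.0 - implies(a, b) for a, b in pairs)

# A true (1.0) with B false (0.0) is a full violation; the second
# example is nearly consistent with the rule
penalty = constraint_penalty([(1.0, 0.0), (0.2, 0.9)])
```

Because the penalty is differentiable in the predicted truth degrees, it can be minimized jointly with the fitting and regularization terms.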
Conclusions and future work
SRL aims at combining logic and statistical learning
→ crucial when dealing with relational data

The problem is far from being solved:
scalability
expressivity
more efficient approximation algorithms
extend to many other applications and contexts !
. . . How about incorporating deep learning ?