Active Learning for Multi-relational Data Construction
Hiroshi Kajino (1), Akihiro Kishimoto (2), Adi Botea (2), Elizabeth Daly (2), Spyros Kotoulas (2)
(1) The University of Tokyo, Japan; (2) IBM Research - Ireland
■ Research focus: Manual RDF data construction
We develop a method to support manual RDF data annotation
□ Some data are difficult to extract automatically from docs
Q: How can we efficiently construct the dataset by hand?
■ Our solution: Active learning + multi-relational learning
□ Reduce the number of queries as much as possible
□ Loop between the multi-relational model and the annotators:
1. Query labels of informative triples
2. Return labels
3. Update the dataset & retrain the model
■ Outline
□ Problem settings:
• Multi-relational (RDF) data and their applications
• Two formulations:
– Dataset construction problem
– Predictive model construction problem
□ Our solution (AMDC):
• Active learning
• Multi-relational learning
□ Experiments
■ Multi-relational dataset (RDF format)
Multi-relational dataset consists of binary-labeled triples
□ Triple: t = (i, j, k)
• Entity: i, j ∈ E
• Relation: k ∈ R
□ Label:
• t is positive ⇔ Entity i is in relation k with entity j
• t is negative ⇔ Entity i is not in relation k with entity j
□ Multi-relational dataset: (Δp, Δn)
• Δ: Set of all the triples
• Δp = {t ∈ Δ | t is positive}, Δn = {t ∈ Δ | t is negative}
• Assume: |Δp| ≪ |Δ|, some triples remain unlabeled
[Diagram: entities Dog, Animal, and Human linked by the relations "is a part of" and "is the same as"]
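A minimal Python sketch of the definitions above, assuming a toy entity and relation set borrowed from the diagram. The `Triple` type, the `split_by_label` helper, and the two example labels are illustrative assumptions, not part of the original slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A triple t = (i, j, k): entities i and j, relation k."""
    i: str
    j: str
    k: str


def split_by_label(labels):
    """Split a {Triple: +1 or -1} mapping into (positives, negatives)."""
    pos = {t for t, y in labels.items() if y == +1}
    neg = {t for t, y in labels.items() if y == -1}
    return pos, neg


# Toy entity set E and relation set R (names are assumptions).
entities = ["dog", "animal", "human"]
relations = ["is_a_part_of", "is_the_same_as"]

# Delta: the set of all the triples, |Delta| = |E| * |E| * |R|.
all_triples = {Triple(i, j, k) for i in entities for j in entities for k in relations}

# Illustrative labels only; most triples stay unlabeled.
labels = {
    Triple("dog", "animal", "is_a_part_of"): +1,
    Triple("animal", "dog", "is_the_same_as"): -1,
}
pos, neg = split_by_label(labels)
unlabeled = all_triples - pos - neg
```

Even in this tiny example the labeled set is a small fraction of Δ, which is the regime the slides assume.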
■ Motivation of manual construction
Some RDF datasets require hand annotation by nature
□ Knowledge base: Human knowledge encoded in RDF
Dataset                 Positive triple example
WordNet [Miller, 95]    (dog, canine, synset), (dog, poodle, hypernym)
ConceptNet [Liu+, 04]   (saxophone, jazz, UsedFor), (learn, knowledge, MotivatedByGoal)
Point: Commonsense knowledge rarely appears in docs
→ Difficult to extract it automatically from documents
□ Biological dataset:
[Diagram: Protein, DNA, and Cell cycle linked by the relations "interact" and "participate"]
→ Some unknown triples require experiments for labeling
■ Two problem formulations
Two problem settings reflect different usages of a dataset
□ Inputs:
• Set of entities E, relations R, annotator O: Δ → {+1, -1}
• Annotator: no error, B times access
□ Problem 1: Dataset construction problem
• Output: Positive triples Δp
• Note: Positive triples are usually quite few
□ Problem 2: Predictive model construction problem
• Output: Multi-relational model M: Δ → ℝ (degree of "positiveness")
• Note: The model can predict labels of unlabeled triples
※ More direct formulation than Prob. 1 if the model is the goal
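The annotator O can be sketched as an error-free oracle with a budget of B queries. This is a hypothetical simulation under those two stated properties; the class name `BudgetedOracle` and its interface are assumptions, not the authors' code.

```python
class BudgetedOracle:
    """Error-free annotator O: Delta -> {+1, -1}, limited to `budget` queries."""

    def __init__(self, positives, budget):
        self._positives = set(positives)  # ground-truth positive triples
        self.remaining = budget           # B: number of accesses still allowed

    def __call__(self, triple):
        if self.remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self.remaining -= 1
        return +1 if triple in self._positives else -1


# Usage with a single known positive triple (illustrative data).
oracle = BudgetedOracle({("dog", "animal", "is_a_part_of")}, budget=2)
first = oracle(("dog", "animal", "is_a_part_of"))
second = oracle(("dog", "human", "is_the_same_as"))
```

A third call would raise, which makes the budget constraint explicit when experimenting with query strategies.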
■ Active Multi-relational Data Construction
Our solution, AMDC, repeats learning and querying B times
□ Overview:
• Train the model using the current training dataset (Δp, Δn)
• AMDC is able to compute the predictive score s_t (t ∈ Δu):
Larger/smaller s_t ⇔ model believes t is pos/neg
• Compute the query score q_t (t ∈ Δu) using s_t:
Smaller q_t ⇔ t is informative for dataset construction
□ Loop between the multi-relational model and the annotators:
1. Query labels of informative triples
2. Return labels
3. Update the dataset & retrain the model
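The train/query/update loop can be sketched generically as follows. This is a sketch under assumptions: `active_loop`, its parameters, and the toy `oracle`, `train`, and `query_score` stand-ins are hypothetical and only show the control flow, not the actual AMDC model.

```python
def active_loop(unlabeled, oracle, train, query_score, budget, batch):
    """Repeat `budget` times: train, rank unlabeled triples, query the best ones."""
    pos, neg = set(), set()
    unlabeled = set(unlabeled)
    for _ in range(budget):
        if not unlabeled:
            break
        model = train(pos, neg)  # retrain on the current (pos, neg) dataset
        ranked = sorted(unlabeled, key=lambda t: query_score(model, t))
        for t in ranked[:batch]:                # 1. query the smallest q_t
            label = oracle(t)                   # 2. annotator returns the label
            (pos if label > 0 else neg).add(t)  # 3. update the dataset
            unlabeled.discard(t)
    return pos, neg


# Toy stand-ins: integer "triples", every fifth one positive, no real model.
oracle = lambda t: +1 if t % 5 == 0 else -1
train = lambda pos, neg: None
query_score = lambda model, t: t
pos, neg = active_loop(range(10), oracle, train, query_score, budget=2, batch=3)
```

Swapping `query_score` is the only change needed to move between the two problem settings, which matches the design described in the slides.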
■ Active Multi-relational Data Construction
We explain the details of AMDC in 2 parts
□ Details:
• Query scores q_t
• Multi-relational model, predictive score s_t
■ AMDC (1/2): Query scores
We employ two different query scores for the two problems
□ Given: predictive score s_t, threshold 0
s.t. s_t > 0 (< 0) ⇔ model believes t is positive (negative)
□ Query score q_t (t ∈ Δ): Query the labels of the triples {t} w/ smallest q_t
• Positiveness score (for Problem 1): q_t := -s_t
Choose triples the model believes to be positive
• Uncertainty score (for Problem 2): q_t := |s_t|
Choose triples the model is uncertain about
※ AMDC handles two problems just by switching the query score
[Diagram: s_t axis with threshold 0 separating pos (above) from neg (below)]
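The two query scores translate directly into a few lines of NumPy. The function names and the example score vector below are assumptions for illustration; only the formulas q_t := -s_t and q_t := |s_t| come from the slide.

```python
import numpy as np


def positiveness_score(s):
    """q_t := -s_t, so the smallest q_t are the most confidently positive triples."""
    return -np.asarray(s, dtype=float)


def uncertainty_score(s):
    """q_t := |s_t|, so the smallest q_t are the triples closest to the threshold 0."""
    return np.abs(np.asarray(s, dtype=float))


def select_queries(q, n):
    """Indices of the n triples with the smallest query score."""
    return np.argsort(q, kind="stable")[:n]


# Illustrative predictive scores for four unlabeled triples.
s = np.array([2.0, -0.1, 0.5, -3.0])
for_problem_1 = select_queries(positiveness_score(s), 2)  # indices 0 and 2
for_problem_2 = select_queries(uncertainty_score(s), 2)   # indices 1 and 2
```

With the same scores, Problem 1 picks the two highest-scoring triples while Problem 2 picks the two nearest to zero.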
■ AMDC (2/2): Multi-relational model
We add two constraints to RESCAL to avoid overfitting
□ RESCAL [Nickel+, 11]:
• Model:
– a_i ∈ ℝ^D: Latent vector of entity i
– R_k ∈ ℝ^(D×D): Latent matrix of relation k
• Predictive score: s_t = a_i^T R_k a_j
Large/small s_t ⇔ t is likely to be positive/negative
□ Additional constraints (new): |a_i| = 1, R_k = rotation matrix
• Reduce the degree of freedom
• Stabilize learning when only a few labels are available (at the beginning)
(→ experiments)
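A small NumPy sketch of the constrained score, assuming randomly initialized parameters. The helper names and the QR-based way of sampling a rotation matrix are assumptions chosen for illustration; the slides do not say how the constraints are enforced during training.

```python
import numpy as np


def random_unit_vector(rng, d):
    """Sample a latent entity vector a_i with |a_i| = 1."""
    v = rng.normal(size=d)
    return v / np.linalg.norm(v)


def random_rotation(rng, d):
    """Sample an orthogonal matrix with determinant +1 (a rotation matrix)."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q


def rescal_score(a_i, R_k, a_j):
    """Predictive score s_t = a_i^T R_k a_j."""
    return float(a_i @ R_k @ a_j)


rng = np.random.default_rng(0)
D = 4
a_i, a_j = random_unit_vector(rng, D), random_unit_vector(rng, D)
R_k = random_rotation(rng, D)
s_t = rescal_score(a_i, R_k, a_j)
```

One consequence of the two constraints is that s_t always lies in [-1, 1], since a rotation preserves the unit norm of a_j and the inner product of two unit vectors is bounded by 1.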
■ AMDC (2/2): Optimization problem for learning
Two objective functions are added to overcome the cons
□ pos AUC-loss: s(pos) > s(non-pos)
• Pros: Robust to pos/neg ratio; unlabeled triples are used
• Cons: Neg is not explicitly used; no threshold for pos/neg
□ neg AUC-loss: s(non-neg) > s(neg)
• Pros: Neg triples are explicitly used (→ experiments)
• Cons: No threshold between pos/neg
□ Classification loss: s(pos) > 0, s(neg) < 0
• Pros: Threshold between pos/neg → able to compute the uncertainty score
• Cons: Non-robust to pos/neg ratio; difficult to use unlabeled triples
□ Objective: min pos AUC-loss + neg AUC-loss + classification loss
[Diagram: positions of pos, neg, and unlabeled triples on the s_t axis under each loss, with threshold 0 for the classification loss]
■ AMDC (2/2): Optimization problem
Margin-based loss functions are optimized using SGD
□ Loss terms: s(pos) > s(non-pos), s(non-neg) > s(neg), s(pos) > 0, s(neg) < 0
□ Algorithm: Stochastic gradient descent
□ Parameters: a_i (i ∈ E), R_k (k ∈ R)
□ Hyperparameters: γ, γ', Cn, Ce, D
At each iteration, we choose the best model by using a validation set
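The exact formulas did not survive in this transcript, so the following is only a hinge-style sketch of how three margin-based terms of this kind could be combined. Everything about the form is an assumption: the hinge shape, the use of γ as the AUC margin, γ' as the classification margin, and Cn and Ce as the weights of the neg AUC-loss and the classification loss. The function name `amdc_style_objective` is hypothetical.

```python
import numpy as np


def hinge(x):
    """max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def amdc_style_objective(s_pos, s_neg, s_unl, gamma=1.0, gamma2=1.0, c_n=1.0, c_e=1.0):
    """Sum of three margin losses over scores of pos, neg, and unlabeled triples."""
    s_pos, s_neg, s_unl = (np.asarray(x, dtype=float) for x in (s_pos, s_neg, s_unl))
    non_pos = np.concatenate([s_neg, s_unl])
    non_neg = np.concatenate([s_pos, s_unl])
    # pos AUC-loss: every pos should outscore every non-pos by margin gamma.
    pos_auc = hinge(gamma - (s_pos[:, None] - non_pos[None, :])).sum()
    # neg AUC-loss: every non-neg should outscore every neg by margin gamma.
    neg_auc = hinge(gamma - (non_neg[:, None] - s_neg[None, :])).sum()
    # Classification loss: s(pos) > 0 and s(neg) < 0, with margin gamma2.
    cls = hinge(gamma2 - s_pos).sum() + hinge(gamma2 + s_neg).sum()
    return float(pos_auc + c_n * neg_auc + c_e * cls)


well_separated = amdc_style_objective([3.0], [-3.0], [0.0])
all_at_zero = amdc_style_objective([0.0], [0.0], [])
```

Well-separated scores incur no loss, while scores collapsed at the threshold are penalized by all three terms. In practice the gradient of such an objective with respect to a_i and R_k would be followed with SGD.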
■ Experiments
We evaluate 3 modifications using partial AMDCs
□ Purpose: Evaluate 3 contributions of AMDC in two problems
• Query scores (vs. AMDC + random query)
• Constraints on RESCAL (vs. AMDC - constraints)
• neg AUC-loss (vs. AMDC - neg-AUC)
□ Datasets:
• Annotators are simulated
Dataset                    #(Entity)  #(Relation)  #(Pos)   #(Neg)
Kinships [Denham, 73]      104        26           10,790   270,426
Nations [Rummel, 50-65]    125        57           2,565    8,626
UMLS [McCray, 03]          135        49           6,752    886,273
■ Experiments (1/2): Dataset construction problem
AMDC has collected 2.4 – 19 times as many positive triples as baselines
Score: %(pos triples collected by AMDC)
□ AMDC shows 2.4 – 19 times improvements over Random
□ Negative triples are helpful when they are abundant (K, U)
□ Effects of the constraints are incremental
[Figure: Completion rate vs. #(Queries) on Nations, UMLS, and Kinships. Curves: AMDC (full AMDC), AMDC rand (random), AMDC pos only (no neg-AUC), AMDC no const (no constraints). 10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations).]
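The completion rate plotted above can be read as the fraction of all positive triples collected so far. A minimal sketch, assuming that reading; the function name and the toy sets are hypothetical.

```python
def completion_rate(collected_pos, all_pos):
    """Fraction of the ground-truth positive triples that have been collected."""
    all_pos = set(all_pos)
    return len(set(collected_pos) & all_pos) / len(all_pos)


# Toy example: 2 of 4 positive triples found.
rate = completion_rate({"t1", "t2"}, {"t1", "t2", "t3", "t4"})
```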
■ Experiments (2/2): Predictive model construction problem
AMDC has achieved the best predictive score
Score: ROC-AUC
□ AMDC often achieves better AUC than Random (K, U)
□ Negative triples are also helpful to improve ROC-AUC
□ Constraints work to prevent overfitting
[Figure: ROC-AUC vs. #(Queries) on Kinships, Nations, and UMLS. Curves: AMDC (full AMDC), AMDC rand (random), AMDC pos only (no neg-AUC), AMDC no const (no constraints). 10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations).]
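For reference, ROC-AUC equals the probability that a randomly chosen positive triple scores higher than a randomly chosen negative one, with ties counted as one half. The pairwise implementation below is a sketch of that definition, not the evaluation code used in the experiments, and the example scores are made up.

```python
import numpy as np


def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the fraction of (pos, neg) pairs ranked correctly; ties count 0.5."""
    p = np.asarray(scores_pos, dtype=float)[:, None]
    n = np.asarray(scores_neg, dtype=float)[None, :]
    return float(((p > n) + 0.5 * (p == n)).mean())


# Three of the four (pos, neg) pairs are ordered correctly.
auc = roc_auc([0.9, 0.4], [0.5, 0.1])
```

The pairwise form also shows why the AUC-losses in the training objective target pairwise orderings of scores rather than a fixed threshold.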
■ Conclusions
We consider manual annotation problems of the RDF data
□ Manual RDF dataset construction is still demanding
• Some datasets require hand annotation by their nature
• Crowdsourcing provides an easy way of recruiting annotators
⇒ It's time to consider the manual construction problem!
□ AMDC = active learning + multi-relational learning
• RESCAL-based multi-relational learning
□ 3 key contributions lead to better performance
• Active learning significantly reduces the cost
• Constraints prevent overfitting
• Negative AUC-loss works better in case of skewed datasets
Thank you!