Active Learning for Multi-relational Data Construction


Hiroshi Kajino (1), Akihiro Kishimoto (2), Adi Botea (2), Elizabeth Daly (2), Spyros Kotoulas (2)

1: The University of Tokyo, Japan, 2: IBM Research - Ireland


■ Research focus: Manual RDF data construction

□ Some data are difficult to extract automatically from docs

Q: How can we efficiently construct the dataset by hand?

■ Our solution: Active learning + multi-relational learning

□ Reduce the number of queries as much as possible

We develop a method to support manual RDF data annotation

Multi-relational model ⇄ Annotators

1. Query labels of informative triples
2. Return labels
3. Update the dataset & retrain the model

■ Outline

□ Problem settings:

• Multi-relational (RDF) data and their applications

• Two formulations:

– Dataset construction problem

– Predictive model construction problem

□ Our solution (AMDC):

• Active learning

• Multi-relational learning

□ Experiments


■ Multi-relational dataset (RDF format)

□ Triple: t = (i, j, k)

• Entity: i, j ∈ E

• Relation: k ∈ R

□ Label:

• t is positive ⇔ Entity i is in relation k with entity j

• t is negative ⇔ Entity i is not in relation k with entity j

□ Multi-relational dataset: (Δp, Δn)

Δp = {t ∈ Δ | t is positive}, Δn = {t ∈ Δ | t is negative}

• Assume: |Δp| ≪ |Δ|, some triples remain unlabeled


Multi-relational dataset consists of binary-labeled triples

[Figure: example graph over the entities Dog, Animal, and Human, with edges labeled "is a part of" and "is the same as"]

Δ: Set of all the triples
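To make the notation concrete, here is a minimal sketch of how the triple sets could be held in memory. The entity names, relation names, and the particular labels below are illustrative assumptions, not data from the talk; only the set structure (all triples, positives, negatives, and the unlabeled remainder) follows the definitions above.

```python
from itertools import product

# Hypothetical toy entities and relations (illustrative names only).
entities = ["dog", "animal", "human"]
relations = ["is_a_part_of", "is_the_same_as"]

# Delta: the set of all triples t = (i, j, k) with i, j in E and k in R.
all_triples = set(product(entities, entities, relations))

# Assumed labels for illustration: Delta_p (positive) and Delta_n (negative).
positives = {("dog", "animal", "is_a_part_of"),
             ("human", "animal", "is_a_part_of")}
negatives = {("dog", "human", "is_the_same_as")}

# Every triple without a label so far stays unlabeled.
unlabeled = all_triples - positives - negatives

print(len(all_triples), len(positives), len(negatives), len(unlabeled))
```

Even in this tiny example most triples are unlabeled, which mirrors the assumption that positives are a small fraction of all triples.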


■ Motivation of manual construction

□ Knowledge base: Human knowledge encoded in RDF

Point: Commonsense knowledge rarely appears in docs

→ Difficult to extract it automatically from documents

Dataset                 Positive triple example
WordNet [Miller, 95]    (dog, canine, synset), (dog, poodle, hypernym)
ConceptNet [Liu+, 04]   (saxophone, jazz, UsedFor), (learn, knowledge, MotivatedByGoal)

□ Biological dataset:

[Figure: Protein, DNA, and Cell cycle connected by "interact" and "participate" relations]

→ Some unknown triples require experiments for labeling

Some RDF datasets require hand annotation by nature

■ Two problem formulations

□ Inputs:

• Set of entities E, relations R, annotator O: Δ → {+1, -1}

□ Problem 1: Dataset construction problem

• Output: Positive triples Δp

• Note: Positive triples are usually quite few

□ Problem 2: Predictive model construction problem

• Output: Multi-relational model M: Δ → ℝ

• Note: The model can predict labels of unlabeled triples

※ More direct formulation than Prob. 1 if the model is the goal

Two problem settings reflect different usages of a dataset

Annotator O: no errors; can be accessed B times

Model output M(t): degree of “positiveness”
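The annotator assumptions (no errors, limited access) can be sketched as a small oracle class. The class name, the `label` method, and the reading of "B times access" as a per-query budget are assumptions made for this illustration; the slides do not specify an interface.

```python
class BudgetedAnnotator:
    """Error-free annotator O: Delta -> {+1, -1} that answers at most B queries.

    Hypothetical helper for illustration; not an interface from the talk.
    """

    def __init__(self, true_positives, budget):
        self._true_positives = set(true_positives)
        self.remaining = budget  # B: number of queries still allowed

    def label(self, triple):
        if self.remaining <= 0:
            raise RuntimeError("query budget B exhausted")
        self.remaining -= 1
        # No labeling errors: the answer always matches the ground truth.
        return 1 if triple in self._true_positives else -1


oracle = BudgetedAnnotator({("dog", "animal", "is_a_part_of")}, budget=2)
print(oracle.label(("dog", "animal", "is_a_part_of")))  # positive triple
print(oracle.label(("dog", "human", "is_a_part_of")))   # negative triple
print(oracle.remaining)
```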


■ Active Multi-relational Data Construction

□ Overview:

Our solution, AMDC, repeats learning and querying B times

• Train the model using the current training dataset (Δp, Δn)

• Query labels of informative triples


AMDC computes a predictive score s_t for each unlabeled triple t ∈ Δu: larger/smaller s_t ⇔ the model believes t is pos/neg


Compute the query score q_t (t ∈ Δu) using s_t: smaller q_t ⇔ t is informative for dataset construction
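The overall loop (train, compute predictive scores, compute query scores, query, update) can be sketched as follows. This is a skeleton under stated assumptions, not the authors' implementation: the function name, its parameters, the batch size, and the toy score table standing in for a trained model are all hypothetical.

```python
def run_active_loop(pool, annotate, train, predict, query_score, rounds, batch):
    """Repeat train -> score -> query -> update for a fixed number of rounds."""
    positives, negatives = set(), set()
    unlabeled = set(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        model = train(positives, negatives)
        # Rank unlabeled triples by query score; smaller means more informative.
        # The triple itself is a tie-breaker so the order is deterministic.
        ranked = sorted(unlabeled,
                        key=lambda t: (query_score(predict(model, t)), t))
        for t in ranked[:batch]:
            unlabeled.discard(t)
            if annotate(t) == 1:
                positives.add(t)
            else:
                negatives.add(t)
    return positives, negatives, unlabeled


# Toy stand-ins (all hypothetical): a fixed score table plays the model's role.
toy_scores = {("a", "b", "r"): 2.0, ("b", "a", "r"): -1.0, ("a", "a", "r"): 0.5}
truth = {("a", "b", "r")}

pos, neg, rest = run_active_loop(
    pool=toy_scores,
    annotate=lambda t: 1 if t in truth else -1,
    train=lambda p, n: toy_scores,       # "training" just returns the table
    predict=lambda model, t: model[t],
    query_score=lambda s: -s,            # prefer triples believed positive
    rounds=1,
    batch=2,
)
print(pos, neg, rest)
```

Passing the query score as a function keeps the loop identical for both problem formulations; only the ranking criterion changes.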


■ Active Multi-relational Data Construction

□ Details:

• Query scores q_t

• Multi-relational model, predictive score s_t

We explain the details of AMDC in 2 parts



■ AMDC (1/2): Query scores

□ Given: predictive score s_t, threshold 0

s.t. s_t > 0 (< 0) ⇔ model believes t is positive (negative)

□ Query score q_t (t ∈ Δ): Query the labels of the triples {t} w/ smallest q_t

• Positiveness score (for Problem 1): q_t := -s_t

Choose triples the model believes to be positive

• Uncertainty score (for Problem 2): q_t := |s_t|

Choose triples that the model is uncertain about

※ AMDC handles two problems just by switching the query score

We employ two different query scores for the two problems
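The two query scores are simple functions of the predictive score, so a short sketch may help. The function names and the example scores are assumptions for illustration; the definitions (negated score for Problem 1, absolute value for Problem 2, smallest query score queried first) follow the slide.

```python
def positiveness_score(s):
    """Problem 1: small when the model believes the triple is positive."""
    return -s


def uncertainty_score(s):
    """Problem 2: small when the predictive score is near the threshold 0."""
    return abs(s)


def select_queries(scores, query_score, n):
    """Return the n triples with the smallest query score."""
    return sorted(scores, key=lambda t: query_score(scores[t]))[:n]


# Hypothetical predictive scores for four unlabeled triples.
scores = {"t1": 2.5, "t2": 0.1, "t3": -1.7, "t4": -0.2}

print(select_queries(scores, positiveness_score, 2))  # ['t1', 't2']
print(select_queries(scores, uncertainty_score, 2))   # ['t2', 't4']
```

With the positiveness score the two highest-scoring triples are chosen; with the uncertainty score the two triples closest to the threshold are chosen, regardless of sign.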


■ AMDC (2/2): Multi-relational model

□ RESCAL [Nickel+, 11]:

• Model:

– a_i ∈ ℝ^D: Latent vector of entity i

– R_k ∈ ℝ^(D×D): Latent matrix of relation k

• Predictive score: s_t = a_i^T R_k a_j

Large/small s_t ⇔ t is likely to be positive/negative

□ Additional constraints (new): |a_i| = 1, R_k = rotation matrix

• Reduce the degrees of freedom

• Stabilize learning when labels are scarce (at the beginning)

(→ experiments)

We add two constraints to RESCAL to avoid overfitting
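The bilinear score and the two constraints can be illustrated with a small pure-Python sketch in D = 2. The helper names and the example vectors are assumptions for illustration; only the score formula, the unit-norm constraint, and the rotation-matrix constraint come from the slide.

```python
import math


def rescal_score(a_i, R_k, a_j):
    """Bilinear score a_i^T R_k a_j for a triple (i, j, k)."""
    R_a_j = [sum(R_k[r][c] * a_j[c] for c in range(len(a_j)))
             for r in range(len(R_k))]
    return sum(a_i[r] * R_a_j[r] for r in range(len(a_i)))


def normalize(v):
    """Scale a vector to unit Euclidean norm (the constraint |a_i| = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rotation_2d(theta):
    """2x2 rotation matrix, the simplest case of the rotation constraint."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


a_i = normalize([3.0, 0.0])   # -> [1.0, 0.0]
a_j = normalize([0.0, 2.0])   # -> [0.0, 1.0]

# Rotating a_j by -90 degrees aligns it with a_i, so the score is about +1.
print(rescal_score(a_i, rotation_2d(-math.pi / 2), a_j))
# Rotating by +90 degrees points it the opposite way, so the score is about -1.
print(rescal_score(a_i, rotation_2d(math.pi / 2), a_j))
```

Under both constraints the score is the cosine of the angle between a_i and the rotated a_j, so it always lies in [-1, 1]. This bounded range is one way to see how the constraints limit the model's freedom.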


■ AMDC (2/2): Optimization problem for learning

Two objective functions are added to overcome the cons

min pos AUC-loss + neg AUC-loss + classification loss

□ pos AUC-loss: s(pos) > s(non-pos)

• Pros: Robust to pos/neg ratio; unlabeled triples are used

• Cons: Neg is not explicitly used; no threshold for pos/neg

□ neg AUC-loss (new): s(non-neg) > s(neg)

• Pros: Neg triples are explicitly used (→ experiments)

• Cons: No threshold between pos/neg

□ Classification error (new): s(pos) > 0, s(neg) < 0

• Pros: Threshold between pos/neg → able to compute the uncertainty score

• Cons: Non-robust to pos/neg ratio; difficult to use unlabeled triples


■ AMDC (2/2): Optimization problem

Margin-based loss functions are optimized using SGD

□ Objective: pos AUC-loss (s(pos) > s(non-pos)) + neg AUC-loss (s(non-neg) > s(neg)) + classification loss (s(pos) > 0, s(neg) < 0)

□ Algorithm: Stochastic gradient descent

□ Parameters: {a_i}, {R_k}

□ Hyperparameters: γ, γ', C_n, C_e, D

At each iteration, we choose the best model by using a validation set
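The exact objective is not recoverable from this transcript, so the following is only a sketch of one plausible margin-based (hinge) form of the three terms. The hinge surrogate, the roles given to γ, γ', C_n, and C_e, and the function names are all assumptions; only the three ordering constraints come from the slides.

```python
def hinge(margin, gap):
    """Penalty that is zero once the gap reaches the required margin."""
    return max(0.0, margin - gap)


def sketch_objective(pos, neg, unl, gamma=1.0, gamma_prime=1.0, c_n=1.0, c_e=1.0):
    """Assumed hinge form of: pos AUC-loss + neg AUC-loss + classification loss.

    pos, neg, unl are lists of predictive scores of positive, negative,
    and unlabeled triples.
    """
    non_pos = neg + unl   # triples a positive should outrank
    non_neg = pos + unl   # triples that should outrank a negative
    pos_auc = sum(hinge(gamma, p - u) for p in pos for u in non_pos)
    neg_auc = sum(hinge(gamma, u - n) for u in non_neg for n in neg)
    # Classification: push s(pos) above 0 and s(neg) below 0 by a margin.
    cls = (sum(hinge(gamma_prime, p) for p in pos)
           + sum(hinge(gamma_prime, -n) for n in neg))
    return pos_auc + c_n * neg_auc + c_e * cls


# Well-separated scores incur no loss; poorly separated scores are penalized.
print(sketch_objective(pos=[2.0], neg=[-2.0], unl=[0.0]))  # 0.0
print(sketch_objective(pos=[0.5], neg=[0.0], unl=[]))      # 2.5
```

In an SGD setting, each step would sample one term of these sums and update only the latent vectors and matrices involved in it. That update rule is not shown here because the slides do not give it.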


■ Experiments

□ Purpose: Evaluate 3 contributions of AMDC in two problems

• Query scores (vs. AMDC + random query)

• Constraints on RESCAL (vs. AMDC - constraints)

• neg AUC-loss (vs. AMDC - neg-AUC)

□ Datasets:

• Annotators are simulated

We evaluate 3 modifications using partial AMDCs

Dataset                   #(Entity)  #(Relation)  #(Pos)   #(Neg)
Kinships [Denham, 73]     104        26           10,790   270,426
Nations [Rummel, 50-65]   125        57            2,565     8,626
UMLS [McCray, 03]         135        49            6,752   886,273


■ Experiments (1/2): Dataset construction problem

Score: %(pos triples collected by AMDC)

□ AMDC shows 2.4 – 19 times improvement over Random

□ Negative triples are helpful when they are abundant (Kinships, UMLS)

□ Effects of the constraints are incremental

AMDC has collected 2.4 – 19 times as many positive triples as baselines

[Figure: completion rate vs. #(Queries) on Nations, UMLS, and Kinships; curves: Full AMDC, Random (AMDC rand), No neg-AUC (AMDC pos only), No constraints (AMDC no const)]

10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)
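As a rough reference point for the completion-rate curves, the sketch below computes the completion rate and the expected completion rate of uniform random querying. The function names are hypothetical, and the sketch assumes queries are drawn without replacement from all labeled Kinships triples, which may differ from the actual experimental protocol. The 16,000-query figure is simply the right edge of the plotted axis.

```python
def completion_rate(collected_positives, all_positives):
    """Fraction of all positive triples that have been collected so far."""
    all_positives = set(all_positives)
    return len(set(collected_positives) & all_positives) / len(all_positives)


def expected_random_completion(num_queries, num_triples):
    """Expected completion rate of uniform random querying without replacement.

    Each positive triple is queried with probability num_queries / num_triples,
    so the expected fraction of positives found equals that ratio.
    """
    return min(num_queries, num_triples) / num_triples


# Kinships counts from the dataset table: 10,790 positives + 270,426 negatives.
kinships_total = 10_790 + 270_426
print(kinships_total)
print(expected_random_completion(16_000, kinships_total))
print(completion_rate({"t1", "t3"}, {"t1", "t2", "t3", "t4"}))  # 0.5
```

Under these assumptions random querying would recover only a few percent of the positives within the plotted budget, which is the scale against which the reported 2.4 – 19 times improvement should be read.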


■ Experiments (2/2): Predictive model construction problem

Score: ROC-AUC

□ AMDC often achieves better AUC than Random (Kinships, UMLS)

□ Negative triples are also helpful to improve ROC-AUC

□ Constraints work to prevent overfitting

AMDC has achieved the best predictive score

[Figure: ROC-AUC vs. #(Queries) on Kinships, Nations, and UMLS; curves: Full AMDC, Random, No neg-AUC, No constraints]

10 trials, (Q, q) = (10^5, 10^3) ((2×10^3, 10^2) for Nations)
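For readers unfamiliar with the metric, ROC-AUC can be computed directly as the fraction of (positive, negative) pairs that the model ranks correctly. The sketch below uses this pairwise definition with hypothetical scores; it is quadratic in the number of triples and intended only as an illustration, not as the evaluation code used in the experiments.

```python
def roc_auc(pos_scores, neg_scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a correct pair.
    """
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical predictive scores for held-out triples.
pos_scores = [0.9, 0.4]
neg_scores = [0.5, 0.1, -0.3]

# 0.9 outranks all three negatives, 0.4 outranks two of them: 5 of 6 pairs.
print(roc_auc(pos_scores, neg_scores))
```

Because the metric depends only on the ranking of scores, it does not require a pos/neg threshold, which matches the pairwise constraint s(pos) > s(non-pos) used for the AUC losses.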


■ Conclusions

□ Manual RDF dataset construction is still in demand

• Some datasets require hand annotation by nature

• Crowdsourcing provides an easy way of recruiting annotators

⇒ It's time to consider the manual construction problem!

□ AMDC = active learning + multi-relational learning

• RESCAL-based multi-relational learning

□ 3 key contributions lead to better performance

• Active learning significantly reduces the cost

• Constraints prevent overfitting

• Negative AUC-loss works better in case of skewed datasets

We consider manual annotation problems of the RDF data


Thank you!
