Typed Tensor Decomposition of Knowledge Bases for Relation Extraction


Page 1: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Kai-Wei Chang, Scott Wen-tau Yih, Bishan Yang & Chris Meek
Microsoft Research

Typed Tensor Decomposition of Knowledge Bases for Relation Extraction

Page 2: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Useful resources for NLP applications
  • Semantic Parsing & Question Answering [e.g., Berant+, 2014]
  • Information Extraction [Riedel+, 2013]

Knowledge Base

• Examples: Freebase, DBpedia, YAGO, NELL, OpenIE/ReVerb
• Captures world knowledge by storing properties of millions of entities, as well as relations among them

Page 3: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• A knowledge base is never complete!
  • Extract previously unknown facts from new corpora
  • Predict new facts via inference
• Modeling multi-relational data
  • Statistical relational learning [Getoor & Taskar, 2007]
  • Path-ranking methods (e.g., random walk) [e.g., Lao+, 2011]
  • Knowledge base embedding
    • Very efficient
    • Better prediction accuracy

Reasoning with a Knowledge Base

Page 4: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Each entity e_i in the KB is represented by a vector a_i
• Predict whether a triple (e_i, r_k, e_j) is true with a scoring function
  • Linear: a relation-specific linear function of the two entity vectors, or Bilinear: a_iᵀ R_k a_j

• Recent work on KB embedding
  • RESCAL [Nickel+, ICML-11], SME [Bordes+, AISTATS-12], NTN [Socher+, NIPS-13], TransE [Bordes+, NIPS-13]
  • Train on existing facts (e.g., triples)
  • Ignore relational domain knowledge available in the KB (e.g., the ontology)

Knowledge Base Embedding
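To make the bilinear form concrete, here is a minimal NumPy sketch of the score a_iᵀ R_k a_j (names and the toy usage are illustrative, not from the slides):

    import numpy as np

    def bilinear_score(a_i, R_k, a_j):
        # a_i, a_j: r-dimensional entity vectors; R_k: r x r relation matrix.
        # A larger score indicates the triple (e_i, r_k, e_j) is more likely true.
        return a_i @ R_k @ a_j

    # Toy usage with random vectors (illustrative only)
    r = 4
    rng = np.random.default_rng(0)
    print(bilinear_score(rng.normal(size=r), rng.normal(size=(r, r)), rng.normal(size=r)))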

Page 5: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Example – type constraint: (x, born-in, y) can be true only if x is a person and y is a location

• Example – common sense: a triple can be true only if it is consistent with common-sense constraints on its arguments

Relational Domain Knowledge
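As a sketch of how such constraints can be checked mechanically, the mapping below is hypothetical and only mirrors the examples used in these slides (born-in: person/location; work-at: person/organization):

    # Hypothetical map: relation -> (legitimate subject type, legitimate object type)
    TYPE_CONSTRAINTS = {
        "born-in": ("person", "location"),
        "work-at": ("person", "organization"),
    }

    def type_compatible(relation, subj_type, obj_type):
        # A triple can be true only if its arguments have the legitimate types.
        legit_subj, legit_obj = TYPE_CONSTRAINTS[relation]
        return subj_type == legit_subj and obj_type == legit_obj

    print(type_compatible("born-in", "person", "location"))   # True
    print(type_compatible("born-in", "person", "company"))    # False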

Page 6: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• KB embedding via tensor decomposition
  • Entity vectors, relation matrices
• Relational domain knowledge
  • Type information and constraints
  • Only legitimate entities are included in the loss
• Benefits of leveraging type information
  • Faster model training
  • Highly scalable to large KBs
  • Higher prediction accuracy
• Application to relation extraction

Typed Tensor Decomposition – TRESCAL

Page 7: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Introduction
• KB embedding via tensor decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
• Discussion & conclusions

Road Map

Page 8: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Collection of subj-pred-obj triples:

Knowledge Base Representation (1/2)

Subject         Predicate     Object
Obama           Born-in       Hawaii
Bill Gates      Nationality   USA
Bill Clinton    Spouse-of     Hillary Clinton
Satya Nadella   Work-at       Microsoft
…               …             …

Page 9: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction


n: # entities, m: # relations

Page 10: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Knowledge Base Representation (2/2)

[Figure: the KB as a 3-way tensor 𝒳 of size n × n × m over entities e_1 … e_n; the k-th slice 𝒳_k corresponds to one relation, e.g., k = born-in, where the (Obama, Hawaii) entry is 1]

Page 11: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Knowledge Base Representation (2/2)


• A zero entry means either:
  • Incorrect (false)
  • Unknown
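A small sketch of how this tensor could be built from triples; dense NumPy arrays are used for readability only, since a KB of realistic size would require sparse matrices:

    import numpy as np

    def build_tensor(triples, entities, relations):
        # triples: iterable of (subject, relation, object) strings
        ent_id = {e: i for i, e in enumerate(entities)}
        rel_id = {r: k for k, r in enumerate(relations)}
        n = len(entities)
        X = [np.zeros((n, n)) for _ in relations]       # one n x n slice per relation
        for s, r, o in triples:
            X[rel_id[r]][ent_id[s], ent_id[o]] = 1.0    # 1 = known fact
        return X                                        # zero = false or unknown

    X = build_tensor([("Obama", "Born-in", "Hawaii")],
                     ["Obama", "Hawaii"], ["Born-in"])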

Page 12: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Objective:

Tensor Decomposition Objective

  (1/2) Σ_k ‖𝒳_k − A R_k Aᵀ‖²_F  +  (λ/2) ( ‖A‖²_F + Σ_k ‖R_k‖²_F )

  RESCAL [Nickel+, ICML-11]. The first term is the reconstruction error, the second the regularization; 𝒳_k is the slice for the k-th relation.

  [Figure: 𝒳_k ≈ A R_k Aᵀ]
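For reference, a direct NumPy sketch of the objective above (dense slices, illustrative names):

    import numpy as np

    def rescal_objective(X, A, R, lam):
        # X: list of n x n slices; A: n x r entity matrix; R: list of r x r relation matrices.
        recon = sum(0.5 * np.linalg.norm(Xk - A @ Rk @ A.T, 'fro') ** 2
                    for Xk, Rk in zip(X, R))
        reg = 0.5 * lam * (np.linalg.norm(A, 'fro') ** 2 +
                           sum(np.linalg.norm(Rk, 'fro') ** 2 for Rk in R))
        return recon + reg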

Page 13: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Measure the Degree of a Relationship

[Figure: score(Obama, born-in, Hawaii) = a_Obamaᵀ R_born-in a_Hawaii, i.e., the (Obama, Hawaii) entry of A R_born-in Aᵀ]

Page 14: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Introduction
• KB embedding via tensor decomposition
• Typed tensor decomposition (TRESCAL)
  • Basic idea
  • Training procedure
  • Complexity analysis
• Experiments
• Discussion & conclusions

Road Map

Page 15: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Reconstruction error:

Typed Tensor Decomposition Objective

  (1/2) Σ_k ‖𝒳_k − A R_k Aᵀ‖²_F

  [Figure: 𝒳_k ≈ A R_k Aᵀ]

Page 16: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Typed Tensor Decomposition Objective

• The same reconstruction error, illustrated for the relation born-in

Page 17: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Typed Tensor Decomposition Objective

• For born-in, the subject entities (rows of the slice) are people

Page 18: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Typed Tensor Decomposition Objective

• For born-in, the subject entities are people and the object entities (columns of the slice) are locations

Page 19: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Reconstruction error with type constraints:

Typed Tensor Decomposition Objective

  (1/2) Σ_k ‖𝒳'_k − A_k^l R_k (A_k^r)ᵀ‖²_F

  where 𝒳'_k is the sub-matrix of 𝒳_k restricted to entities with legitimate subject/object types for relation k, and A_k^l, A_k^r are the corresponding sub-matrices of A.

  [Figure: 𝒳'_k ≈ A_k^l R_k (A_k^r)ᵀ]

Page 20: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Procedure – Alternating Least-Squares (ALS) Method

Fix {R_k}, update A:

  A ← [ Σ_k 𝒳_k A R_kᵀ + 𝒳_kᵀ A R_k ] [ Σ_k B_k + C_k + λI ]⁻¹

  where B_k = R_k AᵀA R_kᵀ and C_k = R_kᵀ AᵀA R_k.

Fix A, update R_k:

  vec(R_k) ← ( Zᵀ Z + λI )⁻¹ Zᵀ vec(𝒳_k)

  where vec(·) denotes vectorization and Z = A ⊗ A (⊗: Kronecker product).
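A minimal dense-matrix sketch of these two ALS updates; it is only feasible for toy-sized data because Z = A ⊗ A has n² rows, and names are illustrative:

    import numpy as np

    def update_A(X, A, R, lam):
        n, r = A.shape
        num = np.zeros((n, r))
        den = lam * np.eye(r)
        AtA = A.T @ A
        for Xk, Rk in zip(X, R):
            num += Xk @ A @ Rk.T + Xk.T @ A @ Rk
            den += Rk @ AtA @ Rk.T + Rk.T @ AtA @ Rk    # B_k + C_k
        return num @ np.linalg.inv(den)

    def update_R(X, A, lam):
        n, r = A.shape
        Z = np.kron(A, A)                               # vec(A Rk A^T) = Z vec(Rk)
        G = np.linalg.solve(Z.T @ Z + lam * np.eye(r * r), Z.T)
        # vec() is column-stacking, hence order='F'
        return [(G @ Xk.reshape(-1, order='F')).reshape(r, r, order='F') for Xk in X]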


Page 23: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Procedure – Alternating Least-Squares (ALS) Method (with type constraints)

Fix {R_k}, update A:

  A ← [ Σ_k 𝒳'_k A_k^r R_kᵀ + (𝒳'_k)ᵀ A_k^l R_k ] [ Σ_k B_k^r + C_k^l + λI ]⁻¹

  where B_k^r and C_k^l are the analogues of B_k and C_k computed from A_k^r and A_k^l.

Fix A, update R_k:

  vec(R_k) ← ( (A_k^r)ᵀ A_k^r ⊗ (A_k^l)ᵀ A_k^l + λI )⁻¹ vec( (A_k^l)ᵀ 𝒳'_k A_k^r )
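A sketch of the typed R_k update above; thanks to the Kronecker structure, only an r² × r² system is solved per relation (A_k^l, A_k^r, and X'_k as defined earlier, names illustrative):

    import numpy as np

    def update_R_typed(Xk_sub, A_l, A_r, lam):
        # Xk_sub = X'_k (restricted slice); A_l = A_k^l; A_r = A_k^r.
        r = A_l.shape[1]
        K = np.kron(A_r.T @ A_r, A_l.T @ A_l) + lam * np.eye(r * r)
        rhs = (A_l.T @ Xk_sub @ A_r).reshape(-1, order='F')   # vec(A_l^T X'_k A_r)
        return np.linalg.solve(K, rhs).reshape(r, r, order='F')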

Page 24: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Without type information (RESCAL), the per-iteration cost depends on:
  • n: # entities
  • p: # non-zero entries
  • r: # dimensions of the projected entity vectors

• With type information (TRESCAL):
  • ñ: the average # entities satisfying the type constraint replaces n, and ñ is much smaller

Complexity Analysis

Page 25: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Introduction
• KB embedding via tensor decomposition
• Typed tensor decomposition (TRESCAL)
• Experiments
  • KB completion
  • Application to relation extraction
• Discussion & conclusions

Road Map

Page 26: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• KB – Never-Ending Language Learning (NELL)
  • Training: version 165
  • Development: new facts added between v.166 and v.533
  • Testing: new facts added between v.534 and v.745

• Data statistics of the training set:

Experiments – KB Completion

# Entities 753k

# Relation Types 229

# Entity Types 300

# Entity-Relation Triples 1.8M

Page 27: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Entity retrieval: rank one positive entity against 100 negative entities
• Relation retrieval: positive entity pairs with an equal number of negative pairs
• Baselines: RESCAL [Nickel+, ICML-11] and TransE [Bordes+, NIPS-13]

Tasks & Baselines
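For reference, a generic sketch of average precision for one entity-retrieval query (one positive ranked against 100 negatives); MAP is the mean over queries. This is not the authors' evaluation script:

    import numpy as np

    def average_precision(scores, labels):
        # scores: model scores for the candidates; labels: 1 for the positive entity, 0 otherwise.
        order = np.argsort(-np.asarray(scores))
        hits, ap = 0, 0.0
        for rank, idx in enumerate(order, start=1):
            if labels[idx]:
                hits += 1
                ap += hits / rank
        return ap / max(sum(labels), 1)

    # MAP over queries: np.mean([average_precision(s, y) for s, y in queries])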

Page 28: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Time Reduction

• Both models finish training in 10 iterations.
• TRESCAL filters out 96% of entity triples with incompatible types.

[Bar chart: model training time (hours) – RESCAL 20.5 vs. TRESCAL 4.46, a 4.6x speed-up]

Page 29: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Training Time Reduction

• # iterations for TransE is set to 500 (the default value).

[Bar chart: model training time (hours) – TransE 96 vs. TRESCAL 4.46, a 21.5x speed-up]

Page 30: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Entity Retrieval

[Bar chart: mean average precision (MAP) for entity retrieval – 67.56%, 62.91%, 69.26% for the three compared models]

Page 31: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Retrieval

[Bar chart: mean average precision (MAP) for relation retrieval – 70.71%, 73.08%, 75.70% for the three compared models]

Page 32: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Experiments โ€“ Relation Extraction

Satya Nadella is the CEO of Microsoft.

→ (Satya Nadella, work-at, Microsoft)

Page 33: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Row: entity pair
• Column: relation

Relation Extraction as Matrix Factorization [Riedel+, 2013]

[Figure: Fig. 1 of Riedel+, 2013]

Page 34: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• Raw data: NY Times corpus & Freebase
  • Entities in NY Times and Freebase are aligned
• Raw tensor construction
  • 80,698 entities & 1,652 relations
  • Type information from Freebase & NER
  • Type constraints are derived from training data

• Task – identify Freebase relations of entity pairs in text
  • 10,000 entity pairs; 2,048 have both entities in Freebase
  • Evaluation metric – weighted mean average precision (MAP) over 19 relations

Data & Task Description
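As a rough sketch of the weighted-MAP metric, one common definition weights each relation's average precision, e.g., by its number of true entity pairs; the exact weighting used here is not shown on the slide, so the weights below are an assumption:

    def weighted_map(ap_per_relation, weight_per_relation):
        # ap_per_relation: {relation: average precision}
        # weight_per_relation: {relation: weight}, e.g., # true entity pairs (assumption)
        total = sum(weight_per_relation.values())
        return sum(ap_per_relation[r] * weight_per_relation[r]
                   for r in ap_per_relation) / total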

Page 35: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Extraction

[Bar chart: weighted MAP of the five compared systems – 0.49, 0.52, 0.58, 0.70, 0.72 (updated version)]

• Evaluated using only the 2,048 entity pairs that appear in Freebase

Page 36: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

Relation Extraction

[Bar chart: weighted MAP of the five compared systems – 0.33, 0.36, 0.39, 0.47, 0.57]

• Evaluated using all 10,000 entity pairs

Page 37: Typed Tensor Decomposition of Knowledge Bases for  Relation Extraction

• TRESCAL: a KB embedding model via tensor decomposition
  • Leverages entity type constraints
    • Faster model training
    • Highly scalable to large KBs
    • Higher prediction accuracy
  • Applied to relation extraction

• Challenges & future work
  • Capture more types of relational domain knowledge
  • Support more sophisticated inferential tasks

Conclusions