Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings

Kazuma Hashimoto & Yoshimasa Tsuruoka

University of Tokyo

08/08/2016 ACL 2016 in Berlin, Germany

Session 1F: Noncompositionality

Learning Phrase Embeddings

• Embedding meanings of phrases into a vector space
  – e.g., recurrent neural networks

[Figure: compositional semantics. A composition function maps the word embeddings 𝒗(catch) and 𝒗(eye) to the phrase embedding 𝒗(catch eye), and 𝒗(grab) and 𝒗(attention) to 𝒗(grab attention). The phrases "catch eye" and "grab attention" are shown in the phrase embedding space.]

Compositional or Non-Compositional?

• Not all phrases are compositional

[Figure: two embeddings for "catch eye". The compositional embedding 𝒄(catch eye) is computed by the composition function from 𝒗(catch) and 𝒗(eye); the non-compositional embedding 𝒏(catch_eye) treats the phrase as a single unit. Phrases shown in the embedding space: catch ear, catch eye, catch e-mail, grab attention, catch attention.]

Selecting one of them is not always the best option!

This Work: Adaptive Joint Learning

[Figure: a phrase 𝑝 (here 𝑝 = 𝑉𝑂) is passed to compositionality detection, which outputs 𝛼(𝑝) ∈ [0, 1]. The compositional embedding 𝒄(𝑝), weighted by 𝛼(𝑝), and the non-compositional embedding 𝒏(𝑝), weighted by 1 − 𝛼(𝑝), are combined into the phrase embedding 𝒗(𝑝).]

• State-of-the-art results on
  – Compositionality detection for Verb-Object (VO) pairs
  – Transitive verb disambiguation

Adaptive Weighted Addition

• Very simple weighted addition of 𝒄(𝑝) and 𝒏(𝑝)
  – 𝑝: phrase
  – 𝛼(𝑝): compositionality score

𝒗(𝑝) = 𝛼(𝑝) 𝒄(𝑝) + (1 − 𝛼(𝑝)) 𝒏(𝑝)

[Equation annotations: 𝒄(𝑝) is the compositional embedding and 𝒏(𝑝) the non-compositional embedding; 𝛼(𝑝) is computed by a logistic function from a weight vector and a feature vector.]
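
The weighted addition on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that the logistic function is applied to the inner product of the weight vector and the feature vector, and the function names and toy values below are hypothetical, not taken from the slides.

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-x))

def compositionality_score(weights, features):
    """alpha(p): logistic function of the inner product (assumed form)."""
    return logistic(sum(w * f for w, f in zip(weights, features)))

def phrase_embedding(alpha, c, n):
    """v(p) = alpha * c(p) + (1 - alpha) * n(p), computed element-wise."""
    return [alpha * ci + (1.0 - alpha) * ni for ci, ni in zip(c, n)]

# Toy values chosen only for illustration.
weights = [0.5, -1.0, 2.0]
features = [1.0, 1.0, 0.25]
c = [1.0, 0.0]   # compositional embedding c(p)
n = [0.0, 1.0]   # non-compositional embedding n(p)

alpha = compositionality_score(weights, features)
v = phrase_embedding(alpha, c, n)
```

With a score near 1 the phrase embedding follows the compositional embedding, and with a score near 0 it follows the non-compositional one.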

Overview of the General Training Process

• All model parameters are jointly learned according to the objective function for a specific task
  – The compositionality levels are automatically adjusted to the task

[Figure: the phrase embeddings are used in a specific task; the task's objective function is minimized, with backpropagation.]
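
To make the joint learning concrete, the following derivation sketches how a gradient could flow through the weighted addition. The notation is assumed here for illustration and is not stated on the slides: J is the task objective, δ = ∂J/∂𝒗(𝑝) is the gradient arriving at the phrase embedding, 𝐰 is the weight vector, 𝝓(𝑝) is the feature vector, and σ is the logistic function applied to their inner product.

```latex
\begin{align*}
\mathbf{v}(p) &= \alpha(p)\,\mathbf{c}(p) + \bigl(1 - \alpha(p)\bigr)\,\mathbf{n}(p),
\qquad \alpha(p) = \sigma\bigl(\mathbf{w} \cdot \boldsymbol{\phi}(p)\bigr), \\
\frac{\partial J}{\partial \mathbf{c}(p)} &= \alpha(p)\,\boldsymbol{\delta},
\qquad
\frac{\partial J}{\partial \mathbf{n}(p)} = \bigl(1 - \alpha(p)\bigr)\,\boldsymbol{\delta}, \\
\frac{\partial J}{\partial \alpha(p)} &= \boldsymbol{\delta} \cdot \bigl(\mathbf{c}(p) - \mathbf{n}(p)\bigr), \\
\frac{\partial J}{\partial \mathbf{w}} &= \frac{\partial J}{\partial \alpha(p)}\;\alpha(p)\bigl(1 - \alpha(p)\bigr)\,\boldsymbol{\phi}(p).
\end{align*}
```

Under these assumptions, the same error signal updates both embeddings and the compositionality weights, which is one reading of how the compositionality levels adjust to the task.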

Focusing on Verb-Object Compositionality

• Co-occurrence-based Subject-Verb-Object (SVO) phrase embeddings (Hashimoto and Tsuruoka, 2015)
  – Applying our joint learning method to VO phrases
    • 𝑝 = 𝑉𝑂 in our method

[Figure: example SVO phrase [importer make payment], with 𝒄(𝑉𝑂) marked.]

Simple Statistical Features for 𝛼(𝑉𝑂)

• Indices (binary features) of
  – Verb
  – Object
  – Verb-object pair
• Count-based features (Venkatapathy and Joshi, 2005)
  – Frequency
  – PMI
• Effective in detecting the compositionality of verb-object pairs (McCarthy et al., 2007)
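
As a rough illustration of the feature types listed on this slide, the sketch below builds a sparse feature dictionary for one verb-object pair from co-occurrence counts. The function name, the feature keys, the use of the raw count as the frequency feature, and the toy counts are all assumptions made for this example; the slides do not specify how the features are scaled or encoded.

```python
import math
from collections import Counter

def vo_features(verb, obj, pair_counts):
    """Return a sparse feature dict for one verb-object pair (hypothetical layout)."""
    total = sum(pair_counts.values())
    verb_counts = Counter()
    obj_counts = Counter()
    for (v, o), count in pair_counts.items():
        verb_counts[v] += count
        obj_counts[o] += count
    pair_count = pair_counts[(verb, obj)]
    # PMI = log( P(v, o) / (P(v) * P(o)) ), estimated from raw counts.
    pmi = math.log(pair_count * total / (verb_counts[verb] * obj_counts[obj]))
    return {
        "verb=" + verb: 1.0,                 # binary index of the verb
        "object=" + obj: 1.0,                # binary index of the object
        "pair=" + verb + "_" + obj: 1.0,     # binary index of the pair
        "frequency": float(pair_count),      # raw count (scaling is an assumption)
        "pmi": pmi,
    }

# Toy counts, for illustration only.
counts = Counter({("catch", "eye"): 8, ("catch", "ball"): 2,
                  ("grab", "attention"): 4, ("grab", "ball"): 2})
features = vo_features("catch", "eye", counts)
```

A dictionary like this could then be mapped to a fixed-length feature vector before applying the weight vector.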

Experimental Settings

• Training data
  – BNC
    • SVO: 1.38M pairs, SVOPN: 0.93M pairs
  – Wikipedia
    • SVO: 23.6M pairs, SVOPN: 17.3M pairs
  – BNC-Wikipedia
    • The concatenation of the BNC and Wikipedia data
• 25-dimensional embeddings (and matrices)
  – Random initialization
  – AdaGrad (Duchi et al., 2011)
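
For readers unfamiliar with AdaGrad, a single update step can be sketched as follows. This is the standard per-parameter AdaGrad rule, not code from the authors; the learning rate, the epsilon term, and the toy parameter values are assumptions chosen for the example.

```python
import math

def adagrad_update(params, grads, accum, learning_rate=0.1, eps=1e-8):
    """Apply one in-place AdaGrad step to a flat list of parameters."""
    for i, g in enumerate(grads):
        # Accumulate the squared gradient seen so far for this parameter.
        accum[i] += g * g
        # Scale the step by the inverse root of the accumulated squares.
        params[i] -= learning_rate * g / (math.sqrt(accum[i]) + eps)

# Toy values, for illustration only.
params = [1.0, -2.0]
accum = [0.0, 0.0]
adagrad_update(params, [0.5, 0.0], accum)
```

Parameters that receive large or frequent gradients take smaller steps over time, while rarely updated ones keep larger steps.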

Evaluation 1: Compositionality Detection

• Measuring compositionality levels of verb-object pairs
  – Human rating: [1, 6]
    • Higher score means more compositional
  – VJ'05: 765 pairs (Venkatapathy and Joshi, 2005)
  – MC'07: 638 pairs (McCarthy et al., 2007)

[Figure: the system output 𝛼(𝑉𝑂) is compared with the human rating using Spearman's rank correlation.]
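
Spearman's rank correlation, the measure used in this evaluation, can be sketched in plain Python as the Pearson correlation of the ranks, with tied values sharing their average rank. The helper names and the toy scores below are illustrative assumptions and are not drawn from the VJ'05 or MC'07 data.

```python
def average_ranks(values):
    """Rank values starting from 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy human ratings and system scores, for illustration only.
human_ratings = [6, 5, 2, 1]
system_scores = [0.9, 0.7, 0.4, 0.1]
rho = spearman(human_ratings, system_scores)
```

Because only the ordering matters, the system scores do not need to be on the same scale as the human ratings.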

State of the Art Using 𝛼(𝑉𝑂)

• Our method outperforms the state-of-the-art supervised and WordNet-based methods
  – Without explicit training data for the task

[Results table not recovered. Annotations on the compared methods: ranking based on PMI and frequency; supervised learning with features.]

Trends of 𝛼(𝑉𝑂) during the Training

• 𝛼(𝑉𝑂) converges according to its corresponding phrase
  – Room for improvement when treating light verbs

[Figure: plot of 𝛼(𝑉𝑂) during training; labels: Compositional, Non-Compositional.]

Which Verbs are Compositional?

• The results match our intuition
  – 𝛼(𝑉𝑂) is low for phrases which form idiomatic phrases

[Table not recovered; annotation: light verbs.]

Evaluation 2: Verb Disambiguation

• Measuring similarities of transitive verbs paired with the same subjects and objects
  – Human rating: [1, 7]
    • Higher score means more similar

Transitive verb disambiguation task (GS'11) (Grefenstette and Sadrzadeh, 2011)

phrase pair                                        human rating   system output
student write name / student spell name            7              0.91
child show sign / child express sign               6              0.85
system meet criterion / system visit criterion     1              -0.1

The system output is the cosine similarity between SVO embeddings. It is compared with the human rating using Spearman's rank correlation.
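
The system output in this task is a cosine score between two SVO embeddings. A minimal sketch of that computation is shown below; the two-dimensional vectors and their variable names are made up for illustration and are not embeddings from the model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, for illustration only.
student_write_name = [3.0, 4.0]
student_spell_name = [6.0, 8.0]
system_visit_criterion = [-4.0, 3.0]

similar = cosine_similarity(student_write_name, student_spell_name)
unrelated = cosine_similarity(student_write_name, system_visit_criterion)
```

Scores close to 1 indicate embeddings pointing in nearly the same direction, which is why higher system outputs are expected for phrase pairs that humans rate as more similar.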

State of the Art Using Joint Embeddings

• Outperforming the baseline and other methods
  – Our adaptive joint learning method is more effective than just using compositional embeddings

[Results table not recovered; annotation: baseline of 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]

Examples of Closest Neighbor Embeddings

[Table not recovered. Annotations: "should be higher"; "capturing idiomatic aspect"; baseline of 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]

Summary

• Adaptive joint learning of
  – Compositional and non-compositional embeddings
  – Compositionality detection function
• State-of-the-art results on compositionality detection and transitive verb disambiguation tasks
• Future work:
  – Context-sensitive compositionality detection
  – General types of phrases