
Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings

Kazuma Hashimoto & Yoshimasa Tsuruoka

University of Tokyo

08/08/2016 ACL 2016 in Berlin, Germany

Session 1F: Noncompositionality

• Embedding meanings of phrases into a vector space

– e.g. Recurrent neural networks

Learning Phrase Embeddings

[Figure: a composition function combines the word embeddings 𝒗(catch), 𝒗(eye) and 𝒗(grab), 𝒗(attention) into the phrase embeddings 𝒗(catch eye) and 𝒗(grab attention) (compositional semantics).]


• Not all phrases are compositional

Compositional or Non-Compositional?


[Figure: the compositional embedding 𝒄(catch eye), computed by the composition function from 𝒗(catch) and 𝒗(eye), and the non-compositional embedding 𝒏(catch_eye), shown among the phrases "catch ear", "catch e-mail", "catch attention", and "grab attention".]

Selecting one of them is not always the best option!


This Work: Adaptive Joint Learning


[Figure: compositionality detection maps a phrase 𝑝 to a score 𝛼(𝑝) ∈ [0, 1]; the phrase embedding 𝒗(𝑝) combines the compositional embedding 𝒄(𝑝), weighted by 𝛼(𝑝), with the non-compositional embedding 𝒏(𝑝), weighted by 1 − 𝛼(𝑝).]

• State-of-the-art results on

– Compositionality detection for Verb-Object (VO) pairs

– Transitive verb disambiguation

𝑝 = 𝑉𝑂


• Very simple weighted addition of 𝒄(𝑝) and 𝒏(𝑝)

– 𝑝: phrase

– 𝛼(𝑝): compositionality score

Adaptive Weighted Addition


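𝒗(𝑝) = 𝛼(𝑝) 𝒄(𝑝) + (1 − 𝛼(𝑝)) 𝒏(𝑝)

– 𝒄(𝑝): compositional embedding

– 𝒏(𝑝): non-compositional embedding

– 𝛼(𝑝): logistic function applied to the product of a weight vector and the feature vector of 𝑝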
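The weighted addition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the compositionality score is the logistic function of a dot product between the weight vector and the feature vector (as the slide labels suggest), and all names and toy values (`weights`, `features`, `c_p`, `n_p`) are made up for the example.

```python
import math

def logistic(x):
    """Logistic (sigmoid) function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def compositionality_score(weights, features):
    """alpha(p): logistic function of the dot product (assumed form)."""
    return logistic(sum(w * f for w, f in zip(weights, features)))

def phrase_embedding(alpha, c, n):
    """v(p) = alpha * c(p) + (1 - alpha) * n(p), element by element."""
    return [alpha * ci + (1.0 - alpha) * ni for ci, ni in zip(c, n)]

# Toy values, chosen only for illustration.
weights = [0.5, -1.0, 2.0]    # hypothetical weight vector
features = [1.0, 0.0, 0.25]   # hypothetical feature vector of the phrase
c_p = [1.0, 0.0]              # compositional embedding
n_p = [0.0, 1.0]              # non-compositional embedding

alpha = compositionality_score(weights, features)
v_p = phrase_embedding(alpha, c_p, n_p)
```

With a score near 1 the phrase embedding stays close to the compositional embedding; with a score near 0 it stays close to the non-compositional one.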


• All model parameters are jointly learned according to the objective function for a specific task

– The compositionality levels are automatically adjusted to the task

Overview of the General Training Process


[Figure labels: a specific task; minimize the objective function; backpropagation]
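To make the joint training concrete, the sketch below backpropagates a loss through the weighted addition to the scoring weights and to both embeddings. It is only an illustration under stated assumptions: the slides do not give the task objective here, so `toy_loss` (half squared error against a made-up target vector) is a stand-in, the score is assumed to be a logistic function of a dot product, and plain gradient descent replaces whatever optimizer a real setup would use.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, phi, c, n):
    """Return alpha(p) and v(p) = alpha * c + (1 - alpha) * n."""
    alpha = sigmoid(sum(wi * fi for wi, fi in zip(w, phi)))
    v = [alpha * ci + (1.0 - alpha) * ni for ci, ni in zip(c, n)]
    return alpha, v

def toy_loss(w, phi, c, n, target):
    """Assumed stand-in objective: half squared error between v(p) and a target."""
    _, v = forward(w, phi, c, n)
    return 0.5 * sum((vi - ti) ** 2 for vi, ti in zip(v, target))

def gradients(w, phi, c, n, target):
    """Backpropagate the toy loss to the weights and to both embeddings."""
    alpha, v = forward(w, phi, c, n)
    dv = [vi - ti for vi, ti in zip(v, target)]                 # dL/dv
    dc = [alpha * d for d in dv]                                # dL/dc
    dn = [(1.0 - alpha) * d for d in dv]                        # dL/dn
    dalpha = sum(d * (ci - ni) for d, ci, ni in zip(dv, c, n))  # dL/dalpha
    dw = [dalpha * alpha * (1.0 - alpha) * fi for fi in phi]    # dL/dw
    return dw, dc, dn

def sgd_step(w, phi, c, n, target, lr=0.1):
    """One plain gradient-descent step on all parameters at once."""
    dw, dc, dn = gradients(w, phi, c, n, target)
    new_w = [wi - lr * g for wi, g in zip(w, dw)]
    new_c = [ci - lr * g for ci, g in zip(c, dc)]
    new_n = [ni - lr * g for ni, g in zip(n, dn)]
    return new_w, new_c, new_n

# Toy values, chosen only for illustration.
w, phi = [0.0, 0.0], [1.0, 2.0]
c, n, target = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]

before = toy_loss(w, phi, c, n, target)
w2, c2, n2 = sgd_step(w, phi, c, n, target)
after = toy_loss(w2, phi, c2, n2, target)
```

Because the gradient with respect to the score depends on the difference between the two embeddings, the score moves toward whichever embedding serves the objective better, which is one way to read "automatically adjusted to the task".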


• Co-occurrence-based Subject-Verb-Object (SVO) phrase embeddings (Hashimoto and Tsuruoka, 2015)

– Applying our joint learning method to VO phrases

• 𝑝 = 𝑉𝑂 in our method

Focusing on Verb-Object Compositionality


[importer make payment]


𝒄 𝑉𝑂

• Indices (binary features) of

– Verb

– Object

– Verb-object pair

• Count-based features (Venkatapathy and Joshi, 2005)

– Frequency

– PMI

• Effective in detecting the compositionality of verb-object pairs (McCarthy et al., 2007)
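As a rough illustration of the count-based features, the sketch below computes pointwise mutual information (PMI) for a verb-object pair from raw counts. The slide does not spell out the exact PMI variant or any scaling, so the standard definition with a natural logarithm is assumed here, and the function name and counts are hypothetical.

```python
import math

def pmi(pair_count, verb_count, object_count, total):
    """Standard PMI: log( P(v, o) / (P(v) * P(o)) ), natural logarithm."""
    p_vo = pair_count / total
    p_v = verb_count / total
    p_o = object_count / total
    return math.log(p_vo / (p_v * p_o))

# Made-up counts for illustration only.
independent = pmi(pair_count=10, verb_count=100, object_count=100, total=1000)
associated = pmi(pair_count=50, verb_count=100, object_count=100, total=1000)
```

A PMI near zero means the verb and object co-occur about as often as chance predicts; a large positive value indicates a strongly associated pair.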

Simple Statistical Features for 𝛼(𝑉𝑂)


• Training data

– BNC

• SVO: 1.38M pairs, SVOPN: 0.93M pairs

– Wikipedia

• SVO: 23.6M pairs, SVOPN: 17.3M pairs

– BNC-Wikipedia

• The concatenation of the BNC and Wikipedia data

• 25-dimensional embeddings (and matrices)

– Random initialization

– AdaGrad (Duchi et al., 2011)
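For readers unfamiliar with AdaGrad, the following is a minimal sketch of its per-parameter update rule. The learning rate and the small constant `eps` are illustrative assumptions; the slide does not state the values used in the experiments.

```python
import math

def adagrad_update(params, grads, accum, lr=0.05, eps=1e-8):
    """In-place AdaGrad step: scale each gradient by the root of its squared-gradient history."""
    for i, g in enumerate(grads):
        accum[i] += g * g                                  # accumulate squared gradients
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)  # per-parameter step size
    return params, accum

# Toy example with a single parameter and a constant gradient.
params, accum = [1.0], [0.0]
adagrad_update(params, [2.0], accum)
after_first = params[0]
adagrad_update(params, [2.0], accum)
after_second = params[0]
```

With a constant gradient, the second step is smaller than the first, because the accumulated squared gradient grows over time.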

Experimental Settings


• Measuring compositionality levels of verb-object pairs

– Human rating: [1, 6]

• A higher score means more compositional

– VJ’05: 765 pairs (Venkatapathy and Joshi, 2005)

– MC’07: 638 pairs (McCarthy et al., 2007)

Evaluation 1: Compositionality Detection


[Figure: the system output 𝛼(𝑉𝑂) is compared with the human rating by Spearman’s rank correlation.]
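The evaluation metric can be sketched as follows: Spearman's rank correlation is the Pearson correlation of the two rank lists, with tied values given their average rank. This is a generic, standard-library-only illustration; the ratings and scores below are made up and are not taken from VJ’05 or MC’07.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up human ratings (scale 1 to 6) and made-up system scores.
human = [6, 5, 2, 1]
system = [0.9, 0.4, 0.6, 0.1]
rho = spearman(human, system)
```

Only the ordering of the scores matters, so the system output does not need to be on the same scale as the human ratings.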


• Our method outperforms the state-of-the-art supervised and WordNet-based methods

– Without explicit training data for the task

State of the Art Using 𝛼(𝑉𝑂)


[Table not recovered. Row annotations: ranking based on PMI and frequency; supervised learning with features.]


Trends of 𝛼(𝑉𝑂) during Training


• 𝛼(𝑉𝑂) converges according to its corresponding phrase

– Room for improvement when treating light verbs

[Figure: 𝛼(𝑉𝑂) during training; annotations: compositional, non-compositional.]


• The results match our intuition

– 𝛼(𝑉𝑂) is low for phrases that are idiomatic

Which Verbs are Compositional?


[Table not recovered. Annotation: light verbs.]


• Measuring similarities of transitive verbs paired with the same subjects and objects

– Human rating: [1, 7]

• A higher score means more similar

Evaluation 2: Verb Disambiguation


Transitive verb disambiguation task (GS’11) (Grefenstette and Sadrzadeh, 2011)

phrase pair                                       human rating   system output
student write name / student spell name           7              0.91
child show sign / child express sign              6              0.85
system meet criterion / system visit criterion    1              -0.1

System output: cosine similarity between SVO embeddings

Evaluation: Spearman’s rank correlation between the human rating and the system output
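The system output for a phrase pair is a cosine score between the two SVO embeddings. The sketch below shows the standard cosine similarity on plain Python lists; the vectors are made up, whereas the actual embeddings on the slides are 25-dimensional and learned.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up two-dimensional vectors for illustration.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Higher values indicate more similar phrases, which is why a value such as -0.1 pairs naturally with the lowest human rating in the examples.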


• Outperforming the baseline and other methods

– Our adaptive joint learning method is more effective than just using compositional embeddings

State of the Art Using Joint Embeddings


[Table not recovered. Annotation: baseline of 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]


Examples of Closest Neighbor Embeddings


[Table not recovered. Annotations: "should be higher"; "capturing idiomatic aspect"; baseline of 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]


• Adaptive joint learning of

– Compositional and non-compositional embeddings

– Compositionality detection function

• State-of-the-art results on compositionality detection and transitive verb disambiguation tasks

• Future work:

– Context-sensitive compositionality detection

– General types of phrases

Summary