Adaptive Joint Learning of Compositional and
Non-Compositional Phrase Embeddings
Kazuma Hashimoto & Yoshimasa Tsuruoka
University of Tokyo
08/08/2016 ACL 2016 in Berlin, Germany
Session 1F: Noncompositionality
• Embedding meanings of phrases into a vector space
– e.g. Recurrent neural networks
Learning Phrase Embeddings
grab attentioncatch eye𝒗 catch eye
phrase embeddings
08/08/2016 ACL 2016 in Berlin, Germany
𝒗 catch 𝒗 eye
Composition function
𝒗 grab 𝒗 attention
𝒗 grab attention
word embeddings
Compositional semantics
2 / 17
• Not all phrases are compositional
Compositional or Non-Compositional?
08/08/2016 ACL 2016 in Berlin, Germany
catch earcatch eye
𝒄 catch eye
𝒗 catch 𝒗 eye
Composition function
catch e-mail
grab attention
catch eyecatch attention
𝒏 catch_eye
Compositionalembedding
Non-compositionalembedding
Selecting one of them is
not always the best option!
3 / 17
This Work: Adaptive Joint Learning
08/08/2016 ACL 2016 in Berlin, Germany
phrase
𝑝
Compositionality detection
𝛼 𝑝 ∈ 0, 1
Compositional embedding
Non-compositional embedding
𝛼 𝑝
1 − 𝛼 𝑝
phrase embedding
𝒄 𝑝
𝒏 𝑝
𝒗 𝑝
State-of-the-art results on Compositionality detection for Verb-Object (VO) pairs Transitive verb disambiguation
𝑝 = 𝑉𝑂
4 / 17
• Very simple weighted addition of 𝒄 𝑝 and 𝒏 𝑝
– 𝑝 : phrase
– 𝛼 𝑝 : compositionality score
Adaptive Weighted Addition
08/08/2016 ACL 2016 in Berlin, Germany
compositional
embedding
weight vector
logistic function
non-compositional
embedding
feature vector
5 / 17
• All model parameters are jointly learned according to
the objective function for a specific task
– The compositionality levels are automatically
adjusted to the task
Overview of the General Training Process
08/08/2016 ACL 2016 in Berlin, Germany
A specifictask
Minimize the objective function
Backpropagation
6 / 17
• Co-occurrence-based Subject-Verb-Object (SVO)
phrase embeddings (Hashimoto and Tsuruoka, 2015)
– Applying our joint learning method to VO phrases
• 𝑝 = 𝑉𝑂 in our method
Focusing on Verb-Object Compositionality
08/08/2016 ACL 2016 in Berlin, Germany
[importer make payment]
7 / 17
𝒄 𝑉𝑂
• Indices (binary features) of
– Verb
– Object
– Verb-object pair
• Count-based features (Venkatapathy and Joshi, 2005)
– Frequency
– PMI
• Effective in detecting the compositionality of verb-
object pairs (McCarthy et al., 2007)
Simple Statistical Features for 𝛼 𝑉𝑂
08/08/2016 ACL 2016 in Berlin, Germany 8 / 17
• Training data
– BNC
• SVO: 1.38M pairs, SVOPN 0.93M pairs
– Wikipedia
• SVO 23.6M pairs, SVOPN 17.3M pairs
– BNC-Wikipedia
• The concatenation of the BNC and Wikipedia data
• 25-dimensional embeddings (and matrices)
– Random initialization
– AdaGrad (Duchi et al., 2011)
Experimental Settings
08/08/2016 ACL 2016 in Berlin, Germany 9 / 17
• Measuring compositionality levels of verb-object pairs
– Human rating: [1, 6]
• Higher score more compositional
–VJ’05: 765 pairs (Venkatapathy and Joshi, 2005)
–MC’07: 638 pairs (McCarthy et al., 2007)
Evaluation 1: Compositionality Detection
08/08/2016 ACL 2016 in Berlin, Germany
System
output
Spearman’srank correlation
Human
rating 𝛼 𝑉𝑂
10 / 17
• Our method outperforms the state-of-the-art
supervised and WordNet-based methods
– Without explicit training data for the task
State of the Art Using 𝛼 𝑉𝑂
08/08/2016 ACL 2016 in Berlin, Germany
Ranking based onPMI and Frequency
Supervised learning with features
11 / 17
Trends of 𝛼 𝑉𝑂 during the Training
08/08/2016 ACL 2016 in Berlin, Germany
• 𝛼 𝑉𝑂 converges according to its corresponding phrase
– Room for improvement when treating light verbs
Compositional
Non-Compositional
𝛼 𝑉𝑂
12 / 17
• The results match our intuition
– 𝛼 𝑉𝑂 is low for phrases which form idiomatic phrases
Which Verbs are Compositional?
08/08/2016 ACL 2016 in Berlin, Germany
Light verbs
13 / 17
• Measuring similarities of transitive verbs paired with
the same subjects and objects
– Human rating: [1, 7]
• Higher score more similar
Evaluation 2: Verb Disambiguation
08/08/2016 ACL 2016 in Berlin, Germany
Transitive verb disambiguation task (GS’11)
phrase pair human rating system output
student write name
student spell name7 0.91
child show sign
child express sign6 0.85
system meet criterion
system visit criterion1 -0.1
Spearman’srank correlation
(Grefenstette and Sadrzadeh, 2011)
cosine distance betweenSVO embeddings
14 / 17
• Outperforming the baseline and other methods
– Our adaptive joint learning method is more
effective than just using compositional embeddings
State of the Art Using Joint Embeddings
08/08/2016 ACL 2016 in Berlin, Germany
Baseline of𝒗 𝑉𝑂 = 𝒄 𝑉𝑂
15 / 17
Examples of Closest Neighbor Embeddings
08/08/2016 ACL 2016 in Berlin, Germany
should be higher
capturingidiomatic aspect
Baseline of 𝒗 𝑉𝑂 = 𝒄 𝑉𝑂
16 / 17
• Adaptive joint learning of
– Compositional and non-compositional embeddings
– Compositionality detection function
• State-of-the-art results on compositionality
detection and transitive verb disambiguation tasks
• Future work:
– Context-sensitive compositionality detection
– General types of phrases
Summary
08/08/2016 ACL 2016 in Berlin, Germany 17 / 17
Top Related