Post on 03-Jul-2020
Cross-Lingual Alignment of Contextual Word Embeddings,
with Applications to Zero-shot Dependency Parsing
Tal Schuster*, Ori Ram*, Regina Barzilay, Amir Globerson
Task: Cross-lingual Zero-shot Dependency Parsing
Goal: Utilize universal space of contextual embeddings
[Figure: bar chart of LAS (axis 60 to 90). Many -> French, non-contextual: 73.9. French -> French: 87.1. Gap: 13.2. Chart label: ELMo.]
Idea: Align Contextual Word Embeddings
[Figure: English and Spanish embedding spaces. English: warm. Spanish: calentar, cálido.]
Idea: Align Contextual Word Embeddings
[Figure: English and Spanish embedding spaces; words shown: calentar, cálido, warm.]
Our Results – zero-shot
By aligning ELMo contextual embeddings
[Figure: bar chart of LAS (axis 60 to 90). Many -> French, non-contextual: 73.9. Ours: 80.8. French -> French: 87.1. Gap to French -> French: 13.2 for non-contextual, 6.3 for ours. Chart label: ELMo.]
Problem Definition
English:
• ELMo embeddings
• POS tags
Spanish:
• ELMo embeddings
• POS tags
Dictionary:
bear  oso
warm  cálido
…  …
Goal: Learn a linear alignment (W)
Problem Definition - Extensions
Goal: Learn a linear alignment (W)
English:
• ELMo embeddings
• POS tags
Spanish:
• ELMo embeddings
• POS tags
Dictionary:
bear  oso
warm  cálido
…  …
Problem Definition - Extensions
English:
• ELMo embeddings
• POS tags
Spanish (small corpus):
• Deficient ELMo embeddings
• POS tags
Dictionary:
bear  oso
warm  cálido
…  …
Goal: Alignment (W) and improve the embeddings
Aligning Embeddings - Static Case
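[Figure: English "warm" and Spanish "cálido"]
$e_i^t = W e_i^s$
$W = \arg\min_{W \in O_d} \sum_i \| e_i^t - W e_i^s \|^2$
(s, t: source and target language; $O_d$: orthogonal $d \times d$ matrices)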
(Mikolov et al., 2013)
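If W is restricted to orthogonal matrices, this least-squares alignment has a closed-form solution via a singular value decomposition (the orthogonal Procrustes problem). The sketch below illustrates that solution on synthetic vectors. The function name `procrustes_alignment` and the toy data are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def procrustes_alignment(src, tgt):
    """Return the orthogonal W minimizing sum_i ||tgt_i - W @ src_i||^2.

    src, tgt: arrays of shape (n_pairs, d); row i of each holds the
    vectors of one dictionary pair.
    """
    # Cross-covariance between target and source vectors, shape (d, d).
    m = tgt.T @ src
    u, _, vt = np.linalg.svd(m)
    # The orthogonal factor of m is the minimizer.
    return u @ vt

# Toy data: the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
d, n = 8, 50
true_w, _ = np.linalg.qr(rng.normal(size=(d, d)))
src = rng.normal(size=(n, d))
tgt = src @ true_w.T          # row i is true_w @ src[i]

w = procrustes_alignment(src, tgt)
print(np.allclose(w, true_w))
```

On real embeddings the two spaces are not exact rotations of each other, so W only minimizes the residual and does not recover a ground-truth map.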
Aligning Embeddings - Contextual Case
[Figure: English "warm" and Spanish "calentar", "cálido"; the mapping between them is marked "?"]
Challenges:
1. Multiple senses per token
2. Many representations per sense
The Contextual Component
• Contextual embeddings of the word “warm”:
Fuzz (electric guitar) , distortion effects to create "warm" and "dirty" sounds.
winning just four matches in her Wimbledon warm up tournaments
Sunday was a glorious day , clear and warm.
He was a warm friend of Pope St. Gregory.
Per Token Anchor
$\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
[Figure: anchor of "warm"]
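A per-token anchor is the average of a token's contextual embeddings over all the contexts it occurs in. The sketch below computes anchors from a list of token occurrences. The function name `compute_anchors` and the two-dimensional toy vectors are assumptions for illustration; in practice the rows would come from a trained contextual model.

```python
import numpy as np

def compute_anchors(tokens, embeddings):
    """Map each distinct token to the mean of its contextual embeddings.

    tokens: list of n token strings, one per occurrence in the corpus.
    embeddings: array of shape (n, d); row k is the contextual embedding
    of occurrence k.
    """
    sums, counts = {}, {}
    for token, vector in zip(tokens, embeddings):
        sums[token] = sums.get(token, np.zeros_like(vector, dtype=float)) + vector
        counts[token] = counts.get(token, 0) + 1
    return {token: sums[token] / counts[token] for token in sums}

# Toy corpus: "warm" occurs twice, "river" once.
tokens = ["warm", "river", "warm"]
embeddings = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]])
anchors = compute_anchors(tokens, embeddings)
print(anchors["warm"])   # mean of [1, 0] and [3, 4]
```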
Utilizing Lexical Anchors for Alignment
[Figure: English and Spanish spaces with the anchor pairs river / río and less / menos]
Dictionary:
river  río
less  menos
…  …
Geometry of the Contextual Space
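• Contextual representations of the same token are clustered together
• The average distance between different tokens is larger than the average distance within each token
[Figure: embedding clusters for "warm" and "river". Distance within "warm": 0.18. Distance between "river" and "warm": 0.85.]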
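The clustering claim can be checked by comparing the average distance of embeddings to their own anchor with the average distance between anchors of different tokens. The sketch below does this on synthetic clusters. The slides do not state which distance was used, so cosine distance is an assumption here, as are the function names and the toy data.

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def within_and_between(clusters):
    """clusters: dict mapping token -> array of shape (n_occurrences, d)."""
    anchors = {tok: vecs.mean(axis=0) for tok, vecs in clusters.items()}
    # Average distance from each contextual embedding to its own anchor.
    within = np.mean([
        cosine_distance(vec, anchors[tok])
        for tok, vecs in clusters.items()
        for vec in vecs
    ])
    # Average distance between the anchors of distinct tokens.
    toks = list(anchors)
    between = np.mean([
        cosine_distance(anchors[a], anchors[b])
        for k, a in enumerate(toks)
        for b in toks[k + 1:]
    ])
    return within, between

# Synthetic stand-in: two tight clusters around different directions.
rng = np.random.default_rng(0)
centers = {"warm": np.eye(16)[0], "river": np.eye(16)[1]}
clusters = {tok: c + 0.05 * rng.normal(size=(100, 16)) for tok, c in centers.items()}
within, between = within_and_between(clusters)
print(within < between)
```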
Factorizing the Contextual Embedding
$\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
$e_{i,c} = \bar{e}_i + \hat{e}_{i,c}$ (anchor + shift)
[Figure: "warm"]
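The factorization splits each contextual embedding into a context-independent anchor and a context-dependent shift. A minimal numeric illustration follows; the random vectors are a hypothetical stand-in for the embeddings of one token in several contexts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in: embeddings of one token ("warm") in 5 contexts, d = 4.
contextual = rng.normal(size=(5, 4))

anchor = contextual.mean(axis=0)   # context-independent part
shifts = contextual - anchor       # context-dependent part, one row per context

# Anchor plus shift gives back every contextual embedding exactly,
# and the shifts of a token average to zero by construction.
reconstructed = anchor + shifts
mean_shift = shifts.mean(axis=0)
print(np.allclose(reconstructed, contextual), np.allclose(mean_shift, 0.0))
```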
Anchor Based Alignment
A. Train ELMo model per language
B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
C. Compute alignment by anchors
$W = \arg\min_{W \in O_d} \sum_i \| \bar{e}_i^t - W \bar{e}_i^s \|^2$ (sum over dictionary pairs)
Dictionary:
river  río
less  menos
…  …
D. Apply alignment on contextual space
$e_{i,c}^{t} = W e_{i,c}^{s}$
$= W(\bar{e}_i + \hat{e}_{i,c})$
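Because the alignment is linear, applying it to a contextual embedding is the same as aligning the anchor and the shift separately. If W is also orthogonal, the length of every shift is unchanged, so the spread of a token's contexts around its anchor survives the mapping. The sketch below checks both properties numerically; the random orthogonal matrix and the embeddings are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
# Hypothetical orthogonal alignment matrix and embeddings of one token.
w, _ = np.linalg.qr(rng.normal(size=(d, d)))
contextual = rng.normal(size=(10, d))
anchor = contextual.mean(axis=0)
shifts = contextual - anchor

aligned = contextual @ w.T        # row k is w @ contextual[k]
aligned_anchor = w @ anchor
aligned_shifts = shifts @ w.T

# Linearity: W(anchor + shift) = W anchor + W shift.
print(np.allclose(aligned, aligned_anchor + aligned_shifts))
# Orthogonality: shift lengths are preserved.
print(np.allclose(np.linalg.norm(aligned_shifts, axis=1),
                  np.linalg.norm(shifts, axis=1)))
```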
Anchor Based Alignment
A. Train ELMo model per language
B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
C. Align by anchors: $W = \arg\min_{W \in O_d} \sum_i \| \bar{e}_i^t - W \bar{e}_i^s \|^2$
D. Apply alignment on contextual space: $e_{i,c}^{t} = W e_{i,c}^{s}$
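Steps B to D can be sketched end to end on synthetic data. Step A (training a language model per language) is replaced here by random vectors in which the "target" space is a rotated copy of the "source" space. All names, sizes, and the noise level are assumptions for illustration, and the alignment in step C uses the closed-form orthogonal Procrustes solution.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_words, n_ctx = 16, 30, 20

# Stand-in for step A: synthetic contextual embeddings, no ELMo involved.
base = rng.normal(size=(n_words, d))
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
src_ctx = base[:, None, :] + 0.1 * rng.normal(size=(n_words, n_ctx, d))
tgt_ctx = (base @ rotation.T)[:, None, :] + 0.1 * rng.normal(size=(n_words, n_ctx, d))

# Step B: one anchor per word and language (mean over contexts).
src_anchors = src_ctx.mean(axis=1)
tgt_anchors = tgt_ctx.mean(axis=1)

# Step C: orthogonal Procrustes on the anchors of the "dictionary" words.
dict_ids = np.arange(20)                       # first 20 words have translations
u, _, vt = np.linalg.svd(tgt_anchors[dict_ids].T @ src_anchors[dict_ids])
w = u @ vt

# Step D: apply the same matrix to every contextual embedding.
aligned = src_ctx @ w.T                        # shape (n_words, n_ctx, d)

# Sanity check on words outside the dictionary: is the nearest target
# anchor of each aligned embedding the anchor of the same word?
held_out = np.arange(20, n_words)
dists = np.linalg.norm(
    aligned[held_out][:, :, None, :] - tgt_anchors[None, None, :, :], axis=-1
)
nearest = dists.argmin(axis=-1)                # shape (10, n_ctx)
accuracy = float((nearest == held_out[:, None]).mean())
print(accuracy)
```

Evaluating on words outside the dictionary shows that a matrix learned from anchors alone also carries the individual contextual embeddings into the target space.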
Potential Problem: Multi-sense Words
• Contextual embeddings of the word “bear”:
bear her name
bear the pain
polar bear cub
teddy bear
The Alignment Works for Multi-sense Words
[Figure: English "bear" with Spanish "oso" and "tener"]
The Alignment Works for Multi-sense Words
[Figure: contextual embeddings of "bank". One group: soil seed bank, battery bank, clue bank. Another group: bank of the river, eastern bank of …]
No Dictionary
English:
• ELMo embeddings
• POS tags
Spanish:
• ELMo embeddings
• POS tags
Dictionary (not available in this setting):
bear  oso
warm  cálido
…  …
Anchor Based Alignment - Unsupervised
Compute alignment by anchors via adversarial training
Anchor Based Alignment - Unsupervised
A. Train ELMo model per language
B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
C. Align by anchors via adversarial training
D. Apply alignment on contextual space: $e_{i,c}^{t} = W e_{i,c}^{s}$
Low Resource Languages
Articles per language:
English  2.5M
German   800k
Spanish  400k
Turkish  100k
Kazakh   3k
Low Resource Languages
Dictionary:
bear  oso
warm  cálido
…  …
English:
• ELMo embeddings
• POS tags
Low-resource language (small corpus):
• Deficient ELMo embeddings
• POS tags
Goal: Alignment (W) and improve the embeddings
Anchored Language Model
A. Extract anchors from English model
$\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
B. Use anchors as seeds for the low resource language
Dictionary:
river  río
less  menos
…  …
C. Learn language model for low resource language
$e_{i,c} - \bar{e}_i$
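The term on this slide is the offset of a contextual embedding from an anchor. The slide does not say how that offset enters training. One possible reading, assumed here and not taken from the slides, is a squared-distance penalty added to the language-model loss so that the low-resource embeddings stay close to the seeded anchors. The function name `anchor_penalty` and the weight `lam` are hypothetical.

```python
import numpy as np

def anchor_penalty(contextual, token_ids, anchors, lam=1.0):
    """Weighted mean squared distance between embeddings and their anchors.

    contextual: (n, d) contextual embeddings e_{i,c}.
    token_ids: (n,) index i of the token each row belongs to.
    anchors: (vocab, d) anchor vectors; row i is the anchor of token i.
    lam: hypothetical weight of the penalty in the total loss.
    """
    diffs = contextual - anchors[token_ids]
    return lam * float(np.mean(np.sum(diffs ** 2, axis=1)))

# Toy example with two tokens in two dimensions.
anchors = np.array([[0.0, 0.0], [1.0, 1.0]])
contextual = np.array([[0.0, 2.0], [1.0, 1.0], [2.0, 1.0]])
token_ids = np.array([0, 1, 1])
penalty = anchor_penalty(contextual, token_ids, anchors, lam=0.5)
print(penalty)
```

In a real training loop this value would be added to the language-model loss and differentiated by the framework in use.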
D. Learn and apply finer alignment
Dictionary:
river  río
less  menos
…  …
$W = \arg\min_{W \in O_d} \sum_i \| \bar{e}_i^t - W \bar{e}_i^s \|^2$
Anchored Language Model
A. Extract anchors from English model
B. Use anchors as seeds for the low resource language
C. Learn language model for low resource language: $e_{i,c} - \bar{e}_i$
D. Learn and apply finer alignment: $e_{i,c}^{t} = W e_{i,c}^{s}$
Related Work
• Contextual embeddings (Peters et al., 2018; McCann et al., 2017; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2018)
• Cross-lingual alignment (Mikolov et al., 2013; Smith et al., 2017; Artetxe et al., 2017; Conneau et al., 2018)
• Multilingual parsing (Duong et al., 2015; Guo et al., 2016; Ammar et al., 2016; de Lhoneux et al., 2018; Che et al., 2018; Wang et al., 2018; Clark et al., 2018)
Dependency Parsing
[Figure: the sentence "I prefer the morning flight", an encoder, and the parsed sentence "I prefer the morning flight"]
• First-order graph-based model (Dozat and Manning, 2017)
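Parsing results in this talk are reported as LAS (labeled attachment score): the percentage of tokens whose predicted head and dependency label both match the gold tree. The sketch below computes LAS, together with the unlabeled variant UAS, for the example sentence. The gold tree, the predicted tree, and the relation labels are illustrative assumptions, not output of the parser used in the talk.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent.

    gold, pred: lists with one (head_index, relation_label) pair per token;
    head index 0 denotes the root.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted trees must cover the same tokens")
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))   # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))         # head and label
    return 100 * uas_hits / n, 100 * las_hits / n

# Tokens (1-based): I(1) prefer(2) the(3) morning(4) flight(5)
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "compound"), (2, "obj")]
# Hypothetical prediction: wrong head for "morning", wrong label for "flight".
pred = [(2, "nsubj"), (0, "root"), (5, "det"), (2, "compound"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
print(uas, las)   # 80.0 60.0
```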
Cross-lingual Zero-shot
[Figure: one model. Train: English, Spanish, German, …; Test: French]
Cross-lingual Transfer for Dependency Parsing
[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5; FastText: 74.4]
Cross-lingual Zero-shot
[Figure: one model. Train: English, Spanish, German, …; Test: French. One alignment matrix per language: $W_{en}$, $W_{es}$, $W_{de}$, $W_{fr}$]
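With one alignment matrix per language, embeddings from every language can be mapped into a shared space before they reach a single parser, and a test language is handled by applying its own matrix. The sketch below shows only this routing step. The matrices are random orthogonal stand-ins, treating English as the shared space (identity matrix) is an assumption, and `to_shared_space` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

def random_orthogonal(dim):
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q

# Hypothetical alignment matrices, one per language code.
alignments = {
    "en": np.eye(d),              # assumed shared space
    "es": random_orthogonal(d),
    "de": random_orthogonal(d),
    "fr": random_orthogonal(d),
}

def to_shared_space(lang, embeddings):
    """Apply the language's alignment matrix to each row of embeddings."""
    return embeddings @ alignments[lang].T

# Training languages feed one model through the shared space.
train_batches = {lang: rng.normal(size=(3, d)) for lang in ("en", "es", "de")}
train_inputs = np.vstack([to_shared_space(lang, x) for lang, x in train_batches.items()])

# Zero-shot test language: no French training data, only its matrix.
test_inputs = to_shared_space("fr", rng.normal(size=(2, d)))
print(train_inputs.shape, test_inputs.shape)
```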
Cross-lingual Transfer for Dependency Parsing
[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5; FastText: 74.4; Ours: 77.3]
Cross-lingual Transfer for Dependency Parsing
[Figure: bar chart of LAS score per language (axis 45 to 85) for German, Spanish, French, Italian, Portuguese, and Swedish. Systems: Guo et al., Ammar et al., FastText, Ours]
Cross-lingual Transfer for Dependency Parsing
[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5; Ours: 77.3; No dictionary: 75]
Cross-lingual Transfer for Dependency Parsing
[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5; Ours: 77.3; No dictionary: 75; No POS tags: 73.1]
Low Resource Language
Dictionary:
bear  oso
warm  cálido
…  …
English:
• ELMo embeddings
• POS tags
Low-resource language (small corpus, 10k sentences vs. 28M):
• Deficient ELMo embeddings
• POS tags
Low Resource Language (10k sentences)
[Figure: two bar charts. LAS score (axis 20 to 45): 33.1 and 42.2. Perplexity (axis 0 to 5000): 4060 and 600]
Conclusions
• ELMo embeddings are clustered around their anchor
• Anchor based alignment preserves the contextual component
• Effective for cross-lingual transfer learning (not task-specific)
Code available at: https://github.com/TalSchuster/CrossLingualELMo
https://github.com/TalSchuster/allennlp-MultiLang (soon part of the AllenNLP repo)