Post on 03-Jul-2020

Cross-Lingual Alignment of Contextual Word Embeddings, with Applications to Zero-shot Dependency Parsing

Tal Schuster*, Ori Ram*, Regina Barzilay, Amir Globerson

Task: Cross-lingual Zero-shot Dependency Parsing

Goal: Utilize universal space of contextual embeddings

[Figure: bar chart of LAS on French (axis 60 to 90). Many -> French (non-contextual): 73.9. French -> French (ELMo): 87.1. Gap: 13.2.]

Idea: Align Contextual Word Embeddings

[Figure, shown in two builds: English and Spanish embedding spaces. English: warm. Spanish: calentar, cálido.]

Our Results – zero-shot

By aligning ELMo contextual embeddings

[Figure: bar chart of LAS on French (axis 60 to 90). Many -> French (non-contextual): 73.9. Ours: 80.8. French -> French: 87.1. Gaps: 13.2 and 6.3. Chart label: ELMo.]

Problem Definition

English:
• ELMo embeddings
• POS tags

Spanish:
• ELMo embeddings
• POS tags

Dictionary (English, Spanish): (bear, oso), (warm, cálido), …

Goal: Learn a linear alignment (W)

Problem Definition - Extensions

English:
• ELMo embeddings
• POS tags

Spanish:
• ELMo embeddings
• POS tags

Dictionary (English, Spanish): (bear, oso), (warm, cálido), …

Goal: Learn a linear alignment (W)

Problem Definition - Extensions

English:
• ELMo embeddings
• POS tags

Spanish (Small):
• Deficient ELMo embeddings
• POS tags

Dictionary (English, Spanish): (bear, oso), (warm, cálido), …

Goal: Alignment (W) and improve the embeddings

Aligning Embeddings - Static Case

[Figure: English "warm" and Spanish "cálido".]

$e_i^{t} = W e_i^{s}$

$W = \arg\min_{W \in O_d} \sum_i \lVert e_i^{t} - W e_i^{s} \rVert^2$

(Mikolov et al., 2013)
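For reference, if $W$ is restricted to orthogonal matrices, this least-squares alignment is the orthogonal Procrustes problem and has a well-known closed-form solution. Below, $e_i^{s}$ and $e_i^{t}$ are the source and target embeddings of the $i$-th dictionary pair. The symbols $M$, $U$, $\Sigma$ and $V$ are notation introduced here for illustration and do not appear on the slides.

```latex
\sum_i \lVert e_i^{t} - W e_i^{s} \rVert^2
  = \sum_i \lVert e_i^{t} \rVert^2 + \sum_i \lVert e_i^{s} \rVert^2
    - 2\,\mathrm{tr}\!\left(W^{\top} M\right),
\qquad
M = \sum_i e_i^{t}\,(e_i^{s})^{\top}

M = U \Sigma V^{\top} \ \text{(SVD)}
\quad\Longrightarrow\quad
W = U V^{\top}
```

Only the trace term depends on $W$, so minimizing the sum is the same as maximizing $\mathrm{tr}(W^{\top} M)$, and the maximum over orthogonal matrices is attained at $W = U V^{\top}$.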

Aligning Embeddings - Contextual Case

[Figure: English "warm"; Spanish "calentar" and "cálido"; the mapping between them is marked "?".]

Challenges:
1. Multiple senses per token
2. Many representations per sense

The Contextual Component

• Contextual embeddings of the word “warm”:

Fuzz (electric guitar), distortion effects to create "warm" and "dirty" sounds.

winning just four matches in her Wimbledon warm up tournaments

Sunday was a glorious day, clear and warm.

He was a warm friend of Pope St. Gregory.

Per Token Anchor

$\bar{e}_i = \mathbb{E}_c[e_{i,c}]$

[Figure: the anchor of "warm".]
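As a minimal sketch of the anchor definition, the expectation over contexts can be estimated by averaging the contextual vectors of every occurrence of a token in a corpus. The function name `compute_anchors` and the 2-d toy vectors are assumptions made for illustration; they are not real ELMo outputs and not the authors' code.

```python
import numpy as np

def compute_anchors(tokens, vectors):
    """Average the contextual vectors of each token type.

    tokens:  list of n token strings, one per occurrence in the corpus
    vectors: array of shape (n, d), the contextual embedding of each occurrence
    Returns a dict mapping token -> anchor vector of shape (d,).
    """
    sums, counts = {}, {}
    for tok, vec in zip(tokens, np.asarray(vectors, dtype=float)):
        sums[tok] = sums.get(tok, 0.0) + vec
        counts[tok] = counts.get(tok, 0) + 1
    return {tok: sums[tok] / counts[tok] for tok in sums}

# Toy example with made-up 2-d vectors (hypothetical data).
tokens = ["warm", "river", "warm"]
vectors = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 2.0]])
anchors = compute_anchors(tokens, vectors)
print(anchors["warm"])  # [2. 1.]
```

In practice the average would be taken over many occurrences per token, so the anchor is an empirical estimate of the expectation.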

Utilizing Lexical Anchors for Alignment

[Figure: English and Spanish spaces with the paired anchors river / río and less / menos.]

Dictionary (English, Spanish): (river, río), (less, menos), …

Geometry of the Contextual Space

• Contextual representations of the same token are clustered together
• The average distance between tokens is larger than the average distance within each token

[Figure: point clouds for "warm" and "river". Labeled distances: 0.18 (next to "warm") and 0.85 (between "river" and "warm").]
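The comparison behind these two bullets can be sketched as follows: measure how far each contextual vector lies from its own token's average vector, and how far the averages of different tokens lie from each other. The slide does not say which distance is used; cosine distance is an assumption here, and the function names and toy point clouds are hypothetical.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def intra_and_inter_distance(clouds):
    """clouds: dict token -> array (n_i, d) of contextual vectors.

    Returns (mean distance of a point to its own token average,
             mean distance between the averages of different tokens).
    """
    anchors = {tok: pts.mean(axis=0) for tok, pts in clouds.items()}
    intra = np.mean([cosine_distance(p, anchors[tok])
                     for tok, pts in clouds.items() for p in pts])
    toks = list(anchors)
    inter = np.mean([cosine_distance(anchors[a], anchors[b])
                     for i, a in enumerate(toks) for b in toks[i + 1:]])
    return float(intra), float(inter)

# Toy clouds (made-up data): each token's points sit close together.
clouds = {
    "warm":  np.array([[1.0, 0.1], [1.0, -0.1]]),
    "river": np.array([[0.1, 1.0], [-0.1, 1.0]]),
}
intra, inter = intra_and_inter_distance(clouds)
print(intra < inter)
```

A small within-token value next to a large between-token value is the pattern the slide reports with 0.18 and 0.85.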

Factorizing the Contextual Embedding

$\bar{e}_i = \mathbb{E}_c[e_{i,c}]$

$e_{i,c} = \bar{e}_i + \hat{e}_{i,c}$ (anchor + shift)

[Figure: "warm".]

Anchor Based Alignment

A. Train ELMo model per language

B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$

C. Compute alignment by anchors

$W = \arg\min_{W \in O_d} \sum_i \lVert \bar{e}_i^{t} - W \bar{e}_i^{s} \rVert^2$

Dictionary (English, Spanish): (river, río), (less, menos), …
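Step C, computing the alignment from anchors, can be sketched with NumPy, assuming W is constrained to be orthogonal so that the SVD-based (orthogonal Procrustes) solution applies. The function name `align_anchors`, the choice of Spanish as source and English as target, and the random toy anchors are illustrative assumptions, not the authors' code.

```python
import numpy as np

def align_anchors(src_anchors, tgt_anchors, dictionary):
    """Orthogonal W minimizing sum ||tgt - W src||^2 over dictionary pairs.

    src_anchors, tgt_anchors: dict word -> anchor vector of shape (d,)
    dictionary: list of (source_word, target_word) pairs
    """
    pairs = [(s, t) for s, t in dictionary
             if s in src_anchors and t in tgt_anchors]
    X = np.stack([src_anchors[s] for s, _ in pairs])  # (n, d) source anchors
    Y = np.stack([tgt_anchors[t] for _, t in pairs])  # (n, d) target anchors
    U, _, Vt = np.linalg.svd(Y.T @ X)                 # SVD of sum_i y_i x_i^T
    return U @ Vt

# Toy check: targets are an exact rotation of the sources, so W recovers it.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))          # random orthogonal matrix
spanish = ["río", "menos", "oso", "cálido"]
english = ["river", "less", "bear", "warm"]
src = {w: rng.normal(size=3) for w in spanish}
tgt = {e: Q @ src[w] for w, e in zip(spanish, english)}
W = align_anchors(src, tgt, list(zip(spanish, english)))
print(np.allclose(W, Q))  # True
```

With real anchors the fit is not exact, and W is the best orthogonal approximation in the least-squares sense.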

D. Apply alignment on contextual space

$e_{i,c}^{t} = W e_{i,c}^{s}$

$= W(\bar{e}_i^{s} + \hat{e}_{i,c}^{s})$
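Spelling out this step: write $\bar{e}_i$ for the anchor of token $i$ and $\hat{e}_{i,c} = e_{i,c} - \bar{e}_i$ for the shift of one occurrence. Because the map is linear, the anchor and the shift are transformed separately. Under the additional assumption that $W$ is orthogonal, the shift keeps its length, so each occurrence stays at the same distance from its mapped anchor.

```latex
W e_{i,c} = W \bar{e}_i + W \hat{e}_{i,c},
\qquad
\lVert W \hat{e}_{i,c} \rVert_2^2
  = \hat{e}_{i,c}^{\top} W^{\top} W \hat{e}_{i,c}
  = \lVert \hat{e}_{i,c} \rVert_2^2
\quad \text{when } W^{\top} W = I .
```

The same argument applies to inner products between shifts, which is one way to read the later claim that anchor based alignment preserves the contextual component.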

Anchor Based Alignment

A. Train ELMo model per language
B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
C. Align by anchors: $W = \arg\min_{W \in O_d} \sum_i \lVert \bar{e}_i^{t} - W \bar{e}_i^{s} \rVert^2$
D. Apply alignment on contextual space: $e_{i,c}^{t} = W e_{i,c}^{s}$

Potential Problem: Multi-sense Words

• Contextual embeddings of the word “bear”:

bear her name

bear the pain

polar bear cub

teddy bear

The Alignment Works for Multi-sense Words

[Figure: contextual embeddings of English "bear" with Spanish "oso" and "tener".]

[Figure: contexts of "bank". One group: soil seed bank, battery bank, clue bank. Another group: bank of the river, eastern bank of …]

No Dictionary

English:
• ELMo embeddings
• POS tags

Spanish:
• ELMo embeddings
• POS tags

[Diagram: the dictionary (bear / oso, warm / cálido, …) is not available in this setting.]

Anchor Based Alignment - Unsupervised

Compute alignment by anchors via adversarial training
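The slide gives no details of the adversarial procedure. One ingredient described for adversarial alignment of word embeddings by Conneau et al. (2018) is an update that pulls the mapping back toward an orthogonal matrix after each training step, W ← (1 + β)W − β(WWᵀ)W. The sketch below shows only that update in isolation; it omits the discriminator entirely, the value β = 0.01 and the function names are assumptions, and it is not the authors' training code.

```python
import numpy as np

def orthogonalize_step(W, beta=0.01):
    """One update W <- (1 + beta) W - beta (W W^T) W."""
    return (1.0 + beta) * W - beta * (W @ W.T) @ W

def orthogonality_error(W):
    """Frobenius norm of W^T W - I (zero for an orthogonal matrix)."""
    return float(np.linalg.norm(W.T @ W - np.eye(W.shape[0])))

rng = np.random.default_rng(0)
W = np.eye(4) + 0.05 * rng.normal(size=(4, 4))  # slightly non-orthogonal start
before = orthogonality_error(W)
for _ in range(200):
    W = orthogonalize_step(W)
after = orthogonality_error(W)
print(before > after)
```

Each step moves every singular value of W closer to 1, which is why repeating it drives the error down.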

Anchor Based Alignment - Unsupervised

A. Train ELMo model per language
B. Extract anchors: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$
C. Align by anchors – adversarial training
D. Apply alignment on contextual space: $e_{i,c}^{t} = W e_{i,c}^{s}$

Low Resource Languages

Articles per language: English 2.5M, German 800k, Spanish 400k, Turkish 100k, Kazakh 3k

Low Resource Languages

• ELMo embeddings
• POS tags

Small:
• Deficient ELMo embeddings
• POS tags

Dictionary: (bear, oso), (warm, cálido), …

Goal: Alignment (W) and improve the embeddings

Anchored Language Model

A. Extract anchors from English model: $\bar{e}_i = \mathbb{E}_c[e_{i,c}]$

B. Use anchors as seeds for the low resource language

Dictionary (English, Spanish): (river, río), (less, menos), …

C. Learn language model for low resource language

$e_{i,c} - \bar{e}_i'$
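Step C shows the difference between a contextual embedding and an anchor. One way to read it, stated here as an assumption rather than the authors' exact objective, is as a penalty added to the language-model loss that pulls each contextual vector of a dictionary word toward the English anchor of its translation. The squared-distance form, the weight `lam`, and the function names are all hypothetical.

```python
import numpy as np

def anchor_penalty(vectors, tokens, seed_anchors):
    """Mean squared distance between contextual vectors and their seed anchors.

    vectors:      array (n, d), contextual embeddings produced by the model
    tokens:       list of n token strings
    seed_anchors: dict token -> English anchor (d,) of its dictionary translation
    Tokens without a seed anchor are ignored.
    """
    diffs = [vec - seed_anchors[tok]
             for tok, vec in zip(tokens, vectors) if tok in seed_anchors]
    if not diffs:
        return 0.0
    return float(np.mean([np.dot(d, d) for d in diffs]))

def anchored_loss(lm_loss, vectors, tokens, seed_anchors, lam=1.0):
    """Language-model loss plus the weighted anchor penalty (lam is assumed)."""
    return lm_loss + lam * anchor_penalty(vectors, tokens, seed_anchors)

# Toy batch (made-up numbers): "de" has no dictionary entry and is skipped.
vectors = np.array([[1.0, 0.0], [0.0, 0.0], [5.0, 5.0]])
tokens = ["río", "menos", "de"]
seeds = {"río": np.array([1.0, 0.0]), "menos": np.array([0.0, 2.0])}
total = anchored_loss(2.0, vectors, tokens, seeds, lam=0.5)
print(total)  # 3.0
```

Here "río" already sits on its seed anchor, "menos" is at squared distance 4, so the penalty is 2.0 and the total is 2.0 + 0.5 × 2.0.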

D. Learn and apply finer alignment

Dictionary (English, Spanish): (river, río), (less, menos), …

$W = \arg\min_{W \in O_d} \sum_i \lVert \bar{e}_i^{t} - W \bar{e}_i^{s} \rVert^2$

Anchored Language Model

A. Extract anchors from English model
B. Use anchors as seeds for the low resource language
C. Learn language model for low resource language: $e_{i,c} - \bar{e}_i'$
D. Learn and apply finer alignment: $e_{i,c}^{t} = W e_{i,c}^{s}$

Related Work

• Contextual embeddings (Peters et al., 2018; McCann et al., 2017; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2018)

• Cross-lingual alignment (Mikolov et al., 2013; Smith et al., 2017; Artetxe et al., 2017; Conneau et al., 2018)

• Multilingual parsing (Duong et al., 2015; Guo et al., 2016; Ammar et al., 2016; de Lhoneux et al., 2018; Che et al., 2018; Wang et al., 2018; Clark et al., 2018)

Dependency Parsing

[Figure: the input sentence "I prefer the morning flight", an Encoder, and the parsed sentence.]

• first-order graph-based model (Dozat and Manning, 2017)
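The charts in this talk report LAS (labeled attachment score): the fraction of tokens whose predicted head and dependency label both match the gold tree. A minimal sketch of the metric follows; the function name is hypothetical, and the gold and predicted analyses of the example sentence are hand-written for illustration rather than taken from a treebank or from the parser above.

```python
def attachment_scores(gold, predicted):
    """gold, predicted: lists of (head_index, relation_label), one per token.

    Returns (UAS, LAS) as fractions in [0, 1]:
    UAS counts correct heads, LAS counts correct head and label together.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty sentence")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    return head_ok / len(gold), both_ok / len(gold)

# "I prefer the morning flight"; heads are 1-based token indices, 0 is the root.
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "compound"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "amod"), (4, "obj")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.8 0.6
```

The fourth token has the right head but the wrong label, so it counts for UAS only; the fifth has the wrong head and counts for neither.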

Cross-lingual Zero-shot

[Diagram: a single Model. Train: English, Spanish, German, … Test: French.]

Cross-lingual Transfer for Dependency Parsing

[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5. FastText: 74.4.]

Cross-lingual Zero-shot

[Diagram: a single Model. Train: English, Spanish, German, … Test: French. Four superscripted alignment matrices $W$ are shown.]

Cross-lingual Transfer for Dependency Parsing

[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5. FastText: 74.4. Ours: 77.3.]

Cross-lingual Transfer for Dependency Parsing

[Figure: bar chart of LAS score per language (axis 45 to 85) for German, Spanish, French, Italian, Portuguese, Swedish. Systems: Guo et al., Ammar et al., FastText, Ours. Bar values are not recoverable.]

Cross-lingual Transfer for Dependency Parsing

[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5. Ours: 77.3. No dictionary: 75.]

Cross-lingual Transfer for Dependency Parsing

[Figure: bar chart of average LAS score (axis 60 to 80). Ammar et al.: 70.5. Ours: 77.3. No dictionary: 75. No POS tags: 73.1.]

Low Resource Language

• ELMo embeddings
• POS tags

Small:
• Deficient ELMo embeddings
• POS tags

Dictionary: (bear, oso), (warm, cálido), …

English 10k sentences (vs. 28M)

Low Resource Language (10k sentences)

[Figure: two bar charts. LAS score (axis 20 to 45): 33.1 and 42.2. Perplexity (axis 0 to 5000): 4060 and 600. Bar labels are not recoverable.]

Conclusions

• ELMo embeddings are clustered around their anchor

• Anchor based alignment preserves the contextual component

• Effective for cross-lingual transfer learning (not task-specific)

Code available at:

https://github.com/TalSchuster/CrossLingualELMo

https://github.com/TalSchuster/allennlp-MultiLang (soon part of the AllenNLP repo)