Page 1: A Neural Reordering Model for Phrase-based Translation nlp.csai.tsinghua.edu.cn/~lpeng/talks/coling-2014.pdf

A Neural Reordering Model for Phrase-based Translation

Peng Li Tsinghua University

[email protected]

joint work with Yang Liu, Maosong Sun, Tatsuya Izuha, Dakun Zhang

Phrase-based Translation

(Koehn et al., 2003; Och and Ney, 2004)

布什 与 沙龙 举行 了 会谈

Bush held a talk with Sharon

Steps: segmentation, reordering, translation

Reordering is Hard

Chinese, President, Xi Jinping, and, his, US, counterpart, Barack, Obama, open, two, days, of, talks, in, California, on, a number, of, high-stakes issues

Q: Can you figure out a sentence using these words?

Chinese President Xi Jinping and his US counterpart Barack Obama open two days of talks in California on a number of high-stakes issues

Reordering is Hard

• An NP-complete problem (Knight, 1999; Zaslavskiy et al., 2009)
• Reordering modeling has attracted intensive attention, e.g.
    • Distance-based model (Koehn et al., 2003)
    • Word-based lexicalized model (Koehn et al., 2007)
    • Phrase-based lexicalized model (Tillmann, 2004)
    • Hierarchical phrase-based lexicalized model (Galley and Manning, 2008)

Distance-based Model

(Koehn et al., 2003)

布什 与 沙龙 举行 了 会谈

Bush held a talk with Sharon

[Figure: jumps of +2 and -5 source positions between consecutively translated phrases; distortion distances d=0, d=2, d=5]
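The jump values on this slide can be reproduced with a small sketch. It assumes the common definition of the jump as start of the current source phrase minus end of the previous one minus 1, with 1-indexed source positions; the function name `distortion_distances` is hypothetical and not taken from the talk.

```python
def distortion_distances(spans):
    """Return (signed jump, distance d) for each source span.

    spans: source spans (start, end), 1-indexed and inclusive,
    listed in the order their translations appear in the target.
    """
    prev_end = 0  # position just before the first source word
    result = []
    for start, end in spans:
        jump = start - prev_end - 1
        result.append((jump, abs(jump)))
        prev_end = end
    return result


# Source positions: 布什=1, 与=2, 沙龙=3, 举行=4, 了=5, 会谈=6
# Target order: Bush, held a talk, with Sharon
spans = [(1, 1), (4, 6), (2, 3)]
print(distortion_distances(spans))  # [(0, 0), (2, 2), (-5, 5)]
```

The model only looks at how far the decoder jumps, so it assigns the same cost to any two phrases that are equally far apart, whatever words they contain.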

Lexicalized Models

(Koehn et al., 2007; Tillmann, 2004; Galley and Manning, 2008)

布什 与 沙龙 举行 了 会谈

Bush held a talk with Sharon

Orientations: Bush is M (monotone), held a talk is D (discontinuous), with Sharon is S (swap)
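A minimal sketch of how the M/S/D labels can be derived from source spans, assuming the phrase-based definition: monotone if the current source phrase directly follows the previous one, swap if it directly precedes it, discontinuous otherwise. The function name and the convention of treating the sentence start as span (0, 0) are assumptions made for illustration.

```python
def msd_orientation(prev_span, cur_span):
    """Orientation of the current source span relative to the previous one.

    Spans are (start, end), 1-indexed and inclusive.
    """
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if cur_start == prev_end + 1:
        return "M"  # monotone: directly to the right of the previous phrase
    if cur_end == prev_start - 1:
        return "S"  # swap: directly to the left of the previous phrase
    return "D"      # discontinuous: not adjacent


# Source positions: 布什=1, 与=2, 沙龙=3, 举行=4, 了=5, 会谈=6
# Target order: Bush, held a talk, with Sharon
spans = [(0, 0), (1, 1), (4, 6), (2, 3)]  # (0, 0) marks the sentence start
labels = [msd_orientation(p, c) for p, c in zip(spans, spans[1:])]
print(labels)  # ['M', 'D', 'S']
```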

Challenge #1: Sparsity

Source Phrase     Target Phrase     M      S      D
布什              Bush              0.7    0.2    0.1
举行 了 会谈      held a talk       0.1    0.1    0.8
与 沙龙           with Sharon       0.7    0.1    0.2
举行 了           held a            0.6    0.1    0.3
会谈              talk              0.4    0.3    0.3

Challenge #1: Sparsity

• Probability distributions are estimated by MLE

$$P(D \mid \text{举行 了 会谈}, \text{held a talk}) = \frac{\#(D, \text{举行 了 会谈}, \text{held a talk})}{\#(\bullet, \text{举行 了 会谈}, \text{held a talk})}$$
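The relative-frequency estimate can be sketched as follows. The observation counts are hypothetical, chosen only so that the result matches the 0.8 shown for D in the table; the function name `mle_orientation_probs` is likewise an assumption.

```python
from collections import Counter


def mle_orientation_probs(observations):
    """Estimate P(orientation | source phrase, target phrase) by relative frequency.

    observations: iterable of (source, target, orientation) tuples.
    """
    counts = Counter(observations)
    totals = Counter()
    for (src, tgt, _orientation), n in counts.items():
        totals[(src, tgt)] += n
    return {key: n / totals[key[:2]] for key, n in counts.items()}


# Hypothetical extracted events for one phrase pair: 8 x D, 1 x M, 1 x S
pair = ("举行 了 会谈", "held a talk")
observations = [pair + ("D",)] * 8 + [pair + ("M",)] + [pair + ("S",)]
probs = mle_orientation_probs(observations)
print(probs[pair + ("D",)])  # 0.8

# A phrase pair that never occurs in the training data gets no estimate at all.
print(("会谈", "talk", "D") in probs)  # False
```

Every phrase pair needs its own counts, so rare pairs get unreliable estimates; this is the sparsity problem the slide refers to.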

Challenge #2: Ambiguity

Challenge #3: Context Insensitivity

沙龙 与 布什 举行 了 会谈

Bush held a talk with Sharon

[Figure: phrase alignment between the two sentences, with one orientation labelled S]

How to resolve the three challenges?

Including More Contexts

与 沙龙 举行 了 会谈

held a talk with Sharon

Orientation: S

• Sparsity
• Ambiguity
• Context Insensitivity ✔

?

Sparsity

• Including more contexts leads to more severe sparsity

$$P(D \mid \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon}) = \frac{\#(D, \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon})}{\#(\bullet, \text{举行 了 会谈}, \text{held a talk}, \text{与 沙龙}, \text{with Sharon})}$$

Reordering as Classification

Neural Reordering Model

• A neural classifier for predicting reordering orientations
• Conditioned on both the current and previous phrase pairs
    • Improves context sensitivity
    • Reduces reordering ambiguity
• A single classifier for all phrase pairs
    • Uses vector space representations
    • Alleviates the data sparsity problem

Recursive Autoencoder (RAE)

(Pollack, 1990; Socher et al., 2011)

Example phrase: held a talk, with word vectors x1 (held), x2 (a), x3 (talk)

• Encode x1 and x2 into y1:

$$y_1 = f^{(1)}(W^{(1)}[x_1; x_2] + b^{(1)})$$

• Reconstruct the two inputs from y1:

$$[x'_1; x'_2] = f^{(2)}(W^{(2)} y_1 + b^{(2)})$$

• Reconstruction errors: $\|x_1 - x'_1\|^2$ and $\|x_2 - x'_2\|^2$

• Encode y1 and x3 into y2:

$$y_2 = f^{(1)}(W^{(1)}[y_1; x_3] + b^{(1)})$$

• Reconstruct the two inputs from y2:

$$[y'_1; x'_3] = f^{(2)}(W^{(2)} y_2 + b^{(2)})$$

• Reconstruction errors: $\|y_1 - y'_1\|^2$ and $\|x_3 - x'_3\|^2$
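The encode and reconstruct steps can be sketched in plain Python. This is an illustration under stated assumptions, not the talk's implementation: both activation functions are taken to be tanh, the tree is left-branching as in the three-word example, the weights are small random numbers rather than trained parameters, and the helper names (`affine_tanh`, `rand_matrix`, `rae_encode`) are hypothetical.

```python
import math
import random


def affine_tanh(W, b, v):
    """Return tanh(W v + b) for a matrix W (list of rows) and vectors b, v."""
    return [math.tanh(sum(w * x for w, x in zip(row, v)) + bias)
            for row, bias in zip(W, b)]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rae_encode(vectors, W1, b1, W2, b2):
    """Combine word vectors left to right.

    Returns the phrase vector and the summed reconstruction error,
    using 1/2 * ||[c1; c2] - [c1'; c2']||^2 at each internal node.
    """
    parent = vectors[0]
    total_error = 0.0
    for child in vectors[1:]:
        children = parent + child                 # concatenation [c1; c2]
        parent = affine_tanh(W1, b1, children)    # y = f1(W1 [c1; c2] + b1)
        recon = affine_tanh(W2, b2, parent)       # [c1'; c2'] = f2(W2 y + b2)
        total_error += 0.5 * sum((a - r) ** 2 for a, r in zip(children, recon))
    return parent, total_error


n = 4  # vector dimension, chosen arbitrarily for the example
rng = random.Random(0)
W1, b1 = rand_matrix(n, 2 * n, rng), [0.0] * n
W2, b2 = rand_matrix(2 * n, n, rng), [0.0] * (2 * n)
x1, x2, x3 = ([rng.uniform(-1, 1) for _ in range(n)] for _ in range(3))

y2, error = rae_encode([x1, x2, x3], W1, b1, W2, b2)
print(len(y2), error >= 0.0)
```

Because the same W1 and W2 are reused at every node, a phrase of any length is mapped to a vector of the same dimension n.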

Neural Classifier

[Figure: the current phrase pair and the previous phrase pair are encoded by RAEs; a softmax layer on top predicts the orientations]
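A minimal sketch of the softmax layer follows. It assumes one RAE vector for each side of the current and previous phrase pairs (four vectors in total), concatenated and passed through a single linear layer; the weights, vector values, and function names are toy placeholders rather than anything from the talk.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_orientation(phrase_vectors, W, b, labels=("M", "S", "D")):
    """Concatenate the phrase vectors and return a distribution over labels."""
    features = [x for vec in phrase_vectors for x in vec]
    scores = [sum(w * x for w, x in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return dict(zip(labels, softmax(scores)))


# Toy 2-dimensional vectors standing in for RAE outputs:
# current source, current target, previous source, previous target
vectors = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4], [0.2, 0.2]]
W = [[0.5] * 8, [-0.5] * 8, [0.0] * 8]  # one row of weights per orientation
b = [0.0, 0.0, 0.0]

probs = predict_orientation(vectors, W, b)
print(max(probs, key=probs.get))  # M
```

Switching to left/right orientations only changes `labels` and the number of rows in `W`.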

Training

• Reordering error on predicting orientations
• Reconstruction error on recovering training examples
• The two errors are combined by linear interpolation

Reconstruction Error

• Reconstruction error

$$E_{rec}([c_1; c_2]; \theta) = \frac{1}{2} \|[c_1; c_2] - [c'_1; c'_2]\|^2$$

• Source side average reconstruction error

$$E_{rec,s}(S; \theta) = \frac{1}{N_s} \sum_i \sum_{p \in T_R^{\theta}(t_{i,s})} E_{rec}([p.c_1, p.c_2]; \theta)$$

• Total reconstruction error

$$E_{rec}(S; \theta) = E_{rec,s}(S; \theta) + E_{rec,t}(S; \theta)$$

Reordering Error

• Average cross-entropy error

$$E_{reo}(S; \theta) = \frac{1}{|S|} \sum_i \left( -\sum_o d_{t_i}(o) \cdot \log P_\theta(o \mid t_i) \right)$$

• Joint training objective

$$J = \alpha E_{rec}(S; \theta) + (1 - \alpha) E_{reo}(S; \theta) + R(\theta)$$

$$R(\theta) = \frac{\lambda_L}{2} \|\theta_L - \theta_{L_0}\|^2 + \frac{\lambda_{rec}}{2} \|\theta_{rec}\|^2 + \frac{\lambda_{reo}}{2} \|\theta_{reo}\|^2$$
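As a worked example, the joint objective can be computed for toy values. The numbers below are made up for illustration, and the function names are hypothetical; the formulas follow the interpolation of reconstruction and reordering error with an L2 regularizer.

```python
import math


def cross_entropy(gold, predicted):
    """-sum_o d(o) * log P(o) for one training example."""
    return -sum(d * math.log(p) for d, p in zip(gold, predicted) if d > 0)


def squared_norm(vector):
    return sum(x * x for x in vector)


def regularizer(theta_L, theta_L0, theta_rec, theta_reo, lam_L, lam_rec, lam_reo):
    """R(theta): embeddings are pulled towards their initial values theta_L0."""
    drift = [a - b for a, b in zip(theta_L, theta_L0)]
    return 0.5 * (lam_L * squared_norm(drift)
                  + lam_rec * squared_norm(theta_rec)
                  + lam_reo * squared_norm(theta_reo))


def joint_objective(e_rec, gold_dists, pred_dists, alpha, reg=0.0):
    """J = alpha * E_rec + (1 - alpha) * E_reo + R(theta)."""
    e_reo = sum(cross_entropy(g, p)
                for g, p in zip(gold_dists, pred_dists)) / len(gold_dists)
    return alpha * e_rec + (1 - alpha) * e_reo + reg


# Two toy examples with gold orientations M and D (order: M, S, D)
gold = [[1, 0, 0], [0, 0, 1]]
pred = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]
J = joint_objective(e_rec=0.2, gold_dists=gold, pred_dists=pred, alpha=0.5)
print(J)  # equals 0.1 + 0.5 * ln 2

R = regularizer([1.0, 2.0], [1.0, 0.0], [1.0], [2.0], 0.1, 0.2, 0.3)
print(R)  # approximately 0.9
```

With alpha close to 1 the model mostly learns to reconstruct phrases; with alpha close to 0 it mostly learns to predict orientations.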

Optimization

• Hyper-parameter optimization: $\alpha, \lambda_L, \lambda_{rec}, \lambda_{reo}$
    • Optimized by random search (Bergstra and Bengio, 2012)
• Training objective optimization: L-BFGS
    • Gradients are computed using backpropagation through structures (Goller and Kuchler, 1996)
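Random search over the four hyper-parameters can be sketched as below. The sampling ranges, the number of trials, and the stand-in scoring function are all assumptions for illustration; in practice `evaluate` would train the model with the sampled values and return a score on held-out data.

```python
import random


def random_search(evaluate, n_trials, seed=0):
    """Sample hyper-parameters independently and keep the best-scoring set."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {
            "alpha": rng.uniform(0.0, 1.0),
            # Regularization weights are sampled on a log scale (assumed range).
            "lambda_L": 10 ** rng.uniform(-6, -1),
            "lambda_rec": 10 ** rng.uniform(-6, -1),
            "lambda_reo": 10 ** rng.uniform(-6, -1),
        }
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


def toy_dev_score(params):
    """Hypothetical stand-in for 'train, then measure accuracy on held-out data'."""
    return -(params["alpha"] - 0.3) ** 2


best_params, best_score = random_search(toy_dev_score, n_trials=50)
print(sorted(best_params))
```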

Experiments

• Chinese-English translation
    • Training: 1.2M sentence pairs
    • LM: 4-gram, 397.6M words
    • Dev. set: NIST 06
    • Test set: NIST 02-05, 08
    • Case-insensitive BLEU
• Baselines
    • Distance-based model
    • Lexicalized model: model type (word-based, phrase-based, hier. phrase-based) × orientations (M/S/D, left/right)

M/S/D Orientations

• Care about relative position and adjacency

[Figure: two alignments of the phrase pairs 布什 / Bush, 举行 了 会谈 / held a talk, and 与 沙龙 / with Sharon; one orientation is labelled D and the other S]

Left/Right Orientations

• Only care about relative position

[Figure: two alignments of the phrase pairs 布什 / Bush, 举行 了 会谈 / held a talk, and 与 沙龙 / with Sharon; one orientation is labelled right and the other left]

Translation

[Figure: bar chart of BLEU (axis from 24 to 33) on MT06 (dev), MT02, MT03, MT04, MT05, and MT08 for three systems: baseline, neural M/S/D, neural left/right]

Non-Separability

• The unaligned Chinese word “de” makes a big difference in determining M/S/D orientations

金门 有 六 万 的 常住 人口

Kinmen has 60000 resident population

jinmen you liu wan de changzhu renkou

[Figure: word alignment; the orientation is labelled M on one slide and D on the next]
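The effect of the unaligned word can be checked with a short sketch. It assumes phrase-based M/S/D and left/right definitions over 1-indexed source spans, and it assumes that the only difference between the two cases is whether 的 is attached to the phrase translated as 60000; the function names are hypothetical.

```python
def msd(prev_span, cur_span):
    """M if directly after the previous span, S if directly before, else D."""
    if cur_span[0] == prev_span[1] + 1:
        return "M"
    if cur_span[1] == prev_span[0] - 1:
        return "S"
    return "D"


def left_right(prev_span, cur_span):
    """Only the relative position matters, not adjacency."""
    return "right" if cur_span[0] > prev_span[1] else "left"


# Source positions: 金门=1, 有=2, 六=3, 万=4, 的=5, 常住=6, 人口=7
resident_population = (6, 7)   # 常住 人口
with_de = (3, 5)               # 六 万 的 -> 60000
without_de = (3, 4)            # 六 万    -> 60000, 的 left unattached

for prev in (with_de, without_de):
    print(prev, msd(prev, resident_population), left_right(prev, resident_population))
# (3, 5) M right
# (3, 4) D right
```

The M/S/D label flips from M to D when 的 is left out of the phrase, while the left/right label stays the same.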

Non-Separability

[Figure: M, S, and D classes, with two examples: 六 万 的 / 60000 followed by 常住 人口 / resident population, and 六 万 / 60000 followed by 常住 人口 / resident population]

Non-Separability

• Left/right orientations are less sensitive to unaligned words

金门 有 六 万 的 常住 人口

Kinmen has 60000 resident population

jinmen you liu wan de changzhu renkou

[Figure: word alignment with the orientation labelled right]

Non-Separability

[Figure: left and right classes, with two examples: 六 万 的 / 60000 followed by 常住 人口 / resident population, and 六 万 / 60000 followed by 常住 人口 / resident population]

Non-Separability

[Figure: left/right classes shown next to M/S/D classes]

Distortion Limit

[Figure: BLEU (axis from 22 to 25) against distortion limit (2, 4, 6, 8) for two systems: neural, lexicalized]

Word Vectors

[Figure: bar chart of BLEU (axis from 20 to 34) on MT06 (dev), MT02, MT03, MT04, MT05, and MT08 for two settings: task-oriented vectors, word2vec vectors]

Vector Space Representations

[Figure: phrases plotted in vector space. Phrases shown: "but is willing to", "economy is required to", "range of services to", "said his visit is to", "is making use of", "june 18, 2001", "late 2011", "by june 1", "and complete by end 1998", "as detention center", "group all together", "take care of old", "or other economic", "and for other", "within the agencies"]

Conclusion

• We propose a neural reordering model for phrase-based translation
• It improves context sensitivity, reduces ambiguity, and alleviates the data sparsity problem
• Future work
    • Train the MT system and the neural classifier jointly
    • Develop more efficient models to leverage larger contexts
    • Extend our work to syntax-based and n-gram-based models

Thanks!