Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation. Toshiaki Nakazawa (Japan Science and Technology Agency (JST)), John Richardson and Sadao Kurohashi (Kyoto University). 4/11/2016 @ EMNLP2016

Page 1: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation

Toshiaki Nakazawa
Japan Science and Technology Agency (JST)

John Richardson, Sadao Kurohashi
Kyoto University

4/11/2016 @ EMNLP2016

Page 2: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Where to insert?

I found Pikachu by chance

Word to insert: yesterday

Probabilities (prob.) of the candidate insertion positions: 0.7, 0.25, 0.02, 0.01, 0.01, 0.01

Page 3: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Where to insert?

I found Pikachu by chance yesterday

Phrase to insert: in the park

Probabilities of the candidate insertion positions: 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1

(Photo: @Texas State Capitol)

Page 4: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

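Dependency Tree-to-Tree Translation

[Figure: three columns labeled Input, Translation Rules, and Output, showing an input dependency tree, the translation rules matched against it, and the resulting output tree.]

Input words: 私は (I), 昨日 (yesterday), 公園で (in the park), ピカチュウを (Pikachu), 見つけた (found)

Translation rules shown:
• a tree rule with 私は, 偶然, [X7], and を見つけた on the source side and I, found, [X7], by, and chance on the target side
• ピカチュウ -> Pikachu
• 公園 -> the park
• 昨日 -> yesterday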

Page 5: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Dependency Tree-to-Tree Translation

Flexible Non-terminals [Richardson+, 2016]

[Figure: the same Input / Translation Rules / Output example, now with two parts of the input marked as floating subtrees. The target side of the tree rule (I, found, [X7], by, chance) carries several candidate insertion positions, each marked [X], and the output trees add "yesterday" and "in the park" to "I found Pikachu by chance" at those positions.]

Page 6: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Translation Quality and Decoding Speed w/ and w/o Flexible Non-terminals

• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC

• Time is the relative decoding time (w/o Flex = 1.00)

            Ja->En         En->Ja         Ja->Zh         Zh->Ja
            BLEU   Time    BLEU   Time    BLEU   Time    BLEU   Time
w/o Flex    20.28  1.00    28.77  1.00    24.85  1.00    30.51  1.00
w/ Flex     21.61  6.28    30.57  3.30    28.79  5.16    34.32  5.28

Page 7: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Appropriate Insertion Position Selection

• Roughly half of all translation rules were augmented with flexible non-terminals [Richardson+, 2016]

• Flexible non-terminals make the search space much bigger -> slower decoding speed, increased search error

• Reduce the number of possible insertion positions in translation rules with a Neural Network model
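
As a rough illustration of the pruning idea on this slide, the sketch below keeps only the k highest-scoring insertion positions for one floating word. The function name `prune_positions` is hypothetical, and the probabilities are the ones shown on the "Where to insert?" slide in extraction order, so which index corresponds to which gap in the sentence is an assumption.

```python
from typing import List, Tuple

def prune_positions(probs: List[float], k: int = 1) -> List[Tuple[int, float]]:
    """Return the k (index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Probabilities from the "Where to insert?" slide (index order is assumed).
probs = [0.7, 0.25, 0.02, 0.01, 0.01, 0.01]

top1 = prune_positions(probs, k=1)  # keep a single insertion position
top2 = prune_positions(probs, k=2)  # a looser setting that keeps two positions
```

With k = 1 the six candidates collapse to one, so the decoder would have one position to explore for this floating word instead of six.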

Page 9: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

INSERTION POSITION SELECTION MODEL

Page 10: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Insertion Position Selection Model

• For each insertion position:
– predict
  • scores of the insertion positions
– given
  • input: the floating word (I) and its parent word (Ps) with the distance (Ds)
  • target: previous (Sp) and next (Sn) sibling words of the insertion position and the parent (Pt) with the distance (Dt)
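
The inputs listed above could be collected per candidate roughly as follows. This is a minimal sketch, not the authors' implementation: the `Candidate` class, the `enumerate_candidates` function, the `<none>` padding token, and the sign convention for Dt (positive before the parent, negative after it) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

NONE = "<none>"  # hypothetical padding token for a missing sibling

@dataclass
class Candidate:
    floating: str    # I: the floating word
    src_parent: str  # Ps: parent of I on the input side
    src_dist: int    # Ds: distance between I and Ps
    prev_sib: str    # Sp: previous sibling of the insertion position
    next_sib: str    # Sn: next sibling of the insertion position
    tgt_parent: str  # Pt: parent of the insertion position
    tgt_dist: int    # Dt: signed distance from Pt (sign convention assumed)

def enumerate_candidates(floating: str, src_parent: str, src_dist: int,
                         tgt_parent: str, pre: List[str],
                         post: List[str]) -> List[Candidate]:
    """List every gap among the children of tgt_parent as one candidate."""
    out = []
    # Gaps among children that precede the parent: Dt counts down to +1.
    for g in range(len(pre) + 1):
        sp = pre[g - 1] if g > 0 else NONE
        sn = pre[g] if g < len(pre) else NONE
        out.append(Candidate(floating, src_parent, src_dist, sp, sn,
                             tgt_parent, len(pre) - g + 1))
    # Gaps among children that follow the parent: Dt runs from -1 downwards.
    for g in range(len(post) + 1):
        sp = post[g - 1] if g > 0 else NONE
        sn = post[g] if g < len(post) else NONE
        out.append(Candidate(floating, src_parent, src_dist, sp, sn,
                             tgt_parent, -(g + 1)))
    return out

# Toy call: "found" has one child before it and two after it.
cands = enumerate_candidates("昨日", "見つけた", 4, "found", ["I"], ["Pikachu", "by"])
```

Each `Candidate` would then be turned into one input row for the scoring model, one row per insertion position.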

Page 11: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

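Information for Selection Model

[Figure: the input tree and a translation rule annotated with the model's inputs. Source side: the floating word I and its parent Ps, with distance Ds = 4. Target side: the previous sibling Sp and next sibling Sn of the candidate insertion position [X], and its parent Pt, with distance Dt = -2. Bracketed labels in the figure: [yesterday], [found].]

Non-terminals: reverted to the original word in the parallel corpus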

Page 12: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Information for Selection Model

[Figure: the same example for another candidate insertion position [X], with Ds = 4 and Dt = -3. One label in the figure is set to the special token [POST-BOTTOM]. Bracketed labels in the figure: [yesterday], [found].]

Page 13: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Neural Network Model

Inputs for each insertion position:
• I: word to be inserted
• Ps: parent of I
• Ds: distance from Ps
• Sp: previous sibling
• Sn: next sibling
• Pt: parent of the insertion position
• Dt: distance from Pt

[Figure: for each of the insertion positions 1 to N, the inputs pass through a fully-connected feed-forward network (layer sizes labeled 220 and 100 in the figure) that outputs one score per position. A softmax over the scores (e.g. 0.1, 0.6, ..., 0.1) is compared with the gold labels (0, 1, ..., 0).]

loss = softmax cross-entropy
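
The scoring and loss on this slide can be sketched in a few lines of plain Python. The sketch assumes one hidden tanh layer, toy layer sizes, and random stand-in feature vectors; the slide only states that the network is fully connected and feed-forward and that the loss is softmax cross-entropy, so everything else here is illustrative.

```python
import math
import random
from typing import List

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores: List[float], gold: int) -> float:
    """Softmax cross-entropy against a one-hot gold label."""
    return -math.log(softmax(scores)[gold])

def score(features: List[float], w1: List[List[float]], w2: List[float]) -> float:
    """One hidden tanh layer (assumed) and a linear output: one scalar per position."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

random.seed(0)
DIM, HIDDEN, N = 8, 4, 3  # toy sizes, far smaller than the 220/100 on the slide
w1 = [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(HIDDEN)]
w2 = [random.uniform(-0.1, 0.1) for _ in range(HIDDEN)]

# One random stand-in feature vector per candidate insertion position.
candidates = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N)]
scores = [score(x, w1, w2) for x in candidates]  # same weights for every position
probs = softmax(scores)
loss = cross_entropy(scores, gold=1)  # the gold index is made up for the example
```

The same weights score every candidate, and the softmax is taken across the N positions of one floating word, which is what makes the gold vector one-hot.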

Page 14: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

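Training Data Creation

• Training data for the NN model can be automatically created from the word-aligned parallel corpus
– consider each alignment as the floating word and remove it from the target tree

[Figure: an aligned tree pair (私は, 偶然, ピカチュウ, を見つけた / I, found, Pikachu, by, chance). After the floating word is removed, every candidate insertion position [X] receives a label: 1 for the original position and 0 for the others (0, 0, 0, 1 in the example).]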
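
The labeling step can be illustrated on a flat sibling sequence. This is a simplified sketch: the real procedure works on word-aligned dependency trees, while the hypothetical `make_example` below only removes one word from a list and marks the gap it came from.

```python
from typing import List, Tuple

def make_example(words: List[str], idx: int) -> Tuple[str, List[str], List[int]]:
    """Remove words[idx] and label every gap in the remainder (1 = original gap)."""
    floating = words[idx]
    rest = words[:idx] + words[idx + 1:]
    # A sequence of len(rest) words has len(rest) + 1 gaps; gap idx is the gold one.
    labels = [1 if gap == idx else 0 for gap in range(len(rest) + 1)]
    return floating, rest, labels

# Toy sibling sequence with the parent "found" at index 1 (not corpus data).
siblings = ["I", "found", "Pikachu", "by"]
child_indices = [0, 2, 3]  # every word except the parent can be floated

examples = [make_example(siblings, i) for i in child_indices]
floating, rest, labels = make_example(siblings, 2)
```

Every child yields one training example with exactly one positive label, so no manual annotation is needed.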

Page 15: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

EXPERIMENTS

Page 16: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Insertion Position Selection Experiment

• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)

• Data size

                Ja->En    En->Ja    Ja->Zh    Zh->Ja
Training            15.7M               5.7M
Development         160K                58K
Test                160K                58K
Ave. # IP       3.39      3.15      3.72      3.41

• Comparison
– L2-regularized logistic regression (using Multi-core LIBLINEAR)

Page 17: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Experimental Results

                      Ja->En    En->Ja    Ja->Zh    Zh->Ja
Training                  15.7M               5.7M
Development               160K                58K
Test                      160K                58K
Ave. # IP             3.39      3.15      3.72      3.41
Mean loss             0.089     0.058     0.105     0.056
Top 1 Accuracy (%)    97.08     97.72     96.51     97.99
Top 2 Accuracy (%)    98.94     99.52     98.97     99.56
Logit Accuracy (%)    55.00     89.03     68.04     83.16
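
For readers unfamiliar with the Top 1 / Top 2 accuracy rows, the metric can be computed as sketched below. The function name `top_k_accuracy` and the scores are made up for illustration and are not data from the experiment.

```python
from typing import List

def top_k_accuracy(score_lists: List[List[float]], gold: List[int], k: int) -> float:
    """Percentage of examples whose gold position is among the k best scores."""
    hits = 0
    for scores, g in zip(score_lists, gold):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if g in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(gold)

# Made-up scores for three floating words with three candidate positions each.
score_lists = [[0.7, 0.2, 0.1], [0.35, 0.4, 0.25], [0.1, 0.3, 0.6]]
gold = [0, 0, 2]

top1 = top_k_accuracy(score_lists, gold, k=1)
top2 = top_k_accuracy(score_lists, gold, k=2)
```

In this toy data the second example is missed at k = 1 but recovered at k = 2, which is the kind of gap the two accuracy rows measure.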

Page 18: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Translation Experiment

• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)

• Decoder: KyotoEBMT [Richardson+, 2014]

• 5 Settings
– Phrase-based and hierarchical phrase-based SMTs
– w/o Flex: not using flexible non-terminals
– w/ Flex: baseline with flexible non-terminals
– Prop: using insertion position selection (only top 1)

• BLEU and relative decoding time

Page 19: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Translation Experimental Results

            Ja->En         En->Ja         Ja->Zh         Zh->Ja
            BLEU   Time    BLEU   Time    BLEU   Time    BLEU   Time
PBSMT       18.45  -       27.48  -       27.96  -       34.65  -
HPBSMT      18.72  -       30.19  -       27.71  -       35.43  -
w/o Flex    20.28  1.00    28.77  1.00    24.85  1.00    30.51  1.00
w/ Flex     21.61  6.28    30.57  3.30    28.79  5.16    34.32  5.28
Prop        22.07  2.25    30.50  1.27    29.83  2.21    34.71  1.89
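
To make the comparison between w/ Flex and Prop easier to read, the short script below derives the BLEU difference and the decoding speed-up from the table values. The numbers are copied from the table; the variable names and the summary format are only illustrative.

```python
# (BLEU, relative decoding time) copied from the results table.
results = {
    "Ja->En": {"w/ Flex": (21.61, 6.28), "Prop": (22.07, 2.25)},
    "En->Ja": {"w/ Flex": (30.57, 3.30), "Prop": (30.50, 1.27)},
    "Ja->Zh": {"w/ Flex": (28.79, 5.16), "Prop": (29.83, 2.21)},
    "Zh->Ja": {"w/ Flex": (34.32, 5.28), "Prop": (34.71, 1.89)},
}

summary = {}
for pair, rows in results.items():
    flex_bleu, flex_time = rows["w/ Flex"]
    prop_bleu, prop_time = rows["Prop"]
    # BLEU gain of Prop over w/ Flex, and how many times faster Prop decodes.
    summary[pair] = (round(prop_bleu - flex_bleu, 2), round(flex_time / prop_time, 2))

for pair, (bleu_diff, speedup) in summary.items():
    print(f"{pair}: BLEU {bleu_diff:+.2f}, {speedup:.2f}x faster than w/ Flex")
```

On these figures Prop decodes roughly 2.3 to 2.8 times faster than w/ Flex, while BLEU changes by between -0.07 (En->Ja) and +1.04 (Ja->Zh).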

Page 20: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Conclusion

• Proposed an insertion position selection model to reduce the number of insertion positions for flexible non-terminals in the translation rules

• Automatic evaluation scores and decoding speed are improved

Page 21: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Future Work

• Use grand-children’s info
– Recursive NN [Liu et al., 2015] or Convolutional NN [Mou et al., 2015]

• Shift to NMT!!
– Actually, we’ve already shifted and participated in the WAT2016 shared tasks
• However, NMT is still far from perfect

Page 22: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

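J->E Adequacy in WAT2016

[Figure: stacked bar chart (0% to 100%) showing, for three systems, the share of each adequacy rating from 1 to 5.]

Team name    Kyoto-U (NMT)    NAIST/CMU (NMT)    NAIST (2015 best, F2T)
BLEU         26.22            26.39              25.41

Average adequacy values shown in the chart: 3.76, 3.71, 3.83

Bar segment values (%), one row per segment across the three bars, in extraction order:
21.75    21       37.25
51.75    46.75    30.5
20.75    26.75    16.25
4.75     5        10
1        0.5      6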

Page 23: Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree

Thank You!

AD: I’m co-organizing The 3rd Workshop on Asian Translation (WAT2016), in conjunction with COLING 2016

Invited talk by Google about GNMT!

Please come to the workshop!

http://lotus.kuee.kyoto-u.ac.jp/WAT/