Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation
Toshiaki Nakazawa (Japan Science and Technology Agency (JST))
John Richardson, Sadao Kurohashi (Kyoto University)
4/11/2016 @ EMNLP2016
Where to insert?
Sentence: I found Pikachu by chance
Word to insert: yesterday
Probabilities of the insertion positions (as listed on the slide): 0.7, 0.25, 0.02, 0.01, 0.01, 0.01
Where to insert?
Sentence: I found Pikachu by chance yesterday
Phrase to insert: in the park
Probabilities of the insertion positions (as listed on the slide): 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1
@Texas State Capitol
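Dependency Tree-to-Tree Translation
[Figure: three panels labelled Input, Translation Rules and Output.
Input: dependency tree over 私は 昨日 公園で ピカチュウを 見つけた (I / yesterday / in the park / Pikachu / found).
Translation Rules: a rule pairing 私は ... を見つけた with "I found ... by chance" that contains the non-terminal [X7], plus ピカチュウ -> Pikachu, 偶然 -> chance, 公園 -> the park, 昨日 -> yesterday, and で.
Output: the target dependency tree built from these rules.]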
Dependency Tree-to-Tree Translation
[Figure: the same Input, Translation Rules and Output panels, now with flexible non-terminals. Several candidate insertion positions, each marked [X], are added to the rule containing "I found [X7] by chance". Two rule subtrees are labelled "floating subtree". The output tree contains I, found, Pikachu, by, chance, yesterday, in, the, park.]
Flexible Non-terminals [Richardson+, 2016]
Translation Quality and Decoding Speed w/ and w/o Flexible Non-terminals
• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC
• Time is the decoding time relative to w/o Flex

            Ja->En         En->Ja         Ja->Zh         Zh->Ja
            BLEU   Time    BLEU   Time    BLEU   Time    BLEU   Time
w/o Flex    20.28  1.00    28.77  1.00    24.85  1.00    30.51  1.00
w/ Flex     21.61  6.28    30.57  3.30    28.79  5.16    34.32  5.28
Appropriate Insertion Position Selection
• Roughly half of all translation rules were augmented with flexible non-terminals [Richardson+, 2016]
• Flexible non-terminals make the search space much bigger -> slower decoding speed, increased search error
• Reduce the number of possible insertion positions in translation rules with a neural network model
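The pruning idea in the last bullet can be sketched as follows. This is a minimal illustration, not the decoder's actual code: the function name `prune_insertion_positions` is hypothetical, positions are simply indexed in the order their scores are listed, and the scores are the probabilities from the first "Where to insert?" example.

```python
def prune_insertion_positions(scores, k=1):
    """Return the indices of the k highest-scoring insertion positions, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Probabilities from the "Where to insert?" example (order as listed on the slide).
scores = [0.7, 0.25, 0.02, 0.01, 0.01, 0.01]

kept_top1 = prune_insertion_positions(scores, k=1)  # only the best position survives
kept_top2 = prune_insertion_positions(scores, k=2)  # a looser setting keeps two
```

Keeping only the best position (k=1) corresponds to the "only top 1" setting used in the translation experiment; a larger k trades decoding speed for a wider search.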
INSERTION POSITION SELECTION MODEL
Insertion Position Selection Model
• For each insertion position:
  – predict
    • scores of the insertion positions
  – given
    • input: the floating word (I) and its parent word (Ps) with the distance (Ds)
    • target: previous (Sp) and next (Sn) sibling words of the insertion position and the parent (Pt) with the distance (Dt)
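Information for Selection Model
[Figure: the input tree (私は 昨日 公園で ピカチュウを 見つけた) and a translation rule (私は ... 偶然 ... を見つけた / I found ... by chance, with non-terminals [X7] and [X]) annotated with I, Ps, Pt, Sp and Sn. In this example Ds = 4 and Dt = -2. Figure labels: [yesterday], [found].]
• Non-terminals: reverted to the original word in the parallel corpus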
Information for Selection Model
[Figure: the same input tree and translation rule with a different insertion position [X], annotated with I, Ps, Pt, Sp and Sn. In this example Ds = 4 and Dt = -3 = [POST-BOTTOM]. Figure labels: [yesterday], [found].]
Neural Network Model
Inputs (each mapped to a vector; the figure shows vector sizes of 220 and 100):
• I: word to be inserted
• Ps: parent of I
• Ds: distance from Ps
• Sp: previous sibling
• Sn: next sibling
• Pt: parent of the insertion position
• Dt: distance from Pt
The inputs for each insertion position (1, 2, ..., N) go through a fully-connected feed-forward network, which outputs one score per position.
The scores after softmax (e.g. 0.1, 0.6, ..., 0.1) are compared with the gold labels (0, 1, ..., 0).
loss = softmax cross-entropy
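The scoring and loss described on this slide can be sketched in plain Python. This is a rough illustration under stated assumptions, not the authors' implementation: the class `PositionScorer`, the tanh activation, the single hidden layer, the tiny layer sizes and the toy feature vectors are all hypothetical, and the embedding lookup is replaced by ready-made feature vectors.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over the scores of all insertion positions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, gold_index):
    """Softmax cross-entropy against a one-hot gold vector."""
    return -math.log(probs[gold_index])


class PositionScorer:
    """Hypothetical feed-forward scorer: feature vector -> single score."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def score(self, features):
        # Assumed activation: tanh (the slide does not name one).
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def position_loss(scorer, candidates, gold_index):
    """Score every candidate position, normalise, and compute the loss."""
    scores = [scorer.score(f) for f in candidates]
    probs = softmax(scores)
    return probs, cross_entropy(probs, gold_index)


# Toy 4-dimensional features for three candidate positions (made-up values).
candidates = [[1.0, 0.0, 0.5, -2.0], [1.0, 0.0, 0.5, -1.0], [1.0, 0.0, 0.5, 1.0]]
scorer = PositionScorer(input_dim=4, hidden_dim=3)
probs, loss = position_loss(scorer, candidates, gold_index=1)
```

With the slide's example distribution, a gold position that receives probability 0.6 contributes a loss of -log(0.6), roughly 0.51.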
Training Data Creation
• Training data for the NN model can be automatically created from the word-aligned parallel corpus
  – consider each alignment as the floating word and remove it from the target tree
[Figure: 私は ... 偶然 ... を見つけた aligned with "I found ... by chance"; the aligned pair ピカチュウ / Pikachu is removed, and the four candidate positions [X] carry the labels 0, 0, 0, 1.]
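The data creation step can be sketched as below. This is a simplified illustration under assumptions: the tree is a plain dictionary from each word to its ordered children, the floating word is a leaf, and every gap in every child list counts as a candidate position, so the number of candidates differs from the four shown in the figure. The function names are hypothetical.

```python
def enumerate_positions(children):
    """List (parent, slot) for every gap in every node's ordered child list."""
    positions = []
    for parent, kids in children.items():
        for slot in range(len(kids) + 1):
            positions.append((parent, slot))
    return positions


def make_example(children, floating):
    """Remove `floating` from the tree; label its original slot 1, all others 0."""
    pruned = {}
    gold = None
    for parent, kids in children.items():
        if parent == floating:
            continue  # assumption: the floating word is a leaf, so drop its entry
        if floating in kids:
            # The index in the original list equals the gap index after removal.
            gold = (parent, kids.index(floating))
        pruned[parent] = [k for k in kids if k != floating]
    positions = enumerate_positions(pruned)
    labels = [1 if p == gold else 0 for p in positions]
    return positions, labels


# Target tree for "I found Pikachu by chance" (structure assumed for illustration).
tree = {
    "found": ["I", "Pikachu", "by"],
    "I": [],
    "Pikachu": [],
    "by": ["chance"],
    "chance": [],
}
positions, labels = make_example(tree, floating="Pikachu")
```

Running this once per aligned word yields one training example per alignment, each with exactly one positive label.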
EXPERIMENTS
Insertion Position Selection Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Data size

               Ja->En   En->Ja   Ja->Zh   Zh->Ja
Training           15.7M             5.7M
Development        160K              58K
Test               160K              58K
Ave. # IP      3.39     3.15     3.72     3.41

• Comparison
  – L2-regularized logistic regression (using Multi-core LIBLINEAR)
Experimental Results

                      Ja->En   En->Ja   Ja->Zh   Zh->Ja
Training                  15.7M             5.7M
Development               160K              58K
Test                      160K              58K
Ave. # IP             3.39     3.15     3.72     3.41
Mean loss             0.089    0.058    0.105    0.056
Top 1 Accuracy (%)    97.08    97.72    96.51    97.99
Top 2 Accuracy (%)    98.94    99.52    98.97    99.56
Logit Accuracy (%)    55.00    89.03    68.04    83.16
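For readers unfamiliar with the metrics in this table, here is a small sketch of how top-k accuracy and a mean loss could be computed from per-position probabilities. It assumes that "Mean loss" is the average softmax cross-entropy over examples, which the slides do not state explicitly; the probabilities and gold indices are made-up toy values.

```python
import math


def top_k_accuracy(prob_lists, gold_indices, k):
    """Percentage of examples whose gold position is among the k best-scored ones."""
    hits = 0
    for probs, gold in zip(prob_lists, gold_indices):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(gold_indices)


def mean_loss(prob_lists, gold_indices):
    """Average cross-entropy of the gold position (assumed meaning of 'Mean loss')."""
    total = sum(-math.log(p[g]) for p, g in zip(prob_lists, gold_indices))
    return total / len(gold_indices)


# Toy predictions for four examples with three insertion positions each.
prob_lists = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.5, 0.3],
    [0.25, 0.35, 0.4],
]
gold_indices = [0, 1, 2, 0]

top1 = top_k_accuracy(prob_lists, gold_indices, k=1)
top2 = top_k_accuracy(prob_lists, gold_indices, k=2)
avg_loss = mean_loss(prob_lists, gold_indices)
```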
Translation Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Decoder: KyotoEBMT [Richardson+, 2014]
• 5 Settings
  – Phrase-based and hierarchical phrase-based SMTs
  – w/o Flex: not using flexible non-terminals
  – w/ Flex: baseline with flexible non-terminals
  – Prop: using insertion position selection (only top 1)
• BLEU and relative decoding time
Translation Experimental Results

            Ja->En         En->Ja         Ja->Zh         Zh->Ja
            BLEU   Time    BLEU   Time    BLEU   Time    BLEU   Time
PBSMT       18.45  -       27.48  -       27.96  -       34.65  -
HPBSMT      18.72  -       30.19  -       27.71  -       35.43  -
w/o Flex    20.28  1.00    28.77  1.00    24.85  1.00    30.51  1.00
w/ Flex     21.61  6.28    30.57  3.30    28.79  5.16    34.32  5.28
Prop        22.07  2.25    30.50  1.27    29.83  2.21    34.71  1.89
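The effect of the proposed model relative to the w/ Flex baseline can be read off the table with a little arithmetic. The sketch below only restates the table's numbers; the helper name `compare` and the derived quantities (BLEU gain, speed-up factor) are introduced here for illustration.

```python
# (BLEU, relative decoding time) taken from the results table.
results = {
    "Ja->En": {"w/ Flex": (21.61, 6.28), "Prop": (22.07, 2.25)},
    "En->Ja": {"w/ Flex": (30.57, 3.30), "Prop": (30.50, 1.27)},
    "Ja->Zh": {"w/ Flex": (28.79, 5.16), "Prop": (29.83, 2.21)},
    "Zh->Ja": {"w/ Flex": (34.32, 5.28), "Prop": (34.71, 1.89)},
}


def compare(pair):
    """Return the BLEU gain and decoding speed-up of Prop over w/ Flex."""
    base_bleu, base_time = results[pair]["w/ Flex"]
    prop_bleu, prop_time = results[pair]["Prop"]
    return {
        "bleu_gain": round(prop_bleu - base_bleu, 2),
        "speedup": round(base_time / prop_time, 2),
    }


summary = {pair: compare(pair) for pair in results}
for pair, row in summary.items():
    print(f"{pair}: BLEU {row['bleu_gain']:+.2f}, {row['speedup']:.2f}x faster")
```

By this reading, Prop decodes roughly 2.3 to 2.8 times faster than w/ Flex in every direction, with BLEU gains in three directions and a 0.07 drop for En->Ja.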
Conclusion
• Proposed an insertion position selection model to reduce the number of insertion positions for flexible non-terminals in the translation rules
• Automatic evaluation scores and decoding speed are improved
Future Work
• Use grand-children's info
  – Recursive NN [Liu et al., 2015] or Convolutional NN [Mou et al., 2015]
• Shift to NMT!!
  – Actually, we've already shifted and participated in the WAT2016 shared tasks
• However, NMT is still far from perfect
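J->E Adequacy in WAT2016
[Figure: stacked bar chart (0% to 100%) showing the share of adequacy scores 1 to 5 for each team. Values are listed in the order they were extracted from the chart.]

Team name           Kyoto-U (NMT)   NAIST/CMU (NMT)   NAIST (2015 best, F2T)
BLEU                26.22           26.39             25.41
Average adequacy    3.76            3.71              3.83
Score shares (%)    21.75           21                37.25
                    51.75           46.75             30.5
                    20.75           26.75             16.25
                    4.75            5                 10
                    1               0.5               6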
Thank You!
AD: I'm co-organizing The 3rd Workshop on Asian Translation (WAT2016) in conjunction with COLING 2016
Invited talk by Google about GNMT!
Please come to the workshop!
http://lotus.kuee.kyoto-u.ac.jp/WAT/