Chunk-based Decoder for Neural Machine Translation
Shonosuke Ishiwatari¹, Jingtao Yao², Shujie Liu³, Mu Li³, Ming Zhou³, Naoki Yoshinaga⁴, Masaru Kitsuregawa⁴⁵, Weijia Jia²
¹The University of Tokyo ²Shanghai Jiao Tong University ³Microsoft Research Asia
⁴Institute of Industrial Science, the University of Tokyo ⁵National Institute of Informatics
Overview
Proposal: Chunk-based Decoder for Neural Machine Translation
Translation examples
Idea: Using a chunk rather than a word as basic translation unit
State-of-the-art performance on a distant language pair (En -> Ja)
Translation between distant language pairs is difficult
Word-based Encoder-Decoder
[Figure: word-based encoder-decoder with attention. Labels: Encoder, Attention, Decoder. Target words: だれ か が 犬. Attention weight shown: α(4, dog).]
Difficulties of translation of En -> Ja
Experiments
Our decoder for NMT
Quality of translation
Quality of generated chunks
※ Scores in parentheses are not from our implementations
Data
- ASPEC [Nakazawa+ 16], 1.6M En/Ja pairs
Preprocessing
- Bunsetsu chunking with J.DepP [Yoshinaga & Kitsuregawa 09]
Baseline systems
1. Word-based encoder-decoder [Bahdanau+ 15]
2. Tree-based encoder [Eriguchi+ 16] (SOTA)
Results
I heard that someone was bitten by a dog, weren't you injured?
だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?
Japanese sentences have
- longer sequences (En: 25 vs. Ja: 30 [words/sentence])
- free chunk order (e.g., 「だれかが/犬に」 = 「犬に/だれかが」)
Enc-Dec with Attention [Bahdanau+ 15]
- encodes / decodes "word-by-word"
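As a rough illustration of what the attention weights α do in a word-by-word decoder, the sketch below computes the weights for one decoding step and the resulting context vector. It is a simplification under stated assumptions: the function name `attention` and the toy vectors are hypothetical, and a dot-product score is swapped in for the learned scoring function used in [Bahdanau+ 15].

```python
import math
from typing import List, Sequence, Tuple


def attention(decoder_state: Sequence[float],
              encoder_states: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for a single decoding step."""
    # Score every source position against the current decoder state.
    # Assumption: plain dot product instead of a learned scoring network.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]

    # Softmax over source positions (shifted by the max for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


# Toy encoder states for three source words (hypothetical values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attention([0.0, 2.0], encoder_states)
```

With these toy values the second source position receives the largest weight, because its encoder state is the best match for the decoder state.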
Two-step decoding
- First a chunk, then the words inside the chunk
Two additional connections
1. to capture the interaction between chunks
2. to memorize previous outputs well
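The two-step decoding loop can be sketched as follows. This is a minimal illustration, not the poster's implementation: `chunk_step` and `word_step` are hypothetical stand-ins for the trained chunk-level and word-level networks, and here they simply replay a fixed plan so the example runs on its own. Passing the previous chunk back into `chunk_step` only gestures at the additional connections; the toy function ignores it.

```python
from typing import Callable, List, Optional

END_OF_CHUNK = "end-of-chunk"

# Toy stand-in for trained networks (hypothetical): a scripted list of chunks.
PLAN = [["だれ", "か", "が"], ["犬", "に"]]


def chunk_step(prev_state: Optional[int], prev_chunk: List[str]) -> Optional[int]:
    """Chunk-level step: return the next chunk state, or None to end the sentence."""
    nxt = 0 if prev_state is None else prev_state + 1
    return nxt if nxt < len(PLAN) else None


def word_step(chunk_state: int, words_so_far: List[str]) -> str:
    """Word-level step: return the next word of the current chunk."""
    chunk = PLAN[chunk_state]
    i = len(words_so_far)
    return chunk[i] if i < len(chunk) else END_OF_CHUNK


def decode(chunk_fn: Callable[[Optional[int], List[str]], Optional[int]],
           word_fn: Callable[[int, List[str]], str],
           max_chunks: int = 20,
           max_words: int = 20) -> List[List[str]]:
    """Decode chunk by chunk: first a chunk, then the words inside it."""
    chunks: List[List[str]] = []
    state: Optional[int] = None
    prev_chunk: List[str] = []
    for _ in range(max_chunks):
        # Step 1: advance the chunk-level state.
        state = chunk_fn(state, prev_chunk)
        if state is None:
            break
        # Step 2: generate words until the end-of-chunk symbol.
        words: List[str] = []
        for _ in range(max_words):
            word = word_fn(state, words)
            if word == END_OF_CHUNK:
                break
            words.append(word)
        chunks.append(words)
        prev_chunk = words
    return chunks


result = decode(chunk_step, word_step)
print(" / ".join(" ".join(chunk) for chunk in result))
```

The outer loop runs once per chunk and the inner loop once per word, so the chunk-level sequence is much shorter than the word sequence.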
Source: the atmospheric glow discharge is a homogeneous electric discharge obtained by applying alternating voltage after introducing atmospheric He gas in a typical dielectric barrier discharge reactor.
Source: because user operation is important for the idea support in material development, an interface for a substance operation at atomic level was developed.
[Rows for Reference, Word-based [Bahdanau+ 15], and Chunk-based [proposed]: Japanese text not recoverable. Annotations: Wrong chunk location; (support) is missing.]
[Table: Quality of generated chunks]
Decoder type             BLEU    RIBES
Chunk-based (Model 3)    best row; scores illegible in this transcript
Chunk-based (Model 2)    7.78    51.48
Chunk-based (Model 1)    7.59    50.47
Word-based               7.56    50.73

[Table: Quality of translation]
Model                                                     BLEU            RIBES
Word-based encoder + Chunk-based decoder (Proposed)
  Model 3                                                 37.26           82.23
  Model 2                                                 35.81           81.29
  Model 1                                                 34.70           81.01
Tree-based encoder [Eriguchi+ 16] + Word-based decoder    (34.91)         (81.66)
Word-based encoder-decoder [Bahdanau+ 15]                 36.33 (34.64)   81.22 (81.60)
1. Some languages use many words to express one thing, while others use fewer
2. Some languages have free word order, while others do not
Decodes sentences in a "chunk-by-chunk" manner to overcome the differences in length and word order
- The sequence to model for a sentence becomes much shorter
- Fixed word order and free chunk order can be modeled independently
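To make the length and ordering points concrete, the sketch below splits the chunked Japanese example from this poster into chunks and words. The helper name `split_chunks` is hypothetical, and the "/" separator is assumed to mark bunsetsu boundaries as in the example sentence.

```python
from typing import List


def split_chunks(sentence: str) -> List[List[str]]:
    """Split a '/'-delimited sentence into chunks of whitespace-separated words."""
    return [part.split() for part in sentence.split("/") if part.strip()]


target = "だれ か が / 犬 に / 噛ま れ た そう だ けれど、/ 君 は / 怪我 し なかっ た ?"
chunks = split_chunks(target)

num_words = sum(len(chunk) for chunk in chunks)   # length of the word sequence
num_chunks = len(chunks)                          # length of the chunk sequence
longest_chunk = max(len(chunk) for chunk in chunks)

# Free chunk order: swapping the first two chunks keeps each chunk's
# internal (fixed) word order intact.
swapped = [chunks[1], chunks[0]] + chunks[2:]

print(num_words, num_chunks, longest_chunk)
print(" / ".join("".join(chunk) for chunk in swapped[:2]))
```

For this sentence the chunk-level sequence has 5 elements instead of 18 words, and no single chunk is longer than 6 words.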
[Figure: Model 1. Labels: Chunk-level RNN (shorter sequences), Word-level RNN, update, end-of-chunk. Generated words: だれ か が end-of-chunk 犬 に. Chunks: だれかが / 犬に.]
[Figure: Models 2 and 3. Labels: Chunk-level RNN, Word-level RNN. Model 2: Inter-chunk connection. Model 3: Word-to-chunk feedback.]