Translation of Patent Sentences with a Large Vocabulary of Technical Terms Using Neural Machine Translation
Zi Long, Takehito Utsuro (University of Tsukuba)
Tomoharu Mitsuhashi (Japan Patent Information Organization)
Mikio Yamamoto (University of Tsukuba)
WAT2016, December 12, 2016 @ Osaka, Japan
Neural Machine Translation (encoder-decoder model and attention mechanism)
• The encoder reads the input sentence, e.g. … 図 に 示す 。 <EOS> (as shown in Figure …), and encodes it into a large vector that represents the entire input sentence.
(Vinyals et al. Grammar as a foreign language. In Proc. NIPS, 2015)
• Encoder input: … 図 に 示す 。 <EOS> (as shown in Figure …)
• Decoder output: … 如 图 所示 。 <EOS>
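To make the attention mechanism in the slide title concrete, here is a minimal sketch of one attention step in plain Python. It assumes dot-product scoring between the current decoder state and each encoder state; the slides do not say which scoring function the systems use, and `attention_step` and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score first for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(decoder_state: Vector, encoder_states: List[Vector]) -> Vector:
    """Return the context vector for one decoder time step.

    Assumption: dot-product scores between the decoder state and each
    encoder hidden state, normalized with a softmax.
    """
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder hidden states.
    return [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]


# Toy encoder states for three source positions (values are made up).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = attention_step([10.0, 0.0], encoder_states)
print([round(x, 3) for x in context])
```

In an attention-based decoder, a context vector like this is recomputed at every output step and combined with the decoder state before the next target word is predicted.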
Neural Machine Translation vs. Statistical Machine Translation
• Neural Machine Translation (NMT): high fluency, small vocabulary
• Statistical Machine Translation (SMT): low fluency, large vocabulary
• SMT is phrase-level and stores an explicit phrase translation table.
• NMT is word-level, which makes it inappropriate for translating technical terms.
(Luong et al. Addressing the Rare Word Problem in Neural Machine Translation. In Proc. ACL, 2015)
NMT with a Large Vocabulary of Technical Terms
• Step 1. Training the NMT model with technical term tokens
• Step 2. Applying the NMT model with technical term tokens
  • Approach 1. NMT decoding and SMT technical term translation
  • Approach 2. NMT rescoring of 1,000-best SMT translations (not as fluent as Approach 1)
NMT Training after Replacing Technical Terms with Tokens
1. Align technical term pairs in the aligned patent sentence pairs, using the phrase translation table and word alignment of the SMT translation model.
   Japanese sentence: cmac/ユニット/312/は/信号/を/ブリッジ/インタフェース/388/に/提供/する/。
   Chinese sentence: cmac/单元/312/将/信号/提供/给/桥架/接口/388/。
   (cmac unit 312 provides a signal to the bridge interface 388.)
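The slide says technical term pairs are aligned using the SMT phrase translation table and word alignment, but gives no procedure. As one plausible reading (an assumption, not the authors' documented algorithm), a candidate pair can be accepted when it is consistent with the word alignment in the usual phrase-extraction sense and also appears in the phrase table. The function names, alignment links, and phrase-table entries below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (Japanese token index, Chinese token index)
Span = Tuple[int, int]  # half-open token range [start, end)


def is_consistent(links: Set[Link], j_span: Span, c_span: Span) -> bool:
    """True if no word-alignment link crosses the boundary of the span pair.

    At least one link must fall inside both spans, as in standard
    phrase extraction for phrase-based SMT.
    """
    has_inside_link = False
    for j, c in links:
        j_in = j_span[0] <= j < j_span[1]
        c_in = c_span[0] <= c < c_span[1]
        if j_in != c_in:  # the link connects the term to a word outside it
            return False
        has_inside_link = has_inside_link or j_in
    return has_inside_link


def is_term_pair(ja: List[str], zh: List[str], links: Set[Link],
                 j_span: Span, c_span: Span,
                 phrase_table: Dict[str, Set[str]]) -> bool:
    """Accept a candidate term pair if it is alignment-consistent and in the phrase table."""
    j_term = "/".join(ja[j_span[0]:j_span[1]])
    c_term = "/".join(zh[c_span[0]:c_span[1]])
    return is_consistent(links, j_span, c_span) and c_term in phrase_table.get(j_term, set())


ja = "cmac/ユニット/312/は/信号/を/ブリッジ/インタフェース/388/に/提供/する/。".split("/")
zh = "cmac/单元/312/将/信号/提供/给/桥架/接口/388/。".split("/")
# Hand-made toy alignment and phrase table, for illustration only.
links = {(0, 0), (1, 1), (2, 2), (4, 4), (6, 7), (7, 8), (8, 9), (10, 5), (12, 10)}
phrase_table = {"cmac/ユニット": {"cmac/单元"}, "ブリッジ/インタフェース": {"桥架/接口"}}
print(is_term_pair(ja, zh, links, (0, 2), (0, 2), phrase_table))
print(is_term_pair(ja, zh, links, (6, 8), (7, 9), phrase_table))
print(is_term_pair(ja, zh, links, (6, 8), (7, 8), phrase_table))
```

The last call is rejected because the link from インタフェース to 接口 leaves the truncated Chinese span.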
2. Replace each aligned technical term pair with an identical technical term token "TTi" (i = 1, 2, …).
   Japanese sentence with technical term tokens "TT1", "TT2": TT1/312/は/信号/を/TT2/388/に/提供/する/。
   Chinese sentence with technical term tokens "TT1", "TT2": TT1/312/将/信号/提供/给/TT2/388/。
   (TT1 312 provides a signal to the TT2 388.)
   Here TT1 replaces the term pair cmac/ユニット and cmac/单元 (cmac unit), and TT2 replaces ブリッジ/インタフェース and 桥架/接口 (bridge interface).
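Step 2 can be sketched as a span-replacement routine that gives both halves of each aligned pair the same token. This is a minimal illustration, assuming the term pairs are already given as token spans and are numbered by their position in the Japanese sentence; the slides do not state the numbering order, and the function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def replace_spans(tokens: List[str], spans: List[Span], labels: List[str]) -> List[str]:
    """Replace each (non-overlapping) span with its label."""
    out = list(tokens)
    # Work right to left so that earlier span indices stay valid.
    for (start, end), label in sorted(zip(spans, labels), reverse=True):
        out[start:end] = [label]
    return out


def tokenize_term_pairs(ja: List[str], zh: List[str],
                        pairs: List[Tuple[Span, Span]]) -> Tuple[List[str], List[str]]:
    """Replace the i-th aligned term pair with the same token "TTi" on both sides."""
    # Assumption: pairs are numbered by their position in the Japanese sentence.
    ordered = sorted(pairs, key=lambda p: p[0][0])
    labels = [f"TT{i}" for i in range(1, len(ordered) + 1)]
    ja_out = replace_spans(ja, [p[0] for p in ordered], labels)
    zh_out = replace_spans(zh, [p[1] for p in ordered], labels)
    return ja_out, zh_out


ja = "cmac/ユニット/312/は/信号/を/ブリッジ/インタフェース/388/に/提供/する/。".split("/")
zh = "cmac/单元/312/将/信号/提供/给/桥架/接口/388/。".split("/")
pairs = [((0, 2), (0, 2)), ((6, 8), (7, 9))]  # cmac unit, bridge interface
ja_tt, zh_tt = tokenize_term_pairs(ja, zh, pairs)
print("/".join(ja_tt))
print("/".join(zh_tt))
```

Run on the slide's example, this reproduces the two token-replaced sentences shown above.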
The NMT translation model (with technical term tokens) is then trained on the aligned patent sentence pairs in which technical terms have been replaced with technical term tokens.
NMT Decoding and SMT Technical Term Translation
1. Extract the Japanese technical terms from the input Japanese sentence and replace them with technical term tokens.
2. Decode the result with the NMT translation model (with technical term tokens), producing an output Chinese translation with technical term tokens.
3. Translate the extracted Japanese technical terms into Chinese with the SMT phrase translation table.
4. Replace the technical term tokens in the NMT output with these SMT technical term translations to obtain the output Chinese translation.
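The four steps of Approach 1 can be sketched as one function. This is a simplified illustration under stated assumptions: technical term spans are already extracted, `nmt_decode` stands in for the trained NMT model, and `term_table` stands in for a 1-best lookup in the SMT phrase translation table (falling back to copying the source term). All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


def translate_with_term_tokens(
    ja_tokens: List[str],
    term_spans: List[Span],
    nmt_decode: Callable[[List[str]], List[str]],
    term_table: Dict[str, str],
) -> List[str]:
    """Approach 1 sketch: NMT decoding plus SMT translation of technical terms."""
    # Step 1: replace extracted Japanese terms with TT1, TT2, ... (right to left).
    tokens = list(ja_tokens)
    term_of: Dict[str, str] = {}
    for i, (start, end) in reversed(list(enumerate(sorted(term_spans), start=1))):
        label = f"TT{i}"
        term_of[label] = "/".join(ja_tokens[start:end])
        tokens[start:end] = [label]
    # Step 2: decode with the NMT model trained on token-replaced data.
    output = nmt_decode(tokens)
    # Steps 3-4: swap each token for the SMT translation of its term.
    result: List[str] = []
    for tok in output:
        if tok in term_of:
            term = term_of[tok]
            result.extend(term_table.get(term, term).split("/"))
        else:
            result.append(tok)
    return result


def fake_nmt(tokens: List[str]) -> List[str]:
    # Stand-in for a trained NMT model: returns the slide's tokenized Chinese output.
    return "TT1/312/将/信号/提供/给/TT2/388/。".split("/")


ja = "cmac/ユニット/312/は/信号/を/ブリッジ/インタフェース/388/に/提供/する/。".split("/")
term_table = {"cmac/ユニット": "cmac/单元", "ブリッジ/インタフェース": "桥架/接口"}
print("/".join(translate_with_term_tokens(ja, [(0, 2), (6, 8)], fake_nmt, term_table)))
```

Because the NMT model only ever sees TT tokens in place of technical terms, its limited vocabulary no longer has to cover the terms themselves.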
Training and Test Sets
• Example sentence pairs:
  pair 1: J: ・・・冷蔵庫および冷蔵庫扉閉鎖装置・・・ C: ・・・电冰箱及电冰箱门锁闭装置・・・。 (refrigerator door closing device)
  pair 2: J: ・・・真空断熱材及びその製造方法・・・。 C: ・・・真空绝热材料及其制作方法・・・。 (vacuum thermal insulation)
  ・・・・・・
  pair n: J: ・・・運動に関する3次元デカルト座標系を定める。 C: ・・・以限定用于描述运动的三维笛卡尔坐标系・・・。 (Cartesian coordinate system)
• 2.8M parallel sentences extracted from Japanese-Chinese patent families
• 1,000 randomly selected sentence pairs used as the test set
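The random test-set selection can be sketched as follows. This is a minimal illustration, assuming the held-out pairs are removed from the training data and a fixed seed is used for reproducibility; the slides state neither, and `split_test_set` is a hypothetical name.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (Japanese sentence, Chinese sentence)


def split_test_set(pairs: List[Pair], test_size: int = 1000,
                   seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Randomly hold out `test_size` sentence pairs as the test set.

    Assumption: held-out pairs are removed from the training data.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    test_idx = set(rng.sample(range(len(pairs)), test_size))
    train = [p for i, p in enumerate(pairs) if i not in test_idx]
    test = [p for i, p in enumerate(pairs) if i in test_idx]
    return train, test
```

Sampling indices rather than pairs keeps the split correct even when the corpus contains duplicate sentence pairs.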
Experimental Settings
• Baseline SMT (PBMT): phrase-based SMT model trained on the same training set using Moses.
• Baseline NMT: unidirectional model with an attention mechanism.
  • 3-layer deep LSTMs with 512 cells in each layer and 512-dimensional word embeddings.
  • Both the Japanese and the Chinese vocabulary are limited to the 40,000 most frequently used words.
  • More training details are given in the paper.
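The 40,000-word vocabulary limit is what produces UNK tokens in the baseline NMT output. A minimal sketch of such a frequency-based vocabulary, assuming one extra id is reserved for the unknown-word token `UNK` (whether the original systems count it within the 40,000 is not stated); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_vocab(sentences: Iterable[List[str]], size: int = 40000,
                unk: str = "UNK") -> Dict[str, int]:
    """Map the `size` most frequent words to ids; id 0 is reserved for UNK."""
    counts = Counter(tok for sent in sentences for tok in sent)
    counts.pop(unk, None)  # never let a literal UNK word take a vocabulary slot
    vocab = {unk: 0}
    for tok, _ in counts.most_common(size):
        vocab[tok] = len(vocab)
    return vocab


def encode(sentence: List[str], vocab: Dict[str, int], unk: str = "UNK") -> List[int]:
    # Out-of-vocabulary words (e.g. a rare technical term) all become UNK.
    return [vocab.get(tok, vocab[unk]) for tok in sentence]
```

Any technical term outside the most frequent words collapses to the same UNK id, which is exactly the information the technical term tokens are meant to preserve.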
• NMT with PosUnk model [Luong+ 2015]: NMT model trained with the PosUnk model, using the same training parameters as the baseline NMT.
• NMT with technical term tokens: NMT model trained after replacing technical terms with tokens, using the same training parameters as the baseline NMT.
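For comparison, the PosUnk baseline [Luong+ 2015] handles rare words differently. As we understand that paper, the model emits unknown-word tokens annotated with a relative position d, meaning the target word at position j is aligned to the source word at position j + d; a post-processing step then translates that source word with a dictionary, or copies it when no entry exists. The sketch below assumes that reading; the token spelling `unkpos{d}` and the dictionary contents are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical spelling of the positional unknown token, e.g. "unkpos-1".
UNK_PATTERN = re.compile(r"^unkpos(-?\d+)$")


def posunk_postprocess(source: List[str], output: List[str],
                       dictionary: Dict[str, str]) -> List[str]:
    """Replace positional unknown tokens using the aligned source word."""
    result: List[str] = []
    for j, tok in enumerate(output):
        m = UNK_PATTERN.match(tok)
        if m is None:
            result.append(tok)
            continue
        i = j + int(m.group(1))  # aligned source position
        if 0 <= i < len(source):
            word = source[i]
            result.append(dictionary.get(word, word))  # translate, else copy
        else:
            result.append(tok)  # no valid source position: leave the token
    return result


print(posunk_postprocess(["cmac", "ユニット", "312"], ["cmac", "unkpos0", "312"],
                         {"ユニット": "单元"}))
```

Unlike technical term tokens, this recovers only single words through a dictionary, so multi-word technical terms are not translated as units.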
Evaluation Results (automatic evaluation - BLEU)
• Baseline SMT (PBMT): 52.5
• Baseline NMT: 53.5
• NMT with PosUnk model [Luong+ 2015]: 54.0
• NMT with technical term tokens: 55.3
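As a reminder of what these BLEU figures measure, here is a minimal corpus-level BLEU sketch: clipped n-gram precision up to 4-grams, their geometric mean, and a brevity penalty, with one reference per sentence and no smoothing, scaled to 0-100. It is not the scorer that produced the numbers above, and it assumes the sentences are already tokenized (tokenization strongly affects Chinese BLEU).

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]],
                max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100) with a single reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipping: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


ref = "cmac/单元/312/将/信号/提供/给/桥架/接口/388/。".split("/")
hyp = "cmac/单元/312/将/信号/提供/给/UNK/388/。".split("/")
print(round(corpus_bleu([hyp], [ref]), 1))
```

The toy hypothesis loses a technical term to UNK, which lowers every n-gram precision that spans it and also triggers the brevity penalty.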
Evaluation Results (automatic evaluation - RIBES)
• Baseline SMT (PBMT): 88.5
• Baseline NMT: 90.0
• NMT with PosUnk model [Luong+ 2015]: 90.4
• NMT with technical term tokens: 90.8
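RIBES emphasizes word order, which matters for Japanese-Chinese translation. As we understand it, the score multiplies the normalized Kendall's tau of the hypothesis word order by unigram precision raised to a power alpha and a brevity penalty raised to a power beta (defaults alpha = 0.25, beta = 0.10). The sketch below simplifies word alignment to words that occur exactly once in both sentences, so it will not reproduce official RIBES scores; it returns a 0-1 value, while the scores above are scaled by 100. `simple_ribes` is a hypothetical name.

```python
import math
from typing import List


def simple_ribes(hyp: List[str], ref: List[str],
                 alpha: float = 0.25, beta: float = 0.10) -> float:
    """Simplified RIBES: NKT * P**alpha * BP**beta, on a 0-1 scale."""
    # Simplification: align only words that occur exactly once in both sentences.
    ranks = [ref.index(w) for w in hyp if hyp.count(w) == 1 and ref.count(w) == 1]
    if len(ranks) < 2:
        return 0.0  # too few aligned words to measure word order
    n = len(ranks)
    concordant = sum(1 for a in range(n) for b in range(a + 1, n) if ranks[a] < ranks[b])
    nkt = concordant / (n * (n - 1) // 2)  # normalized Kendall's tau, (tau + 1) / 2
    precision = n / len(hyp)  # share of hypothesis words that were aligned
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return nkt * precision ** alpha * brevity_penalty ** beta
```

A hypothesis with all the right words in reversed order scores 0 here, whereas unigram precision alone would rate it perfect.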
Evaluation Results (human evaluation – pairwise evaluation)
• Pairwise evaluation (scores range from -100 to 100)
• Baseline NMT: 5.0
• NMT with technical term tokens: 36.5
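A pairwise score in the range -100 to 100 is, as we understand WAT's pairwise evaluation, computed as 100 × (W − L) / (W + L + T) from per-sentence win, loss, and tie judgments against the system being compared with. A minimal sketch under that assumption; the function name and judgment labels are hypothetical.

```python
from typing import Iterable


def pairwise_score(judgments: Iterable[str]) -> float:
    """100 * (W - L) / (W + L + T) over per-sentence judgments.

    Each judgment is "win", "loss" or "tie" for the evaluated system
    against the system it is compared with.
    """
    judgments = list(judgments)
    if not judgments:
        raise ValueError("no judgments")
    unknown = set(judgments) - {"win", "loss", "tie"}
    if unknown:
        raise ValueError(f"unexpected judgments: {unknown}")
    wins = judgments.count("win")
    losses = judgments.count("loss")
    return 100 * (wins - losses) / len(judgments)
```

All wins give 100 and all losses give -100; ties pull the score toward 0 without counting against the system.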
Evaluation Results (human evaluation – JPO adequacy evaluation)
• JPO adequacy evaluation (scores range from 1 to 5)
• Baseline SMT (PBMT): 3.5
• Baseline NMT: 3.8
• NMT with technical term tokens: 4.3
Example of correct translations produced by NMT decoding
Japanese sentence: 次に、酸化膜をhf洗浄により除去した後、貼り合わせウェーハの剥離面から酸素イオンを注入した。
(Next, after the oxide film was removed by hf washing, oxygen ions were injected from the peeled surface of the laminated wafer.)
• NMT decoding by the BASELINE NMT (incorrect): 接着,通过hf清洗除去氧化膜后,从贴合的UNK的剥离面注入氧气。
  (TRANSLATION: Then, after the oxide film was removed by hf cleaning, oxygen was injected from the peeled surface of the laminated UNK.)
  The Chinese word "晶片" (wafer) is out of the vocabulary, so it is output as UNK.
• NMT decoding by the PROPOSED NMT (correct): 接着,通过hf洗涤除去氧化膜后,从贴合晶片的剥离面注入氧离子。
  (TRANSLATION: Subsequently, after the oxide film was removed by washing with hf, oxygen ions were injected from the peeled surface of the laminated wafer.)
In the proposed NMT, the Japanese technical term "貼り合わせウェーハ" (laminated wafer) is translated by SMT as the Chinese technical term "贴合晶片".
Approach 2. NMT rescoring of 1,000-best SMT translations
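Approach 2 picks the final output from the SMT system's 1,000-best list instead of decoding freely with NMT. A minimal reranking sketch, assuming a linear combination of the SMT model score and the NMT log-probability; the slides do not say how the scores are combined (setting `smt_weight=0` reranks by NMT alone), and all names are hypothetical.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[List[str], float]  # (tokens, SMT model score)


def rescore_nbest(nbest: List[Candidate],
                  nmt_log_prob: Callable[[List[str]], float],
                  smt_weight: float = 1.0,
                  nmt_weight: float = 1.0) -> List[str]:
    """Return the n-best candidate with the highest weighted SMT + NMT score.

    Assumption: a linear combination of the two scores; smt_weight=0
    reranks by the NMT log-probability alone.
    """
    def combined(candidate: Candidate) -> float:
        tokens, smt_score = candidate
        return smt_weight * smt_score + nmt_weight * nmt_log_prob(tokens)

    return max(nbest, key=combined)[0]
```

Because the output is restricted to SMT candidates, the result can be at most as fluent as the best candidate in the 1,000-best list.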
Evaluation Results (2) (automatic evaluation)
• BLEU score
  • NMT rescoring of SMT 1,000-best translations: 55.6 (slightly higher)
  • NMT decoding and SMT technical term translation: 55.3
• RIBES score
  • NMT rescoring of SMT 1,000-best translations: 89.3
  • NMT decoding and SMT technical term translation: 90.8 (much higher)
Evaluation Results (2) (human evaluation)
• Pairwise evaluation (scores range from -100 to 100)
  • NMT rescoring of SMT 1,000-best translations: 31.0
  • NMT decoding and SMT technical term translation: 36.5 (higher)
• JPO adequacy evaluation (scores range from 1 to 5)
  • NMT rescoring of SMT 1,000-best translations: 4.1
  • NMT decoding and SMT technical term translation: 4.3 (higher)
Conclusion and Future Work
• We translate patent sentences with a large vocabulary of technical terms by training an NMT system on a bilingual corpus in which technical terms are replaced with tokens.
• Evaluation experiments on Japanese-Chinese patent sentences showed the effectiveness of the proposed method.
• Future work:
  • Evaluate the proposed method with a bidirectional NMT system (Bahdanau et al. 2015).
  • Rescore 1,000-best NMT translations using the SMT system.
Thank you for your attention!