ISE - TAUS Tokyo Forum 2015


Transcript of ISE - TAUS Tokyo Forum 2015

Page 1: ISE - TAUS Tokyo Forum 2015

Good to Go: More Preferable MT than HT

ISE MT Project 2015

ISE Wakabayashi

electrosuisse japan Nakamura

Page 2: ISE - TAUS Tokyo Forum 2015

About ISE

A leading company that pioneers new areas in technical communication technologies

• Date of establishment
– October 1979
• Business sites
– Tokyo (Headquarters), Osaka, Kobe
– Beijing, Shanghai, Switzerland
• Affiliated business
– Electrosuisse Japan (Kobe)
• Our Business
– Technical Communication
– Interface Design
– Systems Design & Development
– Technical Consulting
• Customer Fields
– Japanese Governmental Agencies, Educational/Research Institutions
– Financial Institutions, Trading, Manufacturing, Information Services

Information System Engineering, Electrosuisse Co. Ⓒ 2015, 2015/4/10

Page 3: ISE - TAUS Tokyo Forum 2015

Background

• Typical Japanese manufacturers
– In Japanese: excessively detailed manuals
– Other languages incl. English: not good at dealing with them → outsourcing to LSPs → data accumulation
– Result (fact): English (master) → EU langs → large TMs
  • Depending on langs: almost 1 mil. TM segment pairs!!!
– Utilizing TMs in MT to make L10n more effective and efficient:
  • Source language: English, not Japanese
  • Main usage: Eng to European languages
  • Which is the most appropriate MT tool for us?


Page 4: ISE - TAUS Tokyo Forum 2015

Integrating MT system into L10n

• Purpose: Efficient L10n for the documents of a global business (Simship)
• ISE: System solution provider for clients (manufacturers), working for them (in house)

Page 5: ISE - TAUS Tokyo Forum 2015

Project Phases

1. From English data: utilizing TMs in MT to make L10n more effective and efficient (faster and at lower cost)

2. From Japanese data (JA-EN MT): utilizing MT to facilitate communication, including SNS and customer support


Page 6: ISE - TAUS Tokyo Forum 2015

1st Phase: from EN

• Evaluation for Preparation
– Source Language: EN
– Target Language: FR, DE, CN
– Domain: ICT Equipment
– Document Type: Manual
– Corpus volumes (after cleaning)
  • FR: approx. 300K TUs
  • DE: approx. 250K TUs
  • CN: approx. 200K TUs


Page 7: ISE - TAUS Tokyo Forum 2015

Evaluation Process

• Corpus Cleaning
– Removing duplicate and/or conflicting language pairs from corpora
• Corpus Training
– Building up translation models using each corpus
• Tuning/Improvement Cycle
– Applying translation rules, terminology, user dictionary, and normalization dictionary
• Measurement
– HT and MT+PE translation time
• Evaluation
– Productivity, Quality
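The corpus cleaning step can be illustrated with a minimal Python sketch. The slides do not say how duplicates and conflicts were handled, so the policy here is an assumption: exact duplicates collapse to one pair, and any source segment with more than one distinct target is dropped entirely. The function name `clean_corpus` and the sample segments are hypothetical.

```python
from collections import defaultdict

def clean_corpus(pairs):
    """Drop exact duplicates and sources that have conflicting targets.

    Assumed policy (not from the slides): a source segment with more than
    one distinct target is treated as conflicting and removed entirely.
    """
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        # Normalize whitespace so trivially different copies compare equal.
        normalized_source = " ".join(source.split())
        normalized_target = " ".join(target.split())
        targets_by_source[normalized_source].add(normalized_target)

    cleaned = []
    for source, targets in targets_by_source.items():
        if len(targets) == 1:
            # Exactly one consistent translation: keep a single copy.
            cleaned.append((source, next(iter(targets))))
    return cleaned

# Hypothetical EN-FR translation units, for illustration only.
sample = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),  # duplicate
    ("Open the cover.", "Ouvrez le capot."),
    ("Open the cover.", "Ouvrez le couvercle."),  # conflicts with the previous pair
]
print(clean_corpus(sample))
```

In practice a conflicting pair might be resolved by a reviewer rather than discarded; dropping it is simply the most conservative choice for a sketch.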


Page 8: ISE - TAUS Tokyo Forum 2015

BLEU/Perfect/TER/WER

• BLEU
– Excellent levels
  • Target score: 50+
• Perfect
– Excellent levels except DE
  • Target score: 25+
• TER/WER
– Good scores for each language
– Effective and practical levels
  • Target score: Under 40

          FR     DE     CN
BLEU      CLEAR  CLEAR  CLEAR
Perfect   CLEAR  UNDER  CLEAR
TER       CLEAR  CLEAR  CLEAR
WER       CLEAR  CLEAR  CLEAR
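As an illustration of one of these metrics, the sketch below computes WER using its textbook definition: word-level edit distance divided by the number of reference words, as a percentage. The slides do not say which tool produced the scores, so the function names and the example sentences are assumptions. TER additionally allows phrase shifts and is not covered here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by reference length, in percent."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # previous[j] holds the distance between the ref words seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)

def meets_target(score, threshold=40.0):
    """True when an error-rate score is under the target (lower is better)."""
    return score < threshold

# Hypothetical reference and MT output, for illustration only.
score = word_error_rate("press the power button", "press power button")
print(score, meets_target(score))
```

The default threshold of 40 mirrors the "Under 40" target on the slide; BLEU and Perfect use "higher is better" targets and would need the opposite comparison.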


Page 9: ISE - TAUS Tokyo Forum 2015

Productivity Evaluation

• Methodology
– Translation targets
  • Pick 120 sentences from a manual
– Measure the translation time for each language
  • HT
  • MT + PE1
  • MT + PE2


Page 10: ISE - TAUS Tokyo Forum 2015

Productivity Evaluation

• Achieved Doubled Productivity Compared with Standard HT
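A productivity figure of this kind can be derived from the measured times as a simple ratio. The sketch below assumes the same sentence set is translated in each workflow; the slides do not give the measured times, so the timings are placeholders and the function name is hypothetical.

```python
def productivity_gain(ht_minutes, mtpe_minutes):
    """How many times faster MT+PE was than HT on the same sentence set."""
    if ht_minutes <= 0 or mtpe_minutes <= 0:
        raise ValueError("times must be positive")
    return ht_minutes / mtpe_minutes

# Placeholder timings in minutes for the same sentence set; not measured values.
timings = {"HT": 240.0, "MT + PE1": 120.0, "MT + PE2": 110.0}
for workflow in ("MT + PE1", "MT + PE2"):
    gain = productivity_gain(timings["HT"], timings[workflow])
    print(f"{workflow}: {gain:.2f}x the HT throughput")
```

A gain of 2.0 corresponds to "doubled productivity": the same sentences take half the time.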


Page 11: ISE - TAUS Tokyo Forum 2015

Quality Evaluation

• Methodology
– Testers evaluate the following 5 translations:
  i. Original translation
  ii. HT
  iii. MT
  iv. MT + PE1
  v. MT + PE2
– Testers do not know which is which
– Perfect score: 100
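One way to set up such a blind evaluation is to shuffle the five variants behind anonymous labels and keep the answer key away from the testers. The sketch below is an assumption about how this could be organized, not the procedure ISE used; the helper names, the seed, and the placeholder texts are hypothetical.

```python
import random

VARIANTS = ["Original translation", "HT", "MT", "MT + PE1", "MT + PE2"]

def blind_sheet(translations, rng):
    """Return (anonymous items for the tester, answer key for the organizer)."""
    labels = list(translations)
    rng.shuffle(labels)  # hide which workflow produced which text
    items, key = {}, {}
    for number, label in enumerate(labels, start=1):
        sample_id = f"Sample {number}"
        items[sample_id] = translations[label]
        key[sample_id] = label
    return items, key

def average_scores(score_sheets, key):
    """Map tester scores (0 to 100 per anonymous sample) back to the workflows."""
    collected = {}
    for sheet in score_sheets:
        for sample_id, score in sheet.items():
            collected.setdefault(key[sample_id], []).append(score)
    return {label: sum(values) / len(values) for label, values in collected.items()}

# Placeholder texts; a real sheet would hold the five translations of one sentence.
translations = {label: f"<{label} text>" for label in VARIANTS}
items, key = blind_sheet(translations, random.Random(2015))
print(items)
```

How the 100-point scores map to the A/B/C grades in the results is not stated in the slides, so that step is left out.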


Page 12: ISE - TAUS Tokyo Forum 2015

Quality Evaluation

• From HT
– Excellent!
• From MT
– Good score: FR, CN
• From MT+PE
– Achieved the original translation quality level
– DE: got the scores to B after PE (from MT: C)
– Realized the same quality as that of the originals

                      FR  DE  CN
Reference (Original)  A   B   A
HT                    A   A   A
MT                    A   C   B
MT+PE1                A   B   A
MT+PE2                A   B   A

Page 13: ISE - TAUS Tokyo Forum 2015

Targeted MT System


Page 14: ISE - TAUS Tokyo Forum 2015

2nd Phase: JA to EN


Page 15: ISE - TAUS Tokyo Forum 2015

Problems in JA>EN

• Terminology/word usages (expressions)
• Katakana words
• Itemization
• Viewing points
• Parallelism
• Modifications
• Grammar
• No subjects: you or we (instruction or descriptive)
• Passive
• Syntax
– Ha-ga construction
– Ergative case
– Others
• Singular or plural
• Definite article
• Pronominalization


Page 16: ISE - TAUS Tokyo Forum 2015

Improvements

• Improving Japanese text quality by applying the Plain and Logical Japanese 77 Rules
• About half of them are effective (subjective evaluation)
– Effective: 26
– Somewhat effective: 16
– None: 35


Page 17: ISE - TAUS Tokyo Forum 2015

Viewing Points

• 主語も述語も共有しない重文は複数の文に分ける
  Split a compound sentence whose clauses share neither a subject nor a predicate into multiple sentences

– まず手動でおおよそのポイントを調整し、その調整が合ったところを自動的に認識して、正確なポイントを検出します。

– First you adjust the approximate point manually, recognizing the place where the adjustment is agreeable automatically, you detect the accurate point.

– 手動でおおよそのポイントを合わせます。機器は、その調整されたポイントを自動的に認識して、正確なポイントを検出します。

– Adjust the approximate point manually. The equipment, recognizing the adjusted points automatically, detects the accurate point.


Page 18: ISE - TAUS Tokyo Forum 2015

Modifications

• 連用修飾の数量表現は、連体修飾句に言い換える
  Rephrase quantity expressions used as adverbial modifiers as adnominal modifiers

– このメーリングリストのファシリテーターが複数必要になります。

– The facilitator of this mailing list is several needed.

– このメーリングリストには、複数のファシリテーターが必要です。

– Several facilitators are necessary in this mailing list.

Page 19: ISE - TAUS Tokyo Forum 2015

Parallelism

• 並列関係にあるものは、並列であることを明示する(パラレリズム)
  Express parallel items in parallel grammatical form (parallelism)

– 正常運転を行ったときはプラスの反応を示し、逆転する場合はマイナスの反応が示されます。

– When normal operation shows the plus reaction, when it is reversed, the negative reaction is shown.

– 正常運転を行なったときはプラスの反応が示され、逆転運転を行なったときはマイナスの反応が示されます。

– When doing normal operation, the plus reaction is shown, when doing reversal driving, the negative reaction is shown.


Page 20: ISE - TAUS Tokyo Forum 2015

No subjects: you or we (instruction or descriptive)

• 抽象的な品詞よりも、より具体的な品詞を使う
  Use concrete words rather than abstract ones

– 開始点Aより終点Bまで実線を引く。

– From the start point A the solid line is pulled to terminus B.

– 実線を開始点Aから終点Bまで書く。

– Write the solid line from the start point A to the end point B.


Page 21: ISE - TAUS Tokyo Forum 2015

Ha-ga structure

• 主題を表す副助詞「は」は使わない
  Don't use the topic-marking particle “ha” (は)

– このマニュアルは、イラストが効果的です。

– This as for the manual, illustration is effective.

– このマニュアルに描かれているイラストは効果的です。

– The illustration which is drawn in this manual is effective.

Singular or plural?


Page 22: ISE - TAUS Tokyo Forum 2015

Ergative case

• 「なる」表現は「する」表現や受身形に言い換える
  Clarify the doers by replacing “naru” (become) expressions with “suru” (do) expressions or the passive form

– 私たちは6月に結婚することになりました。
– そのマニュアルに修正を入れることになりました。
– We came to the point of getting married in June.
– That it came to the point of inserting correction in the manual.

– 私たちは6月に結婚します。
– CS部門がそのマニュアルを修正します。
– We get married in June.
– CS section corrects that manual.
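A rule like this lends itself to an automated pre-editing check. The following Python sketch flags sentences that contain a "ことになる"-style expression so that a writer can rewrite them with an explicit doer. The regular expression covers only the pattern seen in the examples above and is an assumption, as is the function name; it is not part of the 77 Rules material.

```python
import re

# Hypothetical pattern: matches a few inflections of "ことになる" only.
NARU_PATTERN = re.compile(r"ことにな(?:りました|ります|った|る)")

def flag_naru_expressions(text):
    """Return the sentences in text that contain a 'ことになる'-style expression."""
    # Split on the Japanese full stop while keeping it attached to each sentence.
    sentences = re.findall(r"[^。]+。?", text)
    return [s.strip() for s in sentences if NARU_PATTERN.search(s)]

before = "私たちは6月に結婚することになりました。そのマニュアルに修正を入れることになりました。"
after = "私たちは6月に結婚します。CS部門がそのマニュアルを修正します。"
print(flag_naru_expressions(before + after))
```

A checker of this kind only points at candidates; choosing the doer ("CS部門" in the example) still requires a human who knows the context.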


Page 23: ISE - TAUS Tokyo Forum 2015

Singular or plural

• この章では以下のことを説明します:

• In this chapter thing below is explained:

• The following are explained in this chapter:


Page 24: ISE - TAUS Tokyo Forum 2015

Summary 1

• Advantages for clients
– Integrating an MT system smoothly into the current document production system
– Decreasing L10n costs
– Getting various by-products (e.g., increasing the document categories to be translated)
• Advantages for ISE (Solution provider)
– Step into a new business field (MT)
– Adding new value to its solution business


Page 25: ISE - TAUS Tokyo Forum 2015

Summary 2

• MT by-products
– Information sharing in a global business
– Speeding up in development and sales
– Revitalizing and facilitating internal communication using SNS


Page 26: ISE - TAUS Tokyo Forum 2015

Summary 3

• What can ISE do for you:
– Training for Japanese writing: Plain and Logical Japanese 77 Rules
– Improving Japanese documents (Pre-editing)
– Training for English writing
– Post-editing


Page 27: ISE - TAUS Tokyo Forum 2015

• Contact:

ISE Wakabayashi
[email protected]

electrosuisse japan Nakamura
[email protected]