ISE - TAUS Tokyo Forum 2015
Good to Go: More Preferable MT than HT
ISE MT Project 2015
ISE: Wakabayashi
Electrosuisse Japan: Nakamura
About ISE
A leading company that pioneers new areas in technical communication technologies
• Date of establishment
  – October 1979
• Business sites
  – Tokyo (Headquarters), Osaka, Kobe
  – Beijing, Shanghai, Switzerland
• Affiliated business
  – Electrosuisse Japan (Kobe)
• Our Business
  – Technical Communication
  – Interface Design
  – Systems Design & Development
  – Technical Consulting
• Customer Fields
  – Japanese Governmental Agencies, Educational/Research Institutions
  – Financial Institutions, Trading, Manufacturing, Information Services
Information System Engineering, Electrosuisse Co. Ⓒ 2015, 2015/4/10
Background
• Typical Japanese manufacturers
  – In Japanese: excessively detailed manuals
  – Other languages incl. English: not good at dealing with them → outsourcing to LSPs → data accumulation
  – Result (fact): English (master) → EU languages → large TMs
    • Depending on the language: almost 1 million TM segment pairs!
  – Utilizing TMs in MT to make L10n more effective and efficient:
    • Source language: English, not Japanese
    • Main usage: English to European languages
    • Which is the most appropriate MT tool for us?
Integrating MT system into L10n
• Purpose: Efficient L10n for the documents of a global business (Simship)
• ISE: System solution provider for clients (manufacturers), working for them (in house)
Project Phases
1. From English data: utilizing TMs in MT to make L10n more effective and efficient (faster and at lower cost)
2. From Japanese data: applying MT to JA-EN to facilitate communication, including SNS and customer support
1st Phase: from EN
• Evaluation for Preparation
  – Source Language: EN
  – Target Language: FR, DE, CN
  – Domain: ICT Equipment
  – Document Type: Manual
  – Corpus volumes (after cleaning)
    • FR: approx. 300K TUs
    • DE: approx. 250K TUs
    • CN: approx. 200K TUs
Evaluation Process
• Corpus Cleaning
  – Removing duplicate and/or conflicting language pairs from corpora
• Corpus Training
  – Building up translation models using each corpus
• Tuning/Improvement Cycle
  – Applying translation rules, terminology, user dictionary, and normalization dictionary
• Measurement
  – HT and MT+PE translation time
• Evaluation
  – Productivity, Quality
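The corpus cleaning step could be sketched roughly as follows. This is a minimal illustration, not the tooling used in the project: the function name `clean_corpus`, the whitespace normalization, and the decision to drop every source segment that has more than one distinct target are all assumptions, since the slides do not say how conflicts were resolved.

```python
from collections import defaultdict


def clean_corpus(pairs):
    """Drop exact duplicates and source segments with conflicting targets.

    `pairs` is an iterable of (source, target) strings. Treating every
    source that maps to more than one target as a conflict, and dropping
    it entirely, is an assumption made for this sketch.
    """
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        # Collapse runs of whitespace so trivially different copies match.
        src = " ".join(source.split())
        tgt = " ".join(target.split())
        if src and tgt:
            targets_by_source[src].add(tgt)

    cleaned = []
    for src, targets in targets_by_source.items():
        if len(targets) == 1:  # exactly one target: keep a single copy
            cleaned.append((src, next(iter(targets))))
    return cleaned


# Made-up sample segments, for illustration only.
sample = [
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Press the power button.", "Appuyez sur le bouton d'alimentation."),
    ("Open the cover.", "Ouvrez le capot."),
    ("Open the cover.", "Ouvrez le couvercle."),
    ("Close the cover.", "Fermez le capot."),
]
cleaned = clean_corpus(sample)
for src, tgt in cleaned:
    print(src, "->", tgt)
```

In practice a conflicting pair might be sent to a reviewer instead of being dropped; dropping is only the simplest policy to show.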
BLEU/Perfect/TER/WER
• BLEU
  – Excellent levels
    • Target score: 50+
• Perfect
  – Excellent levels except DE
    • Target score: 25+
• TER/WER
  – Good scores for each language
  – Effective and practical levels
    • Target score: Under 40

          FR     DE     CN
BLEU      CLEAR  CLEAR  CLEAR
Perfect   CLEAR  UNDER  CLEAR
TER       CLEAR  CLEAR  CLEAR
WER       CLEAR  CLEAR  CLEAR
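For readers unfamiliar with the error-rate metrics, WER can be sketched as the word-level edit distance between MT output and a reference, divided by the reference length. The code below is an illustrative sketch only: the function name `word_error_rate`, whitespace tokenization, and reporting the result as a percentage (to match a target such as "Under 40") are assumptions, and the slides do not say which scoring tool was used. TER additionally allows block shifts, which this sketch does not implement.

```python
def word_error_rate(reference, hypothesis):
    """Return WER as a percentage: word-level edit distance / reference length.

    Tokenization by whitespace is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of five reference words.
score = word_error_rate(
    "adjust the approximate point manually",
    "adjust the rough point manually",
)
print(score)  # 20.0
```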
Productivity Evaluation
• Methodology
  – Translation targets
    • 120 sentences picked from a manual
  – Measuring translation time for each language
    • HT
    • MT + PE1
    • MT + PE2
Productivity Evaluation
• Productivity doubled compared with standard HT
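As a rough sketch of how such a comparison can be computed from the measured translation times: the gain is the HT time divided by the MT+PE time for the same sentence set. The function name `productivity_gain` and both timing values below are hypothetical placeholders chosen only to show the arithmetic; they are not the project's measurements. Only the sentence count (120) comes from the slides.

```python
def productivity_gain(ht_minutes, mtpe_minutes):
    """Return how many times faster MT+PE is than HT on the same text."""
    if ht_minutes <= 0 or mtpe_minutes <= 0:
        raise ValueError("times must be positive")
    return ht_minutes / mtpe_minutes


SENTENCES = 120        # sentence count stated in the slides
ht_minutes = 240.0     # HYPOTHETICAL time, for illustration only
mtpe_minutes = 120.0   # HYPOTHETICAL time, for illustration only

gain = productivity_gain(ht_minutes, mtpe_minutes)
print(f"HT:    {SENTENCES / ht_minutes:.2f} sentences/min")
print(f"MT+PE: {SENTENCES / mtpe_minutes:.2f} sentences/min")
print(f"Gain:  {gain:.1f}x")
```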
Quality Evaluation
• Methodology
  – Testers evaluate the following 5 translations:
    i. Original translation
    ii. HT
    iii. MT
    iv. MT + PE1
    v. MT + PE2
  – Testers do not know which is which
  – Perfect score: 100
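One way to set up such a blind comparison is to shuffle the five candidates and hand testers only neutral codes, keeping the mapping back to each method separate. This is an assumed procedure for illustration; the slides do not describe how the blinding was actually done, and the name `blind_candidates`, the `T1`..`T5` codes, and the placeholder texts are all hypothetical.

```python
import random


def blind_candidates(translations, seed=None):
    """Shuffle candidates and replace method names with neutral codes.

    Returns (sheet, key): `sheet` maps code -> text and goes to the testers;
    `key` maps code -> method name and stays with the evaluation organizer.
    """
    rng = random.Random(seed)
    items = list(translations.items())
    rng.shuffle(items)

    sheet, key = {}, {}
    for index, (method, text) in enumerate(items, start=1):
        code = f"T{index}"
        sheet[code] = text
        key[code] = method
    return sheet, key


# Placeholder texts; the five method names follow the list above.
translations = {
    "Original": "<original translation>",
    "HT": "<human translation>",
    "MT": "<raw MT output>",
    "MT+PE1": "<MT output post-edited by editor 1>",
    "MT+PE2": "<MT output post-edited by editor 2>",
}
sheet, key = blind_candidates(translations, seed=2015)
for code in sorted(sheet):
    print(code, sheet[code])
```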
Quality Evaluation
• From HT
  – Excellent!
• From MT
  – Good score: FR, CN
• From MT+PE
  – Achieved the original translation quality level
  – DE: PE raised the grade from C (raw MT) to B
  – Realized the same quality as the originals

                      FR  DE  CN
Reference (Original)  A   B   A
HT                    A   A   A
MT                    A   C   B
MT+PE1                A   B   A
MT+PE2                A   B   A
Targeted MT System
2nd Phase: JA to EN
Problems in JA>EN
• Terminology/word usages (expressions)
• Katakana words
• Itemization
• Viewing points
• Parallelism
• Modifications
• Grammar
• No subjects: you or we (instruction or descriptive)
• Passive
• Syntax
  – Ha-ga construction
  – Ergative case
  – Others
• Singular or plural
• Definite article
• Pronominalization
Improvements
• Improving Japanese text quality by applying the Plain and Logical Japanese 77 Rules
• About half of the rules are effective (subjective evaluation): 42 of 77 are at least somewhat effective
  – Effective: 26
  – Somewhat effective: 16
  – None: 35
Viewing Points
• 主語も述語も共有しない重文は複数の文に分ける
  Split a compound sentence whose clauses share neither a subject nor a predicate into multiple sentences
  – Original JA: まず手動でおおよそのポイントを調整し、その調整が合ったところを自動的に認識して、正確なポイントを検出します。
  – MT output: First you adjust the approximate point manually, recognizing the place where the adjustment is agreeable automatically, you detect the accurate point.
  – Rewritten JA: 手動でおおよそのポイントを合わせます。機器は、その調整されたポイントを自動的に認識して、正確なポイントを検出します。
  – MT output: Adjust the approximate point manually. The equipment, recognizing the adjusted points automatically, detects the accurate point.
Modifications
• 連用修飾の数量表現は、連体修飾句に言い換える
  Rewrite quantity expressions used as adverbial modifiers as adnominal (noun-modifying) phrases
  – Original JA: このメーリングリストのファシリテーターが複数必要になります。
  – MT output: The facilitator of this mailing list is several needed.
  – Rewritten JA: このメーリングリストには、複数のファシリテーターが必要です。
  – MT output: Several facilitators are necessary in this mailing list.
Parallelism
• 並列関係にあるものは、並列であることを明示する(パラレリズム)
  Express parallel items in parallel grammatical form (parallelism)
  – Original JA: 正常運転を行ったときはプラスの反応を示し、逆転する場合はマイナスの反応が示されます。
  – MT output: When normal operation shows the plus reaction, when it is reversed, the negative reaction is shown.
  – Rewritten JA: 正常運転を行なったときはプラスの反応が示され、逆転運転を行なったときはマイナスの反応が示されます。
  – MT output: When doing normal operation, the plus reaction is shown, when doing reversal driving, the negative reaction is shown.
No subjects: you or we (instruction or descriptive)
• 抽象的な品詞よりも、より具体的な品詞を使う
  Use concrete words rather than abstract ones
  – Original JA: 開始点Aより終点Bまで実線を引く。
  – MT output: From the start point A the solid line is pulled to terminus B.
  – Rewritten JA: 実線を開始点Aから終点Bまで書く。
  – MT output: Write the solid line from the start point A to the end point B.
Ha-ga structure
• 主題を表す副助詞「は」は使わない
  Do not use the adverbial particle "は" (ha) as a topic marker
  – Original JA: このマニュアルは、イラストが効果的です。
  – MT output: This as for the manual, illustration is effective.
  – Rewritten JA: このマニュアルに描かれているイラストは効果的です。
  – MT output: The illustration which is drawn in this manual is effective.
  (Remaining issue: singular or plural?)
Ergative case
• 「なる」表現は「する」表現や受身形に言い換える
  Clarify the doer by replacing "naru" (become) expressions with "suru" (do) expressions or passive forms
  – Original JA: 私たちは6月に結婚することになりました。
  – Original JA: そのマニュアルに修正を入れることになりました。
  – MT output: We came to the point of getting married in June.
  – MT output: That it came to the point of inserting correction in the manual.
  – Rewritten JA: 私たちは6月に結婚します。
  – Rewritten JA: CS部門がそのマニュアルを修正します。
  – MT output: We get married in June.
  – MT output: CS section corrects that manual.
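A pre-editing check for this rule could be sketched as a simple pattern search that flags sentences for a writer to review. This is a hypothetical illustration, not part of the 77 Rules tooling: the pattern `NARU_PATTERN` and the function `flag_naru_expressions` are assumptions, and the regular expression only covers the "ことになる / ことになり / ことになっ" forms seen in the examples above, so it will miss other "naru" constructions.

```python
import re

# Assumed heuristic: catches "ことになる", "ことになります/なりました",
# and "ことになった/なって". Other "naru" expressions are not covered.
NARU_PATTERN = re.compile(r"ことにな(?:る|り|っ)")


def flag_naru_expressions(sentences):
    """Return the sentences that contain a 'koto ni naru' expression."""
    return [s for s in sentences if NARU_PATTERN.search(s)]


# The four Japanese example sentences from this slide.
examples = [
    "私たちは6月に結婚することになりました。",
    "そのマニュアルに修正を入れることになりました。",
    "私たちは6月に結婚します。",
    "CS部門がそのマニュアルを修正します。",
]
flagged = flag_naru_expressions(examples)
for sentence in flagged:
    print("Review:", sentence)
```

The check only points at candidate sentences; deciding who the doer is (here, "CS部門") still requires a human writer.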
Singular or plural
• この章では以下のことを説明します:
• In this chapter thing below is explained:
• The following are explained in this chapter:
Summary 1
• Advantages for clients
  – Integrating an MT system smoothly into the current document production system
  – Decreasing L10n costs
  – Getting various by-products (e.g., increasing the document categories to be translated)
• Advantages for ISE (Solution provider)
  – Stepping into a new business field (MT)
  – Adding new value to its solution business
Summary 2
• MT by-products
  – Information sharing in a global business
  – Speeding up development and sales
  – Revitalizing and facilitating internal communication using SNS
Summary 3
• What ISE can do for you:
  – Training for Japanese writing: Plain and Logical Japanese 77 Rules
  – Improving Japanese documents (Pre-editing)
  – Training for English writing
  – Post-editing
• Contact:
  ISE: Wakabayashi
  Electrosuisse Japan: Nakamura
  [email protected]