Integrating Machine Translation with Translation Memory: A Practical Approach



Outline: Introduction, Methodology, Discussion

Panagiotis Kanavos and Dimitrios Kartsaklis

November 4, 2010

Panagiotis Kanavos and Dimitrios Kartsaklis Integrating MT with TM: A Practical Approach 1/ 18

Introduction

- Despite ongoing research and progress in the field, Machine Translation (MT) has not been widely accepted by the professional translation industry
- Common criticisms:
  - MT is only suitable for draft translations of e-mails and web pages
  - MT is not efficient for morphologically rich languages
  - MT is useful only to large companies owning a wealth of resources
- In a nutshell: MT is something for researchers to play around with

A Case Study

- How MT can be incorporated into professional translation workflows, with limited resources, in ways that significantly increase productivity
- We combine both statistical and rule-based MT systems with Translation Memory (TM) software using two approaches:
  - The on-demand, sentence-by-sentence application of MT
  - The one-time application of MT to the whole translation project
- The case study is conducted under production conditions, with final deliverables that require the highest translation quality

Methodology: Configuration, Segment-by-segment workflows, One-time MT application workflow

Our setting

- Language pair: English to Greek
- Text to be translated: Two Informatics books: one technical guide and one academic textbook
- TM size: 140,000 translation units (TUs) coming from in-domain texts
- Terminology DB size: 30,000 entries
- Fuzzy threshold: 70%
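To make the role of the fuzzy threshold concrete, here is a minimal sketch of a TM lookup that discards candidates scoring below 70%. The scoring function is an assumption for illustration: it uses Python's `difflib.SequenceMatcher`, whereas commercial CAT tools compute match percentages with their own, usually undisclosed, algorithms. The names `similarity` and `best_match` and the sample TM entry are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

FUZZY_THRESHOLD = 70.0  # value taken from the configuration above


def similarity(a: str, b: str) -> float:
    """Character-level similarity in percent (illustrative stand-in for a CAT tool's score)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_match(source: str, tm: Dict[str, str]) -> Optional[Tuple[float, str, str]]:
    """Return (score, tm_source, tm_target) for the best TM entry at or above the threshold."""
    best: Optional[Tuple[float, str, str]] = None
    for tm_source, tm_target in tm.items():
        score = similarity(source, tm_source)
        if score >= FUZZY_THRESHOLD and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# Hypothetical one-entry TM (target transliterated for brevity).
tm = {"Click the Save button.": "Kante klik sto koumpi Apothikefsi."}
print(best_match("Click the Open button.", tm))
```

A segment that returns `None` here is what the workflows in the following slides treat as having no usable TM match.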

Software programs and combinations

- MT systems:
  - Statistical: Moses
  - Rule-based: Systran
- CAT programs:
  - Swordfish II (Java application) on Linux
  - Deja Vu X on MS Windows
  - Wordfast, an MS Word macro template
- Three combinations, based on practical factors:
  - Sentence-by-sentence workflow with Swordfish/Moses
  - Sentence-by-sentence workflow with Wordfast/Systran
  - One-time MT application workflow with Deja Vu X/Moses

Swordfish/Moses combination

- Swordfish: Allows connection to external programs or scripts
- Connection with Moses achieved with a custom Python script
- Basic workflow:

    if TM match > 80% then
        accept fuzzy match for post-edit
    else if 70% < TM match <= 80% then
        evaluate the fuzzy match
        if quality not acceptable then
            apply MT
        end if
    else
        apply MT
        if quality not acceptable then
            type the translation from scratch
        end if
    end if
    post-edit
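The decision logic of this workflow can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' script: the translator's quality judgement is modelled as a callback (`is_acceptable`), the MT engine as another callback (`translate_mt`), and the function name `choose_draft` and the origin labels are hypothetical.

```python
from typing import Callable, Optional, Tuple

HIGH_MATCH = 80.0       # above this, the fuzzy match is accepted directly
FUZZY_THRESHOLD = 70.0  # lower bound of the band that the translator evaluates


def choose_draft(
    source: str,
    tm_score: float,
    tm_target: Optional[str],
    translate_mt: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (origin, draft) for one segment; the draft is post-edited afterwards.

    origin is "tm", "mt" or "scratch"; draft is None when the translator
    has to type the translation from scratch.
    """
    if tm_target is not None and tm_score > HIGH_MATCH:
        return "tm", tm_target
    if tm_target is not None and FUZZY_THRESHOLD < tm_score <= HIGH_MATCH:
        if is_acceptable(tm_target):
            return "tm", tm_target
        # Fuzzy match rejected: fall back to MT, as in the workflow above.
        return "mt", translate_mt(source)
    # No usable TM match: try MT first, then fall back to manual translation.
    mt_draft = translate_mt(source)
    if is_acceptable(mt_draft):
        return "mt", mt_draft
    return "scratch", None
```

Keeping the human judgement behind a callback makes the branching testable without pretending that quality evaluation can be automated; in the actual workflow the translator makes that call interactively.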

Swordfish/Moses combination: Results

[Figure: results for the Swordfish/Moses combination. Book 1: instructive guide; Book 2: textbook]

Wordfast/Systran combination

- Wordfast: A macro template working on top of MS Word
- Great deal of customization through MS Word macros
- Rule-based version of Systran, supporting user dictionaries
- Basic workflow:

    if TM match < 70% then
        apply pre-editing macros
        send segment to MT engine
        apply post-editing macros
        while MT result not good do
            amend Systran user dictionary and re-send segment to MT
        end while
    else
        accept the translation for post-edit
    end if
    post-edit
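A rough Python rendering of this loop is given below. It is a sketch under several assumptions: the pre-editing and post-editing macros, the MT call, the quality check and the dictionary update are all passed in as hypothetical callbacks; the post-editing macros are assumed to be re-applied after each re-send; and the `max_rounds` cap is an addition that the slide does not mention, included only so the loop always terminates.

```python
from typing import Callable, Optional

FUZZY_THRESHOLD = 70.0  # from the configuration slide


def draft_with_rule_based_mt(
    source: str,
    tm_score: float,
    tm_target: Optional[str],
    pre_edit: Callable[[str], str],
    translate_mt: Callable[[str], str],
    post_edit_macros: Callable[[str], str],
    is_good: Callable[[str], bool],
    amend_dictionary: Callable[[str, str], None],
    max_rounds: int = 3,  # assumption: cap on dictionary-tuning rounds
) -> str:
    """Return the draft that the translator will post-edit."""
    if tm_target is not None and tm_score >= FUZZY_THRESHOLD:
        return tm_target  # TM match accepted for post-editing
    prepared = pre_edit(source)
    draft = post_edit_macros(translate_mt(prepared))
    rounds = 0
    while not is_good(draft) and rounds < max_rounds:
        # The translator amends the user dictionary, then re-sends the segment.
        amend_dictionary(source, draft)
        draft = post_edit_macros(translate_mt(prepared))
        rounds += 1
    return draft
```

The point of the loop is that every dictionary amendment persists, so later segments containing the same term benefit from the fix without further intervention.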

Wordfast/Systran combination: Results

[Figure: results for the Wordfast/Systran combination. Book 1: instructive guide; Book 2: textbook]

Deja Vu X/Moses combination

- Deja Vu X: similar concept to Swordfish
- However: No way of integration with an MT system, so the only option is pre-translation of the whole project with Moses
- Send for MT only segments with no TM matches or TM matches below 80%
- Pre-translation stage: [diagram not preserved in this transcript]
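The selection rule for the pre-translation batch can be sketched as a simple filter. The `Segment` type and the `select_for_mt` function are hypothetical names for illustration; how segments and their best TM scores are exported from Deja Vu X is not described in the slides and is not modelled here.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    source: str
    best_tm_score: Optional[float]  # None when the TM returned no match at all


def select_for_mt(segments: List[Segment], cutoff: float = 80.0) -> List[Segment]:
    """Keep only segments with no TM match or a best match below the cutoff."""
    return [
        s for s in segments
        if s.best_tm_score is None or s.best_tm_score < cutoff
    ]


project = [
    Segment("Open the file.", None),
    Segment("Save the file.", 75.0),
    Segment("Close the file.", 92.0),
]
for seg in select_for_mt(project):
    print(seg.source)
```

Only the first two segments of this toy project would be sent to Moses; the third keeps its TM match.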

Deja Vu X/Moses combination

- Basic workflow:

    if TM match > 80% then
        accept the translation for post-edit
    else
        evaluate MT translation
        if quality not acceptable then
            if any TM match exists (between 70-80%) then
                accept the translation for post-edit
            else
                apply "auto-assemble" feature
                if quality not acceptable then
                    type the translation from scratch
                end if
            end if
        end if
    end if
    post-edit
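The fallback order of this workflow (TM match, then pre-translated MT, then lower fuzzy match, then auto-assemble, then manual translation) might be expressed as the sketch below. As before, the translator's judgement is a hypothetical `is_acceptable` callback, and Deja Vu X's auto-assemble feature is represented by a placeholder callable rather than any real API.

```python
from typing import Callable, Optional, Tuple

HIGH_MATCH = 80.0
FUZZY_THRESHOLD = 70.0


def choose_pretranslated_draft(
    tm_score: float,
    tm_target: Optional[str],
    mt_target: Optional[str],
    auto_assemble: Callable[[], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (origin, draft); draft is None when typing from scratch is needed."""
    if tm_target is not None and tm_score > HIGH_MATCH:
        return "tm", tm_target
    # The segment was pre-translated with MT: evaluate that output first.
    if mt_target is not None and is_acceptable(mt_target):
        return "mt", mt_target
    # MT rejected: use a fuzzy match in the 70-80% band if one exists.
    if tm_target is not None and FUZZY_THRESHOLD <= tm_score <= HIGH_MATCH:
        return "tm", tm_target
    # Last automatic resort: placeholder for the tool's auto-assemble feature.
    assembled = auto_assemble()
    if is_acceptable(assembled):
        return "assembled", assembled
    return "scratch", None
```

Compared with the sentence-by-sentence workflows, MT output is already present when the translator reaches the segment, so the only interactive steps are the quality judgements.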

Deja Vu X/Moses combination: Results

[Figure: results for the Deja Vu X/Moses combination. Book 1: instructive guide; Book 2: textbook]

Productivity increase

- MT & TM combination: Productivity increased to a level not possible by applying either technology in isolation. [Figure not preserved in this transcript]

Important factors

- Quantity and quality of TM entries
- The domain of the translation material used to train the statistical MT system
- The above impose serious limitations on those who work with small texts in many different domains. Rule-based systems are more suitable in such cases
- Language pair: Coding efficient user dictionaries for morphologically rich languages is difficult and requires some trial and error. Phrase-based systems like Moses perform better
- Style of text: Productivity is higher with repetitive text and step-by-step instructions
- User expertise with all technologies involved

A proposal for a unified application

- For general acceptance by the professional translation community, MT should be integrated with TM into an intuitive unified system
- Basically a TM environment, with the MT engine as an extra component working on top of it
- MT suggestions should be presented in a controlled and selective way
- Basic components:
  - A 2-column translation grid for source and target segments
  - Terminology management
  - MT engine
  - Alignment tool
  - Quality assurance control

Advanced issues

- Automation of the training process with TM databases
- Statistical systems require considerable computing resources. A solution: MT as Software as a Service (SaaS)
- Terminology databases can be used for more than reference purposes
  - Additional entry fields for coding MT dictionary entries (Systran)
  - Linguistic information can be used for creating factored models (Moses)
- Automatic suggestions-as-you-type (TransType, Caitra)

Summary

- The combination of MT with TM results in a significant productivity increase not feasible in a TM-only environment
- Currently there is no straightforward way of doing that
- Work is in progress by the authors towards this purpose, in the form of a Software Specification document that will describe the design and the components of such a system in full detail

Thank you!

Any questions?
