MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching Faculty: Alon Lavie, Jaime...
MEMT: Multi-Engine Machine Translation Guided by Explicit Word Matching
Faculty: Alon Lavie, Jaime Carbonell
Students and Staff: Gregory Hanneman, Justin Merrill (Shyamsundar Jayaraman, Satanjeev Banerjee)
October 26, 2005 MEMT 2
MEMT Goals and Approach
• Scientific Challenge:
– How to combine the output of multiple MT engines into a synthetic output that outperforms the originals in translation quality
– Synthetic combination of the output from the original systems, NOT just selecting the best system
• Engineering Challenge:
– How to integrate multiple distributed translation engines and the MEMT combination engine in a common framework that supports ongoing development and evaluation
Synthetic Combination MEMT
• Approach:
– Original MT engines treated as “black boxes” – each provides a single “best” translation
– Explicitly identify and align the words that are common between any pair of translations
– Use the alignments as reinforcement and as indicators of possible locations for the words in the combined output
– Each engine has a “confidence” that is used for the words that it contributes
– Decoder searches for an optimal synthetic combination of words and phrases that optimizes a scoring function that combines the alignment confidence weights and a LM score
The Word Alignment Matcher
• Developed by Satanjeev Banerjee as a component in our METEOR Automatic MT Evaluation metric
• Finds maximal alignment match with minimal “crossing branches”
• Allows alignment of:
– Identical words
– Morphological variants of words
– Synonymous words (based on WordNet synsets)
• Implementation: Clever search algorithm for best match using pruning of sub-optimal sub-solutions
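The idea of a maximal alignment with minimal crossings can be illustrated with a small sketch. This is not the METEOR matcher itself: it is an assumed, simplified version that only links identical (case-insensitive) words, omits the morphological and WordNet stages, and uses a plain exhaustive search with one size-based pruning rule. The names `count_crossings` and `align` are hypothetical.

```python
def count_crossings(links):
    """Count pairs of links (i, j) that cross when drawn between the two sentences."""
    return sum(
        1
        for n, (i1, j1) in enumerate(links)
        for (i2, j2) in links[n + 1:]
        if (i1 - i2) * (j1 - j2) < 0
    )


def align(first, second):
    """Return a largest one-to-one set of exact-match links with the fewest crossings."""
    a = [w.lower() for w in first]
    b = [w.lower() for w in second]
    best = {"key": (-1, 0), "links": []}

    def search(i, used, links):
        # Prune: even if every remaining word were linked, this branch
        # would still be smaller than the best alignment found so far.
        if len(links) + (len(a) - i) < best["key"][0]:
            return
        if i == len(a):
            # Prefer more links first, then fewer crossings.
            key = (len(links), -count_crossings(links))
            if key > best["key"]:
                best["key"], best["links"] = key, links
            return
        for j, word in enumerate(b):
            if j not in used and word == a[i]:
                search(i + 1, used | {j}, links + [(i, j)])
        search(i + 1, used, links)  # option: leave word i unaligned

    search(0, frozenset(), [])
    return best["links"]


if __name__ == "__main__":
    s1 = "the sri lanka prime minister criticizes the leader of the country".split()
    s2 = "President of Sri Lanka criticized by the country's Prime Minister".split()
    for i, j in align(s1, s2):
        print(s1[i], "<->", s2[j])
```

The search is exponential in the worst case, so it is only suitable for short sentences; a production matcher would need stronger pruning than this sketch uses.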
Matcher Example
the sri lanka prime minister criticizes the leader of the country
President of Sri Lanka criticized by the country’s Prime Minister
The MEMT Algorithm
• Algorithm builds collections of partial hypotheses of increasing length
• Partial hypotheses are extended by selecting the “next available” word from one of the original systems
• Sentences are initially assumed synchronous:
– Each word is either aligned with another word or is an alternative of another word
• Extending a partial hypothesis with a word “pulls” and “uses” its aligned words with it, and marks its alternatives as “used” – “vectors” keep track of this
• Partial hypotheses are scored and ranked
• Pruning and re-combination
• Hypothesis can end if any original system proposes an end of sentence as next word
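The extension step can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual decoder: the `Hypothesis` class, the `extend` function, and the `aligned` mapping are hypothetical names, the "vectors" are modelled as one set of used word positions per engine, and alternatives, scoring, pruning, and re-combination are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    words: tuple  # words chosen so far
    used: tuple   # one frozenset of used positions per engine (the "vectors")


def next_available(sentence, used):
    """Index of the first unused word in one engine's output, or None."""
    for i in range(len(sentence)):
        if i not in used:
            return i
    return None


def extend(hyp, engine, sentences, aligned):
    """Extend hyp with the next available word of one engine.

    aligned maps (engine, index) to the (engine, index) pairs aligned with it;
    those aligned words are marked as used together with the chosen word.
    """
    i = next_available(sentences[engine], hyp.used[engine])
    if i is None:
        return None  # this engine has no words left to offer
    used = [set(u) for u in hyp.used]
    used[engine].add(i)
    for other_engine, j in aligned.get((engine, i), ()):
        used[other_engine].add(j)
    return Hypothesis(
        hyp.words + (sentences[engine][i],),
        tuple(frozenset(u) for u in used),
    )


if __name__ == "__main__":
    sentences = [["the", "cat"], ["the", "dog"]]
    aligned = {(0, 0): {(1, 0)}, (1, 0): {(0, 0)}}
    start = Hypothesis((), (frozenset(), frozenset()))
    step1 = extend(start, 0, sentences, aligned)  # takes "the" from engine 0
    step2 = extend(step1, 1, sentences, aligned)  # engine 1 now offers "dog"
    print(step2.words)
```

Because choosing "the" from engine 0 also marks the aligned "the" of engine 1 as used, the second engine's next available word is "dog" rather than a duplicate "the".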
Scoring MEMT Hypotheses
• Scoring:
– Word confidence score [0,1] based on engine confidence and reinforcement from alignments of the words
– LM score based on trigram LM
– Log-linear combination: weighted sum of logs of confidence score and LM score
– Select best scoring hypothesis based on:
• Total score (bias towards shorter hypotheses)
• Average score per word
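A small sketch of the log-linear combination makes the length bias concrete. The function name `hypothesis_score`, the weights `w_conf` and `w_lm`, and the per-word inputs are assumptions for illustration; the slides do not give the actual weights or how the trigram LM probabilities are computed.

```python
import math


def hypothesis_score(confidences, lm_probs, w_conf=0.5, w_lm=0.5):
    """Return (total, per-word average) log-linear score for one hypothesis.

    confidences: per-word confidence scores in (0, 1]
    lm_probs:    per-word language-model probabilities in (0, 1]
    """
    if len(confidences) != len(lm_probs) or not confidences:
        raise ValueError("need one confidence and one LM probability per word")
    total = sum(
        w_conf * math.log(c) + w_lm * math.log(p)
        for c, p in zip(confidences, lm_probs)
    )
    return total, total / len(confidences)


if __name__ == "__main__":
    short = hypothesis_score([0.5, 0.5], [0.1, 0.1])
    longer = hypothesis_score([0.5] * 4, [0.1] * 4)
    # Every word adds a negative log term, so the total favours the shorter
    # hypothesis, while the per-word average treats both the same.
    print(short, longer)
```

Since each word contributes a non-positive term, the total score can only decrease as a hypothesis grows, which is the bias towards shorter hypotheses noted above; the per-word average removes that effect.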
Additional Parameters
• Parameters:
– “lingering word” horizon: how long is a word allowed to linger when words following it have already been used?
– “lookahead” horizon: how far ahead can we look for an alternative for a word that is not aligned?
– “POS matching”: limit search for an alternative to only words of the same POS
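One possible reading of the “lookahead” horizon and “POS matching” parameters is sketched below. The function `find_alternative` and its arguments are hypothetical, and the windowing rule (scan from the same position up to `lookahead` words ahead) is an assumption; the slides do not specify the exact search used.

```python
def find_alternative(position, other_words, other_used, lookahead=3,
                     pos_tags=None, wanted_pos=None):
    """Find an unused word in another system's output to act as an alternative.

    Scans other_words from `position` up to `lookahead` positions ahead and
    returns the index of the first unused word; if both pos_tags and
    wanted_pos are given, only words with that part-of-speech tag qualify.
    Returns None when no candidate lies within the horizon.
    """
    end = min(len(other_words), position + lookahead + 1)
    for j in range(position, end):
        if j in other_used:
            continue
        if pos_tags is not None and wanted_pos is not None and pos_tags[j] != wanted_pos:
            continue
        return j
    return None


if __name__ == "__main__":
    words = ["a", "quick", "fox", "ran"]
    tags = ["DET", "ADJ", "NOUN", "VERB"]
    print(find_alternative(0, words, set(), lookahead=2,
                           pos_tags=tags, wanted_pos="NOUN"))
```

A larger horizon lets an unaligned word find an alternative further away, at the cost of allowing more reordering between the systems' outputs.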
Example
IBM: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver , egyptian nationality . : 0.6327
ISI: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality. : 0.7054
CMU: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman , wife of bus drivers Egyptian nationality . : 0.5293
MEMT Sentence :
Selected : the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality . 0.7647 -3.25376
Oracle : the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls . 0.7964 -3.44128
Current System
• Initial development tests performed on TIDES 2003 Arabic-to-English MT data, using IBM, ISI and CMU SMT system output
• Evaluation tests performed on Arabic-to-English EBMT, Apptek and SYSTRAN system output and on three Chinese-to-English COTS systems
Experimental Results: Arabic-to-English
System                               METEOR Score
Apptek                               .4241
EBMT                                 .4231
Systran                              .4405
Choosing best online translation     .4432
MEMT                                 .5185
Best hypothesis generated by MEMT    .5883
Experimental Results: Chinese-to-English
System                               METEOR Score
Online Translator A                  .4917
Online Translator B                  .4859
Online Translator C                  .4910
Choosing best online translation     .5381
MEMT                                 .5301
Best hypothesis generated by MEMT    .5840
Demo
Architecture and Engineering
• Challenge: How do we construct an effective architecture for running MEMT within large-scale distributed projects?
– Example: GALE Project
– Multiple MT engines running at different locations
– Input may be text or output of speech recognizers; output may go downstream to other applications (IE, Summarization, TDT)
• Approach: Using IBM’s UIMA: Unstructured Information Management Architecture
– Provides support for building robust processing “workflows” with heterogeneous components
– Components act as “annotators” at the character level within documents
UIMA-based MEMT
• MT engines and MEMT engine are set up as distributed servers:
– Communication over socket connections
– Sentence-by-sentence translation
• Java “wrappers” convert these into UIMA-style annotator components
• UIMA-based “workflows” implement a variety of asynchronous tasks, with results stored in a common Annotations Database (ADB)
– Translation workflows
– MEMT workflow
– Evaluation/scoring workflow
• ADB and ADB Collection Reader/Consumer components developed at CMU by Eric Nyberg’s group
UIMA-based MEMT
• Translation Workflow:
– Retrieve document from ADB
– “Annotate” document with translation annotator X
– Write back new “annotation” into ADB
UIMA-based MEMT
• MEMT Workflow:
– Retrieve document translation annotations labeled by X, Y, Z from ADB
– “Annotate” the document with a new MEMT annotation
– Write back MEMT annotation into ADB
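The retrieve, annotate, and write-back pattern shared by the translation and MEMT workflows can be sketched without UIMA. Everything here is a hypothetical stand-in: `AnnotationsDB` is an in-memory dictionary rather than the actual ADB, and `translate` and `combine` are placeholder callables for an MT engine wrapper and the MEMT engine. None of these names come from the UIMA API.

```python
class AnnotationsDB:
    """In-memory stand-in for the ADB: document id -> {label: annotation}."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, label, value):
        self._docs.setdefault(doc_id, {})[label] = value

    def get(self, doc_id, label):
        return self._docs[doc_id][label]


def translation_workflow(adb, doc_id, label, translate):
    """Retrieve the source sentences, translate each one, write back under label."""
    source = adb.get(doc_id, "source")
    adb.put(doc_id, label, [translate(sentence) for sentence in source])


def memt_workflow(adb, doc_id, labels, combine):
    """Retrieve the translations stored under labels, combine them sentence by
    sentence, and write the result back as the "MEMT" annotation."""
    per_engine = [adb.get(doc_id, label) for label in labels]
    combined = [combine(list(candidates)) for candidates in zip(*per_engine)]
    adb.put(doc_id, "MEMT", combined)


if __name__ == "__main__":
    adb = AnnotationsDB()
    adb.put("doc1", "source", ["first sentence", "second sentence"])
    translation_workflow(adb, "doc1", "X", str.upper)  # toy "engines"
    translation_workflow(adb, "doc1", "Y", str.title)
    memt_workflow(adb, "doc1", ["X", "Y"], lambda candidates: candidates[0])
    print(adb.get("doc1", "MEMT"))
```

Because each workflow only reads and writes labeled annotations in the shared store, the translation workflows and the MEMT workflow can run at different times and on different machines, which matches the asynchronous design described in the slides.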
Conclusions
• New sentence-level MEMT approach with promising performance
• Easy to run on both research and COTS systems
• UIMA-based architecture design for effective integration in large distributed systems/projects
– Pilot study has been very positive
– Can serve as a model for integration framework(s) under GALE
Open Research Issues
• Main Open Research Issues:
– Improvements to the underlying algorithm: better word alignments, “artificial” word alignments
– Confidence scores at the sentence or word level
– Decoding is still suboptimal
• Oracle scores show there is much room for improvement
• Need for additional discriminant features
– Extend approach to Multi-Engine SR combination
– Engineering issues: synchronization, human friendly interfaces with workflows
References
• 2005, Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Companion Volume of Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), Ann Arbor, Michigan, June 2005.
• 2005, Jayaraman, S. and A. Lavie. "Multi-Engine Machine Translation Guided by Explicit Word Matching". In Proceedings of the 10th Annual Conference of the European Association for Machine Translation (EAMT-2005), Budapest, Hungary, May 2005.