Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement
Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell
Language Technologies Institute
Carnegie Mellon University
AMTA 2004
October 1, AMTA 2004
Outline
• Automatic Rule Refinement
• AVENUE and resource-poor scenarios
• Experiment
  • Data (eng2spa)
  • Two types of grammar
  • Evaluation results
  • Error analysis
  • RR required for each type
• Conclusions and Future Work
Motivation for Automatic RR
General
- MT output still requires post-editing
- Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data
Within AVENUE
- Resource-poor scenarios: no manual grammar, or only a very small initial grammar
- Need to validate the elicitation corpus and the automatically learned translation rules
AVENUE and resource-poor scenarios
• No e-data available (often a spoken tradition), ruling out SMT or EBMT
• Lack of computational linguists to write a grammar
So how can we even start to think about MT? That's what AVENUE is all about.
What do we usually have available in resource-poor scenarios? Bilingual users.
Elicitation Corpus + Automatic Rule Learning + Rule Refinement
AVENUE overview
[Architecture diagram: the Elicitation Tool produces an Elicitation Corpus and a word-aligned parallel corpus; the Rule Learning Module, together with handcrafted rules and a morphological analyzer, produces transfer rules and lexical resources; the Run-Time Transfer System outputs a translation lattice, which feeds the Translation Correction Tool and the Rule Refinement Module.]
Automatic and Interactive RLR
[Diagram, 1st step: from training pairs (SLSentence1–TLSentence1, SLSentence2–TLSentence2), rule R is automatically learned and translates SLS3 into TLS3. 2nd step: the user's minimal correction TLS3' is fed to the RR module, which produces the refined rule R', so that SLS3 now translates to TLS3'.]
Interactive Elicitation of MT errors
Assumptions:
• Non-expert bilingual users can reliably detect and minimally correct MT errors, given:
  – the SL sentence (I saw you)
  – up to 5 TL sentences (Yo vi tú, ...)
  – word-to-word alignments (I-yo, saw-vi, you-tú)
  – (context)
• Using an online GUI: the Translation Correction Tool (TCTool)
Goal: simplify the MT correction task maximally
User studies: 90% error detection accuracy and 73% error classification accuracy [LREC 2004]
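The input a user sees for one correction instance can be sketched as a small data structure; the class and field names below are ours, chosen for illustration, not part of the TCTool:

```python
from dataclasses import dataclass

# Illustrative sketch of one TCTool correction instance: an SL sentence,
# candidate TL translations, and word-to-word alignment links.
@dataclass
class CorrectionInstance:
    sl_sentence: list          # source-language words, e.g. "I saw you"
    tl_candidates: list        # up to 5 target-language translations
    alignments: list           # (sl_index, tl_index) word-to-word links
    context: str = ""          # optional disambiguating context

inst = CorrectionInstance(
    sl_sentence=["I", "saw", "you"],
    tl_candidates=[["Yo", "vi", "tú"]],
    alignments=[(0, 0), (1, 1), (2, 2)],   # I-yo, saw-vi, you-tú
)
```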
TCTool v0.1
Actions:
• Add a word
• Delete a word
• Modify a word
• Change word order
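The four correction actions amount to simple edits on a token list; a minimal sketch (function names are ours, not the TCTool's API):

```python
# Toy versions of the four TCTool correction actions on a token list.
def add_word(tokens, index, word):
    return tokens[:index] + [word] + tokens[index:]

def delete_word(tokens, index):
    return tokens[:index] + tokens[index + 1:]

def modify_word(tokens, index, word):
    return tokens[:index] + [word] + tokens[index + 1:]

def change_word_order(tokens, src, dst):
    tokens = list(tokens)
    tokens.insert(dst, tokens.pop(src))
    return tokens

# "Yo vi tú" -> "Yo te vi": modify the pronoun, then move the clitic
# before the verb.
tl = ["Yo", "vi", "tú"]
tl = modify_word(tl, 2, "te")      # Yo vi te
tl = change_word_order(tl, 2, 1)   # Yo te vi
```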
RR Framework
• Find the best RR operations given:
  • a grammar (G),
  • a lexicon (L),
  • a (set of) source-language sentence(s) (SL),
  • a (set of) target-language sentence(s) (TL),
  • its parse tree (P), and
  • a minimal correction of TL (TL')
such that TQ2 > TQ1 (translation quality improves after refinement)
• Which can also be expressed as:
  max TQ(TL | TL', P, SL, RR(G, L))
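The maximization above can be sketched as a search over candidate refinement operations, keeping the grammar whose output scores best against the user's minimal correction TL'. Here `similarity` is a toy stand-in for the TQ score and `translate` for the transfer engine; neither is AVENUE code:

```python
# Schematic search corresponding to max TQ(TL | TL', P, SL, RR(G, L)).
def similarity(hyp, ref):
    """Toy TQ proxy: fraction of positions matching the corrected TL'."""
    matches = sum(h == r for h, r in zip(hyp, ref))
    return matches / max(len(ref), 1)

def refine(grammar, lexicon, sl, tl_corrected, candidate_ops, translate):
    # Baseline quality TQ1 with the unrefined grammar.
    best = (similarity(translate(grammar, lexicon, sl), tl_corrected),
            grammar, lexicon)
    for op in candidate_ops:              # e.g. bifurcate, add constraint
        g2, l2 = op(grammar, lexicon)
        score = similarity(translate(g2, l2, sl), tl_corrected)
        if score > best[0]:               # keep only if TQ2 > TQ1
            best = (score, g2, l2)
    return best[1], best[2]
```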
Types of RR operations
• Grammar:
  – R0 → R0 + R1 [= R0' + constr]   Cov[R0] → Cov[R0, R1]   (bifurcate)
  – R0 → R1 [= R0 + constr]         Cov[R0] → Cov[R1]        (refine)
  – R0 → R1 [= R0 + constr = c-]
        + R2 [= R0' + constr = c+]  Cov[R0] → Cov[R1, R2]
• Lexicon:
  – Lex0 → Lex0 + Lex1 [= Lex0 + constr]
  – Lex0 → Lex1 [= Lex0 + constr]
  – Lex0 → Lex0 + Lex1 [= Lex0 + TL word]   (adding a lexical item)
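The bifurcation operation (R0 → R1 with a blocking constraint plus R2 licensed by the feature) can be sketched as copying a rule and attaching complementary feature constraints. The dict-based rule representation below is illustrative, not AVENUE's internal format:

```python
import copy

# Sketch of "bifurcate": split R0 into R1, which blocks the triggering
# feature (constr = c-), and R2, a copy licensed only when the feature
# is present (constr = c+). The original rule is left untouched.
def bifurcate(r0, feature, value):
    r1 = copy.deepcopy(r0)
    r1["constraints"][feature] = "-"        # R1 = R0 + blocking constraint
    r2 = copy.deepcopy(r0)
    r2["constraints"][feature] = value      # R2 = R0' + constr = c+
    r2["name"] = r0["name"] + "'"
    return r1, r2

np_rule = {"name": "NP,2", "lhs": "NP", "rhs": ["DET", "N"],
           "constraints": {}}
r1, r2 = bifurcate(np_rule, "clitic", "+")
```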
Data: English - Spanish
Training
• First 200 sentences from the AVENUE Elicitation Corpus
• Lexicon: extracted semi-automatically from the first 400 sentences (442 entries)
Test
• 32 sentences manually selected from the next 200 sentences in the EC to showcase a variety of MT errors
Manual grammar
• 12 rules (2 S, 7 NP, 3 VP)
• Produces 1.6 different translations on average
Learned Grammar + feature constraints
• 316 rules (194 S, 43 NP, 78 VP, 1 PP)
• Emulated a decoder by reordering 3 rules
• Produces 18.6 different translations on average
Comparing Grammar Output: Results
• Manually:
• Automatic MT evaluation:

                   NIST   BLEU   METEOR
  Manual grammar    4.3   0.16   0.60
  Learned grammar   3.7   0.14   0.55
Error Analysis
• Most of the errors produced by the manual grammar can be classified into:
  – lack of subject-predicate agreement
  – wrong word order of object pronouns (clitics)
  – wrong preposition
  – wrong form (case)
  – OOV words
• On top of these, the learned grammar output exhibited errors of the following types:
  – lack of agreement constraints
  – missing preposition
  – over-generalization
Examples
• Same (both good)
• Manual Grammar better
• Learned Grammar better
• Different (both bad)
Types of RR required

Manual Grammar
• Bifurcate a rule to code an exception:
  – R0 → R0 + R1 [= R0' + constr]   Cov[R0] → Cov[R0, R1]
  – R0 → R1 [= R0 + constr = c-]
        + R2 [= R0' + constr = c+]  Cov[R0] → Cov[R1, R2]

Learned Grammar
• Adjust feature constraints, such as agreement:
  – R0 → R1 [= R0 +|- constr]   Cov[R0] → Cov[R1]
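Adding an agreement constraint to a learned rule amounts to requiring that two constituents unify on the relevant features, e.g. (X1 num) = (X2 num) for subject-predicate number agreement. A toy feature-structure check, not the AVENUE transfer engine:

```python
# Toy agreement test over flat feature structures (dicts). A refined
# rule with an added agreement constraint would only fire when the
# subject and predicate share the listed feature values.
def agrees(subj, pred, features=("num", "per")):
    return all(subj.get(f) == pred.get(f) for f in features)

subj = {"num": "sg", "per": "3"}
pred_good = {"num": "sg", "per": "3"}   # licensed by the refined rule
pred_bad = {"num": "pl", "per": "3"}    # blocked: number mismatch
```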
Conclusions
• TCTool + RR can improve both hand-crafted and automatically learned grammars.
• In the current experiment, MT errors differ almost 50% of the time, depending on the type of grammar.
• Manual G will need to be refined to encode exceptions, whereas Learned G will need to be refined to achieve the right level of generalization.
• We expect the RR to give the most leverage when combined with the Learned Grammar.
Future Work
• Experiment where user corrections are used both as new training examples for RL and to refine the existing grammar with the RR module.
• Investigate using reference translations to refine MT grammars automatically... but much harder since they are not minimal post-editions.
Questions???
Thank you!
RR Framework
• Types of operations: bifurcate, make more specific/general, add blocking constraints, etc.
• Formalizing error information (clue words)
• Finding triggering features