Semantic Evaluation of Machine Translation
Billy Wong, City University of Hong Kong, 21st May 2010
Introduction
Surface text similarity is not a reliable indicator in automatic MT evaluation: it is insensitive to valid variation in translation.
Deeper linguistic analysis is preferred, and WordNet is widely used for matching synonyms, e.g. METEOR (Banerjee & Lavie 2005), TERp (Snover et al. 2009), ATEC (Wong & Kit 2010)…
Is the similarity of words between MT outputs and references fully captured?
Motivation
WordNet's sense distinctions are highly fine-grained.
Many word pairs do not share a sense yet are similar in meaning: [mom vs mother], [safeguard vs security], [expansion vs extension], [journey vs tour], [impact vs influence]…
Ignoring such pairs in evaluation is problematic; what is needed is a word similarity measure.
Proposal: incorporate word similarity measures into automatic MT evaluation
Word Similarity Measures
Knowledge-based (WordNet): Wup (Wu & Palmer 1994), Res (Resnik 1995), Jcn (Jiang & Conrath 1997), Hso (Hirst & St-Onge 1998), Lch (Leacock & Chodorow 1998), Lin (Lin 1998), Lesk (Banerjee & Pedersen 2002)
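As an illustration of the knowledge-based family, the sketch below computes Wup over a hand-made toy is-a hierarchy. Wup scores two concepts by how deep their lowest common subsumer (lcs) sits relative to the concepts themselves: wup(c1, c2) = 2 · depth(lcs) / (depth(c1) + depth(c2)). The taxonomy, its concept names and the depth convention (root counted as depth 1) are illustrative assumptions, not WordNet's actual structure.

```python
# Hypothetical toy is-a taxonomy (child -> parent); NOT WordNet's real hierarchy.
PARENT = {
    "abstraction": "entity",
    "act": "abstraction",
    "travel": "act",
    "journey": "travel",
    "tour": "travel",
    "change": "act",
    "expansion": "change",
}


def path_to_root(concept):
    """Return [concept, parent, grandparent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    """Depth counted in nodes, so the root 'entity' has depth 1."""
    return len(path_to_root(concept))


def lcs(c1, c2):
    """Lowest common subsumer: the first ancestor of c1 that also subsumes c2."""
    ancestors2 = set(path_to_root(c2))
    for node in path_to_root(c1):
        if node in ancestors2:
            return node
    raise ValueError("concepts share no common ancestor")


def wup(c1, c2):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


print(round(wup("journey", "tour"), 3))       # 0.8 (lcs is 'travel')
print(round(wup("journey", "expansion"), 3))  # 0.6 (lcs is 'act')
```

With real WordNet data a word usually maps to several synsets, and a word-level score is commonly taken as the maximum over sense pairs; NLTK's WordNet interface, for instance, exposes this measure as `wup_similarity` on synset objects.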
Corpus-based: LSA (Landauer et al. 1998)
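LSA derives similarity from co-occurrence statistics instead of a hand-built hierarchy: it factorises a term-document matrix A with a truncated SVD and compares two terms by the cosine of their rows in U_k Σ_k. The pure-Python sketch below does this for a hypothetical four-document toy corpus with raw counts and k = 2, finding the top singular directions by orthogonal (subspace) iteration on A Aᵀ, whose eigenvalues are the squared singular values. The corpus, k, the iteration count and the starting vectors are all assumptions for illustration.

```python
import math

# Hypothetical toy corpus: each document is a bag of words (not the study's data).
DOCS = [
    ["journey", "travel"],
    ["tour", "travel"],
    ["committee", "meeting"],
    ["chairman", "meeting"],
]
TERMS = sorted({w for doc in DOCS for w in doc})


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def gram_schmidt(vectors):
    """Orthonormalise the vectors in order (Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def lsa_term_vectors(docs, terms, k=2, iters=200):
    """Map each term to its row of U_k * Sigma_k from a rank-k SVD of the term-document matrix."""
    A = [[doc.count(t) for doc in docs] for t in terms]  # rows: terms, columns: documents
    n = len(A)
    M = [[dot(A[i], A[j]) for j in range(n)] for i in range(n)]  # A A^T (term co-occurrence)
    # Arbitrary deterministic start vectors (an assumption; any generic start works).
    V = [[float((i + 1) * (j + 2) % 5 + 1) for j in range(n)] for i in range(k)]
    for _ in range(iters):
        # Orthogonal iteration: multiply by A A^T, then re-orthonormalise.
        V = gram_schmidt([[dot(row, v) for row in M] for v in V])
    # Rayleigh quotient gives the eigenvalue sigma^2 for each converged direction.
    sigmas = [math.sqrt(max(dot(v, [dot(row, v) for row in M]), 0.0)) for v in V]
    return {t: [V[c][r] * sigmas[c] for c in range(k)] for r, t in enumerate(terms)}


raw = {t: [doc.count(t) for doc in DOCS] for t in TERMS}
vecs = lsa_term_vectors(DOCS, TERMS)
print(cosine(raw["journey"], raw["tour"]))                   # 0.0: never in the same document
print(round(cosine(vecs["journey"], vecs["tour"]), 3))       # close to 1.0 in the latent space
print(round(cosine(vecs["journey"], vecs["committee"]), 3))  # close to 0.0
```

Here journey and tour never share a document, so their raw count vectors are orthogonal, yet both co-occur with travel and end up with a cosine of about 1.0 in the latent space. The same toy also places committee and chairman at about 1.0, which mirrors the similarity-versus-relatedness problem listed among the remaining problems. Real LSA setups work on large corpora, usually weight the counts (e.g. log-entropy) and use a sparse SVD solver instead of this simple iteration.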
Experiment
Three questions:
How similar must two words be to be considered similar?
Which word similarity measure(s) are more appropriate to use?
How much performance gain can an MT evaluation metric obtain by incorporating word similarity measures?
Setting
Data: MetricsMATR08 development data; 1992 MT outputs, 8 MT systems, 4 references
Evaluation metric: unigram matching
Exact match / synonym / semantically similar, all with the same weight
Three variants: precision (p), recall (r) and F-measure (f)
where c: MT output, t: reference translation
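The unigram-matching metric can be sketched as below, under assumptions the slide does not spell out: matching is greedy and one-to-one, the three match types are tried in stages (exact, then synonym, then semantically similar above a threshold), every match counts 1, and with m matched unigrams the variants are p = m/|c|, r = m/|t| and the balanced F-measure f = 2pr/(p + r). The synonym list, similarity table, threshold and example sentences are hypothetical.

```python
# Hypothetical stand-ins: a toy synonym list (playing the role of WordNet synsets)
# and a toy word-similarity table (playing the role of a measure such as wup or LSA).
SYNONYMS = {frozenset({"journey", "trip"})}
SIMILARITY = {
    frozenset({"mom", "mother"}): 0.9,
    frozenset({"impact", "influence"}): 0.8,
}


def is_synonym(a, b):
    return frozenset({a, b}) in SYNONYMS


def similarity(a, b):
    return SIMILARITY.get(frozenset({a, b}), 0.0)


def count_matches(c, t, threshold=0.5):
    """Greedy one-to-one unigram matching between MT output c and reference t.

    Stages run in order (exact, synonym, semantically similar); every match
    counts 1 regardless of the stage that produced it.
    """
    stages = [
        lambda a, b: a == b,
        is_synonym,
        lambda a, b: similarity(a, b) >= threshold,
    ]
    remaining_c, remaining_t, m = list(c), list(t), 0
    for matches in stages:
        unmatched = []
        for w in remaining_c:
            for i, r in enumerate(remaining_t):
                if matches(w, r):
                    del remaining_t[i]  # each reference word is used at most once
                    m += 1
                    break
            else:
                unmatched.append(w)
        remaining_c = unmatched
    return m


def prf(c, t, threshold=0.5):
    """Unigram precision, recall and balanced F-measure (F1 is an assumption)."""
    m = count_matches(c, t, threshold)
    p = m / len(c) if c else 0.0
    r = m / len(t) if t else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


c = "my mom had a big impact on the trip".split()
t = "my mother had a great influence on the journey".split()
print(tuple(round(x, 3) for x in prf(c, t)))                 # (0.889, 0.889, 0.889)
print(tuple(round(x, 3) for x in prf(c, t, threshold=1.0)))  # (0.667, 0.667, 0.667)
```

Raising the threshold to 1.0 in this toy disables the similarity stage (no table entry reaches it), which drops the score from 0.889 to 0.667 and shows how much the semantically-similar stage can contribute.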
Result (1)
Correlation thresholds of each measure
Result (2)
Correlation of the metric
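Both results rest on correlating metric scores with human judgments. A minimal sketch, assuming Pearson's r (Spearman's rank correlation is an equally common choice, and the slides do not say which was used), shows one plausible reading of a per-measure threshold choice: keep the similarity threshold whose metric scores correlate best with the human scores. All numbers are invented to exercise the code and say nothing about the MetricsMATR08 results.

```python
import math


def pearson(xs, ys):
    """Pearson's r between metric scores and human judgments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented numbers for illustration only: human scores for four outputs, and the
# metric's scores for the same outputs under three hypothetical similarity thresholds.
human = [2.0, 3.0, 4.0, 5.0]
metric_by_threshold = {
    0.3: [0.50, 0.40, 0.60, 0.55],
    0.5: [0.30, 0.40, 0.50, 0.65],
    0.7: [0.30, 0.35, 0.30, 0.45],
}

correlations = {th: pearson(s, human) for th, s in metric_by_threshold.items()}
best = max(correlations, key=correlations.get)
for th, r in sorted(correlations.items()):
    print(th, round(r, 3))
print("best threshold:", best)  # best threshold: 0.5
```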
Conclusion
Semantically similar words are important in automatic MT evaluation.
Two word similarity measures, wup and LSA, perform relatively better.
Remaining problems
Semantic similarity vs. semantic relatedness, e.g. [committee vs chairman] (LSA)
Most WordNet similarity measures work on nouns and verbs only
Thank you