Scientific report: "Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT"

Necip Fazil Ayan and Bonnie J. Dorr
Institute of Advanced Computer Studies (UMIACS)
University of Maryland, College Park, MD 20742
nfa bonnie @

Abstract

This paper presents an extensive evaluation of five different alignments and investigates their impact on the corresponding MT system output. We introduce new measures for intrinsic evaluations and examine the distribution of phrases and untranslated words during decoding to identify which characteristics of different alignments affect translation. We show that precision-oriented alignments yield better MT output (translating more words and using longer phrases) than recall-oriented alignments.

1 Introduction

Word alignments are a by-product of statistical machine translation (MT) and play a crucial role in MT performance. In recent years, researchers have proposed several algorithms to generate word alignments. However, evaluating word alignments is difficult because even humans have difficulty performing this task.

The state-of-the-art evaluation metric, alignment error rate (AER), attempts to balance the precision and recall scores at the level of alignment links (Och and Ney, 2000). Other metrics assess the impact of alignments externally, i.e., different alignments are tested by comparing the corresponding MT outputs using automated evaluation metrics (e.g., BLEU (Papineni et al., 2002) or METEOR (Banerjee and Lavie, 2005)).
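As an illustration of the link-level scores mentioned above, the following is a minimal sketch that assumes the standard AER definition from Och and Ney (2000): given a hypothesized alignment A, sure gold links S, and probable gold links P (with S treated as a subset of P), precision is |A ∩ P| / |A|, recall is |A ∩ S| / |S|, and AER is 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The function name `alignment_scores` and the toy links are hypothetical and do not come from the paper.

```python
def alignment_scores(hypothesis, sure, probable=None):
    """Return (precision, recall, aer) for alignments given as
    collections of (source_index, target_index) links.

    Assumes the Och and Ney (2000) definition; this helper is an
    illustrative sketch, not code from the paper.
    """
    a = set(hypothesis)
    s = set(sure)
    # Every sure link also counts as a probable link.
    p = set(probable or ()) | s
    if not a or not s:
        raise ValueError("hypothesis and sure links must be non-empty")

    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    aer = 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, aer


# Toy example: two sure links, one extra probable link,
# and a hypothesis containing one spurious link.
sure_links = {(0, 0), (1, 1)}
probable_links = {(2, 1)}
hypothesis_links = {(0, 0), (2, 1), (2, 2)}

precision, recall, aer = alignment_scores(
    hypothesis_links, sure_links, probable_links
)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

Because AER folds precision and recall into a single number, two alignments with very different precision/recall trade-offs can receive similar AER values, which is why the sketch returns the two component scores alongside it.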
However, such extrinsic evaluations showed that AER and BLEU do not correlate well (Callison-Burch et al., 2004; Goutte et al., 2004; Ittycheriah and Roukos, 2005). Despite significant AER improvements achieved by several researchers, the improvements in BLEU scores are insignificant or, at best, small.

This paper demonstrates the difficulty in assessing whether alignment quality makes a difference in MT performance. We describe the impact of certain alignment characteristics on MT performance but also identify several alignment-related factors