Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
{papineni, roukos, toddward, weijing}@

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318.

Abstract

Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that cannot be reused. We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. We present this method as an automated understudy to skilled human judges which substitutes for them when there is need for quick or frequent evaluations.[1]

[1] So we call our method the bilingual evaluation understudy, Bleu.

1 Introduction

Rationale: Human evaluations of machine translation (MT) weigh many aspects of translation, including adequacy, fidelity, and fluency of the translation (Hovy, 1999; White and O'Connell, 1994). A comprehensive catalog of MT evaluation techniques and their rich literature is given by Reeder (2001). For the most part, these various human evaluation approaches are quite expensive (Hovy, 1999). Moreover, they can take weeks or months to finish. This is a big problem because developers of machine translation systems need to monitor the effect of daily changes to their systems in order to weed out bad ideas from good ideas. We believe that MT progress stems from evaluation and that there is a logjam of fruitful research ideas waiting to be released from the evaluation bottleneck. Developers would benefit from an inexpensive automatic evaluation that is quick, language-independent, and correlates highly with human evaluation. We propose such an evaluation method in this paper.

Viewpoint: How does one measure translation performance?
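
As an illustration of how cheap such an automatic evaluation run can be, the short Python sketch below scores one candidate translation against one reference using corpus_bleu from NLTK, an independent off-the-shelf implementation of the Bleu metric rather than the authors' own code; the example sentences are invented purely for illustration.

# Minimal sketch of an automatic evaluation run, assuming NLTK is installed.
# The sentences are hypothetical illustrations, not data from the paper.
from nltk.translate.bleu_score import corpus_bleu

# Per segment: a list of human reference translations (token lists) and one
# machine-produced candidate translation (token list).
references = [
    [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]],
]
candidates = [
    ["the", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog"],
]

# The default weights give the standard 4-gram Bleu score.
score = corpus_bleu(references, candidates)
print(f"Bleu = {score:.4f}")

A candidate identical to its reference would score 1.0; the deliberately imperfect candidate above scores somewhere between 0 and 1. The call completes in milliseconds, which is what makes it practical to rerun after every daily system change.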