Scientific report: "A machine learning approach to the automatic evaluation of machine translation"

A machine learning approach to the automatic evaluation of machine translation

Simon Corston-Oliver, Michael Gamon and Chris Brockett
Microsoft Research
One Microsoft Way
Redmond, WA 98052, USA
{simonco, mgamon, chrisbkt}@

Abstract

We present a machine learning approach to evaluating the well-formedness of the output of a machine translation system, using classifiers that learn to distinguish human reference translations from machine translations. This approach can be used to evaluate an MT system, tracking improvements over time; to aid in the kind of failure analysis that can help guide system development; and to select among alternative output strings. The method presented is fully automated and independent of source language, target language and domain.

1 Introduction

Human evaluation of machine translation (MT) output is an expensive process, often prohibitively so when evaluations must be performed quickly and frequently in order to measure progress. This paper describes an approach to automated evaluation designed to facilitate the identification of areas for investigation and improvement. It focuses on evaluating the well-formedness of output and does not address issues of evaluating content transfer.

Researchers are now applying automated evaluation in MT and natural language generation tasks, both as system-internal goodness metrics and for the assessment of output. Langkilde and Knight (1998), for example, employ n-gram metrics to select among candidate outputs in natural language generation, while Ringger et al. (2001) use n-gram perplexity to compare the output of MT systems. Su et al. (1992), Alshawi et al. (1998) and Bangalore et al. (2000) employ string edit distance between reference and output sentences to gauge output quality for MT and generation. To be useful to researchers, however, assessment must provide linguistic information that can guide them in identifying areas where work is required. See Nyberg et al. (1994) for useful discussion of this issue. The better the MT …
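The core idea in the abstract, training a classifier to separate human reference translations from MT output and using its decisions as a well-formedness signal, can be illustrated with a minimal sketch. The code below is not the authors' system: it substitutes generic character n-gram features and scikit-learn's logistic regression for the features and learner described in the paper, and the example sentences and labels are invented.

```python
# Minimal sketch (not the authors' system): train a binary classifier to
# distinguish human reference translations from MT output, then use its
# probability estimates as a rough well-formedness signal.
# Assumes scikit-learn; character n-grams stand in for richer features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, invented training data: label 1 = human reference, 0 = MT output.
sentences = [
    "The committee approved the proposal after a short debate.",   # human
    "Committee has approved proposal after short debate was.",     # MT
    "She delivered the report to the board on Tuesday.",           # human
    "She has delivered report to board on the Tuesday it.",        # MT
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(sentences, labels)

# Score new system output: a higher probability of class 1 suggests output
# that looks more like a human reference translation.
candidates = ["The board received the report on Tuesday."]
print(model.predict_proba(candidates)[:, 1])
```

In practice such a classifier would be trained on many thousands of sentences, and its scores could also be used to select among alternative output strings, as the abstract suggests.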

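The n-gram perplexity comparison attributed to Ringger et al. (2001) can also be sketched in a few lines. The following is a toy illustration under simplifying assumptions (a word bigram model with add-one smoothing, trained on a tiny invented corpus), not the configuration used in any of the cited systems.

```python
# Toy sketch of n-gram perplexity as an output-quality signal: a bigram
# model with add-one smoothing is estimated from reference text, and lower
# perplexity on a candidate sentence is taken to indicate more fluent output.
# All data here is invented; real comparisons use large corpora and
# higher-order models with proper smoothing.
import math
from collections import Counter

def train_bigram(corpus):
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def perplexity(sentence, unigrams, bigrams):
    vocab = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothed conditional probability P(word | prev).
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

reference_corpus = [
    "the committee approved the proposal",
    "the board approved the report",
]
unigrams, bigrams = train_bigram(reference_corpus)
print(perplexity("the committee approved the report", unigrams, bigrams))
print(perplexity("committee the approved report the", unigrams, bigrams))
```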
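Finally, the string edit distance measure attributed to Su et al. (1992), Alshawi et al. (1998) and Bangalore et al. (2000) compares system output against a reference translation. A minimal word-level Levenshtein distance, sketched below, conveys the idea; the cited systems differ in details such as tokenization and normalization, which are not reproduced here, and the example sentences are invented.

```python
# Minimal word-level Levenshtein (edit) distance between an MT output and a
# reference translation: counts insertions, deletions and substitutions.
# A lower distance is taken as a rough proxy for closer, better output.
def edit_distance(hypothesis, reference):
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edits needed to turn hyp[:i] into ref[:j]
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dp[i][0] = i
    for j in range(len(ref) + 1):
        dp[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(hyp)][len(ref)]

reference = "the committee approved the proposal after a short debate"
output = "committee has approved proposal after short debate"
print(edit_distance(output, reference))
```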