Scientific report: "Minimum Bayes Risk Decoding for BLEU"

Nicola Ehling and Richard Zens and Hermann Ney
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, D-52056 Aachen, Germany
ehling zens ney @

Abstract

We present a Minimum Bayes Risk (MBR) decoder for statistical machine translation. The approach aims to minimize the expected loss of translation errors with regard to the BLEU score. We show that MBR decoding on N-best lists leads to an improvement of translation quality. We report the performance of the MBR decoder on four different tasks: the TC-STAR EPPS Spanish-English task 2006, the NIST Chinese-English task 2005, and the GALE Arabic-English and Chinese-English tasks 2006. The absolute improvement of the BLEU score is between […] for the TC-STAR task and […] for the GALE Chinese-English task.

1 Introduction

In recent years, statistical machine translation (SMT) systems have achieved substantial progress regarding their performance in international translation tasks (TC-STAR, NIST, GALE). Statistical approaches to machine translation were proposed at the beginning of the nineties and have found widespread use in recent years. The standard version of the Bayes decision rule, which aims at a minimization of the sentence error rate, is used in virtually all approaches to statistical machine translation.
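The contrast between the standard decision rule and MBR decoding on an N-best list can be illustrated with a small sketch. It is not the authors' implementation: the function names (`sentence_bleu`, `map_decode`, `mbr_decode`), the `scale` parameter, the add-one smoothing of the sentence-level BLEU, and the toy N-best list with its scores are all assumptions made for illustration. The standard rule returns the hypothesis with the highest model score, while the MBR rule returns the hypothesis with the highest expected BLEU against all other hypotheses, weighted by their posterior probabilities.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Add-one smoothed sentence-level BLEU in (0, 1]; illustrative assumption,
    not necessarily the smoothing used in the paper."""
    if not hyp or not ref:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        matches = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        # Add-one smoothing keeps the geometric mean finite for short sentences.
        log_prec += math.log((matches + 1.0) / (total + 1.0))
    log_brevity = min(0.0, 1.0 - len(ref) / len(hyp))  # log of brevity penalty
    return math.exp(log_prec / max_n + log_brevity)


def map_decode(nbest):
    """Standard rule: return the hypothesis with the highest model score."""
    return max(nbest, key=lambda item: item[1])[0]


def mbr_decode(nbest, scale=1.0):
    """MBR rule: return the hypothesis with the highest expected BLEU, where
    the expectation is taken over posteriors estimated on the N-best list."""
    # Turn (scaled) log scores into a normalized posterior over the list.
    top = max(score for _, score in nbest)
    weights = [math.exp(scale * (score - top)) for _, score in nbest]
    norm = sum(weights)
    posteriors = [w / norm for w in weights]

    best_hyp, best_gain = None, -1.0
    for hyp, _ in nbest:
        # Every list entry acts in turn as a pseudo-reference for hyp.
        gain = sum(p * sentence_bleu(hyp, other)
                   for (other, _), p in zip(nbest, posteriors))
        if gain > best_gain:
            best_hyp, best_gain = hyp, gain
    return best_hyp


# Hypothetical 3-best list: (tokens, log model score).
nbest = [
    ("a tiny home".split(), math.log(0.36)),
    ("the house is small".split(), math.log(0.34)),
    ("the house is little".split(), math.log(0.30)),
]

print("standard rule:", " ".join(map_decode(nbest)))
print("MBR rule:     ", " ".join(mbr_decode(nbest)))
```

On this toy list the two rules disagree: the top-scoring hypothesis shares no n-grams with the other two, so its expected BLEU is lower than that of a hypothesis that overlaps with most of the probability mass. The loop compares every pair of hypotheses, so the cost grows quadratically with the size of the N-best list.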
However, most translation systems are judged by their ability to minimize the error rate on the word level or n-gram level. Common error measures are the Word Error Rate (WER) and the Position Independent Word Error Rate (PER), as well as evaluation metrics on the n-gram level such as the BLEU and NIST scores, which measure precision and fluency of a given translation hypothesis.

The remaining part of this paper is structured as follows: after a short overview of related work in Sec. 2, we describe the MBR decoder in Sec. 3. We present the experimental results in Sec. 4 and conclude in Sec. 5.

2 Related Work

MBR .