Scientific report: "CDER: Efficient MT Evaluation Using Block Movements"

Gregor Leusch, Nicola Ueffing and Hermann Ney
Lehrstuhl für Informatik VI, Computer Science Department
RWTH Aachen University, D-52056 Aachen, Germany
{leusch,ueffing,ney}@i6.informatik.rwth-aachen.de

Abstract

Most state-of-the-art evaluation measures for machine translation assign high costs to movements of word blocks. In many cases, though, such movements still result in correct or almost correct sentences. In this paper, we present a new evaluation measure which explicitly models block reordering as an edit operation. Our measure can be calculated exactly in quadratic time. Furthermore, we show how some evaluation measures can be improved by the introduction of word-dependent substitution costs. The correlation of the new measure with human judgment has been investigated systematically on two different language pairs. The experimental results show that it significantly outperforms state-of-the-art approaches in sentence-level correlation. Results from experiments with word-dependent substitution costs demonstrate an additional increase in correlation between automatic evaluation measures and human judgment.

1 Introduction

Research in machine translation (MT) depends heavily on the evaluation of its results. Especially for the development of an MT system, an evaluation measure is needed which reliably assesses the quality of MT output.
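The two technical ideas in the abstract, block reordering as an edit operation that still admits an exact quadratic-time computation, and word-dependent substitution costs, can be illustrated with a small dynamic program. The sketch below is hypothetical and is not the authors' CDER recursion, whose details are not part of this excerpt. It assumes a Levenshtein-style table over reference positions i and hypothesis positions j, extended with a "long jump" of assumed unit cost that moves to any hypothesis position within the same row. Each reference word is consumed exactly once (one row per word), while jumps may revisit or skip hypothesis positions, so each row is filled in O(J) time and the whole table in O(I·J). The names `block_edit_distance`, `char_sub_cost` and `jump_cost`, and the choice of a normalized character-level edit distance as the word-dependent cost, are illustrative assumptions.

```python
from typing import Callable, List, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Standard edit distance (unit insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j - 1] + (a[i - 1] != b[j - 1]),  # match / substitution
                         prev[j] + 1,                           # deletion
                         cur[j - 1] + 1)                        # insertion
        prev = cur
    return prev[-1]


def binary_sub_cost(r: str, h: str) -> float:
    """Word-independent cost: 0 for identical words, 1 otherwise."""
    return 0.0 if r == h else 1.0


def char_sub_cost(r: str, h: str) -> float:
    """Hypothetical word-dependent cost: character edit distance scaled to [0, 1]."""
    if r == h:
        return 0.0
    return levenshtein(r, h) / max(len(r), len(h))


def block_edit_distance(ref: List[str], hyp: List[str],
                        sub_cost: Callable[[str, str], float] = binary_sub_cost,
                        jump_cost: float = 1.0) -> float:
    """Assumed edit distance with long jumps; O(len(ref) * len(hyp)) time."""
    J = len(hyp)
    # Row 0: no reference word consumed yet. Position j is reached by j
    # insertions or by one long jump from position 0 (the row minimum, 0).
    prev = [min(float(j), jump_cost) for j in range(J + 1)]
    for i in range(1, len(ref) + 1):
        cur = [prev[0] + 1.0] + [0.0] * J  # reference word i deleted at position 0
        for j in range(1, J + 1):
            cur[j] = min(prev[j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),  # match / substitution
                         prev[j] + 1.0,       # reference word i deleted
                         cur[j - 1] + 1.0)    # hypothesis word j inserted
        # Long jump: from the cheapest cell of this row to any hypothesis position.
        best = min(cur)
        prev = [min(v, best + jump_cost) for v in cur]
    return prev[J]


if __name__ == "__main__":
    ref, hyp = "a b c d e f".split(), "d e f a b c".split()
    print(block_edit_distance(ref, hyp))                            # 3.0
    print(block_edit_distance(ref, hyp, jump_cost=float("inf")))    # 6.0 (plain Levenshtein)
    ref2, hyp2 = "the cats sat".split(), "the cat sat".split()
    print(block_edit_distance(ref2, hyp2))                          # 1.0
    print(block_edit_distance(ref2, hyp2, sub_cost=char_sub_cost))  # 0.25
```

In this sketch, swapping the blocks "a b c" and "d e f" costs 3 with jumps, compared with 6 for plain Levenshtein distance, and the near miss "cats"/"cat" costs 0.25 instead of 1 once the substitution cost depends on the words involved. Because jumps can also skip hypothesis words at unit cost, surplus hypothesis material is penalized only weakly here; how the paper handles that case is not shown in this excerpt.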
Such a measure will help analyze the strengths and weaknesses of different translation systems, or of different versions of the same system, by comparing their output at the sentence level. In most applications of MT, understandability for humans, in terms of readability as well as semantic correctness, should be the evaluation criterion. But as human evaluation is tedious and cost-intensive, automatic evaluation measures are used in most MT research tasks. A high correlation between these automatic evaluation measures and human evaluation is thus desirable. State-of-the-art …