Scientific report: "A DP based Search Algorithm for Statistical Machine Translation"

S. Nießen, S. Vogel, H. Ney, and C. Tillmann
Lehrstuhl für Informatik VI, RWTH Aachen - University of Technology, D-52056 Aachen, Germany
Email: niessen@informatik.rwth-aachen.de

Abstract

We introduce a novel search algorithm for statistical machine translation based on dynamic programming (DP). During the search process two statistical knowledge sources are combined: a translation model and a bigram language model. This search algorithm expands hypotheses along the positions of the target string while guaranteeing progressive coverage of the words in the source string. We present experimental results on the Verbmobil task.

1 Introduction

In this paper we address the problem of finding the most probable target language representation of a given source language string. In our approach we use a DP based search algorithm which sequentially visits the target string positions while progressively considering the source string words.

The organization of the paper is as follows. After reviewing the statistical approach to machine translation, we first describe the statistical knowledge sources used during the search process. We then present our DP based search algorithm in detail. Finally, experimental results for a bilingual corpus are reported.
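To make the search problem concrete, the toy sketch below scores a few candidate target strings by combining a bigram language model with a translation model and returns the most probable one. It is only an illustration under stated assumptions: every probability, the example words, the uniform-alignment translation model, the `FLOOR` constant, and the function names are hypothetical, and the candidates are enumerated exhaustively rather than explored with the DP search that the paper develops.

```python
import math

# Hypothetical toy models: all probabilities are invented for illustration.
BIGRAM_LM = {
    ("<s>", "good"): 0.4, ("good", "morning"): 0.5, ("morning", "</s>"): 0.6,
    ("<s>", "morning"): 0.1, ("morning", "good"): 0.05, ("good", "</s>"): 0.1,
}
# p(f | e): probability of a source word f given a target word e.
LEXICON = {
    ("guten", "good"): 0.7, ("morgen", "morning"): 0.6,
    ("guten", "morning"): 0.05, ("morgen", "good"): 0.05,
}
FLOOR = 1e-6  # assumed floor probability for unseen events


def lm_log_prob(target):
    """Bigram language model score log Pr(e), with sentence boundary markers."""
    words = ["<s>"] + target + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(bg, FLOOR)) for bg in zip(words, words[1:]))


def tm_log_prob(source, target):
    """Simplified translation model score log Pr(f | e).

    Assumption: each source word may be generated by any target word,
    with a uniform alignment probability of 1 / len(target).
    """
    total = 0.0
    for f in source:
        p = sum(LEXICON.get((f, e), FLOOR) for e in target) / len(target)
        total += math.log(p)
    return total


def best_translation(source, candidates):
    """Return the candidate e that maximizes log Pr(e) + log Pr(f | e)."""
    return max(candidates, key=lambda e: lm_log_prob(e) + tm_log_prob(source, e))


if __name__ == "__main__":
    source = ["guten", "morgen"]
    candidates = [["good", "morning"], ["morning", "good"], ["good"]]
    print(best_translation(source, candidates))  # ['good', 'morning']
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. Enumerating every candidate is feasible only for a toy list like this one, which is why an efficient search procedure is needed in practice.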
2 Statistical Machine Translation

In statistical machine translation, the goal of the search strategy can be formulated as follows: We are given a source language ("French") string $f_1^J = f_1 \ldots f_J$, which is to be translated into a target language ("English") string $e_1^I = e_1 \ldots e_I$ with the unknown length $I$. Every English string $e_1^I$ is considered as a possible translation for the input string. If we assign a probability $\Pr(e_1^I \mid f_1^J)$ to each pair of strings $(e_1^I, f_1^J)$, then we have to choose the length $I_{opt}$ and the English string $\hat{e}_1^{I_{opt}}$ that maximize $\Pr(e_1^I \mid f_1^J)$ for a given French string $f_1^J$. According to Bayes decision rule, $I_{opt}$ and $\hat{e}_1^{I_{opt}}$ can be found by

$$(I_{opt}, \hat{e}_1^{I_{opt}}) = \operatorname*{argmax}_{I,\, e_1^I} \Pr(e_1^I \mid f_1^J) = \operatorname*{argmax}_{I,\, e_1^I} \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I). \qquad (1)$$

$\Pr(e_1^I)$ is the ...