Scientific report: "A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation"

Chi-Ho Li, Dongdong Zhang, Mu Li, Ming Zhou
Microsoft Research Asia, Beijing, China
chl dozhang@ muli mingzhou@

Minghui Li, Yi Guan
Harbin Institute of Technology, Harbin, China
mhli@ guanyi@

Abstract

Inspired by previous preprocessing approaches to SMT, this paper proposes a novel, probabilistic approach to reordering which combines the merits of syntax and phrase-based SMT. Given a source sentence and its parse tree, our method generates, by tree operations, an n-best list of reordered inputs, which are then fed to a standard phrase-based decoder to produce the optimal translation. Experiments show that, for the NIST MT-05 task of Chinese-to-English translation, the proposal leads to a BLEU improvement of [...].

1 Introduction

The phrase-based approach has been considered the default strategy for Statistical Machine Translation (SMT) in recent years. It is widely known that the phrase-based approach is strong at local lexical choice and at word reordering within a short distance. However, long-distance reordering is problematic in phrase-based SMT. For example, the distance-based reordering model (Koehn et al., 2003) allows a decoder to translate in non-monotonous order, under the constraint that the distance between two phrases translated consecutively does not exceed a limit known as the distortion limit.
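The distortion-limit constraint described above can be illustrated with a minimal sketch. It assumes the common definition of the jump between two consecutively translated phrases as |start of current phrase - end of previous phrase - 1|, measured in source words. The names `Span`, `jump_distance` and `within_distortion_limit` are hypothetical and do not come from the paper or from any decoder's API.

```python
from typing import List, Tuple

# Assumption: a phrase is identified by its inclusive (start, end)
# word positions in the source sentence, counted from 0.
Span = Tuple[int, int]


def jump_distance(prev: Span, curr: Span) -> int:
    """Source words skipped when `curr` is translated right after `prev`.

    Assumed definition: |start(curr) - end(prev) - 1|, which is 0 for
    monotone (left-to-right, adjacent) translation.
    """
    return abs(curr[0] - prev[1] - 1)


def within_distortion_limit(order: List[Span], limit: int) -> bool:
    """Check that no jump in a translation order exceeds `limit`."""
    prev: Span = (-1, -1)  # virtual phrase ending just before word 0
    for span in order:
        if jump_distance(prev, span) > limit:
            return False
        prev = span
    return True
```

For instance, with a limit of 4, translating the span (7, 8) immediately after the span (0, 0) skips six source words, so `within_distortion_limit([(0, 0), (7, 8), (1, 6)], 4)` rejects that order, while a fully monotone order passes even with a limit of 0.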
In theory, the distortion limit can be assigned a very large value so that all possible reorderings are allowed, yet in practice it is observed that too high a distortion limit harms not only efficiency but also translation performance (Koehn et al., 2005). In our own experiment setting, the best distortion limit for Chinese-English translation is 4. However, some ideal translations exhibit reorderings longer than this distortion limit. Consider the sentence pair from the NIST MT-2005 test set shown in Figure 1(a): after translating the word V mend, the decoder ...