Tree-to-String Alignment Template for Statistical Machine Translation

Yang Liu, Qun Liu and Shouxun Lin
Institute of Computing Technology, Chinese Academy of Sciences
Kexueyuan South Road, Haidian District
P.O. Box 2704, Beijing 100080, China
yliu liuqun sxlin @

Abstract

We present a novel translation model based on the tree-to-string alignment template (TAT), which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and of performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source-side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

1 Introduction

Phrase-based translation models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004), which go beyond the original IBM translation models (Brown et al., 1993)¹ by modeling translations of phrases rather than individual words, have been suggested by empirical evaluations to be the state of the art in statistical machine translation.
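To make the translation procedure described in the abstract more concrete, here is a minimal Python sketch of applying tree-to-string templates to a source parse tree. It is a deliberate simplification and not the authors' model: each hypothetical template covers only one tree level (a parent label plus its children's labels), there is at most one template per pattern, and there is no probability model or search. The names `Node`, `translate`, `Templates`, the template table, and the lexicon are illustrative assumptions. The target side of a template mixes literal target words (terminals) with integer slots filled by translating the corresponding child (non-terminals), so one template can insert words, drop source words, and reorder whole subtrees.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Node:
    """A source parse-tree node; leaves carry a source word."""
    label: str                                   # e.g. "NP", or a POS tag such as "NN"
    children: List["Node"] = field(default_factory=list)
    word: str = ""                               # only set on leaves


# Target side of a (hypothetical, one-level) template: a str is a target
# terminal emitted as-is; an int is a slot filled by translating that child.
TargetItem = Union[str, int]
Templates = Dict[Tuple[str, Tuple[str, ...]], List[TargetItem]]


def translate(node: Node, templates: Templates, lexicon: Dict[str, str]) -> List[str]:
    """Recursively transform a source tree into a list of target tokens."""
    if not node.children:
        # Leaf: look up a word translation, copying the source word if unknown.
        return [lexicon.get(node.word, node.word)]
    key = (node.label, tuple(child.label for child in node.children))
    # Assumption: if no template matches, keep the children in source order.
    target_side = templates.get(key, list(range(len(node.children))))
    output: List[str] = []
    for item in target_side:
        if isinstance(item, int):
            output.extend(translate(node.children[item], templates, lexicon))
        else:
            output.append(item)
    return output


# Toy example (hypothetical templates and lexicon): "美国 的 总统".
tree = Node("NP", [
    Node("DNP", [Node("NP", [Node("NR", word="美国")]), Node("DEG", word="的")]),
    Node("NP", [Node("NN", word="总统")]),
])
templates: Templates = {
    ("NP", ("DNP", "NP")): ["the", 1, "of", 0],   # swap subtrees, insert terminals
    ("DNP", ("NP", "DEG")): [0],                  # drop the particle 的
}
lexicon = {"美国": "the United States", "总统": "president"}

result = " ".join(translate(tree, templates, lexicon))
print(result)  # the president of the United States
```

In this toy run, the NP template inserts the target terminals "the" and "of" and swaps the two subtrees (a higher-level reordering), while the DNP template drops 的. A real statistical system would choose among competing templates for the same subtree; that part, and the monotone fallback's role, are simplifications of this sketch only.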
In phrase-based models, phrases are usually strings of adjacent words rather than syntactic constituents. Such models excel at capturing local reordering and at performing translations that are localized to substrings common enough to be observed in the training data. However, a key limitation of phrase-based models is that they fail to model reordering at the phrase level robustly. Typically, phrase reordering is modeled in terms …

¹The mathematical notation we use in this paper is taken from that paper: a source string $f_1^J = f_1, \ldots, f_j, \ldots, f_J$ is to be translated into a target string $e_1^I = e_1, \ldots, e_i, \ldots, e_I$. Here, $I$ is the length of the target string and $J$ is the length of the source string.
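The contrast between phrases as strings of adjacent words and syntactic constituents can be made concrete with a small sketch, assuming a toy source sentence and a hand-written parse (both hypothetical). It enumerates every contiguous span a phrase-based model could in principle treat as a phrase, up to a maximum length, and reports those that no node of the parse covers. The helper names `contiguous_spans` and `constituent_spans` and the nested-tuple tree encoding are assumptions for illustration only.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) over source word positions


def contiguous_spans(n_words: int, max_len: int) -> List[Span]:
    """All strings of adjacent words up to max_len: candidate phrase spans."""
    return [(i, j)
            for i in range(n_words)
            for j in range(i + 1, min(i + max_len, n_words) + 1)]


def constituent_spans(tree: tuple) -> Set[Span]:
    """Spans covered by nodes of a tree written as nested tuples.

    A preterminal is (tag, word); an internal node is (label, child, ...).
    """
    spans: Set[Span] = set()

    def walk(node: tuple, start: int) -> int:
        _label, *kids = node
        if len(kids) == 1 and isinstance(kids[0], str):
            end = start + 1  # a preterminal covers exactly one word
        else:
            end = start
            for kid in kids:
                end = walk(kid, end)
        spans.add((start, end))
        return end

    walk(tree, 0)
    return spans


# Hypothetical toy sentence and parse: 美国 的 总统 (3 words).
tree = ("NP",
        ("DNP", ("NP", ("NR", "美国")), ("DEG", "的")),
        ("NP", ("NN", "总统")))
constituents = constituent_spans(tree)
non_constituents = [s for s in contiguous_spans(3, 3) if s not in constituents]
print(non_constituents)  # [(1, 3)], i.e. "的 总统"
```

Here the span "的 总统" is a string of adjacent words, and thus a possible phrase, yet it corresponds to no node in the parse tree.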
