Scientific report: "A Non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation"

Jun Sun¹,², Min Zhang¹, Chew Lim Tan²
¹ Institute for Infocomm Research
² School of Computing, National University of Singapore
sunjun@ mzhang@ tancl@

Abstract

The tree sequence-based translation model allows a rule to violate syntactic boundaries in order to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of sub-trees. This paper goes further and presents a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of sub-trees and gaps. Compared with the contiguous tree sequence-based model, the proposed model can handle non-contiguous phrases with arbitrarily large gaps by means of non-contiguous tree sequence alignment. A decoding algorithm targeting non-contiguous constituents is also proposed. Experimental results on the NIST MT-05 Chinese-English translation task show that the proposed model outperforms the baseline systems with statistical significance.

1 Introduction

Current research in statistical machine translation (SMT) falls mostly into two camps: phrase-based and syntax-based. Of the two, the phrase-based approach (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004) allows local reordering and contiguous phrase translation.
However, it is hard for phrase-based models to learn global reorderings and to deal with non-contiguous phrases. To address this issue, many syntax-based approaches (Yamada and Knight, 2001; Eisner, 2003; Gildea, 2003; Ding and Palmer, 2005; Quirk et al., 2005; Zhang et al., 2007, 2008a; Bod, 2007; Liu et al., 2006, 2007; Hearne and Way, 2003) integrate more syntactic information to enhance non-contiguous phrase modeling. In general, most of them achieve this goal by introducing syntactic non-terminals as translational equivalent placeholders on both the source and target sides. Nevertheless, the generated rules are strictly .
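
To make the distinction between contiguous and non-contiguous tree sequences concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each sub-tree can be reduced to a root label and the token span it covers. The names `SubTree`, `gaps`, and `is_contiguous`, as well as the English example sentence, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubTree:
    """A sub-tree reduced to its root label and covered token span (assumption)."""
    label: str
    start: int  # index of the first covered token (inclusive)
    end: int    # index one past the last covered token (exclusive)


def gaps(sequence: List[SubTree]) -> List[Tuple[int, int]]:
    """Return the uncovered token spans between consecutive sub-trees."""
    ordered = sorted(sequence, key=lambda t: t.start)
    return [(a.end, b.start)
            for a, b in zip(ordered, ordered[1:])
            if b.start > a.end]


def is_contiguous(sequence: List[SubTree]) -> bool:
    """A tree sequence is contiguous when no gap separates its sub-trees."""
    return not gaps(sequence)


# Hypothetical example: tokens = ["he", "turned", "the", "light", "off"]
# "turned ... off" is a non-contiguous phrase with a two-token gap.
turned_off = [SubTree("VBD", 1, 2), SubTree("RP", 4, 5)]
the_light = [SubTree("DT", 2, 3), SubTree("NN", 3, 4)]

print(gaps(turned_off))           # [(2, 4)]
print(is_contiguous(turned_off))  # False
print(is_contiguous(the_light))   # True
```

Under these assumptions, a contiguous tree sequence is simply the special case with no gaps, while a non-contiguous tree sequence carries one or more gaps of any width between its sub-trees.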