Scientific report: "Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation"

Deyi Xiong
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080
Graduate School of Chinese Academy of Sciences
dyxiong@

Qun Liu and Shouxun Lin
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080
{liuqun, sxlin}@

Abstract

We propose a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict reorderings of neighbor blocks (phrase pairs). The model provides content-dependent, hierarchical phrasal reordering with generalization based on features automatically learned from a real-world bitext. We present an algorithm to extract all reordering events of neighbor blocks from bilingual data. In our experiments on Chinese-to-English translation, this MaxEnt-based reordering model obtains significant improvements in BLEU score on the NIST MT-05 and IWSLT-04 tasks.

1 Introduction

Phrase reordering is of great importance for phrase-based SMT systems and has recently become an active area of research. Compared with word-based SMT systems, phrase-based systems can easily address reorderings of words within phrases. However, at the phrase level, reordering is still a computationally expensive problem, just like reordering at the word level (Knight, 1999).
Many systems use very simple models to reorder phrases. (In this paper, we focus our discussion on phrases that are not necessarily aligned to syntactic constituent boundaries.) One is the distortion model (Och and Ney, 2004; Koehn et al., 2003), which penalizes translations according to their jump distance instead of their content. For example, if N words are skipped, a penalty of N is paid regardless of which words are reordered. This model therefore risks penalizing long-distance jumps, which are common between two languages with very different word orders. Another simple model is the flat reordering model Wu 1996 Zens et .
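
To make the distance-based distortion penalty described above concrete, the following is a minimal sketch, not the implementation of any cited system. It assumes the commonly used formulation in which each phrase pays the absolute distance between its source start position and the position immediately after the previously translated phrase. The function name `distortion_penalty` and the span representation are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: inclusive (start, end) source word positions
# covered by one phrase, listed in the order the phrases are translated.
Span = Tuple[int, int]


def distortion_penalty(spans: List[Span]) -> int:
    """Sum the jump distances over source spans given in target order.

    Assumed formulation: each phrase pays |start - previous_end - 1|,
    i.e. the number of source words skipped (in either direction)
    relative to a monotone continuation.
    """
    penalty = 0
    prev_end = -1  # position just before the first source word
    for start, end in spans:
        penalty += abs(start - prev_end - 1)
        prev_end = end
    return penalty


# Monotone translation: no source words are skipped, so no penalty.
monotone = distortion_penalty([(0, 1), (2, 4), (5, 5)])

# The first two phrases are swapped: every jump is penalized by its
# distance alone, whatever words the phrases contain.
swapped = distortion_penalty([(2, 4), (0, 1), (5, 5)])
```

Because the function sees only positions, two reorderings with the same jump distances receive the same penalty even if one is linguistically plausible and the other is not. This is the content independence that the paragraph above criticizes.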