tailieunhanh - Scientific report: "Hypothesis Mixture Decoding for Statistical Machine Translation"
Hypothesis Mixture Decoding for Statistical Machine Translation

Nan Duan
School of Computer Science and Technology, Tianjin University, Tianjin, China
v-naduan@

Mu Li and Ming Zhou
Natural Language Computing Group, Microsoft Research Asia, Beijing, China
muli mingzhou @

Abstract

This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model-independent features is used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.

1 Introduction

Besides tremendous efforts on constructing more complicated and accurate models for statistical machine translation (SMT) (Och and Ney, 2004; Chiang, 2005; Galley et al., 2006; Shen et al., 2008; Chiang, 2010), many researchers have also concentrated on approaches that improve translation quality using information between hypotheses from one or more SMT systems. System combination is built on top of the N-best outputs generated by multiple component systems (Rosti et al., 2007; He et al., 2008; Li et al., 2009b); it aligns multiple hypotheses to build confusion networks as new search spaces and outputs the highest scoring paths as the final translations. Consensus decoding, on the other hand, can be based on either single or multiple systems: single system based methods