Scientific report: "An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation"

NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

Tong Xiao, Jingbo Zhu, Hao Zhang and Qiang Li
Natural Language Processing Lab, Northeastern University
Key Laboratory of Medical Image Computing, Ministry of Education
xiaotong zhujingbo @ zhanghao1216 liqiangneu @

Abstract

We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing, and forest-based decoding. Moreover, several useful utilities are distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.

1 Introduction

We present NiuTrans, a new open source machine translation toolkit developed for constructing high-quality machine translation systems. The NiuTrans toolkit supports most statistical machine translation (SMT) paradigms developed over the past decade and allows for training and decoding with several state-of-the-art models, including the phrase-based model (Koehn et al., 2003), the hierarchical phrase-based model (Chiang, 2007), and various syntax-based models (Galley et al., 2004; Liu et al., 2006). In particular, a unified framework was adopted to decode with different models and ease the implementation of decoding algorithms. Moreover, some useful utilities are distributed with the toolkit, such as a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training that allows for various evaluation metrics for tuning the system. In addition, the ...