Scientific report: "A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation"

Deyi Xiong, Min Zhang, Aiti Aw and Haizhou Li
Human Language Technology, Institute for Infocomm Research
21 Heng Mui Keng Terrace, Singapore 119613
{dyxiong, mzhang, aaiti, hli}@

Abstract

In this paper, we propose a linguistically annotated reordering model for BTG-based statistical machine translation. The model incorporates linguistic knowledge to predict orders for both syntactic and non-syntactic phrases. The linguistic knowledge is automatically learned from source-side parse trees through an annotation algorithm. We empirically demonstrate that the proposed model leads to a significant improvement in the BLEU score over the baseline reordering model on the NIST MT-05 Chinese-to-English translation task.

1 Introduction

In recent years, Bracketing Transduction Grammar (BTG), proposed by Wu (1997), has been widely used in statistical machine translation (SMT). However, the original BTG does not provide an effective mechanism to predict the most appropriate order between two neighboring phrases. To address this problem, Xiong et al. (2006) enhance the BTG with a maximum entropy (MaxEnt) based reordering model which uses boundary words of bilingual phrases as features. Although this model outperforms previous unlexicalized models, it does not utilize any linguistic syntactic features, which have proven useful for phrase reordering (Wang et al., 2007). Zhang et al. (2007) integrate source-side syntactic knowledge into a phrase reordering model based on BTG-style rules. However, one limitation of this method is that it only reorders syntactic phrases: as far as reordering is concerned, linguistic knowledge from parse trees is carried only by syntactic phrases, while non-syntactic phrases are combined monotonically with a flat reordering score.

In this paper, we propose a linguistically annotated reordering model for BTG-based SMT, which is a significant extension to the work
