Scientific report: "Reordering with Source Language Collocations"
Zhanyi Liu1,2, Haifeng Wang2, Hua Wu2, Ting Liu1, Sheng Li1
1Harbin Institute of Technology, Harbin, China
2Baidu Inc., Beijing, China
liuzhanyi wanghaifeng wu_hua @ tliu lisheng @

Abstract

This paper proposes a novel reordering model for statistical machine translation (SMT) by modeling the translation orders of source-language collocations. The model is learned from a word-aligned bilingual corpus in which the collocated words in the source sentences are automatically detected. During decoding, the model is employed to softly constrain the translation orders of the source-language collocations, and thereby the translation orders of the source phrases that contain these collocated words. The experimental results show that the proposed method significantly improves translation quality, achieving absolute improvements in BLEU score over the baseline methods.

1 Introduction

Reordering for SMT was first proposed in the IBM models (Brown et al., 1993), usually called the IBM constraint model, in which the movement of words during translation is modeled. Soon after, Wu (1997) proposed an ITG (Inversion Transduction Grammar) model for SMT, called the ITG constraint model, in which the reordering of words or phrases is constrained to two kinds: straight and inverted. To further improve reordering performance, many structure-based methods have been proposed, including the reordering models in hierarchical phrase-based SMT systems (Chiang, 2005) and syntax-based SMT systems (Zhang et al., 2007; Marton and Resnik, 2008; Ge, 2010; Visweswariah et al., 2010).
Although sentence structure has been taken into consideration, these methods do not explicitly make use of the strong correlations between words, such as collocations, which can effectively indicate reordering in the target language. In this paper, we propose a novel method to improve the reordering for SMT by estimating the reordering score of the source-language collocations
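
To make the idea of learning translation orders for source-language collocations from a word-aligned corpus more concrete, the following is a minimal Python sketch. It is not the authors' model. It assumes each source word is aligned to at most one target position, that a collocated pair counts as "straight" when its two words keep their relative order on the target side and "inverted" otherwise, and that probabilities are plain relative frequencies. The function names `orientation` and `estimate_orientation_probs` and the input layout are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def orientation(i, j, alignment):
    """Classify the translation order of source positions i and j.

    `alignment` maps a source position to a single target position
    (an assumed simplification of real word alignments).
    Returns "straight", "inverted", or None when the order is undefined.
    """
    if i not in alignment or j not in alignment:
        return None  # at least one word is unaligned
    product = (i - j) * (alignment[i] - alignment[j])
    if product > 0:
        return "straight"  # same relative order on both sides
    if product < 0:
        return "inverted"  # relative order is swapped
    return None  # same source word, or both aligned to one target word


def estimate_orientation_probs(examples):
    """Relative-frequency estimate of orientation per collocated word pair.

    Each example is (source_words, alignment, collocations), where
    `collocations` is a list of (i, j) source-position pairs that some
    collocation detector (not shown here) has already found.
    """
    counts = defaultdict(Counter)
    for words, alignment, collocations in examples:
        for i, j in collocations:
            o = orientation(i, j, alignment)
            if o is not None:
                counts[(words[i], words[j])][o] += 1

    probs = {}
    for pair, c in counts.items():
        total = c["straight"] + c["inverted"]
        probs[pair] = {o: c[o] / total for o in ("straight", "inverted")}
    return probs
```

A decoder could then consult such a table as one soft feature among others when it scores a hypothesis, rather than as a hard constraint; how the score is actually integrated is left open in this sketch.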