Scientific report: "Forest-based Tree Sequence to String Translation Model"

Hui Zhang (1, 2), Min Zhang (1), Haizhou Li (1), Aiti Aw (1), Chew Lim Tan (2)
(1) Institute for Infocomm Research
(2) National University of Singapore
zhangh1982@ mzhang hli aaiti @ tancl@

Abstract

This paper proposes a forest-based tree sequence to string translation model for syntax-based statistical machine translation, which automatically learns tree sequence to string translation rules from word-aligned, source-side-parsed bilingual texts. The proposed model leverages the strengths of both tree sequence-based and forest-based translation models. It can therefore not only exploit the forest structure, which compactly encodes an exponential number of parse trees, but also capture non-syntactic translation equivalences with linguistically structured information through tree sequences. This makes our model potentially more robust to parse errors and structure divergence. Experimental results on the NIST MT-2003 Chinese-English translation task show that our method outperforms the four baseline systems with statistical significance.

1 Introduction

Recently, syntax-based statistical machine translation (SMT) methods have achieved very promising results and attracted growing interest in the SMT research community. Fundamentally, syntax-based SMT views translation as a structural transformation process. Structure divergence and parse errors are therefore two of the major issues that may substantially compromise the performance of syntax-based SMT (Zhang et al., 2008a; Mi et al., 2008).

Many solutions have been proposed to address these two issues. Among these advances, forest-based modeling (Mi et al., 2008; Mi and Huang, 2008) and tree sequence-based modeling (Liu et al., 2007; Zhang et al., 2008a) are two interesting methods with promising reported results. Forest-based modeling aims to improve translation accuracy by mining potentially better parses from the n-best parses (i.e., the forest), while tree sequence-based modeling
