tailieunhanh - Scientific report: "Clause Restructuring for Statistical Machine Translation"
Clause Restructuring for Statistical Machine Translation

Michael Collins, MIT CSAIL, mcollins@
Philipp Koehn, School of Informatics, University of Edinburgh, pkoehn@
Ivona Kučerová, MIT Linguistics Department, kucerova@

Abstract

We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing a statistically significant improvement in Bleu score for the system with reordering over a baseline system.

1 Introduction

Recent research on statistical machine translation (SMT) has led to the development of phrase-based systems (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003). These methods go beyond the original IBM machine translation models (Brown et al., 1993) by allowing multi-word units ("phrases") in one language to be translated directly into phrases in another language.
A number of empirical evaluations have suggested that phrase-based systems currently represent the state of the art in statistical machine translation. In spite of their success, a key limitation of phrase-based systems is that they make little or no direct use of syntactic information. It appears likely that syntactic information will be crucial in accurately modeling many phenomena during translation, for example systematic differences between the word order of different languages. For this reason