Scientific report: "Supertagged Phrase-Based Statistical Machine Translation"

Supertagged Phrase-Based Statistical Machine Translation

Hany Hassan, School of Computing, Dublin City University, Dublin 9, Ireland, hhasan@
Khalil Sima'an, Language and Computation, University of Amsterdam, Amsterdam, The Netherlands, simaan@
Andy Way, School of Computing, Dublin City University, Dublin 9, Ireland, away@

Abstract

Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic structure caused system performance to deteriorate. In this work we show that incorporating lexical syntactic descriptions in the form of supertags can yield significantly better PBSMT systems. We describe a novel PBSMT model that integrates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those from Lexicalized Tree-Adjoining Grammar and those from Combinatory Categorial Grammar. Despite the differences between these two approaches, the supertaggers give similar improvements. In addition to supertagging, we also explore the utility of a surface global grammaticality measure based on combinatory operators. We perform various experiments on the Arabic-to-English NIST 2005 test set, addressing issues such as sparseness, scalability and the utility of system subcomponents. Our best result (BLEU) improves relative to a state-of-the-art PBSMT model, which compares very favourably with the leading systems on the NIST 2005 task.

1 Introduction

Within the field of Machine Translation, by far the most dominant paradigm is Phrase-based Statistical Machine Translation (PBSMT) (Koehn et al., 2003; Tillmann & Xia, 2003). However, unlike in rule- and example-based MT, it has proven difficult to date to incorporate linguistic syntactic knowledge in order to improve translation quality. Only quite recently have Chiang (2005) and Marcu et al. (2006) shown that incorporating some form of syntactic structure can yield improvements over a baseline PBSMT system. While .
