Scientific report: "Soft Syntactic Constraints for Hierarchical Phrased-Based Translation"

Yuval Marton and Philip Resnik
Department of Linguistics and the Laboratory for Computational Linguistics and Information Processing (CLIP) at the Institute for Advanced Computer Studies (UMIACS)
University of Maryland, College Park, MD 20742-7505, USA
ymarton resnik @t

Abstract

In adding syntax to statistical MT, there is a tradeoff between taking advantage of linguistic analysis and allowing the model to exploit linguistically unmotivated mappings learned from parallel training data. A number of previous efforts have tackled this tradeoff by starting with a commitment to linguistically motivated analyses and then finding appropriate ways to soften that commitment. We present an approach that explores the tradeoff from the other direction, starting with a context-free translation model learned directly from aligned parallel text, and then adding soft constituent-level constraints based on parses of the source language. We obtain substantial improvements in performance for translation from Chinese and Arabic to English.

1 Introduction

The statistical revolution in machine translation, beginning with Brown et al. (1993) in the early 1990s, replaced an earlier era of detailed language analysis with automatic learning of shallow source-target mappings from large parallel corpora. Over the last several years, however, the pendulum has begun to swing back in the other direction, with researchers exploring a variety of statistical models that take advantage of source- and particularly target-language syntactic analysis (e.g. Cowan et al., 2006; Zollmann and Venugopal, 2006; Marcu et al., 2006; Galley et al., 2006; and numerous others). Chiang (2005) distinguishes statistical MT approaches that are syntactic in a formal sense, going beyond the finite-state underpinnings of phrase-based models, from approaches that are syntactic in a linguistic sense, i.e. taking advantage of a priori language knowledge in the form