Scientific report: "Dependency-Based Statistical Machine Translation"
Heidi J. Fox
Brown Laboratory for Linguistic Information Processing
Brown University, Box 1910, Providence, RI 02912
hjf@

Abstract

We present a Czech-English statistical machine translation system which performs tree-to-tree translation of dependency structures. The only bilingual resource required is a sentence-aligned parallel corpus. All other resources are monolingual. We also refer to an evaluation method and plan to compare our system's output with a benchmark system.

1 Introduction

The goal of statistical machine translation (SMT) is to develop mathematical models of the translation process whose parameters can be automatically estimated from a parallel corpus. Given a string of foreign words F, we seek to find the English string E which is a correct translation of the foreign string.

The first work on SMT, done at IBM (Brown et al., 1990; Brown et al., 1992; Brown et al., 1993; Berger et al., 1994), used a noisy-channel model, resulting in what Brown et al. (1993) call "the Fundamental Equation of Machine Translation":

$$\hat{E} = \operatorname*{argmax}_{E} P(E)\, P(F \mid E) \qquad (1)$$

In this equation we see that the translation problem is factored into two subproblems. $P(E)$ is the language model and $P(F \mid E)$ is the translation model. The work described here focuses on developing improvements to the translation model.

While the IBM work was groundbreaking, it was also deficient in several ways. Their model translates words in isolation, and the component which accounts for word order differences between languages is based on linear position in the sentence. Conspicuously absent is all but the most elementary use of syntactic information.
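To make the factorization in Equation 1 concrete, the following is a minimal sketch of noisy-channel selection over a small, fixed candidate list. It is not the decoder used in this paper or in the IBM work. The function name `noisy_channel_decode`, the lookup tables `lm_prob` and `tm_prob`, and all probability values are hypothetical and chosen only for illustration.

```python
import math

def noisy_channel_decode(foreign, candidates, lm_prob, tm_prob):
    """Return the candidate E that maximizes P(E) * P(F | E).

    Scores are summed in log space to avoid underflow. Candidates with
    zero probability under either model are skipped. Returns None if no
    candidate has a nonzero score.
    """
    best, best_score = None, -math.inf
    for e in candidates:
        p_e = lm_prob.get(e, 0.0)                     # language model P(E)
        p_f_given_e = tm_prob.get((foreign, e), 0.0)  # translation model P(F | E)
        if p_e <= 0.0 or p_f_given_e <= 0.0:
            continue
        score = math.log(p_e) + math.log(p_f_given_e)
        if score > best_score:
            best, best_score = e, score
    return best

# Hypothetical toy probabilities (assumptions, not estimated from any corpus).
lm_prob = {
    "the dog sleeps": 0.02,
    "dog the sleeps": 0.0001,
    "the dog sleep": 0.001,
}
tm_prob = {
    ("pes spí", "the dog sleeps"): 0.30,
    ("pes spí", "dog the sleeps"): 0.30,
    ("pes spí", "the dog sleep"): 0.35,
}

best = noisy_channel_decode("pes spí", list(lm_prob), lm_prob, tm_prob)
print(best)
```

In this toy setup the translation model alone would prefer "the dog sleep", while the language model penalizes the ungrammatical candidates, so the product selects "the dog sleeps". A real system searches a far larger space of English strings than an enumerated list.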
Several researchers have subsequently formulated models which incorporate the intuition that syntactically close constituents tend to stay close across languages. Below are descriptions of some of these different methods of integrating syntax.

Stochastic Inversion Transduction Grammars (Wu and Wong, 1998): This formalism uses a grammar for