tailieunhanh - Scientific report: "Fixed Length Word Suffix for Factored Statistical Machine Translation"
Fixed Length Word Suffix for Factored Statistical Machine Translation

Narges Sharif Razavian
School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
nsharifr@

Stephan Vogel
School of Computer Science, Carnegie Mellon University, Pittsburgh, USA

Abstract

Factored Statistical Machine Translation extends the phrase-based SMT model by allowing each word to be a vector of factors. Experiments have shown the effectiveness of many factors, including part-of-speech tags, in improving the grammaticality of the output. However, high-quality part-of-speech taggers are not available in the open domain for many languages. In this paper we use a fixed-length word suffix as a new factor in factored SMT and achieve significant improvements in three sets of experiments: a large NIST Arabic-to-English system, a medium WMT Spanish-to-English system, and a small TRANSTAC English-to-Iraqi system.

1 Introduction

Statistical Machine Translation (SMT) is currently the state-of-the-art solution to machine translation. Phrase-based SMT is among the top-performing approaches available today. This approach is purely lexical: it uses the surface forms of the words in the parallel corpus to generate translations and estimate probabilities. Syntactic information can be incorporated into this framework in several ways. Source-side syntax-based reordering as a preprocessing step, dependency-based reordering models, and cohesive decoding features are among the many successful attempts to integrate syntax into the translation model.
Factored translation modeling is another way to achieve this goal. These models allow each word to be represented as a vector of factors rather than as a single surface form. Factors give each word richer expressive power. Any factor, such as word stem, gender, part of speech, or tense, can easily be used in this framework. Previous work in factored translation modeling .
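
To make the idea of a word as a vector of factors concrete, the following Python sketch attaches a fixed-length suffix to each token as a second factor. It is only an illustration: the suffix length of 3, the function names, and the pipe-separated surface|suffix notation are assumptions made for this example, since this excerpt does not state the length or the input format the authors used.

```python
def suffix_factor(word: str, n: int = 3) -> str:
    """Return the last n characters of word, or the whole word if it is shorter.

    n = 3 is an assumed default, not a value taken from the paper.
    """
    if n < 1:
        raise ValueError("suffix length n must be at least 1")
    return word[-n:]


def to_factored(sentence: str, n: int = 3, sep: str = "|") -> str:
    """Annotate each whitespace-separated token as surface<sep>suffix.

    The separator is an assumption; tokens that already contain it
    would need escaping before being passed to a real toolkit.
    """
    tokens = sentence.split()
    return " ".join(f"{tok}{sep}{suffix_factor(tok, n)}" for tok in tokens)


if __name__ == "__main__":
    print(to_factored("translation models are improving"))
```

Because the suffix is computed directly from the surface form, this kind of factor needs no tagger or other language-specific tool, which is the motivation given in the abstract for languages that lack high-quality part-of-speech taggers.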