Scientific report: "Chunk-based Statistical Translation"

Taro Watanabe†, Eiichiro Sumita† and Hiroshi G. Okuno‡

† ATR Spoken Language Translation Research Laboratories, 2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288 JAPAN
‡ Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto 606-8501 JAPAN

Abstract

This paper describes an alternative translation model based on text chunks within the framework of statistical machine translation. The model first chunks the input. Then each word in a chunk is translated. Finally, the translated chunks are reordered. Using this translation model, we experimented on a broad-coverage Japanese-English traveling corpus and achieved improved performance.

1 Introduction

The framework of statistical machine translation formulates the problem of translating a source sentence in a language J into a target language E as the maximization of a conditional probability, $\hat{E} = \arg\max_E P(E \mid J)$. Applying Bayes' rule yields $\hat{E} = \arg\max_E P(E) P(J \mid E)$. The former term, $P(E)$, is called the language model and represents the likelihood of E. The latter term, $P(J \mid E)$, is called the translation model and represents the probability of generating J from E. As an implementation of $P(J \mid E)$, word-alignment-based statistical translation (Brown et al., 1993) has been applied successfully to similar language pairs such as French-English and German-English, but not to drastically different ones such as Japanese-English. This failure has been due to the limited representation offered by word alignment and to the weak model structure for handling complicated word correspondences.

This paper presents chunk-based statistical translation as an alternative to word-alignment-based statistical translation. The translation process inside the translation model is structured as follows. A source sentence is first chunked, and then each chunk is translated into the target language.
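
The three steps (chunking, word translation inside each chunk, chunk reordering) and the decision rule $\hat{E} = \arg\max_E P(E) P(J \mid E)$ can be illustrated with a minimal Python sketch. It is not the model proposed in the paper: the particle-based chunker, the lexicon, the bigram language model, and every probability are hypothetical toy values chosen for illustration, and $P(J \mid E)$ is simplified to a product of word translation probabilities.

```python
from itertools import permutations

# All data below is hypothetical toy data, not taken from the paper.
PARTICLES = {"wa", "o"}  # assumption: a chunk is closed after one of these words

# source word -> (target word, or None when no target word is produced; P(j | e))
LEXICON = {
    "watashi": ("I", 0.8),
    "wa": (None, 0.9),
    "koohii": ("coffee", 0.9),
    "o": (None, 0.9),
    "nomimasu": ("drink", 0.7),
}

# Toy bigram language model P(w2 | w1); unseen bigrams get a small floor value.
BIGRAMS = {
    ("<s>", "I"): 0.5,
    ("I", "drink"): 0.4,
    ("drink", "coffee"): 0.3,
    ("coffee", "</s>"): 0.4,
}
FLOOR = 0.001


def chunk(words):
    """Step 1: split source words into chunks, closing a chunk after a particle."""
    chunks, current = [], []
    for w in words:
        current.append(w)
        if w in PARTICLES:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


def translate_chunk(chunk_words):
    """Step 2: translate each word of a chunk; return (target words, probability)."""
    target, prob = [], 1.0
    for w in chunk_words:
        e, p = LEXICON[w]
        prob *= p
        if e is not None:
            target.append(e)
    return target, prob


def language_model(words):
    """Toy P(E): product of bigram probabilities with sentence boundary markers."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for w1, w2 in zip(padded, padded[1:]):
        prob *= BIGRAMS.get((w1, w2), FLOOR)
    return prob


def translate(sentence):
    """Step 3: try every chunk order and keep the one maximizing P(E) * P(J | E)."""
    translated = [translate_chunk(c) for c in chunk(sentence.split())]
    tm = 1.0  # toy P(J | E); identical for every chunk order in this sketch
    for _, p in translated:
        tm *= p
    best_words, best_score = None, -1.0
    for order in permutations(translated):
        words = [w for target, _ in order for w in target]
        score = language_model(words) * tm
        if score > best_score:
            best_words, best_score = words, score
    return " ".join(best_words), best_score


if __name__ == "__main__":
    print(translate("watashi wa koohii o nomimasu")[0])
```

Because the toy translation probability does not depend on chunk order, the language model alone decides the reordering here. Exhaustive enumeration of chunk orders is used only because the example is tiny; it grows factorially with the number of chunks.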
