Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information"

Jinsong Su1,2, Hua Wu3, Haifeng Wang3, Yidong Chen1, Xiaodong Shi1, Huailin Dong1 and Qun Liu2
Xiamen University, Xiamen, China1
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China2
Baidu Inc., Beijing, China3
jssu ydchen mandel hldong @ w _hua wanghaifeng @ liuqun@

Abstract

To adapt a translation model trained on data from one domain to another, previous work has focused on parallel corpora while ignoring in-domain monolingual corpora, which can be obtained more easily. In this paper, we propose a novel approach to translation model adaptation that uses in-domain monolingual topic information instead of in-domain bilingual corpora and incorporates this topic information into translation probability estimation. Our method establishes the relationship between the out-of-domain bilingual corpus and the in-domain monolingual corpora via topic mapping and phrase-topic distribution probability estimation from the in-domain monolingual corpora. Experimental results on the NIST Chinese-English translation task show that our approach significantly outperforms the baseline system.

1 Introduction

In recent years, statistical machine translation (SMT) has developed rapidly, with more and more novel translation models being proposed and put into practice (Koehn et al., 2003; Och and Ney, 2004; Galley et al., 2006; Liu et al., 2006; Chiang, 2007; Chiang, 2010).
However, as in other natural language processing (NLP) tasks, SMT systems often suffer from the domain adaptation problem in practical applications. The reason is simply that the underlying statistical models tend to closely approximate the empirical distributions of the training data, which typically consists of bilingual sentences and monolingual target-language sentences.

(Part of this work was done during the first author's internship at Baidu.)
