Scientific report: "Improving Statistical Machine Translation with Monolingual Collocation"

Zhanyi Liu1, Haifeng Wang2, Hua Wu2, Sheng Li1
1 Harbin Institute of Technology, Harbin, China
2 Inc., Beijing, China
zhanyiliu@ wanghaifeng wu_hua @ lisheng@

Abstract

This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of collocation probabilities, which are estimated from monolingual corpora, in two ways: improving word alignment for various kinds of SMT systems, and improving the phrase table for phrase-based SMT. The experimental results show that our method significantly improves both word alignment and translation quality. Compared to baseline systems, we achieve absolute improvements in BLEU score on a phrase-based SMT system and on a parsing-based SMT system.

1 Introduction

Statistical bilingual word alignment (Brown et al., 1993) is the basis of most SMT systems. Compared to single-word alignments, multi-word alignments are more difficult to identify. Although many methods have been proposed to improve the quality of word alignments (Wu, 1997; Och and Ney, 2000; Marcu and Wong, 2002; Cherry and Lin, 2003; Liu et al., 2005; Huang, 2009), the correlation between the words in multi-word alignments is not fully considered. In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined from the bi-directional word alignments. But as far as we know, few previous studies exploit the collocation relations of the words in a phrase.
Some studies used soft syntactic constraints to predict whether a source phrase can be translated together (Marton and Resnik, 2008; Xiong et al., 2009). However, the constraints were learned from a parsed corpus, which is not available for many languages. In this paper, we propose to use monolingual collocations to improve SMT. We first identify potentially ...

* This work was partially done at Toshiba China Research and Development Center.
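
The excerpt says that collocation probabilities are estimated from monolingual corpora, but it does not show how. As a rough illustration only, the sketch below scores word pairs with sentence-level pointwise mutual information (PMI). PMI is a stand-in chosen for this example and should not be read as the authors' estimator; the function name `collocation_pmi`, the `min_count` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def collocation_pmi(sentences, min_count=1):
    """Score unordered word pairs by sentence-level PMI.

    Illustrative stand-in only: the paper's own collocation model is
    not described in this excerpt.
    """
    word_counts = Counter()   # number of sentences containing each word
    pair_counts = Counter()   # number of sentences containing both words
    n_sentences = 0

    for sentence in sentences:
        tokens = sorted(set(sentence.lower().split()))
        if not tokens:
            continue
        n_sentences += 1
        word_counts.update(tokens)
        # sorted tokens give each pair a canonical (w1, w2) order, w1 < w2
        pair_counts.update(combinations(tokens, 2))

    scores = {}
    for (w1, w2), count in pair_counts.items():
        if count < min_count:
            continue
        p_pair = count / n_sentences
        p_w1 = word_counts[w1] / n_sentences
        p_w2 = word_counts[w2] / n_sentences
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2))
    return scores


if __name__ == "__main__":
    toy_corpus = [
        "strong tea",
        "strong tea",
        "powerful computer",
        "strong computer",
    ]
    ranked = sorted(collocation_pmi(toy_corpus).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Scores of this kind could, in principle, be used as an extra signal when deciding whether neighbouring words belong to the same aligned unit or phrase, which is the role the abstract assigns to collocation probabilities. Sentence-level counts over a toy corpus are far too sparse for real use; a practical estimate would need a large monolingual corpus and some frequency threshold.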
