Scientific report: "Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation"

Seung-Wook Lee, Dongdong Zhang, Mu Li, Ming Zhou, Hae-Chang Rim

Dept. of Computer & Radio Comms. Engineering, Korea University, Seoul, South Korea
swlee rim @.

Microsoft Research Asia, Beijing, China
dozhang muli mingzhou @

Abstract

In this paper, we propose a novel method for reducing the size of the translation model in hierarchical phrase-based machine translation systems. Previous approaches prune infrequent or unreliable entries based on statistics, but this reduces translation coverage. In contrast, the proposed method prunes only ineffective entries, based on an estimate of the information redundancy encoded in phrase pairs and hierarchical rules, and thus preserves the search space of SMT decoders as much as possible. Experimental results on Chinese-to-English machine translation tasks show that our method can cut the size of the translation model almost in half with only a very small degradation in translation performance.

1 Introduction

Statistical Machine Translation (SMT) has gained considerable attention during the last decades. In the SMT framework, all translation knowledge can be acquired automatically from a bilingual corpus. The phrase-based model (Koehn et al., 2003) and the hierarchical phrase-based model (Chiang, 2005; Chiang, 2007) show state-of-the-art performance on various language pairs. This achievement benefits mainly from the huge amount of translation knowledge extracted from sufficiently large parallel corpora.

However, errors in automatic word alignment and non-parallel bilingual sentence pairs sometimes lead to the acquisition of unreliable and unnecessary translation rules. According to Bloodgood and Callison-Burch (2010) and our own preliminary experiments, the sizes of the phrase table and the hierarchical rule table increase linearly with the size of the training data, while the translation performance tends to
