Optimizing Word Alignment Combination For Phrase Table Training
Yonggang Deng and Bowen Zhou
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598, USA
{ydeng, zhou}@us.ibm.com

Abstract
Combining word alignments trained in two translation directions has mostly relied on heuristics that are not directly motivated by the intended applications. We propose a novel method that performs combination as an optimization process. Our algorithm uses greedy search to explicitly maximize the effectiveness function for phrase table training or synchronized grammar extraction. Experimental results show that the proposed method leads to significantly better translation quality than existing methods. Analysis suggests that this simple approach is able to maintain accuracy while maximizing coverage.

1 Introduction
Word alignment is the process of identifying word-to-word links between parallel sentences. It is a fundamental and often necessary step before linguistic knowledge acquisition, such as training a phrase translation table in a phrasal machine translation (MT) system (Koehn et al., 2003) or extracting hierarchical phrase rules or synchronized grammars in a syntax-based translation framework. Most word alignment models distinguish translation direction when deriving the word alignment matrix. Given a parallel sentence pair, word alignments in the two directions are established first, and then they are combined as the knowledge source for phrase training or rule extraction.
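To make the two-directional setup concrete, here is a minimal Python sketch under stated assumptions: an alignment is a set of (source position, target position) links; each directional alignment is stored as a function from one side's positions to the other's, so on its own it can only express one-to-many links; and the two combinations shown are the standard intersection and union heuristics, not the optimization method proposed in this paper. The helper names (`links_from_map`, `max_fertility`, `consistent_phrase_pairs`), the toy 2x2 sentence pair, and the maximum phrase length of two are all hypothetical. Phrase pairs are enumerated with the usual consistency criterion for phrase extraction: at least one link inside the phrase box and no link crossing its boundary.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)
Span = Tuple[int, int]  # inclusive (start, end) positions


def links_from_map(mapping: Dict[int, int], src_to_tgt: bool) -> Set[Link]:
    """Convert a directional alignment stored as a function into a link set.

    src_to_tgt=True:  mapping[source_pos] = target_pos (one link per source word).
    src_to_tgt=False: mapping[target_pos] = source_pos (one link per target word).
    """
    if src_to_tgt:
        return {(s, t) for s, t in mapping.items()}
    return {(s, t) for t, s in mapping.items()}


def max_fertility(links: Set[Link], side: int) -> int:
    """Largest number of links attached to a single word (side 0 = source, 1 = target)."""
    counts = Counter(link[side] for link in links)
    return max(counts.values(), default=0)


def consistent_phrase_pairs(
    links: Set[Link], src_len: int, tgt_len: int, max_len: int = 2
) -> List[Tuple[Span, Span]]:
    """Enumerate phrase pairs (up to max_len words per side) consistent with the links."""
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(src_len, s1 + max_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(tgt_len, t1 + max_len)):
                    # At least one link must fall inside the box.
                    has_inside = any(s1 <= s <= s2 and t1 <= t <= t2 for s, t in links)
                    # No link may join a word inside the box to a word outside it.
                    crosses = any((s1 <= s <= s2) != (t1 <= t <= t2) for s, t in links)
                    if has_inside and not crosses:
                        pairs.append(((s1, s2), (t1, t2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy sentence pair with two source and two target words.
    s2t = links_from_map({0: 0, 1: 1}, src_to_tgt=True)   # each source word -> one target word
    t2s = links_from_map({0: 0, 1: 0}, src_to_tgt=False)  # each target word -> one source word
    inter, uni = s2t & t2s, s2t | t2s
    print(sorted(inter), sorted(uni))
    # Both sides of the union have a word with two links.
    print(max_fertility(uni, 0), max_fertility(uni, 1))
    # Intersection and union yield different phrase inventories.
    print(len(consistent_phrase_pairs(inter, 2, 2)), len(consistent_phrase_pairs(uni, 2, 2)))
```

In this toy pair the union contains a link pattern that neither directional model could produce alone (source word 0 has two links and target word 1 has two links), and the two heuristics yield different phrase inventories: four consistent pairs from the intersection versus one from the union. This is one way to see why the choice of combination directly affects phrase table training.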
Combining the two directional alignments is also called symmetrization, and it is common practice in most state-of-the-art MT systems. Widely used alignment models, such as the IBM Model series (Brown et al., 1993) and the HMM model, all assume one-to-many alignments. Since many-to-many links are commonly observed in natural language, symmetrization is able to make up for this modeling limitation. In addition, combining two directional alignments can in practice lead to improved performance. Symmetrization can also be realized during alignment model training Liang et al. 2006 Zens et