Scientific report: "Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules"
Zhiyang Wang 1, Yajuan Lu 1, Qun Liu 1
1 Key Lab. of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704, Beijing 100190, China
wangzhiyang@

Young-Sook Hwang
HILab Convergence Technology Center, C I Business, SKTelecom, 11 Euljiro2-ga, Jung-gu, Seoul 100-999, Korea
yshwang@

Abstract

This paper presents a novel filtration criterion to restrict rule extraction for the hierarchical phrase-based translation model, in which a bilingual but relaxed well-formed dependency restriction is used to filter out bad rules. Furthermore, a new feature is proposed that describes the regularity with which a source/target dependency edge triggers the target/source word. Experimental results show that the new criterion weeds out about 40% of the rules while improving translation performance, and that the new feature brings a further improvement over the baseline system, especially on larger corpora.

1 Introduction

The hierarchical phrase-based (HPB) model (Chiang, 2005) is the state-of-the-art statistical machine translation (SMT) model. It obtains hierarchical rules by looking for phrases that contain other phrases and replacing the subphrases with nonterminal symbols. Hierarchical rules are more powerful than conventional phrases because they generalize better and can capture long-distance reordering. However, as the training corpus becomes larger, the number of rules grows exponentially, which inevitably results in slow and memory-consuming decoding.
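The subphrase-replacement step described in the Introduction can be illustrated with a minimal sketch. It is not the authors' extractor: a real HPB extractor works on word-aligned spans, whereas this toy version locates subphrases by plain token matching. The function names (`find_sublist`, `substitute`, `make_hierarchical_rule`) and the romanized phrase pair are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# A phrase pair is (source tokens, target tokens).
Phrase = Tuple[List[str], List[str]]


def find_sublist(tokens: List[str], sub: List[str]) -> int:
    """Return the first index at which `sub` occurs in `tokens`, or -1."""
    for i in range(len(tokens) - len(sub) + 1):
        if tokens[i:i + len(sub)] == sub:
            return i
    return -1


def substitute(phrase: Phrase, sub: Phrase, label: str) -> Phrase:
    """Replace one sub-phrase pair with the nonterminal `label` on both sides."""
    src, tgt = phrase
    sub_src, sub_tgt = sub
    i = find_sublist(src, sub_src)
    j = find_sublist(tgt, sub_tgt)
    if i < 0 or j < 0:
        raise ValueError("sub-phrase is not contained in the phrase pair")
    new_src = src[:i] + [label] + src[i + len(sub_src):]
    new_tgt = tgt[:j] + [label] + tgt[j + len(sub_tgt):]
    return new_src, new_tgt


def make_hierarchical_rule(phrase: Phrase, subs: List[Phrase]) -> Phrase:
    """Replace each sub-phrase pair in turn with X1, X2, ..."""
    rule = phrase
    for k, sub in enumerate(subs, start=1):
        rule = substitute(rule, sub, f"X{k}")
    return rule


# Toy phrase pair (illustrative data, not taken from the paper).
phrase = (["yu", "Beihan", "you", "bangjiao"],
          ["have", "diplomatic", "relations", "with", "North", "Korea"])
subs = [(["Beihan"], ["North", "Korea"]),
        (["bangjiao"], ["diplomatic", "relations"])]

src, tgt = make_hierarchical_rule(phrase, subs)
print(" ".join(src), "|||", " ".join(tgt))
```

Because the two nonterminals appear in opposite orders on the source and target sides, the resulting rule encodes a reordering that a flat phrase pair could only memorize word for word.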
In this paper, we address the problem of reducing the hierarchical translation rule table by resorting to bilingual dependency information. We keep only rules in which both sides are relaxed-well-formed (RWF) dependency structures (see the definition in Section 3) and discard those that do not satisfy this constraint. In this way, about 40% of the rules are weeded out as bad from the original rule