Scientific report: "Learning Better Rule Extraction with Translation Span Alignment"

Learning Better Rule Extraction with Translation Span Alignment

Jingbo Zhu, Tong Xiao, Chunliang Zhang
Natural Language Processing Laboratory
Northeastern University, Shenyang, China
zhujingbo xiaotong zhangcl @

Abstract

This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree translation.

1 Introduction

Most syntax-based statistical machine translation (SMT) systems use word alignments and parse trees on the source/target side to learn syntactic transformation rules from parallel data. This approach suffers from a practical problem: even one spurious word alignment link can prevent some desirable syntactic translation rules from being extracted, which in turn can affect the quality of the translation rules and translation performance (May and Knight, 2007; Fossum et al., 2008). To address this challenge, a considerable amount of previous research has sought to improve alignment quality by incorporating statistics, linguistic heuristics, or syntactic information into word alignments (Cherry and Lin, 2006; DeNero and Klein, 2007; May and Knight, 2007; Fossum et al., 2008; Hermjakob, 2009; Liu et al., 2010).
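The effect of a single spurious link can be illustrated with the standard alignment-consistency test used in rule extraction. The sketch below is not taken from the paper; the function names (`target_span`, `is_consistent`) and the toy four-word alignment are assumptions made for illustration. A source span is treated as extractable only if no target word inside its projected target span is linked to a source word outside the span.

```python
from typing import Optional, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def target_span(links: Set[Link], src_lo: int, src_hi: int) -> Optional[Tuple[int, int]]:
    """Smallest target interval covering every word linked to source[src_lo..src_hi]."""
    tgt = [t for s, t in links if src_lo <= s <= src_hi]
    return (min(tgt), max(tgt)) if tgt else None


def is_consistent(links: Set[Link], src_lo: int, src_hi: int) -> bool:
    """True if the source span and its projected target span are linked only to each other."""
    span = target_span(links, src_lo, src_hi)
    if span is None:
        return False  # an unaligned span yields no rule in this simplified sketch
    t_lo, t_hi = span
    # Every link that lands inside the target span must start inside the source span.
    return all(src_lo <= s <= src_hi for s, t in links if t_lo <= t <= t_hi)


# Hypothetical toy alignment: four source words linked one-to-one to four target words.
clean = {(0, 0), (1, 1), (2, 2), (3, 3)}
# The same alignment with one spurious link from source word 0 to target word 3.
noisy = clean | {(0, 3)}

print(is_consistent(clean, 2, 3))
print(is_consistent(noisy, 2, 3))
```

With the clean alignment the source span covering words 2 to 3 passes the test; adding the single link (0, 3) makes the same span fail, so no rule rooted at that span could be extracted.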
Unlike these efforts, this paper presents a simple approach that automatically builds the translation span alignment (TSA) of a sentence pair using a phrase-based forced decoding technique, and then improves syntactic rule extraction by deleting spurious links and adding new valuable links based on bilingual translation span correspondences. The proposed approach has two promising properties.

Figure 1. A real example of ... [legend: word alignment; frontier node]
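One plausible reading of "deleting spurious links based on bilingual translation span correspondences" can be sketched as follows. This is an assumption made for illustration, not the paper's stated procedure: the helper name `prune_links`, the representation of a span alignment as a pair of inclusive index ranges, and the toy data are all hypothetical. A link is dropped when exactly one of its endpoints falls inside an aligned span pair, since such a link crosses the span boundary.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]               # (source word index, target word index)
Span = Tuple[int, int]               # inclusive (low, high) index range
SpanPair = Tuple[Span, Span]         # (source span, target span)


def prune_links(links: Set[Link], span_pairs: List[SpanPair]) -> Set[Link]:
    """Keep only the links that do not cross the boundary of any aligned span pair."""
    kept = set()
    for s, t in links:
        # A link crosses a span pair if it is inside on one side and outside on the other.
        crosses = any(
            (s_lo <= s <= s_hi) != (t_lo <= t <= t_hi)
            for (s_lo, s_hi), (t_lo, t_hi) in span_pairs
        )
        if not crosses:
            kept.add((s, t))
    return kept


# Hypothetical input: a one-to-one alignment plus one spurious link (0, 3),
# and a single span alignment stating that source words 2..3 translate target words 2..3.
noisy = {(0, 0), (1, 1), (2, 2), (3, 3), (0, 3)}
spans = [((2, 3), (2, 3))]

print(sorted(prune_links(noisy, spans)))
```

In this toy case only the link (0, 3) is removed, because its source end lies outside the source span while its target end lies inside the target span. Adding new links, the other half of the approach, is not covered by this sketch.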
