Scientific report: "Machine Translation System Combination by Confusion Forest"

Taro Watanabe and Eiichiro Sumita
National Institute of Information and Communications Technology
3-5 Hikaridai, Keihanna Science City, 619-0289 JAPAN

Abstract

The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses. First, MT outputs are parsed. Second, a context-free grammar is learned by extracting the set of rules that constitute the parse trees. Third, a packed forest is generated, starting from the root symbol of the extracted grammar, through non-terminal rewriting. The new hypothesis is produced by searching for the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield performance comparable to the conventional confusion-network-based method with smaller space.

1 Introduction

System combination techniques take advantage of consensus among multiple systems and have been widely used in fields such as speech recognition (Fiscus, 1997; Mangu et al., 2000) and parsing (Henderson and Brill, 1999). One of the state-of-the-art system combination methods for MT is based on confusion networks, which are compact graph-based structures representing multiple hypotheses (Bangalore et al., 2001). Confusion networks are constructed based on string similarity information.
First, one skeleton, or backbone, sentence is selected. Then, the other hypotheses are aligned against the skeleton, forming a lattice in which each arc represents an alternative word candidate. The alignment method is either model-based (Matusov et al., 2006; He et al., 2008), in which a statistical word aligner is used to compute hypothesis alignment, or
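
The three steps named in the abstract (parse, extract a context-free grammar, rewrite non-terminals from the root) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the hypotheses have already been parsed into bracketed trees, that terminal words never coincide with non-terminal labels, and that a forest node is identified by its symbol alone. The two example trees and the helper names (parse_bracketed, extract_rules, build_forest, yields) are hypothetical, and the depth bound in yields is only a device to keep enumeration finite.

```python
from collections import defaultdict

def parse_bracketed(s):
    """Read a bracketed tree such as "(S (NP I) (VP ran))" into nested tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                          # consume "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                          # consume ")"
            return (label, tuple(children))
        word = tokens[pos]                    # a terminal word
        pos += 1
        return word

    return read()

def extract_rules(tree, rules):
    """Step 2: add every rule lhs -> rhs that occurs in `tree` to `rules`."""
    if isinstance(tree, str):
        return
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[label].add(rhs)
    for child in children:
        extract_rules(child, rules)

def build_forest(rules, root):
    """Step 3: rewrite non-terminals from `root`; return the hyperedges reached.

    Assumption of this sketch: a node is just a symbol, so every rule for a
    symbol is shared by all places where that symbol appears.
    """
    forest, agenda, seen = [], [root], {root}
    while agenda:
        lhs = agenda.pop()
        for rhs in sorted(rules[lhs]):
            forest.append((lhs, rhs))
            for sym in rhs:
                if sym in rules and sym not in seen:
                    seen.add(sym)
                    agenda.append(sym)
    return forest

def yields(rules, sym, depth):
    """Enumerate word sequences derivable from `sym` within `depth` rewrites."""
    if sym not in rules:                      # terminal word
        return {(sym,)}
    if depth == 0:
        return set()
    out = set()
    for rhs in rules[sym]:
        partial = {()}
        for child in rhs:
            child_yields = yields(rules, child, depth - 1)
            partial = {p + c for p in partial for c in child_yields}
        out |= partial
    return out

# Step 1 is assumed done: two hypothetical, already-parsed MT outputs.
hypotheses = [
    "(S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN forest))))",
    "(S (NP (PRP I)) (VP (VBD observed) (NP (DT the) (NN woods))))",
]

rules = defaultdict(set)
for h in hypotheses:
    extract_rules(parse_bracketed(h), rules)

forest = build_forest(rules, "S")
sentences = {" ".join(words) for words in yields(rules, "S", depth=4)}
```

Because rules from different hypotheses share non-terminals, the forest contains derivations that neither input had, such as "I saw the woods". It also licenses implausible ones such as "the forest saw I", which is why the abstract's final step, searching for the best derivation, is needed; this sketch does not attempt any scoring.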
