tailieunhanh - Scientific report: "Binarized Forest to String Translation"
Binarized Forest to String Translation

Hao Zhang, Google Research, haozhang@
Licheng Fang, Computer Science Department, University of Rochester, lfang@
Peng Xu, Google Research, xp@
Xiaoyun Wu, Google Research, xiaoyunwu@

Abstract

Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forest-to-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first-best parse through binarization. Provably, our binarization forest can cover any non-constituent phrase in a sentence while maintaining the desirable property that each span has at most one nonterminal, so the grammar constant for decoding remains relatively small. To reduce search errors, we apply the synchronous binarization technique to forest-to-string decoding. Combining the two techniques, we show that, using a fast shift-reduce parser, we can achieve significant quality gains, measured in BLEU points, on the NIST 2008 English-to-Chinese track over both a phrase-based system and a hierarchical phrase-based system.
Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish, and Czech tracks.

1 Introduction

In recent years, researchers have explored a wide spectrum of approaches to incorporate syntax and structure into machine translation models. The unifying framework for these models is synchronous grammars (Chiang, 2005) or tree transducers (Graehl and Knight, 2004). Depending on whether or not monolingual parsing is carried out on the source side or the target side for inference, there are four general categories within the framework, including string-to-string (Chiang, 2005; Zollmann and Venugopal, 2006) and string-to-tree (Galley et al., 2006; Shen et al., 2008).
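
The abstract's claim that a binarization forest keeps at most one nonterminal per span can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a simplified illustration that packs all binarizations of a single flat n-ary node, and the function name `binarization_forest`, the `(label, start, end)` child encoding, and the `+`-joined labels for virtual nodes are all assumptions made for this example.

```python
def binarization_forest(parent_label, children):
    """Pack every binarization of one flat node into a forest.

    children: adjacent child subtrees as (label, start, end), end exclusive.
    Returns {(start, end): (label, hyperedges)}, where each hyperedge is a
    (left_span, right_span) pair. Keying the forest by span means each span
    carries exactly one label.
    """
    n = len(children)
    # The original children are the leaves of the packed forest.
    forest = {(s, e): (label, []) for label, s, e in children}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            span = (children[i][1], children[j - 1][2])
            if width == n:
                # The full span keeps the label of the original node.
                label = parent_label
            else:
                # Hypothetical naming scheme for virtual nodes.
                label = "+".join(c[0] for c in children[i:j])
            # One hyperedge per binary split point inside the span.
            edges = [
                ((span[0], children[k - 1][2]), (children[k][1], span[1]))
                for k in range(i + 1, j)
            ]
            forest[span] = (label, edges)
    return forest


forest = binarization_forest(
    "NP", [("DT", 0, 1), ("JJ", 1, 2), ("NN", 2, 3), ("PP", 3, 5)]
)
```

In this sketch a node with four children yields ten spans (every contiguous run of children), each with a single label, while the alternative ways of building a span are stored as hyperedges instead of as extra nonterminals.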