Asynchronous Binarization for Synchronous Grammars

John DeNero, Adam Pauls, and Dan Klein
Computer Science Division, University of California, Berkeley
denero, adpauls, klein @

Abstract

Binarization of n-ary rules is critical for the efficiency of syntactic machine translation decoding. Because the target side of a rule will generally reorder the source side, it is complex (and sometimes impossible) to find synchronous rule binarizations. However, we show that synchronous binarizations are not necessary in a two-stage decoder. Instead, the grammar can be binarized one way for the parsing stage, then rebinarized in a different way for the reranking stage. Each individual binarization considers only one monolingual projection of the grammar, entirely avoiding the constraints of synchronous binarization and allowing binarizations that are separately optimized for each stage. Compared to n-ary forest reranking, even simple target-side binarization schemes improve overall decoding accuracy.

1 Introduction

Syntactic machine translation decoders search over a space of synchronous derivations, scoring them according to both a weighted synchronous grammar and an n-gram language model. The rewrites of the synchronous translation grammar are typically flat n-ary rules.
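
To make the contrast concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a rule's reordering is given as a permutation (source position i maps to target position perm[i]) and uses a simple shift-reduce style check for whether a synchronous binarization exists. It then binarizes each monolingual projection independently with a plain left-branching scheme. All names here (synchronously_binarizable, target_projection, binarize_left, the symbols A to D and the virtual nonterminal naming) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

BinaryRule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def synchronously_binarizable(perm: Sequence[int]) -> bool:
    """Shift-reduce check: merge adjacent source items whenever their
    target spans are also adjacent. One span left means binarizable."""
    stack: List[Tuple[int, int]] = []
    for p in perm:
        stack.append((p, p))  # shift the target span of the next source item
        while len(stack) >= 2:
            (lo1, hi1), (lo2, hi2) = stack[-2], stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:
                # reduce: the two spans are contiguous on the target side
                stack[-2:] = [(min(lo1, lo2), max(hi1, hi2))]
            else:
                break
    return len(stack) == 1


def target_projection(source_rhs: Sequence[str], perm: Sequence[int]) -> List[str]:
    """Reorder the source-side symbols into target order (1-based perm)."""
    target = [""] * len(source_rhs)
    for symbol, position in zip(source_rhs, perm):
        target[position - 1] = symbol
    return target


def binarize_left(lhs: str, rhs: Sequence[str]) -> List[BinaryRule]:
    """Left-branching binarization of a single monolingual projection."""
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules: List[BinaryRule] = []
    prev = rhs[0]
    for symbol in rhs[1:-1]:
        virtual = f"<{prev}+{symbol}>"  # hypothetical virtual nonterminal name
        rules.append((virtual, (prev, symbol)))
        prev = virtual
    rules.append((lhs, (prev, rhs[-1])))
    return rules


# A 4-ary rule whose target side reorders the source side as 2, 4, 1, 3.
source = ["A", "B", "C", "D"]
perm = [2, 4, 1, 3]
target = target_projection(source, perm)

sync_ok = synchronously_binarizable(perm)   # no synchronous binarization
source_rules = binarize_left("X", source)   # binarization for the parsing stage
target_rules = binarize_left("X", target)   # separate binarization for reranking
```

In this sketch the permutation 2, 4, 1, 3 fails the synchronous check, yet each projection taken alone still binarizes into n - 1 binary rules, which is the freedom that binarizing each stage separately relies on.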
Past work has synchronously binarized such rules for efficiency (Zhang et al., 2006; Huang et al., 2008). Unfortunately, because source and target orders differ, synchronous binarizations can be highly constrained and sometimes impossible to find. Recent work has explored two-stage decoding, which explicitly decouples decoding into a source parsing stage and a target language model integration stage (Huang and Chiang, 2007). Because translation grammars continue to increase in size and complexity, both decoding stages require efficient approaches (DeNero et al., 2009). In this paper, we show how two-stage decoding enables independent binarizations for each stage. The source-side binarization guarantees cubic-time ...