tailieunhanh - Scientific report: "Improving Tree-to-Tree Translation with Packed Forests"

Improving Tree-to-Tree Translation with Packed Forests

Yang Liu, Yajuan Lü, and Qun Liu
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, China
{yliu, lvyajuan, liuqun}@

Abstract

Current tree-to-tree models suffer from parsing errors because they usually use only 1-best parses for rule extraction and decoding. We instead propose a forest-based tree-to-tree model that uses packed forests. The model is based on a probabilistic synchronous tree substitution grammar (STSG), which can be learned automatically from aligned forest pairs. The decoder finds ways of decomposing trees in the source forest into elementary trees using the source projection of the STSG, while building a target forest in parallel. Our system is comparable to the state-of-the-art phrase-based system Moses, and using packed forests in tree-to-tree translation yields a significant absolute improvement in BLEU points over using 1-best trees.

1 Introduction

Approaches to syntax-based statistical machine translation make use of parallel data with syntactic annotations, either in the form of phrase structure trees or dependency trees. They can be roughly divided into three categories: string-to-tree models (e.g., Galley et al., 2006; Marcu et al., 2006; Shen et al., 2008), tree-to-string models (e.g., Liu et al., 2006; Huang et al., 2006), and tree-to-tree models (e.g., Eisner, 2003; Ding and Palmer, 2005; Cowan et al., 2006; Zhang et al., 2008).

By modeling the syntax of both the source and target languages, tree-to-tree approaches have the potential benefit of providing linguistically better motivated rules. However, while string-to-tree and tree-to-string models demonstrate promising results in empirical evaluations, tree-to-tree models have still been underachieving.

We believe that tree-to-tree models face two major challenges. First, tree-to-tree models are more vulnerable to parsing errors. Obtaining syntactic annotations in quantity usually entails running
