Scientific report: "Sub-Sentence Division for Tree-Based Machine Translation"

Hao Xiong, Wenwen Xu, Haitao Mi, Yang Liu and Qun Liu
Key Lab. of Intelligent Information Processing
Key Lab. of Computer System and Architecture
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190, China
{xionghao, xuwenwen, htmi, yliu, liuqun}@ict.ac.cn

Abstract

Tree-based statistical machine translation models have made significant progress in recent years, especially when replacing 1-best trees with packed forests. However, as parsing accuracy usually drops dramatically as sentence length increases, translating long sentences often takes a long time and produces only degenerate translations. We propose a new method named sub-sentence division that reduces the decoding time and improves the translation quality for tree-based translation. Our approach divides long sentences into several sub-sentences by exploiting tree structures. Large-scale experiments on the NIST 2008 Chinese-to-English test set show that our approach achieves an absolute improvement of 1.1 BLEU points over the baseline system in 50% less time.
1 Introduction

Tree-based statistical machine translation models have witnessed promising progress in recent years, such as tree-to-string models (Liu et al., 2006; Huang et al., 2006) and tree-to-tree models (Quirk et al., 2005; Zhang et al., 2008). In particular, when combined with packed forests, the corresponding forest-based tree-to-string models (Mi et al., 2008; Zhang et al., 2009) and tree-to-tree models (Liu et al., 2009) have achieved promising improvements over the corresponding tree-based systems.

However, when translating long sentences, we argue that two major issues arise. On the one hand, parsing accuracy falls as sentence length grows, which inevitably hurts translation quality (Quirk and Corston-Oliver, 2006; Mi and Huang, 2008). On the other hand, decoding long sentences is time-consuming, especially for forest-based approaches. So splitting long .