Scientific report: "Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations"
Bing Zhao, Young-Suk Lee, Xiaoqiang Luo and Liu Li
IBM T.J. Watson Research and Carnegie Mellon University
{zhaob, ysuklee, xiaoluo}@us.ibm.com and liul@andrew.cmu.edu

Abstract
We propose a novel technique that learns how to transform source parse trees to improve the translation quality of syntax-based translation models using synchronous context-free grammars. We transform the phrasal structure of the source tree into a set of simpler structures, expose these decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, and removal of redundant parse nodes, and we incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in the NIST-08 evaluation by 1.3 absolute BLEU points, which is statistically significant.

1 Introduction
Most syntax-based machine translation models with synchronous context-free grammars (SCFGs) have relied on off-the-shelf monolingual parse structures to learn the translation equivalences for string-to-tree, tree-to-string, or tree-to-tree grammars. However, state-of-the-art monolingual parsers are not necessarily well suited to machine translation in terms of both labels and chunks (brackets).
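To make "transforming a phrasal structure into simpler structures" concrete, the sketch below shows plain left and right binarization of a flat n-ary node. It is a generic monolingual illustration, not the authors' synchronous binarization, which also takes the target side into account and whose preferences are learned from data. The names `Node`, `binarize`, and the `-BAR` label suffix are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; a node without children is a leaf/preterminal label."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def __str__(self) -> str:
        if not self.children:
            return self.label
        return "(" + self.label + " " + " ".join(str(c) for c in self.children) + ")"


def binarize(node: Node, direction: str = "left") -> Node:
    """Return a copy of `node` where every node with more than two children is
    split into a chain of binary nodes labelled LABEL-BAR (hypothetical naming)."""
    kids = [binarize(c, direction) for c in node.children]
    if len(kids) <= 2:
        return Node(node.label, kids)
    bar = node.label + "-BAR"
    if direction == "left":
        # Group children from the left: (LABEL (BAR (BAR c0 c1) c2) ... cn)
        acc = Node(bar, kids[:2])
        for kid in kids[2:-1]:
            acc = Node(bar, [acc, kid])
        return Node(node.label, [acc, kids[-1]])
    if direction == "right":
        # Group children from the right: (LABEL c0 (BAR c1 (... (BAR cn-1 cn))))
        acc = Node(bar, kids[-2:])
        for kid in reversed(kids[1:-2]):
            acc = Node(bar, [kid, acc])
        return Node(node.label, [kids[0], acc])
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    vp = Node("VP", [Node("V"), Node("NP"), Node("PP"), Node("ADVP")])
    print(binarize(vp, "left"))   # (VP (VP-BAR (VP-BAR V NP) PP) ADVP)
    print(binarize(vp, "right"))  # (VP V (VP-BAR NP (VP-BAR PP ADVP)))
```

Each direction yields a different set of sub-constituents, and therefore a different set of spans that SCFG rules can be extracted from. A system that exposes such alternatives to the decoder, as the abstract describes, would choose among candidates like these rather than fixing one in advance.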
For instance, in Arabic-to-English translation, we find that only 45.5% of Arabic NP-SBJ structures are mapped to the English NP-SBJ with machine alignments and parse trees, and only 60.1% of NP-SBJs are mapped with human alignments and parse trees, as in 2. Chunking is of even greater concern: at best, only 57.4% of source chunking decisions are translated contiguously on the target side. To translate the rest of the chunks, one has to frequently break the .
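The statistic on contiguously translated chunks can be computed from a word alignment. This excerpt does not spell out the exact criterion, so the sketch below assumes the standard phrase-consistency test: a source span counts as contiguous if no target word inside its aligned target range is linked to a source word outside the span. The functions `is_contiguous` and `contiguity_rate` are hypothetical helpers, and treating a fully unaligned chunk as non-contiguous is also an assumption.

```python
from typing import Iterable, Set, Tuple

# A word alignment as (source_index, target_index) links, 0-based.
Alignment = Set[Tuple[int, int]]


def is_contiguous(span: Tuple[int, int], alignment: Alignment) -> bool:
    """True if the source span [i, j] (inclusive) is translated as one block:
    no target word inside the span's target range is aligned to a source word
    outside the span. Unaligned target words inside the range are tolerated."""
    i, j = span
    targets = [t for s, t in alignment if i <= s <= j]
    if not targets:
        return False  # assumption: a fully unaligned chunk is not counted as contiguous
    lo, hi = min(targets), max(targets)
    return all(i <= s <= j for s, t in alignment if lo <= t <= hi)


def contiguity_rate(spans: Iterable[Tuple[int, int]], alignment: Alignment) -> float:
    """Fraction of source chunks (spans) that are translated contiguously."""
    spans = list(spans)
    if not spans:
        return 0.0
    return sum(is_contiguous(sp, alignment) for sp in spans) / len(spans)


if __name__ == "__main__":
    # Source words 1 and 2 swap order on the target side.
    links = {(0, 0), (1, 2), (2, 1), (3, 3)}
    chunks = [(0, 1), (1, 2), (3, 3)]
    for chunk in chunks:
        print(chunk, is_contiguous(chunk, links))
    print(contiguity_rate(chunks, links))
```

In this toy example, the chunk (0, 1) is broken because source word 2 lands between its target words, while (1, 2) and (3, 3) stay contiguous. Chunks like (0, 1) are the ones whose parse brackets would have to be broken or restructured to translate them.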