N-gram-based Statistical Machine Translation versus Syntax Augmented Machine Translation: comparison and system combination

Maxim Khalilov and José . Fonollosa
Universitat Politècnica de Catalunya, Campus Nord UPC, 08034 Barcelona, Spain
khalilov adrian @

Abstract

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and the UPC-TALP N-gram-based Statistical Machine Translation (SMT) system. SAMT is a hierarchical, syntax-driven translation system built on a phrase-based model and a target-side parse tree. In N-gram-based SMT, the translation process is based on bilingual units derived from word-to-word alignment and on statistical modeling of the bilingual context following a maximum-entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (… tokens in the training corpus). Human error analysis clarifies the advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

1 Introduction

There is an ongoing controversy regarding whether or not information about the syntax of language can benefit MT or contribute to a hybrid system.
Classical IBM word-based models were recently augmented with a phrase translation capability, as shown in Koehn et al. (2003), or in a more recent implementation, the MOSES MT system¹ (Koehn et al., 2007). In parallel to the phrase-based approach, the N-gram-based approach appeared (Mariño et al., 2006). It stems from the Finite-State Transducers paradigm and was extended to the log-linear modeling framework, as shown in Mariño et al. (2006). A system following this approach deals with bilingual units called tuples, which are composed of one or more words from the source language and zero or …

¹ moses
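To make the notion of tuples more concrete, the following is a minimal Python sketch built on assumptions that this excerpt does not spell out. It assumes that a tuple pairs one or more source words with zero or more target words, that tuples come from a monotone segmentation of a word-aligned sentence pair (no alignment link may cross a tuple boundary), and that the translation model is an n-gram model over tuples, combined log-linearly with other feature functions. All function names, the add-k smoothing, the feature names and weights, and the toy Spanish-English pair are hypothetical illustrations. None of them is taken from the UPC-TALP implementation.

```python
import math
from collections import Counter

# Hypothetical sketch: a tuple is a (source words, target words) pair, and
# the target side may be empty (a NULL-target tuple).
START = (("<s>",), ("<s>",))
END = (("</s>",), ("</s>",))


def extract_tuples(src, tgt, links):
    """Segment a word-aligned sentence pair monotonically into minimal tuples.

    links is a set of (source_index, target_index) alignment links.
    Simplification: unaligned target words join the next tuple that has
    links (or the last tuple, if they occur at the end of the sentence).
    """
    n, m = len(src), len(tgt)
    # prefix_max[i]: highest target index linked from source words 0..i (-1 if none)
    prefix_max, best = [], -1
    for i in range(n):
        best = max([best] + [j for (s, j) in links if s == i])
        prefix_max.append(best)
    # suffix_min[i]: lowest target index linked from source words i..n-1 (m if none)
    suffix_min = [m] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = min([suffix_min[i + 1]] + [j for (s, j) in links if s == i])

    tuples, src_start, tgt_end = [], 0, -1  # tgt_end: last target index already used
    for i in range(n):
        # Close a tuple after source word i if no link crosses this boundary.
        if prefix_max[i] < suffix_min[i + 1]:
            new_end = m - 1 if i == n - 1 else prefix_max[i]
            tuples.append((tuple(src[src_start:i + 1]),
                           tuple(tgt[tgt_end + 1:new_end + 1])))
            src_start, tgt_end = i + 1, max(tgt_end, new_end)
    return tuples


def train_bigram(corpus):
    """Count tuple bigrams (with sentence boundaries) over tuple sequences."""
    bigrams, history = Counter(), Counter()
    for seq in corpus:
        padded = [START] + seq + [END]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    return bigrams, history


def bigram_logprob(seq, bigrams, history, vocab_size, k=1.0):
    """Log-probability of a tuple sequence under an add-k smoothed bigram model.

    Add-k smoothing is used here only for brevity.
    """
    padded = [START] + seq + [END]
    return sum(
        math.log((bigrams[(prev, cur)] + k) / (history[prev] + k * vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def loglinear_score(features, weights):
    """Log-linear combination: sum over m of lambda_m * h_m."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    src, tgt = ["la", "casa", "verde"], ["the", "green", "house"]
    links = {(0, 0), (1, 2), (2, 1)}
    tuples = extract_tuples(src, tgt, links)
    print(tuples)  # [(('la',), ('the',)), (('casa', 'verde'), ('green', 'house'))]
    bigrams, history = train_bigram([tuples])
    vocab_size = len({cur for (_, cur) in bigrams})
    lp = bigram_logprob(tuples, bigrams, history, vocab_size)
    # Hypothetical features and weights, for illustration only.
    print(loglinear_score({"tuple_lm": lp, "word_penalty": len(tgt)},
                          {"tuple_lm": 1.0, "word_penalty": -0.1}))
```

In this sketch, the crossing links between "casa verde" and "green house" prevent a boundary between the two source words, so the local reordering is absorbed into a single tuple. A source word with no links yields a tuple with an empty target side.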
