Scientific report: "On-line Language Model Biasing for Statistical Machine Translation"



On-line Language Model Biasing for Statistical Machine Translation

Sankaranarayanan Ananthakrishnan, Rohit Prasad and Prem Natarajan
Raytheon BBN Technologies
Cambridge, MA 02138, U.S.A.
{sanantha,rprasad,pnataraj}@bbn.com

The views expressed are those of the author and do not reflect the official policy or position of …

Abstract

The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

1 Introduction

While much of the focus in developing a statistical machine translation (SMT) system revolves around the translation model (TM), most systems do not emphasize the role of the language model (LM). The latter generally follows an n-gram structure and is estimated from a large monolingual corpus of target sentences. In most systems, the LM is independent of the test input, i.e., fixed n-gram probabilities determine the likelihood of all translation hypotheses, regardless of the source input.

Some previous work exists in LM adaptation for SMT. Snover et al. (2008) used a cross-lingual information retrieval (CLIR) system to select a subset of target documents comparable to the source document; bias LMs estimated from these subsets were interpolated with a static …
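The excerpt does not spell out the paper's cross-lingual similarity measure, so the sketch below does not implement it. It only illustrates the generic mechanism the text refers to: a static n-gram LM estimated from target sentences, a bias LM estimated from a subset of target sentences assumed to be relevant to the current input, the two combined by interpolation, and target perplexity as the yardstick. Everything concrete here is an assumption for illustration: the bigram order, add-k smoothing with `k=0.1`, linear interpolation with a fixed weight `lam`, the toy corpora, and the hypothetical names `BigramLM`, `InterpolatedLM` and `perplexity`.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Toy bigram LM with add-k smoothing (hypothetical stand-in for a real n-gram LM)."""

    def __init__(self, sentences, vocab, k=0.1):
        # Predictable tokens: the vocabulary plus EOS (BOS is never predicted).
        self.vocab_size = len(set(vocab) | {EOS})
        self.k = k
        self.history = Counter()  # c(prev): how often prev is followed by some token
        self.bigrams = Counter()  # c(prev, word)
        for sent in sentences:
            tokens = [BOS] + sent.split() + [EOS]
            for prev, word in zip(tokens, tokens[1:]):
                self.history[prev] += 1
                self.bigrams[(prev, word)] += 1

    def prob(self, prev, word):
        # Add-k smoothing: every in-vocabulary bigram gets non-zero probability.
        return (self.bigrams[(prev, word)] + self.k) / (
            self.history[prev] + self.k * self.vocab_size
        )


class InterpolatedLM:
    """Linear interpolation of a bias LM with a static LM (assumed mixing scheme)."""

    def __init__(self, static_lm, bias_lm, lam):
        self.static_lm, self.bias_lm, self.lam = static_lm, bias_lm, lam

    def prob(self, prev, word):
        return self.lam * self.bias_lm.prob(prev, word) + (1 - self.lam) * self.static_lm.prob(prev, word)


def perplexity(lm, sentence):
    """Per-token perplexity of one target sentence, counting the end-of-sentence token."""
    tokens = [BOS] + sentence.split() + [EOS]
    log_prob = sum(math.log(lm.prob(prev, word)) for prev, word in zip(tokens, tokens[1:]))
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical data: a general target corpus and a subset assumed relevant to the current input.
static_corpus = [
    "the market fell",
    "the market rose",
    "the team won the match",
    "the team lost the match",
]
bias_corpus = static_corpus[2:]  # stands in for an input-dependent document selection
vocab = {w for s in static_corpus for w in s.split()}

static_lm = BigramLM(static_corpus, vocab)
bias_lm = BigramLM(bias_corpus, vocab)
mixed_lm = InterpolatedLM(static_lm, bias_lm, lam=0.5)

hypothesis = "the team won the match"
print(f"static: {perplexity(static_lm, hypothesis):.3f}  "
      f"interpolated: {perplexity(mixed_lm, hypothesis):.3f}")
```

Because both component LMs share one vocabulary, the interpolated model is still a proper distribution for any `lam` in [0, 1]. In practice the weight would be tuned on held-out data rather than fixed at 0.5, and the toy numbers say nothing about the results reported in the paper.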

RELATED DOCUMENTS

Chemistry report: " Financial Research Support for Ecotoxicology and Environmental Chemistry in Germany – Results of an Online Survey Fördersituation ökotoxikologischer und umweltchemischer Forschung in Deutschland – Ergebnisse einer Online-Befragung"

Medical report: " Massively multiplayer online role-playing games: comparing characteristics of addict vs non-addict online recruited gamers in a French adult population"

Scientific report: "A System for Detecting Subgroups in Online Discussions"

Scientific report: "Online Plagiarism Detection Through Exploiting Lexical, Syntactic, and Semantic Information"

Scientific report: "Combining Textual Entailment and Argumentation Theory for Supporting Online Debates Interactions"

Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection"

Scientific report: "Fast Online Lexicon Learning for Grounded Language Acquisition"

Scientific report: "Mining Refinements to Online Instructions from User Generated Content"

Scientific report: "Sentence Dependency Tagging in Online Question Answering Forums"

Scientific report: "Integrating surprisal and uncertain-input models in online sentence comprehension: formal techniques and empirical results"
