Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation

Rico Sennrich
Institute of Computational Linguistics, University of Zurich
Binzmühlestr. 14, CH-8050 Zurich
sennrich@

Abstract

We investigate the problem of domain adaptation for parallel data in Statistical Machine Translation (SMT). While techniques for domain adaptation of monolingual data can be borrowed for parallel data, we explore conceptual differences between translation model and language model domain adaptation and their effect on performance, such as the fact that translation models typically consist of several features that have different characteristics and can be optimized separately. We also explore adapting multiple (4–10) data sets with no a priori distinction between in-domain and out-of-domain data except for an in-domain development set.

1 Introduction

The increasing availability of parallel corpora from various sources, welcome as it may be, leads to new challenges when building a statistical machine translation system for a specific domain. The task of determining which parallel texts should be included for training and which ones hurt translation performance is tedious when performed through trial and error. Alternatively, methods for a weighted combination exist, but there is conflicting evidence as to which approach works best, and the issue of determining the weights is not adequately resolved. The picture looks better in language modelling, where model interpolation through perplexity minimization has become a widespread method of domain adaptation. We investigate the applicability of this method to translation models and discuss possible applications. We move the focus away from a binary combination of in-domain and out-of-domain data. If we can scale up the number of models whose contributions we weight, this reduces the need for a priori knowledge about the fitness of each potential training text and opens new research opportunities for …
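The core idea can be stated compactly: each feature of the translation model, for example the phrase translation probability p(t|s), is expressed as a linear interpolation of the component models, p(t|s) = Σ_k λ_k p_k(t|s), and the weights λ_k are chosen to minimize the perplexity of the interpolated model on phrase pairs extracted from the in-domain development set. The following is a minimal sketch of this procedure using the standard EM updates for mixture weights; the function names, the data representation (one dictionary of phrase-pair probabilities per component model), and the floor probability for unseen pairs are illustrative assumptions, not the paper's implementation.

    import math

    def interpolation_weights(models, dev_pairs, iterations=50):
        """EM estimation of linear interpolation weights that minimize
        dev-set perplexity of the mixture p(t|s) = sum_k lambda_k * p_k(t|s).

        models:    list of dicts mapping (src, tgt) phrase pairs to p(tgt|src)
        dev_pairs: list of (src, tgt) phrase pairs from the in-domain dev set
        """
        k = len(models)
        weights = [1.0 / k] * k  # uniform initialization
        for _ in range(iterations):
            expected = [0.0] * k
            for pair in dev_pairs:
                # E-step: responsibility of each component for this phrase pair
                # (1e-10 is an assumed floor for pairs unseen in a component)
                probs = [w * m.get(pair, 1e-10) for w, m in zip(weights, models)]
                total = sum(probs)
                for i in range(k):
                    expected[i] += probs[i] / total
            # M-step: renormalize expected counts into new weights
            weights = [e / len(dev_pairs) for e in expected]
        return weights

    def perplexity(models, weights, dev_pairs):
        """Dev-set perplexity of the interpolated model (lower is better)."""
        log_sum = 0.0
        for pair in dev_pairs:
            p = sum(w * m.get(pair, 1e-10) for w, m in zip(weights, models))
            log_sum += math.log2(p)
        return 2 ** (-log_sum / len(dev_pairs))

Because the translation model's features can be optimized separately, such a routine would be run once per feature (e.g., for both translation directions), typically yielding a different weight vector for each, and it scales naturally from the binary in-domain/out-of-domain case to the 4–10 component models considered here.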
