Mixing Multiple Translation Models in Statistical Machine Translation

Majid Razmara (1), George Foster (2), Baskaran Sankaran (1), Anoop Sarkar (1)
(1) Simon Fraser University, 8888 University Dr., Burnaby, BC, Canada
{razmara, baskaran, anoop}@sfu.ca
(2) National Research Council Canada, 283 Alexandre-Tache Blvd., Gatineau, QC, Canada
george.foster@nrc.gc.ca

Abstract

Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain. We propose a novel approach, ensemble decoding, which combines a number of translation systems dynamically at the decoding step. In this paper, we evaluate performance on a domain adaptation setting where we translate sentences from the medical domain. Our experimental results show that ensemble decoding outperforms various strong baselines, including mixture models, the current state-of-the-art for domain adaptation in machine translation.

1 Introduction

Statistical machine translation (SMT) systems require large parallel corpora in order to obtain reasonable translation quality.
In statistical learning theory, it is assumed that the training and test datasets are drawn from the same distribution, or, in other words, that they are from the same domain. However, bilingual corpora are only available in very limited domains, and building bilingual resources in a new domain is usually very expensive. It is an interesting question whether a model trained on an existing large bilingual corpus in a specific domain can be adapted to another domain for which little parallel data is available. Domain adaptation techniques aim at finding ways to adjust an out-of-domain (OUT) model to represent a target domain (in-domain, or IN). Common techniques for model adaptation adapt the two main components of contemporary state-of-the-art SMT systems: the language model and the translation model. However, language model adaptation is a more straightforward problem compared to translation model adaptation, because various measures such as ...
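To make the mixture-model baseline mentioned above concrete, the sketch below shows the simplest form of translation model mixing: a linear interpolation of the phrase translation probabilities from an IN and an OUT phrase table, p(e|f) = λ·p_IN(e|f) + (1−λ)·p_OUT(e|f). This is an illustrative toy, not the paper's ensemble decoding method: the phrase pairs, probabilities, and the `mix_phrase_tables` helper are all invented for the example, and a fixed λ stands in for weights that would normally be tuned on in-domain development data.

```python
# Illustrative sketch of a linear mixture of two phrase tables.
# All table contents and the weight lambda_in are hypothetical.

def mix_phrase_tables(table_in, table_out, lambda_in=0.8):
    """Return p(e|f) = lambda * p_IN(e|f) + (1 - lambda) * p_OUT(e|f).

    Phrase pairs missing from one table are treated as having
    probability 0 under that model.
    """
    mixed = {}
    for pair in set(table_in) | set(table_out):
        p_in = table_in.get(pair, 0.0)
        p_out = table_out.get(pair, 0.0)
        mixed[pair] = lambda_in * p_in + (1.0 - lambda_in) * p_out
    return mixed

# Toy phrase pairs (source, target) -> p(target|source).
# The in-domain (medical) model prefers "pain"; the out-of-domain
# model prefers "grief".
table_in = {("douleur", "pain"): 0.9, ("douleur", "grief"): 0.1}
table_out = {("douleur", "grief"): 0.7, ("douleur", "pain"): 0.3}

mixed = mix_phrase_tables(table_in, table_out, lambda_in=0.8)
print(mixed[("douleur", "pain")])   # 0.8*0.9 + 0.2*0.3 = 0.78
```

With λ close to 1 the mixture trusts the in-domain table, so the domain-appropriate translation "pain" dominates; ensemble decoding, by contrast, combines the component models' scores dynamically inside the decoder rather than building one static mixed table.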