
Using Noisy Bilingual Data for Statistical Machine Translation

Stephan Vogel
Interactive Systems Lab
Language Technologies Institute
Carnegie Mellon University
vogel@cs.cmu.edu

Abstract

SMT systems rely on a sufficient amount of parallel corpora to train the translation model. This paper investigates the possibility of using word-to-word and phrase-to-phrase translations extracted not only from clean parallel corpora but also from noisy comparable corpora. Translation results for a Chinese-to-English translation task are given.

1 Introduction

Statistical machine translation systems typically use a translation model trained on bilingual data and a language model for the target language, trained on possibly larger monolingual data. Often the amount of clean parallel data is limited. This raises the question of whether translation quality can be improved by using additional, noisier bilingual data.

Some approaches, such as Fung and McKeown (1997), have been developed to extract word translations from non-parallel corpora. Munteanu and Marcu (2002) use bilingual suffix trees to extract parallel word sequences from a comparable corpus; 95 of those phrase translation pairs were judged to be correct. However, no results were reported on whether these additional translation correspondences improved translation quality.

2 The SMT System

Statistical translation as introduced by Brown et al. (1993) is based on word-to-word translations. The SMT system used in this study instead relies on multi-word to multi-word translations.
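For reference, the division of labour between a translation model and a target-language language model described in the introduction is commonly written as the source-channel decision rule of Brown et al. (1993). This is the textbook formulation, given here as an assumption about the general framework rather than the exact model combination used in this study; f denotes the source (here Chinese) sentence and e a candidate target (English) sentence.

```latex
\hat{e} = \operatorname*{argmax}_{e} \Pr(e \mid f)
        = \operatorname*{argmax}_{e} \Pr(e)\,\Pr(f \mid e)
```

The second equality follows from Bayes' rule, since Pr(f) does not depend on e. Pr(e) is the language model, which can be trained on monolingual target text, and Pr(f | e) is the translation model, which is the component that needs bilingual data.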
The term "phrase translations" will be used throughout this paper without implying that these multi-word translation pairs are phrases in any linguistic sense. Phrase translations can be extracted from the Viterbi alignment produced by the alignment model. Phrase translation pairs are typically seen only a few times; in fact, most of the longer phrases are seen only once, even in the larger corpora. Using relative frequency to estimate the translation probability would make most
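The extraction step and the relative-frequency estimate mentioned above can be illustrated with a small sketch. It assumes the widely used alignment-consistency criterion (a source span and a target span form a pair only if no word inside either span is aligned to a word outside the other), which is not necessarily the exact extraction procedure of this system. The names `extract_phrase_pairs` and `relative_frequency`, the `max_len` limit, and the toy tokens and alignments are hypothetical. The estimate used is p(t | s) = count(s, t) / count(s).

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment (a sketch).

    src, tgt: lists of tokens.
    alignment: set of (i, j) links meaning src[i] is aligned to tgt[j],
    e.g. read off a Viterbi alignment. Simplified: unaligned words at
    span boundaries are not used to build extra, wider pairs.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be
            # aligned to a source word outside [i1, i2].
            if any((j1 <= j <= j2) and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def relative_frequency(pair_counts):
    """Estimate p(t | s) = count(s, t) / count(s) from phrase-pair counts."""
    source_totals = Counter()
    for (s, _t), c in pair_counts.items():
        source_totals[s] += c
    return {(s, t): c / source_totals[s] for (s, t), c in pair_counts.items()}


if __name__ == "__main__":
    # Two toy sentence pairs with hypothetical tokens and alignments.
    counts = Counter(extract_phrase_pairs(["a", "b", "c"], ["x", "y", "z"],
                                          {(0, 0), (1, 2), (2, 1)}))
    counts.update(extract_phrase_pairs(["a", "d"], ["w", "v"], {(0, 0), (1, 1)}))
    for (s, t), p in sorted(relative_frequency(counts).items()):
        print(f"p({t!r} | {s!r}) = {p:.2f}")
```

In this toy run the source phrase "a" is seen once with "x" and once with "w", so each gets 0.50, while every other source phrase is seen exactly once and its single observed translation gets 1.00. Each of these estimates therefore rests on one or two counts.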
