tailieunhanh - Scientific report: "Crowdsourcing Translation: Professional Quality from Non-Professionals"

Omar F. Zaidan and Chris Callison-Burch
Dept. of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
ozaidan ccb @

Abstract

Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We propose a set of features that model both the translations and the translators, such as country of residence, LM perplexity of the translation, edit rate from the other translations, and (optionally) calibration against professional translators. Using these features to score the collected translations, we are able to discriminate between acceptable and unacceptable translations. We recreate the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively show that our models are able to select translations within the range of quality that we expect from professional translators. The total cost is more than an order of magnitude lower than professional translation.
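To make the "edit rate from the other translations" feature concrete, the following is a minimal sketch of selecting one translation from a redundant pool by its agreement with the others. It is an illustration under stated assumptions, not the authors' model: it uses plain word-level Levenshtein distance as a stand-in for a full translation edit rate metric, it ignores the other features listed in the abstract (country of residence, LM perplexity, calibration), and the function names `word_edit_distance`, `avg_edit_rate`, and `select_best` are hypothetical.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def avg_edit_rate(candidate: str, others: List[str]) -> float:
    """Mean edit distance to the other translations, normalized by their length."""
    cand = candidate.lower().split()
    rates = []
    for other in others:
        ref = other.lower().split()
        rates.append(word_edit_distance(cand, ref) / max(len(ref), 1))
    return sum(rates) / len(rates) if rates else 0.0


def select_best(translations: List[str]) -> str:
    """Pick the translation that is closest, on average, to the rest of the pool."""
    scored = []
    for i, t in enumerate(translations):
        others = translations[:i] + translations[i + 1:]
        scored.append((avg_edit_rate(t, others), i))
    _, best_index = min(scored)  # ties go to the earliest translation
    return translations[best_index]


if __name__ == "__main__":
    # Made-up example pool; not data from the paper.
    pool = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "cat mat sitting was",
        "the cat sat on the mat today",
    ]
    print(select_best(pool))
```

In this sketch a low average edit rate is treated as a sign of consensus, so an outlier translation that disagrees with the rest of the pool is unlikely to be chosen. A fuller scorer would presumably combine this signal with the other features rather than rely on it alone.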
1 Introduction

In natural language processing research, translations are most often used in statistical machine translation (SMT), where systems are trained using bilingual sentence-aligned parallel corpora. SMT owes its existence to data like the Canadian Hansards, which by law must be published in both French and English. SMT can be applied to any language pair for which there is sufficient data, and it has been shown to produce state-of-the-art results for language pairs like Arabic-English, where there is ample data. However, large bilingual parallel corpora exist for relatively few language pairs. There are various options for creating new training resources for new language pairs. These ...
