tailieunhanh - Scientific report: "Error Detection for Statistical Machine Translation Using Linguistic Features"

Error Detection for Statistical Machine Translation Using Linguistic Features

Deyi Xiong, Min Zhang, Haizhou Li
Human Language Technology, Institute for Infocomm Research
1 Fusionopolis Way, 21-01 Connexis, Singapore 138632
dyxiong mzhang hli @

Abstract

Automatic error detection is desirable in post-processing to improve machine translation quality. Previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from N-best lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating the word posterior probability feature and the linguistic features. The experimental results show that (1) linguistic features alone outperform word posterior probability based confidence estimation in error detection, and (2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate and improve the F-measure.

1 Introduction

Translation hypotheses generated by a statistical machine translation (SMT) system always contain both correct parts (e.g., words, n-grams, phrases matched with reference translations) and incorrect parts.
Automatically distinguishing incorrect parts from correct parts is therefore very desirable, not only for post-editing and interactive machine translation (Ueffing and Ney, 2007) but also for SMT itself, either by rescoring hypotheses in the N-best list using the probability of correctness calculated for each hypothesis (Zens and Ney, 2006) or by generating new hypotheses using N-best lists from one SMT system or multiple systems (Akibay et al., 2004; Jayaraman and Lavie, 2005). In this paper we restrict the "parts" to words. That is, we detect errors at the word level
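
To make the word-level setup concrete, the following is a minimal sketch of a binary maximum entropy classifier (equivalent to logistic regression) that labels each translated word as correct or incorrect. It is not the authors' implementation. The three features (a word posterior probability, a lexical flag, and a syntactic flag), the toy training data, and all function names are assumptions chosen for illustration only; the excerpt above does not specify the actual feature templates or the toolkit used.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Linear score of one feature vector under the model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_maxent(samples, labels, epochs=500, lr=0.5):
    """Fit a binary maxent model by batch gradient ascent on the log-likelihood."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = y - sigmoid(score(weights, bias, x))
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w + lr * g / n for w, g in zip(weights, grad_w)]
        bias += lr * grad_b / n
    return weights, bias


def prob_incorrect(weights, bias, x):
    """Probability that a word with feature vector x is a translation error."""
    return sigmoid(score(weights, bias, x))


# Hypothetical toy data, one row per translated word:
# [word posterior probability, lexical flag, syntactic flag]
FEATURES = [
    [0.90, 0.0, 0.0], [0.80, 0.0, 0.0], [0.70, 1.0, 0.0], [0.95, 0.0, 0.0],
    [0.20, 1.0, 1.0], [0.30, 0.0, 1.0], [0.40, 1.0, 1.0], [0.10, 1.0, 0.0],
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = incorrect word, 0 = correct word

weights, bias = train_maxent(FEATURES, LABELS)
p_good = prob_incorrect(weights, bias, [0.95, 0.0, 0.0])
p_bad = prob_incorrect(weights, bias, [0.10, 1.0, 1.0])
print(f"P(incorrect) confident word: {p_good:.3f}, suspicious word: {p_bad:.3f}")
```

In this sketch the system-based feature and the linguistic flags enter the same linear model, which is how the combination of word confidence scores and linguistic features could be wired up. A word would be flagged as an error when its predicted probability exceeds a chosen threshold such as 0.5.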
