Some propositions to improve the prediction capability of word confidence estimation for machine translation

VNU Journal of Science: Comp. Science & Com. Eng. Vol. 30, No. 3 (2014) 36–49

Ngoc Quang Luong, Laurent Besacier, Benjamin Lecouteux
Laboratoire d'Informatique de Grenoble, 41, Rue des Mathématiques, UJF - BP53, F-38041 Grenoble Cedex 9, France

Abstract

Word Confidence Estimation (WCE) is the task of predicting the correct and incorrect words in the MT output. To address this problem, this paper proposes some ideas to build a binary estimator and then enhance its prediction capability. We integrate a number of features of various types (system-based, lexical, syntactic and semantic) into the conventional feature set to build our classifier. After experimenting with all features, we deploy a "Feature Selection" strategy to keep the best-performing ones. Next, we propose a method that combines multiple "weak" classifiers into a strong "composite" classifier by taking advantage of their complementarity. Experimental results show that our propositions achieve better performance in terms of F-score. Finally, we test whether WCE output can play any role in improving a sentence-level confidence estimation system.

© 2014 Published by VNU Journal of Science.
Manuscript communication: received 15 December 2013, revised 04 April 2014, accepted 07 April 2014
Corresponding author: Luong Ngoc Quang, quangngocluong@

Keywords: Machine Translation, Confidence Measure, Confidence Estimation, Conditional Random Fields, Boosting

1. Introduction

Statistical Machine Translation (SMT) systems have achieved impressive breakthroughs in recent years, producing more and more user-acceptable output. Nevertheless, users still face some open questions: are these translations ready to be published as they are? Are they worth correcting, or do they require retranslation? It is undoubtedly that
