tailieunhanh - Scientific report: "Cross-Language Document Summarization Based on Machine Translation Quality Prediction"

Xiaojun Wan, Huiying Li and Jianguo Xiao
Institute of Computer Science and Technology, Peking University, Beijing 100871, China
Key Laboratory of Computational Linguistics (Peking University), MOE, China
wanxiaojun, lihuiying, xiaojianguo @

Abstract

Cross-language document summarization is the task of producing a summary in one language for a document set in a different language. Existing methods simply use machine translation for document translation or summary translation. However, current machine translation services are far from satisfactory, so the quality of the cross-language summary is usually very poor in both readability and content. In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. Experimental results demonstrate the effectiveness and usefulness of the proposed approach.

1 Introduction

Given a document or document set in one source language, cross-language document summarization aims to produce a summary in a different target language. In this study, we focus on English-to-Chinese document summarization, with the purpose of helping Chinese readers quickly understand the major content of an English document or document set. This task is very important in the field of multilingual information access. Most previous work has focused on monolingual document summarization, and cross-language document summarization has received little attention in the past years.

A straightforward way to perform cross-language document summarization is to translate the summary from the source language to the target language using machine translation services. However, although machine translation techniques have advanced considerably, machine translation quality is still far from satisfactory, and in many cases the translated texts are hard to understand.
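
The selection step described in the abstract (prefer sentences that are both informative and likely to translate well) can be illustrated with a minimal sketch. This is not the authors' formulation: the linear combination, the weight lam, the names Sentence, combined_score and select_sentences, and all scores below are assumptions made for illustration. In particular, the quality values are hand-set placeholders standing in for the output of a trained regression model such as the SVM regressor the paper mentions.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    informativeness: float  # hypothetical score in [0, 1] from any summarizer
    quality: float          # placeholder for a predicted translation quality in [0, 1]


def combined_score(s: Sentence, lam: float = 0.5) -> float:
    """Linear mix of the two scores; lam is an assumed trade-off weight."""
    return (1.0 - lam) * s.informativeness + lam * s.quality


def select_sentences(sentences, k, lam=0.5):
    """Return the k highest-scoring sentences, restored to document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: combined_score(sentences[i], lam),
        reverse=True,
    )
    chosen = sorted(ranked[:k])  # keep the original sentence order
    return [sentences[i] for i in chosen]


# Toy English sentences with made-up scores.
docs = [
    Sentence("The storm hit the coast on Monday.", 0.9, 0.8),
    Sentence("Officials, who had been warned, albeit late, demurred.", 0.95, 0.2),
    Sentence("Thousands of residents were evacuated.", 0.7, 0.9),
    Sentence("The weather was discussed.", 0.2, 0.9),
]

# Sentences chosen here would then be passed to a translation service.
summary = select_sentences(docs, k=2, lam=0.5)
for s in summary:
    print(s.text)
```

With lam set to 0 the sketch falls back to ranking by informativeness alone, which corresponds to the baseline of summarizing first and translating afterwards. Raising lam penalizes sentences that are informative but expected to translate poorly, such as the second toy sentence.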
