tailieunhanh - Scientific report: "Discovery of Topically Coherent Sentences for Extractive Summarization"

Asli Celikyilmaz, Microsoft Speech Labs, Mountain View, CA 94041, asli@
Dilek Hakkani-Tur, Microsoft Speech Labs, Microsoft Research, Mountain View, CA 94041, dilek@

Abstract

Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach to model the hidden abstract concepts across documents, as well as the correlation between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy compared to benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a ROUGE score on the DUC-07 test set roughly in the range of state-of-the-art supervised models.

1 Introduction

A query-focused multi-document summarization model produces a short summary text of a set of documents, which are retrieved based on a user's query. An ideal generated summary text should contain the shared relevant content among the set of documents only once, plus other unique information from individual documents that is directly related to the user's query, addressing different levels of detail. Recent approaches to the summarization task have focused to some extent on the redundancy and coherence issues.
In this paper, we introduce a series of new generative models for multiple documents, based on the discovery of hierarchical topics and their correlations, to extract topically coherent sentences. Prior research has demonstrated the usefulness of sentence extraction for generating summary text, taking advantage of surface-level features such as word repetition, position in text, cue phrases, etc. (Radev, 2004; Nenkova and Vanderwende, 2005a; Wan and Yang, 2006; Nenkova et al., 2006). Because documents have pre-defined structures (e.g., sections) ...
