Language Independent Extractive Summarization

Rada Mihalcea
Department of Computer Science and Engineering
University of North Texas
rada@

Abstract

We demonstrate TextRank, a system for unsupervised extractive summarization that applies iterative graph-based ranking algorithms to graphs encoding the cohesive structure of a text. An important characteristic of the system is that it does not rely on any language-specific knowledge resources or any manually constructed training data. It is therefore highly portable to new languages or domains.

1 Introduction

Given the overwhelming amount of information available today on the Web and elsewhere, techniques for efficient automatic text summarization are essential for improving access to such information. Algorithms for extractive summarization are typically based on sentence extraction. They attempt to identify the set of sentences that are most important for understanding a given document. Some of the most successful approaches to extractive summarization are supervised algorithms. These algorithms attempt to learn what makes a good summary by training on collections of summaries built for a relatively large number of training documents (Hirao et al., 2002; Teufel and Moens, 1997). However, the price paid for the high performance of such supervised algorithms is that they cannot easily adapt to new languages or domains, since new training data are required for each new type of data.
TextRank (Mihalcea and Tarau, 2004; Mihalcea, 2004) is specifically designed to address this problem. It uses an extractive summarization technique that requires neither training data nor language-specific knowledge sources. TextRank can be applied effectively to summarize documents in different languages without any modification of the algorithm and without any additional data. Moreover, results from experiments performed on standard data sets have demonstrated that …
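The text above describes TextRank only at a high level. The following is a minimal Python sketch of sentence-level extraction in that spirit. It assumes the formulation we understand from Mihalcea and Tarau (2004):

- each sentence is a vertex;
- edges are weighted by word overlap, normalized by the logarithms of the sentence lengths;
- vertices are scored with weighted PageRank using damping factor d = 0.85.

Several parts are hypothetical choices for illustration, not the demonstrated system: the function names, the regex-based sentence splitter and tokenizer, the convergence threshold, and the top-k selection.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Hypothetical splitter: break after ., ! or ? followed by whitespace.
    # This is an assumption; real text needs a better (often language-aware) splitter.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> List[str]:
    # Lowercase and keep runs of Unicode word characters; no stemming,
    # stopword list, or other language-specific resource is used.
    return re.findall(r"\w+", sentence.lower())


def similarity(a: List[str], b: List[str]) -> float:
    # Normalized word overlap (our reading of Mihalcea and Tarau, 2004):
    # |common words| / (log|S_i| + log|S_j|).
    overlap = len(set(a) & set(b))
    if overlap == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give denom == 0; treat them as unconnected.
    return overlap / denom if denom > 0 else 0.0


def weighted_pagerank(w: List[List[float]], d: float = 0.85,
                      tol: float = 1e-6, max_iter: int = 100) -> List[float]:
    # WS(V_i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(V_j)
    n = len(w)
    scores = [1.0] * n
    out_sum = [sum(row) for row in w]  # total outgoing weight of each vertex
    for _ in range(max_iter):
        new = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if w[j][i] > 0)
            for i in range(n)
        ]
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def summarize(text: str, k: int = 2) -> List[str]:
    sentences = split_sentences(text)
    if not sentences:
        return []
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Undirected graph: symmetric weights, no self-loops.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    scores = weighted_pagerank(w)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]


text = ("TextRank ranks sentences with a graph. The graph links sentences. "
        "TextRank needs no training data. Birds fly south.")
print(summarize(text, k=1))  # prints ['TextRank ranks sentences with a graph.']
```

In this sketch, only `split_sentences` and `tokenize` touch the surface text. Swapping them for a splitter and tokenizer suited to another script leaves the graph construction and ranking unchanged. Note that `\w+` does not segment languages written without spaces, such as Chinese or Japanese, so those would need a different tokenizer.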