Scientific report: "Extractive Summarization using Inter- and Intra- Event Relevance"

Wenjie Li, Mingli Wu and Qin Lu
Department of Computing, The Hong Kong Polytechnic University
cswjli csmlwu csluqin @comp .

Wei Xu and Chunfa Yuan
Department of Computer Science and Technology, Tsinghua University
vivian00 cfyuan @

Abstract

Event-based summarization attempts to select and organize the sentences in a summary with respect to the events or the sub-events that the sentences describe. Each event has its own internal structure, and meanwhile often relates to other events semantically, temporally, spatially, causally or conditionally. In this paper, we define an event as one or more event terms along with the associated named entities, and present a novel approach to derive intra- and inter-event relevance using the information of internal association, semantic relatedness, distributional similarity and named entity clustering. We then apply the PageRank ranking algorithm to estimate the significance of an event for inclusion in a summary from the event relevance derived. Experiments on the DUC 2001 test data show that the relevance of the named entities involved in events achieves better results when it is derived from the event terms they are associated with. They also reveal that the topic-specific relevance from the documents themselves outperforms the semantic relevance from a general-purpose knowledge base like WordNet.

1. Introduction

Extractive summarization selects sentences which contain the most salient concepts in documents. Two important issues with it are how the concepts are defined and what criteria should be used to judge the salience of the concepts. Existing work has typically been based on techniques that extract key textual elements, such as keywords (also known as significant terms) weighted by their tf-idf score, or concepts (such as events or entities) identified with linguistic and/or statistical analysis. Then sentences are selected according to either the .
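
The ranking step mentioned in the abstract, applying PageRank to a graph of event relevance, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' formulation: the function name `pagerank`, the damping factor of 0.85, the row normalization of relevance weights, and the toy relevance values are all hypothetical, and the sketch does not show how intra- or inter-event relevance is actually derived.

```python
def pagerank(relevance, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a weighted relevance matrix.

    relevance[i][j] >= 0 is an assumed relevance weight from event i to event j.
    Returns one score per event; the scores sum to 1.
    """
    n = len(relevance)
    if n == 0:
        return []
    # Total outgoing weight per event, used to normalize each row.
    out_weight = [sum(row) for row in relevance]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Score held by events with no outgoing weight is spread uniformly.
        dangling = sum(scores[i] for i in range(n) if out_weight[i] == 0)
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * relevance[i][j] / out_weight[i]
                for i in range(n)
                if out_weight[i] > 0
            )
            new_scores.append(
                (1 - damping) / n + damping * (incoming + dangling / n)
            )
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Toy, made-up relevance values for three events (not taken from the paper).
events = ["earthquake", "rescue", "election"]
relevance = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
scores = pagerank(relevance)
ranked = sorted(zip(events, scores), key=lambda pair: pair[1], reverse=True)
```

In this toy graph the two strongly related events reinforce each other and rank above the weakly connected one, which is the intuition behind using a graph-ranking algorithm to estimate event significance. Sentence selection based on the resulting scores is a separate step and is not sketched here.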