Investigations on Event-Based Summarization
Mingli Wu
Department of Computing
The Hong Kong Polytechnic University
Kowloon, Hong Kong
csmlwu@

Abstract
We investigate independent and relevant event-based approaches to extractive multi-document summarization. In this paper, events are defined as event terms and their associated event elements. In the independent approach, we identify important content by the frequency of events. In the relevant approach, we identify important content by running the PageRank algorithm on an event map constructed from the documents. Experimental results are encouraging.

1 Introduction
With the growth of online information, it has become inefficient for a computer user to browse a large number of individual news documents. Automatic summarization is a powerful way to overcome this difficulty. However, the research literature shows that machine-generated summaries still need to be improved.

Research on text summarization dates back to Luhn (1958) and Edmundson (1969). Subsequently, some researchers focused on extraction-based summarization because it is effective and simple. Others tried to generate abstracts, but these works are highly domain-dependent or only preliminary investigations. Recently, query-based summarization has received much attention; however, it is closely tied to information retrieval, which is a separate research subject. In this paper, we focus on generic summarization.

News reports are crucial to our daily life, so we focus on effective summarization approaches for news reports.
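This part of the paper does not spell out data structures, so the following is only a minimal Python sketch of the two scoring ideas named in the abstract, under stated assumptions: each sentence is already annotated with its events as (event term, event elements) tuples; the event map links two events when they co-occur in a sentence (a hypothetical linking rule); and a sentence is scored by the summed weight of its events. The names `frequency_scores`, `pagerank_scores`, and `summarize` are hypothetical illustrations, not the author's implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical event representation: (event term, event elements), e.g.
# ("attack", ("rebels", "village")). Each sentence is assumed to be
# pre-annotated with the list of events it contains.
Event = tuple[str, tuple[str, ...]]


def frequency_scores(sentences: list[list[Event]]) -> dict[Event, float]:
    """Independent approach (sketch): weight each event by its frequency."""
    counts = Counter(ev for sent in sentences for ev in sent)
    return {ev: float(c) for ev, c in counts.items()}


def pagerank_scores(sentences: list[list[Event]], damping: float = 0.85,
                    iterations: int = 50) -> dict[Event, float]:
    """Relevant approach (sketch): weighted PageRank over an event map.

    Assumption: two events are linked when they co-occur in a sentence;
    links are undirected and weighted by co-occurrence count.
    """
    links: dict[Event, Counter] = defaultdict(Counter)
    nodes: set[Event] = set()
    for sent in sentences:
        unique = sorted(set(sent))
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            links[a][b] += 1
            links[b][a] += 1
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by events with no links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for u in nodes:
            total = sum(links[u].values())
            for v, w in links[u].items():
                # Each event passes rank to its neighbours in proportion
                # to the co-occurrence weight of the link.
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank


def summarize(sentences: list[list[Event]], texts: list[str],
              scores: dict[Event, float], k: int = 2) -> list[str]:
    """Return the k highest-weighted sentences, in document order."""
    def weight(i: int) -> float:
        return sum(scores.get(ev, 0.0) for ev in sentences[i])
    top = sorted(range(len(sentences)), key=weight, reverse=True)[:k]
    return [texts[i] for i in sorted(top)]


if __name__ == "__main__":
    attack = ("attack", ("rebels", "village"))
    flee = ("flee", ("villagers",))
    visit = ("visit", ("minister", "capital"))
    events = [[attack, flee], [attack], [visit]]
    texts = ["S1", "S2", "S3"]
    print(summarize(events, texts, frequency_scores(events)))  # ['S1', 'S2']
    print(summarize(events, texts, pagerank_scores(events)))   # ['S1', 'S2']
```

In this toy input both schemes pick the same sentences; they diverge when a rare event is strongly connected to many others, which frequency alone cannot reward but PageRank on the event map can.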
Extractive summarization has been widely investigated. It extracts parts of documents based on a weighting scheme that exploits different features, such as position in the document, term frequency, and key phrases. More recent extraction approaches may also employ machine learning to decide which sentences or phrases should be extracted. These approaches have achieved preliminary success in different applications, but they still need further improvement. Previous