Scientific report: "Text Summarization Model based on Maximum Coverage Problem and its Variant"

Hiroya Takamura and Manabu Okumura
Precision and Intelligence Laboratory, Tokyo Institute of Technology
4259 Nagatsuta, Midori-ku, Yokohama 226-8503
takamura@ oku@

Abstract

We discuss text summarization in terms of the maximum coverage problem and its variant. We explore some decoding algorithms, including ones never before used in this summarization formulation, such as a greedy algorithm with a performance guarantee, a randomized algorithm, and a branch-and-bound method. On the basis of the results of comparative experiments, we also augment the summarization model so that it takes into account relevance to the document cluster. Through experiments, we show that the augmented model is superior to the best-performing method of DUC'04 on ROUGE-1 without stopwords.

1 Introduction

Automatic text summarization is one of the tasks that have long been studied in natural language processing. The task is to create a summary, or a short and concise document, that describes the content of a given set of documents (Mani, 2001). One well-known approach to text summarization is the extractive method, which selects some linguistic units (e.g., sentences) from the given documents in order to generate a summary. The extractive method has the advantage that grammaticality is guaranteed at least at the level of the linguistic units.
Since the actual generation of linguistic expressions has not yet reached the level of practical use, we focus on the extractive method in this paper, especially the method based on sentence extraction. Most extractive summarization methods rely on sequentially solving binary classification problems that determine whether each sentence should be selected or not. In such sequential methods, however, the question of whether the summary is good as a whole is not taken into consideration, although a summary conveys information as a whole. We represent text .
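
To illustrate the contrast between judging sentences one by one and judging the summary as a whole, the following is a minimal Python sketch of sentence selection viewed as a coverage problem. It is an illustration under stated assumptions, not the authors' implementation: the function name `greedy_cover`, the use of lowercased whitespace-separated words as the units to be covered, and the word-count budget are all hypothetical choices made for this example. The selection rule is a plain cost-scaled greedy heuristic, and no performance guarantee is claimed for it.

```python
from typing import List, Set


def greedy_cover(sentences: List[str], budget: int) -> List[int]:
    """Greedily pick sentence indices that cover many distinct words
    while keeping the total summary length within `budget` words.

    Assumption (not from the paper): a sentence "covers" the set of
    lowercased whitespace-separated tokens it contains, and its cost
    is its token count.
    """
    words: List[Set[str]] = [set(s.lower().split()) for s in sentences]
    costs: List[int] = [len(s.split()) for s in sentences]

    covered: Set[str] = set()       # words already present in the summary
    chosen: List[int] = []          # selected sentence indices, in pick order
    remaining = budget              # words still available in the budget
    candidates = set(range(len(sentences)))

    while candidates:
        best, best_score = None, 0.0
        for i in sorted(candidates):
            # Skip empty sentences and sentences that no longer fit.
            if costs[i] == 0 or costs[i] > remaining:
                continue
            # Score the sentence by newly covered words per word of length,
            # so the value depends on what the summary already contains.
            gain = len(words[i] - covered)
            score = gain / costs[i]
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break                   # nothing fits or nothing adds coverage
        chosen.append(best)
        covered |= words[best]
        remaining -= costs[best]
        candidates.remove(best)

    return chosen


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat down", "dogs bark loudly"]
    print(greedy_cover(docs, budget=6))
```

Unlike a per-sentence classifier, the score of each candidate here changes as the summary grows: a sentence that repeats already covered words contributes little, which is one simple way to account for the quality of the summary as a whole.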
