Multi-Document Summarization using Sentence-based Topic Models

Dingding Wang (1), Shenghuo Zhu (2), Tao Li (1), Yihong Gong (2)
1. School of Computer Science, Florida International University, Miami, FL 33199
2. NEC Laboratories America, Cupertino, CA 95014, USA

Abstract
Most existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, knowledge on the document side, i.e., the topics embedded in the documents, can aid context understanding and guide sentence selection during summarization. In this paper, we propose a new Bayesian sentence-based topic model for summarization that makes use of both the term-document and the term-sentence associations. An efficient variational Bayesian algorithm is derived for model parameter estimation. Experimental results on benchmark data sets show the effectiveness of the proposed model for the multi-document summarization task.

1 Introduction
With the continuing growth of online text resources, document summarization has found wide-ranging applications in information retrieval and web search. Many multi-document summarization methods have been developed to extract the most important sentences from the documents.
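The proposed model draws on both term-document and term-sentence associations. The following minimal sketch shows only how those two count matrices might be built from raw text. It is not the authors' preprocessing. The regex tokenizer, the naive sentence splitter on ".", "!" and "?", and the function names are assumptions made for illustration. Rows index terms here, which matches the "term-document" and "term-sentence" naming. The transposed sentence-by-term layout described in the Introduction carries the same information.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately naive, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(doc):
    """Naive split after '.', '!' or '?' followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def build_matrices(docs):
    """Return (vocab, term_doc, term_sent, sent_owner).

    term_doc[i][j]  = count of term vocab[i] in document j
    term_sent[i][k] = count of term vocab[i] in sentence k
    sent_owner[k]   = index of the document that sentence k came from
    """
    sentences, sent_owner = [], []
    for j, doc in enumerate(docs):
        for s in split_sentences(doc):
            sentences.append(s)
            sent_owner.append(j)

    doc_counts = [Counter(tokenize(d)) for d in docs]
    sent_counts = [Counter(tokenize(s)) for s in sentences]
    # Iterating a Counter yields its keys, so this is the set of all terms.
    vocab = sorted(set().union(*doc_counts))

    term_doc = [[c[t] for c in doc_counts] for t in vocab]
    term_sent = [[c[t] for c in sent_counts] for t in vocab]
    return vocab, term_doc, term_sent, sent_owner


if __name__ == "__main__":
    vocab, td, ts, owner = build_matrices(["The cat sat. The dog ran!", "A cat ran."])
    print(vocab)   # ['a', 'cat', 'dog', 'ran', 'sat', 'the']
    print(owner)   # [0, 0, 1]
```

With this tokenizer, every document column of `term_doc` equals the sum of the `term_sent` columns for that document's sentences. The two matrices therefore describe the same text at two granularities.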
These methods usually represent the documents either as term-sentence matrices (where each row represents a sentence and each column represents a term) or as graphs (where each node is a sentence and each edge represents the pairwise relationship between the corresponding sentences). They then rank the sentences by scores calculated from a set of predefined features, such as term frequency-inverse sentence frequency (TF-ISF) (Radev et al., 2004; Lin and Hovy, 2002), sentence or term position (Yih et al., 2007), and the number of keywords (Yih et al., 2007). Typical existing summarization methods include centroid-based methods (e.g., MEAD (Radev et al., 2004)), graph-ranking based methods ...
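The TF-ISF feature mentioned above can be illustrated as a simple extractive baseline. In the minimal sketch below, the weight of term t in sentence s is tf(t, s) · log(N / sf(t)). Here N is the number of sentences and sf(t) is the number of sentences containing t. A sentence's score is taken as the mean weight over its distinct terms. That averaging rule, the tokenizer, and the names `tf_isf_scores` and `summarize` are assumptions for illustration. This is a baseline of the kind the Introduction contrasts with, not the proposed topic model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_isf_scores(sentences):
    """Score each sentence by the mean TF-ISF weight of its distinct terms.

    tf(t, s) = count of t in s
    isf(t)   = log(N / sf(t)), with N sentences and sf(t) sentences containing t
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Iterating a Counter yields each distinct term once per sentence.
    sf = Counter(t for c in counts for t in c)
    scores = []
    for c in counts:
        if not c:  # sentence with no tokens
            scores.append(0.0)
            continue
        weights = [tf * math.log(n / sf[t]) for t, tf in c.items()]
        scores.append(sum(weights) / len(weights))
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sents = ["the cat sat", "the cat ran", "quantum entanglement puzzles physicists"]
    print(summarize(sents, 1))  # ['quantum entanglement puzzles physicists']
```

Terms that occur in many sentences receive low ISF weights, so sentences built from rare terms score higher. This purely sentence-level view ignores document-level topics, which is the limitation the proposed model targets.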
