tailieunhanh - Scientific report: "Multi-Document Summarization using Sentence-based Topic Models"

Multi-Document Summarization using Sentence-based Topic Models

Dingding Wang (1), Shenghuo Zhu (2), Tao Li (1), Yihong Gong (2)
1. School of Computer Science, Florida International University, Miami, FL 33199
2. NEC Laboratories America, Cupertino, CA 95014, USA
dwang003 taoli @ zsh ygong @

Abstract

Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e., the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian sentence-based topic model for summarization by making use of both the term-document and term-sentence associations. An efficient variational Bayesian algorithm is derived for model parameter estimation. Experimental results on benchmark data sets show the effectiveness of the proposed model for the multi-document summarization task.

1 Introduction

With the continuing growth of online text resources, document summarization has found wide-ranging applications in information retrieval and web search. Many multi-document summarization methods have been developed to extract the most important sentences from the documents.
These methods usually represent the documents either as term-sentence matrices (where each row represents a sentence and each column represents a term) or as graphs (where each node is a sentence and each edge represents the pairwise relationship between the corresponding sentences), and rank the sentences according to scores calculated from a set of predefined features, such as term frequency-inverse sentence frequency (TF-ISF) (Radev et al., 2004; Lin and Hovy, 2002), sentence or term position (Yih et al., 2007), and number of keywords (Yih et al., 2007). Typical existing summarization methods include centroid-based methods (e.g., MEAD (Radev et al., 2004)), graph-ranking based methods ...
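To make the matrix representation concrete, the following is a minimal sketch of TF-ISF sentence ranking, not the method proposed in this paper. It assumes one common formulation, in which a term's weight is its count in the sentence multiplied by log(N / sf), where N is the number of sentences and sf is the number of sentences containing the term, and a sentence's score is its length-normalized sum of weights. The function names `tokenize`, `sentence_term_matrix`, and `tf_isf_scores` are hypothetical, and the tokenizer is deliberately simplistic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_term_matrix(sentences):
    """Build a matrix with one row per sentence and one column per term.

    Returns (vocab, matrix), where matrix[i][j] is the number of times
    vocab[j] occurs in sentences[i].
    """
    counts = [Counter(tokenize(s)) for s in sentences]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def tf_isf_scores(matrix):
    """Score each sentence (row) by its length-normalized sum of TF-ISF weights.

    Assumed formulation: weight(t, s) = tf(t, s) * log(N / sf(t)).
    """
    n_sentences = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    # sf[j]: number of sentences in which term j appears at least once.
    sf = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    isf = [math.log(n_sentences / f) if f else 0.0 for f in sf]
    scores = []
    for row in matrix:
        length = sum(row)
        total = sum(tf * w for tf, w in zip(row, isf))
        scores.append(total / length if length else 0.0)
    return scores


if __name__ == "__main__":
    sentences = [
        "Topic models guide summarization.",
        "Topic models.",
        "The cat sat.",
    ]
    vocab, matrix = sentence_term_matrix(sentences)
    ranked = sorted(zip(tf_isf_scores(matrix), sentences), reverse=True)
    for score, sentence in ranked:
        print(f"{score:.3f}  {sentence}")
```

Under this formulation, sentences made up of terms that are rare across the collection score highest, so a practical extractive summarizer would typically combine the score with other features such as position and keyword counts.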
