Scientific report: "A Latent Topic Extracting Method based on Events in a Document and its Application"

Risa Kitajima, Ochanomizu University
Ichiro Kobayashi, Ochanomizu University, koba@

Abstract

Recently, several latent topic analysis methods such as LSI, pLSI, and LDA have been widely used for text analysis. However, those methods basically assign topics to words, but do not account for the events in a document. With this background, in this paper, we propose a latent topic extracting method which assigns topics to events. We also show that our proposed method is useful for generating a document summary based on a latent topic.

1 Introduction

Recently, several latent topic analysis methods such as Latent Semantic Indexing (LSI) (Deerwester et al., 1990), Probabilistic LSI (pLSI) (Hofmann, 1999), and Latent Dirichlet Allocation (LDA) (Blei et al., 2003) have been widely used for text analysis. However, those methods basically assign topics to words, but do not account for the events in a document. Here, we define a sentence-level unit that conveys the content of a document as an Event¹, and propose a model that treats a document as a set of Events. We use LDA as the latent topic analysis method, and assign topics to the Events in a document. To examine our proposed method's performance in extracting latent topics from a document, we compare the accuracy of our method to that of the conventional methods on a common document retrieval task. Furthermore, as an application of our method, we apply it to query-biased document summarization (Tombros and Sanderson, 1998; Okumura and Mochizuki, 2000; Berger and Mittal, 2000) to verify that the method is useful for various applications.

¹ For the definition of an Event, see Section 3.

2 Related Studies

Suzuki et al. (2010) proposed a flexible latent topic inference method in which topics are assigned to phrases in a document. Matsumoto et al. (2005) showed that the accuracy of document classification will be improved by introducing a feature .
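
The Introduction describes treating a document as a set of Events and assigning topics to those Events with LDA. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes each document has already been converted into a list of event strings; the `predicate:argument`-style strings below are hypothetical placeholders, since the paper's actual Event definition appears in Section 3 and is not reproduced here. The function `lda_gibbs` and its parameters are likewise illustrative names, and collapsed Gibbs sampling is just one common way to fit LDA.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over arbitrary hashable tokens.

    docs: list of documents, each a list of tokens (here: event strings).
    Returns (assign, doc_topic), where assign[d][i] is the topic sampled
    for the i-th token of document d and doc_topic[d][k] counts how many
    tokens of document d are currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({tok for doc in docs for tok in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n(d, k)
    topic_token = [Counter() for _ in range(num_topics)]  # n(k, token)
    topic_total = [0] * num_topics                        # n(k)
    assign = []

    # Start from a random topic for every token occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for tok in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_token[k][tok] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, tok in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_token[k][tok] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_token[j][tok] + beta)
                    / (topic_total[j] + beta * vocab_size)
                    for j in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_token[k][tok] += 1
                topic_total[k] += 1
    return assign, doc_topic


# Hypothetical event strings standing in for extracted Events.
docs = [
    ["bank:raise:rate", "market:fall:price", "bank:raise:rate"],
    ["team:win:match", "player:score:goal", "team:win:match"],
    ["market:fall:price", "bank:raise:rate"],
    ["player:score:goal", "team:win:match"],
]
assign, doc_topic = lda_gibbs(docs, num_topics=2)
for doc, topics in zip(docs, assign):
    print(list(zip(doc, topics)))
```

Because the sampler only requires hashable tokens, the same routine covers the conventional word-level setting if each document is passed as a list of words instead; the only thing that changes is the unit to which a topic is assigned.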
