Scientific report: "Markov Random Topic Fields"
Hal Daume III
School of Computing, University of Utah, Salt Lake City, UT 84112
me@

Abstract

Most approaches to topic modeling assume independence between documents, an assumption that is frequently violated. We present a topic model that makes use of one or more user-specified graphs describing relationships between documents. These graphs are encoded in the form of a Markov random field over topics and serve to encourage related documents to have similar topic structures. Experiments show upwards of a 10% improvement in modeling performance.

1 Introduction

One often wishes to apply topic models to large document collections. In these large collections, we usually have meta-information about how one document relates to another. Perhaps two documents share an author; perhaps one document cites another; perhaps two documents are published in the same journal or conference. We often believe that documents related in such a way should have similar topical structures. We encode this in a probabilistic fashion by imposing an undirected Markov random field (MRF) on top of a standard topic model (see Section 3). The edge potentials in the MRF encode the fact that connected documents should share similar topic structures, measured by some parameterized distance function. Inference in the resulting model is complicated by the addition of edge potentials in the MRF.
We demonstrate that a hybrid Gibbs/Metropolis-Hastings sampler is able to efficiently explore the posterior distribution (see Section 4). In experiments (Section 5), we explore several variations on our basic model. The first examines the importance of being able to tune the strength of the potentials in the MRF as part of the inference procedure. This turns out to be of utmost importance. The second studies the importance of the form of the distance metric used to specify the edge potentials. Again, this has a significant impact on performance. Finally, we consider the use of multiple graphs.
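To make the role of the edge potentials and their strength more concrete, the following is a minimal sketch and not the sampler of Section 4. It assumes a potential of the form exp(-strength * distance) on each edge, uses squared Euclidean distance as a stand-in for the parameterized distance function, and updates one document's topic proportions with a Metropolis-Hastings step whose proposal is an independent draw from a symmetric Dirichlet prior. It omits the word likelihood and the Gibbs updates for topic assignments entirely. All names (`log_edge_potential`, `mh_update`, `strength`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def sq_euclidean(p, q):
    """Squared Euclidean distance between two topic-proportion vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def log_edge_potential(theta_a, theta_b, strength, distance=sq_euclidean):
    """Log potential of one edge: -strength * distance (an assumed form)."""
    return -strength * distance(theta_a, theta_b)


def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) over k topics."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [x / total for x in draws]


def mh_update(doc, thetas, neighbors, strength, alpha, rng):
    """One Metropolis-Hastings update of thetas[doc]; returns True if accepted.

    Target: Dirichlet prior times the edge potentials touching doc.
    Proposal: an independent draw from the same prior, so the prior cancels
    and the acceptance ratio reduces to a ratio of edge potentials.
    """
    current = thetas[doc]
    proposal = sample_dirichlet(alpha, len(current), rng)
    log_ratio = sum(
        log_edge_potential(proposal, thetas[n], strength)
        - log_edge_potential(current, thetas[n], strength)
        for n in neighbors[doc]
    )
    if rng.random() < math.exp(min(0.0, log_ratio)):
        thetas[doc] = proposal
        return True
    return False


def mean_edge_distance(strength, sweeps=2000, k=3, alpha=1.0, seed=0):
    """Average distance across edges of a 3-document chain (toy graph)."""
    rng = random.Random(seed)
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    thetas = [sample_dirichlet(alpha, k, rng) for _ in neighbors]
    total, count = 0.0, 0
    for sweep in range(sweeps):
        for doc in neighbors:
            mh_update(doc, thetas, neighbors, strength, alpha, rng)
        if sweep >= sweeps // 2:  # discard the first half as burn-in
            total += sq_euclidean(thetas[0], thetas[1])
            total += sq_euclidean(thetas[1], thetas[2])
            count += 2
    return total / count


if __name__ == "__main__":
    for s in (0.0, 50.0):
        print(f"strength={s}: mean edge distance {mean_edge_distance(s):.3f}")
```

With `strength` set to zero the potentials vanish and the documents are sampled independently. Larger values pull connected documents toward similar topic proportions, which is why being able to tune this value during inference matters.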