Efficient Tree-Based Topic Modeling

Yuening Hu
Department of Computer Science, University of Maryland, College Park
ynhu@

Jordan Boyd-Graber
iSchool and UMIACS, University of Maryland, College Park
jbg@

Abstract

Topic modeling with a tree-based prior has been used for a variety of applications because it can encode correlations between words that traditional topic modeling cannot. However, its expressive power comes at the cost of more complicated inference. We extend the SparseLDA (Yao et al., 2009) inference scheme for latent Dirichlet allocation (LDA) to tree-based topic models. This sampling scheme computes the exact conditional distribution for Gibbs sampling much more quickly than enumerating all possible latent variable assignments. We further improve performance by iteratively refining the sampling distribution only when needed. Experiments show that the proposed techniques dramatically improve the computation time.

1 Introduction

Topic models, exemplified by latent Dirichlet allocation (LDA) (Blei et al., 2003), discover latent themes present in text collections. Topics discovered by topic models are multinomial probability distributions over words that evince thematic coherence. Topic models are used in computational biology, computer vision, music, and of course text analysis. One of LDA's virtues is that it is a simple model that assumes a symmetric Dirichlet prior over its word distributions. Recent work argues for structured distributions that constrain clusters (Andrzejewski et al., 2009), span languages (Jagarlamudi and Daumé III, 2010), or incorporate human feedback (Hu et al., 2011) to improve the quality and flexibility of topic modeling. These models all use different tree-based prior distributions (Section 2). These approaches are appealing because they preserve conjugacy, making inference using Gibbs sampling (Heinrich, 2004) straightforward. While straightforward, inference isn't cheap. Particularly for interactive settings (Hu et al., 2011), efficient …
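To make the efficiency claim concrete, below is a minimal sketch of the standard (un-optimized) collapsed-Gibbs conditional for LDA, the per-token distribution that SparseLDA-style schemes compute more cheaply by decomposing it into sparse parts rather than evaluating every topic's term. The function name and the toy counts are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def naive_gibbs_conditional(n_dk, n_wk, n_k, alpha, beta, V):
    """Standard collapsed-Gibbs conditional for LDA:
    p(z = k) is proportional to
        (alpha + n_dk[k]) * (beta + n_wk[k]) / (V * beta + n_k[k]),
    where n_dk counts topics in the current document, n_wk counts the
    current word per topic, and n_k counts all tokens per topic.
    Enumerating all K terms like this is what SparseLDA-style
    schemes avoid doing for every token."""
    weights = (alpha + n_dk) * (beta + n_wk) / (V * beta + n_k)
    return weights / weights.sum()

# Toy example: 3 topics, vocabulary of 5 words (illustrative counts only).
n_dk = np.array([2.0, 0.0, 1.0])    # topic counts in the current document
n_wk = np.array([1.0, 0.0, 3.0])    # counts of the current word per topic
n_k  = np.array([10.0, 4.0, 12.0])  # total tokens assigned to each topic
p = naive_gibbs_conditional(n_dk, n_wk, n_k, alpha=0.1, beta=0.01, V=5)
print(p)  # a proper probability distribution over the 3 topics
```

Note that most entries of `n_dk` and `n_wk` are zero in real corpora; exploiting that sparsity, rather than this dense enumeration, is the source of the speedup the abstract describes.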
