Scientific paper: "PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names"
PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

Mark Johnson
Department of Computing, Macquarie University
mjohnson@science.mq.edu.au

Abstract

This paper establishes a connection between two apparently very different kinds of probabilistic models. Latent Dirichlet Allocation (LDA) models are used as "topic models" to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer topic models as well. Adaptor Grammars (AGs) are a hierarchical, non-parametric Bayesian extension of PCFGs. Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. The first replaces the unigram component of LDA topic models with multi-word sequences, or collocations, generated by an AG. The second extension builds on the first one to learn aspects of the internal structure of proper names.

1 Introduction

Over the last few years there has been considerable interest in Bayesian inference for complex hierarchical models, both in machine learning and in computational linguistics.
This paper establishes a theoretical connection between two very different kinds of probabilistic models: Probabilistic Context-Free Grammars (PCFGs) and a class of models known as Latent Dirichlet Allocation (LDA) models (Blei et al., 2003; Griffiths and Steyvers, 2004) that have been used for a variety of tasks in machine learning. Specifically, we show that an LDA model can be expressed as a certain kind of PCFG, so Bayesian inference for PCFGs can be used to learn LDA topic models as well. The importance of this observation is primarily theoretical, as current Bayesian inference algorithms for PCFGs are less efficient than those for LDA inference. However, once this link is
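To make the LDA-as-PCFG encoding concrete, the following is a minimal sketch that enumerates the grammar rules of such an encoding. It assumes the standard construction (one nonterminal per document, one per topic, and a document-marker terminal starting each sentence); the function and symbol names here are illustrative, not the paper's exact notation.

```python
# Hedged sketch: build the PCFG rules that encode an LDA topic model.
# Assumed construction: Root -> Doc_d selects a document, Doc_d expands
# left-recursively so each word of document d is emitted by some Topic_t,
# and Topic_t -> w rewrites a topic as a vocabulary item. Dirichlet priors
# on the rule probabilities play the role of LDA's alpha and beta.

def lda_as_pcfg(doc_ids, topic_ids, vocab):
    """Return a list of (lhs, rhs) CFG rules whose trees generate the
    words of each document, with one Topic_t node per word token."""
    rules = []
    for d in doc_ids:
        rules.append(("Root", ["Doc_" + d]))              # choose document d
        rules.append(("Doc_" + d, ["_" + d]))             # document-marker terminal
        for t in topic_ids:
            # each additional word of document d comes from some topic t
            rules.append(("Doc_" + d, ["Doc_" + d, "Topic_" + t]))
    for t in topic_ids:
        for w in vocab:
            rules.append(("Topic_" + t, [w]))             # topic emits a word
    return rules

# Tiny example: 2 documents, 2 topics, 2-word vocabulary.
rules = lda_as_pcfg(doc_ids=["1", "2"], topic_ids=["0", "1"],
                    vocab=["cat", "dog"])
```

Under this encoding, the per-document topic distribution theta_d corresponds to the probabilities of the `Doc_d -> Doc_d Topic_t` rules, and the per-topic word distribution phi_t to the `Topic_t -> w` rules, which is why Bayesian PCFG inference can recover an LDA model.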