tailieunhanh - Báo cáo khoa học: "A Note on the Implementation of Hierarchical Dirichlet Processes"

Department of Cognitive and Linguistic Sciences Brown University Providence, RI, USA correction, the approximation is poor for hierarchical models, which are commonly used for NLP applications. We derive an improved O(1) formula that gives exact values for the expected counts in non-hierarchical models. For hierarchical models, where our formula is not exact, we present an efficient method for sampling from the HDP (and related models, such as the hierarchical PitmanYor process) that considerably decreases the memory footprint of such models as compared to the naive implementation. . | A Note on the Implementation of Hierarchical Dirichlet Processes Phil Blunsom pblunsom@ Sharon Goldwater sgwater@ Trevor Cohn tcohn@ Mark Johnson markjohnson@ Department of Informatics University of Edinburgh Edinburgh EH8 9AB Uk Department of Cognitive and Linguistic Sciences Brown University Providence RI USA Abstract The implementation of collapsed Gibbs samplers for non-parametric Bayesian models is non-trivial requiring considerable book-keeping. Goldwater et al. 2006a presented an approximation which significantly reduces the storage and computation overhead but we show here that their formulation was incorrect and even after correction is grossly inaccurate. We present an alternative formulation which is exact and can be computed easily. However this approach does not work for hierarchical models for which case we present an efficient data structure which has a better space complexity than the naive approach. 1 Introduction Unsupervised learning of natural language is one of the most challenging areas in NLP. Recently methods from nonparametric Bayesian statistics have been gaining popularity as a way to approach unsupervised learning for a variety of tasks including language modeling word and morpheme segmentation parsing and machine translation Teh et al. 2006 Goldwater et al. 2006a Goldwater et al. 2006b Liang et al. 2007 Finkel et al. 2007 DeNero et al. 2008 . These models are often based on the Dirichlet process DP Ferguson 1973 or hierarchical Dirichlet process HDP Teh et al. 2006 with Gibbs sampling as a method of inference. Exact implementation of such sampling methods requires considerable bookkeeping of various counts which motivated Goldwater et al. 2006a henceforth GGJ06 to develop an approximation using expected counts. However we show here that their approximation is flawed in two respects 1 It omits an important factor in the expectation and 2 Even after correction the approximation is poor for .