A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

Phil Blunsom
Department of Computer Science
University of Oxford

Trevor Cohn
Department of Computer Science
University of Sheffield

Abstract

In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor process prior, providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently outperforms the current state-of-the-art across 10 languages.

1 Introduction

Unsupervised part-of-speech (PoS) induction has long been a central challenge in computational linguistics, with applications in human language learning and in developing portable language processing systems. Despite considerable research effort, progress in fully unsupervised PoS induction has been slow, and modern systems barely improve over the early approach of Brown et al. (1992) (Christodoulopoulos et al., 2010). One popular means of improving tagging performance is to include supervision in the form of a tag dictionary or similar; however, this limits portability and also compromises any cognitive conclusions.
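The Pitman-Yor smoothing the abstract refers to is conventionally described via the two-parameter Chinese restaurant process: customers (observations) sit at tables, each table serves a dish (outcome), and the probability of a new table is governed by a discount and a strength parameter. The following is a minimal illustrative sketch of that predictive rule, not the authors' implementation; the class name, its interface, and the use of integer (rather than the paper's fractional) table counts are all assumptions made for clarity.

```python
import random

class PitmanYorCRP:
    """Two-parameter Chinese restaurant process (illustrative sketch).

    `base` is the back-off distribution G0; in a hierarchical model it
    would itself be the predictive distribution of a parent restaurant.
    """

    def __init__(self, discount, strength, base):
        assert 0.0 <= discount < 1.0 and strength > -discount
        self.d, self.theta, self.base = discount, strength, base
        self.tables = {}   # dish -> list of per-table customer counts
        self.n = 0         # total customers
        self.t = 0         # total tables

    def prob(self, dish):
        """Predictive probability of `dish` for the next customer."""
        counts = self.tables.get(dish, [])
        c_k, t_k = sum(counts), len(counts)
        seated = max(c_k - self.d * t_k, 0.0)                 # discounted counts
        new = (self.theta + self.d * self.t) * self.base(dish)  # back-off mass
        return (seated + new) / (self.n + self.theta)

    def add(self, dish):
        """Seat a customer at an existing or a new table for `dish`."""
        counts = self.tables.setdefault(dish, [])
        weights = [c - self.d for c in counts]
        weights.append((self.theta + self.d * self.t) * self.base(dish))
        i = random.choices(range(len(weights)), weights)[0]
        if i == len(counts):
            counts.append(1)   # open a new table
            self.t += 1
        else:
            counts[i] += 1
        self.n += 1
```

With no customers seated, `prob` reduces to the base distribution; as counts accumulate, the discounted empirical counts dominate, which is the smoothing behaviour the model exploits.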
In this paper we present a novel approach to fully unsupervised PoS induction which uniformly outperforms the existing state-of-the-art across all our corpora in 10 different languages. Moreover, the performance of our unsupervised model approaches that of many existing semi-supervised systems, despite our method not receiving any human input. In this paper we present a Bayesian hidden Markov model (HMM) which uses a non-parametric prior to infer a latent tagging for a …
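The hierarchical aspect of such a prior can be pictured as a chain of Pitman-Yor restaurants in which each level backs off to the predictive distribution of the level above (e.g. bigram tag transitions backing off to unigram tag frequencies, which back off to uniform). The sketch below illustrates this back-off chain in closed form; all counts, table counts, and hyper-parameters are hypothetical values chosen for the example, not figures from the paper.

```python
def pyp_predictive(count, tables, total_count, total_tables,
                   discount, strength, backoff_prob):
    """One level of Pitman-Yor smoothing: interpolate discounted
    observed counts with a back-off distribution (sketch)."""
    seated = max(count - discount * tables, 0.0)
    new_table = (strength + discount * total_tables) * backoff_prob
    return (seated + new_table) / (total_count + strength)

# Hypothetical tag statistics: (customers, tables) per tag.
TAGS = ["N", "V", "D"]
uni_counts = {"N": (5, 2), "V": (3, 2), "D": (4, 1)}   # 12 customers, 5 tables
big_counts = {"N": (3, 2), "V": (1, 1), "D": (0, 0)}   # counts after tag "D"

def p_unigram(tag, d=0.5, theta=1.0):
    # Bottom of the hierarchy: unigram restaurant backs off to uniform.
    c, t = uni_counts[tag]
    return pyp_predictive(c, t, 12, 5, d, theta, 1.0 / len(TAGS))

def p_bigram(tag, d=0.5, theta=1.0):
    # Bigram restaurant backs off to the unigram predictive distribution,
    # giving a two-level hierarchical Pitman-Yor prior.
    c, t = big_counts[tag]
    return pyp_predictive(c, t, 4, 3, d, theta, p_unigram(tag))
```

Note that "D" has never been seen after "D" in these toy counts, yet `p_bigram("D")` is still non-zero because mass flows down from the unigram level; this back-off behaviour is what makes the prior an effective smoother.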
