
A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging∗

Sharon Goldwater
Department of Linguistics, Stanford University
sgwater@

Thomas L. Griffiths
Department of Psychology, UC Berkeley
tom_griffiths@

∗This work was supported by grants NSF 0631518 and ONR MURI N000140510388. We would also like to thank Noah Smith for providing us with his data sets.

Abstract

Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show, using part-of-speech tagging, that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone and when using a tagging dictionary.

1 Introduction

Unsupervised learning of linguistic structure is a difficult problem. Recently, several new model-based approaches have improved performance on a variety of tasks (Klein and Manning, 2002; Smith and Eisner, 2005). Nearly all of these approaches have one aspect in common: the goal of learning is to identify the set of model parameters that maximizes some objective function. Values for the hidden variables in the model are then chosen based on the learned parameterization. Here, we propose a different approach based on Bayesian statistical principles: rather than searching for an optimal set of parameter values, we integrate over all possible parameterizations when identifying the hidden structure.
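To make the contrast concrete, here is a minimal sketch (not the authors' implementation) of how integrating out the parameters changes a trigram HMM's transition estimates. It assumes a symmetric Dirichlet prior with hyperparameter α over each transition distribution; under that assumption the parameters can be integrated out analytically, giving the collapsed predictive probability P(t | context) = (n(context, t) + α) / (n(context) + Tα), where n(·) are counts from the current tag assignment and T is the number of tags. The function names, toy counts, and the value of α below are purely illustrative.

```python
from collections import defaultdict

# Illustrative sketch (not the paper's code): contrast the MLE transition
# estimate of a trigram HMM with the estimate obtained by integrating out
# the transition parameters under a symmetric Dirichlet(alpha) prior.

def mle_transition(counts, context, tag):
    """MLE: relative frequency of `tag` after the (t_{i-2}, t_{i-1}) context."""
    total = sum(counts[context].values())
    return counts[context][tag] / total if total > 0 else 0.0

def bayesian_transition(counts, context, tag, num_tags, alpha=0.003):
    """Collapsed estimate with the Dirichlet prior integrated out:
    P(tag | context) = (n(context, tag) + alpha) / (n(context) + T * alpha).
    A small alpha favors the sparse distributions typical of natural language."""
    total = sum(counts[context].values())
    return (counts[context][tag] + alpha) / (total + num_tags * alpha)

# Toy counts over a 3-tag set; a context is the preceding tag pair.
counts = defaultdict(lambda: defaultdict(int))
counts[("DT", "JJ")]["NN"] = 8
counts[("DT", "JJ")]["JJ"] = 2

print(mle_transition(counts, ("DT", "JJ"), "VB"))          # 0.0: unseen event gets no mass
print(bayesian_transition(counts, ("DT", "JJ"), "VB", 3))  # ~0.0003: small but nonzero mass
```

A collapsed Gibbs sampler of the kind used in the paper repeatedly evaluates predictive probabilities of this form while resampling individual tags. Because a small α concentrates the prior on sparse distributions, unseen transitions retain only a sliver of probability mass, which is the sense in which the prior favors the sparse distributions mentioned in the abstract.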