A Comparison of Event Models for Naive Bayes Text Classification
Andrew McCallum (mccallum@justresearch.com), Just Research, 4616 Henry Street, Pittsburgh, PA 15213
Kamal Nigam (knigam@cs.cmu.edu), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract

Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian network with no dependencies between words and binary word features (e.g., Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a unigram language model with integer word counts (e.g., Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically comparing their classification performance on five text corpora. We find that the multi-variate Bernoulli model performs well with small vocabulary sizes, but that the multinomial model usually performs even better at larger vocabulary sizes, providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.

Introduction

Simple Bayesian classifiers have been gaining popularity lately, and have been found to perform surprisingly well (Friedman 1997; Friedman et al. 1997; Sahami 1996; Langley et al. 1992). These probabilistic approaches make strong assumptions about how the data is generated, and posit a probabilistic model that embodies these assumptions. They then use a collection of labeled training examples to estimate the parameters of the generative model. Classification on new examples is performed with Bayes' rule by selecting the class that is most likely to have generated the example. The naive Bayes classifier is the simplest of these models, in that it assumes that all attributes of the examples are independent of each other given the context of the class. This is the so-called "naive Bayes assumption."
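To make the contrast between the two event models concrete, their document likelihoods can be written as follows. This is a sketch using notation we introduce here rather than the paper's own: V is the vocabulary, B_{it} is a binary indicator that word w_t occurs in document d_i, and N_{it} is the number of times w_t occurs in d_i.

    % Multi-variate Bernoulli: every vocabulary word contributes a factor,
    % whether it is present or absent in the document.
    P(d_i \mid c_j) = \prod_{t=1}^{|V|} \bigl( B_{it}\, P(w_t \mid c_j) + (1 - B_{it})\,(1 - P(w_t \mid c_j)) \bigr)

    % Multinomial: the document is a bag of word occurrences, so the
    % likelihood depends on integer counts (constant factors omitted).
    P(d_i \mid c_j) \propto \prod_{t=1}^{|V|} P(w_t \mid c_j)^{N_{it}}

Note that the Bernoulli likelihood includes a factor for every vocabulary word, penalizing classes under which absent words would be likely, whereas the multinomial likelihood scores only the words that actually occur, weighted by their counts.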
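As an illustration of this generative approach, the following Python sketch estimates the parameters of a multinomial naive Bayes model and classifies with Bayes' rule in log space. It is a minimal example under assumed conventions (function names such as train_multinomial_nb and classify are our own, and documents are plain lists of word tokens), not the authors' implementation.

import math
from collections import Counter

def train_multinomial_nb(docs, labels):
    # Estimate class priors P(c) and smoothed word probabilities P(w | c)
    # from labeled training documents (each document is a list of tokens).
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
    vocab = {w for doc in docs for w in doc}
    cond = {}
    for c in classes:
        total = sum(counts[c].values())
        # Laplace (add-one) smoothing so unseen words get nonzero probability.
        cond[c] = {w: (counts[c][w] + 1) / (total + len(vocab)) for w in vocab}
    return prior, cond, vocab

def classify(doc, prior, cond, vocab):
    # Bayes' rule under the naive independence assumption:
    # argmax_c  log P(c) + sum over tokens of log P(w | c).
    def log_posterior(c):
        score = math.log(prior[c])
        for w in doc:
            if w in vocab:
                score += math.log(cond[c][w])
        return score
    return max(prior, key=log_posterior)

# Example with two tiny training documents:
# docs, labels = [["free", "money", "now"], ["meeting", "at", "noon"]], ["spam", "ham"]
# prior, cond, vocab = train_multinomial_nb(docs, labels)
# classify(["free", "money"], prior, cond, vocab)  # -> "spam"

Working in log space avoids numerical underflow from multiplying many small per-word probabilities.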