manning schuetze statisticalnlp part 10

Further Reading

The purpose of this chapter is to give the student interested in classification for NLP some orientation points. A recent in-depth introduction to machine learning is Mitchell (1997). Comparisons of several learning algorithms applied to text categorization can be found in Yang, Lewis et al. (1996), and Schütze et al. (1995). The features and the data representation based on the features used in this chapter can be downloaded from the book website.

Some important classification techniques which we have not covered are: logistic regression and linear discriminant analysis (et al. 1995); decision lists, where an ordered list of rules that change the classification is learned (Yarowsky 1994); winnow, a mistake-driven online linear threshold learning algorithm (Dagan et al. 1997a); and the Rocchio algorithm (Rocchio 1971; Schapire et al. 1998).

Another important classification technique, Naive Bayes, was introduced in an earlier section. See Domingos and Pazzani (1997) for a discussion of its properties, in particular the fact that it often does surprisingly well even when the feature independence assumed by Naive Bayes does not hold.

Other examples of the application of decision trees to NLP tasks are parsing (Magerman 1994) and tagging (1994). The idea of using held-out training data to train a linear interpolation over all the distributions between a leaf node and the root was used both by (1994) and in earlier work at IBM.

Rather than simply using cross-validation to determine an optimal tree size, an alternative is to grow multiple decision trees and then to average the judgements of the individual trees.
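The idea of growing several trees and averaging their judgements can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure of any of the cited papers: each "tree" is a one-level decision tree (a stump) over binary features, each stump is grown on a bootstrap resample of the training set (the resampling scheme usually associated with bagging), and "averaging" is taken to mean a majority vote. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(data):
    """Fit a one-level decision tree on binary features.

    data: list of (features, label) pairs; features is a tuple of 0/1 values.
    Returns (feature_index, label_if_0, label_if_1), chosen to minimize
    the number of training errors (ties go to the lowest feature index).
    """
    n_features = len(data[0][0])
    overall = Counter(label for _, label in data).most_common(1)[0][0]
    best = None
    for j in range(n_features):
        leaf_labels = []
        errors = 0
        for value in (0, 1):
            labels = [y for x, y in data if x[j] == value]
            if labels:
                majority, count = Counter(labels).most_common(1)[0]
                errors += len(labels) - count
            else:
                # Empty branch: fall back to the overall majority label.
                majority = overall
            leaf_labels.append(majority)
        if best is None or errors < best[0]:
            best = (errors, j, leaf_labels[0], leaf_labels[1])
    _, j, label0, label1 = best
    return (j, label0, label1)


def stump_predict(stump, x):
    """Classify one feature tuple with a single stump."""
    j, label0, label1 = stump
    return label1 if x[j] == 1 else label0


def bagged_trees(data, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap resample of data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        trees.append(train_stump(sample))
    return trees


def bagged_predict(trees, x):
    """Combine the individual judgements by majority vote."""
    votes = Counter(stump_predict(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the label depends only on feature 0.
toy_data = [
    ((1, 0), "pos"), ((1, 1), "pos"), ((0, 0), "neg"),
    ((0, 1), "neg"), ((1, 0), "pos"), ((0, 1), "neg"),
]
trees = bagged_trees(toy_data)
prediction = bagged_predict(trees, (1, 0))
```

A single stump is a weak learner, so the sketch only shows the mechanics of resampling and voting. With deeper trees, the same two functions `bagged_trees` and `bagged_predict` would wrap whatever tree learner is used in place of `train_stump`.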
Such techniques go under names like bagging and boosting and have recently been widely explored and found to be quite successful (Breiman 1994; Quinlan 1996). One of the first papers to apply decision trees to text categorization is Lewis and Ringuette (1994).

Jelinek (1997: ch. 13-14) provides an in-depth introduction
