When Specialists and Generalists Work Together: Overcoming Domain Dependence in Sentiment Tagging

Alina Andreevskaia, Concordia University, Montreal, Quebec, andreev@
Sabine Bergler, Concordia University, Montreal, Canada, bergler@

Abstract

This study presents a novel approach to the problem of system portability across different domains: a sentiment annotation system that integrates a corpus-based classifier trained on a small set of annotated in-domain data and a lexicon-based system trained on WordNet. The paper explores the challenges of system portability across domains and text genres (movie reviews, news, blogs, and product reviews), highlights the factors affecting system performance on out-of-domain and small-set in-domain data, and presents a new system consisting of an ensemble of two classifiers with precision-based vote weighting, which provides significant gains in accuracy and recall over the corpus-based classifier and the lexicon-based system taken individually.

1 Introduction

One of the emerging directions in NLP is the development of machine learning methods that perform well not only on the domain on which they were trained, but also on other domains for which training data is not available or is not sufficient to ensure adequate machine learning.
Many applications require reliable processing of heterogeneous corpora, such as the World Wide Web, where the diversity of genres and domains present on the Internet limits the feasibility of in-domain training. In this paper, sentiment annotation is defined as the assignment of positive, negative, or neutral sentiment values to texts, sentences, and other linguistic units. Recent experiments assessing system portability across different domains, conducted by Aue and Gamon (2005), demonstrated that sentiment annotation classifiers trained in one domain do not perform well on other domains. A number of methods have been proposed in order to overcome this system portability limitation by using .
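
To make the idea of precision-based vote weighting from the abstract concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes each classifier's vote for a label is weighted by that classifier's precision for the same label, measured on annotated data. The function names, the classifier names "corpus" and "lexicon", and the toy labels are all hypothetical.

```python
from collections import defaultdict

# Label set taken from the paper's definition of sentiment annotation.
LABELS = ("positive", "negative", "neutral")


def per_label_precision(predicted, gold):
    """Return {label: precision} for one classifier on annotated data.

    Precision for a label = correct predictions of that label divided by
    all predictions of that label (0.0 if the label was never predicted).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, g in zip(predicted, gold):
        total[p] += 1
        if p == g:
            correct[p] += 1
    return {
        label: (correct[label] / total[label] if total[label] else 0.0)
        for label in LABELS
    }


def weighted_vote(votes, precisions):
    """Combine classifier votes, weighting each by its per-label precision.

    votes:      {classifier_name: predicted_label}
    precisions: {classifier_name: {label: precision}}
    Ties are broken by the order of LABELS (an arbitrary assumption).
    """
    scores = defaultdict(float)
    for name, label in votes.items():
        scores[label] += precisions[name][label]
    return max(LABELS, key=lambda label: scores[label])


# Toy data, invented for illustration only.
gold = ["positive", "negative", "positive", "negative"]
corpus_pred = ["positive", "positive", "positive", "negative"]
lexicon_pred = ["positive", "negative", "negative", "positive"]

precisions = {
    "corpus": per_label_precision(corpus_pred, gold),
    "lexicon": per_label_precision(lexicon_pred, gold),
}

# The two classifiers disagree; the more precise vote for its label wins.
decision = weighted_vote({"corpus": "negative", "lexicon": "positive"}, precisions)
```

In this sketch a classifier that is rarely wrong when it says "negative" outvotes a classifier that is often wrong when it says "positive", which is the intuition behind letting a specialist and a generalist each decide where it is most reliable.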