Scientific report: "Aspect Extraction through Semi-Supervised Modeling"
Aspect Extraction through Semi-Supervised Modeling

Arjun Mukherjee
Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
arjun4787@

Bing Liu
Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
liub@

Abstract

Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean that synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting, where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously. This setting is important because categorizing aspects is a subjective task: different application purposes may need different categorizations, so some form of user guidance is desired. We propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants. Our experimental results show that the two proposed models are indeed able to perform the task effectively.

1 Introduction

Aspect-based sentiment analysis is one of the main frameworks for sentiment analysis (Hu and Liu, 2004; Pang and Lee, 2008; Liu, 2012). A key task of the framework is to extract aspects of entities that have been commented on in opinion documents. The task consists of two sub-tasks.
The first sub-task extracts aspect terms from an opinion corpus. The second sub-task clusters synonymous aspect terms into categories, where each category represents a single aspect, which we call an aspect category. Existing research has proposed many methods for aspect extraction. They largely fall into two main types. The first type only extracts aspect terms without grouping them into categories, although a subsequent step may be used for the grouping (see Section 2). The second type uses