Author Profiling of Vietnamese Forum Posts - An Investigation on Content-based Features
VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 1 (2017) 37-46

Duong Tran Duc1,*, Pham Bao Son2, Tan Hanh1
1 Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
2 VNU University of Engineering and Technology

Received 28 June 2016; Revised 10 December 2016 & 08 February 2017; Accepted 18 February 2017

Abstract
In this paper, we investigate the author profiling task for Vietnamese forum posts, which aims to predict demographic attributes of the author such as gender, age, occupation, and location. Although we conducted experiments on different types of features, including style-based and content-based features, we focused on analyzing the effects of content-based features. We used machine learning approaches to perform classification tasks on datasets that we collected from popular Vietnamese forums. The results show that these features work well on short, free-style messages such as forum posts, and that content-based features achieved much better results than style-based features.

Keywords: Author profiling, machine learning, content-based features.

1. Introduction
... people do not provide their personal information, or they enter incorrect or unclear data. As a result, the task of automatically classifying an author's properties, such as gender, age, location, and occupation, becomes important.
This task has applications in the commercial field, where providers can learn which types of users like or dislike their products or services (for targeted marketing and product development). In social research, researchers also want to know the profile of people who hold a specific opinion on a social issue (for example, when conducting a social survey). It can also support the courts in identifying whether or not a text was written by a criminal [1]. Profiling the author of forum posts is also .