Automatically Assessing the Post Quality in Online Discussions on Software
Markus Weimer, Iryna Gurevych and Max Mühlhäuser
Ubiquitous Knowledge Processing Group, Division of Telecooperation, Darmstadt University of Technology, Germany

Abstract

Assessing the quality of user generated content is an important problem for many web forums. While quality is currently assessed manually, we propose an algorithm to assess the quality of forum posts automatically and test it on data provided by . We use state-of-the-art classification techniques and experiment with five feature classes: Surface, Lexical, Syntactic, Forum specific and Similarity features. We achieve an accuracy of 89% on the task of automatically assessing post quality in the software domain using forum specific features. Without forum specific features, we achieve an accuracy of 82%.

1 Introduction

The Web leads to the proliferation of user generated content such as blogs, wikis and forums. Key properties of user generated content are a low publication threshold and a lack of editorial control. Therefore, the quality of this content may vary. End users have trouble navigating large repositories of information and finding information of high quality quickly.
In order to address this problem, many forum hosting companies like Google Groups and Nabble introduce rating mechanisms where users can rate the information manually on a scale from 1 (low quality) to 5 (high quality). The ratings have been shown to be consistent with the user community by Lampe and Resnick (2004). However, the percentage of manually rated posts is very low in Nabble. Departing from this, the main idea explored in the present paper is to investigate the feasibility of automatically assessing the perceived quality of user generated content. We test this idea for online forum discussions in the software domain.
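The feature-based classification approach outlined in the abstract can be sketched as follows. This is a minimal illustration only: the toy posts, the choice of word n-gram (lexical) features, and the linear SVM classifier are assumptions for the sketch, not the paper's exact features or experimental setup.

```python
# Illustrative sketch: classify forum posts as "good" or "bad" quality
# from manually rated examples. The data, features and classifier here
# are assumptions for illustration, not the paper's actual setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = [
    "Here is a detailed fix: set the classpath before running javac.",
    "You can solve this by upgrading to version 2.1; see the changelog.",
    "asdf help me now!!!",
    "this sucks",
]
labels = ["good", "good", "bad", "bad"]  # manual quality ratings, as in rated forum data

# Lexical features (word uni- and bigrams, tf-idf weighted) feed a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(posts, labels)

# Predict the quality of an unseen post.
print(model.predict(["Try adding the directory to your classpath."])[0])
```

In the paper's setting such lexical features would be combined with surface, syntactic, forum specific and similarity feature classes before training the classifier.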