tailieunhanh - Scientific report: "Metadata-Aware Measures for Answer Summarization in Community Question Answering"

Metadata-Aware Measures for Answer Summarization in Community Question Answering

Mattia Tomasoni, Dept. of Information Technology, Uppsala University, Uppsala, Sweden
Minlie Huang, Dept. Computer Science and Technology, Tsinghua University, Beijing 100084, China, aihuang@

Abstract

This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental results on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
1 Introduction

Community Question Answering (cQA) portals are an example of Social Media where the information need of a user is expressed in the form of a question for which a best answer is picked among the ones generated by other users. cQA websites are becoming an increasingly popular complement to search engines: overnight, a user can expect a human-crafted, natural language answer tailored to her specific needs. We have to be aware, though, that User Generated Content (UGC) is often redundant, noisy and untrustworthy (Jeon et al., 2006; Wang et al., 2009b; Suryanto et al., 2009). Interestingly, a great amount of information is embedded in the metadata generated as a byproduct of users' action and interaction on Social Media. Much valuable information is contained in answers other than the chosen best

The research was conducted while the first author was visiting Tsinghua University.
