Automatic Single-Document Key Fact Extraction from Newswire Articles

Itamar Kastner
Department of Computer Science, Queen Mary University of London, UK
itk1@dcs.qmul.ac.uk

Christof Monz
ISLA, University of Amsterdam, Amsterdam, The Netherlands
christof@science.uva.nl

Abstract

This paper addresses the problem of extracting the most important facts from a news article. Our approach uses syntactic, semantic, and general statistical features to identify the most important sentences in a document. The importance of the individual features is estimated using generalized iterative scaling methods trained on an annotated newswire corpus. The performance of our approach is evaluated against 300 unseen news articles and shows that use of these features results in statistically significant improvements over a provenly robust baseline, as measured using metrics such as precision, recall, and ROUGE.

1 Introduction

The increasing amount of information that is available to both professional users (such as journalists, financial analysts, and intelligence analysts) and lay users has called for methods condensing information in order to make the most important content stand out. Several methods have been proposed over the last two decades, among which keyword extraction and summarization are the most prominent ones.
Keyword extraction aims to identify the most relevant words or phrases in a document (e.g., Witten et al., 1999), while summarization aims to provide a short (commonly 100 words), coherent full-text summary of the document (e.g., McKeown et al., 1999). Key fact extraction falls in between keyword extraction and summarization. Here, the challenge is to identify the most relevant facts in a document, but not necessarily in a coherent full-text form as is done in summarization. Evidence of the usefulness of key fact extraction is CNN's web site, which since 2006 has most of its news articles preceded by a list of story highlights (see Figure 1). The advantage of the news highlights as opposed to .