Scientific report: "Producing Biographical Summaries: Combining Linguistic Knowledge with Corpus Statistics"

Barry Schiffman, Columbia University, 1214 Amsterdam Avenue, New York, NY 10027, USA, Bschiff@
Inderjeet Mani, The MITRE Corporation, 11493 Sunset Hills Road, Reston, VA 20190, USA, imani@
Kristian J. Concepcion, The MITRE Corporation, 11493 Sunset Hills Road, Reston, VA 20190, USA, kjc9@

Abstract

We describe a biographical multidocument summarizer that summarizes information about people described in the news. The summarizer uses corpus statistics along with linguistic knowledge to select and merge descriptions of people from a document collection, removing redundant descriptions. The summarization components have been extensively evaluated for coherence, accuracy, and non-redundancy of the descriptions produced.

1 Introduction

The explosion of the World Wide Web has brought with it a vast hoard of information, most of it relatively unstructured. This has created a demand for new ways of managing this often unwieldy body of dynamically changing information. The goal of automatic text summarization is to take a partially-structured source text, extract information content from it, and present the most important content in a condensed form in a manner sensitive to the needs of the user and task (Mani and Maybury 1999). Summaries can be generic, i.e., aimed at a broad audience, or topic-focused, i.e., tailored to the requirements of a particular user or group of users.
Multi-Document Summarization (MDS) is, by definition, the extension of single-document summarization to collections of related documents. MDS can potentially help the user to see at a glance what a collection is about, or to examine similarities and differences in the information content in the collection. Specialized multi-document summarization systems can be constructed for various applications; here we discuss a biographical summarizer. Biographies can, of course, be long, as in book-length biographies, or short, as in