Scientific report: "Integrating cohesion and coherence for Automatic Summarization"

Laura Alonso i Alemany
GRIAL, Departament de Lingüística General, Universitat de Barcelona
lalonso@

Maria Fuentes Fort
Departament d'Informàtica i Matemàtica Aplicada, Universitat de Girona

Abstract

This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only a slight improvement in the resulting summaries and cannot beat a dummy baseline consisting of the first sentence in the document. Nevertheless, we argue that this approach relies on basic linguistic mechanisms and is therefore genre-independent.

1 Motivation

Text Summarization (TS) can be decomposed into three phases: analysing the input text to obtain a text representation, transforming it into a summary representation, and synthesizing an appropriate output form to generate the summary text. Much of the early work in summarization has been concerned with detecting relevant elements of text and presenting them in the shortest possible form. More recently, increasing attention has been devoted to the adequacy of the resulting texts to a human user.
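As a rough illustration of the three-phase decomposition, the sketch below wires the phases together for the first-sentence baseline mentioned in the abstract. It is not the summarizer described in this paper: the names `TextRepresentation`, `analyse`, `transform`, `synthesize` and `lead_baseline` are hypothetical, and the regular-expression sentence splitter is a simplifying assumption that will mis-split abbreviations.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class TextRepresentation:
    """Hypothetical container for an analysed text: just its sentences."""
    sentences: List[str]


def analyse(text: str) -> TextRepresentation:
    # Phase 1 (assumed, naive): split after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return TextRepresentation([p for p in parts if p])


def transform(rep: TextRepresentation, n: int = 1) -> TextRepresentation:
    # Phase 2: the lead baseline keeps only the first n sentences.
    return TextRepresentation(rep.sentences[:n])


def synthesize(rep: TextRepresentation) -> str:
    # Phase 3: extractive output, selected sentences joined verbatim.
    return " ".join(rep.sentences)


def lead_baseline(text: str, n: int = 1) -> str:
    """Run the three phases in order and return the summary text."""
    return synthesize(transform(analyse(text), n))
```

Calling `lead_baseline(document_text)` returns the first sentence under this splitter. A richer summarizer would replace only `transform`, for example with a scoring step over lexical chains, while leaving the other two phases unchanged.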
Well-formedness, cohesion and coherence are currently under inspection, not only because they improve the quality of a summary as a text, but also because they can reduce the reading time and cost needed to process the final summary. The TS systems that performed best in the last DUC contest (DUC, 2002) apply template-driven summarization by information-extraction procedures, in the line of Schank and Abelson (1977). This approach yields very good results in assessing relevance and keeping well-formedness, but it is dependent on a clearly defined representation of the information
