Scientific report: "Paragraph-, word-, and coherence-based approaches to sentence ranking: A comparison of algorithm and human performance"

For each sentence in the text, participants provided a rating of how important that sentence is to the content of the text, on an integer scale from 1 (not important) to 7 (very important). The approaches we evaluated are a simple paragraph-based approach that serves as a baseline, two word-based algorithms, and two coherence-based approaches.

Paragraph-, word-, and coherence-based approaches to sentence ranking: A comparison of algorithm and human performance

Florian WOLF
Massachusetts Institute of Technology
MIT NE20-448, 3 Cambridge Center
Cambridge, MA 02139, USA
fwolf@

Edward GIBSON
Massachusetts Institute of Technology
MIT

Abstract
Sentence ranking is a crucial part of generating text summaries. We compared human sentence rankings obtained in a psycholinguistic experiment to three types of approaches to sentence ranking: a simple paragraph-based approach intended as a baseline, two word-based approaches, and two coherence-based approaches. In the paragraph-based approach, sentences at the beginning of paragraphs received higher importance ratings than other sentences. The word-based approaches determined sentence rankings based on relative word frequencies (Luhn, 1958; Salton & Buckley, 1988). The coherence-based approaches determined sentence rankings based on some property of the coherence structure of a text (Marcu, 2000; Page et al., 1998). Our results suggest poor performance for the simple paragraph-based approach, whereas the word-based approaches perform remarkably well. The best performance was achieved by a coherence-based approach in which coherence structures are represented as non-tree structures. Most approaches also outperformed the commercially available MS Word summarizer.

1 Introduction
Automatic generation of text summaries is a natural language engineering application that has received considerable interest, particularly due to the ever-increasing volume of text information available through the internet. The task of a human generating a summary generally involves three subtasks (Brandow et al., 1995; Mitra et al., 1997): (1) understanding a text; (2) ranking text pieces (sentences, paragraphs, phrases, etc.) for importance; and (3) generating a new text (the summary). Like most approaches to summarization, we are concerned with the second subtask (Carlson et al., 2001; Goldstein et al., 1999; Gong & Liu, 2001; Jing et al., 1998).
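The paragraph-based baseline and the word-based idea described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the baseline is assumed to give paragraph-initial sentences a score of 1 and all other sentences 0; the word-based score is assumed to be the mean relative frequency, over the whole text, of a sentence's content words (in the spirit of Luhn, 1958, rather than the tf.idf weighting of Salton & Buckley, 1988); and the stopword list and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for",
             "is", "are", "was", "were", "it", "this", "that", "with", "as", "by"}


def paragraph_baseline(paragraphs: list[list[str]]) -> list[float]:
    """Assumed scheme: 1.0 for paragraph-initial sentences, 0.0 for all others."""
    scores = []
    for paragraph in paragraphs:
        for position, _sentence in enumerate(paragraph):
            scores.append(1.0 if position == 0 else 0.0)
    return scores


def content_words(sentence: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def word_frequency_scores(sentences: list[str]) -> list[float]:
    """Mean relative frequency (over the whole text) of each sentence's content words."""
    tokenized = [content_words(s) for s in sentences]
    counts = Counter(w for words in tokenized for w in words)
    total = sum(counts.values()) or 1  # avoid division by zero for an empty text
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
        else:
            scores.append(sum(counts[w] / total for w in words) / len(words))
    return scores


def rank_sentences(scores: list[float]) -> list[int]:
    """Sentence indices from most to least important; ties keep text order."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


paragraphs = [["Summaries rank sentences.", "Sentences matter for summaries."],
              ["Cats sleep."]]
sentences = [s for p in paragraphs for s in p]
print(paragraph_baseline(paragraphs))                    # [1.0, 0.0, 1.0]
print(rank_sentences(word_frequency_scores(sentences)))  # [0, 1, 2]
```

In the toy text, the first two sentences share their frequent words ("summaries", "sentences"), so they outrank the unrelated third sentence under the word-based score, while the positional baseline rates the first and third sentences highest simply because each opens a paragraph.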

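For the coherence-based approach over non-tree coherence structures, one plausible reading of the citation of Page et al. (1998) is a PageRank score computed over a graph whose nodes are sentences and whose edges are coherence relations. The sketch below assumes exactly that: relations are supplied as hypothetical pairs of sentence indices, each relation is treated as a link in both directions, and the damping factor 0.85 is the common PageRank default rather than a value taken from this paper. The helper names are illustrative only.

```python
def pagerank(n: int, edges: list[tuple[int, int]], damping: float = 0.85,
             max_iter: int = 200, tol: float = 1e-10) -> list[float]:
    """PageRank by power iteration over a directed graph with nodes 0..n-1."""
    out_links: list[list[int]] = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Every node receives the "random jump" share (1 - damping) / n.
        new_rank = [(1.0 - damping) / n] * n
        for node, targets in enumerate(out_links):
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for dst in range(n):
                    new_rank[dst] += damping * rank[node] / n
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def coherence_scores(n_sentences: int, relations: list[tuple[int, int]]) -> list[float]:
    """Hypothetical helper: sentences are nodes, coherence relations are links.

    Assumption: each relation is treated as undirected (a link in both directions).
    """
    edges = []
    for a, b in relations:
        edges.extend([(a, b), (b, a)])
    return pagerank(n_sentences, edges)


# Toy coherence graph: sentence 0 is related to every other sentence.
scores = coherence_scores(4, [(0, 1), (0, 2), (0, 3)])
print(max(range(4), key=lambda i: scores[i]))  # 0
```

Because the input is a general graph rather than a tree, a sentence may take part in relations with many other sentences, and this score rewards exactly that kind of connectedness.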