Scientific report: "Paragraph-, word-, and coherence-based approaches to sentence ranking: A comparison of algorithm and human performance"


For each of the sentences in the text, they provided a ranking of how important that sentence is with respect to the content of the text, on an integer scale from 1 (not important) to 7 (very important). The approaches we evaluated are a simple paragraph-based approach that serves as a baseline, two word-based algorithms, and two coherence-based approaches.

Paragraph-, word-, and coherence-based approaches to sentence ranking: A comparison of algorithm and human performance

Florian WOLF
Massachusetts Institute of Technology
MIT NE20-448, 3 Cambridge Center, Cambridge, MA 02139, USA
fwolf@mit.edu

Edward GIBSON
Massachusetts Institute of Technology
MIT

Abstract

Sentence ranking is a crucial part of generating text summaries. We compared human sentence rankings obtained in a psycholinguistic experiment to three different approaches to sentence ranking: a simple paragraph-based approach intended as a baseline, two word-based approaches, and two coherence-based approaches. In the paragraph-based approach, sentences in the beginning of paragraphs received higher importance ratings than other sentences. The word-based approaches determined sentence rankings based on relative word frequencies (Luhn, 1958; Salton & Buckley, 1988). Coherence-based approaches determined sentence rankings based on some property of the coherence structure of a text (Marcu, 2000; Page et al., 1998). Our results suggest poor performance for the simple paragraph-based approach, whereas word-based approaches perform remarkably well. The best performance was achieved by a coherence-based approach where coherence structures are represented in a non-tree structure. Most approaches also outperformed the commercially available MSWord summarizer.

1 Introduction

Automatic generation of text summaries is a natural language engineering application that has received considerable interest, particularly due to the ever-increasing volume of text information available through the internet. The task of a human generating a summary generally involves three subtasks (Brandow et al., 1995; Mitra et al., 1997): (1) understanding a text; (2) ranking text pieces (sentences, paragraphs, phrases, etc.) for importance; (3) generating a new text (the summary). Like most approaches to summarization, we are concerned with the second subtask (e.g. Carlson et al., 2001; Goldstein et al., 1999; Gong & Liu, 2001; Jing et al., 1998
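
To make the paragraph-based and word-based ideas from the abstract concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the authors' implementation: the function names (`paragraph_scores`, `word_frequency_scores`, `rank`) are hypothetical, the binary 1/0 scoring for the baseline is assumed, and the word-based score is a plain mean of relative word frequencies rather than the specific weighting schemes of the cited work.

```python
import re
from collections import Counter


def paragraph_scores(paragraphs):
    """Baseline sketch: the first sentence of each paragraph scores 1, others 0.

    `paragraphs` is a list of paragraphs, each a list of sentence strings.
    The 1/0 values are an assumption made for illustration only.
    """
    return [[1 if i == 0 else 0 for i, _ in enumerate(p)] for p in paragraphs]


def word_frequency_scores(sentences, stopwords=frozenset()):
    """Score each sentence by the mean relative frequency of its words.

    Relative frequency is the word's count divided by the total number of
    (non-stopword) tokens in the whole text.
    """
    tokenized = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in stopwords]
        for s in sentences
    ]
    counts = Counter(w for tokens in tokenized for w in tokens)
    total = sum(counts.values())
    scores = []
    for tokens in tokenized:
        if not tokens or total == 0:
            scores.append(0.0)  # empty sentence or empty text
        else:
            scores.append(sum(counts[w] / total for w in tokens) / len(tokens))
    return scores


def rank(scores):
    """Return sentence indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    text = ["cats chase mice", "cats sleep", "dogs bark"]
    print(rank(word_frequency_scores(text)))
```

Averaging over sentence length keeps long sentences from winning merely by containing more words; a summed score would be an equally defensible choice for a sketch like this.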