Scientific report: "Automatic Evaluation of Linguistic Quality in Multi-Document Summarization"

Emily Pitler, Annie Louis, Ani Nenkova
Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
{epitler, lannie, nenkova}@

Abstract

To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence, and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference information, and summarization-specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.

1 Introduction

Efforts for the development of automatic text summarizers have focused almost exclusively on improving the content selection capabilities of systems, ignoring the linguistic quality of the system output.
Part of the reason for this imbalance is the existence of ROUGE (Lin and Hovy, 2003; Lin, 2004), the system for automatic evaluation of content selection, which allows for frequent evaluation during system development and for reporting results of experiments performed outside of the annual NIST-led evaluations, the Document Understanding Conference (DUC)¹ and the Text Analysis Conference (TAC)². Few metrics, however, have been proposed for evaluating linguistic quality, and none have been validated on data from NIST evaluations. In their pioneering work on automatic evaluation of summary coherence, Lapata and Barzilay (2005) provide a …

¹ http
² http tac
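The abstract names local coherence measured by cosine similarity between sentences as one of the strongest feature classes for focus, coherence, and referential clarity. The sketch below shows one simple version of such a feature, assuming bag-of-words term-frequency vectors, a lowercase alphanumeric tokenizer, and averaging over adjacent sentence pairs. The function names (`bag_of_words`, `cosine`, `local_coherence`) are hypothetical, and this is not the paper's exact feature definition.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercased alphanumeric token counts (a crude tokenizer, assumed for illustration)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing keys, so only shared words contribute to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_coherence(sentences: list[str]) -> float:
    """Hypothetical feature: mean cosine similarity of each adjacent sentence pair."""
    if len(sentences) < 2:
        return 0.0
    vectors = [bag_of_words(s) for s in sentences]
    sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    return sum(sims) / len(sims)
```

For example, in `["The storm hit the coast.", "The storm caused flooding.", "Stocks rose on Monday."]` the first pair shares "the" and "storm" while the second pair shares nothing, so the mean comes out at about 0.28. A summary that jumps between unrelated sentences scores lower than one whose neighbouring sentences share vocabulary.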
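The headline result is stated as accuracy on pairwise comparisons of competing systems. One way to compute such a figure, assuming each system has a single human score and a single predicted score, is to count the system pairs whose predicted order matches the human order. The tie handling here (human ties skipped, predicted ties counted as errors) and the names `pairwise_accuracy`, `human`, and `predicted` are assumptions for illustration, not the paper's stated protocol.

```python
from itertools import combinations


def pairwise_accuracy(human: dict[str, float], predicted: dict[str, float]) -> float:
    """Fraction of system pairs ordered the same way by human and predicted scores.

    Assumed conventions: pairs that humans rate as tied are skipped, and a
    predicted tie on a non-tied pair counts as a disagreement. `predicted`
    must contain every system key that appears in `human`.
    """
    agree = total = 0
    for a, b in combinations(sorted(human), 2):
        human_diff = human[a] - human[b]
        if human_diff == 0:
            continue  # no human preference, so nothing to agree with
        total += 1
        pred_diff = predicted[a] - predicted[b]
        if pred_diff * human_diff > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else 0.0
```

With `human = {"sys1": 3.2, "sys2": 2.8, "sys3": 4.0}` and `predicted = {"sys1": 0.61, "sys2": 0.55, "sys3": 0.58}`, the model orders sys1/sys2 and sys2/sys3 correctly but flips sys1/sys3, giving 2/3. Only the relative order of predicted scores matters, so the model's scale need not match the human rating scale.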
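ROUGE evaluates content selection by n-gram overlap with human reference summaries. A minimal sketch of ROUGE-N recall follows: reference n-grams matched by the candidate (counts clipped to the candidate's counts), divided by the total number of reference n-grams, summed over all references. It assumes lowercase whitespace tokenization with no stemming or stopword removal, so it is not a substitute for the official ROUGE toolkit, and `ngrams` and `rouge_n_recall` are hypothetical names.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list (empty if fewer than n tokens)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 1) -> float:
    """Clipped n-gram recall of a candidate summary against one or more references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # An n-gram matches at most as many times as it occurs in both texts.
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For the candidate "the cat sat on the mat" against the reference "the cat is on the mat", ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5. Because the score only counts overlapping n-grams, it says nothing about grammaticality or coherence, which is the gap the linguistic quality metrics are meant to fill.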
