Scientific report: "EVALUATING DISCOURSE PROCESSING ALGORITHMS"

Marilyn A. Walker
Hewlett Packard Laboratories, Filton Rd., Bristol, England BS12 6QZ
University of Pennsylvania
lyn lwalker@

Abstract

In order to take steps towards establishing a methodology for evaluating Natural Language systems, we conducted a case study. We attempt to evaluate two different approaches to anaphoric processing in discourse by comparing the accuracy and coverage of two published algorithms for finding the co-specifiers of pronouns in naturally occurring texts and dialogues. We present the quantitative results of hand-simulating these algorithms, but this analysis naturally gives rise to both a qualitative evaluation and recommendations for performing such evaluations in general. We illustrate the general difficulties encountered with quantitative evaluation. These are problems with: (a) allowing for underlying assumptions, (b) determining how to handle underspecifications, and (c) evaluating the contribution of false positives and error chaining.

1 Introduction

In the course of developing natural language interfaces, computational linguists are often in the position of evaluating different theoretical approaches to the analysis of natural language (NL). They might want to (a) evaluate and improve on a current system, (b) add a capability to a system that it didn't previously have, or (c) combine modules from different systems.
Consider the goal of adding a discourse component to a system, or evaluating and improving one that is already in place. A discourse module might combine theories on: centering or local focusing [GJW83, Sid79], global focus [Gro77], coherence relations [Hob85], event reference [Web86], intonational structure [PH87], system vs. user beliefs [Pol86], plan or intent recognition or production [Coh78, AP86, SI81], control [WS88], or complex syntactic structures [Pri85]. How might one evaluate the relative contributions of each of these factors or compare two approaches to the same problem? In order to take steps towards establishing a methodology for doing this type of comparison, we conducted a case study.
