Evaluation tool for rule-based anaphora resolution methods

Catalina Barbu
School of Humanities, Languages and Social Sciences, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1SB, United Kingdom

Ruslan Mitkov
School of Humanities, Languages and Social Sciences, University of Wolverhampton, Stafford Street, Wolverhampton WV1 1SB, United Kingdom

Abstract

In this paper we argue that comparative evaluation in anaphora resolution has to be performed using the same pre-processing tools and on the same set of data. The paper proposes an evaluation environment for comparing anaphora resolution algorithms, which is illustrated by presenting the results of the comparative evaluation of three methods on the basis of several evaluation measures.

1 Introduction

The evaluation of any NLP algorithm or system should indicate not only its efficiency or performance but should also help us discover what a new approach brings to the current state of play in the field. To this end, a comparative evaluation with other well-known or similar approaches would be highly desirable. We have already voiced concern (Mitkov, 1998a; Mitkov, 2000b) that the evaluation of anaphora resolution algorithms and systems is bereft of any common ground for comparison, due not only to differences in the evaluation data but also to the diversity of pre-processing tools employed by each anaphora resolution system. The evaluation picture would not be accurate even if we compared anaphora resolution systems on the basis of the same data, since the pre-processing errors carried over to the systems' outputs might vary. As a way forward we have proposed the idea of the evaluation workbench (Mitkov, 2000b) - an open-ended architecture which allows the incorporation of different algorithms and their comparison on the basis of the same pre-processing tools and the same data. Our paper discusses a particular configuration of this new evaluation environment.
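To make the idea of such a workbench concrete, the following is a minimal Python sketch, not the authors' implementation, of an architecture in which every algorithm receives the same pre-processed, annotated documents and is scored with a common measure; all class and function names (Anaphor, Document, success_rate, compare) are illustrative assumptions.

# Minimal sketch of an evaluation workbench: every algorithm is run over
# the *same* pre-processed documents, so score differences reflect the
# algorithms themselves rather than differing pre-processing pipelines.
# All names below are illustrative assumptions, not the authors' code.

from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Anaphor:
    """A pronoun to be resolved, with its manually annotated antecedent."""
    pronoun_id: str
    gold_antecedent_id: str


@dataclass
class Document:
    """A document after shared pre-processing (POS tagging, NP chunking, etc.)."""
    text: str
    anaphors: List[Anaphor]
    np_candidates: List[str]  # ids of candidate antecedent noun phrases


# An anaphora resolution algorithm maps (document, anaphor) to a chosen antecedent id.
Algorithm = Callable[[Document, Anaphor], Optional[str]]


def success_rate(algorithm: Algorithm, corpus: List[Document]) -> float:
    """Correctly resolved anaphors divided by all anaphors in the corpus."""
    total = correct = 0
    for doc in corpus:
        for anaphor in doc.anaphors:
            total += 1
            if algorithm(doc, anaphor) == anaphor.gold_antecedent_id:
                correct += 1
    return correct / total if total else 0.0


def compare(algorithms: Dict[str, Algorithm], corpus: List[Document]) -> Dict[str, float]:
    """Run every algorithm on the same data and report one score per method."""
    return {name: success_rate(algo, corpus) for name, algo in algorithms.items()}

Because the corpus is pre-processed once and shared, adding a new resolution method to the comparison amounts to registering one more entry in the dictionary passed to compare, which is the open-ended aspect the workbench aims for.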
