Scientific report: "Evaluating Automated and Manual Acquisition of Anaphora Resolution Strategies"

Chinatsu Aone and Scott William Bennett
Systems Research and Applications Corporation (SRA)
2000 15th Street North, Arlington, VA 22201
aonec@sra.com, bennett@sra.com

Abstract

We describe one approach to build an automatically trainable anaphora resolution system. In this approach, we use Japanese newspaper articles tagged with discourse information as training examples for a machine learning algorithm which employs the C4.5 decision tree algorithm by Quinlan (Quinlan, 1993). Then, we evaluate and compare the results of several variants of the machine learning-based approach with those of our existing anaphora resolution system, which uses manually-designed knowledge sources. Finally, we compare our algorithms with existing theories of anaphora, in particular, Japanese zero pronouns.

1 Introduction

Anaphora resolution is an important but still difficult problem for various large-scale natural language processing (NLP) applications, such as information extraction and machine translation.
Thus far, no theories of anaphora have been tested on an empirical basis, and therefore there is no answer as to which anaphora resolution algorithm is best.[1] Moreover, an anaphora resolution system within an NLP system for real applications must handle degraded or missing input (no NLP system has complete lexicons, grammars, or semantic knowledge and outputs perfect results), as well as different anaphoric phenomena in different domains, languages, and applications. Thus, even if there exists a perfect theory, it might not work well with noisy input, or it might not cover all the anaphoric phenomena.

[1] Walker (Walker, 1989) compares Brennan, Friedman, and Pollard's centering approach (Brennan et al., 1987) with Hobbs' algorithm (Hobbs, 1976) on a theoretical basis.

These requirements have motivated us to develop robust, extensible, and trainable anaphora resolution systems. Previously (Aone and McKee, 1993), we reported our data-driven multilingual anaphora resolution