A Machine Learning Approach to German Pronoun Resolution

Beata Kouchnir
Department of Computational Linguistics
Tübingen University
72074 Tübingen, Germany
kouchnir@sfs.uni-tuebingen.de

Abstract
This paper presents a novel ensemble learning approach to resolving German pronouns. Boosting, the method in question, combines the moderately accurate hypotheses of several classifiers to form a highly accurate one. Experiments show that this approach is superior to a single decision-tree classifier. Furthermore, we present a standalone system that resolves pronouns in unannotated text by using a fully automatic sequence of preprocessing modules that mimics the manual annotation process. Although the system performs well within a limited textual domain, further research is needed to make it effective for open-domain question answering and text summarisation.

1 Introduction
Automatic coreference resolution, pronominal and otherwise, has been a popular research area in Natural Language Processing for more than two decades, with extensive documentation of both the rule-based and the machine learning approaches. For the latter, good results have been achieved with large feature sets that include syntactic, semantic, grammatical and morphological information derived from hand-annotated corpora.
However, for applications that work with plain text (e.g. question answering, text summarisation), this approach is not practical, because plain text lacks the manual annotation from which such features are derived. The system presented in this paper resolves German pronouns in free text by imitating the manual annotation process with off-the-shelf language software. As the availability and reliability of such software are limited, the system can use only a small number of features. The fact that most German pronouns are morphologically ambiguous poses an additional challenge. The choice of boosting as the underlying machine learning algorithm is motivated both by its theoretical concept and by its performance on other NLP tasks. The fact that boosting uses the method of ensemble learning …
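The morphological ambiguity of German pronouns can be illustrated with a small agreement filter. This is a minimal sketch, assuming that a morphological analyser returns, for each word, the set of gender/number readings it may have; the lexicon entries, the reading labels ("masc.sg", "pl", ...) and the function names are hypothetical and are not the system's actual features. A pronoun and a candidate antecedent are treated as agreeing if their reading sets intersect, so an ambiguous pronoun such as "sie" stays compatible with both feminine singular and plural candidates instead of being forced into a single reading.

```python
from typing import Dict, FrozenSet, List

# Hypothetical mini-lexicon of possible gender/number readings.
# German plurals carry no gender distinction, so they are marked simply "pl".
PRONOUN_READINGS: Dict[str, FrozenSet[str]] = {
    "er": frozenset({"masc.sg"}),
    "es": frozenset({"neut.sg"}),
    # "sie" is feminine singular or plural; case and the formal
    # second-person "Sie" are deliberately ignored in this sketch.
    "sie": frozenset({"fem.sg", "pl"}),
}

NOUN_READINGS: Dict[str, FrozenSet[str]] = {
    "Frau": frozenset({"fem.sg"}),
    "Haus": frozenset({"neut.sg"}),
    # "Lehrer" is masculine singular or plural (der Lehrer / die Lehrer).
    "Lehrer": frozenset({"masc.sg", "pl"}),
    "Kinder": frozenset({"pl"}),
}


def agreement_readings(pronoun: str, noun: str) -> FrozenSet[str]:
    """Readings shared by pronoun and candidate; an empty set means no agreement."""
    return PRONOUN_READINGS[pronoun] & NOUN_READINGS[noun]


def compatible_candidates(pronoun: str, candidates: List[str]) -> List[str]:
    """Keep only the candidates with at least one reading in common with the pronoun."""
    return [noun for noun in candidates if agreement_readings(pronoun, noun)]


candidates = ["Frau", "Haus", "Lehrer", "Kinder"]
print(compatible_candidates("sie", candidates))  # ['Frau', 'Lehrer', 'Kinder']
print(compatible_candidates("er", candidates))   # ['Lehrer']
```

The filter can only rule out impossible antecedents: for "sie", three of the four candidates survive, which shows why ambiguity leaves more of the decision to the learned classifier.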
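To make the idea of combining "moderately accurate hypotheses" into a highly accurate one concrete, here is a minimal sketch of discrete AdaBoost (Freund and Schapire) with one-feature decision stumps as weak learners. It illustrates the general technique under stated assumptions and is not the paper's implementation: the three binary pronoun/candidate features, the toy labels and all function names are hypothetical. On this toy data no single stump classifies more than six of the eight pairs correctly, while three boosted stumps classify all eight. The paper's own baseline is a full decision tree, which this sketch does not reproduce.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# A weak learner is a decision stump: (feature index, polarity in {+1, -1}).
Stump = Tuple[int, int]


def stump_predict(stump: Stump, x: Sequence[int]) -> int:
    """Return +polarity if the binary feature is set, -polarity otherwise."""
    feature, polarity = stump
    return polarity if x[feature] == 1 else -polarity


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted training error."""
    best, best_err = (0, 1), float("inf")
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            stump = (feature, polarity)
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(stump, xi) != yi)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    """Discrete AdaBoost: return a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform weights over the training pairs
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        if err >= 0.5:
            break  # weak learner no better than chance: stop early
        err = max(err, 1e-12)  # avoid division by zero if a stump is perfect
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified pairs gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: Sequence[int]) -> int:
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble: List[Tuple[float, Stump]]) -> float:
    return sum(predict(ensemble, x) == t for x, t in zip(X, y)) / len(X)


# Hypothetical toy data: three binary features per pronoun/candidate pair;
# a pair counts as coreferent (+1) when at least two features are set.
X = list(itertools.product((0, 1), repeat=3))
y = [1 if sum(x) >= 2 else -1 for x in X]

single = adaboost(X, y, rounds=1)   # one stump, i.e. one weak hypothesis
boosted = adaboost(X, y, rounds=3)  # three re-weighted stumps
print(accuracy(single), accuracy(boosted))  # 0.75 1.0
```

Each round re-weights the training pairs so that the next stump concentrates on the pairs its predecessors misclassified; the alpha-weighted vote of the three stumps then reproduces the "at least two of three" rule, which no single stump can express.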