Automatic Acquisition of Adjectival Subcategorization from Corpora

Jeremy Yallop, Anna Korhonen and Ted Briscoe
Computer Laboratory, University of Cambridge
15 JJ Thomson Avenue, Cambridge CB3 0FD, UK

Part of this research was conducted while one of the authors was at the University of Edinburgh Laboratory for Foundations of Computer Science.

Abstract

This paper describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types, which tests for the presence of grammatical relations (GRs) in the output of a robust statistical parser. It uses a powerful pattern-matching language to classify GRs into frames hierarchically, in a way that mirrors inheritance-based lexica. Experiments show that the system detects SCF types with 70% precision and 66% recall. We also introduce a new tool for the linguistic annotation of SCFs in corpus data, which can considerably ease the task of obtaining training and test data for subcategorization acquisition.

1 Introduction

Research into the automatic acquisition of lexical information from large repositories of unannotated text (such as the web, corpora of published text, etc.) is starting to produce large-scale lexical resources that include frequency and usage information tuned to genres and sublanguages. Such resources are critical for natural language processing (NLP), both for enhancing the performance of state-of-the-art statistical systems and for improving the portability of these systems between domains. One type of lexical information of particular importance for NLP is subcategorization. Access to an accurate and comprehensive subcategorization lexicon is vital for the development of successful parsing technology (Carroll et al., 1998b) and important for many NLP tasks, such as automatic verb classification (Schulte im Walde and Brew, 2002)