

TOWARDS THE AUTOMATIC IDENTIFICATION OF ADJECTIVAL SCALES: CLUSTERING ADJECTIVES ACCORDING TO MEANING

Vasileios Hatzivassiloglou, Kathleen R. McKeown
Department of Computer Science
450 Computer Science Building
Columbia University
New York, N.Y. 10027
Internet: vh@cs.columbia.edu, kathy@cs.columbia.edu

ABSTRACT

In this paper we present a method to group adjectives according to their meaning, as a first step towards the automatic identification of adjectival scales. We discuss the properties of adjectival scales and of groups of semantically related adjectives, and how they imply sources of linguistic knowledge in text corpora. We describe how our system exploits this linguistic knowledge to compute a measure of similarity between two adjectives, using statistical techniques and without having access to any semantic information about the adjectives. We also show how a clustering algorithm can use these similarities to produce the groups of adjectives, and we present results produced by our system for a sample set of adjectives. We conclude by presenting evaluation methods for the task at hand and analyzing the significance of the results obtained.

1. INTRODUCTION

As natural language processing systems become more oriented towards solving real-world problems like machine translation or spoken language understanding in a limited domain, their need for access to vast amounts of knowledge increases.
While a model of the general rules of the language at various levels (morphological, syntactic, etc.) can be hand-encoded, knowledge which pertains to each specific word is harder to encode manually, if only because of the size of the lexicon. Most systems currently rely on human linguists or lexicographers who compile lexicon entries by hand. This approach requires significant amounts of time and effort for expanding the system's lexicon. Furthermore, if the compiled information depends in any way on the domain of the application, the acquisition of lexical knowledge must be repeated.