Scientific report: "Word classification based on combined measures of distributional and semantic similarity"

Viktor Pekar
Bashkir State University, 450000 Ufa, Russia
vpekar@

Steffen Staab
Institute AIFB, University of Karlsruhe
http WBS
Learning Lab Lower Saxony
http

Abstract

The paper addresses the problem of automatic enrichment of a thesaurus by classifying new words into its classes. The proposed classification method makes use of both the distributional data about a new word and the strength of the semantic relatedness of its target class to other likely candidate classes.

1 Introduction

Today, many NLP applications make active use of thesauri like WordNet, which serve as background lexical knowledge for processing the semantics of words and documents. However, maintaining a thesaurus so that it sufficiently covers the lexicon of novel text data requires a lot of time and effort, which may be prohibitive in many settings. One way to semi-automatically enrich a thesaurus with new items is to exploit the distributional hypothesis. According to this approach, the meaning of a new word is first represented as the totality of textual contexts where it occurs, and the word is then assigned to the semantic class whose members exhibit similar occurrence patterns.

The distributional approach was shown to be quite effective for tasks where new words need to be assigned to a limited number of classes (up to 5) (Riloff and Shepherd, 1997; Roark and Charniak, 1998). However, its application to numerous classes, as would be the case with a thesaurus of a realistic size, proves to be much more challenging.
For example, Alfonseca and Manandhar (2002) attain a learning accuracy of 38% when assigning new words to 46 WordNet concepts.

In the present paper, we propose a method that is particularly effective for the task of classifying words into numerous classes forming a hierarchy. The position of a class inside the hierarchy reflects the degree of its semantic ...
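
The purely distributional baseline described in the introduction (represent a new word by the contexts it occurs in, then assign it to the class whose members occur in similar contexts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy corpus, the class inventory, the window size, the use of raw co-occurrence counts, and the choice of cosine similarity against a class centroid. The sketch does not implement the paper's combined measure, which additionally uses the semantic relatedness between candidate classes.

```python
from collections import Counter
import math

# Hypothetical toy corpus and thesaurus classes, for illustration only.
SENTENCES = [
    "the dog barked at the cat",
    "the cat chased the mouse",
    "the puppy barked at the mouse",
    "she drove the car to work",
    "he parked the truck near work",
    "she parked the van near home",
]
CLASSES = {"animal": ["dog", "cat"], "vehicle": ["car", "truck"]}


def context_vector(word, sentences, window=2):
    """Count the tokens occurring within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            lo, hi = max(0, i - window), i + window + 1
            # Enumerate with absolute positions so the target itself is skipped.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(word, classes, sentences, window=2):
    """Assign `word` to the class whose summed member contexts are most similar."""
    target = context_vector(word, sentences, window)
    scores = {}
    for label, members in classes.items():
        centroid = Counter()
        for member in members:
            centroid.update(context_vector(member, sentences, window))
        scores[label] = cosine(target, centroid)
    return max(scores, key=scores.get), scores


label, scores = classify("puppy", CLASSES, SENTENCES)
print(label, scores)
```

On this toy data, "puppy" shares the contexts "barked" and "at" with "dog", so the animal class scores highest. With only two classes the decision is easy; with dozens of classes, many centroids become close to one another, which is the difficulty the introduction points to.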
