Scientific report: "Bridging the Gap between Dictionary and Thesaurus"

This paper presents an algorithm to integrate different lexical resources, through which we hope to overcome the individual inadequacies of the resources and thus obtain enriched lexical semantic information for applications such as word sense disambiguation. We used WordNet as a mediator between a conventional dictionary and a thesaurus. Preliminary results support our hypothesised structural relationship, which enables the integration of the resources. These results also suggest that we can combine the resources to achieve an overall balanced degree of sense discrimination.

Bridging the Gap between Dictionary and Thesaurus
Oi Yee Kwong
Computer Laboratory, University of Cambridge
New Museums Site, Cambridge CB2 3QG
oyk20@

1 Introduction

It is generally accepted that applications such as word sense disambiguation (WSD), machine translation (MT) and information retrieval (IR) require a wide range of resources to supply the necessary lexical semantic information. For instance, Calzolari (1988) proposed a lexical database in Italian which has the features of both a dictionary and a thesaurus, and Klavans and Tzoukermann (1995) tried to build a fuller bilingual lexicon by enhancing machine-readable dictionaries with large corpora.
Among the attempts to enrich lexical information, many have been directed to the analysis of dictionary definitions and the transformation of the implicit information into explicit knowledge bases for computational purposes (Amsler, 1981; Calzolari, 1984; Chodorow et al., 1985; Markowitz et al., 1986; Klavans et al., 1990; Vossen and Copestake, 1993). Nonetheless, dictionaries are also infamous for their non-standardised sense granularity, and the taxonomies obtained from definitions are inevitably ad hoc. It would therefore be a good idea if we could unify our lexical semantic knowledge by some existing and widely exploited classification, such as the system in Roget's Thesaurus (Roget, 1852), which has remained intact for years and has been used in .
