Word Sense Disambiguation using Optimised Combinations of Knowledge Sources
Yorick Wilks and Mark Stevenson
Department of Computer Science, University of Sheffield
Regent Court, 211 Portobello Street, Sheffield S1 4DP, United Kingdom

Abstract
Word sense disambiguation algorithms, with few exceptions, have made use of only one lexical knowledge source. We describe a system which performs word sense disambiguation on all content words in free text by combining different knowledge sources: semantic preferences, dictionary definitions and subject/domain codes, along with part-of-speech tags, optimised by means of a learning algorithm. We also describe the creation of a new sense-tagged corpus by combining existing resources. Tested accuracy of our approach on this corpus exceeds 92%, demonstrating the viability of all-word disambiguation rather than restricting oneself to a small sample.

1 Introduction
This paper describes a system that integrates a number of partial sources of information to perform word sense disambiguation (WSD) of content words in general text at a high level of accuracy.

The methodology and evaluation of WSD are somewhat different from those of other NLP modules. One can distinguish three aspects of this difference, all of which come down to evaluation problems, as does so much in NLP these days.
First, researchers are divided between two approaches. The first is a general method that attempts to apply WSD to all the content words of texts, which is the option taken in this paper. The second is applied only to a small trial selection of text words (for example Schütze, 1992; Yarowsky, 1995). Researchers taking the second approach have obtained very high levels of success, in excess of 95%, close to the figures for other solved NLP modules; the issue is whether these small-sample methods and techniques will transfer to general WSD over all content words. Others (e.g. Mahesh et al., 1997; Harley and Glennon, 1997) have pursued the general option on the grounds that it is the real task and should