Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach

Hwee Tou Ng
Defence Science Organisation
20 Science Park Drive
Singapore 118230

Hian Beng Lee
Defence Science Organisation
20 Science Park Drive
Singapore 118230

Abstract

In this paper, we present a new approach for word sense disambiguation (WSD) using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge sources to disambiguate word sense, including part of speech of neighboring words, morphological form, the unordered set of surrounding words, local collocations, and verb-object syntactic relation. We tested our WSD program, named LEXAS, on both a common data set used in previous work and a large sense-tagged corpus that we separately constructed. LEXAS achieves a higher accuracy on the common data set than previously reported, and it performs better than the most-frequent-sense heuristic on the highly ambiguous words in the large corpus tagged with the refined senses of WordNet.

1 Introduction

One important problem of Natural Language Processing (NLP) is figuring out what a word means when it is used in a particular context. The different meanings of a word are listed as its various senses in a dictionary. The task of Word Sense Disambiguation (WSD) is to identify the correct sense of a word in context.
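The exemplar-based idea summarized above can be sketched as follows. Each sense-tagged training occurrence is stored together with features drawn from the listed knowledge sources, and a new occurrence receives the sense of the most similar stored example. This is a minimal Python sketch under stated assumptions, not LEXAS itself. The (word, POS-tag) input format, the feature names, the two-word POS window, the sentence-wide bag of words, and the overlap distance (count of features not shared) are all illustrative choices, not the paper's. The verb-object relation is omitted because it requires a parser. The sense labels in the usage example are hypothetical toy labels rather than WordNet senses.

```python
from collections.abc import Sequence

# Assumed input format: one sentence as (word, POS-tag) pairs, tagged upstream.
Token = tuple[str, str]


def extract_features(tokens: Sequence[Token], i: int) -> frozenset[str]:
    """Encode the context of the target word tokens[i] as a set of string features."""
    n = len(tokens)
    # Morphological form: the lower-cased surface form (e.g. "interest" vs "interests").
    feats = {f"form={tokens[i][0].lower()}"}
    # Part of speech of the neighboring words, two positions on each side.
    for off in (-2, -1, 1, 2):
        j = i + off
        feats.add(f"pos{off:+d}={tokens[j][1] if 0 <= j < n else 'NONE'}")
    # Unordered set of surrounding words (here: every other word in the sentence).
    feats.update(f"bag={w.lower()}" for j, (w, _) in enumerate(tokens) if j != i)
    # Local collocations: the immediate left and right words, alone and as a pair.
    left = tokens[i - 1][0].lower() if i > 0 else "<s>"
    right = tokens[i + 1][0].lower() if i + 1 < n else "</s>"
    feats.update({f"coll-1={left}", f"coll+1={right}", f"coll-1+1={left}_{right}"})
    return frozenset(feats)


class ExemplarWSD:
    """Stores every training example and labels a new one with its nearest exemplar's sense."""

    def __init__(self) -> None:
        self.exemplars: list[tuple[frozenset[str], str]] = []

    def train(self, tokens: Sequence[Token], i: int, sense: str) -> None:
        self.exemplars.append((extract_features(tokens, i), sense))

    def classify(self, tokens: Sequence[Token], i: int) -> str:
        query = extract_features(tokens, i)
        # Distance = number of features present in one context but not the other;
        # min() keeps the earliest stored exemplar on ties.
        _, sense = min(self.exemplars, key=lambda ex: len(ex[0] ^ query))
        return sense


wsd = ExemplarWSD()
wsd.train([("the", "DT"), ("bank", "NN"), ("raised", "VBD"), ("its", "PRP$"),
           ("interest", "NN"), ("rate", "NN")], 4, "money")
wsd.train([("she", "PRP"), ("has", "VBZ"), ("a", "DT"), ("strong", "JJ"),
           ("interest", "NN"), ("in", "IN"), ("music", "NN")], 4, "curiosity")

print(wsd.classify([("they", "PRP"), ("lowered", "VBD"), ("the", "DT"),
                    ("interest", "NN"), ("rate", "NN")], 3))  # money
print(wsd.classify([("his", "PRP$"), ("interest", "NN"), ("in", "IN"),
                    ("painting", "NN")], 1))  # curiosity
```

Because every knowledge source is encoded as a set of string features, a single distance function can compare them all uniformly. A practical system would likely weight features or use a learned metric instead of a raw count.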
Improvement in the accuracy of identifying the correct word sense will result in better machine translation systems, information retrieval systems, etc. For example, in machine translation, knowing the correct word sense helps select the appropriate target-language words for the translation. In this paper, we present a new approach for WSD using an exemplar-based learning algorithm. This approach integrates a diverse set of knowledge sources to disambiguate word sense, including part of speech (POS) of neighboring words, morphological form, the unordered set of surrounding words, local collocations, and