Scientific report: "SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text"

Rada Mihalcea and Andras Csomai
Department of Computer Science and Engineering
University of North Texas
rada@ ac0225@

Abstract

This paper describes SenseLearner, a minimally supervised word sense disambiguation system that attempts to disambiguate all content words in a text using WordNet senses. We evaluate the accuracy of SenseLearner on several standard sense-annotated data sets, and show that it compares favorably with the best results reported during the recent Senseval evaluations.

1 Introduction

The task of word sense disambiguation consists of assigning the most appropriate meaning to a polysemous word within a given context. Applications such as machine translation, knowledge acquisition, common sense reasoning, and others require knowledge about word meanings, and word sense disambiguation is considered essential for all of them.

Most efforts to solve this problem have so far concentrated on targeted supervised learning, where each sense-tagged occurrence of a particular word is transformed into a feature vector, which is then used in an automatic learning process. The applicability of such supervised algorithms is, however, limited to those few words for which sense-tagged data is available, and their accuracy is strongly connected to the amount of labeled data at hand. In contrast, methods that address all words in unrestricted text have received significantly less attention. While the performance of such methods is usually exceeded by their supervised lexical-sample alternatives, they nonetheless have the advantage of providing larger coverage.

In this paper, we present a method for solving the semantic ambiguity of all content words in a text. The algorithm can be thought of as a minimally supervised word sense disambiguation algorithm, in that it uses a relatively small data set for training purposes and generalizes the concepts learned from
