Personalizing PageRank for Word Sense Disambiguation

Eneko Agirre and Aitor Soroa
IXA NLP Group, University of the Basque Country, Donostia, Basque Country

Abstract

In this paper we propose a new graph-based method that uses the knowledge in an LKB (based on WordNet) to perform unsupervised Word Sense Disambiguation. Our algorithm uses the full graph of the LKB efficiently and performs better than previous approaches on English all-words datasets. We also show that the algorithm can easily be ported to other languages with good results, the only requirement being the availability of a wordnet. In addition, we analyze the performance of the algorithm, showing that it is efficient and that it could be tuned to be faster.

1 Introduction

Word Sense Disambiguation (WSD) is a key enabling technology that automatically chooses the intended sense of a word in context. Supervised WSD systems perform best in public evaluations (Palmer et al., 2001; Snyder and Palmer, 2004; Pradhan et al., 2007), but they need large amounts of hand-tagged data, which is typically very expensive to build. Given the relatively small amount of training data available, current state-of-the-art systems beat the simple most frequent sense (MFS) baseline¹ by only a small margin.
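The MFS baseline is simple enough to state as code. What follows is a minimal sketch, assuming the sense-tagged training data is available as (word, sense) pairs; the function names and the sense labels in the example are hypothetical, and ties are broken in favour of the sense seen first.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_tokens):
    """Map each word to the sense it carries most often in the training data.

    tagged_tokens: iterable of (word, sense) pairs (a hypothetical format).
    Ties keep the sense seen first, because Counter.most_common orders
    equal counts by the order in which they were first encountered.
    """
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag_mfs(mfs, words, default=None):
    """Tag every test occurrence with its word's most frequent sense.

    Words never seen in training receive `default`.
    """
    return [mfs.get(word, default) for word in words]


if __name__ == "__main__":
    # Hypothetical sense labels, not real WordNet sense keys.
    train = [("bank", "bank_1"), ("bank", "bank_2"), ("bank", "bank_1"),
             ("plant", "plant_2")]
    mfs = train_mfs(train)
    print(tag_mfs(mfs, ["bank", "plant", "tree"]))  # ['bank_1', 'plant_2', None]
```

The baseline ignores context entirely, which is why beating it by only a small margin is a telling result for supervised systems.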
As an alternative to supervised systems, knowledge-based WSD systems exploit the information present in a lexical knowledge base (LKB) to perform WSD without using any further corpus evidence.

Traditional knowledge-based WSD systems assign a sense to an ambiguous word by comparing each of its senses with those of the surrounding context. Typically, some semantic similarity metric is used to calculate the relatedness among senses (Lesk, 1986; McCarthy et al., 2004). One of the major drawbacks of these approaches stems from the fact that senses are compared in a pairwise …

¹ This baseline consists of tagging all occurrences in the test data with the sense of the word that occurs most often in the training data.
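In contrast to pairwise comparison, the method in the abstract works over the full graph of the LKB. This excerpt does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions: "personalizing" PageRank is taken to mean concentrating the random-jump (teleport) mass on nodes for the context words, each word node links to its candidate senses, and the highest-ranked candidate sense is chosen. The toy graph, node names, damping factor of 0.85 and 50 iterations are all hypothetical, not WordNet data or the authors' settings.

```python
def personalized_pagerank(graph, personalization, damping=0.85, iterations=50):
    """PageRank by power iteration with a non-uniform teleport vector.

    graph: dict mapping every node to the list of nodes it links to.
    personalization: dict node -> non-negative teleport weight; must contain
    at least one positive weight. Unlisted nodes get weight 0.
    """
    nodes = list(graph)
    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Random jump: with probability 1 - damping, restart at a context node.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Follow a link: split n's rank evenly among its successors.
                share = damping * rank[n] / len(out)
                for m in out:
                    new_rank[m] += share
            else:
                # Dangling node: redistribute its rank via the teleport vector.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
        rank = new_rank
    return rank


def best_sense(rank, candidate_senses):
    """Pick the candidate sense with the highest PageRank score."""
    return max(candidate_senses, key=lambda s: rank[s])


# Hypothetical toy graph: "w:" nodes are words linked to their candidate
# senses; the other nodes stand in for synsets and the relations between them.
TOY_GRAPH = {
    "w:bank": ["bank_finance", "bank_river"],
    "w:deposit": ["deposit_money"],
    "w:river": ["river"],
    "bank_finance": ["deposit_money"],
    "deposit_money": ["bank_finance"],
    "bank_river": ["river"],
    "river": ["bank_river"],
}

if __name__ == "__main__":
    # Concentrate the teleport mass on the words of the context "bank deposit".
    ranks = personalized_pagerank(TOY_GRAPH, {"w:bank": 1.0, "w:deposit": 1.0})
    print(best_sense(ranks, ["bank_finance", "bank_river"]))  # bank_finance
```

With the context {"w:bank": 1.0, "w:river": 1.0} instead, the mass shifts toward bank_river, so the same graph yields a different sense depending only on which context nodes receive the teleport mass.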
