tailieunhanh - Báo cáo khoa học: "Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study"

Among the various approaches to WSD, the supervised learning approach is the most successful to date. In this approach, we first collect a corpus in which each occurrence of an ambiguous word w has been manually annotated with the correct sense, according to some existing sense inventory in a dictionary. This annotated corpus then serves as the training material for a learning algorithm. After training, a model is automatically learned and it is used to assign the correct sense to any previously unseen occurrence of w in a new context. . | Exploiting Parallel Texts for Word Sense Disambiguation An Empirical Study Hwee Tou Ng Bin Wang Yee Seng Chan Department of Computer Science National University of Singapore 3 Science Drive 2 Singapore 117543 nght wangbin chanys @ Abstract A central problem of word sense disambiguation WSD is the lack of manually sense-tagged data required for supervised learning. In this paper we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method of acquiring sense-tagged data is promising. On a subset of the most difficult SENSEVAL-2 nouns the accuracy difference between the two approaches is only and the difference could narrow further to if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of the issue of domain dependence in evaluating WSD programs. 1 Introduction The task of word sense disambiguation WSD is to determine the correct meaning or sense of a word in context. It is a fundamental problem in natural language processing NLP and the ability to disambiguate word sense accurately is important for applications like machine translation information retrieval etc. Corpus-based supervised machine learning methods have been used to tackle the WSD task just like the other NLP tasks. Among the various approaches to WSD the supervised learning approach is the most successful to date. In this approach we first collect a corpus in which each occurrence of an ambiguous word w has been manually annotated with the correct sense according to some existing sense inventory in a dictionary. This annotated corpus then serves as the training material for a learning algorithm. After training a model is automatically learned and it is used to assign the correct sense to any previously unseen