Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance

Roberto Navigli
Dipartimento di Informatica
Università di Roma La Sapienza
Roma, Italy
navigli@

Abstract

Fine-grained sense distinctions are one of the major obstacles to successful Word Sense Disambiguation. In this paper, we present a method for reducing the granularity of the WordNet sense inventory based on a mapping to a manually crafted dictionary that encodes sense hierarchies, namely the Oxford Dictionary of English. We assess the quality of the mapping and of the induced clustering, and evaluate the performance of coarse-grained WSD systems on the Senseval-3 English all-words task.

1 Introduction

Word Sense Disambiguation (WSD) is undoubtedly one of the hardest tasks in the field of Natural Language Processing. Even though some recent studies report benefits from using WSD in specific applications (e.g., Vickrey et al., 2005; Stokoe, 2005), the present performance of the best-ranking WSD systems is not accurate enough to enable real-world language-aware applications. Most disambiguation approaches adopt the WordNet dictionary (Fellbaum, 1998) as their sense inventory, thanks to its free availability, its wide coverage, and the existence of a number of standard test sets based on it. Unfortunately, WordNet is a fine-grained resource, encoding sense distinctions that are often difficult to recognize even for human annotators (Edmonds and Kilgarriff, 1998).
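The abstract's central idea is that mapping fine-grained WordNet senses onto coarser dictionary senses induces a clustering, and that systems can then be scored at the coarse level. The sketch below illustrates this idea under stated assumptions and is not the paper's mapping procedure. The mapping is taken as given rather than computed, all sense identifiers (`bank#1`, `BANK_FINANCE`, ...) are hypothetical placeholders, and the scoring rule is assumed: an answer counts as correct when it falls in the same cluster as the gold sense, and a sense with no mapping forms its own singleton cluster.

```python
from collections import defaultdict


def induce_clusters(mapping: dict[str, str]) -> dict[str, set[str]]:
    """Group fine-grained senses that map to the same coarse sense into one cluster."""
    clusters: defaultdict[str, set[str]] = defaultdict(set)
    for fine_sense, coarse_sense in mapping.items():
        clusters[coarse_sense].add(fine_sense)
    return dict(clusters)


def cluster_of(sense: str, mapping: dict[str, str]) -> tuple[str, str]:
    """Return a cluster key for a sense; unmapped senses form their own singleton cluster.

    The tag in the tuple keeps coarse IDs and fine IDs from colliding.
    """
    if sense in mapping:
        return ("coarse", mapping[sense])
    return ("fine", sense)


def coarse_accuracy(answers: dict[str, str], gold: dict[str, str],
                    mapping: dict[str, str]) -> float:
    """Share of gold instances whose predicted sense lies in the gold sense's cluster.

    Instances with no system answer count as wrong (assumed scoring rule).
    """
    if not gold:
        return 0.0
    correct = 0
    for instance, gold_sense in gold.items():
        predicted = answers.get(instance)
        if predicted is not None and cluster_of(predicted, mapping) == cluster_of(gold_sense, mapping):
            correct += 1
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical toy data: two of three fine-grained senses are merged into one coarse sense.
    mapping = {"bank#1": "BANK_RIVER", "bank#2": "BANK_FINANCE", "bank#3": "BANK_FINANCE"}
    gold = {"d1": "bank#2", "d2": "bank#1", "d3": "bank#4"}
    answers = {"d1": "bank#3", "d2": "bank#2", "d3": "bank#4"}
    print(induce_clusters(mapping))
    print(coarse_accuracy(answers, gold, mapping))
```

In this toy run, the answer for `d1` is wrong at the fine-grained level but correct at the coarse level, because `bank#2` and `bank#3` share a cluster. This is the kind of near-miss distinction that a coarser inventory is meant to forgive.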
Recent estimates of inter-annotator agreement with the WordNet inventory report agreement figures for the preparation of the English all-words test set at Senseval-3 (Snyder and Palmer, 2004) and for the Open Mind Word Expert annotation exercise (Chklovski and Mihalcea, 2002). These numbers lead us to believe that a credible upper bound for unrestricted fine-grained WSD is around 70%, a figure that state-of-the-art automatic systems find difficult to outperform. Furthermore, even if a system were able …