tailieunhanh - Scientific report: "A Method for Word Sense Disambiguation of Unrestricted Text"

Rada Mihalcea and Dan I. Moldovan
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas, 75275-0122
rada moldovan @

Abstract

Selecting the most appropriate sense for an ambiguous word in a sentence is a central problem in Natural Language Processing. In this paper, we present a method that attempts to disambiguate all the nouns, verbs, adverbs and adjectives in a text, using the senses provided in WordNet. The senses are ranked using two sources of information: (1) the Internet, for gathering statistics for word-word co-occurrences, and (2) WordNet, for measuring the semantic density for a pair of words. We report an average accuracy of 80% for the first ranked sense, and 91% for the first two ranked senses. Extensions of this method for larger windows of more than two words are considered.

1 Introduction

Word Sense Disambiguation (WSD) is an open problem in Natural Language Processing. Its solution impacts other tasks such as discourse, reference resolution, coherence, inference and others. WSD methods can be broadly classified into three types:

1. WSD methods that make use of the information provided by machine-readable dictionaries (Cowie et al., 1992; Miller et al., 1994; Agirre and Rigau, 1995; Li et al., 1995; McRoy, 1992);
2. WSD methods that use information gathered from training on a corpus that has already been semantically disambiguated (supervised training methods) (Gale et al., 1992; Ng and Lee, 1996);
3. WSD methods that use information gathered from raw corpora (unsupervised training methods) (Yarowsky, 1995; Resnik, 1997).
There are also hybrid methods that combine several sources of knowledge, such as lexicon information, heuristics, collocations and others (McRoy, 1992; Bruce and Wiebe, 1994; Ng and Lee, 1996; Rigau et al., 1997). Statistical methods produce high-accuracy results for a small number of preselected words. A lack of widely available semantically tagged corpora almost excludes supervised learning methods. A .
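
As a rough illustration of the first information source named in the abstract (word-word co-occurrence statistics), the following Python sketch ranks the senses of a word by how often their synonyms co-occur with a context word. It is not the authors' algorithm. The sense inventory, the counts, and the names SENSES, TOY_COUNTS, toy_count and rank_senses are all hypothetical, and a small in-memory table stands in for both Internet statistics and WordNet.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sense inventory: each sense is a list of synonyms.
# In the setting described in the abstract these would come from WordNet.
SENSES: Dict[str, List[List[str]]] = {
    "bank": [
        ["depository", "lender"],      # sense 0: financial institution
        ["riverside", "embankment"],   # sense 1: sloping land by a river
    ],
}

# Hypothetical co-occurrence counts, standing in for statistics
# gathered from the Internet. The numbers are invented.
TOY_COUNTS: Dict[Tuple[str, str], int] = {
    ("depository", "loan"): 40,
    ("lender", "loan"): 120,
    ("riverside", "loan"): 3,
}


def toy_count(a: str, b: str) -> int:
    """Return the toy co-occurrence count for the pair (a, b), or 0 if unseen."""
    return TOY_COUNTS.get((a, b), 0)


def rank_senses(
    word: str,
    context: str,
    count: Callable[[str, str], int] = toy_count,
) -> List[Tuple[int, int]]:
    """Rank the senses of `word` by summed co-occurrence with `context`.

    Returns (sense_index, score) pairs, highest score first.
    Ties keep their original sense order because sorted() is stable.
    """
    scored = [
        (index, sum(count(synonym, context) for synonym in synonyms))
        for index, synonyms in enumerate(SENSES[word])
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank_senses("bank", "loan"))
```

Taking the first entry of the returned list, or the first two, loosely mirrors the "first ranked sense" and "first two ranked senses" evaluation settings mentioned in the abstract. The second information source, semantic density measured in WordNet, is not modeled here.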
