Scientific report: "An API for Measuring the Relatedness of Words in Wikipedia"

Simone Paolo Ponzetto and Michael Strube
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg, Germany
http nlp

Abstract

We present an API for computing the semantic relatedness of words in Wikipedia.

1 Introduction

Recent years have seen a large amount of work in Natural Language Processing (NLP) using measures of semantic similarity and relatedness. We believe that the extensive use of such measures also derives from the availability of robust, freely available software for computing them (Pedersen et al., 2004, WordNet::Similarity). In Ponzetto & Strube (2006) and Strube & Ponzetto (2006) we proposed to take the Wikipedia categorization system as a semantic network, which served as the basis for computing the semantic relatedness of words. In the following we present the API we used in our previous work, hoping that it will encourage further research in NLP using Wikipedia.

2 Measures of Semantic Relatedness

Approaches to measuring semantic relatedness that use lexical resources transform these resources into a network or graph and compute relatedness using paths in it (see Budanitsky & Hirst (2006) for an extensive review). For instance, Rada et al. (1989) traverse MeSH, a term hierarchy for indexing articles in Medline, and compute semantic relatedness straightforwardly in terms of the number of edges between terms in the hierarchy. Jarmasz & Szpakowicz (2003) use the same approach with Roget's Thesaurus, while Hirst & St-Onge (1998) apply a similar strategy to WordNet.

3 The Application Programming Interface

The API computes semantic relatedness by:

1. taking a pair of words as input;
2. retrieving the Wikipedia articles they refer to (via a disambiguation strategy based on the link structure of the articles);
3. computing paths in the Wikipedia categorization graph between the categories the articles are assigned to;
4. returning as output the set of paths found, scored according to some measure definition.

The implementation includes path-length (Rada et al., 1989; Wu & Palmer, 1994; Leacock & Chodorow, 1998), information-content (Resnik, 1995; Seco et al., 2004) and text-overlap (Lesk, 1986; Banerjee & Pedersen, 2003) measures.
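
The four steps listed above can be illustrated with a minimal Python sketch. It is not the authors' API: the toy word-to-article table, the category assignments, the category edges and all function names are hypothetical, the link-based disambiguation of step 2 is replaced by a plain dictionary lookup, and the score is a simple edge count in the spirit of the path-length measure of Rada et al. (1989).

```python
from collections import deque
from itertools import product

# Hypothetical toy data standing in for Wikipedia (not taken from the paper).
ARTICLE_FOR_WORD = {"car": "Automobile", "bicycle": "Bicycle"}
ARTICLE_CATEGORIES = {
    "Automobile": ["Motor vehicles"],
    "Bicycle": ["Human-powered vehicles", "Cycling"],
}
CATEGORY_EDGES = [
    ("Motor vehicles", "Vehicles"),
    ("Human-powered vehicles", "Vehicles"),
    ("Vehicles", "Transport"),
    ("Cycling", "Sports"),
    ("Sports", "Recreation"),
]


def build_graph(edges):
    """Turn a list of category links into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of categories or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in seen:
                continue
            if neighbour == goal:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None


def relatedness_paths(word1, word2):
    """Steps 1-4: words -> articles -> category paths -> scored paths."""
    # Step 2 (simplified): dictionary lookup instead of disambiguation.
    article1 = ARTICLE_FOR_WORD.get(word1)
    article2 = ARTICLE_FOR_WORD.get(word2)
    if article1 is None or article2 is None:
        return []
    graph = build_graph(CATEGORY_EDGES)
    scored = []
    # Step 3: one shortest path per pair of categories, if connected.
    for c1, c2 in product(ARTICLE_CATEGORIES[article1],
                          ARTICLE_CATEGORIES[article2]):
        path = shortest_path(graph, c1, c2)
        if path is not None:
            # Step 4: score each path by its number of edges.
            scored.append((path, len(path) - 1))
    return sorted(scored, key=lambda item: item[1])


for path, edges in relatedness_paths("car", "bicycle"):
    print(" -> ".join(path), "| edges:", edges)
```

In this toy graph only one pair of categories is connected, so a single path is returned. A fuller version would swap the edge count for any of the other measure families named in the text.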
