Scientific report: "UWN: A Large Multilingual Lexical Knowledge Base"

We present UWN, a large multilingual lexical knowledge base that describes the meanings and relationships of words in over 200 languages. This paper explains how link prediction, information integration, and taxonomy induction methods have been used to build UWN based on WordNet and extend it with millions of named entities from Wikipedia. We additionally introduce extensions to cover lexical relationships, frame-semantic knowledge, and language data. An online interface provides human access to the data, while a software API enables applications to look up over 16 million words and names.

UWN: A Large Multilingual Lexical Knowledge Base
Gerard de Melo, ICSI Berkeley, demelo@
Gerhard Weikum, Max Planck Institute for Informatics, weikum@

1 Introduction

Semantic knowledge about words and named entities is a fundamental building block both in various forms of language technology and in end-user applications. Examples of the latter include word processor thesauri, online dictionaries, question answering, and mobile services. Finding semantically related words is vital for query expansion in information retrieval (Gong et al., 2005), database schema matching (Madhavan et al., 2001), sentiment analysis (Godbole et al., 2007), and ontology mapping (Jean-Mary and Kabuka, 2008).
Further uses of lexical knowledge include data cleaning (Kedad and Métais, 2002), visual object recognition (Marszalek and Schmid, 2007), and biomedical data analysis (Rubin et al., 2006). Many of these applications have used English-language resources like WordNet (Fellbaum, 1998). However, a more multilingual resource equipped with an easy-to-use API would not only enable us to perform all of the aforementioned tasks in additional languages, but also to explore cross-lingual applications like cross-lingual IR (Etzioni et al., 2007) and machine translation (Chatterjee et al., 2005). This paper describes a new API that makes lexical knowledge about millions of items in over 200 languages available to applications.
