Scientific report: "Open Information Extraction using Wikipedia"

Fei Wu, University of Washington, Seattle, WA, USA, wufei@
Daniel S. Weld, University of Washington, Seattle, WA, USA, weld@

Abstract

Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and recall. The key to WOE's performance is a novel form of self-supervised learning for open extractors, using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data. Like TextRunner, WOE's extractor eschews lexicalized features and handles an unbounded set of semantic relations. WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features, its precision and recall rise even higher.

1 Introduction

The problem of information-extraction (IE), generating relational data from natural-language text, has received increasing attention in recent years.
A large, high-quality repository of extracted tuples can potentially benefit a wide range of NLP tasks such as question answering, ontology learning, and summarization. The vast majority of IE work uses supervised learning of relation-specific examples. For example, the WebKB project (Craven et al., 1998) used labeled examples of the courses-taught-by relation to induce rules for identifying additional instances of the relation. While these methods can achieve high precision and recall, they are limited by the availability of training data and are unlikely to scale to the thousands of relations found in text on the Web. An .
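
The abstract describes building training data by heuristically matching Wikipedia infobox attribute values against sentences in the corresponding article. The excerpt does not spell out the matching heuristics, so the following is only a minimal sketch under simplifying assumptions: a sentence is treated as a match when it contains both the article subject and an attribute value as a case-insensitive substring. The function names (`split_sentences`, `match_infobox`), the naive sentence splitter, and the toy article and infobox are all hypothetical illustrations, not WOE's actual matcher.

```python
import re
from typing import Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def match_infobox(
    subject: str, infobox: Dict[str, str], article: str
) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, subject, attribute, value) for each heuristic match.

    Assumption for this sketch: a sentence matches when it mentions both the
    article subject and the attribute value (case-insensitive substring test).
    """
    examples = []
    for sentence in split_sentences(article):
        lowered = sentence.lower()
        if subject.lower() not in lowered:
            continue  # sentence never names the subject, so skip it
        for attribute, value in infobox.items():
            if value.lower() in lowered:
                examples.append((sentence, subject, attribute, value))
    return examples


# Toy data, invented purely for illustration.
article = (
    "Ada Lovelace was born in London. "
    "She worked with Charles Babbage. "
    "Ada Lovelace is known for her notes on the Analytical Engine."
)
infobox = {"birth_place": "London", "known_for": "Analytical Engine"}

examples = match_infobox("Ada Lovelace", infobox, article)
for sentence, subj, attr, value in examples:
    print(f"{attr}: ({subj}, {value}) <- {sentence}")
```

Each matched sentence, paired with its (subject, value) arguments, could then serve as a positive training example for an extractor without any hand labeling. Note that the second toy sentence is skipped because it refers to the subject only by pronoun, which hints at why a real matcher would likely need something more robust than exact string tests.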
