Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld
Computer Science & Engineering
University of Washington
Seattle, WA 98195, USA
{raphaelh, clzhang, xiaoling, lsz, weld}@

Abstract

Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web's natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multi-instance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint; for example, they cannot extract the pair Founded(Jobs, Apple) and CEO-of(Jobs, Apple). This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

1 Introduction

Information extraction (IE), the process of generating relational data from natural-language text, continues to gain attention.
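The abstract's two ingredients, heuristic labeling from a knowledge base and corpus-level aggregation of sentence-level facts, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy knowledge base, the sentences, the naive substring entity matching, and the names `build_bags` and `aggregate` are all hypothetical. The aggregation is assumed to be a deterministic OR (a pair holds a relation if any of its sentences expresses it), which is one simple choice that lets a single entity pair carry overlapping relations. The learning step itself is omitted, and hand-written predictions stand in for a trained sentence-level extractor.

```python
from collections import defaultdict

# Hypothetical toy knowledge base of (relation, entity1, entity2) facts,
# mirroring the Founded/CEO-of example from the abstract.
KB = {
    ("Founded", "Jobs", "Apple"),
    ("CEO-of", "Jobs", "Apple"),
}


def build_bags(sentences, kb):
    """Heuristic labeling: group sentences into one bag per KB entity pair.

    Every sentence that mentions both entities joins the pair's bag, and the
    bag is labeled with ALL relations the KB asserts for that pair, so one
    pair may carry several (overlapping) relations. Entity matching is naive
    substring search, a simplifying assumption for this sketch.
    """
    pair_relations = defaultdict(set)
    for rel, e1, e2 in kb:
        pair_relations[(e1, e2)].add(rel)

    bags = {}
    for (e1, e2), rels in pair_relations.items():
        mentions = [s for s in sentences if e1 in s and e2 in s]
        if mentions:
            bags[(e1, e2)] = (mentions, set(rels))
    return bags


def aggregate(sentence_predictions):
    """Corpus-level aggregation as a deterministic OR (assumed here): a pair
    holds a relation if at least one of its sentences is predicted to express
    it. Different sentences may contribute different relations for one pair.
    """
    facts = defaultdict(set)
    for pair, rel in sentence_predictions:
        if rel is not None:  # None marks a sentence expressing no relation
            facts[pair].add(rel)
    return dict(facts)


if __name__ == "__main__":
    sentences = [
        "Jobs founded Apple in 1976.",
        "Jobs is the CEO of Apple.",
        "Jobs gave a talk at Apple.",  # noisy: labeled, yet expresses neither fact
        "Gates founded Microsoft.",    # mentions no KB pair, so it is ignored
    ]
    bags = build_bags(sentences, KB)
    mentions, labels = bags[("Jobs", "Apple")]
    print(len(mentions), sorted(labels))  # 3 ['CEO-of', 'Founded']

    # Hand-written stand-in for a trained sentence-level extractor.
    predictions = [
        (("Jobs", "Apple"), "Founded"),
        (("Jobs", "Apple"), "CEO-of"),
        (("Jobs", "Apple"), None),
    ]
    facts = aggregate(predictions)
    print({pair: sorted(rels) for pair, rels in facts.items()})
    # {('Jobs', 'Apple'): ['CEO-of', 'Founded']}
```

Because the bag label is a set rather than a single relation, Founded(Jobs, Apple) and CEO-of(Jobs, Apple) can both be recovered for the same pair, while the third sentence shows the kind of labeling noise that multi-instance learning has to tolerate.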
Many researchers dream of creating a large repository of high-quality extracted tuples, arguing that such a knowledge base could benefit many important tasks such as question answering and summarization. Most approaches to IE use supervised learning of relation-specific examples, which can achieve high precision and recall. Unfortunately, however, fully supervised methods are limited by the availability of training data and are unlikely to scale to the thousands of relations found on the Web. A more promising
