Scientific report: "Learning to Extract Relations from the Web using Minimal Supervision"

Razvan C. Bunescu
Department of Computer Sciences, University of Texas at Austin
1 University Station C0500, Austin, TX 78712
razvan@

Raymond J. Mooney
Department of Computer Sciences, University of Texas at Austin
1 University Station C0500, Austin, TX 78712
mooney@

Abstract

We present a new approach to relation extraction that requires only a handful of training examples. Given a few pairs of named entities known to exhibit or not exhibit a particular relation, bags of sentences containing the pairs are extracted from the web. We extend an existing relation extraction method to handle this weaker form of supervision, and present experimental results demonstrating that our approach can reliably extract relations from web documents.

1 Introduction

A growing body of recent work in information extraction has addressed the problem of relation extraction (RE), identifying relationships between entities stated in text, such as LivesIn(Person, Location) or EmployedBy(Person, Company). Supervised learning has been shown to be effective for RE (Zelenko et al., 2003; Culotta and Sorensen, 2004; Bunescu and Mooney, 2006); however, annotating large corpora with examples of the relations to be extracted is expensive and tedious. In this paper, we introduce a supervised learning approach to RE that requires only a handful of training examples and uses the web as a corpus.
Given a few pairs of well-known entities that clearly exhibit or do not exhibit a particular relation, such as CorpAcquired(Google, YouTube) and not CorpAcquired(Yahoo, Microsoft), a search engine is used to find sentences on the web that mention both of the entities in each of the pairs. Although not all of the sentences for positive pairs will state the desired relationship, many of them will. Presumably, none of the sentences for negative pairs state the targeted relation. Multiple instance learning (MIL) is a machine learning framework that ...
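The bag-construction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Bag` class, the `build_bags` function, and the toy sentences are hypothetical, and a small in-memory list of invented sentences stands in for the search-engine results. Each entity pair yields one bag holding every sentence that mentions both entities, and the label is attached to the bag as a whole rather than to individual sentences.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Bag:
    """Sentences mentioning both entities of one pair, with a bag-level label."""
    e1: str
    e2: str
    label: int  # +1: the pair exhibits the relation, -1: it does not
    sentences: List[str]


def build_bags(pairs: Iterable[Tuple[str, str, int]],
               corpus: Iterable[str]) -> List[Bag]:
    """Group corpus sentences into one bag per entity pair.

    Assumption: plain substring matching stands in for a web search
    that returns sentences containing both entity names.
    """
    sentences = list(corpus)
    bags = []
    for e1, e2, label in pairs:
        matches = [s for s in sentences if e1 in s and e2 in s]
        bags.append(Bag(e1, e2, label, matches))
    return bags


# Invented toy sentences, used only for illustration.
corpus = [
    "Google has acquired YouTube in a stock deal.",
    "Google and YouTube both host video content.",
    "Yahoo and Microsoft compete in web search.",
    "Microsoft released a new operating system.",
]
pairs = [("Google", "YouTube", +1), ("Yahoo", "Microsoft", -1)]

bags = build_bags(pairs, corpus)
for bag in bags:
    print(bag.e1, bag.e2, bag.label, len(bag.sentences))
```

In this toy data the positive bag contains one sentence that states the acquisition and one that merely mentions both companies, which mirrors the point that only some sentences in a positive bag express the target relation.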