Scientific report: "Distant supervision for relation extraction without labeled data"
Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky
Stanford University, Stanford, CA 94305
mikemintz sbills rion jurafsky @

Abstract
Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of [...]. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.
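The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy knowledge base `KB`, the example sentences, and the helpers `words_between` and `build_training_data` are all hypothetical, exact string matching stands in for real entity recognition, and the words between the two mentions stand in for the much richer lexical and syntactic features the abstract refers to. Training the classifier itself is left out.

```python
from collections import defaultdict

# Hypothetical toy knowledge base standing in for Freebase:
# (entity1, entity2) -> relation name.
KB = {
    ("Steven Spielberg", "Saving Private Ryan"): "film/director",
    ("Barack Obama", "Honolulu"): "person/place_of_birth",
}

# Hypothetical unlabeled corpus, one pre-tokenized sentence per string.
SENTENCES = [
    "Steven Spielberg directed Saving Private Ryan in 1998 .",
    "Saving Private Ryan , a film by Steven Spielberg , won five Oscars .",
    "Barack Obama was born in Honolulu .",
    "Honolulu is the capital of Hawaii .",
]


def words_between(sentence, e1, e2):
    """Return (mention order, tokens between the mentions), or None.

    None means at least one of the two entities is missing from the sentence.
    Exact substring matching is an assumption made to keep the sketch short.
    """
    i, j = sentence.find(e1), sentence.find(e2)
    if i < 0 or j < 0:
        return None
    if i < j:
        return "e1_first", tuple(sentence[i + len(e1):j].split())
    return "e2_first", tuple(sentence[j + len(e2):i].split())


def build_training_data(kb, sentences):
    """Pool features from every sentence that mentions a known entity pair.

    Each sentence containing both entities is treated as a (noisy) positive
    example of the relation the knowledge base records for that pair.
    """
    examples = defaultdict(list)
    for (e1, e2), relation in kb.items():
        for sentence in sentences:
            feature = words_between(sentence, e1, e2)
            if feature is not None:
                examples[(e1, e2, relation)].append(feature)
    return dict(examples)


training_data = build_training_data(KB, SENTENCES)
for key, features in training_data.items():
    print(key, features)
```

No sentence is labeled by hand here: the fourth sentence is skipped because it mentions only one entity of a known pair, and the two Spielberg sentences contribute to a single pooled example for that pair. The pooled features per entity pair would then be the input to a relation classifier.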
1 Introduction
At least three learning paradigms have been applied to the task of extracting relational facts from text (for example, learning that a person is employed by a particular organization, or that a geographic entity is located in a particular region). In supervised approaches, sentences in a corpus are first hand-labeled for the presence of entities and the relations between them. The NIST Automatic Content Extraction (ACE) RDC 2003 and 2004 corpora, for example, include over 1,000 documents in which pairs of entities have been labeled with 5 to 7 major relation types and 23 to 24 subrelations, totaling 16,771 relation instances. ACE systems then extract a wide variety ...