Reducing Wrong Labels in Distant Supervision for Relation Extraction

Shingo Takamatsu
System Technologies Laboratories, Sony Corporation
5-1-12 Kitashinagawa, Shinagawa-ku, Tokyo

Issei Sato and Hiroshi Nakagawa
Information Technology Center, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo

Abstract

In relation extraction, distant supervision seeks to extract relations between entities from text by using a knowledge base, such as Freebase, as a source of supervision. When a sentence and a knowledge base refer to the same entity pair, this approach heuristically labels the sentence with the corresponding relation in the knowledge base. However, this heuristic can fail, with the result that some sentences are labeled wrongly. This noisy labeled data causes poor extraction performance. In this paper, we propose a method to reduce the number of wrong labels. We present a novel generative model that directly models the heuristic labeling process of distant supervision. The model predicts whether assigned labels are correct or wrong via its hidden variables. Our experimental results show that this model detected wrong labels with higher performance than baseline methods. In the experiment, we also found that our wrong label reduction boosted the performance of relation extraction.

1 Introduction

Machine learning approaches have been developed to address relation extraction, which is the task of extracting semantic relations between entities expressed in text. Supervised approaches are limited in scalability because labeled data is expensive to produce. A particularly attractive approach, called distant supervision (DS), creates labeled data by heuristically aligning entities in text with those in a knowledge base, such as Freebase (Mintz et al., 2009).

[Figure: a knowledge base entry place_of_birth(Michael Jackson, Gary), given as (relation, entity 1, entity 2), is aligned by automatic labeling to text sentences such as "Michael Jackson was born in Gary." and "Michael Jackson moved from ..." (the remainder is truncated in this copy).]
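The heuristic labeling step that distant supervision relies on can be illustrated with a minimal sketch. This is not the authors' implementation and does not model their generative approach; it only shows how aligning an entity pair with a knowledge base entry assigns a relation label to every sentence mentioning that pair. The dictionary-based knowledge base, the `distant_label` function, and the second example sentence are hypothetical, and entity mentions are assumed to be already detected and linked.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge base: (entity1, entity2) -> set of relation names.
KnowledgeBase = Dict[Tuple[str, str], Set[str]]


def distant_label(
    sentences: List[Tuple[str, str, str]], kb: KnowledgeBase
) -> List[Tuple[str, str]]:
    """Label each sentence with every KB relation that holds for its entity pair.

    Each sentence is given as (text, entity1, entity2); entity mentions are
    assumed to be already detected and linked to knowledge base entities.
    """
    labeled = []
    for text, e1, e2 in sentences:
        # The DS heuristic: if the pair appears in the KB, assume the sentence
        # expresses each relation stored for that pair.
        for relation in sorted(kb.get((e1, e2), set())):
            labeled.append((text, relation))
    return labeled


kb: KnowledgeBase = {("Michael Jackson", "Gary"): {"place_of_birth"}}
sentences = [
    ("Michael Jackson was born in Gary.", "Michael Jackson", "Gary"),
    # Hypothetical sentence: mentions the same pair but does not express
    # place_of_birth, so the heuristic produces a wrong label here.
    ("Michael Jackson performed in Gary.", "Michael Jackson", "Gary"),
    # Pair not in the KB, so no label is assigned.
    ("Gary is a city in Indiana.", "Gary", "Indiana"),
]

for text, relation in distant_label(sentences, kb):
    print(f"{relation}: {text}")
```

Both sentences that mention the pair receive place_of_birth, although only the first one actually expresses it. Labels like the second one are the wrong labels that the proposed method aims to detect and remove.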
