tailieunhanh - Scientific report: "Learning 5000 Relational Extractors"

Raphael Hoffmann, Congle Zhang, Daniel S. Weld
Computer Science & Engineering, University of Washington, Seattle, WA-98195, USA
{raphaelh, clzhang, weld}@

Abstract

Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn't scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, relation-specific IE system which learns 5025 relations, more than an order of magnitude greater than any previous approach, with an average F1 score of 61%. Crucial to LUCHS's performance is an automated system for dynamic lexicon learning, which allows it to learn accurately from heuristically-generated training data, which is often noisy and sparse.

1 Introduction

Information extraction (IE), the process of generating relational data from natural-language text, has gained popularity for its potential applications in Web search, question answering, and other tasks. Two main approaches have been attempted: supervised learning of relation-specific extractors (e.g., Freitag, 1998) and "Open IE", the self-supervised learning of unlexicalized, relation-independent extractors (e.g., Textrunner; Banko et al., 2007). Unfortunately, both methods have problems.
Supervised approaches require manually-labeled training data for each relation and hence can't scale to handle the thousands of relations encoded in Web text. Open extraction is more scalable but has lower precision and recall. Furthermore, open extraction doesn't canonicalize relations, so any application using the output must deal with homonymy and synonymy. A third approach, sometimes referred to as weak supervision, is to heuristically match values from a database to text, thus generating a set of training data for self-supervised learning of relation-specific ...
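
The heuristic matching behind weak supervision can be illustrated with a minimal Python sketch. This is not the LUCHS implementation: the function `label_sentences`, the record fields, and the example sentences are all hypothetical, and plain case-insensitive substring matching stands in for whatever matching heuristics a real system would use.

```python
from typing import Dict, List, Tuple

# Hypothetical helper, not from the paper: labels text by exact value matching.
def label_sentences(
    record: Dict[str, str],
    sentences: List[str],
) -> List[Tuple[str, str, int, int]]:
    """Return (sentence, relation, start, end) for each database value found in a sentence."""
    examples = []
    for sentence in sentences:
        lowered = sentence.lower()
        for relation, value in record.items():
            # Case-insensitive substring match; only the first occurrence is used.
            start = lowered.find(value.lower())
            if start != -1:
                examples.append((sentence, relation, start, start + len(value)))
    return examples

# Assumed database record (relation name -> value) and assumed source text.
record = {"birth_place": "Seattle", "occupation": "professor"}
sentences = [
    "She was born in Seattle in 1970.",
    "Later she became a professor of linguistics.",
    "Her family moved to Seattle after the war.",
]

examples = label_sentences(record, sentences)
for sentence, relation, start, end in examples:
    print(relation, repr(sentence[start:end]), "in:", sentence)
```

The third sentence is labeled `birth_place` even though it does not express that relation, which shows in miniature why heuristically generated training data tends to be noisy.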
