Scientific report: "Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition"

Partha Pratim Talukdar
Search Labs, Microsoft Research
Mountain View, CA 94043
partha@

Fernando Pereira
Google Inc.
Mountain View, CA 94043
pereira@

Abstract

Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.

1 Introduction

Traditionally, named-entity recognition (NER) has focused on a small number of broad classes such as person, location, and organization. However, those classes are too coarse to support important applications such as sense disambiguation, semantic matching, and textual inference in Web search. For those tasks, we need a much larger inventory of specific classes and accurate classification of terms into those classes.

While supervised learning methods perform well for traditional NER, they are impractical for fine-grained classification because sufficient labeled data to train classifiers for all the classes is unavailable and would be very expensive to obtain.

Research carried out while at the University of Pennsylvania, Philadelphia, PA, USA.

To overcome these difficulties, seed-based information extraction methods have been developed over the years (Hearst, 1992; Riloff and Jones, 1999; Etzioni et al., 2005; Talukdar et al., 2006; Van Durme and
