Scientific report: "Entity Set Expansion using Topic information"
Entity Set Expansion using Topic information

Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura and Genichiro Kikui
NTT Cyber Space Laboratories, NTT Corporation
1-1 Hikarinooka, Yokosuka-shi, Kanagawa, 239-0847, Japan
{sadamitsu.kugatsu, saito.kuniko, imamura.kenji}@lab.ntt.co.jp
kikui@cse.oka-pu.ac.jp

Abstract

This paper proposes three modules based on latent topics of documents for alleviating “semantic drift” in bootstrapping entity set expansion. These new modules are added to a discriminative bootstrapping algorithm to realize topic feature generation, negative example selection and entity candidate pruning. In this study, we model latent topics with LDA (Latent Dirichlet Allocation) in an unsupervised way. Experiments show that the accuracy of the extracted entities is improved by 6.7 to 28.2% depending on the domain.

1 Introduction

The task of this paper is entity set expansion, in which the lexicons are expanded from just a few seed entities (Pantel et al., 2009). For example, the user inputs a few words such as “Apple”, “Google” and “IBM”, and the system outputs “Microsoft”, “Facebook” and “Intel”. Many set expansion algorithms are based on bootstrapping algorithms, which iteratively acquire new entities. These algorithms suffer from the general problem of “semantic drift”. Semantic drift moves the extraction criteria away from the initial criteria demanded by the user and so reduces the accuracy of extraction.
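To make the idea of iterative acquisition concrete, the following is a minimal sketch of a generic pattern-based bootstrapping loop. It is not the discriminative algorithm proposed in this paper: the function names (`contexts`, `bootstrap`), the one-word context window, the frequency-based ranking and the toy corpus are all assumptions made purely for illustration.

```python
from collections import Counter

def contexts(tokens, i, window=1):
    """Return the (left, right) neighbour words of position i as a context pattern."""
    left = tuple(tokens[max(0, i - window):i])
    right = tuple(tokens[i + 1:i + 1 + window])
    return (left, right)

def bootstrap(corpus, seeds, iterations=2, top_patterns=2, top_entities=2):
    """Toy bootstrapping (illustrative only): entities -> patterns -> new entities."""
    entities = set(seeds)
    for _ in range(iterations):
        # 1. Collect the context patterns that surround the known entities.
        pattern_counts = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok in entities:
                    pattern_counts[contexts(sent, i)] += 1
        patterns = {p for p, _ in pattern_counts.most_common(top_patterns)}
        # 2. Count unknown words that occur inside those patterns.
        candidate_counts = Counter()
        for sent in corpus:
            for i, tok in enumerate(sent):
                if tok not in entities and contexts(sent, i) in patterns:
                    candidate_counts[tok] += 1
        # 3. Promote the most frequent candidates to entities.
        new_entities = [c for c, _ in candidate_counts.most_common(top_entities)]
        if not new_entities:
            break
        entities.update(new_entities)
    return entities

# Hypothetical toy corpus, already tokenized.
corpus = [
    "shares of Apple rose".split(),
    "shares of Google rose".split(),
    "shares of IBM rose".split(),
    "shares of Microsoft rose".split(),
    "shares of Intel rose".split(),
    "I ate an apple today".split(),
]
expanded = bootstrap(corpus, {"Apple", "Google", "IBM"})
print(sorted(expanded))
```

In a loop of this kind, every accepted entity contributes new patterns in the next iteration, so a single wrong acceptance can pull in further unrelated words. That feedback effect is the semantic drift described above.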
Pantel and Pennacchiotti (2006) proposed Espresso, a relation extraction method based on the co-training bootstrapping algorithm with entities and attributes. Espresso alleviates semantic drift by a sophisticated scoring system based on pointwise mutual information (PMI). Thelen and Riloff (2002), Ghahramani and Heller (2005), and Sarmento et al. (2007) also proposed original score functions with the goal of reducing semantic drift. Our purpose is also to reduce semantic drift. To achieve this goal, we use a discriminative method instead of a scoring .

Author footnote: Presently with Okayama Prefectural University.
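For readers unfamiliar with pointwise mutual information, the sketch below computes plain PMI, PMI(x, y) = log( p(x, y) / (p(x) p(y)) ), from co-occurrence counts of (entity, pattern) pairs. It shows only the basic quantity and is not Espresso's full scoring system. The function name `pmi_table` and the toy pairs are assumptions made for illustration.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Compute PMI(x, y) = log(p(x, y) / (p(x) * p(y))) for every observed (x, y) pair."""
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    total = len(pairs)
    return {
        (x, y): math.log((n / total) / ((x_counts[x] / total) * (y_counts[y] / total)))
        for (x, y), n in pair_counts.items()
    }

# Hypothetical (entity, pattern) observations.
pairs = [
    ("Apple", "shares of X"),
    ("Apple", "shares of X"),
    ("banana", "ate a X"),
    ("banana", "ate a X"),
]
scores = pmi_table(pairs)
print(scores[("Apple", "shares of X")])
```

A positive PMI means the entity and the pattern co-occur more often than they would if they were independent. A value near zero means the pattern carries little information about the entity.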