Scientific report: "Unsupervised Relation Disambiguation Using Spectral Clustering"

Jinxiu Chen (1), Donghong Ji (1), Chew Lim Tan (2), Zhengyu Niu (1)
(1) Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613 Singapore
(2) Department of Computer Science, National University of Singapore, 117543 Singapore
{jinxiu, dhji, zniu}@  tancl@

Abstract

This paper presents an unsupervised learning approach to disambiguating relations between named entities using lexical and syntactic features from their contexts. It works by calculating eigenvectors of an adjacency graph's Laplacian to recover a submanifold of the data from a high-dimensional space, and then performing cluster number estimation on the eigenvectors. Experimental results on the ACE corpora show that this spectral-clustering-based approach outperforms the other clustering methods.

1 Introduction

In this paper we address the task of relation extraction, which is to find relationships between named entities in a given context. Many methods have been proposed for this task, including supervised learning algorithms (Miller et al., 2000; Zelenko et al., 2002; Culotta and Sorensen, 2004; Kambhatla, 2004; Zhou et al., 2005), semi-supervised learning algorithms (Brin, 1998; Agichtein and Gravano, 2000; Zhang, 2004), and unsupervised learning algorithms (Hasegawa et al., 2004). Among these methods, supervised learning is usually preferred when a large amount of labeled training data is available.
However, manually tagging a large amount of training data is time-consuming and labor-intensive. Semi-supervised learning methods have been put forward to minimize the corpus annotation requirement. Most semi-supervised methods employ a bootstrapping framework, which only needs a few predefined initial seeds for a particular relation and then bootstraps from those seeds to acquire the relation. However, it is often quite difficult to enumerate all class labels in the initial seeds and to decide on an optimal number of them. Compared with .
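
The spectral step summarized in the abstract (eigenvectors of a graph Laplacian, followed by cluster number estimation) can be illustrated with a minimal sketch. This is not the authors' exact procedure. It assumes a Gaussian affinity over toy 2-D points standing in for the lexical and syntactic context feature vectors, uses the common eigengap heuristic as a stand-in for the cluster number estimation, and finishes with a small k-means on the row-normalized eigenvectors. All function names and parameters (`normalized_laplacian_eig`, `estimate_num_clusters`, `kmeans`, `spectral_cluster`, `sigma`, `max_k`) are hypothetical.

```python
import numpy as np


def normalized_laplacian_eig(points, sigma=1.0):
    """Eigen-decompose the normalized Laplacian of a Gaussian affinity graph."""
    x = np.asarray(points, dtype=float)
    # Pairwise squared Euclidean distances between all points.
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    # Assumed affinity: Gaussian kernel, with self-loops removed.
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    # L = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(x)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    return np.linalg.eigh(laplacian)


def estimate_num_clusters(eigenvalues, max_k=5):
    """Eigengap heuristic (an assumption here): k sits before the largest gap."""
    gaps = np.diff(eigenvalues[: max_k + 1])
    return int(np.argmax(gaps)) + 1


def kmeans(rows, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [rows[0]]
    for _ in range(1, k):
        dist = np.min([((rows - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(rows[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = rows[labels == j].mean(axis=0)
    return labels


def spectral_cluster(points, sigma=1.0, max_k=5):
    """Estimate k from the spectrum, then cluster the spectral embedding."""
    eigenvalues, eigenvectors = normalized_laplacian_eig(points, sigma)
    k = estimate_num_clusters(eigenvalues, max_k)
    # Embed each point using the k eigenvectors with the smallest eigenvalues.
    embedding = eigenvectors[:, :k]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    return k, kmeans(embedding, k)


# Toy usage: two well-separated groups of three points each.
toy_points = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
              (5.0, 5.0), (5.0, 5.1), (5.1, 5.0)]
num_clusters, toy_labels = spectral_cluster(toy_points)
print(num_clusters, toy_labels)
```

In this sketch the number of clusters is read off the spectrum rather than fixed in advance, which mirrors the motivation above: no seeds or predefined class labels are needed. The kernel width `sigma` and the cap `max_k` are illustrative choices that would need tuning on real context features.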
