Scientific report: "Heterogeneous Transfer Learning for Image Clustering via the Social Web"
Heterogeneous Transfer Learning for Image Clustering via the Social Web

Qiang Yang
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
qyang@

Yuqiang Chen, Gui-Rong Xue, Wenyuan Dai, Yong Yu
Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
{yuqiangchen, grxue, dwyak, yyu}@

Abstract

In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data can be in different feature spaces and where no correspondence between data instances in these spaces is provided. In the past, we have classified Chinese text documents using English training data under the heterogeneous transfer learning framework. In this paper, we present image clustering as an example to illustrate how unsupervised learning can be improved by transferring knowledge from auxiliary heterogeneous data obtained from the social Web. Image clustering is useful for image sense disambiguation in query-based image search, but its quality is often low due to the image data sparsity problem. We extend PLSA to help transfer the knowledge from social Web data, which have mixed feature representations. Experiments on image-object clustering and scene clustering tasks show that our approach in heterogeneous transfer learning based on the auxiliary data is indeed effective and promising.
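As background for the PLSA extension mentioned in the abstract, the following is a minimal sketch of standard PLSA fitted by EM and used for clustering. It is not the authors' extended model and uses no auxiliary social Web data. The function names `plsa` and `cluster`, the toy count matrix, and the choice of assigning each item to its most probable latent topic are illustrative assumptions.

```python
import random


def plsa(counts, n_topics, n_iters=200, seed=0):
    """Fit standard PLSA by EM (illustrative sketch, not the paper's extension).

    counts[d][w] is how often feature w occurs in item d.
    Returns (p_z_d, p_w_z): P(z|d) per item and P(w|z) per topic.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total <= 0:
            return [1.0 / len(row)] * len(row)  # fall back to uniform
        return [v / total for v in row]

    # Random initialisation of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        # M-step: renormalise expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def cluster(p_z_d):
    """Assign each item to its most probable latent topic (an assumed rule)."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


# Hypothetical toy data: items 0-1 use features 0-1, items 2-3 use features 2-3.
toy_counts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
p_z_d, p_w_z = plsa(toy_counts, n_topics=2)
labels = cluster(p_z_d)
```

On this toy matrix the two feature blocks end up in separate clusters; which cluster receives which label id depends on the random initialisation. With real image features the count matrix is typically much sparser, which is the setting the abstract identifies as the motivation for transferring knowledge from auxiliary data.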
1 Introduction

Traditional machine learning relies on the availability of a large amount of data to train a model, which is then applied to test data in the same feature space. However, labeled data are often scarce and expensive to obtain. Various machine learning strategies have been proposed to address this problem, including semi-supervised learning (Zhu, 2007), domain adaptation (Wu and Dietterich, 2004; Blitzer et al., 2006; Blitzer et al., 2007; Arnold et al., 2007; Chan and Ng, 2007; Daumé, 2007; Jiang and Zhai, 2007; Reichart and Rappoport, 2007; Andreevskaia and Bergler, 2008), multi-task learning (Caruana, 1997; Reichart et al. ...)