tailieunhanh - Scientific report: "Ensemble Document Clustering Using Weighted Hypergraph Generated by NMF"

Hiroyuki Shinnou, Minoru Sasaki
Ibaraki University, 4-12-1 Nakanarusawa, Hitachi, Ibaraki, Japan 316-8511
shinnou msasaki @

Abstract

In this paper, we propose a new ensemble document clustering method. The novelty of our method is the use of Non-negative Matrix Factorization (NMF) in the generation phase and a weighted hypergraph in the integration phase. In our experiment, we compared our method with several other clustering methods. Our method achieved the best results.

1 Introduction

In this paper, we propose a new ensemble document clustering method using Non-negative Matrix Factorization (NMF) in the generation phase and a weighted hypergraph in the integration phase.

Document clustering is the task of dividing a document data set into groups based on document similarity. It is a basic intelligent procedure and is important in text mining systems (M. W. Berry, 2003). As a specific application, relevance feedback in IR, in which retrieved documents are clustered, is actively researched (Hearst and Pedersen, 1996; Kummamuru et al., 2004).

In document clustering, each document is represented as a vector, which typically uses the bag-of-words model and the TF-IDF term weight. A vector represented in this manner is high-dimensional and sparse. Thus, in document clustering, a dimensional reduction method such as PCA or SVD is applied before the actual clustering (Boley et al., 1999; Deerwester et al., 1990). Dimensional reduction maps data in a high-dimensional space into a low-dimensional space, and improves both clustering accuracy and speed.

NMF is a dimensional reduction method (Xu et al., 2003) that is based on the aspect model used in Probabilistic Latent Semantic Indexing (Hofmann, 1999). Because each axis of the space reduced by NMF corresponds to a topic, the reduced vector represents the clustering result. For a given term-document matrix and cluster number, we can obtain the NMF result with an iterative .
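
The pipeline described above (TF-IDF vectors, NMF reduction, then reading each document's cluster from its largest topic coordinate) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text breaks off before the iterative procedure is given, so the multiplicative update rule used here (the standard rule for the squared-error objective) is an assumption. The helper names `tfidf_matrix`, `nmf`, and `cluster_labels`, the toy documents, and the log(N/df) weighting are likewise hypothetical choices made for illustration.

```python
import math
import random


def tfidf_matrix(docs):
    """Build a term-document matrix X (terms x documents) of TF-IDF weights."""
    tokenized = [doc.lower().split() for doc in docs]
    terms = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for toks in tokenized if t in toks) for t in terms}
    # Assumed weighting variant: raw term count times log(N / df).
    X = [[toks.count(t) * math.log(n_docs / df[t]) for toks in tokenized]
         for t in terms]
    return terms, X


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols]
            for row in A]


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Approximate X (m x n) by U V^T, with U (m x k) and V (n x k) non-negative."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    Xt = transpose(X)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), element-wise
        num = matmul(X, V)
        den = matmul(U, matmul(transpose(V), V))
        U = [[U[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        num = matmul(Xt, U)
        den = matmul(V, matmul(transpose(U), U))
        V = [[V[j][c] * num[j][c] / (den[j][c] + eps) for c in range(k)]
             for j in range(n)]
    return U, V


def cluster_labels(V):
    """Assign each document to the topic axis with its largest coordinate."""
    return [max(range(len(row)), key=row.__getitem__) for row in V]


# Toy corpus with two topics and disjoint vocabularies (illustrative only).
docs = [
    "apple banana fruit",
    "apple banana fruit juice",
    "engine car wheel",
    "engine car wheel road",
]
terms, X = tfidf_matrix(docs)
U, V = nmf(X, k=2)
labels = cluster_labels(V)
print(labels)
```

Each row of `V` is the reduced vector of one document, so no separate clustering step is needed after the factorization. Because the result depends on the random initial values of `U` and `V`, different seeds can give different clusterings.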
