tailieunhanh - Chemistry report: "Cluster Structure Inference Based on Clustering Stability with Applications to Microarray Data Analysis"

A collection of international scientific research reports in chemistry, offered as reference material for readers interested in chemistry.

EURASIP Journal on Applied Signal Processing 2004:1, 64-80
2004 Hindawi Publishing Corporation

Cluster Structure Inference Based on Clustering Stability with Applications to Microarray Data Analysis

Ciprian Doru Giurcaneanu
Institute of Signal Processing, Tampere University of Technology, Box 553, FIN-33101 Tampere, Finland
Email: cipriand@

Ioan Tabus
Institute of Signal Processing, Tampere University of Technology, Box 553, FIN-33101 Tampere, Finland
Email: tabus@

Received 28 February 2003; Revised 7 July 2003

This paper focuses on the stability-based approach for estimating the number of clusters K in microarray data. The cluster stability approach amounts to clustering successive random subsets of the available data and evaluating an index that expresses the similarity of the successive partitions obtained. We present a method for automatically estimating K, starting from the distribution of the similarity index. We investigate how the choice of the hierarchical clustering (HC) method and of the similarity index influences the estimation accuracy. The paper introduces a new similarity index based on a partition distance. The performance of the new index and of other well-known indices is evaluated experimentally by comparing the true data partition with the partition obtained at each level of an HC tree. A case study is conducted with a publicly available Leukemia dataset.

Keywords and phrases: clustering stability, number of clusters, hierarchical clustering methods, similarity indices, partition distance, microarray data.

1. INTRODUCTION

Clustering algorithms are frequently used for analyzing microarray data.
While various clustering methods help the bioinformatics practitioner to ascertain different characteristics of the structural organization of microarray datasets, the task of selecting the most appropriate algorithm for a particular problem is nontrivial.
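
The stability idea summarized in the abstract (cluster random subsets of the data, then score how similar the resulting partitions are) can be illustrated with a small sketch. This is not the authors' procedure. It assumes a naive single-linkage hierarchical clustering, uses the plain Rand index as a stand-in for the similarity indices studied in the paper, and averages the index instead of analyzing its distribution. The function names, the subset fraction, and the toy data are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations


def single_linkage(points, k):
    """Naive agglomerative single-linkage clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Single linkage: squared distance between the closest pair of members.
        return min(sum((x - y) ** 2 for x, y in zip(points[i], points[j]))
                   for i in a for j in b)

    while len(clusters) > k:
        # Merge the two closest clusters (i < j, so deleting j is safe).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, n_pairs=10, frac=0.8, seed=0):
    """Mean Rand index between clusterings of pairs of random subsets (assumed setup)."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        s1 = rng.sample(range(n), m)
        s2 = rng.sample(range(n), m)
        l1 = dict(zip(s1, single_linkage([points[i] for i in s1], k)))
        l2 = dict(zip(s2, single_linkage([points[i] for i in s2], k)))
        # Compare the two partitions only on the points both subsets contain.
        common = sorted(set(s1) & set(s2))
        scores.append(rand_index([l1[i] for i in common],
                                 [l2[i] for i in common]))
    return sum(scores) / len(scores)


# Toy data (hypothetical): three tight, well-separated groups of 10 points each.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + 0.1 * dx, cy + 0.1 * dy)
          for cx, cy in centers for dx in range(5) for dy in range(2)]

for k in range(2, 6):
    print(k, round(stability(points, k), 3))
```

On this toy data the three groups are far apart compared with their size, so K = 3 yields identical partitions on every pair of subsets and a mean index of 1. A candidate K would then be chosen among the values with the most stable partitions; the paper's actual estimator works from the distribution of the index and is not reproduced here.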
