Chapter 6: Unsupervised Learning – Clustering
Assoc. Prof. Dr. Duong Tuan Anh
Faculty of Computer Science and Engineering, HCMC Univ. of Technology
3/2015

Outline
1 Introduction to unsupervised learning and clustering
2 Partitional clustering (k-Means algorithm)
3 Hierarchical clustering
4 Expectation Maximization (EM) algorithm
5 Incremental Clustering

1. Introduction to clustering
Clustering is the process of grouping a set of patterns. It generates a partition consisting of groups or clusters from a given collection of patterns. Representations or descriptions of the clusters formed are used in decision making, such as classification, prediction, and outlier detection.
A clustering-based classification scheme is very useful in solving large-scale pattern classification problems in data mining. Patterns to be clustered are either labeled or unlabeled. We have:
- Clustering algorithms which group sets of unlabeled patterns. These approaches are so popular that clustering is generally viewed as unsupervised learning from unlabeled patterns.
- Algorithms which cluster labeled patterns. These approaches are practically important and are called supervised clustering. Supervised clustering is helpful in identifying clusters within collections of labeled patterns. It yields abstractions in the form of cluster representatives or descriptions, which are useful for efficient classification (e.g., in data reduction for classification).

Clustering
The process of clustering is carried out so that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in a corresponding sense.
Figure: Euclidean distance between any two points characterizes similarity; intra-cluster distance is small and inter-cluster distance is large.

Centroid and medoid
Clustering is useful for generating data abstraction. A cluster of points is represented by its centroid or its medoid. A centroid stands for the sample mean of the points in cluster C; it is $\mu_C = \frac{1}{N_C} \sum_{X_i \in C} X_i$, where $N_C$ is the number of points in C.
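The ideas on these slides can be illustrated with a minimal Python sketch. It is not taken from the lecture: the two toy clusters and the function names are made up for illustration. The medoid is not defined in the text above, so the sketch assumes the usual definition (the cluster member whose total distance to the other members is smallest). Intra-cluster distance is assumed here to mean the average pairwise distance within a cluster, and inter-cluster distance the distance between the two centroids; other definitions are possible.

```python
import math
from itertools import combinations


def centroid(points):
    """Sample mean of the points, computed coordinate by coordinate."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))


def medoid(points):
    """Assumed definition: the member with the smallest summed
    Euclidean distance to all members of the cluster."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def mean_pairwise_distance(points):
    """One possible intra-cluster measure: average distance over all pairs."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical toy data: two well-separated 2-D clusters.
cluster_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (2.0, 3.0)]
cluster_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

centroid_a = centroid(cluster_a)
centroid_b = centroid(cluster_b)
medoid_a = medoid(cluster_a)

intra_a = mean_pairwise_distance(cluster_a)
intra_b = mean_pairwise_distance(cluster_b)
inter_ab = math.dist(centroid_a, centroid_b)  # distance between centroids

print("centroid of A:", centroid_a)
print("medoid of A:", medoid_a)
print("intra-cluster distances:", intra_a, intra_b)
print("inter-cluster distance:", inter_ab)
```

In this toy example the centroid of cluster A is not one of the data points, whereas the medoid always is, and both intra-cluster distances are much smaller than the inter-cluster distance, which matches the situation described for the figure.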