Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 61

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today’s abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Daniel Barbara and Ping Chen

... can grow considerably, especially for high-dimensional data sets. However, we only need to save boxes that contain at least one point; empty boxes are not needed. The number of populated boxes at that level is, in practical data sets, considerably smaller (that is precisely why clusters are formed in the first place). Let us denote by B the number of populated boxes in level L. Notice that B is likely to remain very stable throughout passes over the incremental step.

Every time a point is assigned to a cluster, we register that fact in a table, adding a row that maps the cluster membership to the point identifier (rows of this table are periodically saved to disk, each cluster into a file, freeing the space for new rows). The array of layers is used to drive the computation of the fractal dimension of the cluster using a box-counting algorithm. In particular, we chose to use FD3 (Sarraille and DiFalco, 2004), an implementation of a box-counting algorithm based on the ideas described in Liebovitch and Toth (1989).

Reshaping Clusters in Mid-Flight

It is possible that the number and form of the clusters may change after having processed a set of data points using the step of Figure .
This may occur because the data used in the initialization step does not accurately reflect the true distribution of the overall data set, or because we are clustering an incoming stream of data whose distribution changes over time. There are two basic operations that can be performed: splitting a cluster, and merging two or more clusters into one. A good indication that a cluster may need to be split is how much the fractal dimension of the cluster has changed since its inception during the initialization step. This information is easy to keep and does not occupy much space. A large change may indicate that the points inside the cluster do not belong together. Notice that these points were included in that cluster because it was the ...
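
The ideas above (storing only populated boxes per layer, estimating a fractal dimension by box counting, and flagging a cluster for splitting when that dimension drifts) can be illustrated with a minimal Python sketch. This is not FD3 and not the authors' implementation: it is a plain least-squares box-counting estimate. The function names `populated_boxes`, `box_counting_dimension`, and `needs_split`, the `tolerance` value, and the assumption that points are scaled into the unit hypercube are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def populated_boxes(points, level):
    """Map each populated box to its point count at one grid level.

    Assumption: points are tuples with coordinates in [0, 1].
    The grid has 2**level cells per axis; empty boxes are never stored.
    """
    cells = 2 ** level
    counts = Counter()
    for p in points:
        # Clamp so that a coordinate equal to 1.0 falls in the last cell.
        key = tuple(min(int(x * cells), cells - 1) for x in p)
        counts[key] += 1
    return counts


def box_counting_dimension(points, levels=range(1, 6)):
    """Estimate the box-counting dimension of a non-empty point set.

    Fits a least-squares line to log N(r) versus log(1/r), where
    r = 2**-level is the box side and N(r) is the number of populated
    boxes at that level. The slope is the dimension estimate.
    """
    xs, ys = [], []
    for level in levels:
        n_boxes = len(populated_boxes(points, level))
        xs.append(level * math.log(2))   # log(1/r)
        ys.append(math.log(n_boxes))     # log N(r)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def needs_split(initial_fd, current_fd, tolerance=0.2):
    """Flag a cluster whose fractal dimension has drifted since initialization.

    The tolerance is an arbitrary illustrative threshold, not a value
    taken from the source.
    """
    return abs(current_fd - initial_fd) > tolerance
```

Under these assumptions, points sampled densely along the diagonal of the unit square yield an estimate close to 1, while a dense regular grid over the square yields an estimate close to 2. Only the initial dimension of each cluster has to be stored to run the `needs_split` check later, which matches the remark that this information takes little space.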
