Database Mining: A Performance Perspective

Rakesh Agrawal, Tomasz Imielinski, Arun Swami
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099

Abstract

We present our perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology. We describe three classes of database mining problems involving classification, associations, and sequences, and argue that these problems can be uniformly viewed as requiring discovery of rules embedded in massive data. We describe a model and some basic operations for the process of rule discovery. We show how the database mining problems we consider map to this model and how they can be solved by using the basic operations we propose. We give an example of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm is not only efficient in discovering classification rules but also has accuracy comparable to ID3, one of the current best classifiers.
Index Terms: database mining, knowledge discovery, classification, associations, sequences, decision trees

Current address: Computer Science Department, Rutgers University, New Brunswick, NJ 08903

1 Introduction

Database technology has been used with great success in traditional business data processing. There is an increasing desire to use this technology in new application domains. One such application domain that is likely to acquire considerable significance in the near future is database mining [12, 3, 5, 8, 9, 11, 15, 16, 18, 19]. An increasing number of organizations are creating ultra-large databases (measured in gigabytes and even terabytes) of business data, such as consumer data, transaction histories, sales records, etc. Such data forms a potential gold mine of valuable business information. Unfortunately, the database systems of today offer little functionality to support such mining applications. At the same time, statistical and machine learning techniques usually perform poorly when