Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications

Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal
sunita@ sthomas@ ragrawal@
IBM Almaden Research Center
650 Harry Road, San Jose, CA 95120

Abstract

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; and tight-coupling using primarily user-defined functions and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two.
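To make the contrast between loose coupling and SQL-based mining more concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. It counts the support of 2-itemsets in two ways: by pulling rows through a cursor and counting in the application (loose coupling), and by pushing the counting into the DBMS with a self-join plus `GROUP BY`. Everything here is an illustrative assumption: the table `txn(tid, item)` with one row per purchased item, the sample baskets, and the function names `make_db`, `pair_support_loose`, and `pair_support_sql` are hypothetical. The sketch covers only support counting for item pairs. It is not full candidate generation, and it is not any of the specific SQL-92 or SQL-OR formulations evaluated in the paper.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def make_db():
    """Build an in-memory table of (tid, item) rows (hypothetical layout and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (tid INTEGER, item TEXT)")
    baskets = {
        1: ["bread", "butter", "milk"],
        2: ["bread", "butter"],
        3: ["bread", "milk"],
        4: ["butter", "eggs"],
        5: ["bread", "butter", "milk"],
    }
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?)",
        [(tid, item) for tid, items in baskets.items() for item in items],
    )
    return conn


def pair_support_loose(conn, minsup):
    """Loose coupling: stream rows through a cursor and count pairs in the application."""
    cur = conn.execute("SELECT tid, item FROM txn ORDER BY tid, item")
    counts = Counter()
    current_tid, items = None, []
    for tid, item in cur:
        if tid != current_tid:
            # A new transaction starts: count all sorted pairs of the previous one.
            counts.update(combinations(items, 2))
            current_tid, items = tid, []
        items.append(item)
    counts.update(combinations(items, 2))  # flush the last transaction
    return {pair: n for pair, n in counts.items() if n >= minsup}


def pair_support_sql(conn, minsup):
    """SQL-based: a self-join on tid enumerates item pairs; GROUP BY/HAVING counts and filters in the DBMS."""
    rows = conn.execute(
        """
        SELECT t1.item, t2.item, COUNT(*) AS support
        FROM txn AS t1
        JOIN txn AS t2 ON t1.tid = t2.tid AND t1.item < t2.item
        GROUP BY t1.item, t2.item
        HAVING COUNT(*) >= ?
        """,
        (minsup,),
    )
    return {(a, b): n for a, b, n in rows}


if __name__ == "__main__":
    conn = make_db()
    loose = pair_support_loose(conn, 3)
    in_db = pair_support_sql(conn, 3)
    print(loose)
    print(in_db)
    assert loose == in_db  # both strategies should find the same frequent pairs
```

Both functions return the same frequent pairs. The difference lies in where the work happens. The loose-coupling version ships every row to the client, while the SQL version lets the query engine do the join and aggregation. This trade-off between data movement and reliance on the DBMS's query processing is the one the abstract's alternatives explore.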
Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which, performance-wise, is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, ease of development, portability, and interoperability.

1 Introduction

An ever-increasing number of organizations are installing large data warehouses using relational database technology. There is a huge demand for mining nuggets of knowledge from these data warehouses. The initial research on data mining concentrated on defining new mining operations and developing algorithms for them. Most early mining