Decoupled Query Optimization for Federated Database Systems

Amol Deshpande, Joseph M. Hellerstein
Computer Science Division, University of California, Berkeley
{amol, jmh}@

Abstract

We study the problem of query optimization in federated relational database systems. The nature of federated databases explicitly decouples many aspects of the optimization process, often making it imperative for the optimizer to consult the underlying data sources while doing cost-based optimization. This not only increases the cost of optimization but also significantly changes the trade-offs involved in the optimization process. The dominant cost in the decoupled optimization process is the "cost of costing," which has traditionally been considered insignificant. The optimizer can afford only a few rounds of messages to the underlying data sources; hence, optimization techniques in this environment must be geared toward gathering all the required cost information with minimal communication. In this paper, we explore the design space for a query optimizer in this environment and demonstrate the need for decoupling various aspects of the optimization process. We present minimum-communication, decoupled variants of various query optimization techniques and discuss trade-offs in their performance in this scenario.
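The point that costing dominates, and that the optimizer can afford only a few rounds of messages, can be illustrated with a toy model. The Python sketch below is not the Cohera optimizer or any algorithm from this paper; `DataSource`, `estimate_costs`, the fragment names, and the cost numbers are all hypothetical. It contrasts asking every source about every plan fragment in a separate message with collecting all requests first and sending one batched message per source, then picks the cheapest site for each fragment from the gathered costs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Costs keyed by (fragment, site name).
CostTable = Dict[Tuple[str, str], float]


@dataclass
class DataSource:
    """Hypothetical remote source that can cost the plan fragments it supports."""
    name: str
    fragment_costs: Dict[str, float]  # the source's private cost estimates
    round_trips: int = 0              # messages received from the optimizer

    def estimate_costs(self, fragments: List[str]) -> Dict[str, float]:
        """Answer one message containing any number of costing requests."""
        self.round_trips += 1
        return {f: self.fragment_costs[f] for f in fragments if f in self.fragment_costs}


def naive_costing(fragments: List[str], sources: List[DataSource]) -> CostTable:
    """Ask every source about every fragment, one message per request."""
    costs: CostTable = {}
    for f in fragments:
        for s in sources:
            answer = s.estimate_costs([f])
            if f in answer:
                costs[(f, s.name)] = answer[f]
    return costs


def batched_costing(fragments: List[str], sources: List[DataSource]) -> CostTable:
    """Collect all requests up front and send one batched message per source."""
    costs: CostTable = {}
    for s in sources:
        for f, c in s.estimate_costs(fragments).items():
            costs[(f, s.name)] = c
    return costs


def choose_sites(fragments: List[str], costs: CostTable) -> Dict[str, str]:
    """Pick the cheapest site for each fragment using only the gathered costs."""
    plan = {}
    for f in fragments:
        options = [(c, site) for (frag, site), c in costs.items() if frag == f]
        plan[f] = min(options)[1]  # assumes at least one source can run f
    return plan


def make_sources() -> List[DataSource]:
    # Hypothetical fragment names and cost numbers, for illustration only.
    return [
        DataSource("east", {"scan_orders": 40.0, "join_oc": 60.0}),
        DataSource("west", {"scan_orders": 25.0, "scan_customers": 10.0, "join_oc": 70.0}),
    ]


fragments = ["scan_orders", "scan_customers", "join_oc"]

naive_sources = make_sources()
naive = naive_costing(fragments, naive_sources)

batch_sources = make_sources()
batched = batched_costing(fragments, batch_sources)

print(sum(s.round_trips for s in naive_sources))  # 6
print(sum(s.round_trips for s in batch_sources))  # 2
print(choose_sites(fragments, batched))
# {'scan_orders': 'west', 'scan_customers': 'west', 'join_oc': 'east'}
```

Both strategies gather the same cost table, but the batched version contacts each source once instead of once per fragment. A real optimizer must also decide which fragments to ask about before it has seen any costs; this toy model sidesteps that question by assuming the candidate fragments are known in advance.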
We have implemented these techniques in the Cohera federated database system, and our experimental results, somewhat surprisingly, indicate that a simple two-phase optimization scheme performs fairly well as long as the physical database design is known to the optimizer, though more aggressive algorithms are required otherwise.

1. Introduction

The need for federated database services has increased dramatically in recent years. Within enterprises, IT infrastructures are often decentralized as a result of mergers, acquisitions, and specialized corporate applications, resulting in the deployment of large federated databases. Perhaps more dramatically, the Internet has enabled new .