Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems
Andrew Pavlo (Brown University), Carlo Curino (Yahoo Research), Stan Zdonik (Brown University)

ABSTRACT

The advent of affordable, shared-nothing computing systems portends a new class of parallel database management systems (DBMS) for on-line transaction processing (OLTP) applications that scale without sacrificing ACID guarantees [7, 9]. The performance of these DBMSs is predicated on the existence of an optimal database design that is tailored for the unique characteristics of OLTP workloads [43]. Deriving such designs for modern DBMSs is difficult, especially for enterprise-class OLTP systems, since they impose extra challenges: the use of stored procedures, the need for load balancing in the presence of time-varying skew, complex schemas, and deployments with a larger number of partitions. To this purpose, we present a novel approach to automatically partitioning databases for enterprise-class OLTP systems that significantly extends the state of the art by (1) minimizing the number of distributed transactions while concurrently mitigating the effects of temporal skew in both the data distribution and accesses, (2) extending the design space to include replicated secondary indexes, (3) organically handling stored procedure routing, and (4) scaling of schema complexity, data size, and number of partitions. This effort builds on two key technical contributions: an analytical cost model that can be used to quickly estimate the relative coordination cost and skew for a given workload and a candidate database design, and an informed exploration of the huge solution space based on large-neighborhood search. To evaluate our methods, we integrated our database design tool with a high-performance, parallel, main-memory DBMS and compared our methods against both popular heuristics and a state-of-the-art research prototype [17]. Using a diverse set of benchmarks, we show that our …
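To make the abstract's two technical contributions more concrete, the following is a minimal Python sketch of an analytical cost model (coordination cost blended with a skew factor) driving a simple relocation-based local search. Everything in it is an illustrative assumption: the function names, the skew measure, the alpha weighting, and the greedy loop (a stand-in for the paper's large-neighborhood search) are not taken from the paper's actual implementation.

import random

Txn = list[str]          # one transaction = the tuple keys it reads/writes
Design = dict[str, int]  # a candidate design: tuple key -> partition id

def coordination_cost(design: Design, workload: list[Txn]) -> float:
    # Fraction of transactions that touch more than one partition and
    # therefore need distributed coordination (assumes a non-empty workload).
    distributed = sum(1 for txn in workload
                      if len({design[k] for k in txn}) > 1)
    return distributed / len(workload)

def skew_factor(design: Design, workload: list[Txn], n_parts: int) -> float:
    # Access imbalance across partitions: 0 when load is uniform,
    # approaching 1 when a single partition absorbs all accesses.
    load = [0] * n_parts
    for txn in workload:
        for k in txn:
            load[design[k]] += 1
    total = sum(load)
    return 0.0 if total == 0 else (max(load) - total / n_parts) / total

def cost(design: Design, workload: list[Txn], n_parts: int,
         alpha: float = 0.5) -> float:
    # Assumed weighted blend of the two terms; the paper's actual
    # weighting scheme may differ.
    return (alpha * coordination_cost(design, workload)
            + (1 - alpha) * skew_factor(design, workload, n_parts))

def local_search(design: Design, workload: list[Txn], n_parts: int,
                 steps: int = 1000, seed: int = 0) -> Design:
    # Greedy relocation search: move one tuple to a random partition and
    # keep the move only if the estimated cost improves. The paper's
    # large-neighborhood search instead relaxes and rebuilds larger
    # portions of the design at each step.
    rng = random.Random(seed)
    best = dict(design)
    best_cost = cost(best, workload, n_parts)
    keys = list(best)
    for _ in range(steps):
        k = rng.choice(keys)
        old = best[k]
        best[k] = rng.randrange(n_parts)
        new_cost = cost(best, workload, n_parts)
        if new_cost < best_cost:
            best_cost = new_cost
        else:
            best[k] = old  # revert a non-improving move
    return best

For example, with workload = [["a", "b"], ["a", "c"]] and the two-partition design {"a": 0, "b": 0, "c": 1}, coordination_cost returns 0.5, since only the second transaction crosses a partition boundary.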