HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
Azza Abouzeid¹, Kamil Bajda-Pawlikowski¹, Daniel Abadi¹, Avi Silberschatz¹, Alexander Rasin²
¹Yale University, ²Brown University
{azza,kbajda,dna,avi}@cs.yale.edu, alexr@cs.brown.edu

ABSTRACT

The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines and moving towards cheaper, lower-end commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private "clouds". At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis. There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data.
In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

1. INTRODUCTION

The analytical database market currently consists of $3.98 billion [25] of the $14.6 billion database software market [21] (27%) and is growing at a rate of 10.3% annually [25]. As business best practices trend increasingly towards basing decisions on data and hard facts rather than instinct and theory, the corporate thirst for systems that can manage, process, and granularly analyze data is becoming insatiable.