Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Yale University, thomson@
Thaddeus Diamond, Yale University, diamond@
Shu-Chun Weng, Yale University, scweng@
Kun Ren, Yale University, kun@
Philip Shao, Yale University, shao-philip@
Daniel J. Abadi, Yale University, dna@

ABSTRACT

Many distributed storage systems achieve high data access throughput via partitioning and replication, each system with its own advantages and tradeoffs. In order to achieve high scalability, however, today's systems generally reduce transactional support, disallowing single transactions from spanning multiple partitions. Calvin is a practical transaction scheduling and data replication layer that uses a deterministic ordering guarantee to significantly reduce the normally prohibitive contention costs associated with distributed transactions. Unlike previous deterministic database system prototypes, Calvin supports disk-based storage, scales near-linearly on a cluster of commodity machines, and has no single point of failure. By replicating transaction inputs rather than effects, Calvin is also able to support multiple consistency levels, including Paxos-based strong consistency across geographically distant replicas, at no cost to transactional throughput.
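The abstract's idea of replicating transaction inputs rather than effects can be illustrated with a toy sketch. This is not Calvin's implementation; the `Transfer` type, the `apply_in_order` function, the account names, and the abort rule are all hypothetical, chosen only to show that replicas which execute the same ordered log of inputs with deterministic logic end in the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """Hypothetical transaction input: move `amount` from `src` to `dst`."""
    src: str
    dst: str
    amount: int


def apply_in_order(initial, log):
    """Execute a globally ordered log of transaction inputs deterministically."""
    state = dict(initial)  # each replica works on its own copy
    for txn in log:
        # Deterministic abort rule (an assumption of this sketch):
        # skip any transfer that would overdraw the source account.
        if state.get(txn.src, 0) >= txn.amount:
            state[txn.src] -= txn.amount
            state[txn.dst] = state.get(txn.dst, 0) + txn.amount
    return state


initial = {"alice": 100, "bob": 50, "carol": 0}

# Only this ordered list of inputs is shipped to the replicas,
# not the resulting writes.
log = [
    Transfer("alice", "bob", 30),
    Transfer("bob", "carol", 80),
    Transfer("carol", "alice", 100),
]

replica_a = apply_in_order(initial, log)
replica_b = apply_in_order(initial, log)

# Same inputs in the same order give the same final state on both replicas.
print(replica_a == replica_b)

# A different order can give a different state, which is why the order
# has to be agreed on before execution.
reordered = apply_in_order(initial, list(reversed(log)))
print(replica_a == reordered)
```

The two replicas never exchange their results; agreement comes entirely from sharing the input order and running deterministic logic over it.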
Categories and Subject Descriptors
Distributed Systems: Distributed databases; Database Management: Systems (concurrency, distributed databases, transaction processing)

General Terms
Algorithms, Design, Performance, Reliability

Keywords
determinism, distributed database systems, replication, transaction processing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to .