When Database Systems Meet the Grid

Lastly, there is a new form of science emerging. Each scientific discipline is generating huge data volumes, for example, from accelerators (physics), telescopes (astronomy), remote sensors (earth sciences), and DNA microarrays (biology). Simulations are also generating massive datasets. Organizing, analyzing and summarizing these huge scientific datasets stands as a real DBMS challenge. So is the positioning and transfer of data.

When Database Systems Meet the Grid

María A. Nieto-Santisteban (1), Jim Gray (2), Alexander S. Szalay (1), James Annis (3), Aniruddha R. Thakar (1), and William J. O'Mullane (1)
1. Johns Hopkins University, Baltimore, MD, USA
2. Microsoft Research, San Francisco, CA, USA
3. Experimental Astrophysics, Fermilab, Batavia, IL, USA
nieto szalay thakar womullan@ gray@ annis@

August 2004, Revised December 2004
Technical Report MSR-TR-2004-81
Microsoft Research, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052

Abstract
We illustrate the benefits of combining database systems and Grid technologies for data-intensive applications. Using a cluster of SQL servers, we reimplemented an existing Grid application that finds galaxy clusters in a large astronomical database. The SQL implementation runs an order of magnitude faster than the earlier Tcl-C-file-based implementation. We discuss why and how Grid applications can take advantage of database systems.

Keywords: Very Large Databases, Grid Applications, Data Grids, e-Science, Virtual Observatory.

1. Introduction
Science faces a data avalanche. Breakthroughs in instruments, detector and computer technologies are creating multi-Terabyte data archives in many disciplines.
Analysis of all this information requires resources that no single institution can afford to provide. In response to this demand, Grid computing has emerged as an important research area, differentiated from clusters and distributed computing. Many definitions of the Grid and Grid systems have been given [17]. In the context of this paper, we think of the Grid as the infrastructure and set of protocols that enable the integrated, collaborative use of .