High Performance Computing on Vector Systems-P3: In March 2005, about 40 scientists from Europe, Japan and the US came together for the second time to discuss ways to achieve sustained performance on supercomputers in the range of teraflops. The workshop, held at the High Performance Computing Center Stuttgart (HLRS), was the second of its kind; the first had been held in May 2004.

Over 10 TFLOPS Eigensolver on the Earth Simulator    53

Table 1. Hardware configuration and the best-performing applications of the ES as of March 2005

  Number of nodes:              640 (8 PEs per node, 5120 PEs in total)
  PE:                           vector unit (VU, Mul/Add x 8 pipes) and a superscalar unit
  Main memory:                  10 TB (16 GB per node); memory bandwidth 256 GB/s per node
  Interconnection:              metal-cable crossbar switch (1 way)
  Theoretical peak performance: 64 GFLOPS per node (8 GFLOPS per PE)
  Linpack (TOP500 list):        ... of the peak [7]
  Fastest real application:     ... of the peak [8] (complex-number calculation, mainly FFT)
  Our goal:                     over 10 TFLOPS, ... of the peak [9] (real-number calculation, numerical algebra)

3 Numerical Algorithms

The core of our program calculates the smallest eigenvalue and the corresponding eigenvector of Hv = λv, where the matrix H is real and symmetric. Several iterative numerical algorithms, e.g. the power method, the Lanczos method, the conjugate gradient (CG) method, and so on, are available. Since the ES is a public resource and the use of hundreds of nodes is limited, the most effective algorithm must be selected before large-scale simulations.

Lanczos Method

The Lanczos method is one of the subspace projection methods: it creates a Krylov sequence and successively expands an invariant subspace, following the Lanczos principle [10] (see Fig. 1(a)). Eigenvalues of the projected invariant subspace approximate those of the original matrix well, and the subspace can be represented by a compact tridiagonal matrix. The main recurrence of this algorithm repeatedly generates the Lanczos vector v_{i+1} from v_{i-1} and v_i, as seen in Fig. 1(a). In addition, an N-word buffer is required for storing an eigenvector.
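To make the three-term recurrence concrete, here is a minimal NumPy sketch of the Lanczos iteration for a real symmetric matrix. It is an illustration only, not the authors' vectorized Earth Simulator implementation; the function name and parameters are assumptions. Note that only three length-N work vectors are kept live in the loop.

```python
import numpy as np

def lanczos(H, m=200, seed=0):
    """Run m Lanczos steps on the real symmetric matrix H and return
    the smallest eigenvalue of the projected tridiagonal matrix.
    Only three N-word vectors (v_prev, v, w) are carried through the
    recurrence, in line with the 3N-word memory requirement."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)            # normalized starting vector v_1
    v_prev = np.zeros(n)
    alpha, beta = [], [0.0]
    for i in range(m):
        w = H @ v - beta[-1] * v_prev  # three-term recurrence
        a = v @ w                      # diagonal entry alpha_i
        w -= a * v
        b = np.linalg.norm(w)          # off-diagonal entry beta_i
        alpha.append(a)
        if b < 1e-12:                  # invariant subspace reached early
            break
        beta.append(b)
        v_prev, v = v, w / b           # next Lanczos vector v_{i+1}
    # Eigenvalues of the small tridiagonal matrix T approximate those of H.
    off = beta[1:len(alpha)]
    T = np.diag(alpha) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(T)[0]
```

In practice the extreme eigenvalues of T converge long before m reaches the matrix dimension, which is why a fixed iteration count such as 200 or 300 is a workable choice.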
Therefore the memory requirement is 3N words. As shown in Fig. 1(a), the number of iterations depends on the input matrix; however, it is usually fixed at a constant number m. In the following, we choose a smaller empirical fixed number, e.g. 200 or 300, as the iteration count.

Preconditioned Conjugate Gradient Method