Parallel Programming: for Multicore and Cluster Systems- P20
Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

4 Performance Analysis of Parallel Programs

[...]

$$c_k = \sum_{j = r(k-1)+1}^{r \cdot k} a_j b_j$$

so that processor $P_k$ stores value $c_k$. To get the final result $c = \sum_{k=1}^{p} c_k$, a single-accumulation operation is performed and one of the processors stores this value. The parallel execution time of the implementation depends on the computation time and the communication time. To build a function $T_p(n)$, we assume that the execution of an arithmetic operation needs $\alpha$ time units and that sending a floating-point value to a neighboring processor in the interconnection network needs $\beta$ time units. The parallel computation time for the partial scalar product is $2r\alpha$, since about $r$ addition operations and $r$ multiplication operations are performed. The time for a single-accumulation operation depends on the specific interconnection network, and we consider the linear array and the hypercube as examples. See also Sect. 2.5.2 for the definition of these direct networks.

4.4.1.1 Linear Array

In the linear array, the optimal root node for the single-accumulation operation is the node in the middle, since it has a distance of no more than $p/2$ from every other node. Each node gets a value from its left (or right) neighbor in time $\beta$, adds the value to the local value in time $\alpha$, and sends the result to its right (or left) neighbor in the next step. This results in the communication time $\frac{p}{2}(\alpha + \beta)$. In total, with $r = n/p$, the parallel execution time is

$$T_p(n) = 2\,\frac{n}{p}\,\alpha + \frac{p}{2}(\alpha + \beta). \qquad (4.13)$$

The function $T_p(n)$ shows that the computation time decreases with an increasing number of processors $p$, but that the communication time increases with an increasing number of processors. Thus, this function exhibits the typical situation in a parallel program: an increasing number of processors does not necessarily lead to faster programs, since the communication overhead increases. Usually, the parallel execution time decreases for increasing $p$ until the influence of the communication overhead becomes too large, and then the parallel execution time increases again. The value ...
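The cost model for the linear array can be explored numerically. The following is a minimal Python sketch, not code from the book: the function names `partial_scalar_products`, `parallel_time_linear_array`, and `best_processor_count` are hypothetical, the vectors are assumed to have a length n divisible by p (so that r = n/p), and the best processor count is found by a plain brute-force search over a caller-supplied range, which is a choice made here for illustration only.

```python
def partial_scalar_products(a, b, p):
    """Return the partial sums c_1..c_p of a block-wise distributed scalar product.

    Assumption of this sketch: len(a) == len(b) == n and p divides n.
    The p "processors" are only simulated sequentially.
    """
    n = len(a)
    if len(b) != n or p < 1 or n % p != 0:
        raise ValueError("expected equal-length vectors with n divisible by p")
    r = n // p
    # Processor P_k (k = 1..p) owns the 1-based indices r(k-1)+1 .. r*k,
    # which is the 0-based half-open range [r*(k-1), r*k).
    return [
        sum(a[j] * b[j] for j in range(r * (k - 1), r * k))
        for k in range(1, p + 1)
    ]


def parallel_time_linear_array(n, p, alpha, beta):
    """Evaluate T_p(n) = 2*(n/p)*alpha + (p/2)*(alpha + beta) from Eq. (4.13)."""
    computation = 2 * (n / p) * alpha
    accumulation = (p / 2) * (alpha + beta)
    return computation + accumulation


def best_processor_count(n, alpha, beta, p_max):
    """Brute-force search for the p in 1..p_max with the smallest modeled time."""
    return min(
        range(1, p_max + 1),
        key=lambda p: parallel_time_linear_array(n, p, alpha, beta),
    )
```

For example, with the illustrative values n = 1024, α = 1, and β = 3, the modeled time falls to 128 time units at p = 32 and rises again to 160 at p = 64, which matches the decrease-then-increase behavior described above. Calling `partial_scalar_products([1, 2, 3, 4], [5, 6, 7, 8], 2)` yields the partial sums 17 and 53, whose accumulation gives the scalar product 70.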