Parallel Programming: for Multicore and Cluster Systems- P19

Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

4 Performance Analysis of Parallel Programs

A multi-broadcast operation is also implemented as for the array, but in $\lfloor p/2 \rfloor$ steps. In the first step, each processor sends its message in both directions. In the following steps $k$, $2 \le k \le \lfloor p/2 \rfloor$, each processor forwards the messages it has received, sending each one onward in the direction opposite to the one it arrived from. Since the diameter is $\lfloor p/2 \rfloor$, the time $\Theta(p)$ results. Figure 4.3 illustrates a multi-broadcast operation for $p = 6$ processors.

Fig. 4.3 Implementation of a multi-broadcast operation on a ring with six nodes. The message sent out by node $i$ is denoted by $p_i$, $i = 1, \ldots, 6$.

The scatter operation also needs time $\Theta(p)$, since it cannot be faster than a single-broadcast operation and it is not slower than a multi-broadcast operation. For a total exchange, the ring is divided into two sets of $p/2$ nodes each (for $p$ even). Each node of one of the subsets sends $p/2$ messages into the other subset across two links. These are $(p/2) \cdot (p/2) = p^2/4$ messages over two links, which results in $p^2/8$ time steps, since one message needs one time step to be sent along one link. The time is $\Theta(p^2)$.

4.3.1.5 Mesh

For a $d$-dimensional mesh with $p$ nodes and $p^{1/d}$ nodes in each dimension, the diameter is $d\,(p^{1/d} - 1)$, and thus a single-broadcast operation can be executed in time $\Theta(p^{1/d})$. For the scatter operation, an upper bound is $\Theta(p)$, since a linear array with $p$ nodes can be embedded into the mesh and a scatter operation needs time $p$ on the array.
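
The step count of the ring multi-broadcast and the mesh diameter quoted above can be checked with a small simulation. The following Python sketch is an illustration only and is not taken from the book: the function names `ring_multi_broadcast_steps` and `mesh_diameter` are hypothetical, and the model assumes that each link carries one message per direction per time step.

```python
from collections import deque
from itertools import product


def ring_multi_broadcast_steps(p):
    """Simulate a multi-broadcast on a ring with p nodes; return the steps used.

    Assumption: one message per link and direction per time step.
    """
    known = [{i} for i in range(p)]   # messages each node currently holds
    to_right = list(range(p))         # message node i sends to node i+1 next
    to_left = list(range(p))          # message node i sends to node i-1 next
    steps = 0
    while any(len(k) < p for k in known):
        new_right = [None] * p
        new_left = [None] * p
        for i in range(p):
            right, left = (i + 1) % p, (i - 1) % p
            # A message arriving from the left is forwarded to the right next
            # step, and vice versa ("opposite direction" in the text).
            known[right].add(to_right[i])
            new_right[right] = to_right[i]
            known[left].add(to_left[i])
            new_left[left] = to_left[i]
        to_right, to_left = new_right, new_left
        steps += 1
    return steps


def mesh_diameter(side, d):
    """Diameter of a d-dimensional mesh with `side` nodes per dimension (BFS)."""

    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for axis in range(d):
                for delta in (-1, 1):
                    c = node[axis] + delta
                    if 0 <= c < side:  # a mesh has no wrap-around links
                        nxt = node[:axis] + (c,) + node[axis + 1:]
                        if nxt not in dist:
                            dist[nxt] = dist[node] + 1
                            queue.append(nxt)
        return max(dist.values())

    # Brute force over all start nodes; only meant for small meshes.
    return max(eccentricity(v) for v in product(range(side), repeat=d))
```

For example, `ring_multi_broadcast_steps(6)` corresponds to the six-node ring of Fig. 4.3, and `mesh_diameter(side, d)` can be compared with $d\,(\text{side} - 1)$, where `side` plays the role of $p^{1/d}$.
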
A scatter operation also needs time at least proportional to $p - 1$ for a fixed dimension $d$, since $p - 1$ messages have to be sent along the $d$ outgoing links of the root node, which takes $\lceil (p-1)/d \rceil$ time steps. The time $\Theta(p)$ for the multi-broadcast operation results in a similar way. For the total exchange, we consider a mesh with an even number of nodes and subdivide the mesh into two submeshes of dimension $d - 1$ with $p/2$ nodes each. Each node of a submesh sends $p/2$ messages into the other submesh, which have to be sent over the links connecting both submeshes. These are $p^{(d-1)/d}$ links. Thus, at least $\frac{1}{4}\, p^{(d+1)/d}$ time steps are needed, because $\frac{p^2}{4\, p^{(d-1)/d}} = \frac{1}{4}\, p^{(d+1)/d}$. To show that a total exchange can be ...
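
The counting argument for the total-exchange lower bound on the mesh can be reproduced numerically. The sketch below is a hedged illustration, not the book's code: the function name `total_exchange_lower_bound` is hypothetical, the mesh is assumed to have an even number `side` of nodes per dimension, and it is cut in the middle of its first dimension. As in the text, only the messages sent from one half into the other are counted.

```python
from fractions import Fraction
from itertools import product


def total_exchange_lower_bound(side, d):
    """Return (links across the cut, lower bound on time steps).

    The mesh has p = side**d nodes; `side` corresponds to p**(1/d).
    """
    if side % 2 != 0:
        raise ValueError("side must be even so that both halves have p/2 nodes")
    p = side ** d
    half = side // 2
    # Each node with first coordinate half-1 has exactly one link to the node
    # with first coordinate half; these are the links joining the two halves.
    cut_links = sum(
        1 for v in product(range(side), repeat=d) if v[0] == half - 1
    )
    # Each of the p/2 nodes in one half sends p/2 messages into the other half.
    messages = (p // 2) * (p // 2)
    # One message needs one time step per link, so divide by the cut size.
    return cut_links, Fraction(messages, cut_links)
```

For a $4 \times 4$ mesh ($p = 16$, $d = 2$), the cut contains $4 = p^{(d-1)/d}$ links and the bound evaluates to $64/4 = 16 = \frac{1}{4}\, p^{(d+1)/d}$ time steps.
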