Parallel Programming: for Multicore and Cluster Systems - P12

Innovations in hardware architecture, like hyper-threading or multicore processors, mean that parallel computing resources are available for inexpensive desktop computers. In only a few years, many standard software products will be based on concepts of parallel programming implemented on such hardware, and the range of applications will be much broader than that of scientific computing, up to now the main application area for parallel computing.

Levels of Parallelism

The array assignment uses the old values of a(0:n-1) and a(2:n+1), whereas the for loop uses the old value only for a(i+1); for a(i-1), the new value is used, which has been computed in the preceding iteration. (A small sketch illustrating this difference is given at the end of this excerpt.)

Data parallelism can also be exploited for MIMD models. Often the SPMD model (Single Program, Multiple Data) is used, which means that one parallel program is executed by all processors in parallel. Program execution is performed asynchronously by the participating processors. Using the SPMD model, data parallelism results if each processor gets a part of a data structure for which it is responsible. For example, each processor could get a part of an array, identified by a lower and an upper bound stored in private variables of the processor. The processor ID can be used to compute, for each processor, the part assigned to it. Different data distributions can be used for arrays; these are described in more detail in a later section. The program fragment reproduced at the end of this excerpt is part of an SPMD program that computes the scalar product of two vectors.

In practice, most parallel programs are SPMD programs, since they are usually easier to understand than general MIMD programs but provide enough expressiveness to formulate typical parallel computation patterns. In principle, each processor can execute a different program part, depending on its processor ID. Most parallel programs shown in the rest of the book are SPMD programs.

Data parallelism can be exploited for both shared and distributed address spaces. For a distributed address space, the program data must be distributed among the processors such that each processor can access the data that it needs for its computations directly from its local memory. The processor is then called the owner of its local data. Often, the distribution of data and computation is done in the same way, such that each processor performs the computations specified in the program on the data in its local memory. The scalar product fragment mentioned above begins as follows:

    local_size  = size / p;
    local_lower = me * local_size;
    local_upper = (me + 1) * local_size - 1;
    local_sum   = 0;
    for (i = ...
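The fragment above breaks off in the middle of the loop. As a hedged illustration of how such a block-wise SPMD scalar product can be completed, the following C sketch reuses the variable names from the fragment (me, p, size, local_lower, local_upper, local_sum) and adds a reduction over the partial sums. The use of MPI, the vector initialization, and the assumption that p evenly divides size are illustrative choices of this sketch, not details taken from the book's figure.

    /* Hedged sketch: SPMD scalar product with a block distribution.
       Variable names follow the fragment above; the MPI calls and the
       assumption that p divides size evenly are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
      int me, p, i;
      int size = 1024;                       /* global vector length (assumed) */
      double local_sum = 0.0, global_sum = 0.0;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &me);    /* processor ID */
      MPI_Comm_size(MPI_COMM_WORLD, &p);     /* number of processors */

      /* each process computes the bounds of its own block, as in the text */
      int local_size  = size / p;
      int local_lower = me * local_size;
      int local_upper = (me + 1) * local_size - 1;

      /* for simplicity, every process initializes the full vectors here;
         with a truly distributed address space only the local block
         would be stored by its owner */
      double *x = malloc(size * sizeof(double));
      double *y = malloc(size * sizeof(double));
      for (i = 0; i < size; i++) { x[i] = 1.0; y[i] = 2.0; }

      /* each process accumulates the partial sum over its assigned block */
      for (i = local_lower; i <= local_upper; i++)
        local_sum += x[i] * y[i];

      /* combine the partial sums; the result is available on every process */
      MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                    MPI_COMM_WORLD);

      if (me == 0)
        printf("scalar product = %f\n", global_sum);

      free(x); free(y);
      MPI_Finalize();
      return 0;
    }

Compiled with an MPI compiler wrapper (e.g., mpicc) and started with mpirun, every process executes this same program; only the value of me differs from process to process, which is exactly the SPMD style described above.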
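To make the difference mentioned at the beginning of this excerpt concrete, here is a small sketch in C. The book states the example with array sections such as a(0:n-1); an assignment of the form a(1:n) = a(0:n-1) + a(2:n+1) is assumed here (addition is chosen only for concreteness), and the function names array_assignment and sequential_loop are purely illustrative. The array assignment reads only old values of a, which the C version models with a temporary copy; the sequential loop instead reuses a[i-1], which was already overwritten in the preceding iteration.

    #include <string.h>

    /* Array-assignment semantics: all operands on the right-hand side are
       old values. a has n+2 elements (indices 0..n+1); tmp is a scratch
       buffer of the same length. */
    void array_assignment(double *a, int n, double *tmp) {
      memcpy(tmp, a, (n + 2) * sizeof(double));   /* snapshot of old values */
      for (int i = 1; i <= n; i++)
        a[i] = tmp[i - 1] + tmp[i + 1];
    }

    /* Sequential-loop semantics: for i >= 2, a[i-1] already holds the new
       value computed in the preceding iteration. */
    void sequential_loop(double *a, int n) {
      for (int i = 1; i <= n; i++)
        a[i] = a[i - 1] + a[i + 1];
    }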