Now, by definition, CD(k) = [CD(i) · D(i, k)] + [CD(j) · D(j, k)], and CD(i) = [CD(a) · D(a, i)] + [CD(b) · D(b, i)]. From the first property discussed for CD, CD(a) = FD(a s-a-0, a) = 1010, and by definition CD(b) = 0000. By substitution, and by similarly computing CD(i) and CD(j), we compute CD(k) = 0010.

The implementation of the computation of detectabilities and cumulative detectabilities differs between FSIM and GFTABLE, since in GFTABLE all of these computations are performed on the GPU, with every kernel launched with T threads. Thus a single kernel in GFTABLE computes T times more data than the corresponding computation in FSIM. In FSIM, the backtracing is performed in a topological manner from the output of the FFR to its inputs and is not scheduled for gates driving zero critical lines in the packet. We found that this pruning reduces the number of gate evaluations by 42% in FSIM, based on tests run on four benchmark circuits. In GFTABLE, however, T times more patterns are evaluated at once, and as a result no reduction in the number of scheduled gate evaluations was observed for the same four benchmarks. Hence, in GFTABLE we perform a brute-force backtracing on all gates in an FFR.

As an example, the pseudocode of the kernel which evaluates the cumulative detectability at the output k of a 2-input gate with inputs i and j is provided in Algorithm 11. The arguments to the kernel are the pointer to global memory CD, where cumulative detectabilities are stored; the pointer to global memory D, where detectabilities to the immediate dominator are stored; the gate_id of the gate being evaluated, k; and its two inputs, i and j. Let the thread's unique threadID be tx. The data in CD and D is indexed at locations (tx + i × T) and (tx + j × T), and the result, computed as per CD(k) = [CD(i) · D(i, k)] + [CD(j) · D(j, k)], is stored in CD at location (tx + k × T). Our implementation has a similar kernel for 2-, 3-, and 4-input gates in our library.

Algorithm 11 Pseudocode of the kernel to compute the cumulative detectability at the output k of a 2-input gate with inputs i and j.
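Since the body of Algorithm 11 is not reproduced here, the listing below gives a minimal CUDA sketch of the 2-input kernel described above. The kernel name, the launch configuration, the bound check against T, and the use of 32-bit unsigned words to hold a thread's packet of patterns are assumptions for illustration; only the (tx + gate_id × T) indexing and the bitwise evaluation of CD(k) = [CD(i) · D(i, k)] + [CD(j) · D(j, k)] follow the description in the text.

```cuda
// Hypothetical sketch: names and launch parameters are illustrative only.
__global__ void cd_kernel_2input(unsigned int *CD, // cumulative detectabilities
                                 unsigned int *D,  // detectabilities to the immediate dominator
                                 int k, int i, int j, // gate_id of the evaluated gate and its two inputs
                                 int T)               // number of threads launched
{
    int tx = blockIdx.x * blockDim.x + threadIdx.x;   // the thread's unique threadID
    if (tx >= T)
        return;

    // Fetch CD(i), CD(j) and the detectabilities D(i, k), D(j, k) for this
    // thread's packet of patterns; arrays are laid out as (tx + gate_id * T).
    unsigned int cd_i = CD[tx + i * T];
    unsigned int cd_j = CD[tx + j * T];
    unsigned int d_ik = D[tx + i * T];
    unsigned int d_jk = D[tx + j * T];

    // CD(k) = [CD(i) . D(i, k)] + [CD(j) . D(j, k)], with '.' as bitwise AND
    // and '+' as bitwise OR over the packet of patterns.
    CD[tx + k * T] = (cd_i & d_ik) | (cd_j & d_jk);
}
```

With this layout, threads with consecutive tx values access consecutive addresses of CD and D for a given gate, so the global-memory reads and writes are coalesced; the 3- and 4-input variants would differ only in the number of (CD, D) pairs that are ANDed and then ORed together.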