Chapter04-ILP3

Computer Architecture
Chapter 4: Instruction-Level Parallelism - 3
Prof. Jerry Breecher
CS 240, Fall 2003
Chap. 4 - ILP 3

Chapter Overview
  Compiler Techniques for Exposing ILP
  Static Branch Prediction
  Static Multiple Issue: VLIW
  Advanced Compiler Support for ILP
  Hardware Support for Exposing More Parallelism

Ideas To Reduce Stalls [table: Chapter 3 vs. Chapter 4 techniques]

Instruction-Level Parallelism
  Compiler Techniques for Exposing ILP
  Static Multiple Issue: VLIW
  Advanced Compiler Support for ILP
  Hardware Support for Exposing More Parallelism
How can compilers recognize and take advantage of ILP?

Simple Loop and its Assembler Equivalent
for (i=1; i
Loop: LD    F0,0(R1)   ;F0=vector element
      ADDD  F4,F0,F2   ;add scalar in F2
      SD    0(R1),F4   ;store result
      SUBI  R1,R1,8    ;decrement pointer 8B (DW)
      BNEZ  R1,Loop    ;branch R1!=zero
      NOP              ;delayed branch slot

FP Loop Hazards
Compilers and ILP: Pipeline Scheduling and Loop Unrolling

Instruction producing result | Instruction using result | Latency in clock cycles
FP ALU op                    | Another FP ALU op        | 3
FP ALU op                    | Store double             | 2
Load double                  | FP ALU op                | 1
Load double                  | Store double             | 0
Integer op                   | Integer op               | 0

Where are the stalls?

FP Loop Showing Stalls
10 clocks: Rewrite code to minimize stalls?
 1  Loop: LD    F0,0(R1)   ;F0=vector element
 2        stall
 3        ADDD  F4,F0,F2   ;add scalar in F2
 4        stall
 5        stall
 6        SD    0(R1),F4   ;store result
 7        SUBI  R1,R1,8    ;decrement pointer 8Byte (DW)
 8        stall
 9        BNEZ  R1,Loop    ;branch R1!=zero
10        stall            ;delayed branch slot
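The 10-clock count in the stall listing can be reproduced with a small model. The sketch below is a minimal C model of one loop iteration, assuming an in-order, single-issue pipeline in which a dependent instruction issues at least latency + 1 cycles after its producer, with the latencies taken from the table. Two entries are assumptions not stated in the table: an integer op feeding a branch is given a latency of 1 (this matches the single stall shown between SUBI and BNEZ), and the unfilled delayed branch slot is modeled as a NOP. All names (OpClass, Instr, latency, schedule, fp_loop, report_fp_loop) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Instruction classes used in the latency table (names are hypothetical). */
typedef enum { FP_ALU, LOAD_D, STORE_D, INT_OP, BRANCH, NOP_OP } OpClass;

/* Cycles that must separate a producer from a dependent consumer.
   The first five cases come from the latency table; INT_OP -> BRANCH = 1
   is an assumption that reproduces the stall shown before BNEZ. */
static int latency(OpClass producer, OpClass consumer) {
    if (producer == FP_ALU && consumer == FP_ALU)  return 3;
    if (producer == FP_ALU && consumer == STORE_D) return 2;
    if (producer == LOAD_D && consumer == FP_ALU)  return 1;
    if (producer == LOAD_D && consumer == STORE_D) return 0;
    if (producer == INT_OP && consumer == INT_OP)  return 0;
    if (producer == INT_OP && consumer == BRANCH)  return 1;  /* assumption */
    return 0;  /* pairs not covered by the table: assume no extra delay */
}

typedef struct {
    const char *text;  /* assembly text, for printing only */
    OpClass cls;       /* instruction class */
    int dep;           /* index of the instruction whose result it reads, or -1 */
} Instr;

/* In-order, single-issue model: instruction i issues at least one cycle
   after instruction i-1 and at least latency+1 cycles after its producer.
   Fills issue[] with 1-based issue cycles and returns the last one. */
static int schedule(const Instr *code, int n, int *issue) {
    for (int i = 0; i < n; i++) {
        int t = (i == 0) ? 1 : issue[i - 1] + 1;
        int p = code[i].dep;
        assert(p < i);  /* a producer must come earlier in program order */
        if (p >= 0) {
            int ready = issue[p] + latency(code[p].cls, code[i].cls) + 1;
            if (ready > t)
                t = ready;
        }
        issue[i] = t;
    }
    return n > 0 ? issue[n - 1] : 0;
}

/* One iteration of the FP loop; dep indices encode the RAW dependences. */
static const Instr fp_loop[] = {
    {"LD   F0,0(R1)", LOAD_D,  -1},
    {"ADDD F4,F0,F2", FP_ALU,   0},  /* reads F0 from LD   */
    {"SD   0(R1),F4", STORE_D,  1},  /* reads F4 from ADDD */
    {"SUBI R1,R1,8",  INT_OP,  -1},
    {"BNEZ R1,Loop",  BRANCH,   3},  /* reads R1 from SUBI */
    {"NOP",           NOP_OP,  -1},  /* unfilled delayed branch slot */
};
#define FP_LOOP_LEN ((int)(sizeof fp_loop / sizeof fp_loop[0]))

/* Prints each instruction with its issue cycle; returns the total clocks. */
int report_fp_loop(void) {
    int issue[FP_LOOP_LEN];
    int total = schedule(fp_loop, FP_LOOP_LEN, issue);
    for (int i = 0; i < FP_LOOP_LEN; i++)
        printf("%2d  %s\n", issue[i], fp_loop[i].text);
    return total;
}
```

Under these assumptions the instructions issue in cycles 1, 3, 6, 7, 9 and 10, so the gaps at 2, 4, 5 and 8 are the stalls in the listing. The model checks only read-after-write stalls within one iteration and tracks a single producer per instruction; it does not verify that a reordered sequence still computes the same result.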

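The DLX loop in "Simple Loop and its Assembler Equivalent" adds a scalar to every element of an array of doubles, walking from the last element down to the first (R1 steps down by 8 bytes, one double word, per iteration). The C sketch below mirrors that behavior, assuming the element count n is passed in as a parameter; the function names add_scalar and add_scalar_unrolled4 are hypothetical. The second function is a source-level analogue of the loop unrolling named in the section heading (unrolled by 4, with a cleanup loop), not a claim about what the compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: one element per iteration, walking from the last element
   down to the first, as the DLX loop does by stepping R1 down 8 bytes. */
void add_scalar(double *x, size_t n, double s) {
    assert(x != NULL || n == 0);
    for (size_t i = n; i > 0; i--)
        x[i - 1] = x[i - 1] + s;  /* LD, ADDD, SD in the DLX version */
}

/* Unrolled by 4: one loop test and one index update per four elements.
   The cleanup loop handles the n % 4 elements left at the low end. */
void add_scalar_unrolled4(double *x, size_t n, double s) {
    assert(x != NULL || n == 0);
    size_t i = n;
    while (i >= 4) {
        x[i - 1] = x[i - 1] + s;
        x[i - 2] = x[i - 2] + s;
        x[i - 3] = x[i - 3] + s;
        x[i - 4] = x[i - 4] + s;
        i -= 4;
    }
    while (i > 0) {
        i--;
        x[i] = x[i] + s;
    }
}
```

Both functions produce the same array. Unrolling by itself only removes loop overhead; the stall reduction comes from then scheduling the independent loads and adds of different elements into the latency gaps.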