An Efficient Non-Blocking Data Cache for Soft Processors

Kaveh Aasaraai and Andreas Moshovos
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
{aasaraai, moshovos}@

Abstract-Soft processors often use data caches to reduce the gap between processor and main memory speeds. To achieve high efficiency, simple, blocking caches are used. Such caches are not appropriate for processor designs such as runahead and out-of-order execution, which require non-blocking caches to tolerate main memory latencies. Conventional non-blocking caches are expensive and slow on FPGAs because they use content-addressable memories (CAMs). This work exploits key properties of runahead execution and demonstrates an FPGA-friendly non-blocking cache design that does not require CAMs. A non-blocking 4KB cache operates at 329MHz on Stratix III FPGAs while using only 270 logic elements. A 32KB non-blocking cache operates at 278MHz and uses 269 logic elements.

Keywords-Soft Processor; Data Cache; Non-Blocking; Runahead

I. INTRODUCTION

Soft processors implemented over reconfigurable logic are increasingly being used in embedded system applications. Historically, applications evolve in their computation needs and structure.
Embedded applications are not immune to this trend. Accordingly, it is likely that soft processors will be called upon to execute applications with unstructured instruction-level parallelism. Previous work has shown that, for such programs, a 1-way out-of-order (OoO) processor in an FPGA environment has the potential to outperform a 2-way or even a 4-way superscalar processor [1]. Unfortunately, conventional OoO processor implementations are tuned for custom logic implementation and rely heavily on content-addressable memories, multiported register files, and wide multi-source and multi-destination datapaths. Such structures exhibit poor efficiency when implemented in an FPGA fabric. It is an open question whether it is possible to design an FPGA-friendly soft .
