Hardware and Software Mechanisms for Reducing Load Latency

Front Cover
University of Wisconsin--Madison, 1996 - Computer storage devices - 372 pages
Abstract: "As processor demands quickly outpace memory, the performance of load instructions becomes an increasingly critical component to good system performance. This thesis contributes four novel load latency reduction techniques, each targeting a different component of load latency: address calculation, data cache access, address translation, and data cache misses. The contributed techniques are as follows: Fast Address Calculation employs a stateless set index predictor to allow address calculation to overlap with data cache access. The design eliminates the latency of address calculation for many loads. Zero-Cycle Loads combine fast address calculation with an early-issue mechanism to produce pipeline designs capable of hiding the latency of many loads that hit in the data cache. High-Bandwidth Address Translation develops address translation mechanisms with better latency and area characteristics than a multi-ported TLB. The new designs provide multiple-issue processors with effective alternatives for keeping address translation off the critical path of data cache access. Cache-conscious Data Placement is a profile- guided data placement optimization for reducing the frequency of data cache misses. The approach employs heuristic algorithms to find variable placement solutions that decrease inter-variable conflict, and increase cache line utilization and block prefetch. Detailed design descriptions and experimental evaluations are provided for each approach, confirming the designs as cost-effective and practical solutions for reducting load latency."

From inside the book

What people are saying - Write a review

We haven't found any reviews in the usual places.


Experimental Framework
Fast Address Calculation
ZeroCycle Loads

6 other sections not shown

Common terms and phrases

Bibliographic information