Hardware and Software Mechanisms for Reducing Load Latency

University of Wisconsin--Madison, 1996 - Computer architecture - 372 pages
Abstract: "As processor demands quickly outpace memory, the performance of load instructions becomes an increasingly critical component to good system performance. This thesis contributes four novel load latency reduction techniques, each targeting a different component of load latency: address calculation, data cache access, address translation, and data cache misses. The contributed techniques are as follows: Fast Address Calculation employs a stateless set index predictor to allow address calculation to overlap with data cache access. The design eliminates the latency of address calculation for many loads. Zero-Cycle Loads combine fast address calculation with an early-issue mechanism to produce pipeline designs capable of hiding the latency of many loads that hit in the data cache. High-Bandwidth Address Translation develops address translation mechanisms with better latency and area characteristics than a multi-ported TLB. The new designs provide multiple-issue processors with effective alternatives for keeping address translation off the critical path of data cache access. Cache-conscious Data Placement is a profile- guided data placement optimization for reducing the frequency of data cache misses. The approach employs heuristic algorithms to find variable placement solutions that decrease inter-variable conflict, and increase cache line utilization and block prefetch. Detailed design descriptions and experimental evaluations are provided for each approach, confirming the designs as cost-effective and practical solutions for reducting load latency."


Contents

High-Bandwidth Address Translation    10
Experimental Framework    13
Fast Address Calculation    19

10 other sections not shown

