A Data Flow Approach to Thread Caching
Contents
Hardware Model 4
Architecture for Latency Tolerance 5
Issues in Coarse Grain Dataflow on RISC Multiprocessors 22
Copyright