Portability and performance for parallel processing
A portable high-performance program cannot be oblivious to the machine on which it runs. Yet unless software portability increases, parallel processing will remain too expensive to deliver on its performance promises. Software portability and the performance of hardware and communication are the key indicators of the success of parallelism in computing, and these two indicators are often in conflict. By bringing together different approaches to resolving this tension in one volume, the book gives developers on both sides of the software/hardware divide insight into the problem and provides a critical debate for those measuring, modelling and investing in high-performance technology.
An Architecture Independent Programming Model for
from Theory to Practice
A Practical Shared-Memory Programming Perfor