Compiling Parallel Loops for High Performance Computers: Partitioning, Data Assignment and Remapping

Front Cover
Springer Science & Business Media, Oct 31, 1992 - Computers - 159 pages
4. 2 Code Segments . . . . . . . . . . . . . . . 96 4. 3 Determining Communication Parameters . 99 4. 4 Multicast Communication Overhead 103 4. 5 Partitioning . . . . . . 103 4. 6 Experimental Results . 117 4. 7 Conclusion. . . . . . . 121 5 COLLECTIVE PARTITIONING AND REMAPPING FOR MULTIPLE LOOP NESTS 125 5. 1 Introduction. . . . . . . . . 125 5. 2 Program Enclosure Trees. . 128 5. 3 The CPR Algorithm . . 132 5. 4 Experimental Results. . 141 5. 5 Conclusion. . 146 BIBLIOGRAPHY. 149 INDEX . . . . . . . . 157 LIST OF FIGURES Figure 1. 1 The Butterfly Architecture. . . . . . . . . . 5 1. 2 Example of an iterative data-parallel loop . . 7 1. 3 Contiguous tiling and assignment of an iteration space. 13 2. 1 Communication along a line segment. . . 24 2. 2 Access pattern for the access offset, (3,2). 25 2. 3 Decomposing an access vector along an orthogonal basis set of vectors. . . . . . . . . . . . . . . . . . . 26 2. 4 An analysis of communication patterns. 29 2. 5 Decomposing a vector along two separate basis sets of vectors. 31 2. 6 Cache lines aligning with borders. 33 2. 7 Cache lines not aligned with borders. 34 2. 8 nh is the difference of nd and nb. 42 2. 9 nh is the sum of nd and nb. 42 2. 10 The ADAPT system. 44 2. 11 Code segment used in experiments. . 46 2. 12 Execution rates for various partitions. 47 2. 13 Execution time of partitions on Multimax. 48 2. 14 Performance increase as processing power increases. 49 2. 15 Percentage miss ratios for various aspect ratios and line sizes.
 

What people are saying - Write a review

We haven't found any reviews in the usual places.

Contents

INTRODUCTION
1
11 Model Assumptions
3
12 Related Work
10
13 Overview
13
CONTIGUOUS LOOP PARTITIONS FOR NEIGHBORHOOD COMMUNICATION
19
22 Quantifying Communication
21
23 Compensation for Cache Line Size
31
24 Partition Construction
35
37 Conclusion
90
CYCLIC LOOP PARTITIONS FOR LINEARLY VARYING LOOPS
93
42 Code Segments
96
43 Determining Communication Parameters
99
44 Multicast Communication Overhead
103
46 Experimental Results
117
47 Conclusion
121
COLLECTIVE PARTITIONING AND REMAPPING FOR MULTIPLE LOOP NESTS
125

25 Experimental Evaluation of ADP
44
26 Conclusion
53
CONTIGUOUS DATA ASSIGNMENTS FOR NEIGHBORHOOD COMMUNICATION
55
32 Data Assignments
58
33 Exploiting Overlap
68
34 Software Redundancy
71
35 ADAPT
73
36 Experimental Results
83
52 Program Enclosure Trees
128
53 The CPR Algorithm
132
54 Experimental Results
141
55 Conclusion
146
BIBLIOGRAPHY
149
INDEX
157
Copyright

Other editions - View all

Common terms and phrases

Popular passages

Page 153 - Steele, Jr. Data optimization: Allocation of arrays to reduce communication on SIMD machines.
Page 152 - K. Gallivan, W. Jalby and D. Gannon, "On the Problem of Optimizing Data Transfers for Complex Memory Systems,
Page 154 - J. Ramanujam and P. Sadayappan, "CompileTime Techniques for Data Distribution in Distributed Memory Machines" , IEEE Transactions on Parallel and Distributed Systems, Vol.
Page 154 - CD Polychronopoulos. On Program Restructuring, Scheduling, and Communication for Parallel Processor Systems.
Page 150 - D. Callahan and K. Kennedy. Compiling programs for distributed-memory multiprocessors.
Page 153 - A performance comparison of the IBM RS/6000 and the Astronautics ZS-1", IEEE Computer, January 1991. [25] Smith WM, Abraham SG, Davidson ES, "The Effects of Memory Latency and Fine-Grain Parallelism on Astronautics ZS-1 Performance", Proceedings of the 23rd Annual Hawaii Intl.
Page 152 - M. Gupta and P. Banerjee. Automatic data partitioning on distributed memory multiprocessors.

Bibliographic information