Analysis and Performance of Computer Instruction Sets

Stanford Linear Accelerator Center, Stanford University, 1978 - Computer architecture - 167 pages

Popular passages

Page 48 - The results presented here are derived from the analysis of seven benchmark jobs written at SLAC. Except for one (LINSY2) they were all production jobs written for purposes other than performance evaluation. To avoid biasing the results with artifacts from specific languages or programs, we purposely chose compilations by the three most-used language compilers, together with programs compiled by them. (1) FORTC is a compilation by the IBM Fortran-H optimizing compiler. (2) FORTGO is the execution of the FORTRAN program compiled...
Page 31 - ... 20%). Wherever possible, the number of I/O operations was reduced by increasing the file blocking factors, but we did not otherwise alter the operation of the programs. Despite this effort, the SVC time correction remained the factor which introduced the largest error in the measurements. We also added a FORTRAN numerical analysis program from which the I/O parts were excised, so that few supervisor services were requested. Since supervisor-state and user-state instructions share the same cache,...
Page 49 - Model validation. Verification basically consists of comparing the time predicted by our model for each benchmark job with the corrected real execution time. The time predicted for each benchmark, T_pred, consists of the following terms: T_ins, the total time predicted from the timing formulas, which does not include the cache miss penalty; and M × T_miss, where M is the number of cache misses as reported by the cache simulator, and T_miss is the cache miss penalty. The number of cache misses includes the...
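A minimal sketch of the prediction and validation step described in the Page 49 passage. It uses only the two terms quoted there (T_ins and M × T_miss); the excerpt is truncated, so the report's full T_pred may contain further terms. The `BenchmarkRun` fields and the example numbers are hypothetical, apart from the 480 nsec penalty, which is the 370/168 value used in the Page 30 passage.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """Hypothetical per-benchmark inputs; field names are illustrative, not from the report."""
    t_ins: float       # time predicted by the timing formulas, excluding cache misses (s)
    misses: int        # M: cache misses reported by the cache simulator
    t_miss: float      # cache miss penalty (s per miss)
    t_measured: float  # corrected real execution time (s)


def predicted_time(run: BenchmarkRun) -> float:
    # T_pred = T_ins + M * T_miss (only the terms visible in the quoted excerpt)
    return run.t_ins + run.misses * run.t_miss


def relative_error(run: BenchmarkRun) -> float:
    # Signed prediction error relative to the corrected measured time
    return (predicted_time(run) - run.t_measured) / run.t_measured


# Illustrative values only
run = BenchmarkRun(t_ins=10.0, misses=2_000_000, t_miss=480e-9, t_measured=11.0)
print(f"predicted {predicted_time(run):.2f} s, error {relative_error(run):+.2%}")
```

A negative error means the model underestimates the corrected measured time.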
Page 36 - ... of non-linear formulas are sufficiently infrequent to justify this special treatment, but the effect on timing values is too important to ignore them. A simpler approach would assume that the product of the averages is a sufficient estimate of the average product, but the potential error is great. The formulas are encoded as a string of records, each corresponding to the coefficient of a term in a subcase of a timing formula for a particular instruction; there are a total of 3200 variable names...
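The Page 36 passage describes timing formulas stored as records, one per coefficient of a term in a subcase of an instruction's formula. Here is a rough sketch of that idea, assuming each record names an instruction, a subcase, a variable (or none, for the constant term) and a coefficient, and that formulas are evaluated against per-variable averages gathered from the trace. The record layout, the variable name `mvc_len`, and the coefficients are hypothetical, not the report's actual encoding or timings.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class TermRecord(NamedTuple):
    instruction: str         # opcode mnemonic
    subcase: str             # subcase of the timing formula
    variable: Optional[str]  # variable name; None marks the constant term
    coefficient: float       # ns contributed per unit of the variable


def evaluate(records, averages):
    """Predicted time per (instruction, subcase): sum of coefficient * average(variable)."""
    totals = defaultdict(float)
    for r in records:
        value = 1.0 if r.variable is None else averages[r.variable]
        totals[(r.instruction, r.subcase)] += r.coefficient * value
    return dict(totals)


# Hypothetical formula: 120 ns fixed cost plus 15 ns per byte of operand length
records = [
    TermRecord("MVC", "no_overlap", None, 120.0),
    TermRecord("MVC", "no_overlap", "mvc_len", 15.0),
]
print(evaluate(records, {"mvc_len": 8.0}))
```

Because every term is linear, the formula can be evaluated once per subcase from averaged variables rather than once per executed instruction.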
Page 26 - ... of the trace data. One standard solution is the use of samples rather than complete traces, but then the verification of the predicted CPU time is nearly impossible. Since the timing formulas do not include the effects of cache memory misses, the cache memory is simulated for each machine. The cache penalty is added to the instruction execution time to obtain the expected program execution time. To verify the model the expected time is compared to the operating system accounting time corrected...
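The Page 26 passage says the cache is simulated for each machine because the timing formulas ignore misses. Below is a minimal miss-counting simulator, assuming a set-associative cache with LRU replacement driven by a trace of byte addresses. The geometry defaults are placeholders rather than the 370/168 or Amdahl configuration, and store-through behaviour is not modelled.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative cache with LRU replacement that only counts misses.
    Geometry (line size, sets, ways) is a placeholder, not a real machine's."""

    def __init__(self, line_size=32, num_sets=64, ways=4):
        self.line_size = line_size
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Reference one byte address; return True on a hit."""
        self.accesses += 1
        line = address // self.line_size
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)   # evict the least recently used line
        cache_set[tag] = True
        return False


cache = LRUCache(line_size=16, num_sets=1, ways=2)
for addr in [0, 4, 16, 32, 0]:
    cache.access(addr)
print(cache.misses, "misses out of", cache.accesses, "accesses")
```

The resulting miss count M, multiplied by the machine's miss penalty, is the correction added to the formula-predicted instruction time.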
Page 88 - ... can be composed, as illustrated by the byte instructions of the PDP-10. An immediate improvement could be obtained if compilers were to replace these instructions by faster equivalents when they are available, but this would require tailoring the compilers to specific models of the computer series. Cache Effects. The correction due to cache misses ranges from 1% to 5% for IBM, but from 3% to 19% for Amdahl, indicating that the memory subsystem is a major bottleneck for the Amdahl machine. In some...
Page 30 - ... the time penalty for the misses is too large to be neglected. If the miss ratio is 5%, with a 480 nsec penalty for a miss, 2 memory requests per instruction, and an average instruction execution time of 300 nsec (reasonable values for the 370/168), then the time for the cache misses represents 16% of the execution time. Two other cache organization features must be considered in the cache penalty correction. For IBM, stores always access main memory ("store-through"), which may cause extra delays....
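The 16% figure on Page 30 follows from simple arithmetic: 0.05 misses per request × 2 requests per instruction × 480 nsec gives 48 nsec of miss time per instruction, and 48 / 300 = 0.16. The sketch below reproduces that calculation. The function name is illustrative. Note that 16% is the ratio to the 300 nsec base instruction time; as a share of the total 348 nsec it would be about 14%.

```python
def cache_miss_fraction(miss_ratio, requests_per_instr, miss_penalty_ns, avg_instr_ns):
    """Miss time per instruction as a fraction of the average instruction execution time."""
    miss_time_per_instr = miss_ratio * requests_per_instr * miss_penalty_ns
    return miss_time_per_instr / avg_instr_ns


# Values quoted for the 370/168 in the passage
fraction = cache_miss_fraction(0.05, 2, 480, 300)
print(f"{fraction:.0%}")  # 16%
```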
Page 84 - ... for both operands in the range 1 to 256. One of the characteristics of these instructions that makes their implementation very difficult is that overlapped operands are allowed and must be treated a byte at a time. This allows, for example, a single byte to be propagated throughout a string by a move instruction whose destination address is one greater than the source address, since the fields are processed left to right. Lower-performance machines in the 370 family implement these instructions...
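The byte-propagation effect described on Page 84 (the classic use of the System/370 MVC instruction) can be reproduced with a byte-at-a-time, left-to-right copy. This is a sketch of the semantics only, assuming memory is modelled as a Python `bytearray`; `move_left_to_right` is a hypothetical name, not an implementation of any real machine's microcode. A bulk copy that reads the whole source before writing gives a different result when the operands overlap.

```python
def move_left_to_right(memory: bytearray, dest: int, src: int, length: int) -> None:
    """Copy `length` bytes one at a time, left to right. Each source byte is read
    after earlier stores have landed, so overlapping operands propagate data."""
    if not 1 <= length <= 256:
        raise ValueError("length must be in the range 1 to 256")
    for i in range(length):
        memory[dest + i] = memory[src + i]


mem = bytearray(b"*.......")
move_left_to_right(mem, dest=1, src=0, length=7)
print(mem)  # bytearray(b'********')

# A snapshot copy does not propagate the byte
snap = bytearray(b"*.......")
snap[1:8] = snap[0:7]
print(snap)  # bytearray(b'**......')
```

The difference between the two results is why overlapping moves cannot simply be handed to a wide, block-oriented data path.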
Page 83 - ... among the most frequent, they contribute much more to the CPU time than their frequency would suggest because of their long execution time. For the FORTGO program, for example, the 0.67% of instructions which are STM account for 6.66% of the IBM execution time and 4.59% of the Amdahl execution time. Character Instructions. The second group of storage-to-storage (SS) instructions are those which specify a source and destination location for a character string and a single length for both operands...
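The STM figures on Page 83 imply how expensive a single STM is relative to an average instruction. If an instruction makes up a fraction f of executed instructions and a fraction t of execution time, then t = f × T_stm / T_avg, so T_stm / T_avg = t / f. A short calculation with the quoted FORTGO numbers follows; the function name is illustrative.

```python
def relative_cost(freq_share: float, time_share: float) -> float:
    """Average time of one instance of an instruction, in units of the average
    instruction time, from its share of executions and its share of time."""
    return time_share / freq_share


ibm = relative_cost(0.67, 6.66)
amdahl = relative_cost(0.67, 4.59)
print(f"IBM: {ibm:.1f}x, Amdahl: {amdahl:.1f}x")  # IBM: 9.9x, Amdahl: 6.9x
```

So on these figures an STM costs roughly ten average instructions on the IBM machine and about seven on the Amdahl machine.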
Page 35 - ... linear in their variables. Typical examples are the decimal arithmetic instructions, where the duration depends on the product of the lengths or the average value of the digits used. For these we compute the appropriate products of variables at the time the program is analyzed, and average these values for use by the other programs in an equivalent linear form. These cases of non-linear formulas are sufficiently infrequent to justify this special treatment, but the effect on timing values is...
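The Page 35 and Page 36 passages explain why non-linear terms such as a product of operand lengths are averaged as products rather than approximated by the product of the averages. The sketch below uses hypothetical, strongly correlated operand lengths (short with short, long with long) for one decimal instruction. It shows how far apart the two estimates can be, and how the averaged product can then serve as a single synthetic variable in an equivalent linear formula. The data and the per-unit coefficient are illustrative only.

```python
from statistics import mean

# Hypothetical (len1, len2) operand lengths for successive executions of one
# decimal instruction whose time depends on len1 * len2; illustrative only.
executions = [(1, 1), (16, 16)]

# Computed at analysis time: the average of the per-execution products
avg_product = mean(a * b for a, b in executions)

# The simpler, riskier approximation rejected in the passage
product_of_avgs = mean(a for a, _ in executions) * mean(b for _, b in executions)

print(avg_product, product_of_avgs)  # 128.5 72.25

# Equivalent linear form: the averaged product becomes one variable with its own
# coefficient (a hypothetical 10 ns per unit of len1 * len2).
coefficient_ns = 10.0
print(coefficient_ns * avg_product)  # 1285.0
```

Here the product of averages underestimates the product term by more than 40%, which illustrates the "potential error" the passage warns about.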
