High-Performance Computing: 6th International Symposium, ISHPC 2005, Nara, Japan, September 7-9, 2005, First International Workshop on Advanced Low Power Systems, ALPS 2006, Revised Selected Papers

This is the joint post-proceedings of the 6th International Symposium on High Performance Computing (ISHPC-VI) and the First International Workshop on Advanced Low Power Systems 2006 (ALPS2006). The post-proceedings also contain the papers presented at the Second HPF International Workshop: Experiences and Progress (HiWEP2005) and the Workshop on Applications for PetaFLOPS Computing (APC2005), which are workshops of ISHPC-VI. ISHPC-VI, HiWEP2005 and APC2005 were held in Nara, Japan during September 7–9, 2005. Fifty-eight papers from 11 countries were submitted to ISHPC-VI. After reviewing the submitted papers, the ISHPC-VI Program Committee selected 15 regular (12-page) papers for oral presentation. In addition, several other papers with favorable reviews were recommended for poster presentation, and 14 short (8-page) papers were also selected. Twenty-eight of the 29 ISHPC-VI papers are contained in the post-proceedings. HiWEP2005 and APC2005 received eight and ten submissions, respectively, of which six and eight papers were accepted for oral presentation after review. All the HiWEP2005 and APC2005 papers are included in the post-proceedings. ALPS2006 was held in Cairns, Australia on July 1, 2006 in conjunction with the ACM 20th International Conference on Supercomputing. Fifteen papers were submitted, and eight were accepted for oral presentation. The post-proceedings contain six of the eight papers.
Contents
Multiple Stream Prediction | 1 |
A Compiler Technique for Transforming Nonuniform Iteration Spaces | 17 |
Folding Active List for High Performance and Low Power | 33 |
Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures | 43 |
Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor | 56 |
Decoupled State-Execute Architecture | 68 |
A Scalable Methodology for Computing Fault-Free Paths in InfiniBand Torus Networks | 79 |
Using a Way Cache to Improve Performance of Set-Associative Caches | 93 |
Development of an Interactive Visual Data Mining System for Atmospheric Science | 279 |
A Calculus Effectively Performing Event Formation with Visualization | 287 |
A Similarity Evaluation Method for Volume Data Sets by Using Critical Point Graph | 295 |
Hybrid Parallelization and Flat Parallelization in HPF (High Performance Fortran) | 305 |
Mapping Normalization Technique on the HPF Compiler fhpf | 315 |
Development of Electromagnetic Particle Simulation Code in an Open System | 329 |
Development of Three-Dimensional Neoclassical Transport Simulation Code with High Performance Fortran on a Vector-Parallel Computer | 344 |
Distributed Parallelization of Exact Charge Conservative Particle Simulation Code by High Performance Fortran | 358 |
Design of Fast Collective Communication Functions on Clustered Workstations with Ethernet and Myrinet | 105 |
Dynamic Load Balancing in MPI Jobs | 117 |
Workload Characterization of Stateful Networking Applications | 130 |
Using Recursion to Boost ATLAS's Performance | 142 |
Towards Generic Solver of Combinatorial Optimization Problems with Autonomous Agents in P2P Networks | 152 |
New Evaluation Index of Incomplete Cholesky Preconditioning Effect | 164 |
A Topological Approach to Visual Exploration of Time-Varying Volume Data | 176 |
Cross-Line: A Globally Adaptive Control Method of Interconnection Network | 191 |
The Bandwidth Expansion Effectiveness of Cache Levels Block Prefetch | 199 |
Implementation and Evaluation of the Mechanisms for Low Latency Communication on DIMMnet-2 | 211 |
Computationally Efficient Parallel Matrix-Matrix Multiplication on the Torus | 219 |
A New Dynamic Load Balancing Technique for Parallel Modified PrefixSpan with Distributed Worker Paradigm and Its Performance Evaluation | 227 |
Performance-Based Loop Scheduling on Grid Environments | 238 |
Reconfigurable Middleware for Grid Environment | 246 |
An Enhanced Stream-Based Communication Mechanism | 254 |
Performance of Coupled Parallel Finite Element Analysis in Grid Computing Environment | 262 |
Photo-Realistic Visualization for the Blast Wave of TNT Explosion by Grid-Based Rendering | 271 |
Pipelined Parallelization in HPF Programs on the Earth Simulator | 365 |
Sampling of Protein Conformations with Computers to Predict the Native Structure | 374 |
Spacecraft Plasma Environment Analysis Via Large Scale 3D Plasma Particle Simulation | 383 |
PetaFLOPS Computing and Computational Nanotechnology on Industrial Issues | 393 |
Exact Diagonalization for Ultra Large-scale Hamiltonian Matrix | 402 |
Numerical Simulation of Combustion Dynamics at ISTA/JAXA | 414 |
Realization of a Computer Simulation Environment Based on ITBL and a Large Scale GW Calculation Performed on This Platform | 427 |
Computations of Global Seismic Wave Propagation in Three Dimensional Earth Model | 434 |
Lattice QCD Simulations as an HPC Challenge | 444 |
Energy-Efficient Embedded System Design at 90nm and Below: A System-Level Perspective | 452 |
Empirical Study for Optimization of Power-Performance with On-Chip Memory | 466 |
Performance Evaluation of Compiler Controlled Power Saving Scheme | 480 |
Program Phase Detection Based Dynamic Control Mechanisms for Pipeline Stage Unification Adoption | 494 |
Reducing Energy in Instruction Caches by Using Multiple Line Buffers with Prediction | 508 |
522 | |