## High Performance Computing for Computational Science - VECPAR 2008: 8th International Conference, Toulouse, France, June 24-27, 2008. Revised Selected Papers

José M. Laginha M. Palma, Patrick Amestoy, Michel Daydé, Marta Mattoso, Joao Correira Lopes

VECPAR is an international conference series dedicated to the promotion and advancement of all aspects of high-performance computing for computational science, as an industrial technique and academic discipline, extending the frontier of both the state of the art and the state of practice. The audience and participants of VECPAR are researchers in academic departments, government laboratories and industrial organizations. There is now a permanent website for the conference series at http://vecpar.fe.up.pt where the history of the conference is described.

The 8th edition of VECPAR was organized in Toulouse (France), June 24-27, 2008. It was the third time the conference was held outside Porto, after Valencia (Spain) in 2004 and Rio de Janeiro (Brazil) in 2006.

The conference programme consisted of 6 invited talks and 53 accepted papers out of 73 contributions that were initially submitted. The major themes are divided into:

- Large-Scale Simulations and Numerical Algorithms in Computer Science and Engineering (aerospace, earth, environment, finance, geoscience)
- Computing in Healthcare and Biosciences
- Multiscale and Multiphysics Problems
- Cluster Computing and Grid Computing
- Data-Centric and High-Productivity Computing
- Cooperative Engineering
- Problem-Solving Environments
- Imaging and Graphics

Two workshops, in addition to tutorials, were organized before the conference: HPDGrid 2008 (International Workshop on High-Performance Data Management in Grid Environments) on June 24, and Sparse Days at CERFACS on June 23 and 24.

The most significant contributions are made available in the present book, edited after the conference and after a second review of all orally presented papers at VECPAR 2008 and at the workshops.

### Contents

- An Overview of High Performance Computing and Challenges for the Future
- Parallelization of Sphere-Decoding Methods
- Solver Using Optimized Libraries and Parallel Computation
- Parallelisation of the CFD Code of a CFD-NWP Coupled System for the Simulation of Atmospheric Flows over Complex Terrain
- Parallelization of the Hamiltonian Matrix-Vector Multiplication
- Application to Linear Solvers
- The Rise of the Commodity Vectors
- MOPS: A Morphodynamical Prediction System on Cluster Computers
- Implementing a Parallel NetCDF Interface for Seamless Remote I/O Using Multidimensional Data
- Vectorized AES Core for High-throughput Secure Environments
- A Matrix Inversion Method with YML/OmniRPC on a Large Scale Platform
- Can We Connect Existing Production Grids into a World Wide Grid?
- A List Scheduling Algorithm for Scheduling Multi-user Jobs on Clusters
- Data Locality Aware Strategy for Two-Phase Collective I/O
- A Grid-Aware Web Portal with Advanced Service Trading for Linear Algebra Calculations
- Resource Matching in Non-dedicated Multicluster Environments
- A Parallel Incremental Learning Algorithm for Neural Networks with Fault Tolerance
- High-Performance Query Processing of a Real-World OLAP Database with ParGRES
- Improving Search Engines Performance on Multithreading Processors
- Accomplishments and Challenges in Code Development for Parallel and Multimechanics Simulations
- An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
- for Detecting the Global Convergence in Asynchronous Iterative Algorithms
- A Parallel Implementation of the Trace Minimization Eigensolver
- A Load Balancing Knapsack Algorithm for Parallel Fuzzy c-Means Cluster Analysis
- Scalable Parallel 3d FFTs for Electronic Structure Codes
- Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms
- A Parallel Matrix Scaling Algorithm
- Design, Tuning and Evaluation of Parallel Multilevel ILU Preconditioners
- On the I/O Volume in Out-of-Core Multifrontal Methods with a Flexible Allocation Scheme
- Parallel Eigensolvers for a Discretized Radiative Transfer Problem
- High Performance Computing and the Progress of Weather and Climate Forecasting
- Asteroid Based upon an Unstructured MPI Spectral-Element Method: Blocking and Non-blocking Communication Strategies
- A Simulation of Seismic Wave Propagation at High Resolution in the Inner Core of the Earth on 2166 Processors of MareNostrum
- Using a Global Parameter for Gaussian Affinity Matrices in Spectral Clustering
- Comparing Some Methods and Preconditioners for Hydropower Plant Flow Simulations
- Computational Models of the Human Body for Medical Image Analysis
- Attaining High Performance in General-Purpose Computations on Current Graphics Processors
- Optimised Computational Functional Imaging for Arteries
- Memory Locality Exploitation Strategies for FFT on the CUDA Architecture
- Large Eddy Simulation of Combustion on Massively Parallel Machines
- Variational Multiscale LES and Hybrid RANS/LES Parallel Simulation of Complex Unsteady Flows
- Vortex Methods for Massively Parallel Computer Architectures
- On the Implementation of Boundary Element Engineering Codes on the Cell Broadband Engine
- Open Problems and New Issues from the P2P Perspective
- Data Management Concerns in a Pervasive Grid
- Distributed Transaction Routing in a Large Scale Network
- An Efficient Fine-Grain Data Access Scheme
- BLAST Distributed Execution on Partitioned Databases with Primary Fragments
- Testing Architectures for Large Scale Systems
- Data Replication and the Storage Capacity of Data Grids
- Text Mining Grid Services for Multiple Environments
- Using Stemming Algorithms on a Grid Environment
