This track contained several interesting talks, summarized below, which were not published in the ParCo Proceedings.
However, copies of the slides from several of these presentations
can be found on the web at
http://www.idi.ntnu.no/~elster/eurogpu09
and eventually also at
http://www.eurogpu.org/eurogpu09
OpenCL, a new standard for GPU programming
(PDF of slides)
by Francois Bodin (CAPS entreprise) generated a lot of interest.
His presentation gave an overview of OpenCL for programming
Graphics Processing Units (GPUs). OpenCL is an initiative launched
by Apple to ensure application portability across various types of GPUs.
It aims at being an open standard (royalty free and vendor neutral)
developed by the Khronos OpenCL working group
(http://www.khronos.org).
OpenCL, which is based on ISO C99,
shares many features with CUDA and exposes both data and task parallelism.
Parallel Hybrid Computing
(PDF of slides)
by Stephane Bihan (CAPS entreprise, Rennes) presented HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows heterogeneous hardware accelerators to be integrated in a non-intrusive manner while preserving legacy codes.
Efficient Use of Hybrid Computing Clusters for Nanosciences
(PDF of slides)
by Luigi Genovese (ESRF, Grenoble), Matthieu Ospici (BULL, UJF/LIG, CEA, Grenoble), Jean-Francois Mehaut (UJF/INRIA, Grenoble), and Thierry Deutsch (CEA, Grenoble) included their study of the programming and utilization of hybrid clusters in the field of computational physics. These massively parallel computers are composed of a fast network (Infiniband) connecting classical nodes with multicore Intel processors and accelerators; in their case, the accelerators were GPUs from NVIDIA. They first analyzed ways to use CPU cores and GPUs together efficiently in a code without hotspot routines (BigDFT, http://inac.cea.fr/L_Sim/BigDFT). Starting from this analysis, they designed a new library, S_GPU, used to share GPUs between the CPU cores of a node. The implementation and usage of S_GPU were described. They then evaluated and compared the performance of S_GPU against other approaches to sharing GPUs among CPU cores. This performance evaluation was based on BigDFT, an ab-initio simulation code designed to take advantage of massively parallel hybrid clusters such as the Titane cluster (CCRT). Their experiments were performed both on a single hybrid node and on a large number of nodes of their hybrid cluster.
Cosmological Reionisation Powered by Multi-GPUs
(PDF of slides)
by Dominique Aubert (Universite de Strasbourg) and Romain Teyssier (CEA) took the simulated distribution of gas and stars in the early Universe and modelled the propagation of ionising radiation and its effect on the gas. This modeling will help in understanding upcoming radio observations and the impact of this first stellar light on the formation of galaxies. Their code explicitly solves a set of conservative equations on a fixed grid, in a manner similar to hydrodynamics, and follows the evolution of a fluid made of photons. However, because typical velocities are close to the speed of light, the stringent CFL condition implies that a very large number of timesteps must be computed, making the code intrinsically slow. They ported it to a GPU architecture using CUDA, which accelerated the code by a factor close to 80. Furthermore, by adding an MPI layer, they expanded it to a multi-GPU version. CUDATON is currently running on 128 GPUs installed on the new CCRT supercomputer of the French atomic agency (CEA). The code is able to perform 60,000 timesteps on a 1024³ grid in 2.5 hours (elapsed). For comparison, the largest calculation made so far on the same topic involved a 400³ grid and required 11,000 cores. Such a boost in performance demonstrates the relevance of multi-GPU calculations for computational cosmology. It also opens bright perspectives for a systematic exploration of the impact of physical ingredients on high-resolution simulations, since these calculations are extremely fast to complete.
Accelerating a Depth Imaging Seismic Application on GPUs:
Status and Perspectives
(NO PDF -- abstract below only -- please contact author for more info)
by Henri Calandra (TOTAL, Pau) described the extraordinary challenge that the oil and gas industry must face for hydrocarbon exploration, which requires the development of leading-edge technologies to recover an accurate representation of the subsurface. Seismic modeling and Reverse Time Migration (RTM), based on the full wave equation discretization, are tools of major importance since they give an accurate representation of complex wave propagation areas. Unfortunately, they are highly compute intensive. He first presented the challenges in O&G exploration and the computing power needed for solving seismic depth imaging problems. He then showed how GPUs can be part of the solution, and the solutions developed at TOTAL.
Debugging for GPUs with DDT
(PDF of slides)
by David Lecomber (Allinea Ltd, Bristol, UK) described how developers are experimenting with CUDA, OpenCL and others to port (or rewrite) their code to take advantage of this technology, but are discovering there is more to programming than writing code. Finding bugs and optimizing performance are essential tasks - particularly so with a new and complex model of program execution. He reviewed the state of play - exploring what is possible now, and what is being done by Allinea and others to improve the lot of GPU developers who need to debug or optimize their codes.