NOTUR Emerging Technologies:
THE CLUSTER PROJECT -- 2003

NOTE: This web page/report is a live document, i.e., comments may continually be added.


Old snapshot of IDI's
research cluster "ClustIs"

Our stand at NOTUR 2003, Oslo


Project Highlights/Table of Contents:

  • Project leader: Anne C. Elster
  • Participants: Researchers, students and staff at NTNU and the University of Tromsø. See the list of people who were involved.
  • Time-line: November 2002-January 2004
  • Budget: NOK 1 million
  • Background
  • Project Goal
  • Project Description
  • Organization
  • List of Activities
  • Summary (including milestones and budget)
  • Links to project documents and reports.

    Background

    NOTUR (The Norwegian High Performance Computing Consortium) has a subprogram entitled Emerging Technologies (ET) that has now been split into two smaller programs: Cluster Technologies and Grid Technologies.

    This web page is dedicated to the accomplishments within the subproject Cluster Technologies.

    Cluster technologies are here defined as the technologies that enable a group of independent computers (e.g. PCs, workstations, SMPs) to work together as a single distributed-memory system. Since traditional special-purpose hardware for compute servers is generally considered much more expensive than building and scaling up cluster systems that use general-purpose parts (with comparable amounts of memory and CPU power), cluster systems are attractive as potential compute servers for future high performance computing (HPC) applications.

    Project Goal

    The goal of this project was to analyze the suitability of cluster technologies for HPC in the context of NOTUR. The results provide a foundation for decisions regarding future HPC programs.

    Project Description

    This project profiled and analyzed some of the most interesting NOTUR applications to see how well they may port to future compute-oriented clusters. We looked at various kinds of clusters (e.g. of PCs, workstations, or SMPs) and how usable each may be as a dedicated application server, potential display server, etc., compared to a more traditional supercomputer system with high-bandwidth interconnects and a single system image (i.e., supercomputing systems with shared-memory addressing).

    Some general issues considered:

    Evaluation of new algorithms and methods with respect to future resources, as well as numerical testing of generic operations, was included. We also looked at cluster-related tools, including furthering our current work on execution monitoring for clusters. Security, stability and operational cost issues were discussed.

    Based on our findings, we sought cooperation with relevant cluster activities in Norway and elsewhere where appropriate, for instance regarding the exchange of computer resources to obtain more diverse test beds (e.g., NTNU exchanged cycles with Linköping). Ties to the Grid Technology program were also established.

    Organization

    This project was a collaboration between researchers and Computing Center personnel at NTNU and the University of Tromsø (UiT).
    See the list of people who were involved.

    The project leader reported to the project leader of NOTUR.

    Activities

    This project included the following activities:

    A1 Profiling and tuning of selected applications

    By looking at how well selected HPC applications that currently run on large SMPs port to current clusters, we can predict how well future clusters may replace top-of-the-line SMP compute facilities.

    A1.1 Physics and Chemistry (Protomol and PICs codes)

    Primary participants: Anne C. Elster and her students (Computer Science/NTNU)

    Budget: NOK 100.000 for summer student support.

    Paul Sack, a former undergraduate student of Elster's at The University of Texas at Austin, ported a physics code to a cluster during the summer of 2002 as part of the precursor to this project. (His work was financed through the Computing Center (ITEA) at NTNU.) His contribution led to a talk and a report that highlight some of the difficulties associated with porting such code to cluster systems.

    This activity was an extension of that work.

    The applications were selected based on the interest in them within the application community and on our access to the code authors.

    Elster and her student Åsmund Østvold looked at porting Protomol, a molecular dynamics code currently running on our HPC systems and parallelized by colleagues in Bergen. Several challenges were uncovered, including the difficulty of porting a code that uses one-sided communication routines such as MPI_Put and MPI_Get, which rely on DMA (direct memory access) features not present on clusters (at least not yet). Timing issues were also uncovered. Details of our findings can be found in Østvold's report "Porting of Protomol from SMP to a Computational Cluster" (in Norwegian only).
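    To make the porting issue concrete, the sketch below contrasts a one-sided exchange of the kind just described with an equivalent two-sided rewrite. This is a minimal illustration written for this report under our own naming (the halo-exchange framing and function names are assumptions), not code from Protomol:

        #include <mpi.h>

        /* Illustrative only, NOT Protomol code. One-sided MPI_Put assumes
           remote-memory (DMA-like) support that early cluster MPI
           implementations lacked, which is what made the port difficult. */
        void halo_put(double *local, int n, int neighbor, MPI_Win win)
        {
            MPI_Win_fence(0, win);
            /* Write n boundary values directly into the neighbor's window. */
            MPI_Put(local, n, MPI_DOUBLE, neighbor, 0, n, MPI_DOUBLE, win);
            MPI_Win_fence(0, win);
        }

        /* Portable two-sided rewrite that any cluster MPI can run. */
        void halo_sendrecv(double *local, double *recv, int n, int neighbor)
        {
            MPI_Sendrecv(local, n, MPI_DOUBLE, neighbor, 0,
                         recv,  n, MPI_DOUBLE, neighbor, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }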

    See links to cluster project documents and reports.

    Elster and her students Snorre Boasson and Jan Christian Meyer also looked at a PIC (Particle-in-Cell) code, an electrostatic code that Elster originally wrote for an SMP machine and that was rewritten using MPI for clusters as part of this project.
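    As a hedged sketch of what such a rewrite involves, the fragment below shows the particle-migration step of a 1D domain-decomposed electrostatic PIC code: after the push phase, particles that have drifted out of a rank's subdomain are handed to the neighboring rank. The data layout, buffer sizes and function names are illustrative assumptions, not taken from the project code:

        #include <mpi.h>

        typedef struct { double x, v; } Particle;   /* position, velocity */
        #define MAXOUT 1024                         /* illustrative buffer cap */

        /* Move particles that left [xmin, xmax) to the neighboring ranks.
           Edge ranks can pass MPI_PROC_NULL; p must have spare room for
           the incoming particles, which are appended at p + *np. */
        void migrate(Particle *p, int *np, double xmin, double xmax,
                     int left, int right, MPI_Comm comm)
        {
            Particle out_l[MAXOUT], out_r[MAXOUT];
            int nl = 0, nr = 0, keep = 0, nin;
            MPI_Status st;

            for (int i = 0; i < *np; i++) {
                if (p[i].x < xmin)       out_l[nl++] = p[i];
                else if (p[i].x >= xmax) out_r[nr++] = p[i];
                else                     p[keep++] = p[i];
            }
            *np = keep;

            /* Send left, receive from right; then the mirror exchange. */
            MPI_Sendrecv(out_l, (int)(nl * sizeof(Particle)), MPI_BYTE, left, 0,
                         p + *np, (int)(MAXOUT * sizeof(Particle)), MPI_BYTE,
                         right, 0, comm, &st);
            MPI_Get_count(&st, MPI_BYTE, &nin);
            *np += nin / (int)sizeof(Particle);

            MPI_Sendrecv(out_r, (int)(nr * sizeof(Particle)), MPI_BYTE, right, 1,
                         p + *np, (int)(MAXOUT * sizeof(Particle)), MPI_BYTE,
                         left, 1, comm, &st);
            MPI_Get_count(&st, MPI_BYTE, &nin);
            *np += nin / (int)sizeof(Particle);
        }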

    See links to cluster project documents and reports.

    A preliminary report was presented at our stand at NOTUR 2003 in Oslo May 14-15, 2003.

    A1.2a Profiling and user analysis of Amber, Dalton and Gaussian

  • Primary participants: Tor Johansen and staff/students (Computing Center/UiT)
  • Budget: NOK 150.000 for staff support. NOTE: Less than NOK 50.000 was spent due to staff constraints.

    The project expected that UiT staff had ported or would be porting the following applications from their current SMP system (Athelon) to their Itanium cluster as part of their ongoing cluster efforts:

  • Amber -- a well-known molecular dynamics code
  • Dalton -- a Norwegian competitor to Gaussian (see also A1.2b)
  • Gaussian -- a much-used SMP application that currently scales only to 4 processors

    Due to staff time constraints, this activity limited itself to porting Amber as part of this project; a summary is given below (no report in English was provided).

    Gaussian proved to be a mistake (see below), and Dalton was basically covered by A1.2b.

    Results as summarized by the activity leader (translated from Norwegian):

    As cited in the email you sent, the project was invoiced for 120 hours, a total of NOK 43.200, in 2003. Of this, 80 hours were for the initial porting, optimization and testing of Amber. 40 hours were for further testing of Amber, in which we used Scali's MPI (ScaMPI).

    Briefly summarized, the result from part 1 was that Amber is a code very well suited to running on Snowstorm; so well suited that by now probably nobody is granted compute time on Nana for Amber.

    Briefly summarized, the result from part 2 was that Amber actually runs 10-25% faster with ScaMPI than with the standard MPI.

    I do not have the exact percentage improvements at hand, and I will not bother Roy with retrieving them until after the workshop :-) But we have, for example, seen, regardless of which MPI we used, that a 4-CPU Amber job generally runs faster on two 2-CPU machines than on one 4-CPU machine. The degree to which this holds naturally varies with the types of jobs run (there are so many different types of jobs one can run with Amber that we have covered far from all of them; we focused on some types that are heavily used at our site). The reason is that different types of jobs exploit the characteristics of the infrastructure of the 2-way machines better than those of the 4-way machines. Typically, I/O-intensive jobs do better on the 2-way machines than on the 4-way machines.

    As for Gaussian, it was simply a blunder to include it on the list: standard Gaussian runs in parallel only on SMPs. If you want a cluster-based version you also need Linda = $$$$. Linda is simply too expensive for us to justify purchasing it (although Morten Hanshugen in Oslo actually mentioned just before Easter that the price was about to drop).

    We therefore thought we were being clever when we switched to what is perhaps the second most used chemistry code in Norway: ADF. The only problem is that we still have not managed to get it running on Itanium... We have put a good deal of work into the porting and have made steady progress, but to this day we do not have a running version. This is a great pity, since this is a code that in theory should work well on clusters. In any case, the work we have done on ADF has been charged to the Metacenter part of the project, since the porting itself is something we must do regardless (for Amber, porting to a running code was a minimal job; the time there was spent on optimization and testing).

    The original goal of this activity was to include:

  • Gathering early-user benchmarks that compare previous and current SMP runs with Itanium cluster runs.
  • User profiles: how are these codes used (number of processors, lengths of runs, etc.)?
  • Comparisons with results from HPC systems at UiB, NTNU, DNMI and Linköping where feasible.

    A preliminary report was presented as a poster at our stand at NOTUR 2003. The final report was to include more in-depth analyses, including a total cost analysis of running a cluster system vs. a large SMP. Some of this is highlighted in the activity leader's summary above.

    A1.2b Optimization and tool-analysis of a commercial application -- Dalton

    Participants: Otto J. Anshus and post doc/students (Computer Science/UiT)

    Budget: NOK 150.000 for post doc/students.

    This activity included using state-of-the-art optimization techniques for a port of a popular application to a compute cluster. We selected Dalton for this effort since we were able to work directly with this Norwegian vendor. Dalton is also a competitor to Gaussian, a very popular user application at all Norwegian HPC sites.

    This activity commenced in spring 2003 with main results made available by NOTUR 2003.

    This activity led to the following activities and reports, including time estimates for Otto Anshus (OA) and John Markus Bjørndalen (JMB); see the A1.2b entries under the links to cluster project documents and reports below.

    A2. Execution monitoring

    Participants: Tore Larsen, Otto Anshus and students (Computer Science/UiT)

    Budget: NOK 100.000 for student support.

    This activity extended the ongoing work of the Distributed Systems Group at UiT on execution monitoring and tools for clusters. These efforts included a special focus on applicability to future NOTUR activities. A survey of current technologies in the field is included. The activity also included an analysis of what may be necessary for using this technology as a compute server for a display wall.
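    Many monitoring tools of this kind build on the MPI profiling interface (PMPI), which lets a library intercept MPI calls without modifying the application. The following is our own minimal sketch of that mechanism, not code from the UiT tools:

        #include <stdio.h>
        #include <mpi.h>

        /* Minimal execution-monitoring shim via the MPI profiling interface:
           linked ahead of the MPI library, it counts and times every
           MPI_Send while the application itself stays unchanged. */
        static long   send_count = 0;
        static double send_secs  = 0.0;

        int MPI_Send(void *buf, int count, MPI_Datatype type,
                     int dest, int tag, MPI_Comm comm)
        {
            double t0 = MPI_Wtime();
            int rc = PMPI_Send(buf, count, type, dest, tag, comm);
            send_secs += MPI_Wtime() - t0;
            send_count++;
            return rc;
        }

        int MPI_Finalize(void)
        {
            int rank;
            PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
            fprintf(stderr, "rank %d: %ld sends, %.3f s in MPI_Send\n",
                    rank, send_count, send_secs);
            return PMPI_Finalize();
        }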

    This activity led to the following activities and reports, including time estimates for Otto Anshus (OA) and John Markus Bjørndalen (JMB):

    See links to cluster project documents and reports.

    A3. Visualization servers, etc.

    Participants: Anne Elster and her student Torbjørn Vik, co-supervised with Torbjørn Hallgren (Computer Science/NTNU)

    Budget: NOK 100.000 for student support.

    This activity looked at how suitable a specialized cluster may be as a compute engine for visualization and other related applications.

    This activity commenced in January 2003 and ran throughout the project.

    A short summary follows (translated from Norwegian). Also see Vik's report on Chromium vs. SGI visualization hardware, listed at the end of this report.

    Two different types of cluster use:

  • Off-line (non-real-time) rendering. These are often so-called "rendering farms" with a large number of machines, each working on its own frame of a larger animation. Typically used in the film industry and in other areas where interactivity and/or real-time updates are not needed. All major 3D modeling programs, such as Lightwave, 3DStudio and Maya, have functionality for this.
  • On-line (real-time) rendering. This is the most interesting type from a technological standpoint, and the rest of this summary concerns it.

    Clusters are used in interactive visualization software to increase performance, to enable larger data sets, and to avoid limitations in local hardware. Most visualization clusters work, in principle, by having the user sit at a client machine that itself has little capacity. The cluster handles all computation and sends only the finished images to the client. The client machine also accepts input from the user and forwards it to the cluster. Data sets for such visualization are often very large and, depending on the situation, both polygon-based and voxel-based rendering are used.

    The main problem in making clusters usable for interactive visualization programs is network-induced delay; this is usually the worst problem. It is addressed by reducing the time spent transferring images between the cluster and the client, either by reducing the amount of data (compression methods) or by increasing network performance, or both.

    Parallelism within the cluster itself is based on independence relations between different data: there may be independence between different parts of the same data set, or between different frames in a 4D data set. Load balancing often becomes a problem in such settings and is an important research area. The load-balancing method used is usually highly context-dependent.
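    As a toy illustration of the compression option mentioned in the summary above (our own sketch, not from Vik's work), the routine below run-length encodes a grayscale framebuffer before it is shipped from the cluster to the display client, trading CPU time for network bandwidth:

        #include <stddef.h>

        /* Run-length encode a grayscale framebuffer: emits (run, value) byte
           pairs. The worst case doubles the size, but rendered frames with
           large uniform regions compress well, cutting cluster-to-client
           traffic and hence the image-transfer delay discussed above. */
        size_t rle_encode(const unsigned char *fb, size_t n, unsigned char *out)
        {
            size_t o = 0;
            for (size_t i = 0; i < n; ) {
                unsigned char v = fb[i];
                size_t run = 1;
                while (i + run < n && fb[i + run] == v && run < 255)
                    run++;
                out[o++] = (unsigned char)run;  /* run length, 1..255 */
                out[o++] = v;                   /* pixel value */
                i += run;
            }
            return o;   /* bytes to transmit instead of n */
        }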

    See also links to cluster project documents and reports.

    NOTE: [added November 2004]

    "Rocks", an open source high performance Linux cluster solution that is used by many sites to manage cluster system software ( http://www.rocksclusters.org/Rocks/ ) has since (fall 2004) announced Chromium support for their next version. We agree with Kelly Gaither, Associate Director of TACC, The University of Texas at Austin's HPC center, and manager of their Scientific Visualization, that this product will be crucial to making cluster solutions for visualization viable for production systems. Many challenges remain regarding using clusters for visualization beyond one-to-one display wall use.

    A4. Impact of future numerical algorithms and methods

    Participants: Einar Rønquist (Mathematical Sciences/NTNU) and his students/staff.

    Budget: NOK 100.000 for student support. NOTE: Only around NOK 50.000 was used in this activity due to a lack of available students. The remaining funds were transferred to activity A1.

    This activity's goal was to evaluate the impact of future HPC technologies on selected numerical algorithms and computational strategies. The examples included an evaluation of higher-order methods for the numerical solution of partial differential equations. A recently proposed novel computational approach based on parallelizing numerical algorithms in time (the parareal algorithm) was also evaluated. Some of the activity included numerical tests of generic operations.

    This activity was performed during the summer of 2003.

    The results from this work are given in the following report: "The Parareal Algorithm -- A survey of present work"
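    To give the flavor of the method surveyed in that report: parareal combines a cheap coarse propagator G with an accurate fine propagator F through the correction U_{k+1}^{n+1} = G(U_{k+1}^n) + F(U_k^n) - G(U_k^n), where the expensive F evaluations are independent across time slices and can therefore run in parallel on a cluster. The sketch below applies the iteration, serially for clarity, to the model problem y' = -y; it is our own illustration, not code from the report:

        #include <stdio.h>
        #include <math.h>

        #define N  10     /* time slices (one per processor, in principle) */
        #define T  1.0    /* final time for y' = -y, y(0) = 1 */

        /* Coarse propagator G: one explicit Euler step across a slice. */
        static double G(double y, double dt) { return y * (1.0 - dt); }

        /* Fine propagator F: 100 small Euler steps across the same slice. */
        static double F(double y, double dt)
        {
            double h = dt / 100.0;
            for (int i = 0; i < 100; i++) y *= (1.0 - h);
            return y;
        }

        int main(void)
        {
            double dt = T / N, u[N + 1], c[N + 1];

            u[0] = 1.0;                       /* serial coarse sweep */
            for (int n = 0; n < N; n++) u[n + 1] = G(u[n], dt);

            for (int k = 0; k < 4; k++) {     /* parareal iterations */
                /* The F(u[n], dt) calls are mutually independent: on a
                   cluster, slice n would be assigned to processor n. */
                c[0] = 1.0;
                for (int n = 0; n < N; n++)
                    c[n + 1] = G(c[n], dt) + F(u[n], dt) - G(u[n], dt);
                for (int n = 0; n <= N; n++) u[n] = c[n];
            }
            printf("parareal y(T) = %.6f, exact = %.6f\n", u[N], exp(-1.0));
            return 0;
        }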

    A5. Interface with NOTUR ET Grid Project

    Participants: Anne C. Elster and her student Robin Holtet as well as colleagues, staff and students associated with the GRID project

    Budget: NOK 50.000

    This activity focused on collaboration efforts with the ET-Grid project. A test grid using a local cluster was set up at NTNU as part of the GRID.

    Here we looked at how our results impact current Grid efforts. In particular, we wanted to look at heterogeneous clusters, since many of the performance issues with such clusters relate strongly to applications spread over a computational grid.

    A6. Project administration

    Participant: Anne C. Elster (Computer Science/NTNU)

    This activity included all administration and coordination of the project, including status reports and the final report.

    This activity commenced immediately and continued throughout the project. The activities included, among others, the overview and status presentations listed under A6 below.

    In addition, the project participants attended several national and international meetings, promoting NOTUR and this project as well as significantly increasing our own knowledge of the field.

    Summary

    This project started organizing in late fall 2002 and ran through January 2004. Most of the activities with later deadlines had summer students involved.

    Overall, the project provided a lot of insight related to cluster computing as an emerging HPC technology, as can be seen from the many conclusions and reports provided in this document.

    Cluster computing is an active and evolving area of HPC that Norway needs to stay on top of in the future. We are therefore pleased to see that this project is being continued as part of the NOTUR 2004 Competency projects.

    See NOTUR 2004 Competency Project -- GRID, Cluster and Storage for details re. NTNU's involvement in this follow-up project.

    People involved in the NOTUR ET Cluster Project 2003

    Follow-up Project for 2004: NOTUR Competency projects on GRID, Cluster and Storage Technologies

    Part of NTNU's gang on March 25, 2004


    The Cluster and Grid ET projects showed their success in that an expanded follow-up project was created. This 2004 project is a joint effort by NTNU, the University of Bergen, the University of Oslo, the University of Tromsø and UNINETT. Statoil will also be participating.

    Elster helped raise NOK 1 million at NTNU for these projects, which made NTNU the largest partner. These funds are matched by the Research Council of Norway (RCN), which also adds NOK 450K for expanded storage hardware.

    See NOTUR 2004 Competency Project -- GRID, Cluster and Storage for details re. NTNU's involvement in this follow-up project.


    Links to cluster project documents and reports

    The reports are listed according to their associated activity number (A1-A6).

    Note that overview and status presentations are found under A6.

    A1: Profiling and tuning of selected applications

    A1.1: Physics and Chemistry codes (Protomol and PIC)

  • Østvold's report on Protomol
  • Boasson's report on PIC
  • Jan Christian Meyer's report on PIC

    A1.2a: Profiling and user analysis of Amber, Dalton and Gaussian

    -- See the summary in the A1.2a project description above

    A1.2b: Optimization and tool-analysis of a commercial application -- Dalton

    By Otto Anshus, John Markus Bjørndalen and Lars Ailo Bongo:

  • Ytelsesmålinger gjort på Dalton (Performance measurements on Dalton)
  • Survey of optimizing techniques for parallel programs running on computer clusters
  • Configurable collective communication in LAM-MPI

    A2: Execution monitoring

  • Performance Monitoring.
  • Lars Ailo Bongo, Otto Anshus and John Markus Bjørndalen: "Using a virtual event space to understand parallel application communication behaviour", NIK 2003
  • Survey of execution monitoring tools for computer clusters
  • Lars Ailo Bongo, Otto Anshus and John Markus Bjørndalen: "Experiences Visualizing Multi-cluster Parallel Applications", presnted at Simula 2003 Workshop, Oslo
  • Lars Ailo Bongo, Otto Anshus, John Markus Bjørndalen and Brian Vinter: "Dynamically Adapting Communication Behavior of Parallel Applications", NOTUR 2003 poster
  • Lars Ailo Bongo, Otto Anshus and John Markus Bjørndalen: "EventSpace: Exposing and Observing Communication Behavior of Parallel Applications", presented at EuroPar 2003

    A3: Visualization servers, etc.

  • Summary by Torbjørn Vik with focus on Chromium

    A4: Impact of future numerical algorithms and methods

    A5: Interface with Grid project

    -- See project description summary.

    A6: Administration -- Overview and status presentations

  • Notur Overview (2002)

  • Original Project Description (MS Word .doc) from fall 2002

  • Presentation at RCN (NFR) meeting, June 2003. These slides were also presented at NOTUR 2003.

  • Presentation for NOTUR board -- project status October 2003


    Back to Anne C. Elster's Home Page


    This page is maintained by: elster-at-idi.ntnu.no

    It was last updated on November 18, 2004. Comments are welcome.